qgallouedec / panda-gym
Set of robotic environments based on the PyBullet physics engine and Gymnasium.
License: MIT License
raise ValueError(f"Observation spaces do not match: {observation_space} != {env.observation_space}")
ValueError: Observation spaces do not match: Dict(achieved_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), desired_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), observation:Box([-10. -10. -10. -10. -10. -10.], [10. 10. 10. 10. 10. 10.], (6,), float32)) != Dict(achieved_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), desired_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), observation:Box([-10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10.
-10. -10. -10. -10.], [10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10.], (18,), float32))
Hi,
I am curious about the task observations used in the environment. I am very sorry if my question is trivial; I am new to reinforcement learning.
The observation states of the pick and place task are the object's kinematics (position, velocity, etc.):
# position, rotation of the object
object_position = self.sim.get_base_position("object")
object_rotation = self.sim.get_base_rotation("object")
object_velocity = self.sim.get_base_velocity("object")
object_angular_velocity = self.sim.get_base_angular_velocity("object")
observation = np.concatenate([object_position, object_rotation, object_velocity, object_angular_velocity])
Even in the PandaReach, the task observation is empty:
def get_obs(self) -> np.ndarray:
    return np.array([])  # no task-specific observation
Why is the target position not included in the observation? Such as:
target_position = self.sim.get_base_position("target")
object_position = self.sim.get_base_position("object")
...
observation = np.concatenate([target_position, object_position, ...])
Does this mean that the critic networks in the RL algorithms (SAC or TQC) are basically also learning to predict the random target location?
If not, for example in the pick and place task, does the agent still need to randomly search for the position with maximum reward after successfully picking up the object when testing the trained model?
Thank you very much.
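For what it's worth, here is a minimal sketch of where the target position already shows up in the goal-conditioned Dict observation that panda-gym returns (the key names are taken from the error message quoted earlier; treat the exact env id as an example):
import gymnasium as gym
import panda_gym

env = gym.make("PandaReach-v3")
obs, info = env.reset()
# The task-specific part lives under "observation", while the target position
# sampled for this episode is exposed separately under "desired_goal".
print(obs["observation"])    # robot / object kinematics
print(obs["desired_goal"])   # target position sampled at reset
print(obs["achieved_goal"])  # current end-effector (or object) position
env.close()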
Hello, first of all, thank you for the work, but I have a question. When I create the robot and make it grasp an object, the gripper passes through the object and zero contact points are detected. Could you please tell me why this happens?
1. Hello, why has 'panda_joint8' (joint index 7) disappeared from JOINT_INDICES = [0, 1, 2, 3, 4, 5, 6, 9, 10]?
2. I have a robot described by an .sdf file; how can I change your code to load my robot?
Thanks a lot!
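Regarding the .sdf part, a minimal sketch of the plain PyBullet call (the file name is a placeholder); panda-gym itself loads its robots through loadURDF, so the wrapper would need a similar entry point for SDF files:
import pybullet as p

client = p.connect(p.DIRECT)
# loadSDF returns a tuple of body unique ids, since one .sdf file may contain several models.
body_ids = p.loadSDF("my_robot.sdf")  # hypothetical file name
robot_id = body_ids[0]
p.disconnect()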
A clear and concise description of what the bug is.
Provide a minimal code :
import gymnasium as gym
import panda_gym
env = gym.make(
    "PandaSlide-v3",
    render_mode="rgb_array",
    renderer="OpenGL",
    render_width=480,
    render_height=480,
    render_target_position=[0.2, 0, 0],
    render_distance=1.0,
    render_yaw=90,
    render_pitch=-70,
    render_roll=0,
)
env.reset()
image = env.render() # RGB rendering of shape (480, 480, 3)
env.close()
...
python --version: Python 3.9
pip list | grep panda-gym: 3.0.5
Hi there, I got a problem. I wanted to set a different viewpoint during training and testing, so I followed the instructions written in the documentation and changed the render-related parameters render_yaw and render_pitch. However, when I ran the training, the viewpoint did not change at all; it is exactly the same as the default. How can I change the viewpoint?
Best regards,
Dao
I am trying to do a one-step lookahead on the environment's action space by calling deepcopy()
on the environment to try out different actions. In pseudo-code, it would look something like this:
actions = [env.action_space.sample() for _ in range(10)]
for action in actions:
    env_copy = deepcopy(env)
    obs, rew, done, info = env_copy.step(action)
    # do stuff with the one-step lookahead for each action
However, I cannot quite do this because env_copy gets garbage collected after the first iteration of the loop. The garbage collector closes the physics_client._client in the PyBullet object, which is shared by env and env_copy. As a result, the second iteration of the loop fails at env_copy.step(action), because env_copy was copied from env, which has lost its connection to the physics client.
To verify this, you may use the following code on any panda_gym environment.
from copy import deepcopy
env_copy = deepcopy(env)
assert(id(env.sim.physics_client._client) == id(env_copy.sim.physics_client._client))
del env_copy
env.reset() # error: Not connected to physics server.
Is it possible to close the connection based on a reference count implementation? Or would you recommend any other way to do this one-step lookahead?
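One possible alternative is a sketch based on PyBullet's in-memory snapshots (saveState/restoreState) instead of deepcopy; it assumes the underlying BulletClient is reachable as env.unwrapped.sim.physics_client, which is worth double-checking for your version:
import gymnasium as gym
import panda_gym

env = gym.make("PandaReach-v3")
obs, info = env.reset()

actions = [env.action_space.sample() for _ in range(10)]
lookahead = []
state_id = env.unwrapped.sim.physics_client.saveState()  # snapshot of the full simulation state
for action in actions:
    obs, reward, terminated, truncated, info = env.step(action)
    lookahead.append((action, reward))
    # Roll the physics back; note that wrapper bookkeeping (e.g. the TimeLimit counter)
    # is not rolled back by this call.
    env.unwrapped.sim.physics_client.restoreState(stateId=state_id)
env.unwrapped.sim.physics_client.removeState(state_id)
env.close()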
Hello, when I want to stack manually:
env = gym.make("PandaStack-v3", render=True)
observation, info = env.reset()
for _ in range(1000):
    current_position = observation["observation"][0:3]
    desired_position = observation["desired_goal"][0:3]
    action = 5.0 * (desired_position - current_position)
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
I met an error:
ValueError: operands could not be broadcast together with shapes (3,) (4,) (4,)
Thanks!
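If it helps, a minimal sketch of what I suspect is going on, assuming the error comes from PandaStack-v3's 4-dimensional action space (3 end-effector displacements plus 1 gripper command) while the computed action has only 3 components:
import numpy as np

# Inside the loop above, build a 4-dimensional action instead of a 3-dimensional one.
current_position = observation["observation"][0:3]
desired_position = observation["desired_goal"][0:3]
displacement = 5.0 * (desired_position - current_position)
gripper = np.array([0.0])  # keep the gripper opening unchanged (arbitrary choice)
action = np.concatenate([displacement, gripper])  # shape (4,), matching the action space
observation, reward, terminated, truncated, info = env.step(action)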
In a previous answer (#53) you said: "First, the raw action is scaled by 0.05. The result is added to the current joint position to obtain the target angles of the joints".
Can you explain how you obtain the target angles of the joints from a displacement of the end-effector? I can't find it anywhere in the code.
Thank you in advance!
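For what it's worth, as far as I can tell the displacement is turned into a target Cartesian position and then fed to PyBullet's inverse kinematics, which returns the target joint angles. A standalone sketch of that idea with plain PyBullet (the end-effector link index 11 and the URDF path are assumptions):
import pybullet as p
import pybullet_data

client = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
robot = p.loadURDF("franka_panda/panda.urdf", useFixedBase=True)
ee_link = 11  # end-effector link index (assumption)

# Target position = current end-effector position + scaled displacement (0.05 factor from #53).
current_ee_position = p.getLinkState(robot, ee_link, computeForwardKinematics=True)[0]
displacement = [0.0, 0.0, 0.05]  # e.g. a raw action of [0, 0, 1] scaled by 0.05 -> move 5 cm up
target_ee_position = [c + d for c, d in zip(current_ee_position, displacement)]

# PyBullet IK returns angles for all movable joints; the first 7 are the arm joints.
target_arm_angles = p.calculateInverseKinematics(robot, ee_link, target_ee_position)[:7]
print(target_arm_angles)
p.disconnect()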
I cannot reproduce what is described in the installation tutorial. Each time I run it, I get the error shown in the image.
It seems the latest Gym has removed something related to GoalEnv, which leads to the error; it appears to be related to the latest update. Can you fix it, or add a requirement on the gym version?
Dear author:
I tested the Reach/PickAndPlace/Slide/Push tasks; only the PandaReach-v2 task converged, and the others failed to converge as in the paper (arXiv:2106.13687v2 [cs.LG], 19 Dec 2021).
Could you please show the relevant parameters for training these tasks?
Here is a part of my test code in PickAndPlace:
model = DDPG(policy="MultiInputPolicy", env=env, batch_size=2048, replay_buffer_class=HerReplayBuffer, verbose=1, buffer_size=1000000)
model.learn(total_timesteps=10000000)
The result shows that the success rate is only 0.05
When resetting the environment, the robot joint angles do not seem to go back to the predefined neutral position. Is that because velocity control mode is used by default? Instead of using the resetJointState function, how about using the setJointMotorControlArray function, setting the control mode to POSITION_CONTROL and setting the desired positions?
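A small sketch of the two options being compared, with plain PyBullet calls (the joint indices and neutral angles below are illustrative, not the exact values used by panda-gym):
import pybullet as p

def move_to_neutral(robot_id, use_motors=False):
    arm_joints = [0, 1, 2, 3, 4, 5, 6]                        # illustrative joint indices
    neutral_angles = [0.0, -0.3, 0.0, -2.2, 0.0, 2.0, 0.79]   # illustrative neutral pose
    if not use_motors:
        # resetJointState teleports the joints instantly, bypassing the dynamics.
        for joint, angle in zip(arm_joints, neutral_angles):
            p.resetJointState(robot_id, joint, targetValue=angle)
    else:
        # setJointMotorControlArray drives the joints there with position-controlled motors,
        # as suggested; the robot then converges over the following simulation steps.
        p.setJointMotorControlArray(
            robot_id, arm_joints, controlMode=p.POSITION_CONTROL, targetPositions=neutral_angles
        )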
Gym 0.26+ requires a render_mode argument in the constructor.
Provide a minimal code :
import gymnasium as gym
gym.make("PandaPush-v3", render_mode="human")
I noticed that the author published the newest version, 1.1.0. The release notes say that this version adds a correct implementation of random seeding for reproducibility. I do not fully understand what the author means by this. Compared with 1.0.1, which exact problem does 1.1.0 fix with this improvement?
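As an illustration of what seeding for reproducibility means in practice, a small sketch with the current API (the 1.x releases used env.seed() from the old Gym API instead, so adapt accordingly):
import gymnasium as gym
import panda_gym

# Two environments reset with the same seed should sample the same goal.
env1 = gym.make("PandaReach-v3")
env2 = gym.make("PandaReach-v3")
obs1, _ = env1.reset(seed=42)
obs2, _ = env2.reset(seed=42)
assert (obs1["desired_goal"] == obs2["desired_goal"]).all()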
I cannot make a gym environment anymore.
I get an error stating
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    env = gym.make('PandaReach-v2', render=True)
  File "/home/jakob/anaconda3/envs/panda/lib/python3.7/site-packages/gym/envs/registration.py", line 235, in make
    return registry.make(id, **kwargs)
  File "/home/jakob/anaconda3/envs/panda/lib/python3.7/site-packages/gym/envs/registration.py", line 129, in make
    env = spec.make(**kwargs)
  File "/home/jakob/anaconda3/envs/panda/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
    env = cls(**_kwargs)
  File "/home/jakob/Promotion/code/panda-gym/panda_gym/envs/panda_tasks/panda_reach.py", line 20, in __init__
    sim = PyBullet(render=render)
  File "/home/jakob/Promotion/code/panda-gym/panda_gym/pybullet.py", line 34, in __init__
    self.connection_mode = p.GUI if render else p.DIRECT
AttributeError: module 'pybullet' has no attribute 'GUI'
Starting from a clean slate:
conda create -n panda python=3.7
conda activate panda
~/panda-gym$ pip install -e .
and running the quick start code
import gym
import panda_gym
env = gym.make('PandaReach-v2', render=True)
leads to the aforementioned error.
I also tried python 3.8
python --version: 3.7, 3.8
pip list | grep panda-gym: panda-gym 2.0.0 | pybullet 3.2.1
I got the following error when creating and closing multiple environments.
Perhaps, in the close() function of the PyBullet class, you need to call self.physics_client.disconnect() to specify the client id, instead of p.disconnect().
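A minimal sketch of the suggested change (assuming the wrapper keeps its BulletClient in self.physics_client), not the exact upstream code:
# panda_gym/pybullet.py (sketch)
def close(self) -> None:
    # Disconnect only the client owned by this instance, so that closing one environment
    # cannot break the connection of another environment.
    if self.physics_client.isConnected():
        self.physics_client.disconnect()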
Hi, I saw that in the recent release there is an option for dense reward; however, it doesn't seem to be available yet for any of the environments.
In the envs/core.py script, some of the goal spaces are defined inconsistently.
desired_goal_shape = observation["achieved_goal"].shape
=>
desired_goal_shape = observation["desired_goal"].shape
desired_goal=spaces.Box(-10.0, 10.0, shape=achieved_goal_shape, dtype=np.float32),
achieved_goal=spaces.Box(-10.0, 10.0, shape=desired_goal_shape, dtype=np.float32),
=>
desired_goal=spaces.Box(-10.0, 10.0, shape=desired_goal_shape, dtype=np.float32),
achieved_goal=spaces.Box(-10.0, 10.0, shape=achieved_goal_shape, dtype=np.float32),
I found that fixing this is very important, especially if you customize some environments. Thanks a lot!
I am conducting an experiment where I test different starting points for the completion of the task, and I want to visualize these starting points afterwards. Is there any way to visualize these points using the observation?
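A rough sketch of one way to do it: drop a small non-colliding ("ghost") sphere at each starting point so it shows up in the rendering. The create_sphere call and its arguments are taken from my reading of the PyBullet wrapper and may differ in your version:
import numpy as np
import gymnasium as gym
import panda_gym

env = gym.make("PandaReach-v3", render_mode="human")
env.reset()
starting_points = [np.array([0.1, 0.1, 0.1]), np.array([-0.1, 0.05, 0.2])]  # example points
for i, point in enumerate(starting_points):
    env.unwrapped.sim.create_sphere(
        body_name=f"start_{i}",   # hypothetical name, one per marker
        radius=0.01,
        mass=0.0,
        position=point,
        rgba_color=np.array([0.1, 0.1, 0.9, 0.5]),
        ghost=True,               # no collision, purely visual
    )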
Hi, I'm trying to get an SB3 model to train on the harder tasks (so far I've failed with SAC+HER), so I went to the SB3 Zoo to see some examples of successful models. I can't get them to load, and it looks like it's because the zoo has trained models for the v1 versions but only has the v2 environments registered. Do you have successfully trained v2 models you can push to the zoo, or did you successfully train any SB3 models out of the box on the v2 versions of the tasks?
Thanks for making this environment!
The environments created via env = gym.make('PandaPush-v3', render_mode='rgb_array') (also with 'human') are missing the render_mode attribute, so env.render_mode returns None.
This attribute is required e.g. by gymnasium's PixelObservationWrapper.
import gymnasium as gym
from gymnasium.wrappers import PixelObservationWrapper
import panda_gym
env = gym.make('PandaPush-v3', render_mode='rgb_array')
env = PixelObservationWrapper(env)
AttributeError: env.render_mode must be specified to use PixelObservationWrapper:`gymnasium.make(env_name, render_mode='rgb_array')`.
python --version: 3.8.10
pip list | grep panda-gym: 3.0.3
Currently, when creating an environment, a whole bunch of stuff is printed to the console:
argv[0]=--background_color_red=0.8745098114013672
argv[1]=
argv[2]=
argv[3]=
argv[4]=
argv[5]=
argv[6]=
argv[7]=
argv[8]=
argv[9]=
argv[10]=
argv[11]=
argv[12]=
argv[13]=
argv[14]=
argv[15]=
argv[16]=
argv[17]=
argv[18]=
argv[19]=
argv[20]=
argv[21]=--background_color_green=0.21176470816135406
argv[22]=
argv[23]=
argv[24]=
argv[25]=
argv[26]=
argv[27]=
argv[28]=
argv[29]=
argv[30]=
argv[31]=
argv[32]=
argv[33]=
argv[34]=
argv[35]=
argv[36]=
argv[37]=
argv[38]=
argv[39]=
argv[40]=
argv[41]=
argv[42]=--background_color_blue=0.1764705926179886
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=45
argv[0] = --unused
argv[1] = --background_color_red=0.8745098114013672
argv[2] =
argv[3] =
argv[4] =
argv[5] =
argv[6] =
argv[7] =
argv[8] =
argv[9] =
argv[10] =
argv[11] =
argv[12] =
argv[13] =
argv[14] =
argv[15] =
argv[16] =
argv[17] =
argv[18] =
argv[19] =
argv[20] =
argv[21] =
argv[22] = --background_color_green=0.21176470816135406
argv[23] =
argv[24] =
argv[25] =
argv[26] =
argv[27] =
argv[28] =
argv[29] =
argv[30] =
argv[31] =
argv[32] =
argv[33] =
argv[34] =
argv[35] =
argv[36] =
argv[37] =
argv[38] =
argv[39] =
argv[40] =
argv[41] =
argv[42] =
argv[43] = --background_color_blue=0.1764705926179886
argv[44] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.5 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = Mesa/X.org
ven = Mesa/X.org
I couldn't find a way to block this output from outside, since it doesn't seem to use the Python stdout; otherwise with contextlib.redirect_stdout(os.devnull): should work. I also noticed that in IPython only the argvs are printed, but I have no idea how to achieve this effect from a .py script.
This output seems to be partially PyBullet's fault; the argvs are initialized by panda-gym, though. All the empty argvs could be removed there.
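A possible workaround sketch, assuming the messages are written directly to the C-level stdout (so contextlib.redirect_stdout cannot catch them): redirect the OS file descriptor instead.
import os
from contextlib import contextmanager

@contextmanager
def suppress_stdout():
    # Redirect fd 1 to /dev/null so that output written by native code
    # (PyBullet's C core) is silenced, not just Python's sys.stdout.
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved = os.dup(1)
    os.dup2(devnull, 1)
    try:
        yield
    finally:
        os.dup2(saved, 1)
        os.close(saved)
        os.close(devnull)

import gymnasium as gym
import panda_gym

with suppress_stdout():
    env = gym.make('PandaPush-v3')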
Provide a minimal code :
import gymnasium as gym
import panda_gym
env = gym.make('PandaPush-v3')
python --version: 3.8.10
pip list | grep panda-gym: 3.0.0
Hi,
I am unable to learn a policy for the PandaPickAndPlace task using RL Zoo. I am trying to get the results shared in the experimental results section of the panda-gym paper. Here are my hyperparameters for the SAC, DDPG and TQC algorithms:
PandaPush-v2: &her-defaults
  env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 1e6
  policy: 'MultiInputPolicy'
  buffer_size: 1000000
  batch_size: 2048
  gamma: 0.95
  learning_rate: !!float 1e-3
  tau: 0.05
  replay_buffer_class: HerReplayBuffer
  replay_buffer_kwargs: "dict(
    online_sampling=True,
    goal_selection_strategy='future',
    n_sampled_goal=4,
  )"
  policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"

PandaPickAndPlace-v2:
  <<: *her-defaults
  learning_rate: !!float 2e-4
Can you please help me with the hyperparams that you used for your experiments?
Hello, I tried to run the environment tests using the command pytest envs_test.py.
However, it shows some errors:
FAILED envs_test.py::test_env[PandaFlip-v3] - pybullet.error: Error loading texture
FAILED envs_test.py::test_env[PandaFlipJoints-v3] - pybullet.error: Error loading texture
FAILED envs_test.py::test_env[PandaFlipDense-v3] - pybullet.error: Error loading texture
FAILED envs_test.py::test_env[PandaFlipJointsDense-v3] - pybullet.error: Error loading texture
Do you know how to fix that?
Thank you so much
Hello, I have another question: why can the Panda robot envs only be trained with TQC? Are there any other RL algorithms in rl-baselines3-zoo that can be used to train the Panda? Thanks.
OK. The PandaReach env runs for at most 50 steps, after which it returns done=True. I intend to run the env for longer, say 300 steps, but cannot find a way to do that. Can anybody provide a solution for this?
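One way that should work with Gymnasium's make is to override the registered time limit at creation time; a minimal sketch (the keyword is Gymnasium's, so with the old gym API the call may differ):
import gymnasium as gym
import panda_gym

# Override the registered 50-step TimeLimit for this instance.
env = gym.make("PandaReach-v3", max_episode_steps=300)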
After i=14, the robotic arm no longer moves in the rendered view and the image freezes, but render() keeps producing frames, and the rendered picture is not consistent with the actual arm motion.
Provide a minimal code :
import gymnasium as gym
import panda_gym
import matplotlib.pyplot as plt
from matplotlib import animation
import numpy as np

def display_frames_as_gif(frames):
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(plt.gcf(), animate, frames=len(frames), interval=5)
    anim.save('panda.gif', writer='pillow', fps=30)

frames = []
# env = gym.make("PandaReach-v3", render_mode="human")
env = gym.make("PandaReach-v3", render_mode="rgb_array", renderer="OpenGL")
observation, info = env.reset()

# parameters of the circle
theta = np.linspace(0, 2 * np.pi, 1000)
r = 0.1  # radius of the circle

# compute the coordinates of the circle
x = r * np.cos(theta)
y = r * np.sin(theta)
z = np.zeros_like(theta) + observation["observation"][2]  # all points lie in the same z plane

pre_tarj = np.zeros((1000, 3))
pre_tarj[:, 0] = x
pre_tarj[:, 1] = y
pre_tarj[:, 2] += observation["observation"][2]  # all points lie in the same z plane
tarj = np.zeros_like(pre_tarj)

# initialize the position
desired_position = pre_tarj[0, :]
for i in range(100):
    # Render into buffer.
    image = env.render()
    frames.append(image)
    print(i)
    current_position = observation["observation"][0:3]
    desired_position = pre_tarj[i, :]
    tarj[i, :] = current_position
    action = 5.0 * (desired_position - current_position)
    observation, reward, terminated, truncated, info = env.step(action)
    # if terminated or truncated:
    #     observation, info = env.reset()
env.close()
display_frames_as_gif(frames)
python --version: Python 3.11
pip list | grep panda-gym:
I found a gym environment on GitHub for robotics. I tried running it on Colab without rendering with the following code:
import gym
import panda_gym

env = gym.make('PandaReach-v2', render=True)
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random action
    obs, reward, done, info = env.step(action)
env.close()
I got the following error
The following error occurs when trying to run train_push:
assert not hasattr(obs_space, "spaces"), f"Unsupported structured space '{type(obs_space)}'"
AssertionError: Unsupported structured space '<class 'gymnasium.spaces.dict.Dict'>'
Provide a minimal code :
import gymnasium as gym
from stable_baselines3 import DDPG, HerReplayBuffer
import panda_gym
env = gym.make("PandaPush-v3")
model = DDPG(policy="MultiInputPolicy", env=env, replay_buffer_class=HerReplayBuffer, verbose=1)
...
python --version: 3.8
pip list | grep panda-gym: current master
I am working to implement the reach task (and trajectory tracking after that) on a real Panda robot. I am afraid that I am quite new to this field and am the first in my lab working on learning (my lab focuses on design and control theory). I was following this paper: https://arxiv.org/abs/1803.07067
It is recommended there to use velocity control for low-level joint control.
I was also reading your post (https://gallouedec.com/post/panda-gymv0/), which outlines the limitations on sim-to-real transfer.
It would be great if you would be able to guide me on this path.
Hi, there.
I'm using panda-gym to train a reinforcement learning model that makes the Panda robot do some tasks. During training I need the orientation of the end-effector, so I call the PyBullet function getLinkState(panda_robot, ee_link, computeForwardKinematics=True)[1]. However, I found that when the end-effector has an angle in the range (-120, 120) degrees (after conversion) w.r.t. the world frame, this function always returns a positive value. That means I cannot distinguish whether the end-effector has rotated, e.g., +90 degrees or -90 degrees, because the returned values are the same (equal to 90 degrees after converting the quaternion to degrees). I expect it to return, e.g., -90 degrees or 270 degrees.
Thanks for your help.
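A small sketch of what I would check first: converting the quaternion with PyBullet's own helper gives signed Euler angles, and note that a quaternion q and -q describe the same rotation, which can make naive conversions look "always positive" (the link index is an assumption):
import numpy as np
import pybullet as p

def ee_euler_degrees(robot_id, ee_link=11):
    # World-frame orientation of the link as a quaternion in (x, y, z, w) order.
    quaternion = p.getLinkState(robot_id, ee_link, computeForwardKinematics=True)[1]
    # Signed Euler angles in radians; yaw distinguishes +90 from -90 degrees about z.
    roll, pitch, yaw = p.getEulerFromQuaternion(quaternion)
    return np.degrees([roll, pitch, yaw])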
Since version 3.0.1 you no longer pass a render argument during environment initialization but the render_mode argument instead. With this, the mode option of the environment's render() method has vanished.
I'm doing vision-based RL and require the following, all at the same time:
1. …
2. …
3. the human mode
Currently I can get either 1, or 2 and 3, but not all three. Am I missing something?
This was previously possible with gym.make(..., render=True) and env.render(mode='rgb_image').
I would suggest keeping the render method's mode option, which was removed in 3.0.1.
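For reference, a sketch of how I understand the two APIs side by side (the v3 arguments follow the documentation and the snippet quoted earlier in this page; treat them as assumptions for your exact version):
import gymnasium as gym
import panda_gym

# panda-gym >= 3.0.1 (Gymnasium API): the rendering mode is fixed when the env is created.
env = gym.make("PandaReach-v3", render_mode="rgb_array", renderer="OpenGL")
env.reset()
image = env.render()  # returns an rgb array; no mode argument anymore
env.close()

# For comparison, with panda-gym < 3.0 (old Gym API) the same thing was:
#   env = gym.make("PandaReach-v2", render=True)   # opens the GUI window
#   image = env.render(mode="rgb_array")           # and rgb arrays on demand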
Hi, can you provide some benchmarking results with the corresponding algorithms and hyperparameters for the 4 tasks? I've tried SAC, PPO and DDPG but couldn't train an agent that reaches good results (I'm focusing on PandaPickAndPlace and PandaPush).
I am very interested in the dense-reward envs, but I didn't find any benchmarks for them. So, can you provide some benchmarks?
Hello, I failed to find a way to change the object in the pick-and-place task in order to see the performance.
I noticed that the object of the pick-and-place task is created by the function create_box (or _create_geometry, in fact) in pybullet.py.
It seemed I could achieve my goal by modifying the self.physics_client.createMultiBody call, so I changed the code between lines 575 and 579 and added the filename where the object I want to load (an .obj file) is saved. But obviously my method is not correct.
I failed to change the object, so I'd like to ask you how to do that, if you have time to reply.
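A rough sketch of how a mesh object could be loaded with plain PyBullet instead of create_box (the file name my_object.obj and all sizes are hypothetical):
import pybullet as p

def load_mesh_object(position=(0.0, 0.0, 0.02), scale=(1.0, 1.0, 1.0)):
    # Build collision and visual shapes from an .obj mesh, then assemble a body,
    # roughly what _create_geometry/createMultiBody do for the box object.
    collision = p.createCollisionShape(p.GEOM_MESH, fileName="my_object.obj", meshScale=scale)
    visual = p.createVisualShape(p.GEOM_MESH, fileName="my_object.obj", meshScale=scale,
                                 rgbaColor=[0.1, 0.9, 0.1, 1.0])
    body_id = p.createMultiBody(baseMass=1.0,
                                baseCollisionShapeIndex=collision,
                                baseVisualShapeIndex=visual,
                                basePosition=position)
    return body_id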
Hi! I would like to ask how to load the Franka Panda in PyBullet with white and black colours (just like you showed in the README). I have tried the franka_panda_description package from your GitHub; however, I got the Franka in PyBullet in white only.
Looking forward to your reply. Thanks a lot.
Is it possible to have a release with the latest updates on master? Specifically, the latest release does not have the update to gym 0.22.0, which has a crucial breaking change that affects repositories updating to gym >= 0.22 while using panda-gym.
Hi all,
I'm now using the garage lib to run a reinforcement learning algorithm in a panda-gym environment. However, when I use the RaySampler in garage, which can "sample episodes in a data-parallel fashion using a Ray cluster", I get this error: pybullet.error: Not connected to physics server.
It's triggered by this line.
I guess this is a multi-process issue, could you help me with this? Thank you!
Here is the more detailed bug log:
Traceback (most recent call last):
  File "/home/yygx/scripts/train_panda_airl.py", line 220, in <module>
    trainer.train(n_epochs=EPOCH_NUM, batch_size=10000)
  File "/home/yygx/src/garage/trainer.py", line 399, in train
    average_return = self._algo.train(self)
  File "/home/yygx/src/airl/irl_npo.py", line 187, in train
    trainer.step_episode = trainer.obtain_episodes(trainer.step_itr)  # yy: rollout episodes using the learned policy
  File "/home/yygx/src/garage/trainer.py", line 224, in obtain_episodes
    env_update=env_update)  # yy: generate episodes with learned policy
  File "/home/yygx/src/garage/sampler/ray_sampler.py", line 208, in obtain_samples
    ready_worker_id, episode_batch = ray.get(result)
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/ray/worker.py", line 1831, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(error): ray::SamplerWorker.rollout() (pid=241822, ip=192.168.86.22, repr=<garage.sampler.ray_sampler.SamplerWorker object at 0x7f3b17457e80>)
  File "/home/yygx/src/garage/sampler/ray_sampler.py", line 432, in rollout
    return (self.worker_id, self.inner_worker.rollout())
  File "/home/yygx/src/garage/tf/samplers/worker.py", line 115, in rollout
    return self._inner_worker.rollout()
  File "/home/yygx/src/garage/sampler/default_worker.py", line 186, in rollout
    self.start_episode()
  File "/home/yygx/src/garage/sampler/default_worker.py", line 97, in start_episode
    self._prev_obs, episode_info = self.env.reset()
  File "/home/yygx/src/garage/envs/gym_env.py", line 210, in reset
    first_obs = self._env.reset()
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 25, in reset
    return self.env.reset(**kwargs)
  File "/home/yygx/panda-gym/panda_gym/envs/core.py", line 250, in reset
    with self.sim.no_rendering():
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/yygx/panda-gym/panda_gym/pybullet.py", line 384, in no_rendering
    self.physics_client.configureDebugVisualizer(self.physics_client.COV_ENABLE_RENDERING, 0)
pybullet.error: Not connected to physics server.
I am trying to run your code with stable-baselines3 (the code under examples/train_push.py); it seems like stable-baselines3 does not support the observation space structure defined in gymnasium. How am I supposed to solve this problem?
Traceback (most recent call last):
  File "/home/wxia612/panda-gym/examples/train_push.py", line 10, in <module>
    model = DDPG(policy="MultiInputPolicy", env=env, replay_buffer_class=HerReplayBuffer, verbose=1)
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/ddpg/ddpg.py", line 85, in __init__
    super().__init__(
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/td3/td3.py", line 103, in __init__
    super().__init__(
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 111, in __init__
    super().__init__(
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/base_class.py", line 179, in __init__
    env = self._wrap_env(env, self.verbose, monitor_wrapper)
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/base_class.py", line 228, in _wrap_env
    env = DummyVecEnv([lambda: env])
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 29, in __init__
    self.keys, shapes, dtypes = obs_space_info(obs_space)
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/util.py", line 67, in obs_space_info
    assert not hasattr(obs_space, "spaces"), f"Unsupported structured space '{type(obs_space)}'"
AssertionError: Unsupported structured space '<class 'gymnasium.spaces.dict.Dict'>'
Process finished with exit code 1
Hi,
I have seen in the documentation that panda-gym is compatible up to Python 3.10. I am using Python 3.11 in my project and I am getting the following error:
gymnasium.error.NameNotFound: Environment PandaReach doesn't exist.
I have followed the installation procedure described here.
Could you please help me with that?
Thank you!
Hi, is it possible to upgrade the gymnasium version to a more recent one? I am not sure whether it just hasn't been updated or whether it is intentionally pinned to an older version.
Hi @qgallouedec,
I have been trying to reproduce the results of some of the experiments, in particular for the PandaPickAndPlace task. However, I was only able to find hyperparameters for v1. Should results be reproducible for v3?
I tried using both the DDPG and TQC. However, I mostly focused on TQC since it is clearly documented in two places: https://huggingface.co/qgallouedec/tqc-PandaPickAndPlace-v1-3157870761 and https://wandb.ai/openrlbenchmark/sb3.
I can't get anywhere near the results presented in these two sources. I also tried to train the same agent in a dense environment as a sort of sanity check. The results were quite good, the success rate goes above 90% without any issues.
Here is an example of the code I have been trying to run. For your convenience, I removed all callbacks and checkpoints. Also, I am using the bleeding edge version for all the packages, as presented in the docs.
import gymnasium as gym
import panda_gym
from stable_baselines3 import HerReplayBuffer
from sb3_contrib import TQC
env = gym.make("PandaPickAndPlace-v3")
model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
)
model.learn(
    total_timesteps=1500000.0,
    progress_bar=True,
)
Hello, when I change max_episode_steps=300 in the register and train with TQC in SB3, I get this error. What is the problem? Thanks.
python train.py --algo tqc --env PandaStack-v1 -params n_envs:10
========== PandaStack-v1 ==========
Seed: 3400246078
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 1024),
             ('buffer_size', 1000000),
             ('env_wrapper', 'sb3_contrib.common.wrappers.TimeFeatureWrapper'),
             ('gamma', 0.95),
             ('learning_rate', 0.001),
             ('learning_starts', 1000),
             ('n_envs', 10),
             ('n_timesteps', 30000000000.0),
             ('policy', 'MultiInputPolicy'),
             ('policy_kwargs', 'dict(net_arch=[512, 512, 512], n_critics=2)'),
             ('replay_buffer_class', 'HerReplayBuffer'),
             ('replay_buffer_kwargs', "dict( online_sampling=True, goal_selection_strategy='future', n_sampled_goal=4, )"),
             ('tau', 0.05)])
Using 10 environments
Creating test environment
pybullet build time: Nov 2 2021 15:42:29
argv[0]=
C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\gym\logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
Using cuda device
Log path: logs/tqc/PandaStack-v1_6
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    exp_manager.learn(model)
  File "C:\codes\rl-baselines3-zoo-master\utils\exp_manager.py", line 202, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\sb3_contrib\tqc\tqc.py", line 299, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 375, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\sb3_contrib\tqc\tqc.py", line 194, in train
    replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 652, in sample
    samples.append(self.buffers[i].sample(int(batch_sizes[i]), env))
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 212, in sample
    return self._sample_transitions(batch_size, maybe_vec_env=env, online_sampling=True)  # pytype: disable=bad-return-type
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 295, in _sample_transitions
    episode_indices = np.random.randint(0, self.n_episodes_stored, batch_size)
  File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1338, in numpy.random._bounded_integers._rand_int32
ValueError: high <= 0
What is the unit of the actions in the case of end-effector control, i.e. the movement of the end-effector along the x, y and z axes and the change of finger distance? Since the simulator runs for 20 timesteps (40 ms) for each action of the agent and the actions are clipped between -1 and 1, I would guess that they are in cm, but I would like to know for sure. I thought maybe the finger movement is in mm, since it needs to cover a smaller length compared to the rest of the Panda robot?
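For what it's worth, a tiny worked example based only on the 0.05 scaling quoted from #53 earlier on this page, which would make the end-effector part of the action a displacement of at most 5 cm per control step (I do not know the finger scaling for sure, so it is not shown):
import numpy as np

action = np.array([1.0, 0.0, -0.5])  # raw end-effector part of the action, already in [-1, 1]
ee_displacement = 0.05 * action       # 0.05 factor quoted from #53 -> metres
print(ee_displacement)                # [ 0.05  0.   -0.025], i.e. 5 cm, 0 cm, -2.5 cm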
Is it possible, or is there some way, to control the robots manually in the panda-gym environment in order to capture recordings of demonstrations?
By manually I mean: instead of an agent predicting actions at every step, can a human/user control the robot using keyboard mappings or something like that?
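A rough teleoperation sketch of the idea: poll the GUI keyboard each step and map arrow keys to end-effector displacements. The key bindings, step sizes and recording format are arbitrary choices, getKeyboardEvents needs the GUI ("human") connection, and it reads the default PyBullet client, so treat this as a starting point rather than a supported feature:
import numpy as np
import pybullet as p
import gymnasium as gym
import panda_gym

env = gym.make("PandaPickAndPlace-v3", render_mode="human")
observation, info = env.reset()
demo = []  # recorded (observation, action) pairs

for _ in range(1000):
    keys = p.getKeyboardEvents()
    action = np.zeros(env.action_space.shape)
    if keys.get(p.B3G_UP_ARROW, 0) & p.KEY_IS_DOWN:
        action[0] = 1.0
    if keys.get(p.B3G_DOWN_ARROW, 0) & p.KEY_IS_DOWN:
        action[0] = -1.0
    if keys.get(p.B3G_LEFT_ARROW, 0) & p.KEY_IS_DOWN:
        action[1] = 1.0
    if keys.get(p.B3G_RIGHT_ARROW, 0) & p.KEY_IS_DOWN:
        action[1] = -1.0
    demo.append((observation, action))
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()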
I am currently implementing an environment similar to the cable insertion task presented in https://arxiv.org/pdf/2112.00597.pdf, but I have the issue that I cannot add and control more than one Panda robot at the same time. I managed to resolve this in a hacky way, but since it is fairly simple to solve and could be useful for others, I wanted to create this issue.
The problem is caused by all added robots sharing the same body_name="panda":
This could simply be fixed by passing an ID or suffix that is appended to the body_name, as sketched below.
It would also be nice to be able to pass a robot orientation, similar to base_position, so the robots can be placed with different orientations.
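A sketch of the body_name change meant above, written as a patch to the Panda robot class; the constructor arguments come from my reading of the code and may differ in your version:
# panda_gym/envs/robots/panda.py (patch sketch)
class Panda(PyBulletRobot):
    def __init__(self, sim, block_gripper=False, base_position=None, control_type="ee", name_suffix=""):
        # ... unchanged setup code building `action_space` etc. ...
        super().__init__(
            sim,
            body_name="panda" + name_suffix,  # unique name per robot, so two Pandas can coexist
            file_name="franka_panda/panda.urdf",
            base_position=base_position if base_position is not None else np.zeros(3),
            action_space=action_space,
            joint_indices=np.array([0, 1, 2, 3, 4, 5, 6, 9, 10]),
            joint_forces=np.array([87.0, 87.0, 87.0, 87.0, 12.0, 12.0, 12.0, 170.0, 170.0]),
        )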
Hi,
I am trying to recreate your results from the paper 'panda-gym: Open-source goal-conditioned environments for robotic learning', and the code given in train_push.py does not seem to work with the default parameters.
Can you point me to the RL code you used to get those results? Also, are the learning curves in the paper from the sparse-reward setting or the dense one?
Thanks!
Hi,
I am observing a weird movement of the end-effector; see the video below. How could this be happening when there is only one action component corresponding to the variation of the gripper opening?
The task is slightly different from the original PickAndPlace task: I added rotation control of the end-effector around the z axis, and the goal state also includes the desired rotation of the block. Could this somehow trigger the weird movement?
Thanks!