qgallouedec / panda-gym
Set of robotic environments based on the PyBullet physics engine and Gymnasium.
License: MIT License
raise ValueError(f"Observation spaces do not match: {observation_space} != {env.observation_space}")
ValueError: Observation spaces do not match: Dict(achieved_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), desired_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), observation:Box([-10. -10. -10. -10. -10. -10.], [10. 10. 10. 10. 10. 10.], (6,), float32)) != Dict(achieved_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), desired_goal:Box([-10. -10. -10.], [10. 10. 10.], (3,), float32), observation:Box([-10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10. -10.
-10. -10. -10. -10.], [10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10. 10.], (18,), float32))
Hi,
I am curious about the task observations used in the environment. I am very sorry if my question is trivial; I am new to reinforcement learning.
The observation states of the pick and place task are the object's kinematics (position, velocity, etc.):
# position, rotation of the object
object_position = self.sim.get_base_position("object")
object_rotation = self.sim.get_base_rotation("object")
object_velocity = self.sim.get_base_velocity("object")
object_angular_velocity = self.sim.get_base_angular_velocity("object")
observation = np.concatenate([object_position, object_rotation, object_velocity, object_angular_velocity])
Even in the PandaReach, the task observation is empty:
def get_obs(self) -> np.ndarray:
    return np.array([])  # no task-specific observation
Why is the target position not included in the observation? Such as:
target_position = self.sim.get_base_position("target")
object_position = self.sim.get_base_position("object")
...
observation = np.concatenate([target_position, object_position, ...])
Does this mean that the critic networks in the RL algorithms (SAC or TQC) are basically also learning to predict the random target location?
If not, for example in the pick and place task, does the agent still need to randomly search for the position with maximum reward after successfully picking up the object when testing the trained model?
Thank you very much.
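For what it's worth, here is a minimal sketch of where the target position already shows up in the goal-conditioned Dict observation that panda-gym returns (the key names are taken from the error message quoted earlier; treat the exact env id as an example):
import gymnasium as gym
import panda_gym

env = gym.make("PandaReach-v3")
obs, info = env.reset()
# The task-specific part lives under "observation", while the target position
# sampled for this episode is exposed separately under "desired_goal".
print(obs["observation"])    # robot / object kinematics
print(obs["desired_goal"])   # target position sampled at reset
print(obs["achieved_goal"])  # current end-effector (or object) position
env.close()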
Hello, first of all, thank you for the work, but I have a question. When I create the robot and make it grasp an object, the gripper passes through the object and zero contact points are detected. Could you please tell me why this happens?
1. Hello, why has 'panda_joint8' (joint index 7) disappeared from JOINT_INDICES = [0, 1, 2, 3, 4, 5, 6, 9, 10]?
2. I have a robot described by an .sdf file; how can I change your code to load my robot?
Thanks a lot!
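Regarding the .sdf part, a minimal sketch of the plain PyBullet call (the file name is a placeholder); panda-gym itself loads its robots through loadURDF, so the wrapper would need a similar entry point for SDF files:
import pybullet as p

client = p.connect(p.DIRECT)
# loadSDF returns a tuple of body unique ids, since one .sdf file may contain several models.
body_ids = p.loadSDF("my_robot.sdf")  # hypothetical file name
robot_id = body_ids[0]
p.disconnect()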
A clear and concise description of what the bug is.
Provide a minimal code :
import gymnasium as gym
import panda_gym
env = gym.make(
    "PandaSlide-v3",
    render_mode="rgb_array",
    renderer="OpenGL",
    render_width=480,
    render_height=480,
    render_target_position=[0.2, 0, 0],
    render_distance=1.0,
    render_yaw=90,
    render_pitch=-70,
    render_roll=0,
)
env.reset()
image = env.render() # RGB rendering of shape (480, 480, 3)
env.close()
...
python --version: Python 3.9
pip list | grep panda-gym: 3.0.5
Hi there, I got a problem. I wanted to set a different viewpoint during training and testing, so I followed the instructions written in the documentation and changed the render-related parameters render_yaw and render_pitch. However, when I ran the training, the viewpoint did not change at all; it is exactly the same as the default. How can I change the viewpoint?
Best regards,
Dao
I am trying to do a one-step lookahead on the environment's action space by calling deepcopy()
on the environment to try out different actions. In pseudo-code, it would look something like this:
actions = [env.action_space.sample() for _ in range(10)]
for action in actions:
    env_copy = deepcopy(env)
    obs, rew, done, info = env_copy.step(action)
    # do stuff with the one-step lookahead for each action
However, I cannot quite do this because env_copy gets garbage collected after the first iteration of the loop. The garbage collector closes the physics_client._client in the PyBullet object, which is shared by env and env_copy. As a result, the second iteration of the loop fails at env_copy.step(action), because env_copy was copied from env, which has lost its connection to the physics client.
To verify this, you may use the following code on any panda_gym environment.
from copy import deepcopy
env_copy = deepcopy(env)
assert(id(env.sim.physics_client._client) == id(env_copy.sim.physics_client._client))
del env_copy
env.reset() # error: Not connected to physics server.
Is it possible to close the connection based on a reference count implementation? Or would you recommend any other way to do this one-step lookahead?
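One possible alternative is a sketch based on PyBullet's in-memory snapshots (saveState/restoreState) instead of deepcopy; it assumes the underlying BulletClient is reachable as env.unwrapped.sim.physics_client, which is worth double-checking for your version:
import gymnasium as gym
import panda_gym

env = gym.make("PandaReach-v3")
obs, info = env.reset()

actions = [env.action_space.sample() for _ in range(10)]
lookahead = []
state_id = env.unwrapped.sim.physics_client.saveState()  # snapshot of the full simulation state
for action in actions:
    obs, reward, terminated, truncated, info = env.step(action)
    lookahead.append((action, reward))
    # Roll the physics back; note that wrapper bookkeeping (e.g. the TimeLimit counter)
    # is not rolled back by this call.
    env.unwrapped.sim.physics_client.restoreState(stateId=state_id)
env.unwrapped.sim.physics_client.removeState(state_id)
env.close()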
Hello, when I want to stack manually:
env = gym.make("PandaStack-v3", render=True)
observation, info = env.reset()
for _ in range(1000):
    current_position = observation["observation"][0:3]
    desired_position = observation["desired_goal"][0:3]
    action = 5.0 * (desired_position - current_position)
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
I met an error:
ValueError: operands could not be broadcast together with shapes (3,) (4,) (4,)
Thanks!
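If it helps, a minimal sketch of what I suspect is going on, assuming the error comes from PandaStack-v3's 4-dimensional action space (3 end-effector displacements plus 1 gripper command) while the computed action has only 3 components:
import numpy as np

# Inside the loop above, build a 4-dimensional action instead of a 3-dimensional one.
current_position = observation["observation"][0:3]
desired_position = observation["desired_goal"][0:3]
displacement = 5.0 * (desired_position - current_position)
gripper = np.array([0.0])  # keep the gripper opening unchanged (arbitrary choice)
action = np.concatenate([displacement, gripper])  # shape (4,), matching the action space
observation, reward, terminated, truncated, info = env.step(action)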
In a previous answer (#53) you said: "First, the raw action is scaled by 0.05. The result is added to the current joint position to obtain the target angles of the joints".
Can you explain how you obtain the target angles of the joints from a displacement of the end-effector? I can't find it anywhere in the code.
Thank you in advance!
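For what it's worth, as far as I can tell the displacement is turned into a target Cartesian position and then fed to PyBullet's inverse kinematics, which returns the target joint angles. A standalone sketch of that idea with plain PyBullet (the end-effector link index 11 and the URDF path are assumptions):
import pybullet as p
import pybullet_data

client = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
robot = p.loadURDF("franka_panda/panda.urdf", useFixedBase=True)
ee_link = 11  # end-effector link index (assumption)

# Target position = current end-effector position + scaled displacement (0.05 factor from #53).
current_ee_position = p.getLinkState(robot, ee_link, computeForwardKinematics=True)[0]
displacement = [0.0, 0.0, 0.05]  # e.g. a raw action of [0, 0, 1] scaled by 0.05 -> move 5 cm up
target_ee_position = [c + d for c, d in zip(current_ee_position, displacement)]

# PyBullet IK returns angles for all movable joints; the first 7 are the arm joints.
target_arm_angles = p.calculateInverseKinematics(robot, ee_link, target_ee_position)[:7]
print(target_arm_angles)
p.disconnect()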
I cannot reproduce what is described in the installation tutorial. Each time I run it, I get the error shown in the image.
It seems the latest Gym has removed something related to GoalEnv, which leads to the error; it appears to be related to the latest update. Can you fix it, or add a requirement on the gym version?
Dear author:
I tested the Reach/PickAndPlace/Slide/Push tasks; only the PandaReach-v2 task converged, and the others failed to converge as in the paper (arXiv:2106.13687v2 [cs.LG], 19 Dec 2021).
Could you please show the relevant parameters for training these tasks?
Here is a part of my test code in PickAndPlace:
model = DDPG(policy="MultiInputPolicy", env=env, batch_size=2048, replay_buffer_class=HerReplayBuffer, verbose=1, buffer_size=1000000)
model.learn(total_timesteps=10000000)
The result shows that the success rate is only 0.05
When resetting the environment, the robot joint angles do not seem to go back to the predefined neutral position. Is that because velocity control mode is used by default? Instead of using the resetJointState function, how about using the setJointMotorControlArray function, setting the control mode to POSITION_CONTROL and setting the desired positions?
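A small sketch of the two options being compared, with plain PyBullet calls (the joint indices and neutral angles below are illustrative, not the exact values used by panda-gym):
import pybullet as p

def move_to_neutral(robot_id, use_motors=False):
    arm_joints = [0, 1, 2, 3, 4, 5, 6]                        # illustrative joint indices
    neutral_angles = [0.0, -0.3, 0.0, -2.2, 0.0, 2.0, 0.79]   # illustrative neutral pose
    if not use_motors:
        # resetJointState teleports the joints instantly, bypassing the dynamics.
        for joint, angle in zip(arm_joints, neutral_angles):
            p.resetJointState(robot_id, joint, targetValue=angle)
    else:
        # setJointMotorControlArray drives the joints there with position-controlled motors,
        # as suggested; the robot then converges over the following simulation steps.
        p.setJointMotorControlArray(
            robot_id, arm_joints, controlMode=p.POSITION_CONTROL, targetPositions=neutral_angles
        )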
Gym 0.26+ requires a render_mode argument in the constructor.
Provide a minimal code :
import gymnasium as gym
gym.make("PandaPush-v3", render_mode="human")
I noticed that the author published the newest version, 1.1.0. The release notes say that this version adds a correct implementation of random seeding for reproducibility. I do not fully understand what the author means by this. Compared with 1.0.1, which exact problem does 1.1.0 fix with this improvement?
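As an illustration of what seeding for reproducibility means in practice, a small sketch with the current API (the 1.x releases used env.seed() from the old Gym API instead, so adapt accordingly):
import gymnasium as gym
import panda_gym

# Two environments reset with the same seed should sample the same goal.
env1 = gym.make("PandaReach-v3")
env2 = gym.make("PandaReach-v3")
obs1, _ = env1.reset(seed=42)
obs2, _ = env2.reset(seed=42)
assert (obs1["desired_goal"] == obs2["desired_goal"]).all()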
I cannot make a gym environment anymore.
I get an error stating
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    env = gym.make('PandaReach-v2', render=True)
  File "/home/jakob/anaconda3/envs/panda/lib/python3.7/site-packages/gym/envs/registration.py", line 235, in make
    return registry.make(id, **kwargs)
  File "/home/jakob/anaconda3/envs/panda/lib/python3.7/site-packages/gym/envs/registration.py", line 129, in make
    env = spec.make(**kwargs)
  File "/home/jakob/anaconda3/envs/panda/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
    env = cls(**_kwargs)
  File "/home/jakob/Promotion/code/panda-gym/panda_gym/envs/panda_tasks/panda_reach.py", line 20, in __init__
    sim = PyBullet(render=render)
  File "/home/jakob/Promotion/code/panda-gym/panda_gym/pybullet.py", line 34, in __init__
    self.connection_mode = p.GUI if render else p.DIRECT
AttributeError: module 'pybullet' has no attribute 'GUI'
Starting from a clean slate:
conda create -n panda python=3.7
conda activate panda
~/panda-gym$ pip install -e .
and running the quick start code
import gym
import panda_gym
env = gym.make('PandaReach-v2', render=True)
leads to the aforementioned error.
I also tried python 3.8
python --version: 3.7, 3.8
pip list | grep panda-gym: panda-gym 2.0.0 | pybullet 3.2.1
I got the following error when creating and closing multiple environments.
Perhaps, in the close() function of the PyBullet class, you need to call self.physics_client.disconnect() to specify the client id, instead of p.disconnect().
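A minimal sketch of the suggested change (assuming the wrapper keeps its BulletClient in self.physics_client), not the exact upstream code:
# panda_gym/pybullet.py (sketch)
def close(self) -> None:
    # Disconnect only the client owned by this instance, so that closing one environment
    # cannot break the connection of another environment.
    if self.physics_client.isConnected():
        self.physics_client.disconnect()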
Hi, I saw that in the recent release there is an option for dense reward; however, it doesn't seem to be available yet for any of the environments.
In the envs/core.py script, some of the goal spaces are defined inconsistently.
desired_goal_shape = observation["achieved_goal"].shape
=>
desired_goal_shape = observation["desired_goal"].shape
desired_goal=spaces.Box(-10.0, 10.0, shape=achieved_goal_shape, dtype=np.float32),
achieved_goal=spaces.Box(-10.0, 10.0, shape=desired_goal_shape, dtype=np.float32),
=>
desired_goal=spaces.Box(-10.0, 10.0, shape=desired_goal_shape, dtype=np.float32),
achieved_goal=spaces.Box(-10.0, 10.0, shape=achieved_goal_shape, dtype=np.float32),
I found that fixing this is very important, especially if you customize some environments. Thanks a lot!
I am conducting an experiment where I test different starting points for the completion of the task, and I want to visualize these starting points afterwards. Is there any way to visualize these points using the observation?
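A rough sketch of one way to do it: drop a small non-colliding ("ghost") sphere at each starting point so it shows up in the rendering. The create_sphere call and its arguments are taken from my reading of the PyBullet wrapper and may differ in your version:
import numpy as np
import gymnasium as gym
import panda_gym

env = gym.make("PandaReach-v3", render_mode="human")
env.reset()
starting_points = [np.array([0.1, 0.1, 0.1]), np.array([-0.1, 0.05, 0.2])]  # example points
for i, point in enumerate(starting_points):
    env.unwrapped.sim.create_sphere(
        body_name=f"start_{i}",   # hypothetical name, one per marker
        radius=0.01,
        mass=0.0,
        position=point,
        rgba_color=np.array([0.1, 0.1, 0.9, 0.5]),
        ghost=True,               # no collision, purely visual
    )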
Hi, I'm trying to get an SB3 model to train on the harder tasks (so far I've failed with SAC+HER), so I went to the SB3 Zoo to see some examples of successful models. I can't get them to load, and it looks like it's because the zoo has trained models for the v1 versions but only has the v2 environments registered. Do you have successfully trained v2 models you can push to the zoo, or did you successfully train any SB3 models out of the box on the v2 versions of the tasks?
Thanks for making this environment!
The environments created via env = gym.make('PandaPush-v3', render_mode='rgb_array') (also with 'human') are missing the render_mode attribute, so env.render_mode returns None.
This attribute is required e.g. by gymnasium's PixelObservationWrapper.
import gymnasium as gym
from gymnasium.wrappers import PixelObservationWrapper
import panda_gym
env = gym.make('PandaPush-v3', render_mode='rgb_array')
env = PixelObservationWrapper(env)
AttributeError: env.render_mode must be specified to use PixelObservationWrapper:`gymnasium.make(env_name, render_mode='rgb_array')`.
python --version: 3.8.10
pip list | grep panda-gym: 3.0.3
Currently, when creating an environment, a whole bunch of stuff is printed to the console:
argv[0]=--background_color_red=0.8745098114013672
argv[1]=
argv[2]=
argv[3]=
argv[4]=
argv[5]=
argv[6]=
argv[7]=
argv[8]=
argv[9]=
argv[10]=
argv[11]=
argv[12]=
argv[13]=
argv[14]=
argv[15]=
argv[16]=
argv[17]=
argv[18]=
argv[19]=
argv[20]=
argv[21]=--background_color_green=0.21176470816135406
argv[22]=
argv[23]=
argv[24]=
argv[25]=
argv[26]=
argv[27]=
argv[28]=
argv[29]=
argv[30]=
argv[31]=
argv[32]=
argv[33]=
argv[34]=
argv[35]=
argv[36]=
argv[37]=
argv[38]=
argv[39]=
argv[40]=
argv[41]=
argv[42]=--background_color_blue=0.1764705926179886
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=45
argv[0] = --unused
argv[1] = --background_color_red=0.8745098114013672
argv[2] =
argv[3] =
argv[4] =
argv[5] =
argv[6] =
argv[7] =
argv[8] =
argv[9] =
argv[10] =
argv[11] =
argv[12] =
argv[13] =
argv[14] =
argv[15] =
argv[16] =
argv[17] =
argv[18] =
argv[19] =
argv[20] =
argv[21] =
argv[22] = --background_color_green=0.21176470816135406
argv[23] =
argv[24] =
argv[25] =
argv[26] =
argv[27] =
argv[28] =
argv[29] =
argv[30] =
argv[31] =
argv[32] =
argv[33] =
argv[34] =
argv[35] =
argv[36] =
argv[37] =
argv[38] =
argv[39] =
argv[40] =
argv[41] =
argv[42] =
argv[43] = --background_color_blue=0.1764705926179886
argv[44] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
pthread_getconcurrency()=0
Version = 4.5 (Core Profile) Mesa 21.2.6
Vendor = Mesa/X.org
Renderer = llvmpipe (LLVM 12.0.0, 256 bits)
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = Mesa/X.org
ven = Mesa/X.org
I couldn't find a way to block this output from outside, since it doesn't seem to use the Python stdout; otherwise with contextlib.redirect_stdout(os.devnull): should work. I also noticed that in IPython only the argvs are printed, but I have no idea how to achieve this effect from a .py script.
This output seems to be partially PyBullet's fault; the argvs are initialized by panda-gym, though. All the empty argvs could be removed there.
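A possible workaround sketch, assuming the messages are written directly to the C-level stdout (so contextlib.redirect_stdout cannot catch them): redirect the OS file descriptor instead.
import os
from contextlib import contextmanager

@contextmanager
def suppress_stdout():
    # Redirect fd 1 to /dev/null so that output written by native code
    # (PyBullet's C core) is silenced, not just Python's sys.stdout.
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved = os.dup(1)
    os.dup2(devnull, 1)
    try:
        yield
    finally:
        os.dup2(saved, 1)
        os.close(saved)
        os.close(devnull)

import gymnasium as gym
import panda_gym

with suppress_stdout():
    env = gym.make('PandaPush-v3')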
Provide a minimal code :
import gymnasium as gym
import panda_gym
env = gym.make('PandaPush-v3')
python --version: 3.8.10
pip list | grep panda-gym: 3.0.0
Hi,
I am unable to learn a policy for the PandaPickAndPlace task using RL Zoo. I am trying to get the results shared in the experimental results section of the panda-gym paper. Here are my hyperparameters for the SAC, DDPG and TQC algorithms:
PandaPush-v2: &her-defaults
  env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 1e6
  policy: 'MultiInputPolicy'
  buffer_size: 1000000
  batch_size: 2048
  gamma: 0.95
  learning_rate: !!float 1e-3
  tau: 0.05
  replay_buffer_class: HerReplayBuffer
  replay_buffer_kwargs: "dict(
    online_sampling=True,
    goal_selection_strategy='future',
    n_sampled_goal=4,
  )"
  policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"

PandaPickAndPlace-v2:
  <<: *her-defaults
  learning_rate: !!float 2e-4
Can you please help me with the hyperparams that you used for your experiments?
Hello, I tried to run the environment tests using the command pytest envs_test.py.
However, it shows some errors:
FAILED envs_test.py::test_env[PandaFlip-v3] - pybullet.error: Error loading texture
FAILED envs_test.py::test_env[PandaFlipJoints-v3] - pybullet.error: Error loading texture
FAILED envs_test.py::test_env[PandaFlipDense-v3] - pybullet.error: Error loading texture
FAILED envs_test.py::test_env[PandaFlipJointsDense-v3] - pybullet.error: Error loading texture
Do you know how to fix that?
Thank you so much
Hello, I have another question: why can the Panda robot envs only be trained with TQC? Are there any other RL algorithms in rl-baselines3-zoo that can be used to train the Panda? Thanks.
OK. The PandaReach env runs for at most 50 steps, after which it returns done=True. I intend to run the env for longer, say 300 steps, but cannot find a way to do that. Can anybody provide a solution for this?
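One way that should work with Gymnasium's make is to override the registered time limit at creation time; a minimal sketch (the keyword is Gymnasium's, so with the old gym API the call may differ):
import gymnasium as gym
import panda_gym

# Override the registered 50-step TimeLimit for this instance.
env = gym.make("PandaReach-v3", max_episode_steps=300)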
After i=14, the robotic arm no longer moves in the rendered view and the image freezes, but render() keeps producing frames, and the rendered picture is not consistent with the actual arm motion.
Provide a minimal code :
import gymnasium as gym
import panda_gym
import matplotlib.pyplot as plt
from matplotlib import animation
import numpy as np

def display_frames_as_gif(frames):
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(plt.gcf(), animate, frames=len(frames), interval=5)
    anim.save('panda.gif', writer='pillow', fps=30)

frames = []
# env = gym.make("PandaReach-v3", render_mode="human")
env = gym.make("PandaReach-v3", render_mode="rgb_array", renderer="OpenGL")
observation, info = env.reset()

# parameters of the circle
theta = np.linspace(0, 2 * np.pi, 1000)
r = 0.1  # radius of the circle

# compute the coordinates of the circle
x = r * np.cos(theta)
y = r * np.sin(theta)
z = np.zeros_like(theta) + observation["observation"][2]  # all points lie in the same z plane

pre_tarj = np.zeros((1000, 3))
pre_tarj[:, 0] = x
pre_tarj[:, 1] = y
pre_tarj[:, 2] += observation["observation"][2]  # all points lie in the same z plane
tarj = np.zeros_like(pre_tarj)

# initialize the position
desired_position = pre_tarj[0, :]
for i in range(100):
    # Render into buffer.
    image = env.render()
    frames.append(image)
    print(i)
    current_position = observation["observation"][0:3]
    desired_position = pre_tarj[i, :]
    tarj[i, :] = current_position
    action = 5.0 * (desired_position - current_position)
    observation, reward, terminated, truncated, info = env.step(action)
    # if terminated or truncated:
    #     observation, info = env.reset()
env.close()
display_frames_as_gif(frames)
python --version: Python 3.11
pip list | grep panda-gym:
I found a gym environment on GitHub for robotics. I tried running it on Colab without rendering with the following code:
import gym
import panda_gym

env = gym.make('PandaReach-v2', render=True)
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random action
    obs, reward, done, info = env.step(action)
env.close()
I got the following error
The following error occurs when trying to run train_push:
assert not hasattr(obs_space, "spaces"), f"Unsupported structured space '{type(obs_space)}'"
AssertionError: Unsupported structured space '<class 'gymnasium.spaces.dict.Dict'>'
Provide a minimal code :
import gymnasium as gym
from stable_baselines3 import DDPG, HerReplayBuffer
import panda_gym
env = gym.make("PandaPush-v3")
model = DDPG(policy="MultiInputPolicy", env=env, replay_buffer_class=HerReplayBuffer, verbose=1)
...
python --version: 3.8
pip list | grep panda-gym: current master
I am working to implement the reach task (and trajectory tracking after that) on a real Panda robot. I am afraid that I am quite new to this field and am the first in my lab working on learning (my lab focuses on design and control theory). I was following this paper: https://arxiv.org/abs/1803.07067
It is recommended there to use velocity control for low-level joint control.
I was also reading your post (https://gallouedec.com/post/panda-gymv0/), which outlines the limitations on sim-to-real transfer.
It would be great if you would be able to guide me on this path.
Hi, there.
I'm using panda-gym to train a reinforcement learning model that makes the Panda robot do some tasks. During training I need the orientation of the end-effector, so I call the PyBullet function getLinkState(panda_robot, ee_link, computeForwardKinematics=True)[1]. However, I found that when the end-effector has an angle in the range (-120, 120) degrees (after conversion) w.r.t. the world frame, this function always returns a positive value. That means I cannot distinguish whether the end-effector has rotated, e.g., +90 degrees or -90 degrees, because the returned values are the same (equal to 90 degrees after converting the quaternion to degrees). I expect it to return, e.g., -90 degrees or 270 degrees.
Thanks for your help.
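A small sketch of what I would check first: converting the quaternion with PyBullet's own helper gives signed Euler angles, and note that a quaternion q and -q describe the same rotation, which can make naive conversions look "always positive" (the link index is an assumption):
import numpy as np
import pybullet as p

def ee_euler_degrees(robot_id, ee_link=11):
    # World-frame orientation of the link as a quaternion in (x, y, z, w) order.
    quaternion = p.getLinkState(robot_id, ee_link, computeForwardKinematics=True)[1]
    # Signed Euler angles in radians; yaw distinguishes +90 from -90 degrees about z.
    roll, pitch, yaw = p.getEulerFromQuaternion(quaternion)
    return np.degrees([roll, pitch, yaw])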
Since version 3.0.1 you no longer pass a render argument during environment initialization but the render_mode argument instead. With this, the mode option of the environment's render() method has vanished.
I'm doing vision-based RL and require the following, all at the same time:
1. …
2. …
3. the human mode
Currently I can get either 1, or 2 and 3, but not all three. Am I missing something?
This was previously possible with gym.make(..., render=True) and env.render(mode='rgb_image').
I would suggest keeping the render method's mode option, which was removed in 3.0.1.
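For reference, a sketch of how I understand the two APIs side by side (the v3 arguments follow the documentation and the snippet quoted earlier in this page; treat them as assumptions for your exact version):
import gymnasium as gym
import panda_gym

# panda-gym >= 3.0.1 (Gymnasium API): the rendering mode is fixed when the env is created.
env = gym.make("PandaReach-v3", render_mode="rgb_array", renderer="OpenGL")
env.reset()
image = env.render()  # returns an rgb array; no mode argument anymore
env.close()

# For comparison, with panda-gym < 3.0 (old Gym API) the same thing was:
#   env = gym.make("PandaReach-v2", render=True)   # opens the GUI window
#   image = env.render(mode="rgb_array")           # and rgb arrays on demand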
Hi, can you provide some benchmarking results with the corresponding algorithms and hyperparameters for the 4 tasks? I've tried SAC, PPO and DDPG but couldn't train an agent that reaches good results (I'm focusing on PandaPickAndPlace and PandaPush).
I am very interested in the dense-reward envs, but I didn't find any benchmarks for them. So, can you provide some benchmarks?
Hello, I failed to find a way to change the object in the pick-and-place task in order to see the performance.
I noticed that the object of the pick-and-place task is created by the function create_box (or _create_geometry, in fact) in pybullet.py.
It seemed I could achieve my goal by modifying the self.physics_client.createMultiBody call, so I changed the code between lines 575 and 579 and added the filename where the object I want to load (an .obj file) is saved. But obviously my method is not correct.
I failed to change the object, so I'd like to ask you how to do that, if you have time to reply.
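A rough sketch of how a mesh object could be loaded with plain PyBullet instead of create_box (the file name my_object.obj and all sizes are hypothetical):
import pybullet as p

def load_mesh_object(position=(0.0, 0.0, 0.02), scale=(1.0, 1.0, 1.0)):
    # Build collision and visual shapes from an .obj mesh, then assemble a body,
    # roughly what _create_geometry/createMultiBody do for the box object.
    collision = p.createCollisionShape(p.GEOM_MESH, fileName="my_object.obj", meshScale=scale)
    visual = p.createVisualShape(p.GEOM_MESH, fileName="my_object.obj", meshScale=scale,
                                 rgbaColor=[0.1, 0.9, 0.1, 1.0])
    body_id = p.createMultiBody(baseMass=1.0,
                                baseCollisionShapeIndex=collision,
                                baseVisualShapeIndex=visual,
                                basePosition=position)
    return body_id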
Hi! I would like to ask how to load the Franka Panda in PyBullet with white and black colours (just like you showed in the README). I have tried the franka_panda_description package from your GitHub; however, I got the Franka in PyBullet in white only.
Looking forward to your reply. Thanks a lot.
Is it possible to have a release with the latest updates on master? Specifically, the latest release does not have the update to gym 0.22.0, which has a crucial breaking change that affects repositories updating to gym >= 0.22 while using panda-gym.
Hi all,
I'm now using the garage lib to run a reinforcement learning algorithm in a panda-gym environment. However, when I use the RaySampler in garage, which can "sample episodes in a data-parallel fashion using a Ray cluster", I get this error: pybullet.error: Not connected to physics server.
It's triggered by this line.
I guess this is a multi-process issue, could you help me with this? Thank you!
Here is the more detailed bug log:
Traceback (most recent call last):
  File "/home/yygx/scripts/train_panda_airl.py", line 220, in <module>
    trainer.train(n_epochs=EPOCH_NUM, batch_size=10000)
  File "/home/yygx/src/garage/trainer.py", line 399, in train
    average_return = self._algo.train(self)
  File "/home/yygx/src/airl/irl_npo.py", line 187, in train
    trainer.step_episode = trainer.obtain_episodes(trainer.step_itr)  # yy: rollout episodes using the learned policy
  File "/home/yygx/src/garage/trainer.py", line 224, in obtain_episodes
    env_update=env_update)  # yy: generate episodes with learned policy
  File "/home/yygx/src/garage/sampler/ray_sampler.py", line 208, in obtain_samples
    ready_worker_id, episode_batch = ray.get(result)
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/ray/worker.py", line 1831, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(error): ray::SamplerWorker.rollout() (pid=241822, ip=192.168.86.22, repr=<garage.sampler.ray_sampler.SamplerWorker object at 0x7f3b17457e80>)
  File "/home/yygx/src/garage/sampler/ray_sampler.py", line 432, in rollout
    return (self.worker_id, self.inner_worker.rollout())
  File "/home/yygx/src/garage/tf/samplers/worker.py", line 115, in rollout
    return self._inner_worker.rollout()
  File "/home/yygx/src/garage/sampler/default_worker.py", line 186, in rollout
    self.start_episode()
  File "/home/yygx/src/garage/sampler/default_worker.py", line 97, in start_episode
    self._prev_obs, episode_info = self.env.reset()
  File "/home/yygx/src/garage/envs/gym_env.py", line 210, in reset
    first_obs = self._env.reset()
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 25, in reset
    return self.env.reset(**kwargs)
  File "/home/yygx/panda-gym/panda_gym/envs/core.py", line 250, in reset
    with self.sim.no_rendering():
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/yygx/panda-gym/panda_gym/pybullet.py", line 384, in no_rendering
    self.physics_client.configureDebugVisualizer(self.physics_client.COV_ENABLE_RENDERING, 0)
pybullet.error: Not connected to physics server.
I am trying to run your code with stable-baselines3 (the code under examples/train_push.py); it seems like stable-baselines3 does not support the observation space structure defined in gymnasium. How am I supposed to solve this problem?
Traceback (most recent call last):
  File "/home/wxia612/panda-gym/examples/train_push.py", line 10, in <module>
    model = DDPG(policy="MultiInputPolicy", env=env, replay_buffer_class=HerReplayBuffer, verbose=1)
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/ddpg/ddpg.py", line 85, in __init__
    super().__init__(
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/td3/td3.py", line 103, in __init__
    super().__init__(
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 111, in __init__
    super().__init__(
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/base_class.py", line 179, in __init__
    env = self._wrap_env(env, self.verbose, monitor_wrapper)
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/base_class.py", line 228, in _wrap_env
    env = DummyVecEnv([lambda: env])
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 29, in __init__
    self.keys, shapes, dtypes = obs_space_info(obs_space)
  File "/home/wxia612/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/util.py", line 67, in obs_space_info
    assert not hasattr(obs_space, "spaces"), f"Unsupported structured space '{type(obs_space)}'"
AssertionError: Unsupported structured space '<class 'gymnasium.spaces.dict.Dict'>'
Process finished with exit code 1
Hi,
I have seen in the documentation that panda-gym is compatible up to Python 3.10. I am using Python 3.11 in my project and I am getting the following error:
gymnasium.error.NameNotFound: Environment PandaReach doesn't exist.
I have followed the installation procedure described here.
Could you please help me with that?
Thank you!
Hi, is it possible to upgrade the gymnasium version to a more recent one? I am not sure whether it just hasn't been updated or whether it is intentionally pinned to an older version.
Hi @qgallouedec,
I have been trying to reproduce the results of some of the experiments, in particular for the PandaPickAndPlace task. However, I was only able to find hyperparameters for v1. Should results be reproducible for v3?
I tried using both the DDPG and TQC. However, I mostly focused on TQC since it is clearly documented in two places: https://huggingface.co/qgallouedec/tqc-PandaPickAndPlace-v1-3157870761 and https://wandb.ai/openrlbenchmark/sb3.
I can't get anywhere near the results presented in these two sources. I also tried to train the same agent in a dense environment as a sort of sanity check. The results were quite good, the success rate goes above 90% without any issues.
Here is an example of the code I have been trying to run. For your convenience, I removed all callbacks and checkpoints. Also, I am using the bleeding edge version for all the packages, as presented in the docs.
import gymnasium as gym
import panda_gym
from stable_baselines3 import HerReplayBuffer
from sb3_contrib import TQC
env = gym.make("PandaPickAndPlace-v3")
model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
)
model.learn(
    total_timesteps=1500000.0,
    progress_bar=True,
)
Hello, when I change max_episode_steps=300 in the register and train with TQC in SB3, I get this error. What is the problem? Thanks.
python train.py --algo tqc --env PandaStack-v1 -params n_envs:10
========== PandaStack-v1 ==========
Seed: 3400246078
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 1024),
             ('buffer_size', 1000000),
             ('env_wrapper', 'sb3_contrib.common.wrappers.TimeFeatureWrapper'),
             ('gamma', 0.95),
             ('learning_rate', 0.001),
             ('learning_starts', 1000),
             ('n_envs', 10),
             ('n_timesteps', 30000000000.0),
             ('policy', 'MultiInputPolicy'),
             ('policy_kwargs', 'dict(net_arch=[512, 512, 512], n_critics=2)'),
             ('replay_buffer_class', 'HerReplayBuffer'),
             ('replay_buffer_kwargs', "dict( online_sampling=True, goal_selection_strategy='future', n_sampled_goal=4, )"),
             ('tau', 0.05)])
Using 10 environments
Creating test environment
pybullet build time: Nov 2 2021 15:42:29
argv[0]=
C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\gym\logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
Using cuda device
Log path: logs/tqc/PandaStack-v1_6
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    exp_manager.learn(model)
  File "C:\codes\rl-baselines3-zoo-master\utils\exp_manager.py", line 202, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\sb3_contrib\tqc\tqc.py", line 299, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 375, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\sb3_contrib\tqc\tqc.py", line 194, in train
    replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 652, in sample
    samples.append(self.buffers[i].sample(int(batch_sizes[i]), env))
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 212, in sample
    return self._sample_transitions(batch_size, maybe_vec_env=env, online_sampling=True)  # pytype: disable=bad-return-type
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 295, in _sample_transitions
    episode_indices = np.random.randint(0, self.n_episodes_stored, batch_size)
  File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1338, in numpy.random._bounded_integers._rand_int32
ValueError: high <= 0
What is the unit of the actions in the case of end-effector control, i.e. the movement of the end-effector along the x, y and z axes and the change of finger distance? Since the simulator runs for 20 timesteps (40 ms) for each action of the agent and the actions are clipped between -1 and 1, I would guess that they are in cm, but I would like to know for sure. I thought maybe the finger movement is in mm, since it needs to cover a smaller length compared to the rest of the Panda robot?
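For what it's worth, a tiny worked example based only on the 0.05 scaling quoted from #53 earlier on this page, which would make the end-effector part of the action a displacement of at most 5 cm per control step (I do not know the finger scaling for sure, so it is not shown):
import numpy as np

action = np.array([1.0, 0.0, -0.5])  # raw end-effector part of the action, already in [-1, 1]
ee_displacement = 0.05 * action       # 0.05 factor quoted from #53 -> metres
print(ee_displacement)                # [ 0.05  0.   -0.025], i.e. 5 cm, 0 cm, -2.5 cm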
Is it possible, or is there some way, to control the robots manually in the panda-gym environment in order to capture recordings of demonstrations?
By manually I mean: instead of an agent predicting actions at every step, can a human/user control the robot using keyboard mappings or something like that?
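A rough teleoperation sketch of the idea: poll the GUI keyboard each step and map arrow keys to end-effector displacements. The key bindings, step sizes and recording format are arbitrary choices, getKeyboardEvents needs the GUI ("human") connection, and it reads the default PyBullet client, so treat this as a starting point rather than a supported feature:
import numpy as np
import pybullet as p
import gymnasium as gym
import panda_gym

env = gym.make("PandaPickAndPlace-v3", render_mode="human")
observation, info = env.reset()
demo = []  # recorded (observation, action) pairs

for _ in range(1000):
    keys = p.getKeyboardEvents()
    action = np.zeros(env.action_space.shape)
    if keys.get(p.B3G_UP_ARROW, 0) & p.KEY_IS_DOWN:
        action[0] = 1.0
    if keys.get(p.B3G_DOWN_ARROW, 0) & p.KEY_IS_DOWN:
        action[0] = -1.0
    if keys.get(p.B3G_LEFT_ARROW, 0) & p.KEY_IS_DOWN:
        action[1] = 1.0
    if keys.get(p.B3G_RIGHT_ARROW, 0) & p.KEY_IS_DOWN:
        action[1] = -1.0
    demo.append((observation, action))
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()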
I am currently implementing an environment similar to the cable insertion task presented in https://arxiv.org/pdf/2112.00597.pdf, but I have the issue that I cannot add and control more than one Panda robot at the same time. I managed to resolve this in a hacky way, but since it is fairly simple to solve and could be useful for others, I wanted to create this issue.
The problem is caused by all added robots sharing the same body_name="panda":
This could simply be fixed by passing an ID or suffix that is appended to the body_name, as sketched below.
It would also be nice to be able to pass a robot orientation, similar to base_position, so the robots can be placed with different orientations.
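A sketch of the body_name change meant above, written as a patch to the Panda robot class; the constructor arguments come from my reading of the code and may differ in your version:
# panda_gym/envs/robots/panda.py (patch sketch)
class Panda(PyBulletRobot):
    def __init__(self, sim, block_gripper=False, base_position=None, control_type="ee", name_suffix=""):
        # ... unchanged setup code building `action_space` etc. ...
        super().__init__(
            sim,
            body_name="panda" + name_suffix,  # unique name per robot, so two Pandas can coexist
            file_name="franka_panda/panda.urdf",
            base_position=base_position if base_position is not None else np.zeros(3),
            action_space=action_space,
            joint_indices=np.array([0, 1, 2, 3, 4, 5, 6, 9, 10]),
            joint_forces=np.array([87.0, 87.0, 87.0, 87.0, 12.0, 12.0, 12.0, 170.0, 170.0]),
        )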
Hi,
I am trying to recreate your results from the paper 'panda-gym: Open-source goal-conditioned environments for robotic learning', and the code given in train_push.py does not seem to work with the default parameters.
Can you point me to the RL code you used to get those results? Also, are the learning curves in the paper from the sparse-reward setting or the dense one?
Thanks!
Hi,
I am observing a weird movement of the end-effector; see the video below. How could this be happening when there is only one action component corresponding to the variation of the gripper opening?
The task is slightly different from the original PickAndPlace task: I added rotation control of the end-effector around the z axis, and the goal state also includes the desired rotation of the block. Could this somehow trigger the weird movement?
Thanks!