Comments (11)
Thanks, I'll take a look. I'll get back to you soon.
This is very surprising. There are no big changes between v1 and v3. The friction is better managed, and that's it.
I'll take a look on my end and get back to you.
Sounds good. Let me know if there is anything I can do to help out.
Have you tried running experiments with rl-zoo3? Can you share your plots?
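For reference, a minimal sketch of how such a run is usually launched with rl-zoo3 (assuming rl-zoo3 is installed and has tuned TQC hyperparameters registered for this environment; the exact env id in the zoo's config may differ):

pip install rl_zoo3
python -m rl_zoo3.train --algo tqc --env PandaPickAndPlace-v3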
I haven't tried rl-zoo3. I wanted to train it myself, as shown in the code block.
I don't have plots at the moment but will try to log training now. It might take some time... Unfortunately, I didn't run it with the TensorBoard callback during training. However, I can get the models saved at the checkpoints if that helps as well.
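If it helps, here is a minimal sketch of how a checkpointed model could be evaluated after the fact to recover a success rate (the checkpoint path is hypothetical; panda-gym reports is_success in the step info):

import gymnasium as gym
import panda_gym
from sb3_contrib import TQC

env = gym.make("PandaPickAndPlace-v3")
# Hypothetical checkpoint produced by CheckpointCallback.
model = TQC.load("./models/<run>/tqc_panda_pick_and_place_100000_steps", env=env)

n_episodes, successes = 50, 0
for _ in range(n_episodes):
    obs, _ = env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
    successes += bool(info.get("is_success", False))
print(f"Success rate: {successes / n_episodes:.2f}")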
I just realized that the way I was loading the model from a checkpoint isn't correct and doesn't work properly. This might be the issue. Please give me some time to investigate; I will keep you updated.
Sorry for wasting your time.
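For what it's worth, a minimal sketch of the usual way to resume from an SB3 checkpoint (the env must be passed back to load, and with HER the replay buffer has to be saved and re-attached separately, otherwise training resumes with an empty buffer; paths are hypothetical):

import gymnasium as gym
import panda_gym
from sb3_contrib import TQC

env = gym.make("PandaPickAndPlace-v3")
# Re-create the model from the checkpoint; the env must be supplied again.
model = TQC.load("./models/<run>/tqc_panda_pick_and_place_100000_steps", env=env)
# If the buffer was also checkpointed (CheckpointCallback(save_replay_buffer=True)),
# restore it and point it back at the env so HER can resample goals.
model.load_replay_buffer("./models/<run>/tqc_panda_pick_and_place_replay_buffer_100000_steps")
model.replay_buffer.set_env(env)
# Continue the timestep count instead of restarting from zero.
model.learn(total_timesteps=1_100_000, reset_num_timesteps=False)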
Hi, I am getting back to you with more information. I was not able to replicate the results even when training continuously (without resuming from checkpoints). I am attaching code, plots, and environment-related information. Please let me know if you need anything else or if you find a bug in my code.
Local Environment
Plots (success rate, reward, episode length)
Code (in its entirety)
import gymnasium as gym
import panda_gym
import numpy as np
import datetime

from stable_baselines3 import HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import TQC

env = gym.make("PandaPickAndPlace-v3")

# Create TQC agent:
model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
    tensorboard_log='./tensorboard/TQC/',
)

stringified_time = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

# Create checkpoint callback.
checkpoint_callback = CheckpointCallback(
    save_freq=100_000,
    save_path=f"./models/{stringified_time}/",
    name_prefix="tqc_panda_pick_and_place",
)

# Model training:
model.learn(
    total_timesteps=1_100_000,
    callback=checkpoint_callback,
    progress_bar=True,
)

model.save("tqc_panda_pick_and_place_final")  # Save final model.
System Information
- OS: Windows-10-10.0.18363-SP0 10.0.18363
- Python: 3.9.16
- Stable-Baselines3: 2.0.0a6
- PyTorch: 2.0.0
- GPU Enabled: True
- Numpy: 1.24.3
- Cloudpickle: 2.2.1
- Gymnasium: 0.28.1
Colab Experiment
I tried training in Colab as well; the environment timed out at around 400k steps. I am attaching the same information for that experiment. The results do not look the same to me, but I could not find any difference in the code. I can share the notebook as well. :)
Plots (success rate)
Code
!pip install panda-gym
!pip install git+https://github.com/DLR-RM/stable-baselines3
!pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/
!pip install tqdm
!pip install rich
import gymnasium as gym
import panda_gym
import numpy as np
import datetime

from stable_baselines3 import HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import TQC

base_path = '<user-specific-after-mounting-drive>'

env = gym.make("PandaPickAndPlace-v3")

model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
    tensorboard_log=f'{base_path}/tensorboard/',
)

stringified_time = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

# Callback for saving the model.
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,
    save_path=f"{base_path}/models/{stringified_time}/",
    name_prefix="tqc_panda_pick_and_place",
)

# Model training:
model.learn(
    total_timesteps=1_000_000,
    callback=checkpoint_callback,
    progress_bar=True,
)

model.save(f"{base_path}/tqc_panda_pick_and_place_final")
System Information
- OS: Linux-5.10.147+-x86_64-with-glibc2.31 #1 SMP Sat Dec 10 16:00:40 UTC 2022
- Python: 3.10.11
- Stable-Baselines3: 2.0.0a6
- PyTorch: 2.0.0+cu118
- GPU Enabled: True
- Numpy: 1.22.4
- Cloudpickle: 2.2.1
- Gymnasium: 0.28.1
- OpenAI Gym: 0.25.2
Hi, has anyone figured it out in the end? I can't reproduce the PickAndPlace results using TQC or SAC, with either the Hugging Face hyperparameters or the hyperparameters from the panda-gym paper.
Hi, I made no progress. I got inconsistent results and never managed to replicate the ones documented.
OK, the error was actually on my side. While I still can't reproduce the SAC results, TQC works with the Hugging Face hyperparameters after fixing the bug in my code.
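In case it helps with comparisons, a minimal sketch of pulling the published SB3 model from the Hugging Face Hub (the repo id and filename below are assumptions based on the usual sb3 naming scheme; check the actual model card, and note that loading older checkpoints across gym/gymnasium versions can require extra care):

from huggingface_sb3 import load_from_hub
from sb3_contrib import TQC

# Assumed repo id/filename; verify against the model card on the Hub.
checkpoint = load_from_hub(
    repo_id="sb3/tqc-PandaPickAndPlace-v1",
    filename="tqc-PandaPickAndPlace-v1.zip",
)
model = TQC.load(checkpoint)  # pass env=... if you intend to keep training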
Hi! Has anyone successfully completed the pick-and-place task with DDPG or SAC? I'm confused about the reason for the failure. What factors could possibly cause it?