Comments (11)

qgallouedec commented on June 20, 2024

Thanks, I'll take a look. I'll get back to you soon.

qgallouedec commented on June 20, 2024

This is very surprising. There are no big changes between v1 and v3; friction handling was improved, and that's it. I'll take a look on my end and get back to you.

tindiz commented on June 20, 2024

> This is very surprising. There are no big changes between v1 and v3; friction handling was improved, and that's it. I'll take a look on my end and get back to you.

Sounds good. Let me know if there is anything I can do to help out.

qgallouedec commented on June 20, 2024

Have you tried to run experiments with rl-zoo3? Can you share your plots?
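For reference, training this task through rl-zoo3 would look something like the sketch below (assuming the installed zoo version ships tuned TQC hyperparameters for this panda-gym environment id; older versions may only list the v1 ids):

# Hypothetical invocation: train TQC on the Panda task with the zoo's
# tuned hyperparameters, which also logs the curves needed for plots.
python -m rl_zoo3.train --algo tqc --env PandaPickAndPlace-v3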

tindiz commented on June 20, 2024

I haven't tried rl-zoo3. I wanted to train it myself, as shown in the code block.

I don't have plots at the moment but will try to log training now. It might take some time... Unfortunately, I didn't run it with the TensorBoard callback during training. However, I can share the models saved at the checkpoints if that helps as well.
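If it helps in the meantime, each checkpoint can be scored directly; a minimal sketch using stable-baselines3's evaluate_policy (the checkpoint path is illustrative):

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
from sb3_contrib import TQC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("PandaPickAndPlace-v3")

# Load a model saved by CheckpointCallback (path is illustrative).
model = TQC.load("./models/<run>/tqc_panda_pick_and_place_100000_steps", env=env)

# Average return over a handful of deterministic evaluation episodes.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")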

tindiz commented on June 20, 2024

I just realized that the way I was loading the model from a checkpoint isn't correct. This might be the issue. Please give me some time to investigate; I will keep you updated.

Sorry for wasting your time.
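For anyone who runs into the same thing: the usual resume pattern in stable-baselines3 is the load classmethod with the env passed back in, plus a separately saved replay buffer when HER is used. A minimal sketch (paths are illustrative, and the replay buffer file only exists if the CheckpointCallback was created with save_replay_buffer=True):

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
from sb3_contrib import TQC

env = gym.make("PandaPickAndPlace-v3")

# Restore weights and hyperparameters; the env must be supplied again.
model = TQC.load("./models/<run>/tqc_panda_pick_and_place_1000000_steps", env=env)

# With HER, the replay buffer holds the relabeled transitions, so restore
# it too; otherwise training resumes from an empty buffer.
model.load_replay_buffer("./models/<run>/tqc_panda_pick_and_place_replay_buffer_1000000_steps")

# Keep the timestep counter running so logging and schedules continue.
model.learn(total_timesteps=100_000, reset_num_timesteps=False, progress_bar=True)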

tindiz commented on June 20, 2024

Hi, I am getting back to you with more information. I was not able to replicate the results even when training continuously. I am attaching code, plots, and environment-related information below. Please let me know if you need anything else, or if you find a bug in my code.

Local Environment

Plots

[Plot images omitted: success rate (rollout/success_rate), mean episode reward (rollout/ep_rew_mean), and mean episode length (rollout/ep_len_mean).]

Code (in its entirety)

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
import datetime

from stable_baselines3 import HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import TQC


env = gym.make("PandaPickAndPlace-v3")

# Create TQC agent:
model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
    tensorboard_log='./tensorboard/TQC/',
)

stringified_time = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
checkpoint_callback = CheckpointCallback(
    save_freq=100_000, 
    save_path=f"./models/{stringified_time}/", 
    name_prefix="tqc_panda_pick_and_place"
)  # Create checkpoint callback.

# Model training: 
model.learn(
    total_timesteps=1_100_000, 
    callback=checkpoint_callback, 
    progress_bar=True
)
model.save("tqc_panda_pick_and_place_final")  # Save final model.

System Information

  • OS: Windows-10-10.0.18363-SP0 10.0.18363
  • Python: 3.9.16
  • Stable-Baselines3: 2.0.0a6
  • PyTorch: 2.0.0
  • GPU Enabled: True
  • Numpy: 1.24.3
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1

Colab Experiment

I tried training it in Colab as well; the environment timed out at around 400k steps. I am attaching the same information for that experiment. The results do not look the same to me, but I could not find any difference in the code. I can share the notebook as well. :)

Plots

[Plot image omitted: success rate (rollout/success_rate) for the Colab run.]

Code

!pip install panda-gym
!pip install git+https://github.com/DLR-RM/stable-baselines3
!pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/
!pip install tqdm
!pip install rich

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
import datetime
from stable_baselines3 import HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import TQC

base_path = '<user-specific-after-mounting-drive>'

env = gym.make("PandaPickAndPlace-v3")

model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1000000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
    tensorboard_log=f'{base_path}/tensorboard/',
)

stringified_time = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
checkpoint_callback = CheckpointCallback( 
    save_freq=10_000,
    save_path=f"{base_path}/models/{stringified_time}/", 
    name_prefix="tqc_panda_pick_and_place"
)  # Callback for saving the model

# Model training: 
model.learn(
    total_timesteps=1_000_000,  # an int, not the float 1_000_000.0
    callback=checkpoint_callback, 
    progress_bar=True
)
model.save(f"{base_path}/tqc_panda_pick_and_place_final")

System Information

  • OS: Linux-5.10.147+-x86_64-with-glibc2.31 #1 SMP Sat Dec 10 16:00:40 UTC 2022
  • Python: 3.10.11
  • Stable-Baselines3: 2.0.0a6
  • PyTorch: 2.0.0+cu118
  • GPU Enabled: True
  • Numpy: 1.22.4
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1
  • OpenAI Gym: 0.25.2

benquick123 commented on June 20, 2024

Hi, has anyone figured this out in the end? I can't reproduce the PickAndPlace results using TQC or SAC, with either the Hugging Face hyperparameters or the hyperparameters from the panda-gym paper.

tindiz commented on June 20, 2024

Hi, I made no progress. I got inconsistent results and never managed to replicate the ones documented.

benquick123 commented on June 20, 2024

Ok, the error was actually on my side. While I still can't reproduce the SAC results, TQC works with the Hugging Face hyperparameters after fixing the bug in my code.

zichunxx commented on June 20, 2024

Hi! Has anyone successfully completed the pick-and-place task with DDPG or SAC? I'm confused about why it fails. What factors could cause this?
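For concreteness, the setup I am testing is essentially the TQC configuration from the comments above with SAC swapped in; a minimal sketch (these hyperparameters are carried over from the TQC runs and are not a verified working configuration for SAC):

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("PandaPickAndPlace-v3")

# Same off-policy + HER setup as the TQC runs above; SAC drops in directly
# because it also supports dict observation spaces via MultiInputPolicy.
model = SAC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy="future", n_sampled_goal=4),
    tau=0.05,
    verbose=1,
)
model.learn(total_timesteps=1_000_000, progress_bar=True)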
