Comments (11)

qgallouedec commented on June 20, 2024

Thanks, I'll take a look. I'll get back to you soon.

qgallouedec commented on June 20, 2024

This is very surprising. There are no big changes between v1 and v3; friction handling was improved, and that's it. I'll take a look on my end and get back to you.

tindiz commented on June 20, 2024

> This is very surprising. There are no big changes between v1 and v3; friction handling was improved, and that's it. I'll take a look on my end and get back to you.

Sounds good. Let me know if there is anything I can do to help out.

qgallouedec commented on June 20, 2024

Have you tried to run experiments with rl-zoo3? Can you share your plots?
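For reference, training this task through rl-zoo3 would look something like the sketch below (assuming the installed zoo version ships tuned TQC hyperparameters for this panda-gym environment id; older versions may only list the v1 ids):

# Hypothetical invocation: train TQC on the Panda task with the zoo's
# tuned hyperparameters, which also logs the curves needed for plots.
python -m rl_zoo3.train --algo tqc --env PandaPickAndPlace-v3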

tindiz commented on June 20, 2024

I haven't tried rl-zoo3. I wanted to train it myself, as shown in the code block.

I don't have plots at the moment but will try to log training now. It might take some time... Unfortunately, I didn't run it with the TensorBoard callback during training. However, I can share the models saved at the checkpoints if that helps as well.
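If it helps in the meantime, each checkpoint can be scored directly; a minimal sketch using stable-baselines3's evaluate_policy (the checkpoint path is illustrative):

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
from sb3_contrib import TQC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("PandaPickAndPlace-v3")

# Load a model saved by CheckpointCallback (path is illustrative).
model = TQC.load("./models/<run>/tqc_panda_pick_and_place_100000_steps", env=env)

# Average return over a handful of deterministic evaluation episodes.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")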

tindiz commented on June 20, 2024

I just realized that the way I was loading the model from a checkpoint isn't correct. This might be the issue. Please give me some time to investigate; I will keep you updated.

Sorry for wasting your time.
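For anyone who runs into the same thing: the usual resume pattern in stable-baselines3 is the load classmethod with the env passed back in, plus a separately saved replay buffer when HER is used. A minimal sketch (paths are illustrative, and the replay buffer file only exists if the CheckpointCallback was created with save_replay_buffer=True):

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
from sb3_contrib import TQC

env = gym.make("PandaPickAndPlace-v3")

# Restore weights and hyperparameters; the env must be supplied again.
model = TQC.load("./models/<run>/tqc_panda_pick_and_place_1000000_steps", env=env)

# With HER, the replay buffer holds the relabeled transitions, so restore
# it too; otherwise training resumes from an empty buffer.
model.load_replay_buffer("./models/<run>/tqc_panda_pick_and_place_replay_buffer_1000000_steps")

# Keep the timestep counter running so logging and schedules continue.
model.learn(total_timesteps=100_000, reset_num_timesteps=False, progress_bar=True)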

tindiz commented on June 20, 2024

Hi, I am getting back to you with more information. I was not able to replicate the results even when training continuously. I am attaching code, plots, and environment-related information below. Please let me know if you need anything else, or if you find a bug in my code.

Local Environment

Plots

[Plot images omitted: success rate (rollout/success_rate), mean episode reward (rollout/ep_rew_mean), and mean episode length (rollout/ep_len_mean).]

Code (in its entirety)

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
import datetime

from stable_baselines3 import HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import TQC


env = gym.make("PandaPickAndPlace-v3")

# Create TQC agent:
model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
    tensorboard_log='./tensorboard/TQC/',
)

stringified_time = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
checkpoint_callback = CheckpointCallback(
    save_freq=100_000, 
    save_path=f"./models/{stringified_time}/", 
    name_prefix="tqc_panda_pick_and_place"
)  # Create checkpoint callback.

# Model training: 
model.learn(
    total_timesteps=1_100_000, 
    callback=checkpoint_callback, 
    progress_bar=True
)
model.save("tqc_panda_pick_and_place_final")  # Save final model.

System Information

  • OS: Windows-10-10.0.18363-SP0 10.0.18363
  • Python: 3.9.16
  • Stable-Baselines3: 2.0.0a6
  • PyTorch: 2.0.0
  • GPU Enabled: True
  • Numpy: 1.24.3
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1

Colab Experiment

I tried training it in Colab as well; the environment timed out at around 400k steps. I am attaching the same information for that experiment. The results do not look the same to me, but I could not find any difference in the code. I can share the notebook as well. :)

Plots

[Plot image omitted: success rate (rollout/success_rate) for the Colab run.]

Code

!pip install panda-gym
!pip install git+https://github.com/DLR-RM/stable-baselines3
!pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/
!pip install tqdm
!pip install rich

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
import datetime
from stable_baselines3 import HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import TQC

base_path = '<user-specific-after-mounting-drive>'

env = gym.make("PandaPickAndPlace-v3")

model = TQC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1000000,
    gamma=0.95,
    learning_rate=0.001,
    policy_kwargs=dict(net_arch=[512, 512, 512], n_critics=2),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy='future', n_sampled_goal=4),
    tau=0.05,
    seed=3157870761,
    verbose=1,
    tensorboard_log=f'{base_path}/tensorboard/',
)

stringified_time = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
checkpoint_callback = CheckpointCallback( 
    save_freq=10_000,
    save_path=f"{base_path}/models/{stringified_time}/", 
    name_prefix="tqc_panda_pick_and_place"
)  # Callback for saving the model

# Model training: 
model.learn(
    total_timesteps=1_000_000,  # an int, not the float 1_000_000.0
    callback=checkpoint_callback, 
    progress_bar=True
)
model.save(f"{base_path}/tqc_panda_pick_and_place_final")

System Information

  • OS: Linux-5.10.147+-x86_64-with-glibc2.31 #1 SMP Sat Dec 10 16:00:40 UTC 2022
  • Python: 3.10.11
  • Stable-Baselines3: 2.0.0a6
  • PyTorch: 2.0.0+cu118
  • GPU Enabled: True
  • Numpy: 1.22.4
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1
  • OpenAI Gym: 0.25.2

benquick123 commented on June 20, 2024

Hi, has anyone figured this out in the end? I can't reproduce the PickAndPlace results using TQC or SAC, with either the Hugging Face hyperparameters or the hyperparameters from the panda-gym paper.

tindiz commented on June 20, 2024

Hi, I made no progress. I got inconsistent results and never managed to replicate the ones documented.

benquick123 commented on June 20, 2024

Ok, the error was actually on my side. While I still can't reproduce the SAC results, TQC works with the Hugging Face hyperparameters after fixing the bug in my code.

zichunxx commented on June 20, 2024

Hi! Has anyone successfully completed the pick-and-place task with DDPG or SAC? I'm confused about why it fails. What factors could cause this?
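For concreteness, the setup I am testing is essentially the TQC configuration from the comments above with SAC swapped in; a minimal sketch (these hyperparameters are carried over from the TQC runs and are not a verified working configuration for SAC):

import gymnasium as gym
import panda_gym  # registers the Panda environments with gymnasium
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("PandaPickAndPlace-v3")

# Same off-policy + HER setup as the TQC runs above; SAC drops in directly
# because it also supports dict observation spaces via MultiInputPolicy.
model = SAC(
    "MultiInputPolicy",
    env,
    batch_size=2048,
    buffer_size=1_000_000,
    gamma=0.95,
    learning_rate=0.001,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(goal_selection_strategy="future", n_sampled_goal=4),
    tau=0.05,
    verbose=1,
)
model.learn(total_timesteps=1_000_000, progress_bar=True)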
