Comments (8)

xiaoninghe commented on August 23, 2024

> Is the vectorized env issue happening on Colab, too?

> The line ConnectionResetError: [Errno 104] Connection reset by peer is probably relevant to #86 (but I was running on both the GPU server as well as a local machine with a GPU)

I got the ConnectionResetError when running the 2_reinforcement_learning notebook, on this line:

eval_env = SubprocVecEnv([make_env(env_id, record_dir="logs/videos") for i in range(1)])

The problem I found was due to the stable-baselines3 version. After pinning stable-baselines3==1.7.0 in the first cell, the problem went away. Maybe you can try that and see if it fixes anything?
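(A minimal sketch of that first-cell change, assuming a pip-based Colab setup cell; run it before importing stable_baselines3:)

!pip install stable-baselines3==1.7.0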

Edit: Ah, I just saw that the bug is fixed after the migration to gymnasium in v0.5.0:
https://colab.research.google.com/github/haosulab/ManiSkill2/blob/0.5.0/examples/tutorials/2_reinforcement_learning.ipynb

StoneT2000 commented on August 23, 2024

Update: this is fixed now that #139 is merged!

megatran commented on August 23, 2024

> Is the vectorized env issue happening on Colab, too?

The line ConnectionResetError: [Errno 104] Connection reset by peer is probably relevant to https://github.com/haosulab/ManiSkill2/issues/86 (but I was running on both the GPU server as well as a local machine with a GPU).

StoneT2000 commented on August 23, 2024

Investigating now.

StoneT2000 commented on August 23, 2024

OK, so currently the issue is that when using dense rewards, we rely on the visual meshes of the handles to compute a pretty good dense reward. However, the fast vec env implementation effectively renders everything on a separate server, so when computing dense rewards that visual mesh data is not actually available in each env process.

For now you will have to just use the sparse reward. Alternatively, you can construct a simple dense reward yourself; it will just train somewhat slower. I'll reply once I've discussed with the others what changes we will make to these visual-mesh-based reward functions.
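(For reference, a minimal sketch of switching an environment to the sparse reward, assuming the pre-0.5.0 gym-based API used in the notebooks; "OpenCabinetDrawer-v1" is only an example id, substitute whatever task you are actually training on:)

import gym
import mani_skill2.envs  # registers the ManiSkill2 environments with gym

# reward_mode="sparse" requests the success-indicator reward instead of the dense one
env = gym.make("OpenCabinetDrawer-v1", obs_mode="state", reward_mode="sparse")
obs = env.reset()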

megatran commented on August 23, 2024

> OK, so currently the issue is that when using dense rewards, we rely on the visual meshes of the handles to compute a pretty good dense reward. However, the fast vec env implementation effectively renders everything on a separate server, so when computing dense rewards that visual mesh data is not actually available in each env process.
>
> For now you will have to just use the sparse reward. Alternatively, you can construct a simple dense reward yourself; it will just train somewhat slower. I'll reply once I've discussed with the others what changes we will make to these visual-mesh-based reward functions.

Thanks so much for looking into this! I'm pretty new to robot learning in general and couldn't find a page in the documentation that describes what the simulator does when dense/sparse reward is selected. Most of the tutorial Colab pages use dense rewards, so I just assumed it was a good choice to use.

Could you help clarify my understanding a bit?

To me, dense reward sounds like the env giving fine-grained feedback at every time step, and sparse reward sounds like less frequent reward coming from the env (perhaps every other time step, or only when the goal is achieved?). If my mental model is accurate, then this choice of dense vs. sparse would affect PPO training, right?

For some more context, I'm playing around with the idea of training a PPO model that can effectively learn a task using both state-space and visual embeddings (it seems like the CustomExtractor(BaseFeaturesExtractor) in the sample RL notebook only processes either visual or state input, so I'm modifying that; see the sketch below). Then I want to train a DAgger imitation learner using expert demonstrations (coming from the aforementioned PPO policy). The primary reason for using PPO as the expert for DAgger is the lack of available expert data for this specific task.
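A rough sketch of the kind of combined extractor I have in mind, assuming a Dict observation space with "state" and "rgb" keys and channel-last uint8 images (the key names, shapes, and network sizes are placeholders rather than the notebook's actual setup):

import gym
import torch
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class StateImageExtractor(BaseFeaturesExtractor):
    """Fuses a low-dimensional state vector with a CNN embedding of an RGB image."""

    def __init__(self, observation_space: gym.spaces.Dict, features_dim: int = 256):
        super().__init__(observation_space, features_dim)
        h, w, c = observation_space["rgb"].shape          # assumed channel-last image
        state_dim = observation_space["state"].shape[0]   # assumed flat state vector
        self.cnn = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the CNN output size from a dummy image
            cnn_out = self.cnn(torch.zeros(1, c, h, w)).shape[1]
        self.fuse = nn.Sequential(nn.Linear(cnn_out + state_dim, features_dim), nn.ReLU())

    def forward(self, observations):
        img = observations["rgb"].permute(0, 3, 1, 2).float() / 255.0  # channel-first, [0, 1]
        img_feat = self.cnn(img)
        return self.fuse(torch.cat([img_feat, observations["state"]], dim=1))

The extractor would then be passed to PPO's "MultiInputPolicy" via policy_kwargs=dict(features_extractor_class=StateImageExtractor).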

Do you think the choice of sparse vs. dense rewards would affect this approach? Thank you again!

StoneT2000 commented on August 23, 2024

Naive PPO/SAC will almost never solve any of the ManiSkill environments from state- or visual-based observations when using sparse rewards. The same holds for environments in other benchmarks. If your agent can get some success in a sparse-reward environment using PPO/SAC without any kind of exploration or intrinsic-motivation method, then that task is too easy in my opinion (e.g. most implementations of some kind of cube-lifting environment).

Sparse rewards basically give your agent +1 if it succeeds and 0 if it does not, meaning for each episode the reward only comes at the end. Learning from sparse reward functions is an open problem, but an interesting one to tackle, since almost all environments have easily definable sparse rewards (which are just success indicators) but no easily definable dense rewards.
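(In code, a sparse reward is essentially just a success indicator, along the lines of the sketch below; using info["success"] as the flag is an assumption about the env's info dict.)

def sparse_reward(info: dict) -> float:
    # +1 only when the episode's success condition is met, 0 everywhere else
    return 1.0 if info.get("success", False) else 0.0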

There is "expert" data for each task, there are ~1k demonstration trajectories for every environment with every combo of observation space x action space. However these are generated with motion planning, which may or may not impact learning.

hantao-zhou commented on August 23, 2024

Nope, the error still persists as of December 8th, 2023.
