Comments (8)
Is the vectorized env issue happening on Colab, too?
the line
ConnectionResetError: [Errno 104] Connection reset by peer
is probably relevant to #86 (but I was running on both the GPU server as well as a local machine with a GPU)
I got the ConnectionResetError when I ran the 2_reinforcement_learning notebook on this line:
eval_env = SubprocVecEnv([make_env(env_id, record_dir="logs/videos") for i in range(1)])
The problem I found was due to the stable-baselines3 version. After pinning stable-baselines3==1.7.0 in the first cell, the problem went away. Maybe you can try that and see if it fixes anything?
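If anyone else hits this, the pin is a one-line change in the notebook's install cell (assuming a standard pip-based Colab setup):

```shell
pip install stable-baselines3==1.7.0
```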
Edit: Ah, just saw the bug is fixed after migration to gymnasium in v0.5.0
https://colab.research.google.com/github/haosulab/ManiSkill2/blob/0.5.0/examples/tutorials/2_reinforcement_learning.ipynb
from maniskill.
Update: this is fixed now that #139 is merged!
Investigating now.
Ok, so currently the issue is that when using dense rewards, we rely on the visual meshes of the handles to compute a pretty good dense reward. However, the fast vec env implementation effectively renders everything on a separate server, so that visual mesh data is not actually available in each per-env process when computing dense rewards.
For now you will have to use sparse rewards. Alternatively, you can construct a simple dense reward yourself; it will just train somewhat slower. I'll reply once I discuss with the others what changes we will make to these visual-mesh-based reward functions.
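A simple hand-rolled dense reward of the sort suggested above can be a plain distance-shaped term. This is a minimal sketch, not ManiSkill's actual reward code: the function name, position arguments, and tanh scale are all hypothetical, and in practice you would read the poses from the env state rather than from visual meshes.

```python
import math

def simple_dense_reward(ee_pos, handle_pos, success):
    """Minimal stand-in dense reward: distance shaping plus a success bonus.

    ee_pos / handle_pos are 3D points (hypothetical names); the tanh
    scale of 5.0 is an arbitrary shaping choice, not ManiSkill's.
    """
    reaching = 1.0 - math.tanh(5.0 * math.dist(ee_pos, handle_pos))
    return reaching + (1.0 if success else 0.0)
```

A bounded shaping term like this changes fastest near the goal, which is usually enough for PPO to make progress, just more slowly than with a carefully tuned reward.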
Thanks so much for looking into this! I'm pretty new to robot learning in general and couldn't find a page in the documentation that describes what the simulator does when dense/sparse rewards are selected. Most of the tutorial Colab pages use dense rewards, so I just assumed it's a good choice to use.
Could you help clarify my understanding a bit?
To me, dense reward sounds like the env gives fine-grained feedback at every time step, and sparse reward sounds like less frequent reward coming from the env (perhaps every other time step, or only when the goal is achieved?). If my mental model is accurate, then this choice of dense vs. sparse would affect PPO training, right?
For some more context, I'm playing around with the idea of training a PPO model that can effectively learn a task using both state-space and visual embeddings (it seems like the CustomExtractor(BaseFeaturesExtractor) in the sample RL notebook only processes either visual or state inputs, so I'm modifying that). Then I want to train a DAgger imitation learner using expert demonstrations coming from that PPO agent. The primary reason for using PPO as the expert for DAgger is the lack of available expert data for this specific task.
Do you think the choice of sparse vs. dense rewards would affect this approach? Thank you again!
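For what it's worth, the combined-extractor idea above reduces to embedding each modality separately and concatenating the results before the policy head. A framework-free numpy sketch of that fusion step (all shapes, weights, and function names here are made up for illustration; this is not the notebook's actual CustomExtractor):

```python
import numpy as np

def embed_image(img, w_img):
    # Stand-in for a CNN branch: flatten the image and project it.
    return np.tanh(img.reshape(-1) @ w_img)

def embed_state(state, w_state):
    # Stand-in for an MLP branch on the proprioceptive state vector.
    return np.tanh(state @ w_state)

def fused_features(img, state, w_img, w_state):
    # The step a combined extractor adds: concatenate both embeddings.
    return np.concatenate([embed_image(img, w_img), embed_state(state, w_state)])

# Toy shapes: an 8x8 image mapped to 16 dims, a 10-d state mapped to 8 dims.
rng = np.random.default_rng(0)
w_img = 0.1 * rng.normal(size=(64, 16))
w_state = 0.1 * rng.normal(size=(10, 8))
feat = fused_features(rng.normal(size=(8, 8)), rng.normal(size=10), w_img, w_state)
# feat is 24-dimensional (16 + 8), ready to feed a policy/value head.
```

In an SB3 extractor the two branches would be learned networks over a Dict observation space, but the concatenation at the end is the same design.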
In all ManiSkill environments, naive PPO/SAC will almost never solve them from state-based or visual observations when using sparse rewards. The same is true of environments in other benchmarks. If your agent can get some success in a sparse-reward environment using PPO/SAC without any kind of exploration or intrinsic-motivation method, then that task is too easy in my opinion (e.g. most implementations of some kind of cube-lifting environment).
Sparse rewards basically give your agent +1 if it succeeds and 0 if it does not, meaning for each episode your reward only comes at the end. Learning from sparse rewards is an open problem, but it is interesting to tackle since almost all environments have easily definable sparse rewards (which are just success indicators) but no easily definable dense rewards.
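The distinction above, concretely: a sparse reward stays at zero until the success condition triggers, while a dense one gives graded feedback every step. A toy illustration (the 2 cm threshold and tanh scale are made up for this sketch):

```python
import math

def sparse_reward(dist_to_goal):
    # +1 only on success (here: within a made-up 2 cm threshold), else 0.
    return 1.0 if dist_to_goal < 0.02 else 0.0

def dense_reward(dist_to_goal):
    # Graded feedback at every step, approaching 1 as the distance shrinks.
    return 1.0 - math.tanh(5.0 * dist_to_goal)

# A toy trajectory moving toward the goal: the sparse signal is flat zero
# until the final step, while the dense one improves monotonically.
dists = [0.5, 0.3, 0.1, 0.01]
sparse = [sparse_reward(d) for d in dists]
dense = [dense_reward(d) for d in dists]
```

This is why the choice matters for PPO: with the sparse variant, a random policy that never reaches the goal sees no learning signal at all.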
There is "expert" data for each task: there are ~1k demonstration trajectories for every environment, with every combination of observation space x action space. However, these are generated with motion planning, which may or may not impact learning.
Nope, the error still persists as of December 8th, 2023.