Comments (8)
Is the vectorized env issue happening on Colab, too?
the line
ConnectionResetError: [Errno 104] Connection reset by peer
is probably relevant to #86 (but I was running on both the GPU server as well as a local machine with a GPU)
I got the ConnectionResetError when I ran the 2_reinforcement_learning notebook on this line:
eval_env = SubprocVecEnv([make_env(env_id, record_dir="logs/videos") for i in range(1)])
The problem I found was due to the stable-baselines3 version. After pinning stable-baselines3==1.7.0 in the first cell, the problem went away. Maybe you can try that and see if it fixes anything?
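If anyone else hits this, the pin is a one-line change in the notebook's install cell (assuming a standard pip-based Colab setup):

```shell
pip install stable-baselines3==1.7.0
```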
Edit: Ah, just saw the bug is fixed after migration to gymnasium in v0.5.0
https://colab.research.google.com/github/haosulab/ManiSkill2/blob/0.5.0/examples/tutorials/2_reinforcement_learning.ipynb
from maniskill.
Update: this is fixed now that #139 is merged!
Investigating now.
Ok, so currently the issue is that when using dense rewards, we rely on the visual meshes of the handles to compute a pretty good dense reward. However, the fast vec env implementation effectively renders everything on a separate server, so that visual mesh data is not actually available in each per-env process when computing dense rewards.
For now you will have to use sparse rewards. Alternatively, you can construct a simple dense reward yourself; it will just train somewhat slower. I'll reply once I discuss with the others what changes we will make to these visual-mesh-based reward functions.
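A simple hand-rolled dense reward of the sort suggested above can be a plain distance-shaped term. This is a minimal sketch, not ManiSkill's actual reward code: the function name, position arguments, and tanh scale are all hypothetical, and in practice you would read the poses from the env state rather than from visual meshes.

```python
import math

def simple_dense_reward(ee_pos, handle_pos, success):
    """Minimal stand-in dense reward: distance shaping plus a success bonus.

    ee_pos / handle_pos are 3D points (hypothetical names); the tanh
    scale of 5.0 is an arbitrary shaping choice, not ManiSkill's.
    """
    reaching = 1.0 - math.tanh(5.0 * math.dist(ee_pos, handle_pos))
    return reaching + (1.0 if success else 0.0)
```

A bounded shaping term like this changes fastest near the goal, which is usually enough for PPO to make progress, just more slowly than with a carefully tuned reward.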
Thanks so much for looking into this! I'm pretty new to robot learning in general and couldn't find a page in the documentation that describes what the simulator does when dense/sparse rewards are selected. Most of the tutorial Colab pages use dense rewards, so I just assumed it's a good choice to use.
Could you help clarify my understanding a bit?
To me, dense reward sounds like the env gives fine-grained feedback at every time step, and sparse reward sounds like less frequent reward coming from the env (perhaps every other time step, or only when the goal is achieved?). If my mental model is accurate, then this choice of dense vs. sparse would affect PPO training, right?
For some more context, I'm playing around with the idea of training a PPO model that can effectively learn a task using both state-space and visual embeddings (it seems like the CustomExtractor(BaseFeaturesExtractor) in the sample RL notebook only processes either visual or state inputs, so I'm modifying that). Then I want to train a DAgger imitation learner using expert demonstrations coming from that PPO agent. The primary reason for using PPO as the expert for DAgger is the lack of available expert data for this specific task.
Do you think the choice of sparse vs. dense rewards would affect this approach? Thank you again!
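For what it's worth, the combined-extractor idea above reduces to embedding each modality separately and concatenating the results before the policy head. A framework-free numpy sketch of that fusion step (all shapes, weights, and function names here are made up for illustration; this is not the notebook's actual CustomExtractor):

```python
import numpy as np

def embed_image(img, w_img):
    # Stand-in for a CNN branch: flatten the image and project it.
    return np.tanh(img.reshape(-1) @ w_img)

def embed_state(state, w_state):
    # Stand-in for an MLP branch on the proprioceptive state vector.
    return np.tanh(state @ w_state)

def fused_features(img, state, w_img, w_state):
    # The step a combined extractor adds: concatenate both embeddings.
    return np.concatenate([embed_image(img, w_img), embed_state(state, w_state)])

# Toy shapes: an 8x8 image mapped to 16 dims, a 10-d state mapped to 8 dims.
rng = np.random.default_rng(0)
w_img = 0.1 * rng.normal(size=(64, 16))
w_state = 0.1 * rng.normal(size=(10, 8))
feat = fused_features(rng.normal(size=(8, 8)), rng.normal(size=10), w_img, w_state)
# feat is 24-dimensional (16 + 8), ready to feed a policy/value head.
```

In an SB3 extractor the two branches would be learned networks over a Dict observation space, but the concatenation at the end is the same design.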
In all ManiSkill environments, naive PPO/SAC will almost never solve them from state-based or visual observations when using sparse rewards. The same is true of environments in other benchmarks. If your agent can get some success in a sparse-reward environment using PPO/SAC without any kind of exploration or intrinsic-motivation method, then that task is too easy in my opinion (e.g. most implementations of some kind of cube-lifting environment).
Sparse rewards basically give your agent +1 if it succeeds and 0 if it does not, meaning for each episode your reward only comes at the end. Learning from sparse rewards is an open problem, but it is interesting to tackle since almost all environments have easily definable sparse rewards (which are just success indicators) but no easily definable dense rewards.
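The distinction above, concretely: a sparse reward stays at zero until the success condition triggers, while a dense one gives graded feedback every step. A toy illustration (the 2 cm threshold and tanh scale are made up for this sketch):

```python
import math

def sparse_reward(dist_to_goal):
    # +1 only on success (here: within a made-up 2 cm threshold), else 0.
    return 1.0 if dist_to_goal < 0.02 else 0.0

def dense_reward(dist_to_goal):
    # Graded feedback at every step, approaching 1 as the distance shrinks.
    return 1.0 - math.tanh(5.0 * dist_to_goal)

# A toy trajectory moving toward the goal: the sparse signal is flat zero
# until the final step, while the dense one improves monotonically.
dists = [0.5, 0.3, 0.1, 0.01]
sparse = [sparse_reward(d) for d in dists]
dense = [dense_reward(d) for d in dists]
```

This is why the choice matters for PPO: with the sparse variant, a random policy that never reaches the goal sees no learning signal at all.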
There is "expert" data for each task: there are ~1k demonstration trajectories for every environment, with every combination of observation space x action space. However, these are generated with motion planning, which may or may not impact learning.
Nope, the error still persists as of December 8th, 2023.