Comments (8)
updated with more details
from maniskill.
Does this still fail when you try on Colab after the new fix, or does it only fail on your GPU server? On my side Colab works fine at the moment (I tried running the visual RL code).
Do you have any details about your GPU server setup?
from maniskill.
Colab runs fine (I'm training the Visual RL block at the moment).
The GPU setup uses srun to request resources.
I ssh into it from my Mac (using -X forwarding with XQuartz for graphics). After a GPU resource is allocated, I can ssh into it, and I can confirm that X11 forwarding still works by running xclock on the server: the clock actually shows up on my Mac screen.
I'm using a conda env with Python 3.8.
My ~/.bashrc on the server has this configuration for Vulkan:
export VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json
export VK_LAYER_PATH=/etc/vulkan/implicit_layer.d/nvidia_layers.json
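To double-check that these two variables point at files that actually exist on the server, a quick check along these lines can help. This is only a sketch: `check_vulkan_env` is a hypothetical helper name, not part of ManiSkill2 or SAPIEN, and it only tests that the paths exist, not that the Vulkan driver actually loads.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names taken from the ~/.bashrc snippet above.
VULKAN_VARS = ("VK_ICD_FILENAMES", "VK_LAYER_PATH")


def check_vulkan_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a status per variable: 'unset', 'missing: <paths>' or 'ok'."""
    env = os.environ if environ is None else environ
    report = {}
    for name in VULKAN_VARS:
        value = env.get(name)
        if not value:
            report[name] = "unset"
            continue
        # Each variable may hold several paths separated by os.pathsep.
        missing = [p for p in value.split(os.pathsep) if not os.path.exists(p)]
        report[name] = "missing: " + ", ".join(missing) if missing else "ok"
    return report


if __name__ == "__main__":
    for name, status in check_vulkan_env().items():
        print(f"{name}: {status}")
```

Running it inside the allocated GPU session (not just on the login shell) would show whether the exports from ~/.bashrc are visible there.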
Since the regular env can run and render the scene, I think the graphical forwarding is working.
However, the problem only happens when I try to use VecEnv. There's this error/warning that says "Only 1 renderer is allowed per process. All previously created renderer resources are now invalid". This makes me wonder whether the GPU-optimized vectorized environments are trying to create multiple Vulkan renderers within a single process. From the error, I interpret it as Vulkan only allowing one renderer per process, but VecEnv is somehow attempting to create multiple renderers/resources per process.
nvidia-smi
Mon May 1 22:30:54 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA TITAN Xp                  On | 00000000:02:00.0 Off |                  N/A |
| 23%   18C    P8                8W / 250W|      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
from maniskill.
So I'm running into a similar issue when I try on a local Linux laptop with a GPU.
Here's my local Python 3.8 env setup:
Package Version Editable project location
------------------------ ---------- -------------------------------------------
absl-py 1.4.0
antlr4-python3-runtime 4.9.3
anyio 3.6.2
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
asttokens 2.2.1
attrs 23.1.0
backcall 0.2.0
beautifulsoup4 4.12.2
bleach 6.0.0
cachetools 5.3.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.1.0
cloudpickle 2.2.1
cmake 3.26.3
comm 0.1.3
contourpy 1.0.7
cycler 0.11.0
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
executing 1.2.0
fastjsonschema 2.16.3
filelock 3.12.0
fonttools 4.39.3
fqdn 1.5.1
gdown 4.7.1
gitdb 4.0.10
GitPython 3.1.31
google-auth 2.17.3
google-auth-oauthlib 1.0.0
grpcio 1.54.0
gym 0.21.0
h5py 3.8.0
hydra-core 1.3.2
idna 3.4
imageio 2.28.1
imageio-ffmpeg 0.4.8
importlib-metadata 4.13.0
importlib-resources 5.12.0
ipykernel 6.22.0
ipython 8.12.1
ipython-genutils 0.2.0
isoduration 20.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonpointer 2.3
jsonschema 4.17.3
jupyter_client 8.2.0
jupyter_core 5.3.0
jupyter-events 0.6.3
jupyter_server 2.5.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments 0.2.2
kiwisolver 1.4.4
lit 16.0.2
mani-skill2 0.4.2 /home/nt/Documents/RobotLearning/ManiSkill2
Markdown 3.4.3
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mistune 2.0.5
mpmath 1.3.0
nbclassic 0.5.6
nbclient 0.7.4
nbconvert 7.3.1
nbformat 5.8.0
nest-asyncio 1.5.6
networkx 3.1
notebook 6.5.4
notebook_shim 0.2.3
numpy 1.23.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
omegaconf 2.3.0
opencv-python 4.7.0.72
packaging 23.1
pandas 2.0.1
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.0.1
pkgutil_resolve_name 1.3.10
platformdirs 3.5.0
prometheus-client 0.16.0
prompt-toolkit 3.0.38
protobuf 4.22.3
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
Pygments 2.15.1
pyparsing 3.0.9
pyrsistent 0.19.3
PySocks 1.7.1
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3
PyYAML 6.0
pyzmq 25.0.2
r3m 0.0.0 /home/nt/Documents/RobotLearning/r3m
requests 2.29.0
requests-oauthlib 1.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rsa 4.9
Rtree 1.0.1
sapien 2.2.1
scipy 1.10.1
Send2Trash 1.8.2
setuptools 65.5.0
six 1.16.0
smmap 5.0.0
sniffio 1.3.0
soupsieve 2.4.1
stable-baselines3 1.8.0
stack-data 0.6.2
sympy 1.11.1
tabulate 0.9.0
tensorboard 2.12.2
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
terminado 0.17.1
tinycss2 1.2.1
torch 2.0.0
torchvision 0.15.1
tornado 6.3.1
tqdm 4.65.0
traitlets 5.9.0
transforms3d 0.4.1
trimesh 3.21.5
triton 2.0.0
typing_extensions 4.5.0
tzdata 2023.3
uri-template 1.2.0
urllib3 1.26.15
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.5.1
Werkzeug 2.3.3
wheel 0.38.4
zipp 3.15.0
[2023-05-02 13:44:11.123] [svulkan2] [warning] Only 1 renderer is allowed per process. All previously created renderer resources are now invalid
2023-05-02 13:44:11,160 - mani_skill2 - INFO - RenderServer is running at: localhost:34585
2023-05-02 13:44:12,781 - mani_skill2 - ERROR - 'NoneType' object has no attribute 'vertices'
Traceback (most recent call last):
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 56, in _worker
env = env_fn()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/registration.py", line 11, in _make_env
env = env_spec.make(**kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/registration.py", line 34, in make
return self.cls(**_kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 38, in __init__
super().__init__(*args, **kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 55, in __init__
super().__init__(*args, **kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 178, in __init__
obs = self.reset(reconfigure=True)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 159, in reset
return super().reset(seed=seed, reconfigure=reconfigure, model_id=model_id)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 87, in reset
ret = super().reset(seed=self._episode_seed, reconfigure=reconfigure)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 473, in reset
self.reconfigure()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 359, in reconfigure
self._load_articulations()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 66, in _load_articulations
self._set_cabinet_handles_mesh()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 94, in _set_cabinet_handles_mesh
meshes.extend(get_visual_body_meshes(visual_body))
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/trimesh_utils.py", line 40, in get_visual_body_meshes
vertices = render_shape.mesh.vertices * visual_body.scale # [n, 3]
AttributeError: 'NoneType' object has no attribute 'vertices'
Process ForkServerProcess-4:
Traceback (most recent call last):
File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 86, in _worker
env.close()
UnboundLocalError: local variable 'env' referenced before assignment
2023-05-02 13:44:12,803 - mani_skill2 - ERROR - 'NoneType' object has no attribute 'vertices'
Traceback (most recent call last):
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 56, in _worker
env = env_fn()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/registration.py", line 11, in _make_env
env = env_spec.make(**kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/registration.py", line 34, in make
return self.cls(**_kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 38, in __init__
super().__init__(*args, **kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 55, in __init__
super().__init__(*args, **kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 178, in __init__
obs = self.reset(reconfigure=True)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 159, in reset
return super().reset(seed=seed, reconfigure=reconfigure, model_id=model_id)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 87, in reset
ret = super().reset(seed=self._episode_seed, reconfigure=reconfigure)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 473, in reset
self.reconfigure()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 359, in reconfigure
self._load_articulations()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 66, in _load_articulations
self._set_cabinet_handles_mesh()
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 94, in _set_cabinet_handles_mesh
meshes.extend(get_visual_body_meshes(visual_body))
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/trimesh_utils.py", line 40, in get_visual_body_meshes
vertices = render_shape.mesh.vertices * visual_body.scale # [n, 3]
AttributeError: 'NoneType' object has no attribute 'vertices'
Process ForkServerProcess-5:
Traceback (most recent call last):
File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 86, in _worker
env.close()
UnboundLocalError: local variable 'env' referenced before assignment
---------------------------------------------------------------------------
ConnectionResetError Traceback (most recent call last)
Cell In[10], line 50
46 eval_env.reset()
48 # create num_envs training environments, with max_episode_steps=100
49 # instead of the default 200 to speed up training
---> 50 env: VecEnv = make_vec_env(
51 env_id,
52 num_envs,
53 obs_mode=obs_mode,
54 reward_mode=reward_mode,
55 control_mode=control_mode,
56 # specify wrappers for each individual environment e.g here we specify the
57 # Continuous task wrapper and pass in the max_episode_steps parameter via the partial tool
58 wrappers=[
59 partial(ContinuousTaskWrapper, max_episode_steps=100)
60 ]
61 )
62 env = ManiSkillRGBDVecEnvWrapper(env)
63 # use the maniskill provided SB3VecEnvWrapper to make the environment compatible with SB3
File ~/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/registration.py:81, in make(env_id, num_envs, server_address, wrappers, enable_segmentation, **kwargs)
77 else:
78 raise NotImplementedError(
79 f"Unsupported observation mode for VecEnv: {obs_mode}"
80 )
---> 81 venv = venv_cls([env_fn for _ in range(num_envs)], server_address=server_address)
82 venv.obs_mode = obs_mode
84 if "robot_seg" in obs_mode:
File ~/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py:435, in RGBDVecEnv.__init__(self, *args, **kwargs)
434 def __init__(self, *args, **kwargs):
--> 435 super().__init__(*args, **kwargs)
437 from mani_skill2.utils.wrappers.observation import RGBDObservationWrapper
439 RGBDObservationWrapper.update_observation_space(self.observation_space)
File ~/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py:204, in VecEnv.__init__(self, env_fns, start_method, server_address, server_kwargs)
202 remote.send(("handshake", None))
203 for remote in self.remotes:
--> 204 remote.recv()
206 # Infer texture names
207 texture_names = set()
File ~/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/connection.py:250, in _ConnectionBase.recv(self)
248 self._check_closed()
249 self._check_readable()
--> 250 buf = self._recv_bytes()
251 return _ForkingPickler.loads(buf.getbuffer())
File ~/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/connection.py:414, in Connection._recv_bytes(self, maxsize)
413 def _recv_bytes(self, maxsize=None):
--> 414 buf = self._recv(4)
415 size, = struct.unpack("!i", buf.getvalue())
416 if size == -1:
File ~/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/connection.py:379, in Connection._recv(self, size, read)
377 remaining = size
378 while remaining > 0:
--> 379 chunk = read(handle, remaining)
380 n = len(chunk)
381 if n == 0:
ConnectionResetError: [Errno 104] Connection reset by peer
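A side note on the log above: besides the AttributeError, each worker also dies with `UnboundLocalError` at `env.close()`. That happens because `env_fn()` raised before `env` was ever assigned, so the cleanup code touched a name that did not exist yet. The sketch below reproduces that pattern in isolation and shows one common way to guard against it. It is not the actual `_worker` code from `vec_env.py`; `failing_env_fn`, `worker_unguarded` and `worker_guarded` are made-up names for illustration.

```python
def failing_env_fn():
    # Stand-in for an env constructor that raises during __init__.
    raise AttributeError("'NoneType' object has no attribute 'vertices'")


def worker_unguarded(env_fn):
    try:
        env = env_fn()
    except Exception as err:
        print(f"ERROR - {err}")
    finally:
        env.close()  # raises UnboundLocalError if env_fn() failed


def worker_guarded(env_fn):
    env = None  # bind the name before anything can fail
    try:
        env = env_fn()
        return "ok"
    except Exception as err:
        return f"failed: {err}"
    finally:
        # Only close an env that was actually created.
        if env is not None:
            env.close()
```

With the guard in place, the second traceback per worker goes away and only the original error from the env constructor is left to investigate.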
from maniskill.
After further debugging, I wonder if the OpenCabinet env is the culprit.
When I default to the example "LiftCube-v0", Colab, the GPU server, and my local machine all seem to run fine!
num_envs = 2 # you can increase this and decrease the n_steps parameter if you have more cores to speed up training
env_id = "LiftCube-v0"
obs_mode = "state"
control_mode = "pd_ee_delta_pose"
reward_mode = "dense"
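To narrow down which environments fail, the same configuration can be tried over several env IDs in a loop. A rough sketch follows; `try_envs` is a hypothetical helper, and `make_fn` stands for whatever constructor is being used (for example a small wrapper around the `make_vec_env` call from the notebook), so nothing here is ManiSkill2 API.

```python
from typing import Callable, Dict, Iterable, Optional


def try_envs(
    env_ids: Iterable[str], make_fn: Callable[[str], object]
) -> Dict[str, Optional[str]]:
    """Build each env once; map env_id to None on success or to the error text."""
    results: Dict[str, Optional[str]] = {}
    for env_id in env_ids:
        env = None
        try:
            env = make_fn(env_id)
            results[env_id] = None
        except Exception as err:  # record the failure and keep going
            results[env_id] = f"{type(err).__name__}: {err}"
        finally:
            # Close the env if one was created and it has a close() method.
            close = getattr(env, "close", None)
            if callable(close):
                close()
    return results
```

Calling it with "LiftCube-v0" plus the failing cabinet env ID would show at a glance whether only the latter errors out on a given machine.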
from maniskill.
Could you try using a sparse reward setting for the failing envs?
from maniskill.
Closing in favor of https://github.com/haosulab/ManiSkill2/issues/88
from maniskill.
Current solution: https://github.com/haosulab/ManiSkill2/issues/88#issuecomment-1532194498
from maniskill.