
Comments (8)

megatran commented on August 23, 2024

Updated with more details.

from maniskill.

StoneT2000 commented on August 23, 2024

Does this still fail on Colab after the new fix, or only on your GPU server? On my side Colab works fine at the moment (I tried running the visual RL code).

Do you have any details about your GPU server setup?


megatran commented on August 23, 2024

Colab runs fine (I'm training the Visual RL block at the moment).

The GPU server uses srun to request resources.

I SSH into it from my Mac (using -X forwarding with XQuartz for graphics). After a GPU resource is allocated, I can SSH into the allocated node and confirm that X11 forwarding still works: running xclock on the server makes the clock show up on my Mac screen.

I'm using a conda env with Python 3.8.

My ~/.bashrc on the server has this configuration for Vulkan:

export VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json
export VK_LAYER_PATH=/etc/vulkan/implicit_layer.d/nvidia_layers.json
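A quick way to rule out a typo in these two variables is to check that each one points at a path that exists on the node where the job runs. The following is a minimal sketch, not part of ManiSkill2 or SAPIEN; the helper name `check_vulkan_env` is hypothetical, and it only tests that the paths exist, not that the Vulkan loader accepts them.

```python
import os
from pathlib import Path


def check_vulkan_env(env=None):
    """Return {variable: (value, exists)} for the two Vulkan variables.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    report = {}
    for name in ("VK_ICD_FILENAMES", "VK_LAYER_PATH"):
        value = env.get(name)
        # An unset or empty variable is reported as not existing.
        exists = bool(value) and all(
            Path(p).exists() for p in value.split(os.pathsep) if p
        )
        report[name] = (value, exists)
    return report


if __name__ == "__main__":
    for name, (value, exists) in check_vulkan_env().items():
        print(f"{name}={value!r} exists={exists}")
```

Running this inside the srun allocation (rather than on the login node) matters, since the files under /etc/vulkan may differ between machines.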

Since the regular env can run and render the scene, I think the graphical forwarding is working.

image

However, the failure only happens when I use VecEnv. There is a warning that says "Only 1 renderer is allowed per process. All previously created renderer resources are now invalid". This makes me wonder whether the GPU-optimized vectorized environments are trying to create multiple Vulkan renderers within a single process. I read the message as Vulkan allowing only one renderer per process, while VecEnv is somehow attempting to create several renderers/resources in the same process.

Output of nvidia-smi:

nvidia-smi
Mon May  1 22:30:54 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA TITAN Xp                 On | 00000000:02:00.0 Off |                  N/A |
| 23%   18C    P8                8W / 250W|      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                       


megatran commented on August 23, 2024

I'm running into a similar issue on a local Linux laptop with a GPU.

Here's my local Python 3.8 env setup

Package                  Version    Editable project location
------------------------ ---------- -------------------------------------------
absl-py                  1.4.0
antlr4-python3-runtime   4.9.3
anyio                    3.6.2
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
asttokens                2.2.1
attrs                    23.1.0
backcall                 0.2.0
beautifulsoup4           4.12.2
bleach                   6.0.0
cachetools               5.3.0
certifi                  2022.12.7
cffi                     1.15.1
charset-normalizer       3.1.0
cloudpickle              2.2.1
cmake                    3.26.3
comm                     0.1.3
contourpy                1.0.7
cycler                   0.11.0
debugpy                  1.6.7
decorator                5.1.1
defusedxml               0.7.1
executing                1.2.0
fastjsonschema           2.16.3
filelock                 3.12.0
fonttools                4.39.3
fqdn                     1.5.1
gdown                    4.7.1
gitdb                    4.0.10
GitPython                3.1.31
google-auth              2.17.3
google-auth-oauthlib     1.0.0
grpcio                   1.54.0
gym                      0.21.0
h5py                     3.8.0
hydra-core               1.3.2
idna                     3.4
imageio                  2.28.1
imageio-ffmpeg           0.4.8
importlib-metadata       4.13.0
importlib-resources      5.12.0
ipykernel                6.22.0
ipython                  8.12.1
ipython-genutils         0.2.0
isoduration              20.11.0
jedi                     0.18.2
Jinja2                   3.1.2
jsonpointer              2.3
jsonschema               4.17.3
jupyter_client           8.2.0
jupyter_core             5.3.0
jupyter-events           0.6.3
jupyter_server           2.5.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments      0.2.2
kiwisolver               1.4.4
lit                      16.0.2
mani-skill2              0.4.2      /home/nt/Documents/RobotLearning/ManiSkill2
Markdown                 3.4.3
MarkupSafe               2.1.2
matplotlib               3.7.1
matplotlib-inline        0.1.6
mistune                  2.0.5
mpmath                   1.3.0
nbclassic                0.5.6
nbclient                 0.7.4
nbconvert                7.3.1
nbformat                 5.8.0
nest-asyncio             1.5.6
networkx                 3.1
notebook                 6.5.4
notebook_shim            0.2.3
numpy                    1.23.5
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
oauthlib                 3.2.2
omegaconf                2.3.0
opencv-python            4.7.0.72
packaging                23.1
pandas                   2.0.1
pandocfilters            1.5.0
parso                    0.8.3
pexpect                  4.8.0
pickleshare              0.7.5
Pillow                   9.5.0
pip                      23.0.1
pkgutil_resolve_name     1.3.10
platformdirs             3.5.0
prometheus-client        0.16.0
prompt-toolkit           3.0.38
protobuf                 4.22.3
psutil                   5.9.5
ptyprocess               0.7.0
pure-eval                0.2.2
pyasn1                   0.5.0
pyasn1-modules           0.3.0
pycparser                2.21
Pygments                 2.15.1
pyparsing                3.0.9
pyrsistent               0.19.3
PySocks                  1.7.1
python-dateutil          2.8.2
python-json-logger       2.0.7
pytz                     2023.3
PyYAML                   6.0
pyzmq                    25.0.2
r3m                      0.0.0      /home/nt/Documents/RobotLearning/r3m
requests                 2.29.0
requests-oauthlib        1.3.1
rfc3339-validator        0.1.4
rfc3986-validator        0.1.1
rsa                      4.9
Rtree                    1.0.1
sapien                   2.2.1
scipy                    1.10.1
Send2Trash               1.8.2
setuptools               65.5.0
six                      1.16.0
smmap                    5.0.0
sniffio                  1.3.0
soupsieve                2.4.1
stable-baselines3        1.8.0
stack-data               0.6.2
sympy                    1.11.1
tabulate                 0.9.0
tensorboard              2.12.2
tensorboard-data-server  0.7.0
tensorboard-plugin-wit   1.8.1
terminado                0.17.1
tinycss2                 1.2.1
torch                    2.0.0
torchvision              0.15.1
tornado                  6.3.1
tqdm                     4.65.0
traitlets                5.9.0
transforms3d             0.4.1
trimesh                  3.21.5
triton                   2.0.0
typing_extensions        4.5.0
tzdata                   2023.3
uri-template             1.2.0
urllib3                  1.26.15
wcwidth                  0.2.6
webcolors                1.13
webencodings             0.5.1
websocket-client         1.5.1
Werkzeug                 2.3.3
wheel                    0.38.4
zipp                     3.15.0
[2023-05-02 13:44:11.123] [svulkan2] [warning] Only 1 renderer is allowed per process. All previously created renderer resources are now invalid
2023-05-02 13:44:11,160 - mani_skill2 - INFO - RenderServer is running at: localhost:34585
2023-05-02 13:44:12,781 - mani_skill2 - ERROR - 'NoneType' object has no attribute 'vertices'
Traceback (most recent call last):
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 56, in _worker
    env = env_fn()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/registration.py", line 11, in _make_env
    env = env_spec.make(**kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/registration.py", line 34, in make
    return self.cls(**_kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 38, in __init__
    super().__init__(*args, **kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 55, in __init__
    super().__init__(*args, **kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 178, in __init__
    obs = self.reset(reconfigure=True)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 159, in reset
    return super().reset(seed=seed, reconfigure=reconfigure, model_id=model_id)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 87, in reset
    ret = super().reset(seed=self._episode_seed, reconfigure=reconfigure)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 473, in reset
    self.reconfigure()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 359, in reconfigure
    self._load_articulations()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 66, in _load_articulations
    self._set_cabinet_handles_mesh()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 94, in _set_cabinet_handles_mesh
    meshes.extend(get_visual_body_meshes(visual_body))
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/trimesh_utils.py", line 40, in get_visual_body_meshes
    vertices = render_shape.mesh.vertices * visual_body.scale  # [n, 3]
AttributeError: 'NoneType' object has no attribute 'vertices'
Process ForkServerProcess-4:
Traceback (most recent call last):
  File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 86, in _worker
    env.close()
UnboundLocalError: local variable 'env' referenced before assignment
2023-05-02 13:44:12,803 - mani_skill2 - ERROR - 'NoneType' object has no attribute 'vertices'
Traceback (most recent call last):
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 56, in _worker
    env = env_fn()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/registration.py", line 11, in _make_env
    env = env_spec.make(**kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/registration.py", line 34, in make
    return self.cls(**_kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 38, in __init__
    super().__init__(*args, **kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 55, in __init__
    super().__init__(*args, **kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 178, in __init__
    obs = self.reset(reconfigure=True)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 159, in reset
    return super().reset(seed=seed, reconfigure=reconfigure, model_id=model_id)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/base_env.py", line 87, in reset
    ret = super().reset(seed=self._episode_seed, reconfigure=reconfigure)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 473, in reset
    self.reconfigure()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/sapien_env.py", line 359, in reconfigure
    self._load_articulations()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 66, in _load_articulations
    self._set_cabinet_handles_mesh()
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/envs/ms1/open_cabinet_door_drawer.py", line 94, in _set_cabinet_handles_mesh
    meshes.extend(get_visual_body_meshes(visual_body))
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/utils/trimesh_utils.py", line 40, in get_visual_body_meshes
    vertices = render_shape.mesh.vertices * visual_body.scale  # [n, 3]
AttributeError: 'NoneType' object has no attribute 'vertices'
Process ForkServerProcess-5:
Traceback (most recent call last):
  File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/nt/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nt/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py", line 86, in _worker
    env.close()
UnboundLocalError: local variable 'env' referenced before assignment

---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)
Cell In[10], line 50
     46 eval_env.reset()
     48 # create num_envs training environments, with max_episode_steps=100
     49 # instead of the default 200 to speed up training
---> 50 env: VecEnv = make_vec_env(
     51     env_id,
     52     num_envs,
     53     obs_mode=obs_mode,
     54     reward_mode=reward_mode,
     55     control_mode=control_mode,
     56     # specify wrappers for each individual environment e.g here we specify the
     57     # Continuous task wrapper and pass in the max_episode_steps parameter via the partial tool
     58     wrappers=[
     59         partial(ContinuousTaskWrapper, max_episode_steps=100)
     60     ]
     61 )
     62 env = ManiSkillRGBDVecEnvWrapper(env)
     63 # use the maniskill provided SB3VecEnvWrapper to make the environment compatible with SB3

File ~/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/registration.py:81, in make(env_id, num_envs, server_address, wrappers, enable_segmentation, **kwargs)
     77 else:
     78     raise NotImplementedError(
     79         f"Unsupported observation mode for VecEnv: {obs_mode}"
     80     )
---> 81 venv = venv_cls([env_fn for _ in range(num_envs)], server_address=server_address)
     82 venv.obs_mode = obs_mode
     84 if "robot_seg" in obs_mode:

File ~/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py:435, in RGBDVecEnv.__init__(self, *args, **kwargs)
    434 def __init__(self, *args, **kwargs):
--> 435     super().__init__(*args, **kwargs)
    437     from mani_skill2.utils.wrappers.observation import RGBDObservationWrapper
    439     RGBDObservationWrapper.update_observation_space(self.observation_space)

File ~/Documents/RobotLearning/ManiSkill2/mani_skill2/vector/vec_env.py:204, in VecEnv.__init__(self, env_fns, start_method, server_address, server_kwargs)
    202     remote.send(("handshake", None))
    203 for remote in self.remotes:
--> 204     remote.recv()
    206 # Infer texture names
    207 texture_names = set()

File ~/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/connection.py:250, in _ConnectionBase.recv(self)
    248 self._check_closed()
    249 self._check_readable()
--> 250 buf = self._recv_bytes()
    251 return _ForkingPickler.loads(buf.getbuffer())

File ~/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/connection.py:414, in Connection._recv_bytes(self, maxsize)
    413 def _recv_bytes(self, maxsize=None):
--> 414     buf = self._recv(4)
    415     size, = struct.unpack("!i", buf.getvalue())
    416     if size == -1:

File ~/miniconda3/envs/robotlearning38/lib/python3.8/multiprocessing/connection.py:379, in Connection._recv(self, size, read)
    377 remaining = size
    378 while remaining > 0:
--> 379     chunk = read(handle, remaining)
    380     n = len(chunk)
    381     if n == 0:

ConnectionResetError: [Errno 104] Connection reset by peer
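One reading of the log above: each worker raises the real error (`AttributeError: 'NoneType' object has no attribute 'vertices'`) inside `env_fn()`, and a second `UnboundLocalError` then follows because cleanup calls `env.close()` although `env` was never assigned. The parent process only sees `ConnectionResetError` once the worker has died. The sketch below shows the general Python pattern that avoids the secondary error. It is an illustration only; `run_worker` and the `log` list are hypothetical and are not the actual code in `mani_skill2/vector/vec_env.py`.

```python
def run_worker(env_fn, log):
    """Build an env and always clean up without masking the original error."""
    env = None  # bound before the try block so the finally clause can test it
    try:
        env = env_fn()  # may raise, as env_fn() does in the traceback above
        log.append("env created")
    except Exception as exc:
        # Record the real failure, then re-raise it unchanged.
        log.append(f"env creation failed: {type(exc).__name__}")
        raise
    finally:
        # Only close an env that was actually constructed.
        if env is not None:
            env.close()
            log.append("env closed")
```

With this structure the first exception is the only one reported, which makes the root cause (the missing mesh) easier to spot in the output.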



megatran commented on August 23, 2024

After further debugging, I wonder if the OpenCabinet env is the culprit.

When I fall back to the example "LiftCube-v0", Colab, the GPU server, and my local machine all seem to run fine!

num_envs = 2  # you can increase this and decrease the n_steps parameter if you have more cores to speed up training
env_id = "LiftCube-v0"
obs_mode = "state"
control_mode = "pd_ee_delta_pose"
reward_mode = "dense"
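To narrow down which environments fail, the same settings can be tried against several env IDs in a single process, recording the error for each instead of stopping at the first one. This is a minimal sketch under assumptions: `probe_envs` is a hypothetical helper, and `make_env` stands for whatever single-env constructor is in use, so nothing here depends on ManiSkill2 itself.

```python
def probe_envs(make_env, env_ids, **kwargs):
    """Construct and reset each env once.

    Returns a dict mapping env_id to None on success or to the error text.
    """
    results = {}
    for env_id in env_ids:
        env = None
        try:
            env = make_env(env_id, **kwargs)
            env.reset()
            results[env_id] = None
        except Exception as exc:  # record the failure and keep going
            results[env_id] = f"{type(exc).__name__}: {exc}"
        finally:
            if env is not None:
                env.close()
    return results
```

With ManiSkill2 installed, `make_env` could presumably be `gym.make` (after `import mani_skill2.envs`), called with the `obs_mode`, `control_mode`, and `reward_mode` values listed above; that pairing is an assumption and has not been checked against the 0.4.2 release.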


StoneT2000 commented on August 23, 2024

Could you try using a sparse reward setting for the failing envs?


StoneT2000 commented on August 23, 2024

Closing in favor of https://github.com/haosulab/ManiSkill2/issues/88


StoneT2000 commented on August 23, 2024

Current solution: https://github.com/haosulab/ManiSkill2/issues/88#issuecomment-1532194498

