
DiffRL's Introduction

SHAC

This repository contains the implementation for the paper Accelerated Policy Learning with Parallel Differentiable Simulation (ICLR 2022).

In this paper, we present a GPU-based differentiable simulator and propose a policy learning method, SHAC, that leverages it. We also provide a comprehensive benchmark set for policy learning with differentiable simulation; it currently contains the six robotic control problems shown in the figure below.

[Figure: the six benchmark environments]

Installation

  • git clone https://github.com/NVlabs/DiffRL.git --recursive

  • The code has been tested on

    • Operating System: Ubuntu 16.04, 18.04, 20.04, 21.10, 22.04
    • Python Version: 3.7, 3.8
    • GPU: TITAN X, GTX 1080, RTX 2080, RTX 3080, RTX 3090, RTX 3090 Ti

Prerequisites

  • In the project folder, create a virtual environment in Anaconda:

    conda env create -f diffrl_conda.yml
    conda activate shac
    
  • Install dflex:

    cd dflex
    pip install -e .
    
  • Install rl_games (forked from rl-games; used for PPO and SAC training):

    cd externals/rl_games
    pip install -e .
    
  • Install an older version of protobuf required for TensorboardX:

    pip install protobuf==3.20.0
    

Test Examples

A test example can be found in the examples folder.

python test_env.py --env AntEnv

If the last line of the console output is Finish Successfully, the installation has succeeded.

Training

Running the following command in the examples folder trains Ant with SHAC.

python train_shac.py --cfg ./cfg/shac/ant.yaml --logdir ./logs/Ant/shac

We also provide a one-line script, examples/train_script.sh, to replicate the results reported in the paper for both our method and the baseline methods. The results might differ slightly from the paper due to CUDA nondeterminism and differences in Operating System/GPU/Python versions. The plots reported in the paper were produced with a TITAN X on Ubuntu 16.04.

SHAC (Our Method)

For example, running the following commands in the examples folder trains the Ant and SNU Humanoid (Humanoid MTU in the paper) environments with SHAC, each with 5 individual seeds.

python train_script.py --env Ant --algo shac --num-seeds 5
python train_script.py --env SNUHumanoid --algo shac --num-seeds 5

Baseline Algorithms

For example, running the following command in the examples folder trains the Ant environment with the PPO implementation from rl_games for 5 individual seeds:

python train_script.py --env Ant --algo ppo --num-seeds 5

Testing

To test a trained policy, pass the policy checkpoint to the training script and add the --play flag. For example, the following command tests a trained policy (assuming the policy is located at logs/Ant/shac/policy.pt):

python train_shac.py --cfg ./cfg/shac/ant.yaml --checkpoint ./logs/Ant/shac/policy.pt --play [--render]

The optional --render flag exports a video of the task execution. When set, the rollout is exported in .usd format and stored in the examples/output folder. To visualize the exported .usd file, refer to USD at NVIDIA.

Citation

If you find our paper or code useful, please consider citing:

  @inproceedings{xu2021accelerated,
    title={Accelerated Policy Learning with Parallel Differentiable Simulation},
    author={Xu, Jie and Makoviychuk, Viktor and Narang, Yashraj and Ramos, Fabio and Matusik, Wojciech and Garg, Animesh and Macklin, Miles},
    booktitle={International Conference on Learning Representations},
    year={2021}
  }

DiffRL's People

Contributors

eanswer, viktorm


DiffRL's Issues

Why is only the lower half of snu_humanoid used?

Are there issues with convergence? Has someone tried to train the full snu_humanoid?
At first glance it seems that the unnatural running style could be caused by the different mass distribution and other consequences of the absence of the upper torso, head, and hands.
It's not actually an issue, I'm just very curious :-)

Anyway, thanks for the really great job, it's actually amazing!

Obtain gradient information explicitly

May I ask how to explicitly obtain gradient information in the environment, such as the gradient of s_{t+1} with respect to s_t? Or is the information obtained using torch.autograd accurate? Thank you!
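
For illustration, a minimal, generic sketch (not the repo's API) of the two usual ways torch.autograd exposes such derivatives, with a toy step_fn standing in for the differentiable env.step:

import torch

def step_fn(s_t):
    # toy differentiable dynamics; in DiffRL this would be one differentiable env.step()
    return 0.9 * s_t + 0.1 * torch.tanh(s_t)

s_t = torch.randn(4, requires_grad=True)

# Full Jacobian d s_{t+1} / d s_t (shape [4, 4]):
jac = torch.autograd.functional.jacobian(step_fn, s_t)

# Or the gradient of a scalar function of s_{t+1} w.r.t. s_t via ordinary backprop:
loss = step_fn(s_t).sum()
loss.backward()
print(jac.shape, s_t.grad)

As long as the simulator's backward pass is correct, both routes return analytic gradients; checking them against finite differences (e.g. with torch.autograd.gradcheck) is the usual way to probe their accuracy.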

What is MM_caching_frequency?

Hello, thank you for open-sourcing all this very interesting work. I don't understand what "MM_caching_frequency" is and why it changes throughout the configuration files.

Error when python test_env.py --env AntEnv

Excuse me, I ran into the following problem when trying the command python test_env.py --env AntEnv in the examples folder, as described in the guide.
My PyTorch version is 1.11.0 and my CUDA version is 12.1.
Is there anything wrong with my system? I would greatly appreciate any help with this problem.

Rebuilding kernels
Detected CUDA files, patching ldflags
Emitting ninja build file /home/frank/DiffRL/dflex/dflex/kernels/build.ninja...
Building extension module kernels...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda-12.1/bin/nvcc  -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/frank/DiffRL/dflex/dflex -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/TH -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /home/frank/anaconda3/envs/shac/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -gencode=arch=compute_35,code=compute_35 -std=c++14 -c /home/frank/DiffRL/dflex/dflex/kernels/cuda.cu -o cuda.cuda.o
FAILED: cuda.cuda.o
/usr/local/cuda-12.1/bin/nvcc  -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/frank/DiffRL/dflex/dflex -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/TH -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /home/frank/anaconda3/envs/shac/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -gencode=arch=compute_35,code=compute_35 -std=c++14 -c /home/frank/DiffRL/dflex/dflex/kernels/cuda.cu -o cuda.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
[2/3] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/frank/DiffRL/dflex/dflex -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/TH -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /home/frank/anaconda3/envs/shac/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -Z -O2 -DNDEBUG -c /home/frank/DiffRL/dflex/dflex/kernels/main.cpp -o main.o
/home/frank/DiffRL/dflex/dflex/kernels/main.cpp: In function ‘df::float3 box_sdf_grad_cpu_func(df::float3, df::float3)’:
/home/frank/DiffRL/dflex/dflex/kernels/main.cpp:1051:47: warning: control reaches end of non-void function [-Wreturn-type]
 1051 |     var_58 = df::select(var_56, var_53, var_57);
          |
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_env.py", line 17, in <module>                                                                                                                                                       
    import envs
  File "/home/frank/DiffRL/envs/__init__.py", line 8, in <module>                                                                                                                                        
    from envs.dflex_env import DFlexEnv                                                                                                                                                        
  File "/home/frank/DiffRL/envs/dflex_env.py", line 15, in <module>                                                                                                                              
    import dflex as df                                                                                                                                                                         
  File "/home/frank/DiffRL/dflex/dflex/__init__.py", line 15, in <module>                                                                                                                            
    kernel_init()                                                                                                                                                                              
  File "/home/frank/DiffRL/dflex/dflex/sim.py", line 67, in kernel_init                                                                                                                          
    kernels = df.compile()                                                                                                                                                                     
  File "/home/frank/DiffRL/dflex/dflex/adjoint.py", line 1865, in compile                                                                                                                        
    module = torch.utils.cpp_extension.load_inline('kernels',                                                                                                                                  
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1293, in load_inline                                                                     
    return _jit_compile(                                                                                                                                                                       
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile                                                                    
    _write_ninja_file_and_build_library(                                                                                                                                                       
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
    _run_ninja_build(                                                                                                                                                                          
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build                                                                
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'kernels'

Backward is not reentrant Error

I tried to verify the gradients of env.step using torch.autograd.gradcheck. My current environment is ant. I get the following error:

raise GradcheckError('Backward is not reentrant, i.e., running backward with '
torch.autograd.gradcheck.GradcheckError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient.The tolerance for nondeterminism was 0.001.

Here is my code for the gradient test:

def test_grad(actions):
    # step the differentiable env and reduce the resulting state to a scalar loss
    env.step(actions)
    state = env.state
    joint_q = state.joint_q
    joint_qd = state.joint_qd
    loss = torch.norm(joint_q) + torch.norm(joint_qd)
    return loss

inputs = (actions,)  # gradcheck expects a tensor or a tuple of tensors
test = torch.autograd.gradcheck(test_grad, inputs, nondet_tol=1e-3)

Is this behavior expected? Thanks so much!

What should I do to set the z-axis up?

The released code is y-axis up, and I want to set the z-axis up in simulation. I set the contact normal vector to (0, 0, 1) rather than (0, 1, 0), and gravity to (0, 0, -9.8). I also replaced the expression vy*ny with vz*nz, vy with vz, ny with nz, etc., mainly in sim.py and model.py. However, the resulting dynamics are not the same as in the original setting. What more should I do?

Error when --render the humanoid

Basic info: my system is Ubuntu 20.8, GPU 3080, NVCC 11.6, gcc/g++ 7.5.0. Other settings are the same as the env.

After training the humanoid with train_shac.py, I want to render it via .usd.

My command is
python train_shac.py --cfg ./cfg/shac/humanoid.yaml --checkpoint ./logs/SNUHumanoid/shac/40/best_policy.pt --play --render
However, it fails with the following error:

Using cached kernels
Setting seed: 0
~/anaconda3/envs/shac/lib/python3.8/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
28 27
Start joint_q: [0.0, 1.35, 0.0, -0.7071067811865475, -0.0, -0.0, 0.7071067811865476, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
~DiffRL/dflex/dflex/model.py:1687: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1646755903507/work/torch/csrc/utils/tensor_new.cpp:210.)
m.shape_transform = torch.tensor(transform_flatten_list(self.shape_transform), dtype=torch.float32, device=adapter)
num_act = 21
num_envs = 1
num_actions = 21
num_obs = 76
Sequential(
(0): Linear(in_features=76, out_features=256, bias=True)
(1): ELU(alpha=1.0)
(2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(3): Linear(in_features=256, out_features=128, bias=True)
(4): ELU(alpha=1.0)
(5): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
(6): Linear(in_features=128, out_features=21, bias=True)
(7): Identity()
)
Parameter containing:
tensor([-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1.], device='cuda:0',
requires_grad=True)
Sequential(
(0): Linear(in_features=76, out_features=128, bias=True)
(1): ELU(alpha=1.0)
(2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
(3): Linear(in_features=128, out_features=128, bias=True)
(4): ELU(alpha=1.0)
(5): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
(6): Linear(in_features=128, out_features=1, bias=True)
)
Traceback (most recent call last):
File "train_shac.py", line 114, in
traj_optimizer.play(cfg_train)
~/DiffRL/algorithms/shac.py", line 561, in play
self.run(cfg['params']['config']['player']['games_num'])
~/anaconda3/envs/shac/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
~/DiffRL/algorithms/shac.py", line 377, in run
mean_policy_loss, mean_policy_discounted_loss, mean_episode_length = self.evaluate_policy(num_games = num_games, deterministic = not self.stochastic_evaluation)
~/anaconda3/envs/shac/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
~/DiffRL/algorithms/shac.py", line 317, in evaluate_policy
obs = self.obs_rms.normalize(obs)
~/DiffRL/utils/running_mean_std.py", line 56, in normalize
result = (arr - self.mean) / torch.sqrt(self.var + 1e-5)
RuntimeError: The size of tensor a (76) must match the size of tensor b (53) at non-singleton dimension 1

Do you have any idea how to fix that?

self.name error in SNU_humanoid.py

After successfully training snu_humanoid, testing with the rendering option gives an error: self.name does not exist in snu_humanoid.py, line 78.

My solution is to simply remove self.name, and everything works fine:

#self.stage = Usd.Stage.CreateNew("outputs/" + self.name + "HumanoidSNU_Low_" + str(self.num_envs) + ".usd")
self.stage = Usd.Stage.CreateNew("outputs/" + "dd" + "HumanoidSNU_Low_" + str(self.num_envs) + ".usd")
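
An alternative sketch, offered purely as an assumption (not a verified fix): give the environment a name attribute before the USD stage is created, so the original line can stay unchanged.

# Hypothetical alternative: set a name on the env before the render stage is created.
# The attribute value and the right place to set it are assumptions; adapt to snu_humanoid.py.
self.name = "HumanoidSNU"
self.stage = Usd.Stage.CreateNew("outputs/" + self.name + "HumanoidSNU_Low_" + str(self.num_envs) + ".usd")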

Brax implementation

Thought I'd mention: there is a Brax implementation of SHAC here.

I suppose it's hard to compare directly since the envs are not identical, but if one of the original authors of SHAC could review it for apparent agreement with your algorithm as intended, that'd be super useful.

SHAC and policies for partial observability

I was wondering whether you have made any attempt at combining SHAC with an LSTM or transformer policy, or any policy that can reason about a history of states rather than just the current one, as is desirable, for instance, when dealing with partial observability of the state.

While conceptually it does not sound too complicated, I know that getting the implementation details right can be tricky for something like PPO; I was curious whether you have attempted any such thing, and if so, whether there were any issues you ran into.

[BUG] Initialization Velocities Scale with Distance from the Origin

@ViktorM

While writing a MuJoCo-viewer renderer for DiffRL, I noticed that the initialization velocities appear to scale with how far an actor is from the origin.

This behavior appears to occur in the original .usd's as well, so I don't believe it's an artifact of the visualization.

ExplodingVelocities.mp4

The early termination threshold also seems more sensitive far away from the origin.

I tested setting the stochastic initialization velocity to a constant, and the velocities were still amplified far from the origin, so I believe this is something deeper within the fundamentals of dflex...

Example Request - Cartpole/Ant using Warp instead of dFlex

Hi, first of all, thanks a lot for your great piece of software! We are really excited to apply it in our research on surgical simulation. However, we ran into problems while trying to switch from dFlex to its successor, Warp.

Would it be possible to provide a minimal reference Cartpole and/or Ant SHAC example using Warp? That would be very helpful not only for our group but also for other users.

Thank you
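
Not an official Warp port, but for reference, a minimal sketch of the short-horizon differentiable rollout pattern SHAC relies on, with a toy PyTorch dynamics standing in for the dFlex/Warp step. All names below are illustrative assumptions; a Warp port would replace dynamics with a differentiable Warp simulation step while keeping the policy, horizon, and backprop structure.

import torch

state_dim, act_dim, horizon, gamma, num_envs = 8, 2, 16, 0.99, 64

policy = torch.nn.Sequential(
    torch.nn.Linear(state_dim, 64), torch.nn.ELU(), torch.nn.Linear(64, act_dim)
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

B = torch.randn(act_dim, state_dim)     # fixed toy action-to-state coupling

def dynamics(s, a):
    # placeholder differentiable dynamics; in DiffRL this is one dflex (or Warp) sim step
    return s + 0.1 * torch.tanh(a @ B)

def reward(s, a):
    return -s.pow(2).sum(-1) - 0.01 * a.pow(2).sum(-1)

s = torch.zeros(num_envs, state_dim)    # parallel environments
total = torch.zeros(num_envs)
for t in range(horizon):                # short-horizon rollout, fully differentiable
    a = policy(s)
    s = dynamics(s, a)
    total = total + (gamma ** t) * reward(s, a)

loss = -total.mean()                    # maximize discounted reward over the horizon
opt.zero_grad()
loss.backward()                         # gradients flow back through every sim step
opt.step()

The actual algorithm also bootstraps a learned critic at the end of the horizon and reuses the final state to start the next rollout; this sketch omits both for brevity.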

Add box obstacles

I want to try and add box obstacles to the environment, e.g. to create stairs that Ant or Cheetah has to scale. Is there a way to do this, perhaps by editing the ground plane?

I see in sim.py that there are some functions that handle contact forces. I'm not sure which ones, if any, would be relevant to modify to achieve the above.
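
For illustration, a rough sketch of one possible approach, assuming dflex's ModelBuilder exposes an add_shape_box helper analogous to its other add_shape_* methods (the method name, parameter names, and the use of body=-1 for a static, world-attached shape are assumptions, not verified against the repo):

import dflex as df

builder = df.sim.ModelBuilder()

# ... build the articulation (e.g. the Ant) exactly as the existing env code does ...

# Hypothetical: add static boxes attached to the world (body = -1) to form stairs.
# All parameter names and contact constants below are assumptions; check
# ModelBuilder in dflex/model.py for the real signature.
for i in range(4):
    builder.add_shape_box(
        body=-1,
        pos=(1.0 + 0.3 * i, 0.1 * (i + 1), 0.0),  # each step farther out and higher
        hx=0.15, hy=0.1 * (i + 1), hz=1.0,         # half-extents of the step
        ke=1.0e4, kd=100.0, kf=100.0, mu=0.75,     # contact stiffness/damping/friction
    )

model = builder.finalize("cuda")

Whether the simulator then generates contacts between the articulation and these boxes depends on the collision handling in sim.py, so the contact-force functions mentioned above may still need to be extended.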
