facebookresearch / rlmeta

RLMeta is a light-weight flexible framework for Distributed Reinforcement Learning Research.

License: MIT License

CMake 0.67% Python 60.96% C++ 38.37%

rlmeta's Introduction

RLMeta

rlmeta - a flexible lightweight research framework for Distributed Reinforcement Learning based on PyTorch and moolib

Installation

To build from source, please install PyTorch first, and then run the commands below.

$ git clone https://github.com/facebookresearch/rlmeta
$ cd rlmeta
$ git submodule sync && git submodule update --init --recursive
$ pip install -e .

Run an Example

To run the Atari Pong example with the PPO algorithm:

$ cd examples/atari/ppo
$ python atari_ppo.py env.game="Pong" num_epochs=20

We use hydra to define configs for training jobs. The configs are defined in

./conf/conf_ppo.yaml

The logs and checkpoints will be automatically saved to

./outputs/{YYYY-mm-dd}/{HH:MM:SS}/
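Because the output directory encodes the launch date and time, the most recent run can be located programmatically. The helper below is only a sketch: `latest_log` is a hypothetical name, not part of rlmeta, and it assumes the two-level `<date>/<time>/` layout shown above with zero-padded directory names.

```python
from pathlib import Path
from typing import Optional


def latest_log(outputs_root: str, log_name: str) -> Optional[Path]:
    """Return the newest `<root>/<date>/<time>/<log_name>` file, or None.

    Hypothetical helper, not part of rlmeta.
    """
    # Date and time directory names are zero-padded, so sorting the paths
    # lexicographically also sorts the runs chronologically.
    candidates = sorted(Path(outputs_root).glob(f"*/*/{log_name}"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    # Prints None when no run has been written under ./outputs yet.
    print(latest_log("./outputs", "atari_ppo.log"))
```

The returned path can be passed to a plotting script as its log file argument.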

After training, we can draw the training curve by running

$ python ../../plot.py --log_file=./outputs/{YYYY-mm-dd}/{HH:MM:SS}/atari_ppo.log --fig_file=./atari_ppo.png --xkey=time

One example of the training curve is shown below.

License

rlmeta is licensed under the MIT License. See LICENSE for details.

rlmeta's Issues

Do we have a docker file for rlmeta?

Hi team,

It is very difficult to install rlmeta because dependencies like moolib are not easy to install. Do we have a Dockerfile for trying out rlmeta?

Switch to new OpenAI Gym step API

The new version of OpenAI Gym uses a new step API, which returns (observations, reward, termination, truncation, info) instead of (observations, reward, done, info). We have to update the wrappers to support this.

Progress is tracked in this issue.
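As an illustration of the difference between the two step signatures, the sketch below converts a new-style 5-tuple into the old 4-tuple. It is not rlmeta's wrapper code: `to_old_step` is a hypothetical name, and storing the truncation flag under the `"TimeLimit.truncated"` info key is an assumption borrowed from the convention of Gym's TimeLimit wrapper.

```python
from typing import Any, Dict, Tuple


def to_old_step(result: Tuple[Any, ...]) -> Tuple[Any, Any, bool, Dict[str, Any]]:
    """Convert a Gym step result to the old (obs, reward, done, info) form.

    Hypothetical helper for illustration only.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # Copy info so the caller's dict is not mutated, and keep the
        # truncation flag so a time limit can still be told apart from a
        # true terminal state (key name is an assumption).
        info = dict(info)
        info["TimeLimit.truncated"] = bool(truncated) and not bool(terminated)
        return obs, reward, bool(terminated or truncated), info
    if len(result) == 4:
        # Already in the old format: pass through unchanged.
        return result
    raise ValueError(f"Unexpected step result of length {len(result)}")
```

Collapsing `terminated` and `truncated` into a single `done` loses information that value bootstrapping needs, which is why the sketch preserves the truncation flag in `info`.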

How to sample partial trajectories?

Many value estimation methods rely on sub-sequences of a trajectory (e.g. Retrace, GAE, n-step returns, lambda-returns). How can this be achieved with the current samplers? A simple workaround would be to use a clever idx for each sample and use __get__ to extract the sub-sequence one element at a time, but I believe it might hurt performance.

Other ideas? Otherwise, how can this be implemented in the C++ code?
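To make the request concrete, here is a minimal pure-Python sketch of what sampling a sub-sequence and using it for an n-step return could look like. It does not use or describe rlmeta's samplers: `sample_window` and `n_step_return` are hypothetical names, and the sketch assumes episode lengths are known on the Python side.

```python
import random
from typing import Sequence, Tuple


def sample_window(episode_lengths: Sequence[int], n: int,
                  rng: random.Random) -> Tuple[int, int]:
    """Pick (episode index, start offset) of a length-n window uniformly."""
    # An episode of length L contains L - n + 1 windows of length n.
    counts = [max(length - n + 1, 0) for length in episode_lengths]
    total = sum(counts)
    if total == 0:
        raise ValueError("no episode is long enough for the requested window")
    k = rng.randrange(total)
    for episode, count in enumerate(counts):
        if k < count:
            return episode, k
        k -= count
    raise AssertionError("unreachable")


def n_step_return(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float) -> float:
    """Compute r_0 + gamma * r_1 + ... + gamma^n * V(s_n) over one window."""
    ret = bootstrap_value
    # Accumulate backwards so each step applies one discount factor.
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret
```

Sampling a single (episode, offset) pair and slicing the window once avoids fetching the sub-sequence one element at a time.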

Add namespace or identifier for Remotable instance

Currently, when we register a remote method in a Remotable instance, we cannot distinguish between remote method calls if multiple Remotable instances contain the same method name. We have to add a namespace or identifier field to the Remotable class.

As a workaround, we can start separate servers to handle this, which is not very convenient.
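One way to picture the proposal is a registry keyed by both an identifier and the method name. The sketch below is purely illustrative and does not reflect rlmeta's Remotable or Server implementation: `MethodRegistry`, its methods, and the `"<identifier>::<method>"` key format are all assumptions.

```python
from typing import Any, Callable, Dict


class MethodRegistry:
    """Hypothetical registry keyed by "<identifier>::<method>"."""

    def __init__(self) -> None:
        self._methods: Dict[str, Callable[..., Any]] = {}

    def register(self, identifier: str, name: str,
                 fn: Callable[..., Any]) -> None:
        key = f"{identifier}::{name}"
        # Refuse silent overwrites so name clashes surface immediately.
        if key in self._methods:
            raise KeyError(f"{key} is already registered")
        self._methods[key] = fn

    def call(self, identifier: str, name: str, *args: Any,
             **kwargs: Any) -> Any:
        # The identifier selects which instance's method is invoked.
        return self._methods[f"{identifier}::{name}"](*args, **kwargs)


if __name__ == "__main__":
    registry = MethodRegistry()
    registry.register("train_model", "act", lambda obs: ("train", obs))
    registry.register("eval_model", "act", lambda obs: ("eval", obs))
    print(registry.call("eval_model", "act", 1))
```

With such a key, two instances exposing the same method name could share one server without their calls colliding.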

Clang Link error

When building with Clang instead of GCC, linking fails for both moolib and rlmeta during pip install.

Longer-term and relation to other RL libraries under Meta

Hi, excited to see this work on distributed RL, building off moolib (and TorchBeast originally). I'm wondering what the longer-term direction of this project is?

Will functionality be merged into TorchRL (which mentions an upcoming IMPALA implementation)? https://github.com/facebookresearch/rl#upcoming-features

Is moolib still being maintained? facebookresearch/moolib#32 (comment)

There are so many RL libraries these days.

Pip installation fails in virtual env and SIGILL on DGX machines

It seems that pip install -e . does prepare the proper directories but does not include the built package. We solved this by adding:

+        include_package_data=False,
+        packages=find_packages(include=['rlmeta', 'rlmeta.*']),

here:

rlmeta/setup.py

Line 87 in c43d0f1

ext_modules=[CMakeExtension("rlmeta", "./rlmeta")],

Nit: It might be useful to provide an easy way to pass a CUDA/cuDNN path to CMake, maybe something like -DCUDNN_LIBRARY_PATH=os.environ.get("CUDA_LIBRARY_PATH", "")
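A minimal sketch of that suggestion follows. The function name `cmake_args_from_env` is hypothetical, the variable names are taken from the suggestion above, and whether rlmeta's CMake configuration actually honors `CUDNN_LIBRARY_PATH` is an assumption.

```python
import os
from typing import List, Mapping, Optional


def cmake_args_from_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build extra CMake arguments from environment variables.

    Hypothetical helper; not part of rlmeta's setup.py.
    """
    env = os.environ if env is None else env
    args: List[str] = []
    cudnn_path = env.get("CUDA_LIBRARY_PATH", "")
    # Only pass the definition when the user actually set the variable,
    # so the default CMake search is left untouched otherwise.
    if cudnn_path:
        args.append(f"-DCUDNN_LIBRARY_PATH={cudnn_path}")
    return args
```

The returned list could be appended to whatever argument list the build step already passes to cmake.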

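Finally, the flag -march=native might cause issues, especially on HPC clusters: a binary built for the build machine's CPU can hit illegal instructions (SIGILL) on nodes with a different CPU. We removed it for our cluster and managed to train reliably on different machines.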

Add ProcessManager to maintain processes.

Currently, processes are created directly in Server and Loop. Zombie processes are often left behind when the main process terminates. It may be better to have a ProcessManager to manage the processes on a single node.

Opening a tracking issue here for this feature request.
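The idea could be sketched as a context manager that owns its child processes and always reaps them. This is not rlmeta's design: the class below is a hypothetical illustration, and it uses `subprocess.Popen` only to stay self-contained, whereas a real implementation would more likely wrap the multiprocessing processes created by Server and Loop.

```python
import subprocess
import sys
from typing import List


class ProcessManager:
    """Hypothetical manager that terminates and reaps its children."""

    def __init__(self, grace_period: float = 5.0) -> None:
        self._procs: List[subprocess.Popen] = []
        self._grace_period = grace_period

    def start(self, args: List[str]) -> subprocess.Popen:
        proc = subprocess.Popen(args)
        self._procs.append(proc)
        return proc

    def shutdown(self) -> None:
        # Ask every live child to exit first, then wait for each one.
        for proc in self._procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in self._procs:
            try:
                proc.wait(timeout=self._grace_period)
            except subprocess.TimeoutExpired:
                # Escalate if the child ignores the termination request.
                proc.kill()
                proc.wait()
        self._procs.clear()

    def __enter__(self) -> "ProcessManager":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.shutdown()


if __name__ == "__main__":
    with ProcessManager() as manager:
        manager.start([sys.executable, "-c", "import time; time.sleep(60)"])
    # Leaving the with-block terminates and reaps the sleeping child.
```

Because `wait()` is called on every child, none of them is left as a zombie after a normal exit or an exception inside the `with` block. A parent killed with SIGKILL would still bypass this cleanup.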

Moolib Backend Issues

Recently there have been several issues with the moolib backend.

Based on the observation in facebookresearch/moolib#36, there is a performance regression in moolib.
There are several installation issues in moolib.

Because of this, we are thinking about building another backend that does not use moolib. This issue is opened to track the progress.

PR for gRPC backend: #63

m_server::push time out and m_server::act time out

I was trying to execute the example program atari_ppo.py on the following machine:
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
32GB RAM
GTX 1080 with 8G RAM
Ubuntu 16.04
cuda 10.2
I have edited my configuration file conf_ppo.yaml to reduce resource usage:

m_server_name: "m_server"
m_server_addr: "127.0.0.1:4411"

r_server_name: "r_server"
r_server_addr: "127.0.0.1:4412"

c_server_name: "c_server"
c_server_addr: "127.0.0.1:4413"

train_device: "cuda:0"
infer_device: "cuda:0"

timeout: 180

env: "PongNoFrameskip-v4"
max_episode_steps: 2700

num_train_rollouts: 1 
num_train_workers: 1

num_eval_rollouts: 1
num_eval_workers: 1

replay_buffer_size: 1024 
prefetch: 2

batch_size: 32
lr: 3e-4
push_every_n_steps: 50

num_epochs: 1000
steps_per_epoch: 3000

num_eval_episodes: 20

train_seed: 123
eval_seed: 456

Here is what I got:

[2022-01-18 18:34:54,797][root][INFO] - {'m_server_name': 'm_server', 'm_server_addr': '127.0.0.1:4411', 'r_server_name': 'r_server', 'r_server_addr': '127.0.0.1:4412', 'c_server_name': 'c_server', 'c_server_addr': '127.0.0.1:4413', 'train_device': 'cuda:0', 'infer_device': 'cuda:0', 'env': 'PongNoFrameskip-v4', 'max_episode_steps': 2700, 'num_train_rollouts': 1, 'num_train_workers': 1, 'num_eval_rollouts': 1, 'num_eval_workers': 1, 'replay_buffer_size': 1024, 'prefetch': 2, 'batch_size': 8, 'lr': 0.0003, 'push_every_n_steps': 100, 'num_epochs': 20, 'steps_per_epoch': 300, 'num_eval_episodes': 20, 'train_seed': 123, 'eval_seed': 456}
[2022-01-18 18:35:08,193][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:09,194][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:10,196][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:11,198][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:12,220][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:13,222][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:14,228][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:15,229][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:16,231][root][INFO] - Warming up replay buffer: [ 1024 / 1024 ]
Exception in callback handle_task_exception(<Task finishe...) timed out')>) at /media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py:11
handle: <Handle handle_task_exception(<Task finishe...) timed out')>) at /media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py:11>
Traceback (most recent call last):
  File "/home/ml2558/miniconda3/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py", line 17, in handle_task_exception
    raise e
  File "/media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py", line 13, in handle_task_exception
    task.result()
  File "/media/research/ml2558/rlmeta/rlmeta/core/loop.py", line 161, in _run_loop
    stats = await self._run_episode(env, agent, index)
  File "/media/research/ml2558/rlmeta/rlmeta/core/loop.py", line 182, in _run_episode
    action = await agent.async_act(timestep)
  File "/media/research/ml2558/rlmeta/rlmeta/agents/ppo/ppo_agent.py", line 78, in async_act
    action, logpi, v = await self.model.async_act(
RuntimeError: Call (m_server::act) timed out
Error executing job with overrides: ['env=PongNoFrameskip-v4', 'num_epochs=20']
Traceback (most recent call last):
  File "/media/research/ml2558/rlmeta/examples/atari/ppo/atari_ppo.py", line 96, in main
    stats = agent.train(cfg.steps_per_epoch)
  File "/media/research/ml2558/rlmeta/rlmeta/agents/ppo/ppo_agent.py", line 139, in train
    self.model.push()
  File "/media/research/ml2558/rlmeta/rlmeta/core/model.py", line 69, in push
    self.client.sync(self.server_name, "push", state_dict)
RuntimeError: Call (m_server::<unknown>) timed out

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I tried changing the timeout but still get the same error. Any hints on how to resolve this?

TensorCircularBuffer with a capacity of 1 million fails

A replay buffer with a capacity of 1 million tries to allocate 846.72 GB. Steps to reproduce:

from rlmeta.storage import TensorCircularBuffer
import torch

rb = TensorCircularBuffer(capacity=int(1e6))
rb.append(torch.randn(10, 3, 84, 84))

Log:

RuntimeError: [enforce fail at alloc_cpu.cpp:66] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 846720000000 bytes. Error code 12 (Cannot allocate memory)
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x55 (0x7fd5b71980c5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::alloc_cpu(unsigned long) + 0x7ac (0x7fd5b71894cc in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x23bc3 (0x7fd5b7176bc3 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #3: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x7bf (0x7fd5e04a5b2f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x40 (0x7fd5e04a64a0 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x34 (0x7fd5e04a64f4 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::native::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1f (0x7fd5e09b826f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x24f700b (0x7fd5e122a00b in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0xe3 (0x7fd5e0f75653 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x24d200f (0x7fd5e120500f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::empty_memory_format::call(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1b7 (0x7fd5e0fb3077 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x4586c (0x7fd5b5ba886c in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x49700 (0x7fd5b5bac700 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x4a0c0 (0x7fd5b5bad0c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #14: <unknown function> + 0x1dd0f (0x7fd5b5b80d0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #30: <unknown function> + 0x3feb0 (0x7fd65cacbeb0 in /lib64/libc.so.6)
frame #31: __libc_start_main + 0x80 (0x7fd65cacbf60 in /lib64/libc.so.6)
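The requested size in the log can be reproduced with simple arithmetic. The sketch below assumes the buffer reserves space for `capacity` copies of the full appended tensor and that elements are float32 (the default dtype of `torch.randn`). It is a reading of the numbers, not a statement about the buffer's internals.

```python
from math import prod

capacity = int(1e6)            # capacity passed to TensorCircularBuffer
element_shape = (10, 3, 84, 84)  # shape of the appended tensor
bytes_per_float32 = 4          # assumed dtype: float32

# Bytes needed to store one appended tensor.
bytes_per_element = prod(element_shape) * bytes_per_float32
# Bytes needed if every slot holds one such tensor.
total_bytes = capacity * bytes_per_element

print(total_bytes)  # 846720000000, the figure in the error message
```

Under these assumptions each slot holds all 10 frames, so the allocation is ten times what a buffer of 1 million single (3, 84, 84) frames would need.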

[Documentation] Tracking documentation site progress

The documentation site is under construction. We will track the progress here.

Replay buffer crashes after being cleared

Minimal example:

import torch
from _rlmeta_extension import UniformSampler
from rlmeta.core.replay_buffer import ReplayBuffer
from rlmeta.storage import TensorCircularBuffer

replay_buffer = ReplayBuffer(TensorCircularBuffer(12), UniformSampler())

while True:
    for t in torch.randn(size=(12,2)).chunk(12,dim=0):
        replay_buffer.append(t)
        replay_buffer.sample(12)
    replay_buffer.clear()

Stack trace:

RuntimeError: output with shape [2] doesn't match the broadcast shape [1, 2]
Exception raised from mark_resize_outputs at ../aten/src/ATen/TensorIterator.cpp:1181 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fd72c9a220e in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7fd72c97d5e8 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: at::TensorIteratorBase::mark_resize_outputs(at::TensorIteratorConfig const&) + 0x241 (0x7fd755cf6301 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x64 (0x7fd755cf6e54 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x19d4f8c (0x7fd755f11f8c in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::native::copy_(at::Tensor&, at::Tensor const&, bool) + 0x62 (0x7fd755f12ec2 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x46e94f5 (0x7fd758c264f5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x46ea6ad (0x7fd758c276ad in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) + 0x16e (0x7fd7568cdbce in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x495df (0x7fd7024265df in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x4a0c0 (0x7fd7024270c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x1dd0f (0x7fd7023fad0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #31: <unknown function> + 0x3feb0 (0x7fd7aa936eb0 in /lib64/libc.so.6)
frame #32: __libc_start_main + 0x80 (0x7fd7aa936f60 in /lib64/libc.so.6)
frame #33: _start + 0x25 (0x5649803a1095 in /home/d3sm0/.venvs/torch_env/bin/python)