rlmeta's Introduction

RLMeta

rlmeta - a flexible lightweight research framework for Distributed Reinforcement Learning based on PyTorch and moolib

Installation

To build from source, please install PyTorch first, and then run the commands below.

$ git clone https://github.com/facebookresearch/rlmeta
$ cd rlmeta
$ git submodule sync && git submodule update --init --recursive
$ pip install -e .

Run an Example

To run the example for the Atari Pong game with the PPO algorithm:

$ cd examples/atari/ppo
$ python atari_ppo.py env.game="Pong" num_epochs=20

We use hydra to define configs for training jobs. The configs are defined in

./conf/conf_ppo.yaml

The logs and checkpoints will be automatically saved to

./outputs/{YYYY-mm-dd}/{HH:MM:SS}/

After training, we can draw the training curve by running

$ python ../../plot.py --log_file=./outputs/{YYYY-mm-dd}/{HH:MM:SS}/atari_ppo.log --fig_file=./atari_ppo.png --xkey=time

One example of the training curve is shown below.

[Figure: atari_ppo training curve]

License

rlmeta is licensed under the MIT License. See LICENSE for details.

rlmeta's People

Contributors

bcui19, entilzha, xiaomengy

rlmeta's Issues

Do we have a docker file for rlmeta?

Hi team,

Installing rlmeta is very difficult because dependencies such as moolib are not easy to install. Do we have a Dockerfile for trying out rlmeta?

Switch to new OpenAI Gym step API

The new version of OpenAI Gym uses a new step API which returns (observations, reward, termination, truncation, info) instead of (observations, reward, done, info). We have to update the wrappers to support this.

This issue tracks the progress.
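
As a rough illustration of what such wrapper support involves, here is a minimal sketch that converts between the two tuple layouts. The function names are hypothetical and not part of rlmeta or Gym; the only Gym convention assumed is the "TimeLimit.truncated" key that old-style time-limit wrappers place in info.

```python
from typing import Any, Dict, Tuple


def to_new_step(result: tuple) -> Tuple[Any, float, bool, bool, Dict]:
    """Normalize a step() result to (obs, reward, terminated, truncated, info)."""
    if len(result) == 5:
        return result
    obs, reward, done, info = result
    # Old-style time-limit wrappers flag truncation inside info.
    truncated = bool(info.get("TimeLimit.truncated", False))
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


def to_old_step(result: tuple) -> Tuple[Any, float, bool, Dict]:
    """Collapse a new-style result to (obs, reward, done, info)."""
    if len(result) == 4:
        return result
    obs, reward, terminated, truncated, info = result
    info = dict(info)  # copy so the caller's dict is not mutated
    if truncated and not terminated:
        info["TimeLimit.truncated"] = True
    return obs, reward, bool(terminated or truncated), info
```

Keeping terminated and truncated separate matters for value bootstrapping: a truncated episode should still bootstrap from the next state, while a terminated one should not.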

How to sample partial trajectories?

Many value estimation methods rely on sub-sequences of a trajectory (e.g. retrace, GAE, n-step, lambda-returns). How can this be achieved with the current samplers? A simple workaround would be to use a clever idx for each sample and use __get__ to extract the sub-sequence one element at a time, but I believe it might hurt performance.

Other ideas? Otherwise, how can this be implemented in the C++ code?
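
One possible direction, sketched here in plain Python rather than against the actual rlmeta samplers: sample a single (trajectory, start offset) pair uniformly over all valid windows, then slice the window contiguously instead of fetching it element by element. All names below are hypothetical, and the sketch assumes each trajectory is stored contiguously with a known length.

```python
import random
from typing import Sequence, Tuple


def sample_window(traj_lengths: Sequence[int], window: int,
                  rng: random.Random) -> Tuple[int, int]:
    """Pick (trajectory index, start offset) uniformly over all valid windows."""
    # A trajectory of length n contains n - window + 1 windows of this size.
    counts = [max(0, n - window + 1) for n in traj_lengths]
    total = sum(counts)
    if total == 0:
        raise ValueError("no trajectory is long enough for the requested window")
    k = rng.randrange(total)
    for index, count in enumerate(counts):
        if k < count:
            return index, k
        k -= count
    raise AssertionError("unreachable")


def n_step_return(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float) -> float:
    """Discounted return over a window, bootstrapped from the value after it."""
    ret = bootstrap_value
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret
```

With a window in hand, something like `rewards[start:start + window]` can be passed to `n_step_return` together with the value estimate of the state that follows the window.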

Add namespace or identifier for Remotable instance

Currently, when we register remote methods in a Remotable instance, we cannot distinguish between remote method calls if multiple Remotable instances contain the same method name. We have to add a namespace or identifier field to the Remotable class.

For now, we can start different servers to handle this, which is not very convenient.
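
To make the idea concrete, here is a minimal sketch of a registry keyed by (identifier, method name). It does not use the real Remotable or Server classes; `MethodRegistry` and `Model` are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, Tuple


class MethodRegistry:
    """Hypothetical registry that disambiguates methods by instance identifier."""

    def __init__(self) -> None:
        self._methods: Dict[Tuple[str, str], Callable[..., Any]] = {}

    def register(self, identifier: str, name: str, fn: Callable[..., Any]) -> None:
        key = (identifier, name)
        if key in self._methods:
            raise KeyError(f"{identifier}.{name} is already registered")
        self._methods[key] = fn

    def call(self, identifier: str, name: str, *args: Any, **kwargs: Any) -> Any:
        return self._methods[(identifier, name)](*args, **kwargs)


class Model:
    """Toy stand-in for a Remotable object exposing an `act` method."""

    def __init__(self, scale: int) -> None:
        self.scale = scale

    def act(self, x: int) -> int:
        return x * self.scale


registry = MethodRegistry()
registry.register("policy_a", "act", Model(2).act)
registry.register("policy_b", "act", Model(3).act)
```

Both instances expose `act`, yet `registry.call("policy_a", "act", 5)` and `registry.call("policy_b", "act", 5)` reach different objects on the same registry.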

Clang Link error

If Clang is used instead of gcc, linking fails for moolib and rlmeta during pip install.

Pip installation fails in virtual env and SIGILL on DGX machines

It seems that pip install -e . prepares the proper directories but does not include the built package. We solved this by adding:

+        include_package_data=False,
+        packages=find_packages(include=['rlmeta', 'rlmeta.*']),

here:

ext_modules=[CMakeExtension("rlmeta", "./rlmeta")],

Nit: It might be useful to provide an easy way to pass a cuda/cudnn path to cmake, maybe something like DCUDNN_LIBRARY_PATH=os.environ.get("CUDA_LIBRARY_PATH", "")

Finally, the flag --march=native might cause some issues, especially for HPC. We removed it for our cluster and managed to train reliably on different machines.
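
A minimal sketch of that suggestion, assuming setup.py assembles a list of cmake arguments. The helper name is hypothetical, and whether rlmeta's CMake project actually reads these variables is an assumption.

```python
import os
from typing import List, Mapping, Optional


def cmake_path_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Turn optional environment variables into -D definitions for cmake."""
    env = os.environ if env is None else env
    args = []
    # Variable names follow the issue text; the CMake side is assumed to use them.
    for var in ("CUDA_LIBRARY_PATH", "CUDNN_LIBRARY_PATH"):
        value = env.get(var, "")
        if value:  # skip unset or empty variables so cmake keeps its defaults
            args.append(f"-D{var}={value}")
    return args
```

The returned list could then be appended to whatever argument list the build step passes to cmake.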

Add ProcessManager to maintain processes.

Currently the processes are created directly in Server and Loop. It is very common for zombie processes to be left behind when the main process terminates. It may be better to have a ProcessManager to manage the processes on a single node.

Opening a tracking issue here for this feature request.
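
A minimal sketch of what such a manager could look like. It uses the standard library subprocess module as a stand-in for however Server and Loop create their processes, so the class and its methods are hypothetical rather than rlmeta API.

```python
import atexit
import subprocess
import sys
from typing import List, Sequence


class ProcessManager:
    """Hypothetical helper that tracks child processes and reaps them on exit."""

    def __init__(self) -> None:
        self._procs: List[subprocess.Popen] = []
        # Also clean up if the interpreter exits without an explicit shutdown.
        atexit.register(self.shutdown)

    def spawn(self, args: Sequence[str]) -> subprocess.Popen:
        proc = subprocess.Popen(args)
        self._procs.append(proc)
        return proc

    def shutdown(self, timeout: float = 5.0) -> None:
        # Ask every live child to stop, then wait so none is left as a zombie.
        for proc in self._procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in self._procs:
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self._procs.clear()

    def __enter__(self) -> "ProcessManager":
        return self

    def __exit__(self, *exc_info) -> None:
        self.shutdown()


if __name__ == "__main__":
    with ProcessManager() as manager:
        manager.spawn([sys.executable, "-c", "import time; time.sleep(60)"])
    # Leaving the block terminates and reaps the sleeping child.
```

Routing all process creation through one object gives a single place to terminate and wait on children, which is the part that is easy to miss when processes are started ad hoc.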

Moolib Backend Issues

Recently there have been several issues with the moolib backend.

  1. Based on the observations in facebookresearch/moolib#36, there is a performance regression in moolib.
  2. There are several installation issues in moolib.

Because of this, we are thinking about building another backend that does not use moolib. This issue tracks the progress.

PR for gRPC backend: #63

m_server::push time out and m_server::act time out

I was trying to run the example program atari_ppo.py on the following machine:

    Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
    32GB RAM
    GTX 1080 with 8G RAM
    Ubuntu 16.04
    cuda 10.2

I edited my configuration file conf_ppo.yaml to reduce resource usage:

m_server_name: "m_server"
m_server_addr: "127.0.0.1:4411"

r_server_name: "r_server"
r_server_addr: "127.0.0.1:4412"

c_server_name: "c_server"
c_server_addr: "127.0.0.1:4413"

train_device: "cuda:0"
infer_device: "cuda:0"

timeout: 180

env: "PongNoFrameskip-v4"
max_episode_steps: 2700

num_train_rollouts: 1 
num_train_workers: 1

num_eval_rollouts: 1
num_eval_workers: 1

replay_buffer_size: 1024 
prefetch: 2

batch_size: 32
lr: 3e-4
push_every_n_steps: 50

num_epochs: 1000
steps_per_epoch: 3000

num_eval_episodes: 20

train_seed: 123
eval_seed: 456

Here is what I got:

[2022-01-18 18:34:54,797][root][INFO] - {'m_server_name': 'm_server', 'm_server_addr': '127.0.0.1:4411', 'r_server_name': 'r_server', 'r_server_addr': '127.0.0.1:4412', 'c_server_name': 'c_server', 'c_server_addr': '127.0.0.1:4413', 'train_device': 'cuda:0', 'infer_device': 'cuda:0', 'env': 'PongNoFrameskip-v4', 'max_episode_steps': 2700, 'num_train_rollouts': 1, 'num_train_workers': 1, 'num_eval_rollouts': 1, 'num_eval_workers': 1, 'replay_buffer_size': 1024, 'prefetch': 2, 'batch_size': 8, 'lr': 0.0003, 'push_every_n_steps': 100, 'num_epochs': 20, 'steps_per_epoch': 300, 'num_eval_episodes': 20, 'train_seed': 123, 'eval_seed': 456}
[2022-01-18 18:35:08,193][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:09,194][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:10,196][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:11,198][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:12,220][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:13,222][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:14,228][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:15,229][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:16,231][root][INFO] - Warming up replay buffer: [ 1024 / 1024 ]
Exception in callback handle_task_exception(<Task finishe...) timed out')>) at /media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py:11
handle: <Handle handle_task_exception(<Task finishe...) timed out')>) at /media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py:11>
Traceback (most recent call last):
  File "/home/ml2558/miniconda3/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py", line 17, in handle_task_exception
    raise e
  File "/media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py", line 13, in handle_task_exception
    task.result()
  File "/media/research/ml2558/rlmeta/rlmeta/core/loop.py", line 161, in _run_loop
    stats = await self._run_episode(env, agent, index)
  File "/media/research/ml2558/rlmeta/rlmeta/core/loop.py", line 182, in _run_episode
    action = await agent.async_act(timestep)
  File "/media/research/ml2558/rlmeta/rlmeta/agents/ppo/ppo_agent.py", line 78, in async_act
    action, logpi, v = await self.model.async_act(
RuntimeError: Call (m_server::act) timed out
Error executing job with overrides: ['env=PongNoFrameskip-v4', 'num_epochs=20']
Traceback (most recent call last):
  File "/media/research/ml2558/rlmeta/examples/atari/ppo/atari_ppo.py", line 96, in main
    stats = agent.train(cfg.steps_per_epoch)
  File "/media/research/ml2558/rlmeta/rlmeta/agents/ppo/ppo_agent.py", line 139, in train
    self.model.push()
  File "/media/research/ml2558/rlmeta/rlmeta/core/model.py", line 69, in push
    self.client.sync(self.server_name, "push", state_dict)
RuntimeError: Call (m_server::<unknown>) timed out

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I tried changing the timeout but got the same error. Any hints on how to resolve this?

TensorCircularBuffer with a capacity of 1 million fails

A replay buffer with a capacity of 1 million tries to allocate 846.72 GB. Steps to reproduce:

from rlmeta.storage import TensorCircularBuffer
import torch

rb = TensorCircularBuffer(capacity=int(1e6))
rb.append(torch.randn(10, 3, 84, 84))

Log:

RuntimeError: [enforce fail at alloc_cpu.cpp:66] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 846720000000 bytes. Error code 12 (Cannot allocate memory)
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x55 (0x7fd5b71980c5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::alloc_cpu(unsigned long) + 0x7ac (0x7fd5b71894cc in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x23bc3 (0x7fd5b7176bc3 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #3: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x7bf (0x7fd5e04a5b2f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x40 (0x7fd5e04a64a0 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x34 (0x7fd5e04a64f4 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::native::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1f (0x7fd5e09b826f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x24f700b (0x7fd5e122a00b in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0xe3 (0x7fd5e0f75653 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x24d200f (0x7fd5e120500f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::empty_memory_format::call(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1b7 (0x7fd5e0fb3077 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x4586c (0x7fd5b5ba886c in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x49700 (0x7fd5b5bac700 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x4a0c0 (0x7fd5b5bad0c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #14: <unknown function> + 0x1dd0f (0x7fd5b5b80d0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #30: <unknown function> + 0x3feb0 (0x7fd65cacbeb0 in /lib64/libc.so.6)
frame #31: __libc_start_main + 0x80 (0x7fd65cacbf60 in /lib64/libc.so.6)
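
The requested size in the log matches capacity times the size of one appended float32 tensor, which suggests (as an inference from the numbers, not from the rlmeta source) that the buffer preallocates one slot of the full appended shape per entry. A quick check of the arithmetic:

```python
from math import prod


def buffer_bytes(capacity: int, item_shape: tuple, bytes_per_element: int = 4) -> int:
    """Bytes needed to preallocate `capacity` slots of `item_shape` elements."""
    return capacity * prod(item_shape) * bytes_per_element


# torch.randn produces float32, i.e. 4 bytes per element.
requested = buffer_bytes(1_000_000, (10, 3, 84, 84))
print(requested)        # 846720000000, the figure in the error message
print(requested / 1e9)  # 846.72 (GB)

# Hypothetical alternative for comparison: one uint8 frame per slot.
print(buffer_bytes(1_000_000, (3, 84, 84), bytes_per_element=1) / 1e9)  # 21.168 (GB)
```

Under that reading, each of the 1 million slots holds a whole (10, 3, 84, 84) tensor, so the leading dimension of 10 is stored per entry rather than spread across entries.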

Replay buffer crashes after being cleared

Minimal example:

import torch
from _rlmeta_extension import UniformSampler
from rlmeta.core.replay_buffer import ReplayBuffer
from rlmeta.storage import TensorCircularBuffer

replay_buffer = ReplayBuffer(TensorCircularBuffer(12), UniformSampler())

while True:
    for t in torch.randn(size=(12,2)).chunk(12,dim=0):
        replay_buffer.append(t)
        replay_buffer.sample(12)
    replay_buffer.clear()

Stack trace:

RuntimeError: output with shape [2] doesn't match the broadcast shape [1, 2]
Exception raised from mark_resize_outputs at ../aten/src/ATen/TensorIterator.cpp:1181 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fd72c9a220e in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7fd72c97d5e8 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: at::TensorIteratorBase::mark_resize_outputs(at::TensorIteratorConfig const&) + 0x241 (0x7fd755cf6301 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x64 (0x7fd755cf6e54 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x19d4f8c (0x7fd755f11f8c in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::native::copy_(at::Tensor&, at::Tensor const&, bool) + 0x62 (0x7fd755f12ec2 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x46e94f5 (0x7fd758c264f5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x46ea6ad (0x7fd758c276ad in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) + 0x16e (0x7fd7568cdbce in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x495df (0x7fd7024265df in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x4a0c0 (0x7fd7024270c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x1dd0f (0x7fd7023fad0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #31: <unknown function> + 0x3feb0 (0x7fd7aa936eb0 in /lib64/libc.so.6)
frame #32: __libc_start_main + 0x80 (0x7fd7aa936f60 in /lib64/libc.so.6)
frame #33: _start + 0x25 (0x5649803a1095 in /home/d3sm0/.venvs/torch_env/bin/python)
