
garage

garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit.

The toolkit provides a wide range of modular tools for implementing RL algorithms, including:

  • Composable neural network models
  • Replay buffers
  • High-performance samplers
  • An expressive experiment definition interface
  • Tools for reproducibility (e.g. set a global random seed which all components respect)
  • Logging to many outputs, including TensorBoard
  • Reliable experiment checkpointing and resuming
  • Environment interfaces for many popular benchmark suites
  • Support for running garage in diverse environments, including always up-to-date Docker containers

See the latest documentation for getting started instructions and detailed APIs.

Installation

pip install --user garage

Examples

Starting with v2020.10.0, garage comes packaged with examples. To get a list of examples, run:

garage examples

You can also run garage examples --help, or visit the documentation for even more details.
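To give a sense of the experiment definition interface and the global seeding mentioned above, here is a minimal sketch of a training script in the spirit of the packaged examples. It assumes the PyTorch extras are installed; exact module paths and constructor arguments vary between releases (e.g. Trainer replaced the older LocalRunner), so treat it as illustrative rather than canonical.

#!/usr/bin/env python3
# Minimal sketch of a garage experiment (illustrative; APIs vary by release).
from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.torch.algos import PPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer


@wrap_experiment  # handles logging, snapshotting, and resuming
def ppo_pendulum(ctxt=None, seed=1):
    set_seed(seed)  # the global seed respected by all components
    env = GymEnv('InvertedDoublePendulum-v2')
    trainer = Trainer(ctxt)
    policy = GaussianMLPPolicy(env.spec, hidden_sizes=(64, 64))
    value_function = GaussianMLPValueFunction(env_spec=env.spec)
    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)
    algo = PPO(env_spec=env.spec,
               policy=policy,
               value_function=value_function,
               sampler=sampler,
               discount=0.99)
    trainer.setup(algo, env)
    trainer.train(n_epochs=100, batch_size=10000)


ppo_pendulum(seed=1)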

Join the Community

Join the garage-announce mailing list for infrequent updates (<1/mo.) on the status of the project and new releases.

Need some help? Want to ask whether garage is right for your project? Have a question which is not quite a bug and not quite a feature request?

Join the community Slack by filling out this Google Form.

Algorithms

The table below summarizes the algorithms available in garage.

Algorithm Framework(s)
CEM numpy
CMA-ES numpy
REINFORCE (a.k.a. VPG) PyTorch, TensorFlow
DDPG PyTorch, TensorFlow
DQN PyTorch, TensorFlow
DDQN PyTorch, TensorFlow
ERWR TensorFlow
NPO TensorFlow
PPO PyTorch, TensorFlow
REPS TensorFlow
TD3 PyTorch, TensorFlow
TNPG TensorFlow
TRPO PyTorch, TensorFlow
MAML PyTorch
RL2 TensorFlow
PEARL PyTorch
SAC PyTorch
MTSAC PyTorch
MTPPO PyTorch, TensorFlow
MTTRPO PyTorch, TensorFlow
Task Embedding TensorFlow
Behavioral Cloning PyTorch

Supported Tools and Frameworks

garage requires Python 3.6+. If you need Python 3.5 support, the last garage release to support Python 3.5 was v2020.06.

The package is tested on Ubuntu 18.04. It is also known to run on Ubuntu 16.04 and 20.04, and on recent versions of macOS using Homebrew. Windows users can install garage via WSL, or by making use of the Docker containers.

We currently support PyTorch and TensorFlow for implementing the neural network portions of RL algorithms, and additions of new framework support are always welcome. PyTorch modules can be found in the package garage.torch and TensorFlow modules can be found in the package garage.tf. Algorithms which do not require neural networks are found in the package garage.np.
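For illustration, that split maps onto imports like the following (class locations reflect recent releases and may shift over time):

from garage.torch.algos import SAC  # PyTorch implementations
from garage.tf.algos import TRPO    # TensorFlow implementations
from garage.np.algos import CEM     # algorithms with no NN framework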

The package is available for download on PyPI, and we ensure that it installs successfully into environments defined using conda, Pipenv, and virtualenv.

Testing

The most important feature of garage is its comprehensive automated unit test and benchmarking suite, which helps ensure that the algorithms and modules in garage maintain state-of-the-art performance as the software changes.

Our testing strategy has three pillars:

  • Automation: We use continuous integration to test all modules and algorithms in garage before adding any change. The full installation and test suite is also run nightly, to detect regressions.
  • Acceptance Testing: Any commit which might change the performance of an algorithm is subjected to comprehensive benchmarks on the relevant algorithms before it is merged.
  • Benchmarks and Monitoring: We benchmark the full suite of algorithms against their relevant benchmarks and widely-used implementations regularly, to detect regressions and improvements we may have missed.

Supported Releases

Release Last date of support
v2021.03 May 31st, 2021

Maintenance releases have a stable API and dependency tree, and receive bug fixes and critical improvements but not new features. We currently support each release for a window of 2 months.

Citing garage

If you use garage for academic research, please cite the repository using the following BibTeX entry. You should update the commit field with the commit or release tag your publication uses.

@misc{garage,
 author = {The garage contributors},
 title = {Garage: A toolkit for reproducible reinforcement learning research},
 year = {2019},
 publisher = {GitHub},
 journal = {GitHub repository},
 howpublished = {\url{https://github.com/rlworkgroup/garage}},
 commit = {be070842071f736eb24f28e4b902a9f144f5c97b}
}

Credits

The earliest code for garage was adopted from a predecessor project called rllab. The garage project is grateful for the contributions of the original rllab authors, and hopes to continue advancing the state of reproducibility in RL research in the same spirit. garage has previously been supported by the Amazon Research Award "Watch, Practice, Learn, Do: Unsupervised Learning of Robust and Composable Robot Motion Skills by Fusing Expert Demonstrations with Robot Experience."




garage's Issues

tf/TD3

Incorporate the TD3 algorithm in garage.

DQN for TensorFlow

Original paper:
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf

OpenAI baselines implementation:
https://github.com/openai/baselines/tree/master/baselines/deepq

(There are many more resources on Google with DQN implementations in TF)

Sketch:

  • Provide an implementation in sandbox/rocky/tf/dqn.py
  • Add any needed primitives to rllab/ and sandbox/rocky/tf/
  • Provide a regression test (of the reward curve) against the openai/baselines implementation

Imported from ryanjulian/rllab#50

Normalize batch shaping codebase-wide

Currently this is proxied by Policy.recurrent, but there are loss functions for non-recurrent policies which need fixed-length input trajectories/valid variables (i.e. any time you want to differentiate through the loss function).

Permanent fix for 'GLEW initalization error: Missing GL version' on Linux machines

Right now the temporary solution to this issue is to prepend python examples/xxx.py with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-384/libGL.so; note that you must replace nvidia-384 with the version installed on your machine (use nvidia-smi to determine the driver version currently in use). Relevant comments are here and here, and this link is a wrapper written as a temporary fix.

However, the more permanent solution would require pre-loading without the use of LD_PRELOAD on the command line. See DeepMind's implementation as a starting point.
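A hypothetical sketch of that idea, following the spirit of DeepMind's approach: load the libraries into the global symbol namespace from Python itself, before mujoco_py is imported (the paths are illustrative and machine-specific, as above):

import ctypes

# Pre-load GLEW and the vendor GL driver with RTLD_GLOBAL so their symbols
# are visible when mujoco_py initializes its rendering context.
ctypes.CDLL('/usr/lib/x86_64-linux-gnu/libGLEW.so', mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL('/usr/lib/nvidia-384/libGL.so', mode=ctypes.RTLD_GLOBAL)

import mujoco_py  # must come after the pre-loading above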

Imported from ryanjulian/rllab#117

Cross-platform support for TF asynchronous plotting

The current implementation of async plotting uses multithreading. However, using multiprocessing throws a segmentation fault on macOS machines, while we want to prioritize multiprocessing for Linux machines. Therefore, write code that supports both implementations.
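A minimal sketch of that switch (the function and its call sites are hypothetical):

import multiprocessing
import platform
import threading

def start_plot_worker(target, args=()):
    # Prefer a subprocess on Linux; fall back to a thread on macOS, where
    # multiprocessing has been observed to segfault.
    if platform.system() == 'Linux':
        worker = multiprocessing.Process(target=target, args=args, daemon=True)
    else:
        worker = threading.Thread(target=target, args=args, daemon=True)
    worker.start()
    return worker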

Fix mujoco env

I found an issue in mujoco_env. The MjSim class of mujoco_py does not have a geom_margin attribute, which is used by _get_full_obs(). This was caused by a PR that refactored rllab.mujoco_py to mujoco_py.

There might be other issues introduced by this PR, so a regression test of the mujoco envs is desired.

Bullet support

We would like to add support for the Bullet physics engine to rllab. Thankfully, the Bullet team has recently provided Python bindings in the form of pybullet, and even provides examples of how to implement the gym.Env interface (from OpenAI Gym) using pybullet.

This task is to add pybullet to the rllab conda environment, and implement a class (similar to GymEnv, e.g. BulletEnv) which allows any rllab algorithm to learn against pybullet environments. You will also need to implement the plot interface, if pybullet does not already, which shows the user a 3D animation of the environment. Essentially, you should duplicate the experience of running one of the MuJoCo-based examples (e.g. trpo_swimmer.py), but using a Bullet environment instead. You should include examples (in examples/ and sandbox/rocky/tf/launchers/) of launcher scripts which use an algorithm (suggestion: TRPO) to train the KukaGymEnv environment.

This is conceptually the same as GymEnv, which allows rllab users to import any OpenAI Gym environment and learn against them. In fact, pybullet environments implement the Gym interface, so in theory we should be done as soon as we can import pybullet. In practice, our constructor for Gym environments only takes the string name (e.g. "Humanoid-v1") of a Gym environment, not the class of a Gym environment. The pybullet environments do not have string shortcuts because they are not part of the official Gym repository. Furthermore, we'd like to use other unofficial Gym environments in rllab, but it is currently difficult for the same reason.

So you might structure this task as two pull requests: (1) adding pybullet to the conda environment, and (2) modifying GymEnv to support arbitrary environments which implement the gym.Env interface (attempted in ryanjulian/rllab#12). A sketch of (2) follows.
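A hypothetical sketch of the constructor change in (2), accepting either an environment ID or an already-constructed gym.Env instance:

import gym

class GymEnv:
    def __init__(self, env):
        if isinstance(env, str):
            self.env = gym.make(env)  # official Gym environments, by ID
        elif isinstance(env, gym.Env):
            self.env = env            # e.g. pybullet's KukaGymEnv
        else:
            raise TypeError('env must be an environment ID or a gym.Env')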

Consider this a professional software engineering task, and provide a high-quality solution which does not break existing users, minimizes change, and is stable. Please always use PEP8 style in your code, and format it using YAPF (with the PEP8 setting). Submit your pull request against the integration branch of this repository.

Some notes:

  • You can find examples of how to launch rllab in examples and sandbox/rocky/tf/launchers. Note that everything must run using the run_experiment_lite wrapper.
  • rllab currently has two parallel implementations of the neural network portions of the library. The original is written in Theano and is found in rllab/. The tree sandbox/rocky/tf re-implements classes from the original tree using TensorFlow, and is backwards-compatible with the Theano tree. We are working towards using only one NN library soon, but for now your implementation needs to work in both trees.
  • rllab is an upstream dependency to many projects, so it is important we do not break the existing APIs. Adding to APIs is fine as long as there is a good reason.

Imported from ryanjulian/rllab#5

Create ray sampler

Right now rllab uses parallelism in an ad-hoc manner through the multiprocessing library, mostly in the sampler. If we use a principled parallelism library (e.g. mpi, ray, or others), we can probably clean up the code while avoiding tricky multiprocessing bugs in the future.
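For flavor, here is a hypothetical minimal rollout worker using ray; this is just the principled-parallelism idea, not a proposed sampler design:

import gym
import ray

ray.init()

@ray.remote
def rollout(env_name, seed, max_steps=200):
    # Collect one random-action episode and report its return.
    env = gym.make(env_name)
    env.seed(seed)
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        _, reward, done, _ = env.step(env.action_space.sample())
        total_reward += reward
        if done:
            break
    return total_reward

# Eight rollouts execute in parallel across the ray workers.
returns = ray.get([rollout.remote('CartPole-v1', seed=s) for s in range(8)])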

Imported from ryanjulian/rllab#82

Asynchronous plotting for TensorFlow

Task

Presently asynchronous plotting/3D rendering is not supported in the part of rllab based on TensorFlow (sandbox.rocky.tf), but it is supported in the rllab code which uses Theano.

This means that when you turn plotting on for a Theano training session, the plot does not block the training process. The TensorFlow implementation runs the rendering loop directly in the training algorithm (rather than a worker), so it blocks. This makes training using TensorFlow much slower than Theano when plotting is turned on (they are about the same without plotting).

TensorFlow's notion of a session makes this tricky. I'm not 100% sure that there is a solution. If you figure out that it is impossible, or requires rewriting large parts of the repository, email me with what you tried and some explanations why.

Current behavior
3D plotting with MuJoCo in TensorFlow is synchronous, blocks the training process.

Desired behavior
3D plotting with MuJoCo in TensorFlow is asynchronous and does not block the training process (just as with Theano)
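A hypothetical sketch of the desired shape: the render loop runs in a daemon worker fed through a queue, so training never blocks. A real TF implementation would additionally have to manage the session and graph inside the worker, which is the hard part; the policy.get_action API below follows rllab's convention.

import queue
import threading

class AsyncPlotter:
    def __init__(self):
        self._queue = queue.Queue(maxsize=1)
        thread = threading.Thread(target=self._loop, daemon=True)
        thread.start()

    def _loop(self):
        while True:
            env, policy = self._queue.get()
            obs = env.reset()
            done = False
            while not done:
                env.render()
                action, _ = policy.get_action(obs)
                obs, _, done, _ = env.step(action)

    def update(self, env, policy):
        # Drop the update if the worker is busy, rather than blocking training.
        try:
            self._queue.put_nowait((env, policy))
        except queue.Full:
            pass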


Submission instructions

Fork this repository into your own GitHub account, and implement this feature in its own branch, based on the master branch. When you are done and would like a code review, send me an email with a link to your feature branch. DO NOT SUBMIT A PULL REQUEST TO THIS REPOSITORY.

Consider this a professional software engineering task, and provide a high-quality solution which does not break existing users, minimizes change, and is stable. Tests are welcome where appropriate. Please always use PEP8 style in your code, and format it using YAPF (with the PEP8 setting).

Notes

  • Testing the software requires a freely-available student license for MuJoCo available here. It takes a couple days to get approved, so do it early.
  • rllab has setup instructions at http://rllab.readthedocs.io
  • You can find examples of how to launch rllab in examples and sandbox/rocky/tf/launchers. Note that everything must run using the run_experiment_lite wrapper with the parameter n_parallel greater than 1 (this triggers multiprocess operation).
  • rllab currently has two parallel implementations of the neural network portions of the library. The original is written in Theano and is found in rllab/. The tree sandbox/rocky/tf re-implements classes from the original tree using TensorFlow, and is backwards-compatible with the Theano tree. We are working towards using only one NN library soon, but for now your implementation needs to work in both trees.
  • rllab is an upstream dependency to many projects, so it is important we do not break the existing APIs. Adding to APIs is fine as long as there is a good reason.

Imported from ryanjulian/rllab#1

Add tf GPU options

It should be possible to set the TensorFlow session options whenever a tf.Session is created for training, such as here (other places where a session is constructed might not need to use these options). It is sometimes necessary to limit the memory available to tf running on a GPU, etc.
A possible implementation could allow the user to specify the GPU options via a ConfigProto setting in config_personal.py (set to None by default).
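A hypothetical sketch of what those options could look like with the TF1-era API the issue refers to:

import tensorflow as tf

# Cap GPU memory and let the allocation grow on demand.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3,
                            allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)  # pass config wherever the session is built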

Imported from ryanjulian/rllab#123

DDPG for TensorFlow

Original paper:
https://arxiv.org/abs/1509.02971

Implementation in the Theano tree:
https://github.com/ryanjulian/rllab/blob/integration/rllab/algos/ddpg.py

OpenAI baselines implementation:
https://github.com/openai/baselines/tree/master/baselines/ddpg

Blog post (there are many other resources):
http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html

Sketch:

  • Provide an implementation in sandbox/rocky/tf/ddpg.py
  • Add any needed primitives to rllab/ and sandbox/rocky/tf/
  • Provide a regression test (of the reward curve) against the openai/baselines implementation

Imported from ryanjulian/rllab#26

Replace Distributions with tf.Distributions

To add some details: Distributions are used by policies and other modules to add distribution functionality, such as computing the KL divergence between two distributions, given the parameters of a distribution (which are often output tensors of an NN). tf.Distributions probably implements the same functionality, so we should try to replace our code with the TF counterpart.
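For example, the KL divergence between two diagonal Gaussians parameterized by network outputs reduces to a couple of calls in the TF1-era API (the tensors below are stand-ins for NN outputs):

import tensorflow as tf

mu1, sigma1 = tf.zeros(3), tf.ones(3)
mu2, sigma2 = tf.ones(3), tf.ones(3)

p = tf.distributions.Normal(loc=mu1, scale=sigma1)
q = tf.distributions.Normal(loc=mu2, scale=sigma2)
kl = tf.distributions.kl_divergence(p, q)  # element-wise KL(p || q)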

Imported from ryanjulian/rllab#52

Rewrite the logger

The current logger interface is okay, but the implementation is a bit messy and the TensorBoard integration is kind of a bolt-on afterthought. It's also package-global, which can induce bad implementation decisions.

I'd like to rewrite the logger to be properly encapsulated as a class(es). There can still be a global singleton instance for easy access.

Ideas:

  • Eliminate global scope
  • First-class TensorBoard support
  • Multiprocess-aware logging
  • Make the logger API appropriate for all aspects of rllab (e.g. bring-up, training algo, random debugging, etc.) to avoid code peppered with print statements
  • Decouple logged datapoints from output formats (see the sketch after this list)
  • Decouple checkpointing and logging
  • Sophisticated checkpoint/log destinations (e.g. remote buckets?)
  • Take advantage of a minimalist logging framework rather than hand-crafting logger formats?
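Putting the encapsulation and decoupling ideas together, a minimal hypothetical sketch:

class Logger:
    # Encapsulated logger; a module-level singleton can wrap an instance.

    def __init__(self):
        self._outputs = []

    def add_output(self, output):
        # output is any object with a record(data) method, e.g. a stdout,
        # CSV, or TensorBoard writer.
        self._outputs.append(output)

    def log(self, data):
        for output in self._outputs:
            output.record(data)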

Imported from ryanjulian/rllab#80

Replace conda with a pip package

conda makes it difficult to use rllab as a library.

We would like to transition to using the standard Python package interface. This will require getting all the dependencies to install using pip, plus probably some custom setup scripts in setup.py.
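A minimal hypothetical setup.py shape for that transition (the real dependency list is far longer):

from setuptools import find_packages, setup

setup(
    name='rllab',
    version='0.1.0',
    packages=find_packages(),
    install_requires=['numpy', 'gym'],  # illustrative subset only
)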

Adding wheel compilation to the CI (e.g. appveyor) is also in-scope for this project.

Imported from ryanjulian/rllab#81

Move theano-specific code to garage.theano

Theano should no longer be first-class while tf is second-class. We are aiming for major parts of rllab to be NN-framework agnostic.

This should move theano-specific components into garage.theano, while stripping Theano dependencies from common parts of the code.

Imported from ryanjulian/rllab#83

Move sandbox.rocky.tf to rllab.tf

We are moving towards making the common parts of rllab agnostic of the NN library. TensorFlow should no longer be a second-class citizen.

This change would remove the TensorFlow sandbox and make the TensorFlow tree a first-class rllab citizen.

Imported from ryanjulian/rllab#84

Support multi-modal policies

In order to support visuomotor control learning and other problems, we need to implement a way to use policies that consist of submodules which handle certain input modalities, such as images and vectors. OpenAI Gym already has support for a tuple_space that is a tuple of different spaces. The most common use case for such multi-modal observation spaces is a combination of 2d images and vectors.

Exact specification needs to be done but for now the task items look as follows:

  • add a new space representing 2d images
  • implement a test environment that has a tuple_space as observation space consisting of an image and a vector (e.g. reacher with a top-down view image and 2d end-effector position)
  • additionally a wrapper would be useful that adds a visual output to an existing environment (renders user-defined camera to 2d pixel array and adds it to the tuple space, or makes a tuple space if environment was unimodal before)
  • implement a multi-modal policy that builds convolutional submodules for image spaces and MLPs for vectors, and merges the top layers from these submodules via an MLP that computes the final output (a sketch follows this list)
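A hypothetical PyTorch sketch of that last item: a convolutional trunk for the image modality, an MLP for the vector modality, merged by an MLP head. Shapes and layer sizes are illustrative.

import torch
import torch.nn as nn

class MultiModalPolicy(nn.Module):
    def __init__(self, image_shape, vector_dim, action_dim):
        super().__init__()
        channels, height, width = image_shape
        self.image_trunk = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten())
        # Infer the conv output size with a dummy forward pass.
        with torch.no_grad():
            conv_out = self.image_trunk(
                torch.zeros(1, channels, height, width)).shape[1]
        self.vector_trunk = nn.Sequential(nn.Linear(vector_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(conv_out + 64, 64), nn.ReLU(),
            nn.Linear(64, action_dim))

    def forward(self, image, vector):
        merged = torch.cat([self.image_trunk(image),
                            self.vector_trunk(vector)], dim=-1)
        return self.head(merged)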

Your feedback on this issue is most welcome so that we can split up this feature into smaller tasks.

Imported from ryanjulian/rllab#108

Replace rllab.envs.Env with gym.Env

The community has settled on gym.Env as a de-facto standard environment interface. There's no reason to keep our own around.

The scope of this change is to remove the rllab.envs.Env base interface, and refactor implementing classes to instead implement gym.Env. Note that this explicitly does not mean that we are adopting the physics engine, registration system, benchmarks, etc of OpenAI Gym--just the gym.Env abstract interface.
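For concreteness, a toy environment implementing gym.Env directly, as the refactored classes would (names and dynamics here are hypothetical):

import gym
import numpy as np
from gym import spaces

class PointEnv(gym.Env):
    observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                   shape=(2,), dtype=np.float32)
    action_space = spaces.Box(low=-0.1, high=0.1,
                              shape=(2,), dtype=np.float32)

    def reset(self):
        self._point = np.random.uniform(-1, 1, size=2).astype(np.float32)
        return self._point.copy()

    def step(self, action):
        self._point += np.clip(action, -0.1, 0.1).astype(np.float32)
        distance = float(np.linalg.norm(self._point))
        return self._point.copy(), -distance, distance < 0.01, {}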

Imported from ryanjulian/rllab#85

InvertedDoublePendulumEnv is broken

Traceback (most recent call last):
  File "tests/envs/test_envs.py", line 65, in <module>
    envs = [cls() for cls in simple_env_classes]
  File "tests/envs/test_envs.py", line 65, in <listcomp>
    envs = [cls() for cls in simple_env_classes]
  File "/home/rjulian/code/garage/rllab/envs/mujoco/point_env.py", line 21, in __init__
    super(PointEnv, self).__init__(*args, **kwargs)
  File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 85, in __init__
    self.reset()
  File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 131, in reset
    return self.get_current_obs()
  File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 134, in get_current_obs
    return self._get_full_obs()
  File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 138, in _get_full_obs
    cdists = np.copy(self.sim.geom_margin).flat
AttributeError: 'mujoco_py.cymj.MjSim' object has no attribute 'geom_margin'
