If you have any questions or want to report a bug, please open an issue instead of emailing me directly.
A modularized implementation of popular deep RL algorithms in PyTorch. Easy to switch between toy tasks and challenging games.
Implemented algorithms:
- (Double/Dueling) Deep Q-Learning (DQN)
- Categorical DQN (C51, Distributional DQN with KL Distance)
- Quantile Regression DQN
- (Continuous/Discrete) Synchronous Advantage Actor Critic (A2C)
- Synchronous N-Step Q-Learning
- Deep Deterministic Policy Gradient (DDPG)
- Proximal Policy Optimization (PPO)
- The Option-Critic Architecture (OC)
- Twin Delayed DDPG (TD3)
- Bi-Res-DDPG/DAC/Geoff-PAC/QUOTA/ACE
Asynchronous algorithms (e.g., A3C) can be found in v0.1. Action Conditional Video Prediction can be found in v0.4. Synchronous PPO for Atari games can be found in v1.1.
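To illustrate one of the algorithms listed above, here is a small self-contained sketch (NumPy only, independent of this repo's actual classes) of the difference between the DQN and Double DQN targets: Double DQN selects the next action with the online network but evaluates it with the target network, which reduces the overestimation bias of the plain max operator.

```python
import numpy as np

def dqn_target(q_next_target, rewards, dones, gamma):
    """Standard DQN target: max over the target network's Q-values."""
    return rewards + gamma * (1 - dones) * q_next_target.max(axis=1)

def double_dqn_target(q_next_online, q_next_target, rewards, dones, gamma):
    """Double DQN target: argmax from the online network, value from the target network."""
    a_star = q_next_online.argmax(axis=1)
    return rewards + gamma * (1 - dones) * q_next_target[np.arange(len(rewards)), a_star]

q_online = np.array([[1.0, 2.0]])  # online net prefers action 1
q_target = np.array([[5.0, 3.0]])  # target net rates action 0 higher
r, d = np.array([1.0]), np.array([0.0])

print(dqn_target(q_target, r, d, 0.9))                   # [5.5] = 1 + 0.9 * 5
print(double_dqn_target(q_online, q_target, r, d, 0.9))  # [3.7] = 1 + 0.9 * 3
```

When the online and target networks disagree, the Double DQN target is typically lower, as in this toy example.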
- MacOS 10.12 or Ubuntu 16.04
- PyTorch v1.4.0
- Python 3.6, 3.5
- OpenAI Baselines (commit `8e56dd`)
- Core dependencies: `pip install -e .`
- There is a super fast DQN implementation with an async actor for data generation and an async replay buffer to transfer data to the GPU. Enable this implementation by setting `config.async_actor = True` and using `AsyncReplay`. However, with Atari games this fast implementation may not work on macOS; use Ubuntu or Docker instead.
- Although there is a `setup.py`, this repo was never designed to be a high-level library like Keras. Use it as your codebase instead.
- TensorFlow is used only for logging, and OpenAI Baselines is used only lightly. If you read the code carefully, you should be able to remove or replace them.
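A minimal sketch of enabling the fast async-actor path described above. Only `config.async_actor` and the `AsyncReplay` class name come from this README; the import path, the `Config` class, the `replay_fn` attribute, and the constructor arguments are assumptions, so check `examples.py` for the authoritative setup.

```python
# Sketch only: exact module path, attribute names, and constructor
# arguments are assumed; see examples.py in this repo for the real usage.
from deep_rl import Config, AsyncReplay  # import path assumed

config = Config()
config.async_actor = True                 # enable the async actor (from the note above)
config.replay_fn = lambda: AsyncReplay()  # async replay buffer; arguments assumed
```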
- `examples.py` contains examples for all the implemented algorithms.
- `Dockerfile` contains the environment for generating the curves below.
Please use this BibTeX entry if you want to cite this repo:
```
@misc{deeprl,
  author = {Zhang, Shangtong},
  title = {Modularized Implementation of Deep RL Algorithms in PyTorch},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/ShangtongZhang/DeepRL}},
}
```
- DDPG/TD3 evaluation performance (5 runs, mean + standard error).
- PPO online performance (5 runs, mean + standard error, smoothed with a window of size 10).
- Human Level Control through Deep Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- Deterministic Policy Gradient Algorithms
- Continuous control with deep reinforcement learning
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Hybrid Reward Architecture for Reinforcement Learning
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- Emergence of Locomotion Behaviours in Rich Environments
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- A Distributional Perspective on Reinforcement Learning
- Distributional Reinforcement Learning with Quantile Regression
- The Option-Critic Architecture
- Addressing Function Approximation Error in Actor-Critic Methods
- Some hyper-parameters are from the DeepMind Control Suite, OpenAI Baselines, and Ilya Kostrikov.
They are located in other branches of this repo and are good examples of how to use this codebase.
- Deep Residual Reinforcement Learning [Bi-Res-DDPG]
- Generalized Off-Policy Actor-Critic [Geoff-PAC, TD3-random]
- DAC: The Double Actor-Critic Architecture for Learning Options [DAC]
- QUOTA: The Quantile Option Architecture for Reinforcement Learning [QUOTA-discrete, QUOTA-continuous]
- ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search [ACE]