If you have any questions, please leave a message or send an email to
[email protected] or [email protected]
If you would like to see the formulas rendered in the .md files, please install the MathJax Plugin: open the Chrome Web Store, type 'mathjax' into the search box, and add 'MathJax Plugin for Github'.
In reinforcement learning, a policy can be derived from the value function, or it can be parameterised directly; the latter is where policy gradient methods come in. If you are not familiar with policy gradients, please read the book ''
The following reinforcement learning algorithms are included:
- Q-Learning
- Deep Q-Network (DQN)
- Vanilla Policy Gradient
- Natural Policy Gradient (NPG)
- Actor-Critic (A2C)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Actor-Critic with Experience Replay (ACER)
- Deep Deterministic Policy Gradient (DDPG)
- Distributed Distributional Deep Deterministic Policy Gradient (D4PG)
- Soft Actor Critic (SAC)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
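To illustrate the core idea behind the simplest algorithm in the list, here is a minimal sketch of the vanilla policy gradient (REINFORCE) update on a toy two-armed bandit. This is a hypothetical example for intuition only, not code from this repository: the policy is a softmax over per-action logits, and each episode we ascend `G * ∇ log π(a)`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the action logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(rewards=(0.0, 1.0), lr=0.1, episodes=2000, seed=0):
    """Vanilla policy gradient (REINFORCE) on a two-armed bandit.

    theta holds one logit per action; the policy is pi = softmax(theta).
    Update rule: theta += lr * G * grad(log pi(a)), where G is the return
    of the (one-step) episode.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(rewards))
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choice(len(rewards), p=probs)   # sample an action from pi
        G = rewards[a]                          # one-step return
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0                   # d/dtheta of log softmax(theta)[a]
        theta += lr * G * grad_log_pi           # gradient ascent on expected return
    return softmax(theta)

final_policy = reinforce_bandit()
print(final_policy)  # the policy should strongly prefer the higher-reward action
```

With rewards `(0.0, 1.0)`, only episodes that pick action 1 produce a non-zero return, so the logit for action 1 grows and the learned policy concentrates on it. The deep-RL algorithms above replace the logit table with a neural network and add variance-reduction machinery (baselines, critics, trust regions), but the underlying gradient estimator is the same.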
References:

- Kakade, Sham M. "A natural policy gradient." In Advances in neural information processing systems, pp. 1531-1538. 2002.
- Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. 2016.
- Schulman, John, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. 2015.
- Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
- Wang, Ziyu, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando de Freitas. "Sample efficient actor-critic with experience replay." arXiv preprint arXiv:1611.01224 (2016).
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- Barth-Maron, Gabriel, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap. "Distributed distributional deterministic policy gradients." arXiv preprint arXiv:1804.08617 (2018).
- Haarnoja, Tuomas, Aurick Zhou, Pieter Abbeel, and Sergey Levine. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).
- Wu, Yuhuai, Elman Mansimov, Roger B. Grosse, Shun Liao, and Jimmy Ba. "Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation." In Advances in neural information processing systems, pp. 5279-5288. 2017.
- Fujimoto, Scott, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." arXiv preprint arXiv:1802.09477 (2018).
- Lapan, Maxim. Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more. Packt Publishing Ltd, 2018.