If you have any questions, please leave a message or send an email to
[email protected] or [email protected]
If you would like to see the formulas rendered in the .md files, please install the MathJax Plugin: open the Chrome Web Store, type 'mathjax' into the search box, and add 'MathJax Plugin for Github'.
In reinforcement learning, a policy can be derived from the value function, or it can be parameterised directly; the latter is where policy gradient methods come in. If you are not familiar with policy gradients, please read the book ''
The following reinforcement learning algorithms are included:
- Q-Learning
- Deep Q-Network (DQN)
- Vanilla Policy Gradient
- Natural Policy Gradient (NPG)
- Actor-Critic (A2C)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Actor-Critic with Experience Replay (ACER)
- Deep Deterministic Policy Gradient (DDPG)
- Distributed Distributional Deep Deterministic Policy Gradient (D4PG)
- Soft Actor Critic (SAC)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
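To illustrate the core idea behind the simplest algorithm in the list, here is a minimal sketch of the vanilla policy gradient (REINFORCE) update on a toy two-armed bandit. This is a hypothetical example for intuition only, not code from this repository: the policy is a softmax over per-action logits, and each episode we ascend `G * ∇ log π(a)`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the action logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(rewards=(0.0, 1.0), lr=0.1, episodes=2000, seed=0):
    """Vanilla policy gradient (REINFORCE) on a two-armed bandit.

    theta holds one logit per action; the policy is pi = softmax(theta).
    Update rule: theta += lr * G * grad(log pi(a)), where G is the return
    of the (one-step) episode.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(rewards))
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choice(len(rewards), p=probs)   # sample an action from pi
        G = rewards[a]                          # one-step return
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0                   # d/dtheta of log softmax(theta)[a]
        theta += lr * G * grad_log_pi           # gradient ascent on expected return
    return softmax(theta)

final_policy = reinforce_bandit()
print(final_policy)  # the policy should strongly prefer the higher-reward action
```

With rewards `(0.0, 1.0)`, only episodes that pick action 1 produce a non-zero return, so the logit for action 1 grows and the learned policy concentrates on it. The deep-RL algorithms above replace the logit table with a neural network and add variance-reduction machinery (baselines, critics, trust regions), but the underlying gradient estimator is the same.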
References:

- Kakade, Sham M. "A natural policy gradient." In Advances in neural information processing systems, pp. 1531-1538. 2002.
- Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. 2016.
- Schulman, John, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. 2015.
- Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
- Wang, Ziyu, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando de Freitas. "Sample efficient actor-critic with experience replay." arXiv preprint arXiv:1611.01224 (2016).
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- Barth-Maron, Gabriel, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap. "Distributed distributional deterministic policy gradients." arXiv preprint arXiv:1804.08617 (2018).
- Haarnoja, Tuomas, Aurick Zhou, Pieter Abbeel, and Sergey Levine. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).
- Wu, Yuhuai, Elman Mansimov, Roger B. Grosse, Shun Liao, and Jimmy Ba. "Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation." In Advances in neural information processing systems, pp. 5279-5288. 2017.
- Fujimoto, Scott, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." arXiv preprint arXiv:1802.09477 (2018).
- Lapan, Maxim. Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more. Packt Publishing Ltd, 2018.