Mixed Policy Gradient (MPG)

This repository accompanies the paper "Mixed Policy Gradient" (arXiv:2102.11513). It contains a collection of implementations of deep reinforcement learning algorithms: the proposed MPG and the baseline algorithms n-step Approximate Dynamic Programming (n-step ADP), n-step Deterministic Policy Gradient (n-step DPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC). We also implemented the widely used on-policy algorithms Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) (see branch mpg_on_policy).
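
As a rough illustration of the "n-step" idea shared by the n-step baselines, the sketch below computes a generic n-step bootstrapped target: the discounted sum of n rewards plus a discounted value estimate at the final state. The function name `n_step_target` and its arguments are hypothetical and are not taken from this repository; the actual algorithms differ in how the rewards and the bootstrap value are obtained.

```python
def n_step_target(rewards, bootstrap_value, gamma):
    """Generic n-step target (illustrative only, not the repository's code).

    Computes sum_{k=0}^{n-1} gamma**k * rewards[k] + gamma**n * bootstrap_value,
    where n = len(rewards).
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target
```

For example, `n_step_target([1.0, 1.0, 1.0], 10.0, 0.5)` evaluates to `3.0`, and with an empty reward list the function returns the bootstrap value unchanged.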

The implementation is fairly thin and primarily optimized for our own development purposes. It is built with TensorFlow 2 and Ray to realize a high-throughput asynchronous learning architecture. The architecture splits sampling, storing, learning, evaluating, and testing into modules with clear interfaces and runs them in parallel, as shown below. It can scale to hundreds of CPU cores, which greatly increases the sampling and update throughput. Because the design is general, most gradient-based reinforcement learning algorithms can be incorporated.

[Figure: asynchronous learning architecture]
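
To make the decoupling of sampling, storing, and learning concrete, here is a minimal sketch that uses only Python standard-library threads instead of Ray. It is not the repository's code: `ReplayBuffer`, `sampler`, `storer`, and `run_pipeline` are hypothetical names, the transitions are dummy tuples, and the "update" is a counter standing in for a gradient step.

```python
import queue
import random
import threading
import time
from collections import deque


class ReplayBuffer:
    """Thread-safe FIFO store for transitions (hypothetical stand-in)."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size, rng):
        # Return None until enough transitions have been stored.
        with self._lock:
            if len(self._data) < batch_size:
                return None
            return rng.sample(list(self._data), batch_size)

    def __len__(self):
        with self._lock:
            return len(self._data)


def sampler(worker_id, n_steps, out_queue):
    """Emit dummy transitions; a real sampler would step an environment."""
    rng = random.Random(worker_id)
    for step in range(n_steps):
        out_queue.put((worker_id, step, rng.random()))
    out_queue.put(None)  # sentinel: this sampler has finished


def storer(in_queue, buffer, n_samplers):
    """Move transitions from the queue into the buffer until all samplers finish."""
    finished = 0
    while finished < n_samplers:
        item = in_queue.get()
        if item is None:
            finished += 1
        else:
            buffer.add(item)


def run_pipeline(n_samplers=4, n_steps=50, batch_size=16, n_updates=10):
    """Run samplers and a storer in background threads while the learner loops."""
    transitions = queue.Queue()
    buffer = ReplayBuffer(capacity=1000)
    threads = [
        threading.Thread(target=sampler, args=(i, n_steps, transitions))
        for i in range(n_samplers)
    ]
    threads.append(
        threading.Thread(target=storer, args=(transitions, buffer, n_samplers))
    )
    for thread in threads:
        thread.start()

    # Learner loop: draws batches as soon as the buffer can serve them.
    rng = random.Random(0)
    updates = 0
    while updates < n_updates:
        batch = buffer.sample(batch_size, rng)
        if batch is None:
            time.sleep(0.001)  # wait for the samplers to fill the buffer
            continue
        updates += 1  # placeholder for a gradient step on `batch`

    for thread in threads:
        thread.join()
    return len(buffer), updates
```

Calling `run_pipeline()` with the defaults returns `(200, 10)`: four samplers store 50 transitions each, and the learner performs 10 placeholder updates. The learner never waits for sampling to finish, which is the property the asynchronous design is after; with Ray, each role would typically run as a separate actor or remote task rather than a thread.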

Get started

Run the scripts in the train_scripts folder, where you can choose an algorithm and modify its parameters. Then, enjoy it. :)

References

The algorithms are based on the following papers:

Guan, Y., Duan, J., Li, S. E., Li, J., Chen, J., & Cheng, B. (2021). Mixed Policy Gradient. arXiv preprint arXiv:2102.11513.

Silver, David, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. "Deterministic policy gradient algorithms." In International conference on machine learning, pp. 387-395. PMLR, 2014.

Fujimoto, Scott, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." In International Conference on Machine Learning, pp. 1587-1596. PMLR, 2018.

Haarnoja, Tuomas, Aurick Zhou, Pieter Abbeel, and Sergey Levine. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." In International Conference on Machine Learning, pp. 1861-1870. PMLR, 2018.

Schulman, John, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. PMLR, 2015.

Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

If MPG helps your academic research, please consider citing our paper. Here is an example BibTeX entry:

@article{guan2021mixed,
  title={Mixed Policy Gradient},
  author={Guan, Yang and Duan, Jingliang and Li, Shengbo Eben and Li, Jie and Chen, Jianyu and Cheng, Bo},
  journal={arXiv preprint arXiv:2102.11513},
  year={2021}
}
