
Deep Bayesian Quadrature Policy Optimization

Akella Ravi Tej1, Kamyar Azizzadenesheli1, Mohammad Ghavamzadeh2, Anima Anandkumar3, Yisong Yue3
1Purdue University, 2Google Research, 3Caltech


Preprint: arxiv.org/abs/2006.15637
Publication: AAAI-21 (also presented at NeurIPS Deep RL and Real-World RL Workshops 2020)
Project Website: akella17.github.io/publications/Deep-Bayesian-Quadrature-Policy-Optimization/

Bayesian Quadrature for Policy Gradient

License: MIT. Contributions welcome.

Bayesian quadrature is a probabilistic-numerics approach for approximating numerical integrals. When estimating the policy gradient integral, replacing the standard Monte-Carlo estimator with Bayesian quadrature provides:

  1. more accurate gradient estimates with significantly lower variance,
  2. a consistent improvement in sample complexity and average return across several policy gradient algorithms, and
  3. a methodological way to quantify the uncertainty in gradient estimation.
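To make the idea concrete, here is a minimal NumPy sketch of vanilla Bayesian quadrature on a one-dimensional toy integral (this is an illustration of the general technique only, not the scalable GP machinery this repository implements for the high-dimensional policy gradient integral). A GP prior with an RBF kernel is placed over the integrand, which makes the posterior over the integral Gaussian with closed-form mean and variance; the variance is exactly the kind of estimation uncertainty that UAPG exploits.

```python
import numpy as np

# Toy Bayesian quadrature: estimate I = integral of f(x) N(x; 0, 1) dx
# for f(x) = x^2 (true value 1). With a GP prior over f and an RBF kernel,
# the posterior over I is Gaussian with closed-form mean and variance.

def rbf(a, b, ell=1.0):
    """RBF kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 ell^2))."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def bq_estimate(x, y, ell=1.0, jitter=1e-6):
    """Posterior mean and variance of the integral, given samples (x, y)."""
    K = rbf(x, x, ell) + jitter * np.eye(len(x))
    # Kernel mean embedding z_i = integral of k(x, x_i) N(x; 0, 1) dx,
    # which is closed-form for an RBF kernel and a Gaussian measure.
    z = np.sqrt(ell**2 / (ell**2 + 1.0)) * np.exp(-0.5 * x**2 / (ell**2 + 1.0))
    w = np.linalg.solve(K, z)      # BQ quadrature weights K^{-1} z
    mean = w @ y                   # posterior mean of the integral
    # Prior variance of the integral: double integral of k under N(0,1)^2.
    prior_var = ell / np.sqrt(ell**2 + 2.0)
    var = prior_var - z @ w        # posterior variance = remaining uncertainty
    return mean, var

x = np.linspace(-4.0, 4.0, 20)     # 20 well-spread evaluation points
mean, var = bq_estimate(x, x**2)
print(mean, var)                   # mean close to 1.0, small variance
```

Unlike a Monte-Carlo average (uniform weights 1/n), BQ assigns data-dependent weights `K^{-1} z` and returns a posterior variance alongside the point estimate; DBQPG applies this idea to the policy gradient integral, and UAPG uses the resulting gradient covariance to temper uncertain gradient directions.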

This repository contains a computationally efficient implementation of BQ for estimating the policy gradient integral (gradient vector) and the estimation uncertainty (gradient covariance matrix). The source code is written in a modular fashion, currently supporting three policy gradient estimators and three policy gradient algorithms (9 combinations overall):

Policy Gradient Estimators:

  1. Monte-Carlo Estimation
  2. Deep Bayesian Quadrature Policy Gradient (DBQPG)
  3. Uncertainty Aware Policy Gradient (UAPG)

Policy Gradient Algorithms:

  1. Vanilla Policy Gradient
  2. Natural Policy Gradient (NPG)
  3. Trust-Region Policy Optimization (TRPO)

Project Setup

This codebase requires Python 3.6 (or higher). We recommend using Anaconda or Miniconda to set up the virtual environment. Here is a walkthrough of the installation and project setup:

git clone https://github.com/Akella17/Deep-Bayesian-Quadrature-Policy-Optimization.git
cd Deep-Bayesian-Quadrature-Policy-Optimization
conda create -n DBQPG python=3.6
conda activate DBQPG
pip install -r requirements.txt

Supported Environments

  1. Classic Control
  2. MuJoCo
  3. PyBullet
  4. Roboschool
  5. DeepMind Control Suite (via dm_control2gym)

Training

Modular implementation:

python agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator <MC/BQ> --UAPG_flag

All experiments run for 1000 policy updates, and the logs are stored in the session_logs/ folder. To reproduce the results in the paper, refer to the following commands:

# Running Monte-Carlo baselines
python agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator MC
# DBQPG as the policy gradient estimator
python agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator BQ
# UAPG as the policy gradient estimator
python agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator BQ --UAPG_flag

For more customization options, see arguments.py.

Visualization

visualize.ipynb can be used to visualize the TensorBoard event files stored in session_logs/ (requires Jupyter and TensorBoard).

Results

Vanilla Policy Gradient

Average of 10 runs.

Natural Policy Gradient

Average of 10 runs.

Trust Region Policy Optimization

Average of 10 runs.

Implementation References

Contributing

Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first. Also see the TODO list below.

TODO

  • Implement policy network for discrete action space and test on Arcade Learning Environment (ALE).
  • Add other policy gradient algorithms.

Citation

If you find this work useful, please consider citing:

@article{ravi2020DBQPG,
    title={Deep Bayesian Quadrature Policy Optimization},
    author={Akella Ravi Tej and Kamyar Azizzadenesheli and Mohammad Ghavamzadeh and Anima Anandkumar and Yisong Yue},
    journal={arXiv preprint arXiv:2006.15637},
    year={2020}
}
