alh's Introduction

Setup

(Linux only) Create the test_env environment, either by running the setup script:

cd code && sh setup.sh

or manually:

cd code
conda env create --name test_env --file environment.yml
conda activate test_env
pip install -r requirements.txt

Then install MuJoCo by following its official installation instructions.
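Before launching long runs, a quick import check can confirm that the environment is usable. The sketch below is not part of the repository. The module names in ASSUMED_MODULES (torch, gym, mujoco_py) are assumptions about what environment.yml and requirements.txt install, and check_modules is a hypothetical helper; adjust the list to match the actual dependency files.

```python
import importlib.util

# Assumed module names (not confirmed by the repository); edit to match
# the packages listed in environment.yml and requirements.txt.
ASSUMED_MODULES = ["torch", "gym", "mujoco_py"]


def check_modules(names):
    """Map each top-level module name to True if it can be found
    in the currently active Python environment, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run inside the activated test_env environment.
    for name, found in check_modules(ASSUMED_MODULES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it after conda activate test_env; any module reported as MISSING points to an incomplete installation step.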

Online setting

ALH is implemented in ./code/online/memTD3.py. The adaptive rollout and the two variants of the discovery scheme are in both ./code/online/main.py (MuJoCo tests) and ./code/online/main_toy.py (MultiNormEnv analysis). For a full description of the method, please refer to our paper.

Run MultiNormEnv

conda activate test_env
cd code/online && sh run_experiments_toy.sh [n]

where [n] is the number of parallel processes. If the machine has no GPU, the total number of parallel processes is [n]. If the machine has multiple GPUs, the total number of parallel processes is [n] x [number of GPUs].
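To choose [n] for a given machine, the rule above can be written out as a small calculation. This is only an illustration of the stated rule, not code from the repository; total_processes and its num_gpus parameter are hypothetical names.

```python
def total_processes(n, num_gpus):
    """Total number of parallel processes started for a given [n].

    n        -- the [n] argument passed to the run script
    num_gpus -- number of GPUs on the machine (0 if none); hypothetical
                parameter, the scripts determine this themselves
    """
    if n < 1 or num_gpus < 0:
        raise ValueError("n must be >= 1 and num_gpus must be >= 0")
    # No GPU: [n] processes in total. Otherwise: [n] processes per GPU.
    return n if num_gpus == 0 else n * num_gpus


if __name__ == "__main__":
    print(total_processes(4, 0))  # CPU-only machine: 4
    print(total_processes(4, 2))  # two GPUs: 8
```

For example, with [n] = 4 on a two-GPU machine, eight processes run at once, so memory and CPU cores should be budgeted for eight.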

Run MuJoCo-Gym

conda activate test_env
cd code/online && sh run_experiments.sh [n]

Offline setting

ALH is implemented in ./code/offline/memTD3.py. The adaptive rollout is in ./code/offline/evaluate.py.

conda activate test_env
cd code/offline && sh run_experiments.sh [n]

References

For a fair comparison with TD3 and TD3+BC, our implementation is based on the authors' implementations of:

@inproceedings{fujimoto2018addressing,
    title={Addressing Function Approximation Error in Actor-Critic Methods},
    author={Fujimoto, Scott and van Hoof, Herke and Meger, David},
    booktitle={International Conference on Machine Learning},
    pages={1582--1591},
    year={2018}
}
@inproceedings{fujimoto2021minimalist,
    title={A Minimalist Approach to Offline Reinforcement Learning},
    author={Scott Fujimoto and Shixiang Shane Gu},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021},
}

We compare against the implementations in:

@inproceedings{janner2019mbpo,
    author = {Michael Janner and Justin Fu and Marvin Zhang and Sergey Levine},
    title = {When to Trust Your Model: Model-Based Policy Optimization},
    booktitle = {Advances in Neural Information Processing Systems},
    year = {2019}
}
@misc{pytorch_minimal_ppo,
    author = {Barhate, Nikhil},
    title = {Minimal PyTorch Implementation of Proximal Policy Optimization},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/nikhilbarhate99/PPO-PyTorch}},
}

Figures

Reproducing all experiments can take several thousand GPU hours. If you only want to compare against our results, all result files are published in ./results. For the visualizations (tables and figures) reported in our paper, refer to ./plot.

Reproduce our figures

We provide code to reproduce all figures reported in our paper. Please refer to this notebook file.

Extra experiments

We describe our extra experiments in the appendix of our paper. Please refer to this notebook file.
