Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

This codebase implements BREMEN, the deployment-efficient algorithm proposed in Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization.

It is built on a modified version of the ME-TRPO repository, adapted for deployment-efficient and offline settings.

Dependencies

We recommend using Docker.

Use Python 3.6. Download MuJoCo 1.31 from https://www.roboti.us/, then install the package dependencies:

pip install -r requirements.txt

(Option) Collect data with another codebase

Use the Behavior Regularized Offline Reinforcement Learning codebase for data collection. Follow its instructions and collect 1M transitions with each noise strategy (pure, eps1, eps3, gaussian1, gaussian3). If you are only interested in deployment-efficient settings, collecting transitions with the pure strategy is enough. After data collection, put data.data-00000-of-00001 and data.index in ./data/<Agent name>/pure/

e.g. ./data/Ant/pure/data.data-00000-of-00001, ./data/Ant/pure/data.index
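Before launching an experiment, it can help to confirm the collected files sit where the scripts expect them. The following is a minimal sketch with hypothetical helpers (expected_data_files, missing_data_files are not part of this repository). It assumes that strategies other than pure follow the same ./data/<Agent name>/<noise>/ pattern; the README only spells out the pure case.

```python
from pathlib import Path

# Noise strategies listed above for data collection.
NOISE_STRATEGIES = ("pure", "eps1", "eps3", "gaussian1", "gaussian3")

# The two files each data directory is expected to contain.
DATA_FILES = ("data.data-00000-of-00001", "data.index")


def expected_data_files(agent, noise="pure", root="./data"):
    """Return the paths where the collected data should live.

    Assumption: non-pure strategies use ./data/<Agent name>/<noise>/,
    mirroring the documented ./data/<Agent name>/pure/ layout.
    """
    if noise not in NOISE_STRATEGIES:
        raise ValueError(f"unknown noise strategy: {noise!r}")
    directory = Path(root) / agent / noise
    return [directory / name for name in DATA_FILES]


def missing_data_files(agent, noise="pure", root="./data"):
    """List the expected files that do not exist on disk yet."""
    return [p for p in expected_data_files(agent, noise, root) if not p.is_file()]
```

For example, missing_data_files("Ant") returns an empty list once both Ant files from the example above are in place.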

Note: This step is required for offline experiments. If you only run deployment-efficient experiments, you can skip it. However, it is also required if you want to save a video of your policy, because the offline data are used to normalize states and actions.
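The reason the offline data are needed for video saving is that the policy expects inputs scaled with statistics computed from that data. The README does not say which normalization the repository uses; the sketch below assumes a common choice, per-dimension mean/standard-deviation scaling, purely to illustrate why the same statistics must be recomputed at evaluation time. The function names are hypothetical.

```python
import math


def fit_normalizer(rows, eps=1e-8):
    """Compute per-dimension mean and std from equal-length vectors.

    Assumption: z-score normalization; the repository's scheme may differ.
    """
    if not rows:
        raise ValueError("need at least one vector")
    n = len(rows)
    columns = list(zip(*rows))
    mean = [sum(col) / n for col in columns]
    # Population std per dimension, plus eps to avoid division by zero.
    std = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n) + eps
        for col, m in zip(columns, mean)
    ]
    return mean, std


def normalize(vec, mean, std):
    """Map a raw state or action into normalized coordinates."""
    return [(x - m) / s for x, m, s in zip(vec, mean, std)]


def denormalize(vec, mean, std):
    """Invert normalize(), e.g. to turn a policy output back into an action."""
    return [z * s + m for z, m, s in zip(vec, mean, std)]
```

Without the original data, mean and std cannot be reproduced, so a restored policy would receive inputs on a different scale than it was trained on.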

(Option) Visualize deployment-efficient RL results

This repository contains pre-trained BREMEN policies for deployment-efficient settings with batch size 200k (top row of Figure 2 in the paper). To visualize the results, save a video using the following command:

e.g.

python save_video.py --env ant --param_path configs/params_ant_offline.json --video_dir <relative path to the video save dir> --restore_path ./weights/Ant/policy.ckpt --restore_policy_variables --n_train 50000

Pre-trained BREMEN policies are available for four environments: ant, half_cheetah, hopper, and walker2d. (This step requires the offline data, which are used to normalize states and actions.)
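To render several of the pre-trained policies, the command above can be generated per environment. This is a sketch with a hypothetical helper, save_video_command. It assumes every checkpoint follows the ./weights/<Agent name>/policy.ckpt pattern shown for Ant; the directory names for the other three agents are not listed here, so they are passed in explicitly. The n_train default simply copies the Ant example.

```python
import shlex


def save_video_command(env, agent_dir, video_dir, n_train=50000):
    """Build the save_video.py argument list for one pre-trained policy.

    Assumption: the checkpoint lives at ./weights/<agent_dir>/policy.ckpt,
    as in the documented Ant example.
    """
    return [
        "python", "save_video.py",
        "--env", env,
        "--param_path", f"configs/params_{env}_offline.json",
        "--video_dir", video_dir,
        "--restore_path", f"./weights/{agent_dir}/policy.ckpt",
        "--restore_policy_variables",
        "--n_train", str(n_train),
    ]


def to_shell(argv):
    """Quote an argument list for copy-pasting into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Passing the list to subprocess.run(argv, check=True) avoids shell quoting issues entirely; to_shell is only for printing.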

Run deployment-efficient experiments

Run BREMEN in deployment-efficient experiments using the following command:

python recursive.py --env <env_name> --exp_name <experiment_name> --sub_exp_name <exp_save_dir> --param_path configs/params_<env_name>_offline.json --bc_init --random_seeds 0 --target_kl 0.01 --max_path_length 1000
  • env_name: ant, half_cheetah, hopper, walker2d, cheetah_run
  • exp_name: name of your experiment
  • sub_exp_name: subdirectory under which experiment logs and results are saved
  • param_path: path to the config JSON file
  • target_kl: δ, the KL-divergence bound in the TRPO objective
  • max_path_length: length of each imaginary rollout
  • bc_init: enable behavior initialization
  • alpha: coefficient of the explicit KL value penalty (default: 0)
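For reference, target_kl plays the role of δ in the standard TRPO trust-region problem, written out below. In BREMEN the expectations are estimated from imaginary rollouts of length max_path_length; treat this as the textbook TRPO form rather than a transcription of this codebase's exact objective. A smaller target_kl keeps each update closer to the previous policy (the examples below use values between 0.05 and 0.1).

```latex
\max_{\theta}\;
\mathbb{E}_{s, a \sim \pi_{\theta_{\mathrm{old}}}}
\left[
  \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\,
  A^{\pi_{\theta_{\mathrm{old}}}}(s, a)
\right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \pi_{\theta_{\mathrm{old}}}}
\left[
  D_{\mathrm{KL}}\!\left(
    \pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\middle\|\, \pi_{\theta}(\cdot \mid s)
  \right)
\right] \le \delta
```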

Experiment results will be logged to ./log/<env_name>/<exp_save_dir>/<experiment_name>/<experiment_name><seed>/
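The log location can be computed ahead of time, for instance to locate results in an analysis script. This hypothetical helper only mirrors the path pattern stated above; note that the seed is appended to the experiment name with no separator.

```python
from pathlib import Path


def log_dir(env_name, exp_save_dir, experiment_name, seed, root="./log"):
    """Return the directory one run writes to.

    Follows ./log/<env_name>/<exp_save_dir>/<experiment_name>/<experiment_name><seed>/
    """
    return (
        Path(root)
        / env_name
        / exp_save_dir
        / experiment_name
        / f"{experiment_name}{seed}"
    )
```

By this pattern, the first example below (--env ant --sub_exp_name BREMEN_demo --exp_name recursive_example --random_seeds 0) would log to ./log/ant/BREMEN_demo/recursive_example/recursive_example0/.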

e.g.

python recursive.py --env ant --exp_name recursive_example --sub_exp_name BREMEN_demo --param_path configs/params_ant_offline.json --bc_init --random_seeds 0 --target_kl 0.05 --max_path_length 250 --gaussian 0.1 --const_sampling

python recursive.py --env half_cheetah --exp_name recursive_example --sub_exp_name BREMEN_demo --param_path configs/params_half_cheetah_offline.json --bc_init --random_seeds 0 --target_kl 0.1 --max_path_length 250 --gaussian 0.1 --const_sampling

python recursive.py --env cheetah_run --exp_name recursive_example --sub_exp_name BREMEN_demo --param_path configs/params_cheetah_run_offline.json --bc_init --random_seeds 0 --target_kl 0.1 --max_path_length 250 --gaussian 0.1 --const_sampling

python recursive.py --env hopper --exp_name recursive_example --sub_exp_name BREMEN_demo --param_path configs/params_hopper_offline.json --bc_init --random_seeds 0 --target_kl 0.05 --max_path_length 1000 --gaussian 0.1 --const_sampling --n_train 2000000 --onpol_iters 2400 --interval 240

python recursive.py --env walker2d --exp_name recursive_example --sub_exp_name BREMEN_demo --param_path configs/params_walker2d_offline.json --bc_init --random_seeds 0 --target_kl 0.05 --max_path_length 1000 --gaussian 0.1 --const_sampling --n_train 2000000 --onpol_iters 800
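The five example commands differ only in a few settings, so they can be regenerated from a small table. The sketch below transcribes the per-environment values from the examples above into a hypothetical EXAMPLE_SETTINGS dictionary; recursive_command and example_command are illustrative helpers, not part of the repository. The --gaussian and --const_sampling flags are copied verbatim from the examples; their meaning is not documented here.

```python
import shlex

# (target_kl, max_path_length, extra flags), transcribed from the examples above.
EXAMPLE_SETTINGS = {
    "ant": (0.05, 250, []),
    "half_cheetah": (0.1, 250, []),
    "cheetah_run": (0.1, 250, []),
    "hopper": (0.05, 1000, ["--n_train", "2000000", "--onpol_iters", "2400", "--interval", "240"]),
    "walker2d": (0.05, 1000, ["--n_train", "2000000", "--onpol_iters", "800"]),
}


def recursive_command(env, target_kl, max_path_length, extra=(), seed=0,
                      exp_name="recursive_example", sub_exp_name="BREMEN_demo",
                      gaussian=0.1):
    """Build a recursive.py argument list in the same order as the examples."""
    argv = [
        "python", "recursive.py",
        "--env", env,
        "--exp_name", exp_name,
        "--sub_exp_name", sub_exp_name,
        "--param_path", f"configs/params_{env}_offline.json",
        "--bc_init",
        "--random_seeds", str(seed),
        "--target_kl", str(target_kl),
        "--max_path_length", str(max_path_length),
        "--gaussian", str(gaussian),
        "--const_sampling",
    ]
    argv.extend(extra)
    return argv


def example_command(env, seed=0):
    """Reproduce the documented example for one environment."""
    target_kl, max_path_length, extra = EXAMPLE_SETTINGS[env]
    return recursive_command(env, target_kl, max_path_length, extra, seed=seed)


def to_shell(argv):
    """Quote an argument list for printing as a shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Changing the seed argument is a straightforward way to repeat an example over several seeds, with each run landing in its own <experiment_name><seed> log directory.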

Run offline experiments

Run BREMEN in offline experiments using the following command:

python offline.py --env <env_name> --exp_name <experiment_name> --sub_exp_name <exp_save_dir> --param_path configs/params_<env_name>_offline.json --bc_init --random_seeds 0 --target_kl 0.01 --max_path_length 1000
  • env_name: ant, half_cheetah, hopper, walker2d, cheetah_run
  • exp_name: name of your experiment
  • sub_exp_name: subdirectory under which experiment logs and results are saved
  • param_path: path to the config JSON file
  • target_kl: δ, the KL-divergence bound in the TRPO objective
  • max_path_length: length of each imaginary rollout
  • bc_init: enable behavior initialization
  • alpha: coefficient of the explicit KL value penalty (default: 0)
  • onpol_iters: number of outer iterations (the number of inner iterations is fixed at 25)
  • noise: noise strategy of the offline data (pure, eps1, eps3, gaussian1, gaussian3, random); default: pure
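Since the inner iteration count is fixed at 25, the total amount of inner optimization scales linearly with onpol_iters: the offline examples below all pass --onpol_iters 250, which works out to 250 × 25 = 6,250 inner iterations. A hypothetical helper that checks the arguments and does this arithmetic is sketched below; what a single inner iteration consists of is not specified in this README.

```python
# Options and constants taken from the flag descriptions above.
NOISE_OPTIONS = ("pure", "eps1", "eps3", "gaussian1", "gaussian3", "random")
INNER_ITERATIONS = 25  # fixed per outer iteration


def offline_run_summary(onpol_iters, noise="pure"):
    """Validate offline.py arguments and report the iteration budget."""
    if noise not in NOISE_OPTIONS:
        raise ValueError(f"unknown noise option: {noise!r}")
    if onpol_iters <= 0:
        raise ValueError("onpol_iters must be positive")
    return {
        "noise": noise,
        "outer_iterations": onpol_iters,
        "inner_iterations_total": onpol_iters * INNER_ITERATIONS,
    }
```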

Experiment results will be logged to ./log/<env_name>/<exp_save_dir>/<experiment_name>/<experiment_name><seed>/

e.g.

python offline.py --env ant --exp_name offline_example --sub_exp_name BREMEN_demo --param_path configs/params_ant_offline.json --bc_init --random_seeds 0 --target_kl 0.05 --max_path_length 250 --gaussian 0.1 --const_sampling --onpol_iters 250

python offline.py --env half_cheetah --exp_name offline_example --sub_exp_name BREMEN_demo --param_path configs/params_half_cheetah_offline.json --bc_init --random_seeds 0 --target_kl 0.1 --max_path_length 250 --gaussian 0.1 --const_sampling --onpol_iters 250

python offline.py --env cheetah_run --exp_name offline_example --sub_exp_name BREMEN_demo --param_path configs/params_cheetah_run_offline.json --bc_init --random_seeds 0 --target_kl 0.1 --max_path_length 250 --gaussian 0.1 --const_sampling --onpol_iters 250

python offline.py --env hopper --exp_name offline_example --sub_exp_name BREMEN_demo --param_path configs/params_hopper_offline.json --bc_init --random_seeds 0 --target_kl 0.05 --max_path_length 1000 --gaussian 0.1 --const_sampling --onpol_iters 250

python offline.py --env walker2d --exp_name offline_example --sub_exp_name BREMEN_demo --param_path configs/params_walker2d_offline.json --bc_init --random_seeds 0 --target_kl 0.05 --max_path_length 1000 --gaussian 0.1 --const_sampling --onpol_iters 250
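None of the offline examples set --noise, so they use the default pure data. To compare across data-collection strategies, one option is to sweep the --noise flag. The sketch below builds one offline.py command per noise option; offline_command and noise_sweep are hypothetical helpers, and giving each run its own exp_name (offline_<noise>) is an illustrative choice so that runs do not share a log directory. Each non-default strategy needs its data collected first.

```python
import shlex

NOISE_OPTIONS = ("pure", "eps1", "eps3", "gaussian1", "gaussian3", "random")


def offline_command(env, noise, target_kl, max_path_length, onpol_iters=250,
                    seed=0, sub_exp_name="BREMEN_demo"):
    """Build an offline.py argument list modeled on the examples above."""
    if noise not in NOISE_OPTIONS:
        raise ValueError(f"unknown noise option: {noise!r}")
    return [
        "python", "offline.py",
        "--env", env,
        "--exp_name", f"offline_{noise}",  # hypothetical naming, one per strategy
        "--sub_exp_name", sub_exp_name,
        "--param_path", f"configs/params_{env}_offline.json",
        "--bc_init",
        "--random_seeds", str(seed),
        "--target_kl", str(target_kl),
        "--max_path_length", str(max_path_length),
        "--gaussian", "0.1",
        "--const_sampling",
        "--onpol_iters", str(onpol_iters),
        "--noise", noise,
    ]


def noise_sweep(env, target_kl, max_path_length, noises=NOISE_OPTIONS, **kwargs):
    """Return one command per noise option, in order."""
    return [
        offline_command(env, noise, target_kl, max_path_length, **kwargs)
        for noise in noises
    ]


def to_shell(argv):
    """Quote an argument list for printing as a shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Running the list sequentially with subprocess.run(argv, check=True) stops the sweep at the first failing run, which is usually what you want when a data directory is missing.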

Citation

Please cite the paper using the following BibTeX:

@article{matsushima2020deploy,
    title={Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization},
    author={Tatsuya Matsushima and Hiroki Furuta and Yutaka Matsuo and Ofir Nachum and Shixiang Shane Gu},
    year={2020},
    journal={arXiv preprint arXiv:2006.03647},
}

Contributors

frt03, l3str4nge, yusuke0519