
Action Priors for Large Action Spaces in Robotics

This repository contains the source code for Action Priors for Large Action Spaces in Robotics. Our implementation uses PyTorch for neural nets and PyBullet for robotic simulation. See helping_hands_rl_envs (also included in ap/helping_hands_rl_envs) for our PyBullet environments.

[Image of our simulated block stacking setup]

Setup Python

Set up a Python 3.6 virtual environment.

env_path=~/ap_env
python3 -m venv $env_path
source ${env_path}/bin/activate
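
Confirm that the environment picked up the right interpreter:

python --version    # should report Python 3.6.x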

Install the required packages.

pip install -r requirements.txt

Install PyTorch and torchvision. We used torch==1.5.1 and torchvision==0.6.1, but make sure the PyTorch version you install matches your CUDA and cuDNN versions. The Fruits World models might be trainable on a CPU, but if you want to replicate our block stacking experiments you will need a GPU.
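
For example, the versions above can be installed with pip (a sketch; the exact command depends on your CUDA setup, so check the PyTorch installation instructions for the right build):

pip install torch==1.5.1 torchvision==0.6.1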

(Optional) Setup MongoDB

We use sacred with a MongoDB backend to store the results of our experiments. You can either set up MongoDB or run our scripts without it. If you opt for the latter, our scripts will print results in the terminal, but you will not be able to generate aggregated results (e.g. running action priors and DQNs ten times and comparing their mean rewards).

Install MongoDB: general guide (Ubuntu-specific guide).

Start MongoDB. You should run this inside a screen session so that the database keeps running after you close your terminal.

mkdir -p data/mongo_db
cd data/mongo_db
mongod --dbpath .
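
For example, with GNU screen (the session name is just illustrative):

screen -S mongo    # open a named session, then run the commands above inside it
# detach with Ctrl-A then D; reattach later with: screen -r mongo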

Create a database.

mongo                           # start the MongoDB client
use experiments                 # switch to the experiments DB
db.test.insert({test: "test"})  # insert a single entry to create the DB
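
Later, once experiments are running with the MongoDB backend, you can check that results are being recorded. Sacred's MongoObserver writes each run into a runs collection, so a query along these lines (a sketch) should list recent runs:

mongo experiments --eval "db.runs.find({}, {experiment: 1, status: 1}).limit(5).toArray()"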

Open ap/constants.py and change the last two lines from

MONGO_URI = None
DB_NAME = None

to

MONGO_URI = "mongodb://localhost"
DB_NAME = "experiments"
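
With these values set, sacred can log to MongoDB. As a rough sketch of how such a configuration is typically wired up (the experiment name here is hypothetical and the repo's actual setup may differ):

from sacred import Experiment
from sacred.observers import MongoObserver

from ap.constants import DB_NAME, MONGO_URI

ex = Experiment("fruits_example")  # hypothetical experiment name
if MONGO_URI is not None:
    # attach the Mongo observer only when a URI is configured
    ex.observers.append(MongoObserver(url=MONGO_URI, db_name=DB_NAME))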

Replicating Fruits World Experiments

Running Experiments

There are two versions of the experiment: picking up fruits in any order (ap/envs/fruits.py) and picking up fruits in a specific order (ap/envs/fruits_seq.py). We instantiate 30 tasks in the former and 20 in the latter. You can find the task lists in ap/cfg/tasks.

Train an expert DQN for each task individually:

python -m ap.sh.fruits_comb.fruits_DQN_models

python -m ap.sh.fruits_seq.fruits_seq_DQN_models

Collect and join the datasets used to train the action priors:

python -m ap.sh.fruits_comb.fruits_DQN_models_collect_data
python -m ap.sh.fruits_comb.fruits_DQN_join_datasets_limit

python -m ap.sh.fruits_seq.fruits_seq_DQN_models_collect_data
python -m ap.sh.fruits_seq.fruits_seq_DQN_join_datasets_limit

Train the action priors. We will get as many action prior networks as there are tasks. For example, data/fruits_seq_AAF_cv/model_5.pt is an action prior that was trained on all tasks except for the 5th one. In our leave-one-out experiment, we train an action prior (or Actor-Mimic) on N-1 tasks and then use it to learn a previously unseen Nth task (see the sketch after the commands below).

python -m ap.sh.fruits_comb.fruits_AAF_cv

python -m ap.sh.fruits_seq.fruits_seq_AAF_cv
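
A minimal sketch of this leave-one-out scheme (hypothetical code, not taken from the repo):

# For each task, train an action prior on all remaining tasks; the held-out
# task is later used to test transfer to an unseen task.
num_tasks = 20  # e.g. the fruits_seq setting
for held_out in range(num_tasks):
    train_tasks = [t for t in range(num_tasks) if t != held_out]
    # the resulting network would correspond to
    # data/fruits_seq_AAF_cv/model_{held_out}.pt
    print(f"model_{held_out}.pt is trained on {len(train_tasks)} tasks")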

Train Actor-Mimic students (baseline). Same logic as above.

python -m ap.sh.fruits_comb.actor_mimic_v2

python -m ap.sh.fruits_seq.seq_actor_mimic_v2

Perform transfer learning with action priors. The transfer scripts repeat each experiment ten times; you can edit them if you want fewer runs.

python -m ap.sh.fruits_comb.fruits_DQN_AAF

python -m ap.sh.fruits_seq.fruits_seq_DQN_AAF

Get results for baselines:

python -m ap.sh.fruits_comb.fruits_DQN_baselines    # no transfer DQN
python -m ap.sh.fruits_comb.fruits_DQN_transfer     # various actor-mimic alternatives
python -m ap.sh.fruits_comb.fruits_DQN_freeze_transfer
python -m ap.sh.fruits_comb.fruits_DQN_side_transfer

python -m ap.sh.fruits_seq.fruits_seq_DQN_baselines    # no transfer DQN
python -m ap.sh.fruits_seq.fruits_seq_DQN_transfer     # various actor-mimic alternatives
python -m ap.sh.fruits_seq.fruits_seq_DQN_freeze_transfer
python -m ap.sh.fruits_seq.fruits_seq_DQN_side_transfer

Analyzing Results

Recreate Figure 2. This script first shows a figure for fruits combinations and then one for fruits sequences.

python -m ap.scr.analyze.fruits_figure

Recreate Table 3 in the appendix. The methods in the table are:

  • fruits_DQN_AAF: action priors transfer
  • fruits_DQN_baselines: no transfer
  • other: actor-mimic transfer, see description in the paper

Num. fruits refers to the difficulty of the task. Mid/fin point/window refer to the performance in the middle or at the end of training; the mid window is averaged over a small window of training steps.

python -m ap.scr.analyze.fruits_comb_table

python -m ap.scr.analyze.fruits_seq_table

Replicating Block Stacking Experiments

Selecting GPUs and Parallelization

Scripts for running simulated block stacking in PyBullet can be found in ap/sh/blocks. Each script takes at least two arguments:

  • --gpu-list: list of GPUs to use (e.g. --gpu-list 0 1)
  • --jobs-per-gpu: number of jobs to run on each GPU (e.g. --jobs-per-gpu 2)

These arguments control the parallelization over the 16 different tasks we have. For example,

python -m ap.sh.blocks.blocks_sdqfd_c --gpu-list 0 1 --jobs-per-gpu 2

will train 16 different SDQfD agents (one for each task). It will run two agents in parallel on each of the two GPUs, so four agents train at the same time.
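
A rough sketch of this kind of scheduling (hypothetical code, not the repo's actual launcher; ap.train_task is a placeholder module name):

import itertools
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

gpu_list = [0, 1]
jobs_per_gpu = 2
num_tasks = 16
gpus = itertools.cycle(gpu_list)  # assign tasks to GPUs round-robin

def run_task(task_id, gpu):
    # restrict the job to one GPU via CUDA_VISIBLE_DEVICES
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    subprocess.run(["python", "-m", "ap.train_task", str(task_id)], env=env)

# at most len(gpu_list) * jobs_per_gpu jobs run at once; per-GPU counts can
# drift slightly as jobs finish at different times
with ThreadPoolExecutor(max_workers=len(gpu_list) * jobs_per_gpu) as pool:
    for task_id in range(num_tasks):
        pool.submit(run_task, task_id, next(gpus))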

To generate the results in our paper, we used a machine with four NVIDIA RTX 2080 Ti GPUs, 256 GB of RAM and an AMD Ryzen Threadripper CPU. Each experiment took about two days when we ran two agents on each GPU (i.e. eight agents trained in parallel). SDQfD training takes about one day, and collecting deconstruction episodes takes a couple of hours.

You can reduce the training time by training on fewer tasks. Simply delete some tasks from ap/cfg/tasks/block_stacking.json.
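
Assuming the file's top level is a JSON list of task definitions (an assumption; check the file before relying on this), you can verify how many tasks remain with:

python -c "import json; print(len(json.load(open('ap/cfg/tasks/block_stacking.json'))))"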

Running Experiments

Collect deconstruction episodes. It's around 50 GB of data for 16 tasks.

python -m ap.sh.blocks.blocks_collect_dec_c

Train SDQfD experts on each task:

python -m ap.sh.blocks.blocks_sdqfd_c

Collect training data for the action priors using the experts. While running, these scripts might need up to 200 GB of storage for intermediate files (we store a number of uncompressed images and binary maps), but the final dataset is much smaller.

python -m ap.sh.blocks.blocks_sdqfd_c_collect
python -m ap.sh.blocks.blocks_sdqfd_c_add_labels
python -m ap.sh.blocks.blocks_sdqfd_c_join_labels
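
Given the storage requirements above, it can help to watch disk usage while these scripts run (this assumes the intermediate files are written under data/, as in the fruits experiments):

df -h .        # free space on the current filesystem
du -sh data/*  # sizes of the directories under data/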

Train action priors:

python -m ap.sh.blocks.blocks_aaf_c_cv

Perform transfer learning with action priors and baselines:

python -m ap.sh.blocks.blocks_DQN_AAF   # action prior exploration
python -m ap.sh.blocks.blocks_DQN_HS    # heuristic exploration
python -m ap.sh.blocks.blocks_DQN_RS    # random exploration

Analyzing Results

Print the final per-task performance for action prior exploration (blocks_DQN_AAF), heuristic exploration (blocks_DQN_HS) and random exploration (blocks_DQN_RS).

python -m ap.scr.analyze.blocks_table

Citation

@inproceedings{biza21action,
    title={Action Priors for Large Action Spaces in Robotics}, 
    author={Ondrej Biza and Dian Wang and Robert Platt and Jan-Willem van de Meent and Lawson L. S. Wong},
    year = {2021},
    publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
    address = {Richland, SC},
    booktitle = {Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems},
    location = {Online},
    series = {AAMAS '21}
}


