
To the Noise and Back: Diffusion for Shared Autonomy

Takuma Yoneda, Luzhe Sun, Ge Yang, Bradly C. Stadie, Matthew R. Walter

[Paper] [Website]

[Figure: forward and reverse diffusion]

Prerequisite

Our scripts use Weights & Biases (WandB) to log metrics. Please make sure to set the WANDB_API_KEY environment variable:

export WANDB_API_KEY=<your WANDB key>

If you don't plan to use WandB, you can instead set the environment variable WANDB_MODE=disabled, which makes all WandB functions behave as no-ops. You should then replace wandb.log(...) with your own logging method.
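
For instance, a minimal drop-in replacement might look like the following (an illustrative sketch only; the log fallback below is not part of this repository):

import os

if os.environ.get("WANDB_MODE") == "disabled":
    def log(metrics, step=None):
        # Console fallback instead of sending metrics to WandB.
        prefix = f"[step {step}] " if step is not None else ""
        print(prefix + ", ".join(f"{k}={v}" for k, v in metrics.items()))
else:
    import wandb
    log = wandb.log  # use the real WandB logger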

Quickstart 🚀

git clone https://github.com/ripl/diffusion-for-shared-autonomy.git
cd diffusion-for-shared-autonomy
mkdir output-dir
export CODE_DIR=`pwd`
export DATA_DIR=$CODE_DIR/data-dir  # Pretrained models are stored here
export OUT_DIR=$CODE_DIR/output-dir

You can pull our docker image and run a container via

docker run -it --gpus all -e WANDB_API_KEY=$WANDB_API_KEY -v $CODE_DIR:/code -v $DATA_DIR:/data -v $OUT_DIR:/outdir --workdir /code ripl/diffusion-for-shared-autonomy bash
If you use Singularity
# Pull the image from dockerhub
singularity pull diffusion-for-shared-autonomy.sif docker://ripl/diffusion-for-shared-autonomy:latest
export SINGULARITYENV_WANDB_API_KEY=$WANDB_API_KEY  # Singularity way to pass an envvar into a container
singularity run --nv --containall --writable-tmpfs -B $CODE_DIR:/code -B $DATA_DIR:/data -B $OUT_DIR:/outdir --pwd /code diffusion-for-shared-autonomy.sif bash

Now, inside the container, you can run surrogate pilots with or without the assistance of our diffusion models:

python -m diffusha.diffusion.evaluation.eval_assistance --env-name LunarLander-v1 --out-dir /outdir --save-video

This takes ~30 min.

--env-name can be one of the following:

  • LunarLander-v1 (Lunar Reacher)
  • LunarLander-v5 (Lunar Lander)
  • BlockPushMultimodal-v1

This generates and logs videos to the WandB console, and saves evaluation statistics under $OUT_DIR.

NOTE: The following errors can be safely ignored

ALSA lib confmisc.c:767:(parse_card) cannot find card '0'
ALSA lib conf.c:4732:(_snd_config_evaluate) function xxxxx returned error: No such file or directory


Going through the entire pipeline 🐢

The following sections describe how to manually run all the steps of our pipeline. Specifically,

  1. Pretraining expert policies
  2. Collecting demonstrations
  3. Training a diffusion model

Preparing the dataset for training a diffusion model

You can either follow the instructions below, or download the resulting dataset:
cd diffusion-for-shared-autonomy
wget https://dl.ttic.edu/diffusion-for-shared-autonomy.tar.gz
tar xfvz diffusion-for-shared-autonomy.tar.gz
mv hosted_data/* data-dir/

Pretraining expert policies

We first need to train an expert policy for each task; we use Soft Actor-Critic (SAC). (You can also download the pretrained SAC models from the link in the previous section.)

Lunar Reacher

python -m diffusha.data_collection.train_sac --env-name LunarLander-v1 --steps 3000000

Lunar Lander

python -m diffusha.data_collection.train_sac --env-name LunarLander-v5 --steps 3000000

Block Pushing

python -m diffusha.data_collection.train_sac --env-name BlockPushMultimodal-v1 --steps 1000000

The checkpoints will be saved under Path(Args.sac_model_dir) / args.env_name.lower() / wandb.run.project / wandb.run.id. Args.sac_model_dir can be configured in diffusha/config/default_args.py.

wandb.run.project and wandb.run.id are used here only to ensure that each run gets a unique ID. You can simply replace them with other strings if you don't use WandB.
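
For illustration, the resulting checkpoint path is composed like this (the concrete values below are hypothetical stand-ins):

from pathlib import Path

# "my-project" / "my-run-id" stand in for wandb.run.project / wandb.run.id
# when WandB is not used (hypothetical values).
sac_model_dir = Path("/data/sac_models")  # example value for Args.sac_model_dir
ckpt_dir = sac_model_dir / "LunarLander-v1".lower() / "my-project" / "my-run-id"
print(ckpt_dir)  # /data/sac_models/lunarlander-v1/my-project/my-run-id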

Roll out expert policies to collect demonstrations

Once expert policies are obtained, we use them to collect expert demonstrations.

You can also download the generated demonstrations from the link in the previous section.

  1. Store the pretrained expert models in the following locations:
  • Lunar Reacher: $DATA_DIR/experts/lunarlander/v1
  • Lunar Lander: $DATA_DIR/experts/lunarlander/v5
  • Block Pushing: $DATA_DIR/experts/blockpush
  2. Run generate_data.py with a sweep file for each task, as shown below.

Lunar Reacher

python -m diffusha.data_collection.generate_data -l 0 --sweep-file diffusha/data_collection/config/sweep/sweep_lander-v1.jsonl

Running this is equivalent to running python -m diffusha.data_collection.generate_data after manually editing diffusha/data_collection/config/default_args.py:

class DCArgs(PrefixProto):
    ...
    env_name = 'LunarLander-v1'
    valid_return_threshold = 800
    randp = 0.0
    ...
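
Each line of a sweep file is one JSON configuration, and -l selects a line by zero-based index (the diffusion training section below uses -l 0 and -l 1 against a single sweep file in the same way). Line 0 of sweep_lander-v1.jsonl therefore presumably carries the overrides above; a plausible line would be (hypothetical content; check the actual file in the repo):

{"env_name": "LunarLander-v1", "valid_return_threshold": 800, "randp": 0.0}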

Lunar Lander

python -m diffusha.data_collection.generate_data -l 0 --sweep-file diffusha/data_collection/config/sweep/sweep_lander-v5.jsonl

Block Pushing

python -m diffusha.data_collection.generate_data -l 0 --sweep-file diffusha/data_collection/config/sweep/sweep-blockpush.jsonl

Block Pushing (Running flip-replay)
For the Block Pushing task, we only collect demonstrations that push a block to one of the goals. We obtain demonstrations that reach the other goal by "flipping" the collected trajectories.

python -m diffusha.data_collection.flip_replay /data/replay/blockpush/target/randp_0.0 /data/replay/blockpush/target-flipped/randp_0.0
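
Conceptually, "flipping" mirrors each transition about the task's axis of symmetry, so that a push toward one goal becomes a push toward the other. A rough sketch of the idea (the field layout and the mirrored coordinate index are hypothetical; see diffusha/data_collection/flip_replay.py for the actual transform):

import numpy as np

def flip_transition(obs: np.ndarray, act: np.ndarray, x_index: int = 0):
    # Negate the coordinate along the (assumed) axis of symmetry in both
    # the observation and the action, leaving everything else unchanged.
    obs, act = obs.copy(), act.copy()
    obs[x_index] = -obs[x_index]
    act[x_index] = -act[x_index]
    return obs, act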

Training a diffusion model

After obtaining demonstrations, we are finally ready to train a diffusion model. Store the generated demonstrations in the correct locations:

  • Lunar Reacher: $DATA_DIR/replay/lunarlander/v1/randp_0.0
  • Lunar Lander: $DATA_DIR/replay/lunarlander/v5/randp_0.0
  • Block Pushing:
    • $DATA_DIR/replay/blockpush/target/randp_0.0
    • $DATA_DIR/replay/blockpush/target-flipped/randp_0.0 (flipped replay)

Run the following command from the root of the project directory:

Lunar Lander / Reacher

  • reacher task: python -m diffusha.diffusion.train --sweep-file diffusha/config/sweep/sweep-lunarlander.jsonl -l 0
  • lander task: python -m diffusha.diffusion.train --sweep-file diffusha/config/sweep/sweep-lunarlander.jsonl -l 1

Block Pushing

  • python -m diffusha.diffusion.train --sweep-file diffusha/config/sweep/sweep-blockpush.jsonl -l 0

If you find our work useful in your research, please consider citing the paper as follows:

@article{yoneda2023diffusha,
    title    = {To the Noise and Back: {D}iffusion for Shared Autonomy},
    author   = {Yoneda, Takuma and Sun, Luzhe and Stadie, Bradly and Yang, Ge and Walter, Matthew R.},
    journal  = {arXiv preprint arXiv:2302.12244},
    year     = {2023},
    doi      = {10.48550/ARXIV.2302.12244},
    url      = {https://arxiv.org/abs/2302.12244},
    keywords = {Robotics (cs.RO), Machine Learning (cs.LG), FOS: Computer and information sciences},
}
