Lessons from Learning to Spin “Pens”

This repository contains a reference PyTorch implementation of the paper:

Lessons from Learning to Spin “Pens”
Jun Wang*, Ying Yuan*, Haichuan Che*, Haozhi Qi*, Yi Ma, Jitendra Malik, Xiaolong Wang
[Website] [Paper]

Installation

See installation instructions.

Introduction

Our pen spinning method contains the following four steps.

1. Learn an oracle policy with privileged information, point clouds, and tactile sensor output with RL in simulation.
2. Learn a student policy using rollouts of the oracle policy, also in simulation.
3. Roll out trajectories generated by the oracle policy on a real robot, with the initial state distribution matched. Successful trajectories are collected and failures are discarded.
4. Fine-tune the student policy from step 2 with the successful real-world trajectories.
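
The filtering in step 3 can be pictured with a minimal sketch. The `Trajectory` class, its `success` flag, and the `keep_successful` helper are hypothetical names used only for illustration; how a replay is judged successful is not specified here and is left to whoever labels the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one real-robot replay."""
    name: str
    success: bool  # assumed to be labeled after the replay finishes


def keep_successful(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Keep the successful replays and drop the failures."""
    return [t for t in trajectories if t.success]


replays = [
    Trajectory("replay_0", True),
    Trajectory("replay_1", False),
    Trajectory("replay_2", True),
]
kept = keep_successful(replays)
print([t.name for t in kept])
```

Only the trajectories in `kept` would then be used as fine-tuning data in step 4.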

The following sections provide only example scripts for our method. For baselines, check out baselines.

Step 0: Visualize a Pre-trained Oracle Policy

cd outputs/AllegroHandHora
gdown 1LCRFE6lvKSUDPpUfEATOmpDUPDbB7n8d
unzip demo.zip -d ./
cd ../../
scripts/vis_teacher.sh demo

Step 1: Oracle Policy Training

To train an oracle policy $f$ with RL, run

# 0 is the GPU ID
# 42 is experiment seed
scripts/train_teacher.sh 0 42 output_name
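
If you want to train several seeds, the command lines can be generated programmatically. The sketch below only builds and prints the argument lists in the positional order shown above (GPU ID, seed, output name); the helper name `train_teacher_commands` and the `oracle_seed{N}` naming scheme are assumptions, not part of this repository.

```python
import shlex
from typing import List


def train_teacher_commands(gpu_id: int, seeds: List[int], prefix: str) -> List[List[str]]:
    """Build one argv list per seed: script, GPU ID, seed, output name."""
    return [
        ["scripts/train_teacher.sh", str(gpu_id), str(seed), f"{prefix}_seed{seed}"]
        for seed in seeds
    ]


commands = train_teacher_commands(0, [42, 43], "oracle")
for argv in commands:
    # Print the shell-quoted command; nothing is executed here.
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the repository root to launch the runs one after another.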

After training your oracle policy, you can visualize it as follows:

scripts/vis_teacher.sh output_name

Step 2: Student Policy Pretraining

In this section, we train a proprioceptive student policy by distilling from our trained oracle policy $f$.

Note that we train the student policy on rollouts of the teacher (oracle) policy, in contrast to the DAgger procedure used in previous works.

scripts/train_student_sim.sh train.ppo.is_demon=True train.demon_path=ORACLE_CHECKPOINT_PATH
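
To illustrate the distinction drawn above between teacher rollouts and DAgger, here is a toy sketch rather than the training code of this repository. It assumes a one-dimensional state, made-up dynamics `step`, and hand-written `teacher` and `student` functions. In both cases the teacher supplies the action labels; what differs is which policy drives the rollout and therefore which states end up in the dataset. Full DAgger also aggregates data over iterations and may mix the two policies, which is omitted here.

```python
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


def step(state: float, action: float) -> float:
    """Toy dynamics (an assumption for illustration): the action shifts the state."""
    return state + action


def collect(rollout_policy: Policy, label_policy: Policy,
            init_state: float, horizon: int) -> List[Tuple[float, float]]:
    """Visit states by running rollout_policy; label each state with label_policy."""
    data = []
    state = init_state
    for _ in range(horizon):
        data.append((state, label_policy(state)))
        state = step(state, rollout_policy(state))
    return data


def teacher(state: float) -> float:
    return -0.5 * state  # stand-in for the oracle policy


def student(state: float) -> float:
    return 0.0  # stand-in for an untrained student


# Teacher rollout: the teacher both drives the rollout and labels the states.
teacher_data = collect(teacher, teacher, init_state=1.0, horizon=3)
# DAgger-style: the student drives the rollout, the teacher labels the states.
dagger_data = collect(student, teacher, init_state=1.0, horizon=3)
print(teacher_data)
print(dagger_data)
```

With teacher rollouts the dataset covers the states the teacher itself reaches, whereas the DAgger-style dataset covers the states the current student reaches.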

We provide a reference teacher checkpoint on Google Drive.

Step 3: Open-Loop Replay on Real Hardware

To generate open-loop replay data for the student policy $\pi$, run

python real/robot_controller/teacher_replay.py --data-collect --exp=0 --replay_data_dir=REPLAY_DATA_DIR

where REPLAY_DATA_DIR is the directory to save the replay data.

Then process the replay data.

Step 4: Real-world Fine-tuning

To fine-tune the student policy $\pi$ using real data, run

scripts/finetune_ppo.sh --real-dataset-folder=REAL_DATA_PATH --checkpoint-path=YOUR_CHECKPOINTPATH

Real Data Download

Please download the real reference data from Google Drive.

Real data:
  real_data.h5 is an HDF5 file that contains the following keys:
  - replay_demon_{idx}: the idx-th replay demonstration data
    - qpos: the current qpos of the robot
    - action: the delta action applied to the robot
    - current_target_qpos: the target qpos of the robot

  real_data_full.h5 is the full version of real_data.h5 and contains the following keys:
  - replay_demon_{idx}: the idx-th replay demonstration data
    - qpos: the current qpos of the robot
    - action: the delta action applied to the robot
    - current_target_qpos: the target qpos of the robot
    - rgb_ori: the original rgb image
    - rgb_c2d: the rgb image after camera2depth image processing
    - depth: the depth image
    - pc: the point cloud
    - obj_ends: the position of object ends
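
A quick way to check a downloaded file against the layout above is to list the shape of every entry. The sketch below assumes each `replay_demon_{idx}` key is an HDF5 group whose children (`qpos`, `action`, and so on) are datasets; the helper names `summarize_demos` and `summarize_file` are illustrative and not part of this repository. It needs the third-party `h5py` package only when a real file is opened.

```python
from typing import Any, Dict, Mapping, Tuple

Shapes = Dict[str, Tuple[int, ...]]


def summarize_demos(root: Mapping[str, Any], prefix: str = "replay_demon_") -> Dict[str, Shapes]:
    """Map each demonstration name to the shapes of the entries stored under it."""
    summary: Dict[str, Shapes] = {}
    for name, group in root.items():
        if not name.startswith(prefix):
            continue  # skip anything that is not a replay demonstration
        summary[name] = {
            key: tuple(getattr(entry, "shape", ()))  # datasets expose .shape
            for key, entry in group.items()
        }
    return summary


def summarize_file(path: str) -> Dict[str, Shapes]:
    """Open an HDF5 file read-only and summarize its demonstrations."""
    import h5py  # imported lazily so summarize_demos works without h5py

    with h5py.File(path, "r") as f:
        return summarize_demos(f)
```

For example, `summarize_file("real_data.h5")` would return a dictionary keyed by demonstration name, which makes it easy to see whether `qpos`, `action`, and `current_target_qpos` have matching lengths.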

Acknowledgement

Note: This repository is built on Hora and IsaacGymEnvs.

Citing

If you find PenSpin or this codebase helpful in your research, please consider citing:

@article{wang2024penspin,
  author={Wang, Jun and Yuan, Ying and Che, Haichuan and Qi, Haozhi and Ma, Yi and Malik, Jitendra and Wang, Xiaolong},
  title={Lessons from Learning to Spin “Pens”},
  journal={arXiv:2405.07391},
  year={2024}
}
