

ManiGaussian

🦾 ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang

[Project Page] | [Paper]

ManiGaussian is an end-to-end behavior cloning agent that learns to perform various language-conditioned robotic manipulation tasks. It consists of a dynamic Gaussian Splatting framework and a Gaussian world model, which together model scene-level spatiotemporal dynamics. The dynamic Gaussian Splatting framework models the propagation of semantic features in the Gaussian embedding space for manipulation, and the Gaussian world model parameterizes distributions that provide supervision by reconstructing the future scene.

๐Ÿ“ TODO

  • Release pretrained checkpoints.
  • Provide more results (csv files).
  • Provide a Dockerfile for installation.

💻 Installation

NOTE: ManiGaussian is mainly built upon the GNFactor repo by Ze et al.

See INSTALL.md for installation instructions.

See ERROR_CATCH.md for troubleshooting common errors.

๐Ÿ› ๏ธ Usage

Follow the steps below in order.

🦉 Generate Demonstrations

To generate demonstrations for all 10 tasks used in our paper, run:

bash scripts/gen_demonstrations_all.sh

📈 Training

We use wandb to log training curves and visualizations. Log in to wandb before running the scripts:

wandb login

To train our ManiGaussian without semantic features and deformation predictor (the fastest version), run:

bash scripts/train_and_eval_w_geo.sh ManiGaussian_BC 0,1 12345 ${exp_name}

where ${exp_name} is an experiment name of your choice. You can also train other baselines, such as GNFACTOR_BC and PERACT_BC.

To train our ManiGaussian without semantic features, run:

bash scripts/train_and_eval_w_geo_dyna.sh ManiGaussian_BC 0,1 12345 ${exp_name}

To train our ManiGaussian without deformation predictor, run:

bash scripts/train_and_eval_w_geo_sem.sh ManiGaussian_BC 0,1 12345 ${exp_name}

To train the full ManiGaussian (with both semantic features and the deformation predictor), run:

bash scripts/train_and_eval_w_geo_sem_dyna.sh ManiGaussian_BC 0,1 12345 ${exp_name}
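The four training commands above differ only in the script name: the `_sem` suffix enables semantic features and the `_dyna` suffix enables the deformation predictor. As a minimal sketch, the hypothetical bash helper below (not part of the repository) picks the script from two 0/1 switches and prints the resulting command; the positional arguments are copied verbatim from the commands above, and `my_experiment` is a placeholder name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the ManiGaussian repo: map the two ablation
# switches described above to the corresponding training script.
#   $1: use semantic features (0 or 1)
#   $2: use deformation predictor (0 or 1)
manigaussian_train_script() {
    local sem="$1" dyna="$2"
    local name="scripts/train_and_eval_w_geo"
    # Semantic features add the "_sem" suffix.
    if [ "$sem" = "1" ]; then
        name="${name}_sem"
    fi
    # The deformation predictor adds the "_dyna" suffix (always after "_sem").
    if [ "$dyna" = "1" ]; then
        name="${name}_dyna"
    fi
    echo "${name}.sh"
}

# Usage: print (not run) the command for the full model.
exp_name=my_experiment   # placeholder experiment name
script=$(manigaussian_train_script 1 1)
echo "bash ${script} ManiGaussian_BC 0,1 12345 ${exp_name}"
```

Printing the command instead of executing it keeps the sketch side-effect free; drop the `echo` to launch training directly.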

We train our ManiGaussian on two NVIDIA RTX 4090 GPUs for ~1 day.

🧪 Evaluation

To evaluate a trained checkpoint, run:

bash scripts/eval.sh ManiGaussian_BC ${exp_name} 0

NOTE: Performance on push_buttons and stack_blocks may fluctuate slightly between evaluations due to their different task variations.

📊 Analyze Evaluation Results

After evaluation, use scripts/compute_results.py to compute average success rates. For example, to compute the average success rate from our provided csv files, run:

python scripts/compute_results.py --file_paths ManiGaussian_results/w_geo/0.csv ManiGaussian_results/w_geo/1.csv ManiGaussian_results/w_geo/2.csv --method last
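scripts/compute_results.py is the supported way to compute success rates. Purely to illustrate the arithmetic, here is a hypothetical bash sketch that averages success rates over several CSV files with awk. It assumes each file has a header row and stores a success rate (in %) in its last column; this file format is an assumption, the values below are made up, and the `--method last` option of the real script is not reproduced.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repo's compute_results.py.
# ASSUMPTION: each CSV has a header row and the success rate (%) in its last column.
average_success() {
    # Skip each file's header (FNR == 1) and average the last field over all
    # remaining rows of all files; reads stdin when no files are given.
    awk -F',' 'FNR > 1 { sum += $NF; n++ } END { if (n > 0) printf "%.2f\n", sum / n }' "$@"
}

# Usage with two small made-up files (illustrative values, not real results).
tmpdir=$(mktemp -d)
printf 'task,success_rate\ntask_a,40\ntask_b,80\n' > "$tmpdir/0.csv"
printf 'task,success_rate\ntask_a,60\ntask_b,70\n' > "$tmpdir/1.csv"
average_success "$tmpdir/0.csv" "$tmpdir/1.csv"   # prints 62.50
rm -r "$tmpdir"
```

This pools every row from every file into one mean, which equals the mean of per-file averages only when each file lists the same number of tasks.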

๐Ÿท๏ธ License

This repository is released under the MIT license.

๐Ÿ™ Acknowledgement

Our code is built upon GNFactor, LangSplat, GPS-Gaussian, splatter-image, PerAct, RLBench, pixelNeRF, ODISE, and CLIP. We thank all these authors for open-sourcing their code and for their great contributions to the community.

🥰 Citation

If you find this repository helpful, please consider citing:

@article{lu2024manigaussian,
      title={ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation}, 
      author={Lu, Guanxing and Zhang, Shiyi and Wang, Ziwei and Liu, Changliu and Lu, Jiwen and Tang, Yansong},
      journal={arXiv preprint arXiv:2403.08321},
      year={2024}
}
