
manigaussian's Introduction

ManiGaussian

🦾 ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang

[Project Page] | [Paper]

ManiGaussian is an end-to-end behavior-cloning agent that learns to perform a variety of language-conditioned robotic manipulation tasks. It consists of a dynamic Gaussian Splatting framework and a Gaussian world model that together capture scene-level spatiotemporal dynamics. The dynamic Gaussian Splatting framework models the propagation of semantic features in the Gaussian embedding space for manipulation, and the Gaussian world model parameterizes distributions that provide supervision by reconstructing the future scene.

๐Ÿ“ TODO

  • Release pretrained checkpoints.
  • Provide a Dockerfile for installation.

💻 Installation

NOTE: ManiGaussian is mainly built upon the GNFactor repo by Ze et al.

See INSTALL.md for installation instructions.

See ERROR_CATCH.md for common errors and their fixes.

🛠️ Usage

The following steps should be carried out in order.

🦉 Generate Demonstrations

To generate demonstrations for all 10 tasks we use in our paper, run:

bash scripts/gen_demonstrations_all.sh

📈 Training

We use wandb to log curves and visualizations. Log in to wandb before running the scripts.

wandb login
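If you prefer to run without wandb, the Hydra config appears to expose framework.use_wandb and method.use_wandb flags (these overrides show up in the issue logs further down this page); a hedged sketch, assuming you pass the overrides to train.py directly rather than through the wrapper scripts:

python train.py method=ManiGaussian_BC framework.use_wandb=False method.use_wandb=False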

To train our ManiGaussian without semantic features and the deformation predictor (the fastest version), run:

bash scripts/train_and_eval_w_geo.sh ManiGaussian_BC 0,1 12345 ${exp_name}

where ${exp_name} can be any experiment name you like. You can also train other baselines such as GNFACTOR_BC and PERACT_BC.
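For example, assuming GPUs 0 and 1 and a free master port, with placeholder experiment names (the second line assumes the same script accepts the baseline agent names mentioned above):

bash scripts/train_and_eval_w_geo.sh ManiGaussian_BC 0,1 12345 manigaussian_geo_run1
bash scripts/train_and_eval_w_geo.sh PERACT_BC 0,1 12345 peract_baseline_run1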

To train our ManiGaussian without semantic features, run:

bash scripts/train_and_eval_w_geo_dyna.sh ManiGaussian_BC 0,1 12345 ${exp_name}

To train our ManiGaussian without the deformation predictor, run:

bash scripts/train_and_eval_w_geo_sem.sh ManiGaussian_BC 0,1 12345 ${exp_name}

To train our vanilla ManiGaussian, run:

bash scripts/train_and_eval_w_geo_sem_dyna.sh ManiGaussian_BC 0,1 12345 ${exp_name}

We train our ManiGaussian on two NVIDIA RTX 4090 GPUs for ~1 day.

🧪 Evaluation

To evaluate a trained checkpoint, you can use:

bash scripts/eval.sh ManiGaussian_BC ${exp_name} 0
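For example, reusing the experiment name from the training example above:

bash scripts/eval.sh ManiGaussian_BC manigaussian_geo_run1 0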

NOTE: Performance on push_buttons and stack_blocks may fluctuate slightly across task variations.

📊 Analyze Evaluation Results

After evaluation, use the following command to compute average success rates. For example, to compute the average success rate over our provided CSV files, run:

python scripts/compute_results.py --file_paths ManiGaussian_results/w_geo/0.csv ManiGaussian_results/w_geo/1.csv ManiGaussian_results/w_geo/2.csv --method last
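If you want to inspect the numbers yourself, below is a minimal sketch of averaging the final success rate across seeds; the CSV column name is an assumption (check the header of your own eval files), and the provided compute_results.py remains the reference implementation.

import pandas as pd

# Minimal sketch: average the last row's success rate across per-seed CSVs.
# The column name "success rate" is an assumption; adjust to your CSV header.
files = [
    "ManiGaussian_results/w_geo/0.csv",
    "ManiGaussian_results/w_geo/1.csv",
    "ManiGaussian_results/w_geo/2.csv",
]
last_rows = [pd.read_csv(f).iloc[-1] for f in files]   # analogue of --method last
rates = [row["success rate"] for row in last_rows]
print("mean success rate over %d seeds: %.2f" % (len(rates), sum(rates) / len(rates)))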

🌴 Checkpoint

Checkpoint: after downloading it to your logs/ folder, you can run the following command to check the result:

python scripts/compute_results.py --file_paths logs/gs_rgb_emb_001_dyna_01_0305/seed0/eval_data.csv --method last

๐Ÿท๏ธ License

This repository is released under the MIT license.

๐Ÿ™ Acknowledgement

Our code is built upon GNFactor, LangSplat, GPS-Gaussian, splatter-image, PerAct, RLBench, pixelNeRF, ODISE, and CLIP. We thank all of these authors for open-sourcing their code and for their great contributions to the community.

🥰 Citation

If you find this repository helpful, please consider citing:

@article{lu2024manigaussian,
      title={ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation}, 
      author={Lu, Guanxing and Zhang, Shiyi and Wang, Ziwei and Liu, Changliu and Lu, Jiwen and Tang, Yansong},
      journal={arXiv preprint arXiv:2403.08321},
      year={2024}
}

manigaussian's Issues

Question about rlbench depth data


I found that the depth data generated by gen_demonstration is quite different from other depth data. Do you think this is an intended result?

Question about lr_scheduler

Hi Guanxing,

I noticed that lr_scheduler is set to False in the ManiGaussian/conf/method/ManiGaussian_BC.yaml config file, resulting in a constant LR of 0.0005 during training. But the paper says, "We also adopt a cosine scheduler with a warmup in the first 3k steps".
I wonder which lr_scheduler setting produces the best performance reported in the paper?

Regards,
Bowen
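For reference, a minimal sketch of the schedule described in the question above (cosine decay with a 3k-step linear warmup, base LR 0.0005 as in the config, total step count taken from the training logs later on this page); this is an illustration using PyTorch's LambdaLR, not the repository's implementation.

import math
import torch

# Illustrative only: cosine decay with a 3k-step linear warmup.
warmup_steps, total_steps, base_lr = 3_000, 100_010, 5e-4

model = torch.nn.Linear(10, 10)  # placeholder module
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# scheduler.step() is then called once per training iteration.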

About scene.step()

"I found that in YARR.yarr.utils.video_utils.py, there is a scene.step() in NeRFTaskRocorder which updates the scene multiple times in a for loop. However, this causes the ball objects to roll on the table. I suspect that this is due to the high frequency of scene updates, amplifying the effect of certain random forces in the scene. However, if I comment out scene.step(), the images of the nerf camera view won't update, although the data can be generated without error. Is there any good solution to this?"

About Visualization

Nice work!

I'm not very familiar with RLBench, so I have a visualization question: how can I visualize the execution process based on eval.py in RLBench, or save a top-view video like the one presented in your paper? Thank you very much.

Error in eval

eval on test set
[2024-08-16 15:10:19,124][root][INFO] - Using env device cuda:0. (this is just always 0)
[2024-08-16 15:10:19,194][root][INFO] - Evaluating seed 0.
use_neural_rendering: False
[NeuralRenderer] mask_gt_rgb: False
Last weight: [100000]
Device count: 1
Using device: cuda:0
device: cuda:0
[NeuralRenderer]: False
CLIP model loaded: RN50

Error: signal 11:

/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libcoppeliaSim.so.1(_Z11_segHandleri+0x2b)[0x7fb5611cca4b]
/lib/x86_64-linux-gnu/libc.so.6(+0x3ef10)[0x7fb69eb15f10]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libcoppeliaSim.so(_Z29simGetInt32Parameter_internaliPi+0x305)[0x7fb427526a55]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libsimExtCustomUI.so(_Z21simGetInt32ParameterEi+0x26)[0x7fb39dbe1366]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libsimExtCustomUI.so(_ZN6Plugin7onStartEv+0x23)[0x7fb39dbc0843]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libsimExtCustomUI.so(simStart+0xb5)[0x7fb39dbbb655]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libcoppeliaSim.so.1(_ZN7CPlugin4loadEv+0xa3)[0x7fb561372cf3]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libcoppeliaSim.so.1(ZN16CPluginContainer9addPluginEPKcS1+0x403)[0x7fb56137a0a3]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libcoppeliaSim.so.1(ZN9CUiThread27__executeCommandViaUiThreadEP16SUIThreadCommandS1+0x1d4a)[0x7fb56130e1da]
/data0/zxt/project/2024/1/mngs/CoppeliaSim_Player_V4_1_0_Ubuntu18_04/libcoppeliaSim.so.1(ZN9CUiThread25executeCommandViaUiThreadEP16SUIThreadCommandS1+0x24)[0x7fb56130eaa4]
QMutex: destroying locked mutex
eclipsed time 50s

_pickle.UnpicklingError: invalid load key, '\x00'.

Dear Author,

I encountered an error when running bash scripts/train_and_eval_w_geo.sh ManiGaussian_BC 0,1,2,3,4,5 5678 ${try_without_tmux}.
I ran this code on 6 RTX 3090s under Ubuntu 20.04 with torch==2.0.0+cu117.

However, it shows an error like this during training:

Error executing job with overrides: ['method=ManiGaussian_BC', 'rlbench.task_name=ManiGaussian_BC_20240627', 'rlbench.demo_path=/home/gjf/codes/ManiGaussian/data/train_data', 'replay.path=/home/gjf/codes/ManiGaussian/replay/ManiGaussian_BC_20240627', 'framework.start_seed=0', 'framework.use_wandb=False', 'method.use_wandb=False', 'framework.wandb_group=ManiGaussian_BC_20240627', 'framework.wandb_name=ManiGaussian_BC_20240627', 'ddp.num_devices=6', 'replay.batch_size=1', 'ddp.master_port=5678', 'rlbench.tasks=[close_jar,open_drawer,sweep_to_dustpan_of_size,meat_off_grill,turn_tap,slide_block_to_color_target,put_item_in_drawer,reach_and_drag,push_buttons,stack_blocks]', 'rlbench.demos=20', 'method.neural_renderer.render_freq=2000']
Traceback (most recent call last):
  File "/home/gjf/codes/ManiGaussian/train.py", line 96, in main
    run_seed_fn.run_seed(
  File "/home/gjf/codes/ManiGaussian/run_seed_fn.py", line 147, in run_seed
    train_runner.start()
  File "/home/gjf/codes/ManiGaussian/third_party/YARR/yarr/runners/offline_train_runner.py", line 200, in start
    batch = self.preprocess_data(data_iter)
  File "/home/gjf/codes/ManiGaussian/third_party/YARR/yarr/runners/offline_train_runner.py", line 121, in preprocess_data
    sampled_batch = next(data_iter) # may raise StopIteration
  File "/home/gjf/miniconda3/envs/manigaussian/lib/python3.9/site-packages/lightning/fabric/wrappers.py", line 178, in __iter__
    for item in self._dataloader:
  File "/home/gjf/miniconda3/envs/manigaussian/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/gjf/miniconda3/envs/manigaussian/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/gjf/miniconda3/envs/manigaussian/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch
    data = next(self.dataset_iter)
  File "/home/gjf/codes/ManiGaussian/third_party/YARR/yarr/replay_buffer/wrappers/pytorch_replay_buffer.py", line 17, in _generator
    yield self._replay_buffer.sample_transition_batch(pack_in_dict=True)
  File "/home/gjf/codes/ManiGaussian/third_party/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 722, in sample_transition_batch
    store = self._get_from_disk(
  File "/home/gjf/codes/ManiGaussian/third_party/YARR/yarr/replay_buffer/uniform_replay_buffer.py", line 391, in _get_from_disk
    d = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '\x00'.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
  0%|▎                                  | 134/100010 [03:10<35:31:44,  1.28s/it]/home/gjf/miniconda3/envs/manigaussian/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 11 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

After that, one of the GPUs stopped working and the whole program got stuck at this point, even when I pressed Ctrl+C. This happens every time, soon after training starts.

By the way, I did not use tmux or wandb; would this matter?

Could you please help me with this issue?

Bad demo. Error in task weighing_scales. Demo was completed, but was not successful.

When I run gen_demonstrations.sh, there are errors like this:

Bad demo. Error in task weighing_scales. Demo was completed, but was not successful. Attempts left: 8
Bad demo. Error in task weighing_scales. Demo was completed, but was not successful. Attempts left: 7
Bad demo. Error in task weighing_scales. Demo was completed, but was not successful. Attempts left: 6
......
Is this normal?

Error on pickle.load()

Is there any workaround for the code below? It keeps raising an error.

ManiGaussian/third_party/YARR/yarr/replay_buffer/uniform_replay_buffer.py line 391: d = pickle.load(f)

# Here we fake a mini store (buffer)
store = {store_element.name: {}
         for store_element in self._storage_signature}
if start_index % self._replay_capacity < end_index % self._replay_capacity:
    for i in range(start_index, end_index):
        with open(join(self._save_dir, '%d.replay' % i), 'rb') as f:
            d = pickle.load(f)
            # FIXME: _pickle.UnpicklingError: invalid load key, '\x00' (e.g. replay file 9656)
            for k, v in d.items():
                store[k][i] = v
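A minimal sketch of one way to find corrupted replay files up front, using the '%d.replay' naming from the snippet above; replay_dir is a placeholder for whatever directory replay.path points to.

import pickle
from pathlib import Path

# Hypothetical helper: report .replay files that fail to unpickle so they can be
# regenerated before training.
replay_dir = Path("path/to/replay")  # placeholder: the directory set via replay.path

bad = []
for path in sorted(replay_dir.glob("*.replay")):
    try:
        with path.open("rb") as f:
            pickle.load(f)
    except Exception as exc:  # e.g. _pickle.UnpicklingError: invalid load key, '\x00'
        bad.append((path.name, exc))

for name, exc in bad:
    print("corrupt replay file: %s (%s)" % (name, exc))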

I will list the work that still needs to be done after installing the libraries in requirements.txt

  1. Install CoppeliaSim, PyRep and RLBench. After installation, "Pyrep.utils" or "Pyrep.utils.video_utils" and "VARIATIONS_ALL_FOLDER" may not be found. In this case, use the PyRep and RLBench copies in the third_party directory to replace the corresponding packages in your environment (a rough shell sketch follows this list).
  2. Install YARR. Download and install it from the corresponding GitHub page; do not use pip.
  3. Install opencv.
  4. The following error may occur at runtime: "libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory". In this case, follow the instructions at https://blog.csdn.net/peng_258/article/details/132500323.
  5. The following error may then appear: "mesa-loader: failed to open swrast: /lib/x86_64-linux-gnu/libllvm-12.so.1: undefined symbol: ffi_type_sint32". In this case, follow the suggestion at https://github.com/elerac/EasyPySpin/issues/12#issuecomment-1467610326.
  6. Install hydra and pytorch.
If there is anything missing, you are welcome to add it.
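A rough shell sketch of the package-replacement step above; both the pip package names and the directory names under third_party/ are assumptions inferred from paths that appear elsewhere on this page, so check your local checkout before running anything.

pip uninstall -y pyrep rlbench
pip install -e third_party/PyRep
pip install -e third_party/RLBench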

CUDA_HOME not set

When I was installing the Gaussian Splatting renderer in step 12, I got this error:
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Any hint for it?
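A common fix, assuming a standard CUDA toolkit install under /usr/local/cuda (adjust the path to your own installation), is to export CUDA_HOME before building the renderer:

export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH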

Error in swrast

About the following error:

libGL error: failed to load driver: swrast

I solved this by using:

sudo apt-get install --reinstall libffi7
conda install -c conda-forge libffi

This link can also help.

I'm noting this here for anyone else who runs into the same error.

About real world experiments

Hi,

I noticed that there are no real-world experiments for this method. I was wondering if there are issues with real-world implementation or if there is another reason.

Thanks.

Questions about multi-task test results

Hello, the success rates reported in your paper for the 10 tasks are 28%, 76%, 64%, 60%, 56%, 24%, 16%, 92%, 20% and 12%. However, after evaluation, I get 24%, 68%, 28%, 44%, 48%, 12%, 8%, 44%, 20% and 4%. Do you know the reason? Thanks.

Bus error (core dumped) when evaluating the results

Hi, I got a bus error (core dumped) when executing the evaluation code.

Do you know how to handle this? I think the training process finished properly, and the data exists too. Have you faced this problem?

Plans for releasing Docker

Do you have any plans for releasing a Docker image?

Do you know how much time it will take to generate all demonstrations?

I'm using one L40S GPU and there has been no further progress for almost an hour. Does it take longer than an hour to generate the close_jar demos, or is there some problem with generating the demonstrations?

There's no output in the train_data folder.

When to release pretrained checkpoints?

Thank you for your great work. I'd like to replicate it, but I don't have the equipment for training. When will you release the pretrained checkpoints?

Error in training

Hi Guanxing,

Thanks for your great work!

I tried to run the training script bash scripts/train_and_eval_w_geo_sem_dyna.sh ManiGaussian_BC 0,1 12345 ${exp_name}, but encountered an ImportError.

Have you ever encountered such an ImportError in mask2former?
Looking forward to your reply!


Regards,
Bowen
