[ECCV 2024] GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

Jing Wu*1, Jia-Wang Bian*2, Xinghui Li1, Guangrun Wang1, Ian Reid2, Philip Torr1, Victor Adrian Prisacariu1
* denotes equal contribution
1University of Oxford,
2Mohamed bin Zayed University of Artificial Intelligence

⚙️ Installation

  • Tested on CUDA11.8 + Ubuntu22.04 + NeRFStudio1.0.0 (NVIDIA RTX A5000 24G)

Clone the repo.

git clone https://github.com/ActiveVisionLab/gaussctrl.git
cd gaussctrl

1. NeRFStudio and Lang-SAM

conda create -n gaussctrl python=3.8
conda activate gaussctrl
conda install cuda -c nvidia/label/cuda-11.8.0

GaussCtrl is built upon NeRFStudio; follow this link to install NeRFStudio first. If tiny-cuda-nn fails to build, try building it from scratch (see here). We recommend NeRFStudio v1.0.0 with gsplat v0.1.3.

pip install nerfstudio==1.0.0

Install Lang-SAM for mask extraction.

pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git

pip install -r requirements.txt

2. Install GaussCtrl

pip install -e .

3. Verify the install

ns-train -h
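
As a complement to ns-train -h, the following minimal sketch checks that the command-line tools used later in this README can be found on PATH. It assumes that ns-train, ns-viewer and ns-gaussctrl-render are installed as console commands in the active conda environment; the helper name missing_commands is hypothetical and not part of GaussCtrl or NeRFStudio.

```python
import shutil


def missing_commands(commands=("ns-train", "ns-viewer", "ns-gaussctrl-render")):
    """Return the commands from `commands` that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


missing = missing_commands()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All commands found.")
```

If a command is reported as missing, the usual first step is to confirm that the gaussctrl conda environment is active.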

🗄️ Data

Use Our Preprocessed Data

Our preprocessed data are under the data folder, where

We thank these authors for their great work!

Customize Your Data

We recommend resizing your images to 512x512 and following this page to process your data.

▶️ Get Started

Method

1. Train a 3DGS

To get started, you first need to train your 3DGS model. We use splatfacto from NeRFStudio.

ns-train splatfacto --output-dir {output/folder} --experiment-name EXPERIMENT_NAME nerfstudio-data --data {path/to/your/data}

2. Edit your model

Once the splatfacto model has finished training, the checkpoints are saved under the output/folder/EXPERIMENT_NAME folder.
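
The checkpoint path passed to --load-checkpoint ends in a file such as step-000029999.ckpt inside a nerfstudio_models directory. The sketch below picks the checkpoint with the highest step number from such a directory. It assumes that all checkpoints follow the step-<number>.ckpt naming seen in this README; the helper latest_checkpoint is hypothetical, and the temporary directory only stands in for a real nerfstudio_models folder.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(models_dir):
    """Return the step-<number>.ckpt file with the highest step, or None."""
    pattern = re.compile(r"step-(\d+)\.ckpt$")
    candidates = []
    for path in Path(models_dir).glob("step-*.ckpt"):
        match = pattern.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare on the numeric step only.
    return max(candidates, key=lambda item: item[0])[1]


# Demo on a throwaway directory that mimics a nerfstudio_models folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("step-000009999.ckpt", "step-000029999.ckpt"):
        (Path(tmp) / name).touch()
    print(latest_checkpoint(tmp).name)  # step-000029999.ckpt
```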

Start editing your model by running:

ns-train gaussctrl --load-checkpoint {output/folder/.../nerfstudio_models/step-000029999.ckpt} --experiment-name EXPERIMENT_NAME --output-dir {output/folder} --pipeline.datamanager.data {path/to/your/data} --pipeline.prompt "YOUR PROMPT" --pipeline.guidance_scale 5 --pipeline.chunk_size {batch size of images during editing} --pipeline.langsam_obj 'OBJECT TO BE EDITED'

Note that Lang-SAM is optional. If you are editing the environment rather than a specific object, remove the --pipeline.langsam_obj argument:

ns-train gaussctrl --load-checkpoint {output/folder/.../nerfstudio_models/step-000029999.ckpt} --experiment-name EXPERIMENT_NAME --output-dir {output/folder} --pipeline.datamanager.data {path/to/your/data} --pipeline.prompt "YOUR PROMPT" --pipeline.guidance_scale 5 --pipeline.chunk_size {batch size of images during editing}

Here, --pipeline.guidance_scale sets the classifier-free guidance scale used when editing the images, and --pipeline.chunk_size sets the number of images edited together in one batch. On an NVIDIA RTX A5000 GPU (24G), the maximum chunk size is 3 (~22G of memory).

Control the number of reference views with --pipeline.ref_view_num; the default is 4.
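
The two commands above differ only in whether --pipeline.langsam_obj is present. As a minimal sketch, the helper below assembles the argument list from Python and appends the Lang-SAM flag only when an object is given. It uses only flags named in this README; the function build_edit_command and the checkpoint placeholder are hypothetical and not part of GaussCtrl.

```python
import shlex


def build_edit_command(checkpoint, data, prompt, experiment_name,
                       output_dir="outputs", guidance_scale=5,
                       chunk_size=3, ref_view_num=4, langsam_obj=None):
    """Build the `ns-train gaussctrl` argument list described above."""
    args = [
        "ns-train", "gaussctrl",
        "--load-checkpoint", checkpoint,
        "--experiment-name", experiment_name,
        "--output-dir", output_dir,
        "--pipeline.datamanager.data", data,
        "--pipeline.prompt", prompt,
        "--pipeline.guidance_scale", str(guidance_scale),
        "--pipeline.chunk_size", str(chunk_size),
        "--pipeline.ref_view_num", str(ref_view_num),
    ]
    # Lang-SAM is optional: omit the flag when editing the environment.
    if langsam_obj is not None:
        args += ["--pipeline.langsam_obj", langsam_obj]
    return args


cmd = build_edit_command(
    checkpoint="path/to/step-000029999.ckpt",  # placeholder, not a real path
    data="data/bear",
    prompt="a photo of a polar bear in the forest",
    experiment_name="bear",
    langsam_obj="bear",
)
print(shlex.join(cmd))  # shell-quoted command line, ready to copy
```

Passing the list to subprocess.run(cmd, check=True) would launch the edit. Keeping the arguments as a list avoids shell-quoting problems with prompts that contain spaces.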

Small Tips

  • If your edits are not as expected, check the images edited by ControlNet.
  • Conditioning the edit on good ControlNet results usually helps, so choose views that ControlNet edited well as reference views.

🔧 Reproduce Our Results

Experiments in the main paper are included in the scripts folder. To reproduce the results, first train the splatfacto model. We take the bear case as an example here.

ns-train splatfacto --output-dir unedited_models --experiment-name bear nerfstudio-data --data data/bear

Then edit the 3DGS by running:

ns-train gaussctrl --load-checkpoint {unedited_models/bear/splatfacto/.../nerfstudio_models/step-000029999.ckpt} --experiment-name bear --output-dir outputs --pipeline.datamanager.data data/bear --pipeline.prompt "a photo of a polar bear in the forest" --pipeline.guidance_scale 5 --pipeline.chunk_size 3 --pipeline.langsam_obj 'bear' 

In our experiments, we randomly sampled 40 views from the entire dataset to accelerate the method; this is the default set in gc_datamanager.py. We split the dataset into 4 subsets and randomly sampled 10 images from each subset. Feel free to decrease or increase these numbers by modifying --pipeline.datamanager.subset-num and --pipeline.datamanager.sampled-views-every-subset. Set --pipeline.datamanager.load-all to True if you want to edit all the images in the dataset.
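
The sampling scheme described above (4 subsets, 10 views from each, 40 views in total) can be illustrated with a short sketch. It assumes the subsets are contiguous, equally sized ranges of image indices, which this README does not specify; the actual logic in gc_datamanager.py may differ. The function sample_views and its parameter names are hypothetical.

```python
import random


def sample_views(num_images, subset_num=4, views_per_subset=10, seed=0):
    """Split image indices into contiguous subsets and sample from each."""
    rng = random.Random(seed)
    indices = list(range(num_images))
    # Subset boundaries, e.g. [0, 50, 100, 150, 200] for 200 images.
    bounds = [i * num_images // subset_num for i in range(subset_num + 1)]
    chosen = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        subset = indices[start:end]
        # Never request more views than the subset contains.
        k = min(views_per_subset, len(subset))
        chosen.extend(sorted(rng.sample(subset, k)))
    return chosen


views = sample_views(num_images=200)
print(len(views))  # 40
```

Sampling per subset rather than over the whole dataset keeps the selected views spread across the capture sequence.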

📷 View Results Using NeRFStudio Viewer

ns-viewer --load-config {outputs/.../config.yml} 

🎥 Render Your Results

  • Render all the dataset views.
ns-gaussctrl-render dataset --load-config {outputs/.../config.yml} --output_path {render/EXPERIMENT_NAME}
  • Render an mp4 of a camera path.
ns-gaussctrl-render camera-path --load-config {outputs/.../config.yml} --camera-path-filename data/EXPERIMENT_NAME/camera_paths/render-path.json --output_path render/EXPERIMENT_NAME.mp4

Evaluation

We use this code to evaluate our method.

Citation

If you find this code or the paper useful for your research, please consider citing:

@inproceedings{gaussctrl2024,
author = {Wu, Jing and Bian, Jia-Wang and Li, Xinghui and Wang, Guangrun and Reid, Ian and Torr, Philip and Prisacariu, Victor},
title = {{GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing}},
booktitle = {ECCV},
year = {2024},
}
