📸 Chat with NeRF: Grounding 3D Objects in Neural Radiance Field through Dialog

💡 Highlight

Open-Vocabulary 3D Localization. Locate anything with natural language dialog!
Interactive Grounding. Humans will be able to chat with an agent to localize novel objects.

🔥 News

[2023-09-21] We released a paper, LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent, which includes more quantitative evaluations of our pipeline!
[2023-05-31] We improved the demo by adding grounding result visualization in 3D, taking pictures in real time, and speeding up inference by parallelization. Try out the new demo!
[2023-05-15] The first version of chat-with-nerf is available now! Please try out the demo!

🏷️ TODO

A faster process to determine camera poses and render pictures. See discussion #15. Implemented in #17.
Use LLaVA to replace BLIP-2 for better image captioning.
Improve the foundation model used in LERF for grounding (currently CLIP), which could improve spatial and affordance understanding. Potential candidates: LLaVA, BLIP-2, OWL-ViT.

🛠️ Install

To install the dependencies we provide a Dockerfile:

docker build -t chat-with-nerf:latest .

Or, to save significant time, pull the prebuilt image from Docker Hub:

docker pull jedyang97/chat-with-nerf:latest

Otherwise, if you prefer to build the environment locally without Docker:

conda create --name nerfstudio -y python=3.8
conda activate nerfstudio
pip install torch==1.13.1 torchvision functorch --extra-index-url https://download.pytorch.org/whl/cu117
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install nerfstudio

git clone https://github.com/kerrj/lerf
cd lerf
python -m pip install -e .
ns-train -h

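Note that a specific CUDA version is required. CUDA 11.3 is the stated requirement, although the pip command above pulls PyTorch wheels built for CUDA 11.7 (cu117); for further information, please check the nerfstudio installation guide.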
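
To see which CUDA build your PyTorch install actually targets, a small check like the following may help. It is a minimal sketch and not part of chat-with-nerf: the helper name cuda_build_matches is hypothetical, and the expected version ("11.7" here, taken from the cu117 wheel index in the pip command) is an assumption you should confirm against the nerfstudio installation guide.

```python
from typing import Optional


def cuda_build_matches(build_version: Optional[str], expected: str) -> bool:
    """Return True if a CUDA build string such as "11.7" matches `expected`.

    `expected` may be a full version ("11.7") or a major version ("11").
    `build_version` is None for CPU-only PyTorch builds.
    """
    if build_version is None:
        return False
    return build_version == expected or build_version.startswith(expected + ".")


if __name__ == "__main__":
    try:
        import torch  # only available inside the nerfstudio environment
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("PyTorch version:", torch.__version__)
        print("CUDA build:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
        # Assumption: "11.7" comes from the cu117 wheel index used above.
        print("Matches 11.7:", cuda_build_matches(torch.version.cuda, "11.7"))
```

Run it inside the activated nerfstudio environment; a CUDA build of None means a CPU-only PyTorch wheel was installed.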

Then clone this repository locally:

git clone https://github.com/sled-group/chat-with-nerf.git

Download and construct the llava-13b-v0 checkpoint (see LLaVA's documentation on how to construct the checkpoint). Then, assuming the constructed checkpoint is stored under <my_path_to_llava>/llava-13b-v0, move it to /chat-with-nerf/pre-trained-weights/LLaVA:

cd chat-with-nerf
mkdir -p pre-trained-weights/LLaVA
cd pre-trained-weights/LLaVA
mv <my_path_to_llava>/llava-13b-v0 .

Alternatively, you can supply a different version of the LLaVA checkpoint and change the value of LLAVA_PATH in chat_with_nerf/settings.py:

    LLAVA_PATH = "/workspace/pre-trained-weights/LLaVA/<my_llava_checkpoint>"
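
Before launching the demo, it can be worth confirming that the directory LLAVA_PATH points to actually holds a checkpoint. The following is a minimal sketch, not code from chat-with-nerf: the function name checkpoint_problems is hypothetical, and the expected file list assumes a Hugging Face style checkpoint that contains a config.json, so adjust it to whatever your checkpoint contains.

```python
from pathlib import Path
from typing import List, Sequence

# Assumption: a Hugging Face style checkpoint directory contains config.json.
EXPECTED_FILES = ("config.json",)


def checkpoint_problems(
    checkpoint_dir: str, expected: Sequence[str] = EXPECTED_FILES
) -> List[str]:
    """Return human-readable problems found in a checkpoint directory.

    An empty list means the directory exists and every expected file is present.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    return [f"missing {name}" for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Path inside the container for the default llava-13b-v0 layout.
    path = "/workspace/pre-trained-weights/LLaVA/llava-13b-v0"
    problems = checkpoint_problems(path)
    print("OK" if not problems else "\n".join(problems))
```

Pass the same string you set for LLAVA_PATH if you use a different checkpoint.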

Open up the directory's permissions for the Docker container:

cd <parent_path_chat-with-nerf>
chmod -R 777 .

If using Docker, use the following command to spin up a container with chat-with-nerf mounted under /workspace:

docker run --gpus "device=0" -v /<parent_path_chat-with-nerf>/:/workspace/ -v /home/<your_username>/.cache/:/home/user/.cache/ --rm -it --shm-size=12gb chat-with-nerf:latest

Then install the Chat with NeRF dependencies:

cd /workspace/chat-with-nerf
pip install -e .
pip install -e ".[dev]"

(or use your favorite virtual environment manager)

To run the demo:

cd /workspace/chat-with-nerf
export $(cat .env | xargs); gradio chat_with_nerf/app.py
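
The export $(cat .env | xargs) step turns each KEY=VALUE entry in .env into an environment variable before Gradio starts. Because the shell splits on whitespace, values containing spaces will not survive that one-liner. The sketch below mirrors the idea in Python for illustration, assuming .env holds simple KEY=VALUE lines; the function names and the sample keys are hypothetical, and unlike the shell command it also skips blank lines and # comments.

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, # comments, and lines without '='."""
    pairs: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        # Drop whitespace and any surrounding quote characters from the value.
        pairs[key.strip()] = value.strip().strip("'\"")
    return pairs


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read `path` and copy its KEY=VALUE pairs into os.environ."""
    with open(path, encoding="utf-8") as handle:
        pairs = parse_env_lines(handle)
    os.environ.update(pairs)
    return pairs
```

Calling load_env_file() from /workspace/chat-with-nerf before starting the app would have a similar effect to the export line for the current Python process only.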

Extracting OpenScene embeddings

To extract the OpenScene embeddings, we used the pre-trained distillation model checkpoint shared by the OpenScene authors. To generate the corresponding representations, follow the Data Preparation and Run sections of the OpenScene GitHub repository:

https://github.com/pengsongyou/openscene#data-preparation
https://github.com/pengsongyou/openscene#run

Related Work

Citation

@misc{chat-with-nerf-2023,
    title = {Chat with NeRF: Grounding 3D Objects in Neural Radiance Field through Dialog},
    url = {https://github.com/sled-group/chat-with-nerf},
    author = {Yang, Jianing and Chen, Xuweiyi and Qian, Shengyi and Fouhey, David and Chai, Joyce},
    month = {May},
    year = {2023}
}
