
📸 Chat with NeRF: Grounding 3D Objects in Neural Radiance Field through Dialog


Demo of Chat-with-NeRF

💡 Highlight

  • Open-Vocabulary 3D Localization. Locate anything with natural language dialog!
  • Interactive Grounding. Humans will be able to chat with an agent to localize novel objects.

🔥 News

🏷️ TODO

  • A faster process to determine camera poses and render pictures. See discussion #15. Implemented in #17.
  • Use LLaVA to replace BLIP-2 for better image captioning.
  • Improve the foundation model that LERF uses for grounding (currently CLIP), which could improve spatial and affordance understanding. Potential candidates: LLaVA, BLIP-2, OWL-ViT.

🛠️ Install

To install the dependencies, we provide a Dockerfile that you can build locally:

docker build -t chat-with-nerf:latest .

Alternatively, pull the prebuilt image from Docker Hub, which saves significant build time:

docker pull jedyang97/chat-with-nerf:latest

Otherwise, if you prefer to install the dependencies locally without Docker:

conda create --name nerfstudio -y python=3.8
conda activate nerfstudio
pip install torch==1.13.1 torchvision functorch --extra-index-url https://download.pytorch.org/whl/cu117
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install nerfstudio

git clone https://github.com/kerrj/lerf
cd lerf
python -m pip install -e .
ns-train -h

Note that CUDA 11.3 is specifically required. For further information, see the nerfstudio installation guide.
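Before moving on, it can help to confirm which CUDA build the installed PyTorch wheel targets. The following is a minimal sketch, not part of this repository: `describe_environment` is a hypothetical helper name, and it only reports what it finds rather than enforcing a particular version.

```python
import importlib
import sys


def describe_environment():
    """Collect the Python version and, if PyTorch is importable, its CUDA build."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this interpreter; leave the fields empty.
        return info
    info["torch"] = torch.__version__
    # CUDA version the installed wheel was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `nerfstudio` conda environment so that it inspects the same interpreter the install commands used.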

Then clone this repository locally:

git clone https://github.com/sled-group/chat-with-nerf.git

Download and construct the llava-13b-v0 checkpoint (see LLaVA's documentation on how to construct the checkpoint). Then, assuming the constructed checkpoint is stored under <my_path_to_llava>/llava-13b-v0, move it to /chat-with-nerf/pre-trained-weights/LLaVA:

cd chat-with-nerf
mkdir -p pre-trained-weights/LLaVA
cd pre-trained-weights/LLaVA
mv <my_path_to_llava>/llava-13b-v0 .

Alternatively, you can supply a different LLaVA checkpoint version and change the value of LLAVA_PATH in chat_with_nerf/settings.py:

    LLAVA_PATH = "/workspace/pre-trained-weights/LLaVA/<my_llava_checkpoint>"
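A quick sanity check of the checkpoint directory can catch a wrong path before the demo starts. The sketch below is illustrative only: `check_checkpoint_dir` is a hypothetical helper, and the `config.json` test assumes the checkpoint is stored in Hugging Face format, which the text above does not state.

```python
from pathlib import Path


def check_checkpoint_dir(path):
    """Return a list of problems found with a checkpoint directory (empty if none)."""
    ckpt = Path(path)
    if not ckpt.is_dir():
        return [f"{ckpt} is not a directory"]
    if not any(ckpt.iterdir()):
        return [f"{ckpt} is empty"]
    problems = []
    # Assumption: a Hugging Face style checkpoint ships a config.json file.
    if not (ckpt / "config.json").is_file():
        problems.append(f"{ckpt} has no config.json")
    return problems
```

For example, calling `check_checkpoint_dir("/workspace/pre-trained-weights/LLaVA/llava-13b-v0")` inside the container returns an empty list when the directory passes these checks.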

Open up the directory's permissions so that the Docker container can read and write it:

cd <parent_path_chat-with-nerf>
chmod -R 777 .

If you are using Docker, use the following command to spin up a container with chat-with-nerf mounted under /workspace:

docker run --gpus "device=0" -v /<parent_path_chat-with-nerf>/:/workspace/ -v /home/<your_username>/.cache/:/home/user/.cache/ --rm -it --shm-size=12gb chat-with-nerf:latest
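The placeholders in the command above are easy to mistype, so it may be convenient to assemble the argument list programmatically. This is a sketch under stated assumptions: `docker_run_command` is a hypothetical helper, and the example path and username are made up; the flags themselves are copied from the command above.

```python
import shlex


def docker_run_command(parent_path, username, gpu="0", image="chat-with-nerf:latest"):
    """Build the argument list for the `docker run` invocation shown above."""
    return [
        "docker", "run",
        "--gpus", f"device={gpu}",
        # Mount the parent directory of chat-with-nerf under /workspace.
        "-v", f"{parent_path.rstrip('/')}/:/workspace/",
        # Share the host cache directory with the container user.
        "-v", f"/home/{username}/.cache/:/home/user/.cache/",
        "--rm", "-it", "--shm-size=12gb",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(shlex.join(docker_run_command("/data/projects", "alice")))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.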

Then install the Chat with NeRF dependencies:

cd /workspace/chat-with-nerf
pip install -e .
pip install -e ".[dev]"

(or use your favorite virtual environment manager)

To run the demo:

cd /workspace/chat-with-nerf
export $(cat .env | xargs); gradio chat_with_nerf/app.py
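The `export $(cat .env | xargs)` step turns each `KEY=VALUE` line of `.env` into an environment variable before the app starts. A rough Python equivalent is sketched below; `load_env_file` is a hypothetical helper that is not part of this repository, and unlike the shell one-liner it skips comment lines and blank lines.

```python
import os


def load_env_file(path=".env", environ=None):
    """Parse KEY=VALUE lines from a file and store them in an environment mapping."""
    target = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first "=" only, so values may contain "=" themselves.
            key, value = line.split("=", 1)
            loaded[key.strip()] = value.strip()
    target.update(loaded)
    return loaded
```

Passing a plain dictionary as `environ` makes it possible to inspect what would be set without touching the real process environment.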

Extracting OpenScene embeddings

To extract the OpenScene embeddings, we used the pre-trained distillation model checkpoint shared by the OpenScene authors. To generate the corresponding representations, follow the Data Preparation and Run sections of the OpenScene GitHub repository:

https://github.com/pengsongyou/openscene#data-preparation
https://github.com/pengsongyou/openscene#run

Related Work

Citation

 @misc{chat-with-nerf-2023,
    title = {Chat with NeRF: Grounding 3D Objects in Neural Radiance Field through Dialog},
    url = {https://github.com/sled-group/chat-with-nerf},
    author = {Yang, Jianing and Chen, Xuweiyi and Qian, Shengyi and Fouhey, David and Chai, Joyce},
    month = {May},
    year = {2023}
}

