OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang · Xiaoyang Wu · Xi Chen · Hengshuang Zhao · Lei Zhu · Joan Lasenby

Paper | Video | Project Page

TL;DR: OpenIns3D proposes a "mask-snap-lookup" scheme to achieve 2D-input-free 3D open-world scene understanding, which attains SOTA performance across datasets, even with fewer input prerequisites. 🚀✨

device to watch BBC news | furniture that is capable of producing music | Ma Long's domain of excellence

most comfortable area to sit in the room | penciling down ideas during brainstorming | furniture offers recreational enjoyment with friends

OpenIns3D pipeline

Highlights

6 Jan, 2024: We have released a major revision, incorporating S3DIS and ScanNet benchmark code. Try out the latest version here 🔥🔥.

31 Dec, 2023: We released the batch inference code on ScanNet.

31 Dec, 2023: We released the zero-shot inference code. Test it on your own data!

Sep, 2023: OpenIns3D was released on arXiv, alongside an explanatory video and a project page. We will release the code at the end of this year.

Overview

Installation

Zero-Shot Scene Understanding

Benchmarking on ScanNetv2 and S3DIS

Citation

Acknowledgement

Installation

Requirements

CUDA: 11.6

PyTorch: 1.13.1

Hardware: one 24G memory GPU or better

(Note: several scenes in S3DIS are very large and may exhaust GPU memory on a 24 GB GPU.)

Setup

Install dependencies by running:

    conda create -n openins3d python=3.9
    conda activate openins3d
    conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
    conda install pytorch3d -c pytorch3d
    conda install lightning -c conda-forge
    conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
    conda install nltk
    cd third_party/pointnet2
    python setup.py install
    cd ../
    # install MinkowskiEngine for MPM
    git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine" # clone the repo to third_party
    cd MinkowskiEngine
    git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
    python setup.py install --force_cuda --blas=openblas
    cd ../../
    # install ODISE as 2D detectors
    git clone https://github.com/NVlabs/ODISE.git
    cd ODISE
    pip install -e .
    cd ..
    pip install torch_scatter gdown==v4.6.3 loguru open3d plyfile pyviz3d python-dotenv omegaconf==2.1.1 iopath==0.1.8
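After the setup, a quick sanity check of the environment can save a failed run later. The snippet below is a minimal sketch, not part of the OpenIns3D code base: the helper names `matches_prefix` and `check_environment` are made up for illustration, and the 22 GiB memory threshold is a loose assumption (a nominal 24 GB card reports slightly less than 24 GiB). It compares the installed versions against the ones listed under Requirements and Setup.

```python
import importlib.util
import sys


def matches_prefix(installed: str, required: str) -> bool:
    """True if `installed` equals `required` or extends it at a component
    boundary, e.g. '1.13.1+cu116' matches '1.13.1' but '1.13.10' does not."""
    if not installed.startswith(required):
        return False
    rest = installed[len(required):]
    return rest == "" or rest[0] in ".+"


def check_environment() -> list:
    """Collect human-readable mismatches against the versions in this README."""
    problems = []
    if sys.version_info[:2] != (3, 9):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.9 expected, found {found}")
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed")
        return problems

    import torch  # imported lazily so the check still runs without PyTorch

    if not matches_prefix(torch.__version__, "1.13.1"):
        problems.append(f"PyTorch 1.13.1 expected, found {torch.__version__}")
    cuda = torch.version.cuda  # None for CPU-only builds
    if cuda is None or not matches_prefix(cuda, "11.6"):
        problems.append(f"CUDA 11.6 build expected, found {cuda}")
    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible")
    else:
        total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if total_gib < 22:  # assumed threshold for a nominal 24 GB GPU
            problems.append(f"GPU 0 has only {total_gib:.1f} GiB of memory")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks fine." if not issues else "\n".join(issues))
```

Run it inside the activated openins3d environment; an empty list of issues only means the versions match, not that every third-party build succeeded.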

Zero-Shot Scene Understanding

To achieve zero-shot scene understanding with OpenIns3D, follow these two steps:

Download the checkpoint for the Mask Proposal Module: we recommend downloading scannet200_val.ckpt here and placing it under checkpoints/.

Run python zero_shot.py, specifying a) pcd_path: the path to the colored point cloud, and b) vocab: the list of vocabulary terms to search for. ODISE is the 2D detector, so the vocab format follows that of ODISE.
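The vocab argument in the examples of this README is a single string of class names separated by "; ". The sketch below shows one way to build such a string from a Python list and to split it back. The helper names `build_vocab` and `split_vocab` are hypothetical, and how zero_shot.py actually parses the string is not specified here, so treat the separator handling as an assumption based on the examples.

```python
def build_vocab(classes):
    """Join class names into a '; '-separated string, as in the --vocab examples.

    Blank entries and duplicates are dropped; order is preserved.
    """
    cleaned = []
    for name in classes:
        name = name.strip()
        if not name:
            continue
        if ";" in name:
            raise ValueError(f"class name must not contain ';': {name!r}")
        if name not in cleaned:
            cleaned.append(name)
    return "; ".join(cleaned)


def split_vocab(vocab):
    """Inverse of build_vocab: split on ';' and trim surrounding whitespace."""
    return [part.strip() for part in vocab.split(";") if part.strip()]


if __name__ == "__main__":
    print(build_vocab(["cabinet", "bed", "chair", "sofa", "table"]))
```

Building the string programmatically avoids stray separators when the class list comes from a file or another tool.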

We provide several sample scenes from Replica, Matterport3D, S3DIS, and ScanNet for quick testing. Run the following code to download the demo data:

    pip install gdown==v4.6.3
    cd demo; python download_demo_scenes.py

(If you are experiencing issues downloading the demo scene files, please ensure that you have the correct version of gdown)

Example of testing:

    # replica demo
    python zero_shot.py \
    --pcd_path 'demo/demo_scene/replica/replica_scene3.ply' \
    --vocab "lamp; blinds; chair; table; door; bowl; window; switch; bottle; indoor-plant; pillow; vase; handrail; basket; bin; shelf; tv-screen; sofa; blanket; bike; sink; bed; stair; refrigerator" \
    --dataset replica

    # scannet demo
    python zero_shot.py \
    --pcd_path 'demo/demo_scene/scannet_scene1.ply' \
    --vocab "cabinet; bed; chair; sofa; table; door; window; bookshelf; picture; counter; desk; curtain; refrigerator; showercurtain; toilet; sink; bathtub" \
    --dataset scannet

    # mattarport3d demo
    python zero_shot.py \
    --pcd_path 'demo/demo_scene/mattarport3d/mp3d_scene1.ply' \
    --vocab "chair; window; ceiling; picture; floor; lighting; table; cabinet; curtain; plant; shelving; sink; mirror; stairs; counter; stool; bed; sofa; shower; toilet; TV; clothes; bathtub; blinds; board" \
    --dataset mattarport3d

    # s3dis demo
    python zero_shot.py \
    --pcd_path 'demo/demo_scene/s3dis/s3dis_scene3.npy' \
    --vocab "floor; wall; beam; column; window; door; table; chair; sofa; bookcase; board" \
    --dataset s3dis

    # customized data
    python zero_shot.py \
    --pcd_path 'path/to/your/own/3dscene' \
    --vocab "vocabulary list to be used" \
    --dataset scannet

The dataset flag only adjusts how the different .ply files are loaded. For custom data, use 'scannet' as the default. Let us know if you encounter any issues! 📣
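To run the same vocabulary over several scenes, the command can be wrapped in a small script. This is a sketch under assumptions: it uses only the three flags shown in this README (--pcd_path, --vocab, --dataset), the function names are hypothetical, and it must be run from the repository root so that zero_shot.py resolves. With dry_run=True it only returns the commands without executing anything.

```python
import subprocess
import sys


def zero_shot_command(pcd_path, vocab, dataset="scannet"):
    """Build the argument list for one zero_shot.py call."""
    return [
        sys.executable, "zero_shot.py",
        "--pcd_path", pcd_path,
        "--vocab", vocab,
        "--dataset", dataset,
    ]


def run_scenes(scenes, vocab, dataset="scannet", dry_run=True):
    """Build (and optionally run) one command per scene path.

    Returns the list of commands so they can be inspected or logged.
    """
    commands = [zero_shot_command(path, vocab, dataset) for path in scenes]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first scene that fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_scenes(
        ["demo/demo_scene/replica/replica_scene3.ply"],
        vocab="lamp; chair; table",
        dataset="replica",
    ):
        print(" ".join(cmd))
```

Passing the arguments as a list rather than a shell string keeps the "; " separators in the vocabulary from being interpreted by the shell.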

Visualize the results

You can check out the detection results as well as the Snap images, Class_Lookup_Dict, and final results under demo_saved.

When using your own custom dataset, feel free to change the three parameters [lift_cam, zoomout, remove_lip] under adjust_camera to optimise the Snap images for better detection.

Benchmarking on ScanNetv2 and S3DIS

Here we provide instructions to reproduce the results on ScanNetv2 and S3DIS.

(Note: the first run will take a while 🕙, because the checkpoint of the 2D detector ODISE is downloaded automatically.)

ScanNetv2:

Download ScanNetv2. (Note: No need to download the .sens file as 2D images are not used)

Pre-process the ScanNetv2 dataset using the same code as Mask3D, as follows:

    python -m openins3d.mask3d.datasets.preprocessing.scannet_preprocessing preprocess \
    --data_dir="PATH_TO_RAW_SCANNET_DATASET" \
    --save_dir="input_data/processed/scannet" \
    --git_repo="PATH_TO_SCANNET_GIT_REPO" \
    --scannet200=false

Download the pre-trained Mask Proposal weights from here and place them under checkpoints.

Double-check the three paths in scannet_benchmark.sh: SCANNET_PROCESSED_DIR, SCAN_PATH, and MPM_CHECKPOINT. Change them accordingly, then run the bash file. It first generates class-agnostic mask proposals for the 312 scenes, with each mask stored as a sparse tensor. The Snap and Lookup modules are then run by inference_openins3d.py. Finally, evaluate.py is called to evaluate performance by calculating the AP values of the mask detections.

sh scannet_benchmark.sh
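For readers unfamiliar with the metric, the following is a simplified illustration of average precision (AP) for instance masks at a single IoU threshold. It is not the code in evaluate.py: masks are represented here as plain collections of point indices, the matching is greedy by descending score, and the AP is the non-interpolated variant for one class. The official ScanNet protocol differs in details, so treat this only as a sketch of the idea.

```python
def mask_iou(a, b):
    """Intersection over union of two masks given as point-index collections."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(predictions, ground_truth, iou_threshold=0.5):
    """Non-interpolated AP for a single class.

    predictions:  list of (score, point_indices)
    ground_truth: list of point_indices, one entry per annotated instance
    """
    if not ground_truth:
        return 0.0
    matched = set()  # ground-truth instances already claimed by a prediction
    hits = []        # True/False per prediction, in descending score order
    for _score, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_gt = 0.0, None
        for gt_id, gt_mask in enumerate(ground_truth):
            if gt_id in matched:
                continue
            iou = mask_iou(mask, gt_mask)
            if iou > best_iou:
                best_iou, best_gt = iou, gt_id
        if best_gt is not None and best_iou >= iou_threshold:
            matched.add(best_gt)
            hits.append(True)
        else:
            hits.append(False)
    # Sum the precision at the rank of every true positive, then normalise
    # by the number of ground-truth instances.
    ap, true_positives = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            ap += true_positives / rank
    return ap / len(ground_truth)


if __name__ == "__main__":
    gt = [[0, 1, 2, 3], [10, 11, 12]]
    preds = [(0.9, [0, 1, 2, 3]), (0.8, [20, 21]), (0.7, [10, 11, 12])]
    print(average_precision(preds, gt))
```

Benchmarks usually report this value averaged over classes and over several IoU thresholds.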

S3DIS

Download S3DIS data by filling out this Google form. Download the Stanford3dDataset_v1.2.zip file and unzip it.

Preprocess the dataset with the following code:

    python -m openins3d.mask3d.datasets.preprocessing.s3dis_preprocessing preprocess \
    --data_dir="PATH_TO_Stanford3dDataset_v1.2" \
    --save_dir="input_data/processed/s3dis"

If you encounter issues in preprocessing due to bugs in the S3DIS dataset file, please refer to this issue in the Mask3D repo to fix it.

Download the pre-trained Mask Proposal weights from here and place them under checkpoints.

Double-check the two file paths in s3dis_benchmark.sh: S3DIS_PROCESSED_DIR and MPM_CHECKPOINT. Change them accordingly and then run:

sh s3dis_benchmark.sh
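Because a wrong path only surfaces after the benchmark has started, it can help to verify the configured locations first. The sketch below is a hypothetical helper, not part of the repository: `missing_paths` takes a mapping from variable name to path and reports the names whose paths do not exist. The checkpoint filename in the example is a placeholder to replace with the file you downloaded.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names whose configured path does not exist on disk."""
    return [name for name, location in paths.items()
            if not Path(location).exists()]


if __name__ == "__main__":
    configured = {
        # Values mirror the variables in s3dis_benchmark.sh; adjust to your setup.
        "S3DIS_PROCESSED_DIR": "input_data/processed/s3dis",
        "MPM_CHECKPOINT": "checkpoints/YOUR_CHECKPOINT.ckpt",  # placeholder name
    }
    missing = missing_paths(configured)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All paths exist.")
```

The same check applies to the three paths in scannet_benchmark.sh by adding SCANNET_PROCESSED_DIR, SCAN_PATH, and MPM_CHECKPOINT to the mapping.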

(Note that several scenes in S3DIS are very large and may cause out-of-memory errors on a 24 GB GPU. A GPU with more VRAM is recommended.)

To do

Release the batch inference code on STPLS3D

Release checkpoints for limited supervision on S3DIS, ScanNetV2

Release Evaluation Script for 3D Open-world Object Detection

Citation

If you find OpenIns3D useful for your research, please cite our work as a form of encouragement. 😊

    @article{huang2023openins3d,
      title={OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation},
      author={Zhening Huang and Xiaoyang Wu and Xi Chen and Hengshuang Zhao and Lei Zhu and Joan Lasenby},
      journal={arXiv preprint},
      year={2023}
    }

Acknowledgement

The mask proposal model is modified from Mask3D, and we heavily used the easy setup version of it for MPM. Thanks again for the great work! 🙌 We also drew inspiration from LAR and ContrastiveSceneContexts when developing the code. 🚀

