

PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model

PARIS3D is accepted to ECCV 2024!


This is the official implementation of "PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model". We propose a model that segments parts of 3D objects based on implicit textual queries and generates natural language explanations for 3D object segmentation requests. Experiments show that our method achieves performance competitive with models that use explicit queries, with the additional abilities to identify part concepts, reason about them, and complement them with world knowledge.

Figure: Results on the RPSeg dataset.

Figure: Results on real-world point clouds.

Abstract

Recent advancements in 3D perception systems have significantly improved their ability to perform visual recognition tasks such as segmentation. However, these systems still heavily rely on explicit human instruction to identify target objects or categories, lacking the capability to actively reason and comprehend implicit user intentions. We introduce a novel segmentation task known as reasoning part segmentation for 3D objects, aiming to output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object.

Figure: PARIS3D architecture.

Installation

Create a conda environment and install the dependencies:

conda env create -f environment.yml
conda activate paris3d
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Install PyTorch3D

We use PyTorch3D to render point clouds. Install it with the following command or by following its official guide:

pip install "git+https://github.com/facebookresearch/pytorch3d.git" 
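After the installation steps above, a quick way to confirm that the key packages are visible to the active Python interpreter is to look them up without importing them. The sketch below is a hypothetical helper, not part of the repository; it assumes the import names `torch` (expected to come from environment.yml or requirements.txt), `flash_attn` (from the `flash-attn` pip package) and `pytorch3d`.

```python
import importlib.util

# Import names assumed for the packages installed above (hypothetical list).
REQUIRED_MODULES = ("torch", "flash_attn", "pytorch3d")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that the current interpreter cannot find.

    importlib.util.find_spec only locates top-level modules, so nothing is
    actually imported (and no CUDA context is created) by this check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```

Run it inside the activated `paris3d` environment; an empty result only means the modules are importable, not that their CUDA builds match your driver.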

Install cut-pursuit

We use cut-pursuit to compute superpoints. Install it with the following commands or by following its official guide:

CONDAENV=YOUR_CONDA_ENVIRONMENT_LOCATION
cd partition/cut-pursuit
mkdir build
cd build
cmake .. -DPYTHON_LIBRARY=$CONDAENV/lib/libpython3.9.so -DPYTHON_INCLUDE_DIR=$CONDAENV/include/python3.9 -DBOOST_INCLUDEDIR=$CONDAENV/include -DEIGEN3_INCLUDE_DIR=$CONDAENV/include/eigen3
make
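The `cmake` command above hard-codes Python 3.9 paths inside `$CONDAENV`, so a configuration error often just means one of those paths does not exist in your environment. The sketch below is a hypothetical pre-flight check (not part of the repository) that rebuilds the same paths and reports the missing ones. It assumes a Linux conda layout and that Boost's headers sit under `include/boost`, which is where `-DBOOST_INCLUDEDIR=$CONDAENV/include` would look for them.

```python
import os
import sys


def cmake_paths(conda_env, py_version="3.9"):
    """Paths implied by the cmake flags above, keyed by flag name."""
    return {
        "PYTHON_LIBRARY": os.path.join(conda_env, "lib", f"libpython{py_version}.so"),
        "PYTHON_INCLUDE_DIR": os.path.join(conda_env, "include", f"python{py_version}"),
        # The flag points at include/; Boost headers are expected in include/boost.
        "BOOST_INCLUDEDIR": os.path.join(conda_env, "include", "boost"),
        "EIGEN3_INCLUDE_DIR": os.path.join(conda_env, "include", "eigen3"),
    }


def missing_cmake_paths(conda_env, py_version="3.9"):
    """Return {flag: path} for every expected path that does not exist."""
    return {
        flag: path
        for flag, path in cmake_paths(conda_env, py_version).items()
        if not os.path.exists(path)
    }


if __name__ == "__main__":
    # Fall back to the running interpreter's prefix (the env root under conda).
    env = os.environ.get("CONDAENV") or sys.prefix
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    for flag, path in missing_cmake_paths(env, version).items():
        print(f"{flag}: {path} not found")
```

If your environment uses a Python version other than 3.9, adjust the version in both this check and the `cmake` flags.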

Quick Demo

Download pretrained checkpoints

The pre-trained checkpoints are available here.

Inference

After downloading the checkpoint file, run inference with the following command:

python3 run.py

The script will generate the following files:

rendered_img/: renderings of the input point cloud from 10 different views.
paris3d_pred/: 2D masks predicted by PARIS3D for each view.
superpoint.ply: superpoints computed for the input point cloud, used to convert the 2D masks into a 3D segmentation. Each superpoint is drawn in a different colour.
semantic_seg/: visualization of the semantic segmentation result for each part, coloured in white or black.
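The outputs above suggest how per-view 2D masks become a 3D segmentation: superpoints group nearby points, and each group is labelled as a whole. The sketch below illustrates one common way to do this, a simple per-superpoint vote across views in the spirit of PartSLIP. It is an assumption for illustration, not the exact procedure in `run.py`; the function names, the input layout and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def lift_masks_to_superpoints(point_pixels, masks, superpoint_ids, threshold=0.5):
    """Select superpoints whose visible points mostly land inside the 2D masks.

    point_pixels: list over views; entry v is a list over points giving the
        (row, col) pixel that point projects to in view v, or None if the
        point is not visible in that view.
    masks: list over views of 2D boolean masks (lists of lists of bools).
    superpoint_ids: superpoint index for every point.
    Returns the set of superpoint ids for which at least `threshold` of their
    visible (point, view) pairs fall inside the predicted mask.
    """
    hits = defaultdict(int)  # visible pairs that land inside the mask
    seen = defaultdict(int)  # all visible pairs
    for pixels, mask in zip(point_pixels, masks):
        for point, pixel in enumerate(pixels):
            if pixel is None:
                continue
            row, col = pixel
            sp = superpoint_ids[point]
            seen[sp] += 1
            if mask[row][col]:
                hits[sp] += 1
    return {sp for sp in seen if hits[sp] / seen[sp] >= threshold}


def point_labels(superpoint_ids, part_superpoints):
    """Broadcast the superpoint decision back to every point."""
    return [sp in part_superpoints for sp in superpoint_ids]
```

Voting per superpoint rather than per point smooths out noisy mask boundaries, since all points in a geometrically coherent region receive the same label.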

Evaluation

The script sem_seg_eval.py computes the mIoU values reported in the paper.
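For reference, mIoU is the mean over parts of the intersection-over-union between predicted and ground-truth point masks. The sketch below shows that computation on per-point boolean masks. It is an illustration, not a copy of `sem_seg_eval.py`; in particular, scoring a part that is empty in both prediction and ground truth as IoU 1 is an assumption, and the script may average over categories differently.

```python
def part_iou(pred, gt):
    """IoU between two equal-length per-point boolean masks."""
    inter = sum(bool(p) and bool(g) for p, g in zip(pred, gt))
    union = sum(bool(p) or bool(g) for p, g in zip(pred, gt))
    # Assumption: a part absent from both prediction and ground truth scores 1.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds, gts):
    """Average IoU over parts; both arguments map part name -> boolean mask."""
    ious = [part_iou(preds[name], gts[name]) for name in gts]
    return sum(ious) / len(ious)
```

For example, a prediction `[True, True, False, False]` against ground truth `[True, False, True, False]` shares one point out of three in the union, giving an IoU of 1/3 for that part.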

RPSeg Dataset

Our dataset comprises 2624 3D objects and over 60k instructions. We use 718 objects and their corresponding instructions as the training set; the remaining 1906 objects and their instructions are used for testing. For a reliable and fair assessment, we aligned the 3D objects with those from PartNet-Ensemble, annotated them with implicit text instructions, and used the ground-truth labels to generate high-quality target masks. The dataset used in our paper is available here.

train: the reasoning data used to train the model.
test: test data similar to that of [PartSLIP](https://arxiv.org/abs/2212.01558), but accompanied by text instructions.
Explanatory: a JSON file that supplements the training data with detailed explanations.
PartNetE_meta.json: the part names used for training and evaluation in all 45 categories.
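To see which part names are available per category, PartNetE_meta.json can be read with the standard `json` module. The sketch below assumes the file is a single JSON object mapping each category name to a list of part-name strings (the layout used by PartSLIP's file of the same name); check the actual file before relying on this. `load_part_names` is a hypothetical helper, not part of the repository.

```python
import json


def load_part_names(path="PartNetE_meta.json"):
    """Load a {category: [part names]} mapping.

    Assumes the file is one JSON object whose values are lists of strings.
    """
    with open(path, encoding="utf-8") as f:
        meta = json.load(f)
    if not isinstance(meta, dict):
        raise ValueError("expected a JSON object mapping categories to part names")
    return {category: list(parts) for category, parts in meta.items()}


if __name__ == "__main__":
    meta = load_part_names()
    print(len(meta), "categories")  # the README states 45 categories
    for category, parts in sorted(meta.items()):
        print(f"{category}: {', '.join(parts)}")
```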

If you find our work helpful, please cite:

@misc{kareem2024paris3d,
      title={PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model}, 
      author={Amrin Kareem and Jean Lahoud and Hisham Cholakkal},
      year={2024},
      eprint={2404.03836},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

Our work is heavily based on PartSLIP and LISA.
