
agile3d's Introduction

AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

Yuanwen Yue, Sabarinath Mahadevan, Jonas Schult, Francis Engelmann,
Bastian Leibe, Konrad Schindler, Theodora Kontogianni

ICLR 2024

AGILE3D supports interactive multi-object 3D segmentation, where a user collaborates with a deep learning model by providing interactive clicks to segment multiple 3D objects simultaneously.

News 📢

  • [2024/02/05] Benchmark data, training and evaluation code were released.
  • [2024/01/19] Our interactive segmentation tool was released. Try your own scans! 😃
  • [2024/01/16] AGILE3D was accepted to ICLR 2024 🎉

Table of Contents

  1. Installation
  2. Interactive Tool
  3. Benchmark Setup
  4. Training
  5. Evaluation
  6. Citation
  7. Acknowledgment

Installation 🔨

For training and evaluation, please follow installation.md to set up the environment.
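
After following installation.md, a quick sanity check like the one below (our own minimal sketch, not part of the official instructions) can confirm that PyTorch and the MinkowskiEngine backbone referenced elsewhere in this README import correctly and that a CUDA device is visible:

# Minimal post-installation sanity check (not part of installation.md).
# Assumes the environment provides PyTorch and MinkowskiEngine, which the
# sparse-convolution backbone of this repository relies on.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import MinkowskiEngine as ME
    print("MinkowskiEngine:", ME.__version__)
except ImportError:
    print("MinkowskiEngine not found - check installation.md for build steps.")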

Interactive Tool 🎮

Please follow these instructions to try the interactive tool yourself. It also works without a GPU.

We present an interactive tool that allows users to segment/annotate multiple 3D objects together in an open-world setting. Although the model was trained only on the ScanNet training set, it can also segment unseen datasets such as S3DIS, ARKitScenes, and even outdoor scans such as KITTI-360. Please check the project page for more demos, and also try your own scans 😃
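
If you want to prepare your own scan, you will typically need it as a (colored) point cloud. The snippet below is only a hypothetical preprocessing sketch using Open3D (not a dependency stated in this README; file names are placeholders); the exact input format expected by the tool is described in its instructions.

# Hypothetical preprocessing sketch for your own scan (placeholder file names).
# Open3D is used here purely for illustration.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("my_scan.ply")     # your own scan
points = np.asarray(pcd.points)                  # (N, 3) xyz coordinates
colors = np.asarray(pcd.colors)                  # (N, 3) rgb in [0, 1], may be empty
print("points:", points.shape, "colors:", colors.shape)

# Optionally downsample very dense scans before loading them into the tool.
pcd_down = pcd.voxel_down_sample(voxel_size=0.02)
o3d.io.write_point_cloud("my_scan_downsampled.ply", pcd_down)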

Benchmark Setup 🎯

We conduct evaluation on both interactive single-object 3D segmentation and interactive multi-object 3D segmentation. For the former, we adopt the protocol from InterObject3D. For the latter, we propose our own setup, since there was no prior work.
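
For intuition, the sketch below illustrates the general idea behind such click-simulation protocols: simulated user clicks are placed on wrongly predicted points until an IoU target or a click budget is reached. It uses a placeholder model and a simplified click-selection rule, so it is not the evaluation code of this repository; see the benchmark document referenced below for the actual setup.

# Simplified illustration of an interactive evaluation loop. `model` is a
# placeholder callable, not the actual AGILE3D interface.
import numpy as np

def iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / max(union, 1)

def simulate_session(model, points, gt_mask, iou_target=0.9, max_clicks=20):
    clicks = []                                   # list of (point_index, is_positive)
    pred = np.zeros_like(gt_mask, dtype=bool)
    for _ in range(max_clicks):
        errors = pred != gt_mask                  # wrongly labelled points
        if iou(pred, gt_mask) >= iou_target or not errors.any():
            break
        idx = int(np.flatnonzero(errors)[0])      # real protocols pick the centre of the
                                                  # largest error region; simplified here
        clicks.append((idx, bool(gt_mask[idx])))  # positive click if the point belongs to the object
        pred = model(points, clicks)              # re-run the model with the new click
    return len(clicks), iou(pred, gt_mask)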

Our quantitative evaluation involves the following datasets: ScanNet (including ScanNet40 and ScanNet20), S3DIS, and KITTI-360. We provide the processed data in the required format for both benchmarks. You can download the data from here and unzip it into the data folder.

To learn more about the benchmark setup, the processed data format, and the data processing scripts, see the benchmark document.

Training 🚀

We train a single model in the multi-object setup on the ScanNet40 training set. Once trained, we evaluate the model in both the multi-object and single-object setups on ScanNet40, S3DIS, and KITTI-360.
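
As a rough mental model of iterative training, simulated clicks are added based on the current prediction errors over several forward passes before the loss is computed. The sketch below is only an illustration with placeholder functions (model, sample_click_on_errors, and segmentation_loss are assumptions, not names from this code base); the actual logic lives in the training script below.

# Hedged sketch of iterative click sampling during training.
import torch

def iterative_training_step(model, optimizer, points, gt_masks,
                            sample_click_on_errors, segmentation_loss,
                            num_click_rounds=3):
    clicks = sample_click_on_errors(gt_masks, pred=None)    # initial clicks from ground truth
    for _ in range(num_click_rounds):
        with torch.no_grad():                               # simulate the user without gradients
            pred = model(points, clicks)
        clicks = clicks + sample_click_on_errors(gt_masks, pred)
    pred = model(points, clicks)                            # final pass with gradients
    loss = segmentation_loss(pred, gt_masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()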

The command for training AGILE3D with iterative training on ScanNet40 is as follows:

./scripts/train_multi_scannet40.sh

Note: in the paper we also conducted an experiment where we train AGILE3D on ScanNet20 and evaluate the model on ScanNet40 (first row in Tab. 1). Instructions for this setup will follow later.

Evaluation 📈

Download the pretrained model and move it to the weights folder.
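
To verify the downloaded checkpoint before running the evaluation scripts, you can inspect it with PyTorch; the file name below is a placeholder for whatever the pretrained model is actually called.

# Quick inspection of the downloaded checkpoint (placeholder file name;
# use the actual name of the file you placed in weights/).
import torch

ckpt = torch.load("weights/checkpoint.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print("Top-level keys:", list(ckpt.keys()))
    state_dict = ckpt.get("state_dict", ckpt)
    print("Number of parameter tensors:", len(state_dict))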

Evaluation on interactive multi-object 3D segmentation:

  • ScanNet40:
./scripts/eval_multi_scannet40.sh
  • S3DIS:
./scripts/eval_multi_s3dis.sh
  • KITTI-360:
./scripts/eval_multi_kitti360.sh

Evaluation on interactive single-object 3D segmentation:

  • ScanNet40:
./scripts/eval_single_scannet40.sh
  • S3DIS:
./scripts/eval_single_s3dis.sh
  • KITTI-360:
./scripts/eval_single_kitti360.sh

Citation 🎓

If you find our code or paper useful, please cite:

@inproceedings{yue2023agile3d,
  title     = {{AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation}},
  author    = {Yue, Yuanwen and Mahadevan, Sabarinath and Schult, Jonas and Engelmann, Francis and Leibe, Bastian and Schindler, Konrad and Kontogianni, Theodora},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}

Acknowledgment 🙏

We sincerely thank all volunteers who participated in our user study! Francis Engelmann and Theodora Kontogianni are postdoctoral research fellows at the ETH AI Center. This project is partially funded by the ETH Career Seed Award - Towards Open-World 3D Scene Understanding, NeuroSys-D (03ZU1106DA) and BMBF projects 6GEM (16KISK036K).

Parts of our code are built on top of Mask3D and InterObject3D. We also thank Anne Marx for helping with the initial version of the GUI.

agile3d's People

Contributors

ilya-fradlin, ywyue

agile3d's Issues

Out of memory: how to make this runnable

Hey, I am running this on a 4060 Ti (16 GB) and am looking into how I can downscale the backend to make it runnable at inference time on smaller GPUs, or possibly on CPUs. Can you give any tips? Many thanks!

Code release

Hello Yuanwen!
I wonder when the code will be released?

Backbone and Training details

Hi,
Thanks for the wonderful work and the open-sourced code!

I am a bit confused about some of the details, though:

  1. The paper mentions that the feature extraction backbone is based on the Minkowski Engine, but when checking the code it actually seems to be a mix of ME and another ResNet variant from Mask3D. What is the consideration behind this?
  2. The experimental settings say that training is conducted on a single RTX TITAN with 24 GB of memory. However, I encountered a memory error: "MemoryError: std::bad_alloc: cudaErrorMemoryAllocation: out of memory" when running the training script on a single RTX 4090 with the same 24 GB of memory. Are there any adjustments to the released training code, or is something wrong with my server?

Thanks for any help you can offer!

Training problem - list index out of range

Hi,

Firstly, thank you for your amazing work!

I'm having a problem with the training script. When I run it, I get this error:

pos_enc = pos_encodings_pcd[hlevel][0][b]# [num_points, 128]
IndexError: list index out of range

I noticed that the shape of pos_enc is (5, 1, 1). This causes the error with a batch size greater than 1 (the default batch size is 5); it works fine with a batch size of 1.

Do you have any suggestions on how to fix this so it works with a batch size greater than 1?

Thanks!
