Git Product home page Git Product logo

monotta's Introduction

๐ŸŒ  MonoTTA

This is the official project repository for MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection

Demo

sourceonly_vs_monotta_defocus1.mp4
fog_1_so_vs_monotta.mp4
gaussian_noise_1_so_vs_monotta.mp4

Note

  • 1๏ธโƒฃ With a single NVIDIA RTX 4090, MonoTTA only requires ~45ms to adapt one test image (1280X384) on KITTI-C, i.e., fps >= 15.
  • 2๏ธโƒฃ MonoTTA conducts reliable adaptation, thereby improving TP and reducing FN. Meanwhile, MonoTTA prevents the model from overfitting to noisy predictions and falling into trivial solutions via negative regularization, thus reducing FP.

Abstract

Monocular 3D object detection (Mono 3Det) aims to identify 3D objects from a single RGB image. However, existing methods often assume training and test data follow the same distribution, which may not hold in real-world test scenarios. To address the out-of-distribution (OOD) problems, we explore a new adaptation paradigm for Mono 3Det, termed Fully Test-time Adaptation. It aims to adapt a well-trained model to unlabeled test data by handling potential data distribution shifts at test time without access to training data and test labels. However, applying this paradigm in Mono 3Det poses significant challenges due to OOD test data causing a remarkable decline in object detection scores. This decline conflicts with the pre-defined score thresholds of existing detection methods, leading to severe object omissions (i.e., rare positive detections and many false negatives). Consequently, the limited positive detection and plenty of noisy predictions cause test-time adaptation to fail in Mono 3Det.

image

MonoTTA consists of:

  • 1๏ธโƒฃ Reliability-driven adaptation: we empirically find that high-score objects are still reliable and the optimization of high-score objects can enhance confidence across all detections. Thus, we devise a self-adaptive strategy to identify reliable objects for model adaptation, which discovers potential objects and alleviates omissions.
  • 2๏ธโƒฃ Noise-guard adaptation: since high-score objects may be scarce, we develop a negative regularization term to exploit the numerous low-score objects via negative learning, preventing overfitting to noise and trivial solutions.

Data Preparation

Tianyi Cloud: https://cloud.189.cn/t/Jn2INvaQFJBr (Password: q7st)

Google Drive: https://drive.google.com/file/d/1Se_0wpGCV4-pEEmLVY59J_H1VkkjfWTX/view?usp=sharing

Installation

We recommend reproducing experiments of MonoFlex (https://github.com/zhangyp15/MonoFlex) due to its relatively stable performance

  1. We adopt torch1.7.1+cu110, by
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements.txt 
  1. Then
cd model/backbone/DCNv2

. make.sh

cd ../../..

python setup develop
  1. For the source-only setting (with the official checkpoint [https://drive.google.com/drive/folders/1U60gUYp4JFOkG0VMefc4aVEMxtGM-AMu?usp=sharing]). Don't forget to config the data path in './config/paths_catalog.py'
CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --config runs/monoflex.yaml --ckpt YOUR_CKPT  --eval

--Docker Env--

If you find that the environment above conflicts with your GPU server (e.g., conflicting cuda or gcc versions), it is recommended to use Docker to build the environment:

cd docker/pytorch1.7.1_cu110
docker build -t monotta:pytorch1.7.1-cuda11.0 .

Then, you can initilize a docker container with the built image "monotta:pytorch1.7.1-cuda11.0" by

docker run --gpus all -it --shm-size=64g monotta:pytorch1.7.1-cuda11.0

or (with directory mounting)

docker run --gpus all -it --shm-size=64g -v /dir_local:/dir_docker monotta:pytorch1.7.1-cuda11.0

Next, you can continue configuring the environment in the Docker container mentioned above.

Usage

You can run MonoTTA by

CUDA_VISIBLE_DEVICES=0 python tools/tta_monotta.py --config runs/monoflex.yaml --ckpt model_moderate_best_soft.pth --eval --output kitti-c/gaussian1

Results

MonoFlex(Gaussian_Noise_1) Car Pedestrian Cyclist Avg.
No adapt 4.82 0.23 0.34 1.8
MonoTTA (ours) 21.15 6.54 3.01 10.27

Note: The small difference between this repo and the main paper is due to the randomness in generating perturbed data.

Correspondence

Please contact Hongbin Lin by [[email protected]] if you have any questions. ๐Ÿ“ฌ

Citation

If our MonoTTA method or fully TTA for Monocular 3D Object Detection settings are helpful in your research, please consider citing our paper:

@inproceedings{lin2024fully,
  title={Fully Test-Time Adaptation for Monocular 3D Object Detection},
  author={Lin, Hongbin and Zhang, Yifan and Niu, Shuaicheng and Cui, Shuguang and Li, Zhen},
  booktitle = {European Conference on Computer Vision},
  year = {2024}
}

Acknowledgment

The code is inspired by the MonoFlex ๐Ÿ”— Tent ๐Ÿ”—, EATA ๐Ÿ”—.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.