DID-M3D

Introduction

This is the PyTorch implementation of the paper "DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection" (ECCV 2022) by Liang Peng, Xiaopei Wu, Zheng Yang, Haifeng Liu, and Deng Cai. [paper]

Abstract

Monocular 3D detection has drawn much attention from the community due to its low cost and setup simplicity. It takes an RGB image as input and predicts 3D boxes in 3D space. The most challenging sub-task is instance depth estimation. Previous works usually estimate it directly. However, in this paper we point out that the instance depth on the RGB image is non-intuitive. It couples visual depth clues with instance attribute clues, which makes it hard for the network to learn directly. Therefore, we propose to reformulate the instance depth as the combination of the instance visual surface depth (visual depth) and the instance attribute depth (attribute depth). The visual depth is related to objects' appearances and positions on the image. By contrast, the attribute depth relies on objects' inherent attributes, which are invariant to affine transformations of the object on the image. Correspondingly, we decouple the 3D location uncertainty into visual depth uncertainty and attribute depth uncertainty. By combining the different types of depths and their associated uncertainties, we obtain the final instance depth. Furthermore, data augmentation in monocular 3D detection is usually limited by its physical nature, which hinders performance gains. The proposed instance depth disentanglement strategy alleviates this problem. Evaluated on KITTI, our method achieves new state-of-the-art results, and extensive ablation studies validate the effectiveness of each component in our method.
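The decomposition described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that this README does not state: that the instance depth is the sum of the visual and attribute depths, that the two uncertainties combine in quadrature, and that several per-location estimates are averaged with confidence weights exp(-uncertainty). The function names are hypothetical and are not this repository's API.

```python
import math


def combine_depth(visual_depth, attribute_depth, visual_unc, attribute_unc):
    """Combine one location's depth pair and uncertainty pair.

    Assumption: depths add, uncertainties add in quadrature.
    """
    instance_depth = visual_depth + attribute_depth
    instance_unc = math.sqrt(visual_unc ** 2 + attribute_unc ** 2)
    return instance_depth, instance_unc


def aggregate_instance_depth(estimates):
    """Confidence-weighted mean of (depth, uncertainty) pairs.

    Assumption: confidence is exp(-uncertainty), so low-uncertainty
    locations contribute more to the final depth.
    """
    weights = [math.exp(-unc) for _, unc in estimates]
    weighted_sum = sum(w * depth for w, (depth, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Two hypothetical locations on the same object (depths in meters).
grid = [
    combine_depth(20.0, 1.5, 0.3, 0.4),
    combine_depth(20.4, 1.3, 0.6, 0.8),
]
final_depth = aggregate_instance_depth(grid)
print(f"final instance depth: {final_depth:.3f}")
```

In this toy example the first location has the lower combined uncertainty, so the aggregated depth lands closer to its estimate than to the second one.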

Overview

Installation

Installation Steps

a. Clone this repository.

git clone https://github.com/SPengLiang/DID-M3D

b. Install the dependent libraries as follows:

  • Install the dependent python libraries:

    pip install torch==1.10.0 torchvision==0.11.1 pyyaml scikit-image opencv-python numba tqdm
  • We tested this repository on NVIDIA 3080 Ti GPUs and Ubuntu 18.04. You can also follow the installation instructions in GUPNet (this repository is based on it) to run experiments with older PyTorch/GPU versions.

Getting Started

Dataset Preparation

DID-M3D
├── data
│   └── KITTI3D
│       ├── training
│       │   └── calib & label_2 & image_2 & depth_dense
│       └── testing
│           └── calib & image_2
├── config
├── ...
  • Alternatively, symlink an existing KITTI dataset into place:

    KITTI_DATA_PATH=~/data/kitti_object
    ln -s "$KITTI_DATA_PATH" ./data/KITTI3D

  • For convenience, we provide pre-generated dense depth files at: Google Drive
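Before training, it can help to confirm that the dataset folders match the layout shown above. The following is a minimal sketch, not part of this repository; the function name `missing_kitti_dirs` is hypothetical, and the check covers only the directory names listed in the tree, not the files inside them.

```python
from pathlib import Path

# Sub-directories expected under data/KITTI3D, taken from the tree above.
EXPECTED_SUBDIRS = {
    "training": ["calib", "label_2", "image_2", "depth_dense"],
    "testing": ["calib", "image_2"],
}


def missing_kitti_dirs(root):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    missing = []
    for split, names in EXPECTED_SUBDIRS.items():
        for name in names:
            path = root / split / name
            # is_dir() follows symlinks, so a linked dataset also passes.
            if not path.is_dir():
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    problems = missing_kitti_dirs("data/KITTI3D")
    print("layout OK" if not problems else f"missing: {problems}")
```

Run it from the repository root; an empty result means every listed folder exists.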

Training & Testing

Test and evaluate the pretrained models

CUDA_VISIBLE_DEVICES=0 python tools/train_val.py --config config/kitti.yaml -e   

Train a model

CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/train_val.py --config config/kitti.yaml

Pretrained Model

For convenience, we provide the pretrained model at: Google Drive

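The table below compares the results reported in the original paper with those obtained using this repository (Car category, IoU=0.7).

| Models         | BEV Easy | BEV Mod | BEV Hard | 3D Easy | 3D Mod | 3D Hard |
|----------------|----------|---------|----------|---------|--------|---------|
| original paper | 31.10    | 22.76   | 19.50    | 22.98   | 16.12  | 14.03   |
| this repo      | 33.91    | 24.00   | 19.52    | 25.38   | 17.07  | 14.06   |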
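The size of the gap between the two rows is easier to see as per-column differences. This short sketch only does arithmetic on the numbers listed above; the variable names are illustrative.

```python
# Scores copied from the comparison above (Car, IoU=0.7).
LABELS = ["BEV Easy", "BEV Mod", "BEV Hard", "3D Easy", "3D Mod", "3D Hard"]
PAPER = [31.10, 22.76, 19.50, 22.98, 16.12, 14.03]
REPO = [33.91, 24.00, 19.52, 25.38, 17.07, 14.06]

# Difference "this repo" minus "original paper", rounded to two decimals.
gains = {
    label: round(repo - paper, 2)
    for label, paper, repo in zip(LABELS, PAPER, REPO)
}

for label, gain in gains.items():
    print(f"{label}: {gain:+.2f}")
```

The differences are largest on the Easy setting and nearly zero on Hard.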

Citation

@inproceedings{peng2022did,
  title={DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection},
  author={Peng, Liang and Wu, Xiaopei and Yang, Zheng and Liu, Haifeng and Cai, Deng},
  booktitle={European Conference on Computer Vision},
  year={2022}
}

Acknowledgements

This repository is mainly based on GUPNet, and it also benefits from Second. Thanks to their authors for the great work!
