CAPE: Camera View Position Embedding for Multi-View 3D Object Detection (CVPR2023)

This repository is the official implementation of CAPE.

CAPE is a simple yet effective method for multi-view 3D object detection. It forms the 3D position embedding in each camera's local view coordinate system rather than in the global coordinate system, which substantially reduces the difficulty of learning the view transformation. CAPE also supports temporal modeling by fusing the separate object queries of multiple frames.
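
To make the camera-view idea concrete, here is a minimal sketch. It is not the repository's code. Assuming a pinhole camera model, it back-projects pixel locations at a few candidate depths into 3D points expressed in the camera's own frame. Only the intrinsics are used and no camera-to-global extrinsics appear, which is the difference between a local camera-view coordinate and a global one. The function name `camera_view_points` and the intrinsic values are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def camera_view_points(
    pixels: Sequence[Tuple[float, float]],
    depths: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project (u, v) pixels at each depth into camera-frame 3D points.

    Hypothetical helper for illustration: uses the pinhole model
    x = (u - cx) * d / fx, y = (v - cy) * d / fy, z = d.
    No extrinsic (camera-to-global) transform is applied.
    """
    points = []
    for u, v in pixels:
        for d in depths:
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


if __name__ == "__main__":
    # Assumed intrinsics, chosen only for the example.
    for point in camera_view_points(
        pixels=[(400.0, 160.0), (1200.0, 160.0)],
        depths=[2.0, 10.0],
        fx=800.0, fy=800.0, cx=400.0, cy=160.0,
    ):
        print(point)
```

Points like these depend only on the camera itself, so the same pixel and depth give the same coordinates regardless of where the camera is mounted on the vehicle.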

Preparation

This implementation is built upon PETR and can be set up by following install.md.

Environments
Linux, Python == 3.7.9, CUDA == 11.2, PyTorch == 1.9.1, mmdet3d == 0.17.1
Detection Data
Follow the mmdet3d data preparation guide to process the nuScenes dataset (https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/data_preparation.md).
Pretrained weights
To verify the performance on the val set, we provide the pretrained V2-99 weights. The V2-99 backbone is pretrained on DDAD15M (weights) and further trained on the nuScenes train set with FCOS3D. For the test-set results in the paper, we use the DD3D pretrained weights. The ImageNet pretrained weights of the other backbones can be found here. Please put the pretrained weights into ./ckpts/.
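
As a quick sanity check of the environment listed above, the following sketch compares installed versions against the pinned ones. It is an illustrative helper, not part of the repository. The names `version_mismatches` and `collect_installed` are assumptions, and the CUDA toolkit version is not checked here.

```python
import sys
from typing import Dict

# Versions pinned in the Environments list above.
EXPECTED = {"python": "3.7.9", "torch": "1.9.1", "mmdet3d": "0.17.1"}


def normalize(version: str) -> str:
    """Drop local build suffixes such as '+cu111' before comparing."""
    return version.split("+", 1)[0].strip()


def version_mismatches(
    installed: Dict[str, str], expected: Dict[str, str] = EXPECTED
) -> Dict[str, str]:
    """Return a message for every package that is missing or has another version."""
    problems = {}
    for name, want in expected.items():
        got = installed.get(name)
        if got is None:
            problems[name] = f"not found (expected {want})"
        elif normalize(got) != want:
            problems[name] = f"found {got} (expected {want})"
    return problems


def collect_installed() -> Dict[str, str]:
    """Gather the interpreter version and any importable package versions."""
    found = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for module_name in ("torch", "mmdet3d"):
        try:
            module = __import__(module_name)
        except Exception:
            # Treat any import failure as "not installed" for this check.
            continue
        version = getattr(module, "__version__", None)
        if version is not None:
            found[module_name] = str(version)
    return found


if __name__ == "__main__":
    for name, message in version_mismatches(collect_installed()).items():
        print(f"{name}: {message}")
```

Running the script prints nothing when every checked version matches.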

After preparation, you will be able to see the following directory structure:

CAPE
├── mmdetection3d
├── projects
│   ├── configs
│   ├── mmdet3d_plugin
├── tools
├── data
│   ├── nuscenes
│   │   ├── samples
│   │   ├── ...
├── ckpts
├── README.md
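
The layout above can be checked with a short script. This is a sketch under the assumption that it is run against the CAPE root directory. The helper `missing_paths` is hypothetical and covers only the entries named in the tree, not the elided ones.

```python
from pathlib import Path
from typing import List, Sequence

# Entries taken from the directory tree above ("..." entries are not checked).
EXPECTED_PATHS = [
    "mmdetection3d",
    "projects/configs",
    "projects/mmdet3d_plugin",
    "tools",
    "data/nuscenes/samples",
    "ckpts",
    "README.md",
]


def missing_paths(root: str, expected: Sequence[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("Directory structure looks complete.")
```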

Train & inference

cd CAPE

You can train the model with:

sh train.sh

You can evaluate the model with:

sh test.sh

Main Results

model	mAP	NDS	config	download
cape_r50_1408x512_24ep_wocbgs_imagenet_pretrain	34.7%	40.6%	config	log / checkpoint
capet_r50_704x256_24ep_wocbgs_imagenet_pretrain	31.8%	44.2%	config	log / checkpoint
capet_VoV99_800x320_24ep_wocbgs_load_dd3d_pretrain	44.7%	54.36%	config	log / checkpoint

Acknowledgement

Many thanks to the authors of mmdetection3d. Special thanks to the authors of PETR.

Citation

If you find this project useful for your research, please consider citing:

@inproceedings{Xiong2023CAPE,
  title={CAPE: Camera View Position Embedding for Multi-View 3D Object Detection},
  author={Kaixin Xiong and Shi Gong and Xiaoqing Ye and Xiao Tan and Ji Wan and Errui Ding and Jingdong Wang and Xiang Bai},
  booktitle={Computer Vision and Pattern Recognition},
  year={2023}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected], [email protected], or [email protected].
