
AirDet: Few-Shot Detection without Fine-tuning for Autonomous Exploration

Bowen Li, Chen Wang, Pranay Reddy, Seungchan Kim, and Sebastian Scherer*

European Conference on Computer Vision (ECCV2022)

Abstract

Few-shot object detection has attracted increasing attention and rapidly progressed in recent years. However, the requirement of an exhaustive offline fine-tuning stage in existing methods is time-consuming and significantly hinders their usage in online applications such as autonomous exploration of low-power robots. We find that their major limitation is that the little but valuable information from a few support images is not fully exploited. To solve this problem, we propose a brand new architecture, AirDet, and surprisingly find that, by learning class-agnostic relation with the support images in all modules, including cross-scale object proposal network, shots aggregation module, and localization network, AirDet without fine-tuning achieves comparable or even better results than the exhaustively fine-tuned methods, reaching up to 30-40% improvements. We also present solid results of onboard tests on real-world exploration data from the DARPA Subterranean Challenge, which strongly validate the feasibility of AirDet in robotics. To the best of our knowledge, AirDet is the first feasible few-shot detection method for autonomous exploration of low-power robots. The source code and pre-trained models are released.

Overview

We provide the official implementation here to reproduce the no-fine-tuning results with a ResNet-101 backbone on:

  • the COCO-2017 validation set
  • the VOC-2012 validation set

Installation

Please create a Python environment including:

  • Python 3.6.9
  • numpy 1.19.2
  • detectron2 0.2 (a higher version also works)
  • CUDA compiler CUDA 10.2
  • PyTorch 1.5.1
  • Pillow 8.3.1
  • torchvision 0.6.0
  • fvcore 0.1.5
  • opencv-python (cv2) 4.5.4
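As a quick sanity check of the environment, here is a minimal sketch (not part of the official code) that reports missing or mismatched packages without crashing; the version pins mirror the list above, and the prefix match deliberately tolerates newer patch versions (e.g. any detectron2 0.x):

```python
from importlib import import_module

# Version pins from the list above; cv2 is provided by opencv-python.
REQUIRED = {
    "torch": "1.5.1",
    "torchvision": "0.6.0",
    "numpy": "1.19.2",
    "cv2": "4.5.4",
    "detectron2": "0.2",
}

def check_env(required=REQUIRED):
    """Return {package: status} without raising on missing packages."""
    report = {}
    for pkg, want in required.items():
        try:
            mod = import_module(pkg)
        except ImportError:
            report[pkg] = "missing"
            continue
        have = getattr(mod, "__version__", "unknown")
        # Match on the major.minor prefix so compatible newer builds pass.
        prefix = want.rsplit(".", 1)[0]
        report[pkg] = "ok" if have.startswith(prefix) else f"found {have}, want {want}"
    return report

if __name__ == "__main__":
    for pkg, status in check_env().items():
        print(f"{pkg:12s} {status}")
```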

We also provide the official docker image (v4) for faster reproduction.

ROS wrapper is also provided here.

Dataset Preparation

1. Download official datasets

MS COCO 2017

PASCAL VOC

COCO format VOC annotations

Expected dataset structure:

coco/
  annotations/
    instances_{train,val}2017.json
    person_keypoints_{train,val}2017.json
  {train,val}2017/
VOC2012/
  annotations/
    json files
  JPEGImages/
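
Before moving on, it can help to verify the layout programmatically. A minimal sketch (not part of the official code; the relative paths are taken from the COCO layout above, so adjust them for VOC):

```python
import os

# Expected entries under the datasets root, mirroring the COCO layout above.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
]

def missing_paths(root, expected=EXPECTED):
    """Return the expected paths that do not exist under root."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    gaps = missing_paths("datasets")
    print("dataset layout OK" if not gaps else f"missing: {gaps}")
```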

2. Generate supports

Download and unzip the support COCO json files (MEGA / BaiduNet, pwd: 1134) into

datasets/
  coco/
    new_annotations/

Download and unzip the support VOC json files (MEGA / BaiduNet, pwd: 1134) into

datasets/
  voc/
    new_annotations/

Then run the script:

cd datasets
bash generate_support_data.sh

To generate a different number of shots, you may modify lines 190, 213, and 269 of 4_gen_support_pool_10_shot.py (the default is 1 shot).
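
After generating supports, a quick way to confirm that each class received the intended number of shots is to count annotations per category in the generated COCO-format json. A minimal sketch (not part of the official code; the exact file name under new_annotations/ depends on your shot setting, so it is left as a placeholder):

```python
import json
from collections import Counter

def support_counts(coco_dict):
    """Count support annotations per category_id in a COCO-format dict."""
    return Counter(ann["category_id"] for ann in coco_dict["annotations"])

# For a real check, load a generated file, e.g.:
#   with open("datasets/coco/new_annotations/<your_support_file>.json") as f:
#       coco_dict = json.load(f)
# Demo with an in-memory COCO-style dict:
demo = {"annotations": [{"category_id": 1}, {"category_id": 1}, {"category_id": 2}]}
print(support_counts(demo))
```

In a correctly generated k-shot file, every category id should appear k times.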

Reproduce

Base training

Download the base R-101 model and put it in /output.

Start training:

bash train.sh

We also provide the official trained model (MEGA / BaiduNet, pwd: 1134).

Put the model in /output/R101/.

Inference w/o fine-tuning

bash test.sh

You'll get the results in /log.

Citation

If our work motivates or helps your work, please cite us as:

@inproceedings{Li2022ECCV,
  author    = {Li, Bowen and Wang, Chen and Reddy, Pranay and Kim, Seungchan and Scherer, Sebastian},
  title     = {AirDet: Few-Shot Detection without Fine-tuning for Autonomous Exploration},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022}
}

Acknowledgement

Our code is built on top of FewX; we express our sincere gratitude to its authors.

airdet's People

Contributors: jaraxxus-me

airdet's Issues

The fine-tuning code

When will the fine-tuning code be released? I think it is also important for reproducing your performance.

Test result of COCO dataset

I tried to reproduce the results following the steps in the README and got the test results shown in the attached screenshots. I don't know whether this result is correct. If not, where did I go wrong, and what else do I need to do?

IndexError: list index out of range

Hello,
Training works when using the coco_2017_train_nonvoc dataset, but using coco_2017_train_voc_1_shot raises an error (screenshot attached).
What causes this problem?

Correct supports cannot be generated in the VOC dataset

When I run bash generate_support_data.sh and adjust the shot value, only 1-shot supports are generated. My file structure may be wrong; could you share the full file structure so I can check it?

Inference

Is there any way I can use your weights to do inference via a webcam? Would it be possible for you to provide the inference part?
I tried the demo code from detectron2: https://github.com/facebookresearch/detectron2/tree/main/demo.
I copied the whole demo folder into the AirDet folder, then changed the config file from "configs/quick_schedules/mask_rcnn_R_50_FPN_inference_acc_test.yaml" to "./configs/fsod/R101/test_R_101_C4_1x_coco3.yaml", which is from your config folder.
But I got
KeyError: 'Non-existent config key: DATASETS.TESTSHOTS'
when the code runs self.merge_from_other_cfg(loaded_cfg).

The weight of R101 provided

In the paper, R-101 is frozen in the stem and the first ResNet block. However, when I print the weights, they differ from the ImageNet-1k R-101 model. Have they been fine-tuned?

The performance of other methods in Table 1

Hi, in Table 1 you compare your model with some other methods on COCO 2017. In previous works, those methods ran experiments on COCO 2014 and reported 10-shot and 30-shot results. How did you obtain the results of the other methods in Table 1? Did you re-implement them yourself?

Efficiency

When I trained with ResNet-50 as the backbone, each iteration took about 0.8 s (batch size = 4, 4 GPUs), and inference also took about 0.8 s per image, which is a 10x difference from the paper (0.08 s). Is this normal? The GPU is a 2080 Ti. (Training and inference log screenshots attached.)
