
AirDet: Few-Shot Detection without Fine-tuning for Autonomous Exploration

Bowen Li, Chen Wang, Pranay Reddy, Seungchan Kim, and Sebastian Scherer*

European Conference on Computer Vision (ECCV2022)

Abstract

Few-shot object detection has attracted increasing attention and rapidly progressed in recent years. However, the requirement of an exhaustive offline fine-tuning stage in existing methods is time-consuming and significantly hinders their usage in online applications such as autonomous exploration of low-power robots. We find that their major limitation is that the little but valuable information from a few support images is not fully exploited. To solve this problem, we propose a brand new architecture, AirDet, and surprisingly find that, by learning class-agnostic relation with the support images in all modules, including cross-scale object proposal network, shots aggregation module, and localization network, AirDet without fine-tuning achieves comparable or even better results than the exhaustively fine-tuned methods, reaching up to 30-40% improvements. We also present solid results of onboard tests on real-world exploration data from the DARPA Subterranean Challenge, which strongly validate the feasibility of AirDet in robotics. To the best of our knowledge, AirDet is the first feasible few-shot detection method for autonomous exploration of low-power robots. The source code and pre-trained models are released.

Overview

We provide the official implementation here to reproduce the no-fine-tuning results with a ResNet-101 backbone on:

  • the COCO-2017 validation set
  • the VOC-2012 validation set

Installation

Please create a Python environment including:

  • Python 3.6.9
  • numpy 1.19.2
  • detectron2 0.2 (a higher version also works)
  • CUDA compiler CUDA 10.2
  • PyTorch 1.5.1
  • Pillow 8.3.1
  • torchvision 0.6.0
  • fvcore 0.1.5
  • opencv-python (cv2) 4.5.4
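As a quick sanity check of the environment, here is a minimal sketch (not part of the official code) that reports missing or mismatched packages without crashing; the version pins mirror the list above, and the prefix match deliberately tolerates newer patch versions (e.g. any detectron2 0.x):

```python
from importlib import import_module

# Version pins from the list above; cv2 is provided by opencv-python.
REQUIRED = {
    "torch": "1.5.1",
    "torchvision": "0.6.0",
    "numpy": "1.19.2",
    "cv2": "4.5.4",
    "detectron2": "0.2",
}

def check_env(required=REQUIRED):
    """Return {package: status} without raising on missing packages."""
    report = {}
    for pkg, want in required.items():
        try:
            mod = import_module(pkg)
        except ImportError:
            report[pkg] = "missing"
            continue
        have = getattr(mod, "__version__", "unknown")
        # Match on the major.minor prefix so compatible newer builds pass.
        prefix = want.rsplit(".", 1)[0]
        report[pkg] = "ok" if have.startswith(prefix) else f"found {have}, want {want}"
    return report

if __name__ == "__main__":
    for pkg, status in check_env().items():
        print(f"{pkg:12s} {status}")
```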

We also provide the official docker image (v4) for faster reproduction.

ROS wrapper is also provided here.

Dataset Preparation

1. Download official datasets

MS COCO 2017

PASCAL VOC

COCO format VOC annotations

Expected dataset structure:

coco/
  annotations/
    instances_{train,val}2017.json
    person_keypoints_{train,val}2017.json
  {train,val}2017/
VOC2012/
  annotations/
    json files
  JPEGImages/
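
Before moving on, it can help to verify the layout programmatically. A minimal sketch (not part of the official code; the relative paths are taken from the COCO layout above, so adjust them for VOC):

```python
import os

# Expected entries under the datasets root, mirroring the COCO layout above.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
]

def missing_paths(root, expected=EXPECTED):
    """Return the expected paths that do not exist under root."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    gaps = missing_paths("datasets")
    print("dataset layout OK" if not gaps else f"missing: {gaps}")
```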

2. Generate supports

Download and unzip the support COCO json files (MEGA / BaiduNet, pwd: 1134) into

datasets/
  coco/
    new_annotations/

Download and unzip the support VOC json files (MEGA / BaiduNet, pwd: 1134) into

datasets/
  voc/
    new_annotations/

Then run the script:

cd datasets
bash generate_support_data.sh

To generate a different number of shots, you may modify lines 190, 213, and 269 of 4_gen_support_pool_10_shot.py (the default is 1 shot).
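
After generating supports, a quick way to confirm that each class received the intended number of shots is to count annotations per category in the generated COCO-format json. A minimal sketch (not part of the official code; the exact file name under new_annotations/ depends on your shot setting, so it is left as a placeholder):

```python
import json
from collections import Counter

def support_counts(coco_dict):
    """Count support annotations per category_id in a COCO-format dict."""
    return Counter(ann["category_id"] for ann in coco_dict["annotations"])

# For a real check, load a generated file, e.g.:
#   with open("datasets/coco/new_annotations/<your_support_file>.json") as f:
#       coco_dict = json.load(f)
# Demo with an in-memory COCO-style dict:
demo = {"annotations": [{"category_id": 1}, {"category_id": 1}, {"category_id": 2}]}
print(support_counts(demo))
```

In a correctly generated k-shot file, every category id should appear k times.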

Reproduce

Base training

Download the base R-101 model and put it in /output.

Start training:

bash train.sh

We also provide the official trained model (MEGA / BaiduNet, pwd: 1134).

Put the model in /output/R101/.

Inference w/o fine-tuning

bash test.sh

You'll get the results in /log.

Citation

If our work motivates or helps your work, please cite us as:

@inproceedings{Li2022ECCV,
  author    = {Li, Bowen and Wang, Chen and Reddy, Pranay and Kim, Seungchan and Scherer, Sebastian},
  title     = {AirDet: Few-Shot Detection without Fine-tuning for Autonomous Exploration},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022}
}

Acknowledgement

Our code is built on top of FewX; we express our sincere gratitude to its authors.

airdet's People

Contributors: jaraxxus-me

airdet's Issues

The fine-tuning code

When will the fine-tuning code be released? I think it is also important for reproducing your performance.

Test result of COCO dataset

I tried to reproduce the results following the steps in the README and got the test results shown in the attached screenshots. I don't know whether this result is correct. If not, where did I go wrong, and what else do I need to do?

IndexError: list index out of range

Hello,
Training works when using the coco_2017_train_nonvoc dataset, but using coco_2017_train_voc_1_shot raises an error (screenshot attached).
What causes this problem?

Correct supports cannot be generated in the VOC dataset

When I run bash generate_support_data.sh and adjust the shot value, only 1-shot supports are generated. My file structure may be wrong; could you share the full file structure so I can check it?

Inference

Is there any way I can use your weights to do inference via a webcam? Would it be possible for you to provide the inference part?
I tried the demo code from detectron2: https://github.com/facebookresearch/detectron2/tree/main/demo.
I copied the whole demo folder into the AirDet folder, then changed the config file from "configs/quick_schedules/mask_rcnn_R_50_FPN_inference_acc_test.yaml" to "./configs/fsod/R101/test_R_101_C4_1x_coco3.yaml", which is from your config folder.
But I got
KeyError: 'Non-existent config key: DATASETS.TESTSHOTS'
when the code runs self.merge_from_other_cfg(loaded_cfg).

The weight of R101 provided

In the paper, R-101 is frozen in the stem and the first ResNet block. However, when I print the weights, they differ from the ImageNet-1k R-101 model. Have they been fine-tuned?

The performance of other methods in Table 1

Hi, in Table 1 you compare your model with some other methods on COCO 2017. In previous works, those methods ran experiments on COCO 2014 and reported 10-shot and 30-shot results. How did you obtain the results of the other methods in Table 1? Did you re-implement them yourself?

Efficiency

When I trained with ResNet-50 as the backbone, each iteration took about 0.8 s (batch size = 4, 4 GPUs), and inference also took about 0.8 s per image, which is a 10x difference from the paper (0.08 s). Is this normal? The GPU is a 2080 Ti. (Training and inference log screenshots attached.)
