Wild Few-Shot Object Detection (WFsDet)

For over six months we have been customising the FsDet implementation. Although it works well on the datasets used for benchmarking, we ran into difficulties when applying it to custom datasets. This motivated us to continue the work and release a framework that should help both academia and industry.

Although our paper is still in the works, we decided to make the code and dataset available now, so that other Deep Learning and Computer Vision enthusiasts can try them out and review what we did.

We are keeping the original documentation below so readers can understand our motivation. Once our paper is finished and published (fingers crossed), we will consider opening a pull request against the original repository to keep everything in one place.

Few-Shot Object Detection (FsDet)

FsDet contains the official few-shot object detection implementation of the ICML 2020 paper Frustratingly Simple Few-Shot Object Detection.

In addition to the benchmarks used by previous works, we introduce new benchmarks on three datasets: PASCAL VOC, COCO, and LVIS. We sample multiple groups of few-shot training examples for multiple runs of the experiments and report evaluation results on both the base classes and the novel classes. These are described in more detail in Data Preparation.

We also provide benchmark results and pre-trained models for our two-stage fine-tuning approach (TFA). In TFA, we first train the entire object detector on the data-abundant base classes, and then only fine-tune the last layers of the detector on a small balanced training set. See Models for our provided models and Getting Started for instructions on training and evaluation.
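The second TFA stage can be pictured as a filter over parameter names: everything stays frozen except the final prediction layers. The sketch below is only an illustration of that idea. The prefix `roi_heads.box_predictor.` and the example parameter names are assumptions made for this example, not values taken from this repository's configs.

```python
from typing import Dict, Iterable, Tuple

# Assumed prefix for the "last layers" of the detector (hypothetical).
TRAINABLE_PREFIXES: Tuple[str, ...] = ("roi_heads.box_predictor.",)


def trainable_mask(
    param_names: Iterable[str],
    prefixes: Tuple[str, ...] = TRAINABLE_PREFIXES,
) -> Dict[str, bool]:
    """Map each parameter name to True if it is fine-tuned in stage two."""
    return {name: name.startswith(prefixes) for name in param_names}


# Illustrative parameter names, not read from a real checkpoint.
names = [
    "backbone.bottom_up.stem.conv1.weight",
    "proposal_generator.rpn_head.conv.weight",
    "roi_heads.box_head.fc1.weight",
    "roi_heads.box_predictor.cls_score.weight",
    "roi_heads.box_predictor.bbox_pred.weight",
]

mask = trainable_mask(names)
frozen = [name for name, keep in mask.items() if not keep]
tuned = [name for name, keep in mask.items() if keep]
```

With a PyTorch model, such a mask would typically be applied by setting `requires_grad` to `False` on every frozen parameter before building the optimizer.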

FsDet is well-modularized so you can easily add your own datasets and models. The goal of this repository is to provide a general framework for few-shot object detection that can be used for future research.

If you find this repository useful for your publications, please consider citing our paper.

@inproceedings{wang2020few,
    title={Frustratingly Simple Few-Shot Object Detection},
    author={Wang, Xin and Huang, Thomas E. and Darrell, Trevor and Gonzalez, Joseph E. and Yu, Fisher},
    booktitle={International Conference on Machine Learning (ICML)},
    month={July},
    year={2020}
}

Updates

The code has been upgraded to detectron2 v0.2.1. If you need the originally released code, please check out the v0.1 release tag.

Installation

Requirements

  • Linux with Python >= 3.6
  • PyTorch >= 1.4
  • torchvision that matches the PyTorch installation
  • CUDA 10.0, 10.1, 10.2
  • GCC >= 4.9
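The minimum versions above can be checked before building. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical, and the version strings passed in are examples rather than values queried from an installed package.

```python
import sys
from itertools import takewhile
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '1.4.0+cu101' into (1, 4, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically, padding with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Python itself can be checked directly against the requirement above.
python_ok = sys.version_info >= (3, 6)
# Example strings only; a real check would pass e.g. torch.__version__.
pytorch_ok = meets_minimum("1.4.0+cu101", "1.4")
```

Comparing integer tuples avoids the pitfall of string comparison, where "1.10" would sort before "1.4".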

Build WFsDet

  • Install Conda
  • Create and activate the environment
conda env create -f environment.yml
conda activate wfsdet

After the environment is created, we have to install the Detectron2 package. Because of the way the Detectron2 package is set up, PyTorch must already be installed, so simply adding Detectron2 to the Conda environment file does not work. To proceed, run the command below:

conda install -c conda-forge detectron2

Code Structure

  • configs: Configuration files
  • datasets: Dataset files (see Data Preparation for more details)
  • fsdet
    • checkpoint: Checkpoint code.
    • config: Configuration code and default configurations.
    • engine: Contains training and evaluation loops and hooks.
    • layers: Implementations of different layers used in models.
    • modeling: Code for models, including backbones, proposal networks, and prediction heads.
  • tools
    • train_net.py: Training script.
    • test_net.py: Testing script.
    • ckpt_surgery.py: Surgery on checkpoints.
    • run_experiments.py: Running experiments across many seeds.
    • aggregate_seeds.py: Aggregating results from many seeds.

Data Preparation

We evaluate our models on three datasets:

  • PASCAL VOC: We use the train/val sets of PASCAL VOC 2007+2012 for training and the test set of PASCAL VOC 2007 for evaluation. We randomly split the 20 object classes into 15 base classes and 5 novel classes, and we consider 3 random splits. The splits can be found in fsdet/data/datasets/builtin_meta.py.
  • COCO: We use COCO 2014, holding out 5k images from the val set for evaluation and using the rest for training. The 20 object classes shared with PASCAL VOC serve as novel classes, and the remaining classes serve as base classes.
  • LVIS: We treat the frequent and common classes as the base classes and the rare categories as the novel classes.
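The base/novel partition for PASCAL VOC can be illustrated with a seeded random split. This is a sketch of the idea only: the function `split_classes` and the seeds are hypothetical, and it will not reproduce the fixed splits stored in fsdet/data/datasets/builtin_meta.py.

```python
import random
from typing import List, Tuple

# The 20 PASCAL VOC object classes.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def split_classes(
    classes: List[str], num_novel: int, seed: int
) -> Tuple[List[str], List[str]]:
    """Randomly pick `num_novel` novel classes; the rest are base classes."""
    rng = random.Random(seed)
    novel_set = set(rng.sample(classes, num_novel))
    base = [c for c in classes if c not in novel_set]
    novel = [c for c in classes if c in novel_set]
    return base, novel


# Three random splits of 15 base and 5 novel classes (seeds are arbitrary).
splits = [split_classes(VOC_CLASSES, num_novel=5, seed=s) for s in range(3)]
```

Seeding a local `random.Random` instance keeps each split reproducible without touching the global random state.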

See datasets/README.md for more details.

Models

We provide a set of benchmark results and pre-trained models available for download in MODEL_ZOO.md.

Getting Started

Inference Demo with Pre-trained Models

  1. Pick a model and its config file from the model zoo, for example, COCO-detection/faster_rcnn_R_101_FPN_ft_all_1shot.yaml.
  2. We provide demo.py, which can run the built-in standard models. Run it with:
python3 -m demo.demo --config-file configs/COCO-detection/faster_rcnn_R_101_FPN_ft_all_1shot.yaml \
  --input input1.jpg input2.jpg \
  [--other-options] \
  --opts MODEL.WEIGHTS fsdet://coco/tfa_cos_1shot/model_final.pth

The configs are written for training, so for inference we need to point MODEL.WEIGHTS at a model from the model zoo. This command runs inference and shows the visualizations in an OpenCV window.

For details of the command line arguments, see demo.py -h or look at its source code to understand its behavior. Some common arguments are:

  • To run on your webcam, replace --input files with --webcam.
  • To run on a video, replace --input files with --video-input video.mp4.
  • To run on CPU, add MODEL.DEVICE cpu after --opts.
  • To save outputs to a directory (for images) or a file (for webcam or video), use --output.
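The options above combine into a single argument list. The helper below is a minimal sketch that assembles it for use with `subprocess.run`. The function name `build_demo_command` is hypothetical, and the rule that exactly one input source is given is an assumption drawn from the "replace --input files with" wording.

```python
from typing import List, Optional, Sequence


def build_demo_command(
    config_file: str,
    weights: str,
    inputs: Sequence[str] = (),
    webcam: bool = False,
    video_input: Optional[str] = None,
    output: Optional[str] = None,
    cpu: bool = False,
) -> List[str]:
    """Assemble the demo invocation from the flags described above."""
    chosen = [bool(inputs), webcam, video_input is not None]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one of inputs, webcam, video_input")

    cmd = ["python3", "-m", "demo.demo", "--config-file", config_file]
    if inputs:
        cmd += ["--input", *inputs]
    elif webcam:
        cmd.append("--webcam")
    else:
        cmd += ["--video-input", video_input]
    if output is not None:
        cmd += ["--output", output]

    # Config overrides must come last, after --opts.
    cmd += ["--opts", "MODEL.WEIGHTS", weights]
    if cpu:
        cmd += ["MODEL.DEVICE", "cpu"]
    return cmd


command = build_demo_command(
    "configs/COCO-detection/faster_rcnn_R_101_FPN_ft_all_1shot.yaml",
    "fsdet://coco/tfa_cos_1shot/model_final.pth",
    video_input="video.mp4",
    cpu=True,
)
```

Passing the resulting list to `subprocess.run(command, check=True)` from the repository root would launch the demo without any shell quoting concerns.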

Training & Evaluation in Command Line

To train a model, run

python3 -m tools.train_net --num-gpus 8 \
        --config-file configs/PascalVOC-detection/split1/faster_rcnn_R_101_FPN_base1.yaml

To evaluate the trained models, run

python3 -m tools.test_net --num-gpus 8 \
        --config-file configs/PascalVOC-detection/split1/faster_rcnn_R_101_FPN_ft_all1_1shot.yaml \
        --eval-only

For more detailed instructions on the training procedure of TFA, see TRAIN_INST.md.

Multiple Runs

To simplify training and evaluation over multiple runs, we provide several helper scripts in tools/.

You can use tools/run_experiments.py to do the training and evaluation. For example, to experiment on 30 seeds of the first split of PascalVOC on all shots, run

python3 -m tools.run_experiments --num-gpus 8 \
        --shots 1 2 3 5 10 --seeds 0 30 --split 1
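To get a feel for the size of this sweep, the runs it implies can be enumerated as follows. This sketch assumes that `--seeds 0 30` denotes the half-open range 0 to 29, which matches the "30 seeds" mentioned above but is not confirmed here; `enumerate_runs` is a hypothetical helper, not part of tools/.

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_runs(
    shots: Sequence[int], seed_range: Tuple[int, int], split: int
) -> List[Tuple[int, int, int]]:
    """List every (split, shot, seed) combination in the sweep."""
    start, stop = seed_range
    return [
        (split, shot, seed)
        for seed, shot in product(range(start, stop), shots)
    ]


runs = enumerate_runs(shots=[1, 2, 3, 5, 10], seed_range=(0, 30), split=1)
num_runs = len(runs)  # 5 shot settings x 30 seeds
```

Under that assumption the command covers 150 fine-tuning runs, which is worth knowing before committing 8 GPUs to it.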

After training and evaluation, you can use tools/aggregate_seeds.py to aggregate the results over all the seeds to obtain one set of numbers. To aggregate the 3-shot results of the above command, run

python3 -m tools.aggregate_seeds --shots 3 --seeds 30 --split 1 \
        --print --plot
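Aggregating over seeds amounts to reducing one metric per seed to a summary. The sketch below computes a mean and a 95% confidence half-width under a normal approximation. Which statistics tools/aggregate_seeds.py actually reports is not stated here, so treat the choice of statistics as an assumption; the AP values are made up for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def aggregate(values: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean, confidence half-width) of per-seed results."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0  # no spread can be estimated from one run
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


# Made-up per-seed AP values, for illustration only.
per_seed_ap = [40.0, 42.0, 44.0]
mean_ap, half_width = aggregate(per_seed_ap)
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two methods exceeds the seed-to-seed variation.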
