Faster R-CNN / Mask R-CNN on COCO

This example provides a minimal (2k lines) and faithful implementation of the following object detection / instance segmentation papers:

with the support of:

This is likely the best-performing open source TensorFlow reimplementation of the above papers.

Dependencies

  • Python 3.3+; OpenCV
  • TensorFlow ≥ 1.6
  • pycocotools: pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
  • Pre-trained ImageNet ResNet model from tensorpack model zoo
  • COCO data. It needs to have the following directory structure:
COCO/DIR/
  annotations/
    instances_train201?.json
    instances_val201?.json
  train201?/
    COCO_train201?_*.jpg
  val201?/
    COCO_val201?_*.jpg
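
Before launching a long training run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the function name `missing_coco_paths` and its `year` parameter are hypothetical, and it only checks the entries shown in the tree above.

```python
from pathlib import Path


def missing_coco_paths(basedir, year="2014"):
    """Return the expected entries (relative to basedir) that do not exist.

    Hypothetical helper: it checks only the annotation files and image
    directories listed in the directory structure above.
    """
    base = Path(basedir)
    expected = [
        base / "annotations" / "instances_train{}.json".format(year),
        base / "annotations" / "instances_val{}.json".format(year),
        base / "train{}".format(year),
        base / "val{}".format(year),
    ]
    return [str(p.relative_to(base)) for p in expected if not p.exists()]


# Example: report anything missing for the 2014 version of the dataset.
missing = missing_coco_paths("/path/to/COCO/DIR", year="2014")
print("missing entries:", missing)
```

An empty list means every expected file and directory was found; the check does not validate the contents of the annotation files.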

You can use either the 2014 version or the 2017 version of the dataset. To use the common "trainval35k + minival" split for the 2014 dataset, just download the annotation files instances_minival2014.json, instances_valminusminival2014.json from here to annotations/ as well.

Note that train2017==trainval35k==train2014+val2014-minival2014, and val2017==minival2014
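
The relation between the splits can be read as set arithmetic over image ids. The sketch below uses toy ids rather than real COCO data, and the function name `trainval35k_ids` is hypothetical; it only illustrates the identity stated above.

```python
def trainval35k_ids(train2014_ids, val2014_ids, minival2014_ids):
    """Image ids of train2014 + val2014 - minival2014 (hypothetical helper)."""
    return (set(train2014_ids) | set(val2014_ids)) - set(minival2014_ids)


# Toy ids standing in for real COCO image ids.
train2014 = {1, 2, 3}
val2014 = {4, 5, 6}
minival2014 = {5, 6}

trainval35k = trainval35k_ids(train2014, val2014, minival2014)
print(sorted(trainval35k))  # [1, 2, 3, 4]

# valminusminival2014 is the part of val2014 that is left for training.
valminusminival2014 = val2014 - minival2014
assert trainval35k == train2014 | valminusminival2014
```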

Usage

Train:

On a single machine:

./train.py --config \
    MODE_MASK=True MODE_FPN=True \
    DATA.BASEDIR=/path/to/COCO/DIR \
    BACKBONE.WEIGHTS=/path/to/ImageNet-R50-AlignPadding.npz

To run distributed training, set TRAINER=horovod and refer to HorovodTrainer docs.

Options can be changed by either the command line or the config.py file (recommended). Some reasonable configurations are listed in the table below.
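
To make the `KEY=VALUE` override syntax concrete, here is a minimal sketch of how dotted overrides could be applied to a nested dictionary. It is an illustration only and is not the code that config.py uses; the function name `apply_overrides` and the use of `ast.literal_eval` to parse values are assumptions.

```python
import ast


def apply_overrides(config, overrides):
    """Apply 'DOTTED.KEY=VALUE' strings to a nested dict, in place.

    Hypothetical sketch: values that parse as Python literals (True, 600,
    [3,4,23,3]) are converted, anything else is kept as a plain string.
    """
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        try:
            node[leaf] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            node[leaf] = raw  # paths and bare names stay as strings
    return config


cfg = apply_overrides({}, [
    "MODE_MASK=True",
    "MODE_FPN=True",
    "DATA.BASEDIR=/path/to/COCO/DIR",
    "BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]",
])
print(cfg["BACKBONE"]["RESNET_NUM_BLOCKS"])  # [3, 4, 23, 3]
```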

Inference:

To predict on an image (needs DISPLAY to show the outputs):

./train.py --predict input.jpg --load /path/to/Trained-Model-Checkpoint --config SAME-AS-TRAINING

To evaluate the performance of a model on COCO:

./train.py --evaluate output.json --load /path/to/Trained-Model-Checkpoint \
    --config SAME-AS-TRAINING
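
Once evaluation has written output.json, a quick sanity check is to count the detections per category. The sketch below assumes the file follows the standard COCO detection results format (a JSON list of records with `image_id`, `category_id`, `bbox` and `score`); this README does not state the format, so treat that as an assumption. The helper names are hypothetical.

```python
import json
from collections import Counter


def load_results(path):
    """Load a results file, assumed to be a JSON list of detection records."""
    with open(path) as f:
        return json.load(f)


def detections_per_category(results, score_thresh=0.5):
    """Count records per category_id whose score is at least score_thresh."""
    return Counter(
        r["category_id"] for r in results if r["score"] >= score_thresh
    )


# Toy records in the assumed format; real data would come from
# load_results("output.json").
toy = [
    {"image_id": 1, "category_id": 18, "bbox": [10, 20, 30, 40], "score": 0.9},
    {"image_id": 1, "category_id": 18, "bbox": [12, 22, 30, 40], "score": 0.3},
    {"image_id": 2, "category_id": 1, "bbox": [0, 0, 50, 80], "score": 0.7},
]
print(detections_per_category(toy))
```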

Several trained models can be downloaded in the table below. Evaluation and prediction will need to be run with the corresponding configs used in training.

Results

These models are trained on trainval35k and evaluated on minival2014 using mAP@IoU=0.50:0.95. All models are fine-tuned from ImageNet pre-trained R50/R101 models in tensorpack model zoo, unless otherwise noted. All models are trained with 8 NVIDIA V100s, unless otherwise noted.
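
The metric mAP@IoU=0.50:0.95 averages the average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. As a minimal sketch of the underlying box overlap test (not the evaluation code used here, which relies on pycocotools), the following computes the IoU of two boxes and lists the thresholds at which a prediction would count as a match. The function names and the `(x1, y1, x2, y2)` box convention are assumptions.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, gt):
    """Thresholds at which pred overlaps gt enough to count as a match."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


# A 10x10 prediction against a 9x8 ground-truth box: IoU = 72 / 100 = 0.72.
print(matched_thresholds((0, 0, 10, 10), (0, 0, 9, 8)))
```

A prediction with IoU 0.72 therefore contributes as a true positive at the five thresholds up to 0.70 and as a miss at the stricter ones.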

Performance in Detectron can be roughly reproduced.

| Backbone | mAP (box;mask) | Detectron mAP ¹ (box;mask) | Time (on 8 V100s) | Configurations |
|---|---|---|---|---|
| R50-C4 | 33.5 | | 17h | super quick: MODE_MASK=False FRCNN.BATCH_PER_IM=64 PREPROC.TRAIN_SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024 TRAIN.LR_SCHEDULE=[150000,230000,280000] |
| R50-C4 | 36.6 | 36.5 | 44h | standard: MODE_MASK=False |
| R50-FPN | 37.4 | 37.9 | 23h | standard: MODE_MASK=False MODE_FPN=True |
| R50-C4 | 38.2;33.3 ⬇️ | 37.8;32.8 | 49h | standard: this is the default |
| R50-FPN | 38.4;35.1 ⬇️ | 38.6;34.5 | 27h | standard: MODE_FPN=True |
| R50-FPN | 42.0;36.3 | | 36h | +Cascade: MODE_FPN=True FPN.CASCADE=True |
| R50-FPN | 39.5;35.2 | 39.5;34.4 ² | 31h | +ConvGNHead: MODE_FPN=True FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head |
| R50-FPN | 40.0;36.2 ⬇️ | 40.3;35.7 | 33h | +GN: MODE_FPN=True FPN.NORM=GN BACKBONE.NORM=GN FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head |
| R101-C4 | 41.4;35.2 ⬇️ | | 60h | standard: BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3] |
| R101-FPN | 40.4;36.6 ⬇️ | 40.9;36.4 | 37h | standard: MODE_FPN=True BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3] |
| R101-FPN | 46.5;40.1 ⬇️ ³ | | 73h | 3x+Cascade+TrainAug: MODE_FPN=True FPN.CASCADE=True BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3] TEST.RESULT_SCORE_THRESH=1e-4 PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800] TRAIN.LR_SCHEDULE=[420000,500000,540000] |
| R101-FPN (From Scratch) | 47.5;41.2 ⬇️ | 47.4;40.5 ⁴ | 45h (on 48 V100s) | 9x+GN+Cascade+TrainAug: MODE_FPN=True FPN.CASCADE=True BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3] FPN.NORM=GN BACKBONE.NORM=GN FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800] TRAIN.LR_SCHEDULE=[1500000,1580000,1620000] BACKBONE.FREEZE_AT=0 |
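
The TRAIN.LR_SCHEDULE values in the configurations above are lists of three step counts. The sketch below shows one common reading of such a list, a step schedule in which the learning rate drops at the first two boundaries and training ends at the last one. This interpretation is an assumption, not something this README states, and `base_lr` and `factor` are hypothetical parameters; see config.py for the actual semantics.

```python
def num_drops(step, schedule):
    """Number of learning-rate drops that have happened by `step`.

    Assumes every entry except the last is a drop boundary and the last
    entry is the total number of training steps.
    """
    return sum(1 for boundary in schedule[:-1] if step >= boundary)


def learning_rate_at(step, schedule, base_lr=0.01, factor=0.1):
    """Learning rate at `step` under the assumed step schedule."""
    return base_lr * factor ** num_drops(step, schedule)


schedule = [150000, 230000, 280000]  # the "super quick" row above
for step in (0, 150000, 230000):
    print(step, num_drops(step, schedule), learning_rate_at(step, schedule))
```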

1: Numbers taken from Detectron Model Zoo. We compare models that have identical training & inference cost between the two implementations. However, their numbers can be different due to many small implementation details. For example, our FPN models are sometimes slightly worse in box AP, which is probably due to batch size.

2: Numbers taken from Table 5 in Group Normalization.

3: Our mAP is 10+ points better than the official model in matterport/Mask_RCNN with the same R101-FPN backbone.

4: This entry does not use ImageNet pre-training. Detectron numbers are taken from Fig. 5 in Rethinking ImageNet Pre-training. Note that our training strategy is slightly different: we enable cascade throughout the entire training.

Notes

NOTES.md has some notes about implementation details & speed.
