Faster R-CNN / Mask R-CNN on COCO

This example provides a minimal (2k lines) and faithful implementation of the following object detection / instance segmentation papers:

with support for:

Multi-GPU / distributed training, multi-GPU evaluation
Cross-GPU BatchNorm (aka Sync-BN, from MegDet: A Large Mini-Batch Object Detector)
Group Normalization
Training from scratch (from Rethinking ImageNet Pre-training)

This is likely the best-performing open source TensorFlow reimplementation of the above papers.

Dependencies

Python 3.3+; OpenCV
TensorFlow ≥ 1.6
pycocotools: pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
Pre-trained ImageNet ResNet model from tensorpack model zoo
COCO data. It needs to have the following directory structure:

COCO/DIR/
  annotations/
    instances_train201?.json
    instances_val201?.json
  train201?/
    COCO_train201?_*.jpg
  val201?/
    COCO_val201?_*.jpg

You can use either the 2014 or the 2017 version of the dataset. To use the common "trainval35k + minival" split of the 2014 dataset, also download the annotation files instances_minival2014.json and instances_valminusminival2014.json from here into annotations/.

Note that train2017 == trainval35k == train2014 + val2014 - minival2014, and val2017 == minival2014.
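Before starting a long training run, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not part of this example: `check_coco_layout` is a hypothetical helper name, and it only checks that the two annotation files and the two image directories exist for a given year.

```python
import os


def check_coco_layout(basedir, year="2014"):
    """Return the expected entries (relative to basedir) that are missing.

    Hypothetical helper for illustration; it checks only the files and
    directories named in the layout above, not their contents.
    """
    expected = [
        os.path.join("annotations", "instances_train{}.json".format(year)),
        os.path.join("annotations", "instances_val{}.json".format(year)),
        "train{}".format(year),
        "val{}".format(year),
    ]
    return [rel for rel in expected
            if not os.path.exists(os.path.join(basedir, rel))]
```

Calling `check_coco_layout("/path/to/COCO/DIR", "2017")` returns an empty list when nothing is missing. For the "trainval35k + minival" split you would additionally look for the two minival annotation files.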

Usage

Train:

On a single machine:

./train.py --config \
    MODE_MASK=True MODE_FPN=True \
    DATA.BASEDIR=/path/to/COCO/DIR \
    BACKBONE.WEIGHTS=/path/to/ImageNet-R50-AlignPadding.npz

To run distributed training, set TRAINER=horovod and refer to HorovodTrainer docs.

Options can be changed by either the command line or the config.py file (recommended). Some reasonable configurations are listed in the table below.
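To make the `KEY=VALUE` syntax concrete, here is a rough sketch of how dotted overrides such as `DATA.BASEDIR=...` could be mapped onto a nested configuration. This is an illustration only, not the parsing code used by train.py or config.py; `apply_overrides` and the plain-dict config are assumptions made for the example.

```python
import ast


def apply_overrides(config, overrides):
    """Apply 'A.B=value' strings to a nested dict in place and return it.

    Hypothetical helper; the real config object may behave differently.
    """
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Handles literals such as True, 600, 1e-4, [150000,230000,280000].
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (file paths, names like GN) stays a string.
            value = raw
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


cfg = apply_overrides({}, [
    "MODE_MASK=True",
    "MODE_FPN=True",
    "DATA.BASEDIR=/path/to/COCO/DIR",
    "TRAIN.LR_SCHEDULE=[150000,230000,280000]",
])
```

After this call, `cfg["DATA"]["BASEDIR"]` holds the path as a string and `cfg["TRAIN"]["LR_SCHEDULE"]` holds a list of three integers.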

Inference:

To predict on an image (needs DISPLAY to show the outputs):

./train.py --predict input.jpg --load /path/to/Trained-Model-Checkpoint --config SAME-AS-TRAINING

To evaluate the performance of a model on COCO:

./train.py --evaluate output.json --load /path/to/Trained-Model-Checkpoint \
    --config SAME-AS-TRAINING

Several trained models can be downloaded from the table below. Evaluation and prediction must be run with the same configs that were used in training.
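Once `--evaluate` has written output.json, a quick sanity check is to count detections per category. The sketch below assumes the file follows the standard COCO detection results format (a JSON list of objects with `image_id`, `category_id`, `bbox`, and `score`). This README does not state the format, so treat that as an assumption. `summarize_detections` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_detections(path, score_thresh=0.5):
    """Count detections per category_id with score >= score_thresh.

    Assumes a COCO-style results list; adjust the keys if the file differs.
    """
    with open(path) as f:
        detections = json.load(f)
    counts = Counter(
        d["category_id"] for d in detections if d["score"] >= score_thresh
    )
    return dict(counts)
```

For example, `summarize_detections("output.json", 0.5)` returns a dict mapping each category id to the number of detections at or above the threshold.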

Results

These models are trained on trainval35k and evaluated on minival2014 using mAP@IoU=0.50:0.95. All models are fine-tuned from ImageNet pre-trained R50/R101 models in tensorpack model zoo, unless otherwise noted. All models are trained with 8 NVIDIA V100s, unless otherwise noted.
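For readers unfamiliar with the metric, mAP@IoU=0.50:0.95 averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows box IoU and that threshold list. It is an illustration of the definition, not the evaluation code used here, and the function names are made up for the example. Mask mAP uses mask IoU instead of box IoU.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds 0.50, 0.55, ..., 0.95 that the metric averages over.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(iou):
    """Thresholds at which a detection with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

A detection that overlaps its ground-truth box with IoU 0.62 counts as a match at thresholds 0.50, 0.55, and 0.60 only, which is why the averaged metric rewards tighter localization.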

Performance in Detectron can be roughly reproduced.

Backbone	mAP (box;mask)	Detectron mAP ¹ (box;mask)	Time (on 8 V100s)	Configuration (name and options)
R50-C4	33.5		17h	super quick `MODE_MASK=False FRCNN.BATCH_PER_IM=64` `PREPROC.TRAIN_SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024` `TRAIN.LR_SCHEDULE=[150000,230000,280000]`
R50-C4	36.6	36.5	44h	standard `MODE_MASK=False`
R50-FPN	37.4	37.9	23h	standard `MODE_MASK=False MODE_FPN=True`
R50-C4	38.2;33.3 ⬇️	37.8;32.8	49h	standard (this is the default)
R50-FPN	38.4;35.1 ⬇️	38.6;34.5	27h	standard `MODE_FPN=True`
R50-FPN	42.0;36.3		36h	+Cascade `MODE_FPN=True FPN.CASCADE=True`
R50-FPN	39.5;35.2	39.5;34.4²	31h	+ConvGNHead `MODE_FPN=True` `FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`
R50-FPN	40.0;36.2 ⬇️	40.3;35.7	33h	+GN `MODE_FPN=True` `FPN.NORM=GN BACKBONE.NORM=GN` `FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` `FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`
R101-C4	41.4;35.2 ⬇️		60h	standard `BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`
R101-FPN	40.4;36.6 ⬇️	40.9;36.4	37h	standard `MODE_FPN=True` `BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`
R101-FPN	46.5;40.1 ⬇️ ³		73h	3x+Cascade+TrainAug `MODE_FPN=True FPN.CASCADE=True` `BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` `TEST.RESULT_SCORE_THRESH=1e-4` `PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]` `TRAIN.LR_SCHEDULE=[420000,500000,540000]`
R101-FPN (From Scratch)	47.5;41.2 ⬇️	47.4;40.5⁴	45h (on 48 V100s)	9x+GN+Cascade+TrainAug `MODE_FPN=True FPN.CASCADE=True` `BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` `FPN.NORM=GN BACKBONE.NORM=GN` `FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` `FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` `PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]` `TRAIN.LR_SCHEDULE=[1500000,1580000,1620000]` `BACKBONE.FREEZE_AT=0`

1: Numbers taken from the Detectron Model Zoo. We compare models that have identical training and inference cost between the two implementations. However, their numbers can differ because of many small implementation details. For example, our FPN models are sometimes slightly worse in box AP, which is probably due to batch size.

2: Numbers taken from Table 5 in Group Normalization.

3: Our mAP is more than 10 points better than that of the official model in matterport/Mask_RCNN with the same R101-FPN backbone.

4: This entry does not use ImageNet pre-training. Detectron numbers are taken from Fig. 5 in Rethinking ImageNet Pre-training. Note that our training strategy is slightly different: we enable cascade throughout the entire training.

Notes

NOTES.md has some notes about implementation details & speed.
