nearthlab / image-segmentation

Mask R-CNN, FPN, LinkNet, PSPNet and UNet with multiple backbone architectures support readily available

Home Page: http://nearthlab.com/en/

License: MIT License

Python 99.86% Shell 0.14%
mask-rcnn fpn linknet pspnet unet tensorflow keras semantic-segmentation instance-segmentation pretrained pre-trained image-segmantation object-detection

image-segmentation's Introduction

image-segmentation

This repository includes:

  • A re-implementation of matterport/Mask_RCNN with support for multiple backbones (with ImageNet-pretrained weights), built on the backbone model implementations in qubvel/classification_models (see below for the available backbone architectures)
  • Unified training, inference and evaluation code for Mask R-CNN and several semantic segmentation models (from qubvel/segmentation_models), whose parameters you can easily modify through a simple configuration file interface.
  • COCO dataset and KITTI dataset viewers
  [Available segmentation models]
  Instance:
    'maskrcnn'
  Semantic:
    'fpn', 'linknet', 'pspnet', 'unet'
  
  [Available backbone architectures]
  MobileNet:
    'mobilenetv2', 'mobilenet' 
  DenseNet:
    'densenet121', 'densenet169', 'densenet201'  
  ResNet:
    'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152'
  ResNext:
    'resnext50', 'resnext101'
  SE-Net:
    'seresnet18', 'seresnet34', 'seresnet50', 'seresnet101', 'seresnet152', 'seresnext50', 'seresnext101', 'senet154'
  Resnet V2:
    'resnet50v2', 'resnet101v2', 'resnet152v2'   
  Inception:
    'inceptionv3', 'inceptionresnetv2', 'xception'
  NASNet:
    'nasnetmobile', 'nasnetlarge'
  VGG:
    'vgg16', 'vgg19'
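
For reference, the model and backbone names above map directly onto qubvel/segmentation_models and qubvel/classification_models. Below is a minimal sketch of building one of these semantic models directly with that library, bypassing this repository's configuration file interface (assuming segmentation-models==0.2.0 as pinned in the requirements; the keyword arguments follow that library's API, not this repository's):

  # Minimal sketch using qubvel/segmentation_models directly, not this repository's
  # config-file interface. The backbone name is one of those listed above.
  from segmentation_models import Unet

  # UNet with a ResNet34 backbone and ImageNet-pretrained encoder weights
  model = Unet(backbone_name='resnet34', encoder_weights='imagenet')
  model.summary()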
  


Example results:

  • UNet with a SeResNext101 backbone, trained on a synthetic dataset using this repository
  • UNet with a SeResNext50 backbone, trained on only 500 images plus augmentation using this repository (left: input image, middle: ground truth, right: prediction)
  • FPN with a ResNet18 backbone, trained on only 180 images using this repository
  • Mask R-CNN with a ResNet101 backbone, trained on the COCO dataset, with the weights file ported from matterport/Mask_RCNN (see Custom Backbone for more details)

Installation

i. How to set up a virtual environment and install in it

  sudo apt-get install virtualenv
  virtualenv -p python3 venv
  git clone https://github.com/nearthlab/image-segmentation
  cd image-segmentation
  source activate 
  cat requirements.txt | while read p; do pip install $p; done

You should run the following commands every time you open a new terminal in order to run any of the Python files:

  cd /path/to/image-segmentation
  source activate
  # the second line is equivalent to: 
  # source ../venv/bin/activate && export PYTHONPATH=`pwd`/image-segmentation
  # i.e. activating the virtual environment and adding the image-segmentation/image-segmentation folder to the PYTHONPATH

ii. How to install without a virtual environment
Note that working in a virtual environment is highly recommended, but if you insist on not using one, you can still do so:

  git clone https://github.com/nearthlab/image-segmentation
  cd image-segmentation
  cat requirements.txt | while read p; do pip install --user $p; done
  • You may need to reinstall tensorflow(-gpu) if the automatically installed one is not suitable for your local environment.

Requirements

1. Python 3.5+
2. segmentation-models==0.2.0
3. keras>=2.2.0
4. keras-applications>=1.0.7 
5. tensorflow(-gpu)>=1.8.0 (tested on 1.10.0)

How to run examples

Please read the instructions in the README.md file in each example folder.

  1. Custom Backbone
    This example illustrates how to build Mask R-CNN with your own custom backbone architecture (a generic sketch of what a backbone provides follows this list). In particular, I adopted matterport's implementation of ResNet, which is slightly different from qubvel's. Moreover, you can run inference using the pretrained MaskRCNN_coco.h5. (I slightly modified the 'mask_rcnn_coco.h5' from matterport/Mask_RCNN/releases to make this example work: the only differences are the layer names.)

  2. Imagenet Classification
    This example shows the ImageNet classification results for various backbone architectures.

  3. Create KITTI Label
    This example is the code I used to simplify some of the object class labels in the KITTI dataset. (For instance, I merged the five separate classes 'car', 'truck', 'bus', 'caravan' and 'trailer' into a single class called 'vehicle'.)

  4. Configurations
    Some example cfg files that describe the segmentation models and training processes
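
A rough sketch of what "backbone" means in the Custom Backbone example above: for Mask R-CNN (and FPN-style models in general), a backbone is essentially a function that maps an input image to feature maps at several strides. The toy code below only illustrates that idea; the layer names are hypothetical and this is not the exact interface this repository expects.

  # Toy illustration of a backbone for an FPN-style head: feature maps at
  # strides 4, 8, 16 and 32 (C2..C5 in matterport's naming). Generic sketch,
  # not this repository's actual backbone interface.
  from keras import layers

  def toy_backbone(input_image):
      x  = layers.Conv2D(32,  3, strides=2, padding='same', activation='relu')(input_image)  # stride 2
      c2 = layers.Conv2D(64,  3, strides=2, padding='same', activation='relu')(x)            # stride 4
      c3 = layers.Conv2D(128, 3, strides=2, padding='same', activation='relu')(c2)           # stride 8
      c4 = layers.Conv2D(256, 3, strides=2, padding='same', activation='relu')(c3)           # stride 16
      c5 = layers.Conv2D(512, 3, strides=2, padding='same', activation='relu')(c4)           # stride 32
      return [c2, c3, c4, c5]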

How to train your own FPN / LinkNet / PSPNet / UNet model on KITTI dataset

i. Download the modified KITTI dataset from the release page (or convert your own dataset into the same format) and place it under the datasets folder.

  • The KITTI dataset is a public dataset available online. I simply split it into training and validation sets and simplified the labels using create_kitti_label.py (a minimal sketch of this kind of label mapping follows this list).

  • Note that this dataset is very small, containing only 180 training images and 20 validation images. If you want to train a model for a serious purpose, you should consider using a much larger dataset.

  • To view the KITTI dataset, run:

  python kitti_viewer.py -d=datasets/KITTI
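
A hedged sketch of the label simplification mentioned above; the merged class names come from the Create KITTI Label description earlier, but the actual mapping and code live in create_kitti_label.py:

  # Hypothetical illustration of merging several KITTI classes into one
  # (the real mapping is defined in create_kitti_label.py).
  LABEL_MAP = {
      'car': 'vehicle',
      'truck': 'vehicle',
      'bus': 'vehicle',
      'caravan': 'vehicle',
      'trailer': 'vehicle',
  }

  def simplify_label(name):
      """Map an original KITTI class name to its simplified class."""
      return LABEL_MAP.get(name, name)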

ii. Choose your model and copy the corresponding cfg files from examples/configs. For example, if you want to train a UNet model:

  cd /path/to/image-segmentation
  mkdir -p plans/unet
  cp examples/configs/unet/*.cfg plans/unet

iii. [Optional] Tune some of the model and training parameters in the config files you have just copied. Read the comments in the example config files for what each parameter means. [Note that you have to declare a variable in a .cfg file in the format {type}-{VARIABLE_NAME} = {value}.]
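
For illustration only, a hypothetical snippet in that format (the variable and type names below are made up; the real ones are documented in the example cfg files under examples/configs):

  # hypothetical variable names; see the example cfg files for the real ones
  str-BACKBONE_NAME = resnet18
  int-BATCH_SIZE = 4
  float-LEARNING_RATE = 0.001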

iv. Run train.py:

  python train.py -s plans/unet -d datasets/KITTI \
  -m plans/unet/unet.cfg \
  -t plans/unet/train_unet_decoder.cfg plans/unet/train_unet_all.cfg

This script will train the UNet model in two stages, using the training settings in plans/unet/train_unet_decoder.cfg followed by plans/unet/train_unet_all.cfg. The idea is: first train only the decoder while the backbone is frozen with its ImageNet-pretrained weights loaded, then fine-tune the entire model in the second stage. You can provide as many training cfg files as you wish, dividing the training into multiple stages.
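
A minimal, hedged illustration of this freeze-then-fine-tune idea in plain Keras, using a toy model; in this repository the equivalent behaviour is driven by the training cfg files rather than written by hand:

  # Toy two-layer model; 'backbone_conv' and 'decoder_conv' are hypothetical names.
  from keras import layers, models

  inp = layers.Input((None, None, 3))
  features = layers.Conv2D(8, 3, padding='same', name='backbone_conv')(inp)
  mask = layers.Conv2D(1, 1, activation='sigmoid', name='decoder_conv')(features)
  model = models.Model(inp, mask)

  # Stage 1: freeze the backbone and train the decoder only
  model.get_layer('backbone_conv').trainable = False
  model.compile(optimizer='adam', loss='binary_crossentropy')
  # model.fit(...) with the stage-1 settings

  # Stage 2: unfreeze everything and fine-tune the whole model
  model.get_layer('backbone_conv').trainable = True
  model.compile(optimizer='adam', loss='binary_crossentropy')
  # model.fit(...) with the stage-2 settings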

Once the training is done, you will find three files, 'class_names.json', 'infer.cfg' and 'best_model.h5', which you can use later for inference.

v. KITTI Evaluation:

  python evaluate_kitti.py -c /path/to/infer.cfg -w /path/to/best_model.h5 -l /path/to/class_names.json

How to train your own MaskRCNN model on COCO dataset

i. Download the COCO dataset. To do this, simply run:

  cd /path/to/image-segmentation/datasets
  ./download_coco.sh
  • To view the COCO dataset, run:
  python coco_viewer.py -d=datasets/coco

ii. Copy the example cfg files from examples/configs/maskrcnn.

  cd /path/to/image-segmentation
  mkdir -p plans/maskrcnn
  cp examples/configs/maskrcnn/*.cfg plans/maskrcnn

iii. [Optional] Tune some of the model and training parameters in the config files you have just copied. Read the comments in the example config files for what each parameter means. [Note that you have to declare a variable in a .cfg file in the format {type}-{VARIABLE_NAME} = {value}, as illustrated in the previous section.]

iv. Run train.py:

  python train.py -s plans/maskrcnn -d datasets/coco \
  -m plans/maskrcnn/maskrcnn.cfg \
  -t plans/maskrcnn/train_maskrcnn_heads.cfg plans/maskrcnn/train_maskrcnn_stage3up.cfg plans/maskrcnn/train_maskrcnn_all.cfg

This will train your Mask R-CNN model in three stages (heads → stage3+ → all), as suggested in matterport/Mask_RCNN. Likewise, once the training is done you will find the three files 'class_names.json', 'infer.cfg' and 'best_model.h5', which you can use later for inference.

v. COCO Evaluation:

  python evaluate_coco.py -c /path/to/infer.cfg -w /path/to/best_model.h5 -l /path/to/class_names.json

How to visualize inference

You can visualize your model's inference in a pop-up window:

python infer_gui.py -c=/path/to/infer.cfg -w=/path/to/best_model.h5 -l=/path/to/class_names.json \
-i=/path/to/a/directory/containing/image_files

or save the results as image files. [This will create a directory named 'results' under the directory you provided in the -i option, and write the visualized inference images into it.]

python infer.py -c=/path/to/infer.cfg -w=/path/to/best_model.h5 -l=/path/to/class_names.json \
-i=/path/to/a/directory/containing/image_files

References

@misc{matterport_maskrcnn_2017,
  title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},
  author={Waleed Abdulla},
  year={2017},
  publisher={Github},
  journal={GitHub repository},
  howpublished={\url{https://github.com/matterport/Mask_RCNN}},
}

image-segmentation's People

Contributors

jiwoong-choi


image-segmentation's Issues

Training does not match Matterport

I'm training a resnet50 (custom backbone/configs/maskrcnn.cfg) with a grocery products dataset.
Even after several attempts, the model stagnates at a val_loss close to 3.5; however, the same training with the same parameters using Matterport behaves very differently (better).

In addition, the weights files differ in size: Nearthlab's is 172 MB, while Matterport's is 180 MB.

Given that the difference between the two frameworks in the case of resnet50 is just the name of the layers, what could be causing this problem?

Is there any optimization during training that modifies the weights?

Because I realized that Matterport's resnet50 doesn't work on a Jetson Nano, whereas Nearthlab's does.

Am I doing something wrong? Is there something else I should set to make it work?

Logs:

Nearthlab :

Epoch 1/100
100/100 [==============================] - 113s 1s/step - loss: 9.2259 - val_loss: 6.9288
Epoch 2/100
100/100 [==============================] - 58s 580ms/step - loss: 7.7932 - val_loss: 5.3660
Epoch 3/100
100/100 [==============================] - 59s 590ms/step - loss: 6.9681 - val_loss: 4.7230
Epoch 4/100
100/100 [==============================] - 59s 593ms/step - loss: 6.2129 - val_loss: 4.0161
Epoch 5/100
100/100 [==============================] - 59s 587ms/step - loss: 5.5432 - val_loss: 3.9557
Epoch 6/100
100/100 [==============================] - 59s 586ms/step - loss: 5.0773 - val_loss: 3.4220
Epoch 7/100
100/100 [==============================] - 58s 582ms/step - loss: 4.5639 - val_loss: 3.5953
Epoch 8/100
100/100 [==============================] - 59s 590ms/step - loss: 4.0430 - val_loss: 3.5199
Epoch 9/100
100/100 [==============================] - 58s 582ms/step - loss: 3.8537 - val_loss: 3.5656
Epoch 10/100
100/100 [==============================] - 57s 574ms/step - loss: 3.8129 - val_loss: 3.4385
..... val_loss stagnates
Epoch 15/100
100/100 [==============================] - 58s 576ms/step - loss: 3.1963 - val_loss: 3.9125
.....
Epoch 38/100
100/100 [==============================] - 61s 610ms/step - loss: 2.8903 - val_loss: 3.3682
.....

Matterport:

Epoch 1/100
100/100 [==============================] - 112s 1s/step - loss: 3.8189 - rpn_class_loss: 0.3061 - rpn_bbox_loss: 0.8670 - mrcnn_class_loss: 1.1758 - mrcnn_bbox_loss: 0.8026 - mrcnn_mask_loss: 0.6674 - val_loss: 3.4789 - val_rpn_class_loss: 0.1830 - val_rpn_bbox_loss: 0.6418 - val_mrcnn_class_loss: 1.2534 - val_mrcnn_bbox_loss: 0.7054 - val_mrcnn_mask_loss: 0.6953
Epoch 2/100
100/100 [==============================] - 95s 945ms/step - loss: 3.3036 - rpn_class_loss: 0.1090 - rpn_bbox_loss: 0.5533 - mrcnn_class_loss: 1.2661 - mrcnn_bbox_loss: 0.6808 - mrcnn_mask_loss: 0.6943 - val_loss: 3.3427 - val_rpn_class_loss: 0.0787 - val_rpn_bbox_loss: 0.5562 - val_mrcnn_class_loss: 1.3297 - val_mrcnn_bbox_loss: 0.6842 - val_mrcnn_mask_loss: 0.6938
Epoch 3/100
100/100 [==============================] - 94s 936ms/step - loss: 3.1970 - rpn_class_loss: 0.0686 - rpn_bbox_loss: 0.4841 - mrcnn_class_loss: 1.2985 - mrcnn_bbox_loss: 0.6518 - mrcnn_mask_loss: 0.6940 - val_loss: 3.0237 - val_rpn_class_loss: 0.0571 - val_rpn_bbox_loss: 0.4428 - val_mrcnn_class_loss: 1.1865 - val_mrcnn_bbox_loss: 0.6436 - val_mrcnn_mask_loss: 0.6938
Epoch 4/100
100/100 [==============================] - 95s 947ms/step - loss: 3.1986 - rpn_class_loss: 0.0728 - rpn_bbox_loss: 0.4853 - mrcnn_class_loss: 1.3238 - mrcnn_bbox_loss: 0.6261 - mrcnn_mask_loss: 0.6905 - val_loss: 3.0983 - val_rpn_class_loss: 0.0509 - val_rpn_bbox_loss: 0.4753 - val_mrcnn_class_loss: 1.2831 - val_mrcnn_bbox_loss: 0.5961 - val_mrcnn_mask_loss: 0.6928
Epoch 5/100
100/100 [==============================] - 91s 914ms/step - loss: 3.0763 - rpn_class_loss: 0.0421 - rpn_bbox_loss: 0.4181 - mrcnn_class_loss: 1.3182 - mrcnn_bbox_loss: 0.6055 - mrcnn_mask_loss: 0.6924 - val_loss: 3.0088 - val_rpn_class_loss: 0.0330 - val_rpn_bbox_loss: 0.4035 - val_mrcnn_class_loss: 1.2702 - val_mrcnn_bbox_loss: 0.6103 - val_mrcnn_mask_loss: 0.6918
Epoch 6/100
100/100 [==============================] - 90s 902ms/step - loss: 3.0606 - rpn_class_loss: 0.0542 - rpn_bbox_loss: 0.4186 - mrcnn_class_loss: 1.3160 - mrcnn_bbox_loss: 0.5800 - mrcnn_mask_loss: 0.6919 - val_loss: 2.8770 - val_rpn_class_loss: 0.0678 - val_rpn_bbox_loss: 0.4394 - val_mrcnn_class_loss: 1.1240 - val_mrcnn_bbox_loss: 0.5563 - val_mrcnn_mask_loss: 0.6894
Epoch 7/100
100/100 [==============================] - 94s 940ms/step - loss: 2.9551 - rpn_class_loss: 0.0400 - rpn_bbox_loss: 0.4016 - mrcnn_class_loss: 1.2583 - mrcnn_bbox_loss: 0.5634 - mrcnn_mask_loss: 0.6917 - val_loss: 2.9014 - val_rpn_class_loss: 0.0453 - val_rpn_bbox_loss: 0.3620 - val_mrcnn_class_loss: 1.2578 - val_mrcnn_bbox_loss: 0.5445 - val_mrcnn_mask_loss: 0.6919
Epoch 8/100
100/100 [==============================] - 91s 908ms/step - loss: 2.9892 - rpn_class_loss: 0.0621 - rpn_bbox_loss: 0.4153 - mrcnn_class_loss: 1.2762 - mrcnn_bbox_loss: 0.5460 - mrcnn_mask_loss: 0.6895 - val_loss: 2.7237 - val_rpn_class_loss: 0.0407 - val_rpn_bbox_loss: 0.3425 - val_mrcnn_class_loss: 1.1253 - val_mrcnn_bbox_loss: 0.5290 - val_mrcnn_mask_loss: 0.6863
Epoch 9/100
100/100 [==============================] - 92s 920ms/step - loss: 2.7157 - rpn_class_loss: 0.0341 - rpn_bbox_loss: 0.3368 - mrcnn_class_loss: 1.1446 - mrcnn_bbox_loss: 0.5110 - mrcnn_mask_loss: 0.6891 - val_loss: 2.8300 - val_rpn_class_loss: 0.0302 - val_rpn_bbox_loss: 0.3824 - val_mrcnn_class_loss: 1.1554 - val_mrcnn_bbox_loss: 0.5717 - val_mrcnn_mask_loss: 0.6903
Epoch 10/100
100/100 [==============================] - 90s 903ms/step - loss: 2.8210 - rpn_class_loss: 0.0414 - rpn_bbox_loss: 0.3935 - mrcnn_class_loss: 1.1850 - mrcnn_bbox_loss: 0.5123 - mrcnn_mask_loss: 0.6889 - val_loss: 2.5782 - val_rpn_class_loss: 0.0272 - val_rpn_bbox_loss: 0.3071 - val_mrcnn_class_loss: 1.0568 - val_mrcnn_bbox_loss: 0.4993 - val_mrcnn_mask_loss: 0.6878
