Semantic-Segmentation-CRF

Dependencies


  • Python 3.x
  • TensorFlow 1.10.0
  • CUDA 9.0

Get deeplabv3 and deeplabv3+ results


I used three models:

  • deeplabv3 with a resnet_v2_101 backbone, trained on the PASCAL VOC 2012 train+aug dataset
  • deeplabv3+ with an xception_65 backbone, trained on PASCAL VOC 2012 train+aug
  • deeplabv3+ with an xception_65 backbone, trained on PASCAL VOC 2012 train+val

The deeplabv3 implementation is my own work; for deeplabv3+ I relied on this repo: deeplabv3+

PASCAL dataset


Deeplabv3


Training

First, convert the original data to the TensorFlow TFRecord format to speed up training.

python create_tf_record.py --data_dir DATA_DIR \
                           --image_data_dir IMAGE_DATA_DIR \
                           --label_data_dir LABEL_DATA_DIR

Then start training the model as follows:

python train.py --model_dir MODEL_DIR \
                --pre_trained_model PRE_TRAINED_MODEL \
                --batch_size 16 \
                --train_epochs 46 \
                --data_dir DATA_DIR

MODEL_DIR is the directory that contains the checkpoints. I set --batch_size 16 because I use a TITAN V (16GB).
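To get a feel for how --train_epochs and --batch_size translate into optimizer steps, here is a small back-of-the-envelope sketch. It assumes the train+aug split holds 10,582 images (the commonly cited figure) and that every epoch is rounded up to whole batches; train.py may count steps differently, and the function name is purely illustrative.

```python
import math


def total_training_steps(num_images, batch_size, epochs):
    """Steps per epoch (rounded up to whole batches) times the number of epochs."""
    return math.ceil(num_images / batch_size) * epochs


# 10,582 is an assumed dataset size (PASCAL VOC 2012 train+aug);
# substitute the number of images actually present in your DATA_DIR.
print(total_training_steps(num_images=10582, batch_size=16, epochs=46))
```

Under these assumptions the settings above come to roughly 30,000 steps, which is in the same range as the --training_number_of_steps=30000 used for deeplabv3+ in this README.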

Evaluation

To evaluate how the model performs, run the following against the saved checkpoints:

python evaluate.py --image_data_dir IMAGE_DATA_DIR \
                   --label_data_dir LABEL_DATA_DIR \
                   --evaluation_data_list EVALUATION_DATA_LIST \
                   --model_dir MODEL_DIR

The current best model built by this implementation achieves 75.72% mIOU on the PASCAL VOC 2012 test dataset. I also trained this model on the MS-COCO dataset, once using all images and once using only the 21-class images.

| Method | Dataset | OS | mIOU |
|---|---|---|---|
| paper deeplabv3 | PASCAL VOC 2012 train | 16 | 77.21% |
| repo deeplabv3 | PASCAL VOC 2012 train+aug | 16 | 75.72% |
| repo deeplabv3 | MS-COCO 21-class | 16 | 69.11% |
| repo deeplabv3 | MS-COCO 91-class | 16 | 56.34% |
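mIOU is the per-class intersection-over-union averaged over classes. The following pure-Python sketch illustrates the metric on a toy example; it is not the code in evaluate.py, which may differ. The function names are hypothetical, and treating label 255 as "ignore" is an assumption based on the PASCAL VOC void label.

```python
def confusion_matrix(labels, preds, num_classes, ignore_label=255):
    """Build a num_classes x num_classes matrix; rows are ground-truth classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(labels, preds):
        if truth == ignore_label:  # assumed void label, skipped in the score
            continue
        matrix[truth][pred] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU = TP / (TP + FP + FN) over classes that appear at all."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # truth is c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp   # predicted c, truth is otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened pixel labels for three classes, one void pixel.
truth = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(truth, pred, num_classes=3)))
```

For full-size label maps, a NumPy-based confusion matrix would be much faster; plain lists are used here only to keep the sketch dependency-free.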

Inference

To apply semantic segmentation to your own images, run:

python inference.py --data_dir DATA_DIR \
                    --infer_data_list INFER_DATA_LIST \
                    --model_dir MODEL_DIR \
                    --output_dir OUTPUT_DIR

Deeplabv3+


Running the train/eval/vis jobs

A local training job using xception_65 can be run with the following command:

# From tensorflow/models/research/
python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=30000 \
    --train_split="train" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=513 \
    --train_crop_size=513 \
    --train_batch_size=1 \
    --dataset="pascal_voc_seg" \
    --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

where ${PATH_TO_INITIAL_CHECKPOINT} is the path to the initial checkpoint (usually an ImageNet pretrained checkpoint), ${PATH_TO_TRAIN_DIR} is the directory to which training checkpoints and events will be written, and ${PATH_TO_DATASET} is the directory in which the PASCAL VOC 2012 dataset resides.

Note that for {train,eval,vis}.py:

  1. To reproduce our results, use a large batch size (> 12) and set fine_tune_batch_norm = True. Here, we use a small batch size during training purely for demonstration. If you have limited GPU memory, fine-tune from our provided checkpoints, whose batch norm parameters have already been trained, and use a smaller learning rate with fine_tune_batch_norm = False.

  2. Change atrous_rates from [6, 12, 18] to [12, 24, 36] if you set output_stride=8.

  3. Skip the decoder_output_stride flag if you do not want to use the decoder structure.
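The relationship between output_stride and atrous_rates in the notes above can be written down as a small helper: halving the output stride from 16 to 8 doubles each rate. This is only an illustrative sketch; the function and constant names are made up here and are not part of the deeplab scripts.

```python
BASE_OUTPUT_STRIDE = 16
BASE_ATROUS_RATES = (6, 12, 18)


def atrous_rates_for(output_stride):
    """Scale the base rates inversely with the output stride."""
    if output_stride not in (8, 16):
        raise ValueError("only output_stride 8 and 16 are covered by the notes")
    scale = BASE_OUTPUT_STRIDE // output_stride
    return [rate * scale for rate in BASE_ATROUS_RATES]


def atrous_flags(output_stride):
    """Render the repeated --atrous_rates flags plus --output_stride."""
    flags = ["--atrous_rates={}".format(r) for r in atrous_rates_for(output_stride)]
    flags.append("--output_stride={}".format(output_stride))
    return flags


print(" ".join(atrous_flags(8)))
```

The printed flags can be pasted into the train, eval, and vis commands in place of the output_stride=16 settings shown in this section.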

A local evaluation job using xception_65 can be run with the following command:

# From tensorflow/models/research/
python deeplab/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size=513 \
    --eval_crop_size=513 \
    --dataset="pascal_voc_seg" \
    --checkpoint_dir=${PATH_TO_CHECKPOINT} \
    --eval_logdir=${PATH_TO_EVAL_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the path to train_logdir), ${PATH_TO_EVAL_DIR} is the directory to which evaluation events will be written, and ${PATH_TO_DATASET} is the directory in which the PASCAL VOC 2012 dataset resides.

A local visualization job using xception_65 can be run with the following command:

# From tensorflow/models/research/
python deeplab/vis.py \
    --logtostderr \
    --vis_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --vis_crop_size=513 \
    --vis_crop_size=513 \
    --dataset="pascal_voc_seg" \
    --checkpoint_dir=${PATH_TO_CHECKPOINT} \
    --vis_logdir=${PATH_TO_VIS_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the path to train_logdir), ${PATH_TO_VIS_DIR} is the directory to which visualization results will be written, and ${PATH_TO_DATASET} is the directory in which the PASCAL VOC 2012 dataset resides. To save the segmentation results for the evaluation server, also set also_save_raw_predictions = True.

For stacking, run deeplabv3+ with two different checkpoints and datasets (train+aug and train+val) to obtain two sets of results.

Stacking


You now have three sets of results to stack. Stacking works by voting: for every pixel, choose the label predicted most often across the three results. Run:

python pixel_stacking.py --path_val PATH_VAL \
                         --path_aug PATH_AUG \
                         --path_ori PATH_ORI \
                         --path_stacking PATH_STACKING

PATH_VAL, PATH_AUG, and PATH_ORI are the directories that contain the three sets of results; PATH_STACKING is the output path.
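The voting rule can be sketched in a few lines of plain Python. This is a minimal illustration on tiny label maps, not the actual pixel_stacking.py; the function names are hypothetical, and breaking ties in favor of the earliest-listed prediction is an assumption, since the rule for three-way disagreement is not stated here.

```python
from collections import Counter


def vote_pixel(labels):
    """Return the most common label; ties go to the earliest prediction."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def stack_predictions(*label_maps):
    """Combine equally sized 2-D label maps by pixel-wise majority vote."""
    return [
        [vote_pixel(pixels) for pixels in zip(*rows)]
        for rows in zip(*label_maps)
    ]


# Three toy 2x2 label maps standing in for the val, aug, and ori results.
val = [[0, 15], [15, 0]]
aug = [[0, 15], [0, 7]]
ori = [[1, 0], [15, 12]]
print(stack_predictions(val, aug, ori))
```

With real segmentation maps you would load each PNG into an array first; the order in which the maps are passed decides which model wins when all three disagree.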

denseCRF


Run the following to get the result after denseCRF:

python densecrf_inference.py --image_data_dir IMAGE_DATA_DIR \
                             --label_data_dir LABEL_DATA_DIR \
                             --output_dir OUTPUT_DIR \
                             --test_data_list TEST_DATA_LIST

IMAGE_DATA_DIR is the path to the original images, and LABEL_DATA_DIR is the path to the segmentation results. All results are summarized in the following table.

| Method | Dataset | OS | mIOU |
|---|---|---|---|
| paper deeplabv3 | PASCAL VOC 2012 train | 16 | 77.21% |
| repo deeplabv3 | PASCAL VOC 2012 train+aug | 16 | 75.72% |
| repo deeplabv3 | MS-COCO 21-class | 16 | 69.11% |
| repo deeplabv3 | MS-COCO 91-class | 16 | 56.34% |
| paper deeplabv3+ | PASCAL VOC 2012 train+aug | 16 | 83.68% |
| paper deeplabv3+ | PASCAL VOC 2012 train+val | 16 | 87.8% |
| repo deeplabv3 + deeplabv3+ + deeplabv3+ | - | - | 88.1% |
| repo deeplabv3 + deeplabv3+ + deeplabv3+ + denseCRF | - | - | 84.12% |
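As background for the denseCRF step, the standard fully connected CRF formulation minimizes the energy below. This is a sketch of the usual textbook form, assuming densecrf_inference.py follows it; the weights and bandwidths the script actually uses are not stated in this README.

```latex
E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)

\psi_p(x_i, x_j) = \mu(x_i, x_j) \left[
    w^{(1)} \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\alpha^2}
                         -\frac{\lVert I_i - I_j \rVert^2}{2\theta_\beta^2} \right)
  + w^{(2)} \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\gamma^2} \right)
\right]
```

Here x_i is the label of pixel i, the unary term \psi_u comes from the segmentation result (LABEL_DATA_DIR), p_i and I_i are the position and color of pixel i in the original image (IMAGE_DATA_DIR), and \mu(x_i, x_j) is 1 when the two labels differ and 0 otherwise.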

More details are available in my blog.

Acknowledgements

