Semantic_Human_Matting

This project is my reimplementation of the paper Semantic Human Matting from Alibaba, which proposes a new end-to-end scheme to predict the human alpha matte from an image. According to the paper, SHM is the first algorithm that learns to jointly fit both semantic information and high-quality details with deep networks.

One of the main contributions of the paper is a large-scale, high-quality human matting dataset containing 35,513 unique human images with corresponding alpha mattes. However, the dataset is not available.

I collected 6k+ images as the dataset for this project. Note that my network, built from MobileNet and a shallow encoder-decoder net, is a lighter version of the original implementation.

update 2019/04/08

The company 爱分割 shared their dataset recently!

Requirements

  • python3.5 / 3.6
  • pytorch >= 0.4
  • opencv-python

Usage

Directory structure of the project:

Semantic_Human_Matting
│   README.md
│   train.py
│   train.sh
│   test_camera.py
│   test_camera.sh
└───model
│   │   M_Net.py
│   │   T_Net.py
│   │   network.py
└───data
    │   dataset.py
    │   gen_trimap.py
    │   gen_trimap.sh
    │   knn_matting.py
    │   knn_matting.sh
    └───image
    └───mask
    └───trimap
    └───alpha

Step 1: prepare dataset

./data/train.txt contains the names of the 6k+ images (./data/image) and their corresponding masks (./data/mask).
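
As a rough sketch of this step, the snippet below builds such a list from an image folder, keeping only names that also have a mask. The one-name-per-line format, the identical file names for image and mask, and the helper name `write_train_list` are assumptions for illustration, not taken from the repository.

```python
import os


def write_train_list(image_dir, mask_dir, list_path):
    """Write names of images that have a same-named mask, one per line.

    Hypothetical helper: the list format is an assumption, so check
    ./data/dataset.py for what the loader actually expects.
    """
    mask_names = set(os.listdir(mask_dir))
    # Sort so the list is reproducible across runs and file systems.
    names = sorted(n for n in os.listdir(image_dir) if n in mask_names)
    with open(list_path, "w") as f:
        for name in names:
            f.write(name + "\n")
    return names
```

Under those assumptions it could be called as `write_train_list("./data/image", "./data/mask", "./data/train.txt")`.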

Use ./data/gen_trimap.sh to get trimaps of the masks.
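
A trimap is commonly derived from a binary mask by eroding it to get confident foreground and dilating it to get the outer edge of the unknown band. The sketch below illustrates that idea with NumPy only. It is not the code in ./data/gen_trimap.py, and the function names, the square window, the `radius` parameter and the 0/128/255 label values are all assumptions.

```python
import numpy as np


def _dilate(binary, radius):
    """Binary dilation with a (2*radius+1) square window, via shifted ORs."""
    h, w = binary.shape
    padded = np.pad(binary, radius, mode="constant", constant_values=False)
    out = np.zeros_like(binary)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def mask_to_trimap(mask, radius=5):
    """Turn a 2D mask (nonzero = person) into a trimap.

    Assumed labels: 0 = background, 128 = unknown, 255 = foreground.
    """
    fg = np.asarray(mask) > 0
    dilated = _dilate(fg, radius)
    # Erosion is the dilation of the complement, complemented.
    eroded = ~_dilate(~fg, radius)
    trimap = np.zeros(fg.shape, dtype=np.uint8)
    trimap[dilated] = 128  # band around the mask boundary
    trimap[eroded] = 255   # pixels well inside the mask
    return trimap
```

With OpenCV, which is already listed in the requirements, `cv2.erode` and `cv2.dilate` would do the same job faster. A larger `radius` widens the unknown band, which leaves more pixels for the matting network to resolve.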

Use ./data/knn_matting.sh to get the alpha mattes (this takes a long time).

Step 2: build network

SHM

  • Trimap generation: T-Net

    The T-Net plays the role of semantic segmentation. I use mobilenet_v2 + U-Net as the T-Net to predict the trimap.

  • Matting network: M-Net

    The M-Net aims to capture detail information and generate the alpha matte. I built the M-Net as in the paper, but with fewer channels than the original net.

  • Fusion Module

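    Probabilistic estimation of the alpha matte can be written as

    $$\alpha_p = F_s + U_s \, \alpha_r$$

    where $F_s$ and $U_s$ are the foreground and unknown-region probabilities predicted by the T-Net, and $\alpha_r$ is the raw alpha matte from the M-Net.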
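
A minimal PyTorch sketch of the fusion step, assuming the form given in the SHM paper, alpha_p = F_s + U_s * alpha_r. The function name `fuse` and the channel order of the T-Net output (background, unknown, foreground) are assumptions; check ./model/network.py for the order this project actually uses.

```python
import torch
import torch.nn.functional as F


def fuse(trimap_logits, alpha_r):
    """Fuse T-Net and M-Net outputs into the final matte.

    trimap_logits: (N, 3, H, W) raw T-Net scores; the channel order
                   (background, unknown, foreground) is an assumption.
    alpha_r:       (N, 1, H, W) raw matte from the M-Net, values in [0, 1].
    """
    probs = F.softmax(trimap_logits, dim=1)
    _bg, unknown, fg = torch.split(probs, 1, dim=1)
    # Confident foreground contributes alpha = 1; unknown pixels defer to the M-Net.
    return fg + unknown * alpha_r
```

Where the T-Net is sure a pixel is foreground or background, the M-Net output has almost no effect, so the M-Net only has to be accurate inside the unknown band.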

Step 3: build loss

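The overall prediction loss for alpha_p at each pixel is

$$L_p = \gamma \lVert \alpha_p - \alpha_g \rVert_1 + (1 - \gamma) \lVert c_p - c_g \rVert_1$$

where $\alpha_g$ is the ground-truth alpha, $c_p$ and $c_g$ are the foreground composited with the predicted and the ground-truth alpha, and $\gamma = 0.5$ in the code below.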

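The total loss is

$$L = L_p + \lambda L_t$$

where $L_t$ is the trimap classification loss and $\lambda$ weights it against the prediction loss.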

See the paper for more details. My code for the two loss functions:

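    import torch
    import torch.nn as nn

    # Shapes inferred from the indexing below:
    # trimap_pre: (N, 3, H, W) T-Net logits; trimap_gt: (N, 1, H, W) class indices
    # alpha_pre, alpha_gt: (N, 1, H, W); img: (N, 3, H, W)

    # -------------------------------------
    # classification loss L_t
    # -------------------------------------
    criterion = nn.CrossEntropyLoss()
    L_t = criterion(trimap_pre, trimap_gt[:, 0, :, :].long())

    # -------------------------------------
    # prediction loss L_p
    # -------------------------------------
    eps = 1e-6  # keeps the sqrt differentiable at zero
    # L_alpha: smooth absolute difference between predicted and ground-truth alpha
    L_alpha = torch.sqrt(torch.pow(alpha_pre - alpha_gt, 2.) + eps).mean()

    # L_composition: same penalty on the foreground composited with each alpha
    fg = torch.cat((alpha_gt, alpha_gt, alpha_gt), 1) * img
    fg_pre = torch.cat((alpha_pre, alpha_pre, alpha_pre), 1) * img
    L_composition = torch.sqrt(torch.pow(fg - fg_pre, 2.) + eps).mean()
    L_p = 0.5 * L_alpha + 0.5 * L_composition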
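
The snippet above stops at L_p. As a hedged sketch of how the two terms might be combined into one training objective, the function below adds the classification loss with a weight. The name `total_loss` and the default `lam=0.01` are assumptions for illustration; check train.py for the weighting actually used.

```python
import torch
import torch.nn as nn


def total_loss(trimap_pre, trimap_gt, alpha_pre, alpha_gt, img, lam=0.01, eps=1e-6):
    """Return (L, L_t, L_p) with L = L_p + lam * L_t.

    trimap_pre: (N, 3, H, W) logits, trimap_gt: (N, 1, H, W) class indices,
    alpha_pre / alpha_gt: (N, 1, H, W), img: (N, 3, H, W).
    The weight lam is a hypothetical default, not a value from the repository.
    """
    L_t = nn.CrossEntropyLoss()(trimap_pre, trimap_gt[:, 0, :, :].long())
    # Smooth absolute difference on the matte and on the composited foreground.
    L_alpha = torch.sqrt((alpha_pre - alpha_gt) ** 2 + eps).mean()
    # Broadcasting the 1-channel alpha over the 3 image channels replaces torch.cat.
    L_comp = torch.sqrt(((alpha_pre - alpha_gt) * img) ** 2 + eps).mean()
    L_p = 0.5 * L_alpha + 0.5 * L_comp
    return L_p + lam * L_t, L_t, L_p
```

Returning the individual terms alongside the sum makes it easy to log them separately during training.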

Step 4: train

First, pre-train the T-Net, using ./train.sh as:

python3 train.py \
	--dataDir='./data' \
	--saveDir='./ckpt' \
	--trainData='human_matting_data' \
	--trainList='./data/train.txt' \
	--load='human_matting' \
	--nThreads=4 \
	--patch_size=320 \
	--train_batch=8 \
	--lr=1e-3 \
	--lrdecayType='keep' \
	--nEpochs=1000 \
	--save_epoch=1 \
	--train_phase='pre_train_t_net'

Then train end to end, using ./train.sh as:

python3 train.py \
	--dataDir='./data' \
	--saveDir='./ckpt' \
	--trainData='human_matting_data' \
	--trainList='./data/train.txt' \
	--load='human_matting' \
	--nThreads=4 \
	--patch_size=320 \
	--train_batch=8 \
	--lr=1e-4 \
	--lrdecayType='keep' \
	--nEpochs=2000 \
	--save_epoch=1 \
	--finetuning \
	--train_phase='end_to_end'

Test

Run ./test_camera.sh

Contributors

  • lizhengwei1992