
mtcnn's Introduction

MTCNN

Repository for "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks", implemented with Caffe, C++ interface. Compared with Cascade CNN, MTCNN integrates the detection net and calibration net into one net. Moreover, face alignment is also applied in the same net.

Result

Face Detection

The results of each procedure in MTCNN are contained in the result folder. The final results are shown in the following. These two pictures are taken from FDDB. MTCNN fails to detect a face that is annotated in FDDB in the left picture, while it detects a face in the right picture that is not annotated in FDDB. (Better than the benchmark in some cases.)

Face Alignment

Time Cost

The average time cost is 0.197 s/frame, which is faster than Cascade CNN. The result was measured on a 1080p live video.

Accuracy

The accuracy on FDDB is higher than 0.9. The model contained in this repository can be used as a pre-trained model to further improve the result.

How to train

An HDF5 dataset is necessary to train a Caffe model with multiple labels. A sample script that generates the .hdf5 file is listed here.

In this sample, you need to prepare 4 txt files, which contain the labels (0 or 1), the landmarks (as ratios within the cropped image), the regression boxes (as ratios), and the cropped image paths. A dataset that contains landmark information, such as CelebA, is needed to generate samples with these multiple attributes. Then change the paths in the sample code; the output HDF5 file is written to train_file_path. Note that the image size needs to be changed to generate suitable data for your net. A minimal sketch of such a script is given after the path list below.

label_path = '../dataset/label.txt'
landmark_path = '../dataset/landmark.txt'
regression_box_path = '../dataset/regression_box.txt'
crop_image_path = '../dataset/crop_image.txt'
train_file_path = '../dataset/train_24.hd5'
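A minimal sketch of such a generation script (assuming each txt file holds one sample per line: a 0/1 label, 10 landmark ratios, 4 regression-box ratios, and an image path, respectively; the 24x24 size and the preprocessing are illustrative and not necessarily identical to the repository's script):

import numpy as np
import h5py
import cv2

label_path = '../dataset/label.txt'
landmark_path = '../dataset/landmark.txt'
regression_box_path = '../dataset/regression_box.txt'
crop_image_path = '../dataset/crop_image.txt'
train_file_path = '../dataset/train_24.hd5'
image_size = 24  # change to 12 / 24 / 48 to match the net being trained

label = np.loadtxt(label_path).reshape(-1, 1).astype(np.float32)            # N x 1
landmark = np.loadtxt(landmark_path).astype(np.float32)                     # N x 10
regression_box = np.loadtxt(regression_box_path).astype(np.float32)         # N x 4

with open(crop_image_path) as f:
    paths = [line.strip() for line in f if line.strip()]

# Caffe expects N x C x H x W; values are scaled to [0, 1] here as an example.
data = np.zeros((len(paths), 3, image_size, image_size), dtype=np.float32)
for i, p in enumerate(paths):
    img = cv2.imread(p)
    img = cv2.resize(img, (image_size, image_size)).astype(np.float32) / 255.0
    data[i] = img.transpose(2, 0, 1)  # HWC -> CHW

# One 15-value row per sample: label, 4 regression ratios, 10 landmark ratios.
labels = np.concatenate((label, regression_box, landmark), axis=1)

with h5py.File(train_file_path, 'w') as f:
    f['data'] = data
    f['labels'] = labels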

Then write the path of the HDF5 file into a txt file and reference that txt file in the prototxt:

hdf5_data_param {
   source: "/Users/Young/Documents/Programming/MTCNN/MTCNN_train/test_48.txt"
   batch_size: 100
 }
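The txt passed as source is simply a list of HDF5 file paths, one per line (this is how Caffe's HDF5 data layer locates its files). For the training file generated above it would contain the single line:

../dataset/train_24.hd5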

Run the shell file and train the model.

Train strategy

The images used to train the CNN are divided into 3 groups: positive faces, negative faces and parts of faces. When the Intersection-over-Union (IoU) ratio between a cropped image and any ground truth provided by the dataset is higher than 0.7, lower than 0.3, or between 0.3 and 0.7, the crop belongs to the positive faces, the negative faces or the parts of faces, respectively. (You can choose the thresholds yourself; a sketch of this split follows the list below.) The general training process is shown in the following figure, while the detailed training steps are also listed.

  1. Crop images from the dataset randomly, and divide them into positive faces, negative faces and parts of faces based on the IoU between the ground truth and the cropped image;
  2. Train the P-Net on the randomly cropped images;
  3. Crop images from the dataset based on the bounding boxes detected by the P-Net, divide them, and use them to fine-tune the P-Net;
  4. Train the R-Net on the data used to fine-tune the P-Net;
  5. Crop images from the dataset based on the bounding boxes detected by the R-Net, divide them, and use them to fine-tune the R-Net;
  6. Train the O-Net on the data used to fine-tune the P-Net and the R-Net;
  7. Crop images from the dataset based on the bounding boxes detected by the O-Net, divide them, and use them to fine-tune the O-Net.

Example labels for the 3 types of data (a sketch packing them into HDF5 label rows follows the lists):

positive face:

  • face detection: 1;
  • face landmark: [0.1,0.2,0.3,0.4,0.5,0.1,0.2,0.3,0.4,0.5];
  • face regression: [0.1,0.1,0.1,0.1].

part of the face:

  • face detection: 1;
  • face landmark: [0.1,0.2,0.3,0.4,0.5,0.1,0.2,0.3,0.4,0.5];
  • face regression: [0.4,0.4,0.4,0.4].

negative face:

  • face detection: 0;
  • face landmark: [0,0,0,0,0,0,0,0,0,0];
  • face regression: [0,0,0,0].
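A sketch of how these three label types could be packed into the 15-value rows (1 detection label + 4 regression ratios + 10 landmark ratios) stored in the HDF5 labels dataset; the order matches the concatenation used in the generation script above:

import numpy as np

def make_label(face, regression, landmark):
    # One 15-value row: detection label, 4 regression ratios, 10 landmark ratios
    return np.array([face] + list(regression) + list(landmark), dtype=np.float32)

positive = make_label(1, [0.1, 0.1, 0.1, 0.1],
                      [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5])
part     = make_label(1, [0.4, 0.4, 0.4, 0.4],
                      [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5])
negative = make_label(0, [0, 0, 0, 0], [0] * 10)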

You can also train the face detection and regression on a dataset without landmark labels first. That model is then used as the starting point to train the face landmarks.

mtcnn's People

Contributors

foreverYoungGitHub


mtcnn's Issues

Can I know how you suppress loss that are not used?

According to the MTCNN paper, some of the losses are not used for certain samples. For example, for a negative example, no bounding box or landmark points are predicted, so the regression loss and the landmark loss are not used. Can I know which part of your code does that? Thanks.
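(A numpy-style sketch of the idea behind this question, not taken from this repository's code: the paper's per-sample loss selection can be expressed as masks that zero out the losses a sample type does not contribute.)

def masked_loss(sample_type, cls_loss, box_loss, landmark_loss, has_landmark):
    # sample_type: 1 = positive, 0 = negative, -1 = part face (a common
    # convention in MTCNN reimplementations, assumed here)
    use_cls = 1.0 if sample_type in (1, 0) else 0.0   # classification: positives and negatives
    use_box = 1.0 if sample_type in (1, -1) else 0.0  # box regression: positives and part faces
    use_lmk = 1.0 if has_landmark else 0.0            # landmarks: annotated samples only
    return use_cls * cls_loss + use_box * box_loss + use_lmk * landmark_loss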

license?

hello! nice implementation, is this under the BSD 3-Clause license?

generate hdf5 file?

labels = np.concatenate((label, regression_box, landmark), axis = 1)
with h5py.File(train_file_path, 'w') as f:
    f['data'] = a
    f['labels'] = labels
    f['regression'] = regression_box
    f['landmark'] = landmark

There is no need to use f['regression'] = regression_box and f['landmark'] = landmark,
because labels = np.concatenate((label, regression_box, landmark), axis = 1) already contains the regression and landmark values.

Is that right?

Fine tuned landmark

Hi @foreverYoungGitHub,

Based on your sentence in README "You can also train the face detection and regression for the dataset without landmark label. The model is then used to train the face landmark."
Can we fine-tune the O-Net using only landmark data?

Thanks

regression box bug?

            bbox.height = bounding_box_[j].height + regression_box_temp_[4*j+2] * bounding_box_[j].height;
            bbox.width = bounding_box_[j].width + regression_box_temp_[4*j+3] * bounding_box_[j].width;

Looking at the MATLAB code, the regression box should correspond to the two coordinate pairs x1,y1 and x2,y2, not to height and width. This makes the final bbox position slightly skewed; after correcting it, the results are better.
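(A small Python sketch of the interpretation suggested in this issue, i.e. applying the four regression values as offsets to the x1, y1, x2, y2 corners scaled by the box size; the actual fix would of course go into MTCNN.cpp.)

def apply_regression(box, reg):
    # box = (x1, y1, x2, y2); reg = (dx1, dy1, dx2, dy2) as ratios of the box size
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + reg[0] * w, y1 + reg[1] * h,
            x2 + reg[2] * w, y2 + reg[3] * h)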

train data label

Hi Liuyang, could you give us a sample of the labels for the three types of images?

Training

Cool project! I'm looking at using either this project or your Cascade CNN detector for a general object detector. I had a couple questions, hoping you can help :)

  1. How fast is MTCNN on CPU?
  2. Is it possible to upload your training scripts? Or samples of the input for the lmdb databases that Caffe needs? I'll need to create my own training data -- not quite sure how to do it.

Thanks!

what's the format of label.txt

Hi ,

I wonder what the format of label.txt, landmark.txt, regression_box.txt and crop_image.txt is.
Could you kindly tell me?

Thank you very much!

Why does the loss of the landmark task not decrease?

I have trained your code many times, but the loss of the landmark task does not converge. I don't know what is wrong. When I only train the face classification and the bounding-box regression, the losses of both tasks decrease. Why?

How does the pts loss layer ignore negative samples?

When preparing the data, my final format is the following (each line: image path, 1 label, 4 box values, 10 landmark values):
pos1.jpg 1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
neg1.jpg 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
part1.jpg -1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
landmark1.jpg -1 -1 -1 -1 -1 0.1 0.3 0.3 0.4 0.5 0.5 0.3 0.2 0.7 0.7
When computing the landmark loss, how do I ignore the other samples? Did you rewrite the loss layer? Looking at the prototxt you do not seem to have rewritten it, or did I miss something? Thanks.

little offset in face alignment

For trump.jpg, compared with your result, there is a small offset in the face alignment phase when main.cpp calls MTCNN.detection_TEST(). Could you tell me how I can improve the face alignment precision?

Train the network

I want to train MTCNN using my own dataset. Can you please give me some hints on how to do that? For example:

  1. Format of the dataset, like naming the folder and image.
  2. Train the network to produce the model

train data label

@foreverYoungGitHub Hi, my dear Liuyang. Based on your latest reply: if an image is H*W, the ground truth is (x0,y0,x1,y1), where (x0,y0) is the top-left point of the ground truth, and a bounding box (x2,y2,x3,y3) has IoU with the ground truth > 0.65, so we consider it a positive example. Is the training label
( (x0-x2)/W, (y0-y2)/H, (x1-x3)/W, (y1-y3)/H ) ?
But if this is right, the training labels are all near 0, and we also set the negative regression values to (0, 0, 0, 0). Is that OK? I mean, both the negative and the positive samples' regression values would be near 0.
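(For reference, a sketch of the convention used in several other MTCNN reimplementations, where the offsets are normalized by the crop's own width and height rather than the whole image; whether this repository uses the same convention is not confirmed here.)

def regression_target(gt_box, crop_box):
    # gt_box = (x0, y0, x1, y1), crop_box = (x2, y2, x3, y3), same coordinates
    x0, y0, x1, y1 = gt_box
    x2, y2, x3, y3 = crop_box
    w, h = x3 - x2, y3 - y2
    return ((x0 - x2) / w, (y0 - y2) / h, (x1 - x3) / w, (y1 - y3) / h)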

how to use the mtcnn.cpp

Hi:
I use the detection code to detect and align faces, but I get an error, and the code only outputs the
face detection result, not the alignment result.
Can you provide a complete example?
Thanks

about the label

You give an example of the labels of the positive, part and negative face images: positive is 1, part is 1, negative is 0.
Why do the positive and the part face have the same label? Some other programmers set the part label to -1.
Is it practical to train models this way?

How to ignore the redundant labels

When training face detection there are 3 kinds of training data and 2 tasks: non-face, face and part face, with classification labels 0, 1 and -1 respectively. How do you ignore the -1 classification label during classification training, and how do you ignore the regression labels of non-faces during the regression task?

What is the format of the text files?

label_path = '../dataset/label.txt'

landmark_path = '../dataset/landmark.txt'

regression_box_path = '../dataset/regression_box.txt'

crop_image_path = '../dataset/crop_image.txt'
Could you send a sample? Then I could train by following your pipeline.

problem of training models

Sorry to bother you. I have trained the three models det1, det2 and det3 separately, but the detection performance is not as good as yours; the landmark locations are especially inaccurate.
I crop the images and then generate HDF5 files of different sizes, 12x12, 24x24 and 48x48, and then train det1, det2 and det3 on the corresponding 12x12, 24x24 and 48x48 HDF5 files.
Am I doing something wrong? Should I train the det2 model based on the trained det1 model?

bulk detect version?

MTCNN detection is slow on a mobile device (CPU only): a 720p image takes about 500 ms on average (Samsung S7 Edge).
Could we implement a bulk (batched) version? Would it be faster?

memory leak?

In celeba_crop.cpp,
char *cstr = new char[path[i].length() + 1];
is never deleted.

Which version of Caffe is used?

When I run the detection on my PC (Windows),
I find that some functions such as Forward() do not have the same definition as in my Caffe.
It gives these errors:
Error 11 error C2661: 'caffe::Net::Net' : no overloaded function takes 2 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 30 1 MTCNN
Error 26 error C2661: 'caffe::Net::Forward' : no overloaded function takes 0 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 346 1 MTCNN
Error 27 error C2661: 'caffe::Net::Forward' : no overloaded function takes 0 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 382 1 MTCNN
Error 12 error C2660: 'std::shared_ptr<caffe::Net>::reset' : function does not take 1 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 30 1 MTCNN

linking error

I'm using the caffe-1.0 release version, but a linking error comes up.
What is the original Caffe version you used?
@foreverYoungGitHub

CMakeFiles/MTCNN.dir/MTCNN.cpp.o: In function `MTCNN::MTCNN(std::vector<std::string, std::allocator<std::string> >, std::vector<std::string, std::allocator<std::string> >)': MTCNN.cpp:(.text+0x6df): undefined reference to `caffe::Net::Net(std::string const&, caffe::Phase, int, std::vector<std::string, std::allocator<std::string> > const*)'
collect2: error: ld returned 1 exit status
make[2]: *** [MTCNN] Error 1
make[1]: *** [CMakeFiles/MTCNN.dir/all] Error 2
make: *** [all] Error 2

Face Alignment

Can I know if the code does face alignment after detection?
If so, can I know where the face alignment code starts?

GPU version

When I try to run this code in the GPU version, it always shows errors like this (even when I try on a different GPU server).

...
}
I1030 17:39:38.964071 13238 layer_factory.hpp:77] Creating layer input
I1030 17:39:38.964112 13238 net.cpp:84] Creating Layer input
I1030 17:39:38.964131 13238 net.cpp:380] input -> data
I1030 17:39:38.965209 13238 net.cpp:122] Setting up input
I1030 17:39:38.966150 13238 net.cpp:129] Top shape: 1 3 12 12 (432)
I1030 17:39:38.966163 13238 net.cpp:137] Memory required for data: 1728
I1030 17:39:38.966179 13238 layer_factory.hpp:77] Creating layer conv1
I1030 17:39:38.966215 13238 net.cpp:84] Creating Layer conv1
I1030 17:39:38.966230 13238 net.cpp:406] conv1 <- data
I1030 17:39:38.966775 13238 net.cpp:380] conv1 -> conv1
Segmentation fault

How many training examples do you use?

How many training examples do you use? I extract 6 non-face patches, 2 part-face patches and 2 positive patches from every image, so I get 1 billion training samples in total. Is this too many?

About plotting the PR curve

Hello! I would like to ask how the PR curve provided by the author was produced. I fed the bboxes and scores output by the O-Net into the WIDER FACE eval-tools, and the result was very poor... so how should the bboxes and scores be chosen?

train data proportion

Hi @foreverYoungGitHub, in your reply: "For example, positive : part : negative = 1 : 3 : 3 at the beginning, while it will change to positive : part : negative = 1 : 5 : 3 in the next iteration."
Do you mean that within one training run we need to change the data proportion of every batch? Or do you mean that we use 1:3:3 to train a model A.caffemodel, then use this model to generate training data with the proportion set to 1:5:3 and fine-tune on A.caffemodel to get B.caffemodel?

image label calculate

Hi Liuyang, many thanks. But how do you calculate the face regression training labels? As you give, face regression: [0.1,0.1,0.1,0.1]. What is the calculation method that produces these values 0.1 0.1 0.1 0.1?

When I use the CelebA database to train the P-Net network, it reports an error

Hi:
When I use the celebA database to train the P_Net network, the following error occurs:
I0520 22:22:14.954217 3971 net.cpp:84] Creating Layer loss_label
I0520 22:22:14.954236 3971 net.cpp:406] loss_label <- conv4-1
I0520 22:22:14.954241 3971 net.cpp:406] loss_label <- label
I0520 22:22:14.954246 3971 net.cpp:380] loss_label -> loss_label
I0520 22:22:14.954262 3971 layer_factory.hpp:77] Creating layer loss_label
F0520 22:22:14.954509 3971 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (4900 vs. 100) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be NHW, with integer values in {0, 1, ..., C-1}.
*** Check failure stack trace: ***
@ 0x7fefd7991daa (unknown)
@ 0x7fefd7991ce4 (unknown)
@ 0x7fefd79916e6 (unknown)
@ 0x7fefd7994687 (unknown)
@ 0x7fefd804eee0 caffe::SoftmaxWithLossLayer<>::Reshape()
@ 0x7fefd7fc1bd5 caffe::Net<>::Init()
@ 0x7fefd7fc3ad2 caffe::Net<>::Net()
@ 0x7fefd810e0d0 caffe::Solver<>::InitTrainNet()
@ 0x7fefd810f023 caffe::Solver<>::Init()
@ 0x7fefd810f2ff caffe::Solver<>::Solver()
@ 0x7fefd7fa2a31 caffe::Creator_SGDSolver<>()
@ 0x40ee6e caffe::SolverRegistry<>::CreateSolver()
@ 0x407efd train()
@ 0x40590c main
@ 0x7fefd699af45 (unknown)
@ 0x40617b (unknown)
@ (nil) (unknown)
Aborted (core dumped)
label.txt contains only the labels of the positive samples.
Looking forward to your reply.

run MTCNN error

Thanks for your work!
I used cmake to compile the code, but when I run the executable file MTCNN, I get this backtrace:

*** Error in `./MTCNN': free(): invalid next size (fast): 0x00000000023e9bd0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f49b12f17e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fe0a)[0x7f49b12f9e0a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f49b12fd98c]
./MTCNN[0x416950]
./MTCNN[0x414d54]
./MTCNN[0x411edc]
./MTCNN[0x40eea1]
./MTCNN[0x40c5b9]
./MTCNN[0x407af5]
./MTCNN[0x407191]
./MTCNN[0x40638b]
./MTCNN[0x40429d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f49b129a830]
./MTCNN[0x403e49]
How can I deal with this?

Training data

Dear author, can you provide a link to your prepared training data on Baidu Pan? Thanks.

When computing the landmark loss layer, how do I ignore the other three types of samples in the batch?

When preparing the data, my final format is the following (each line: image path, 1 label, 4 box values, 10 landmark values):
pos1.jpg 1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
neg1.jpg 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
part1.jpg -1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
landmark1.jpg -1 -1 -1 -1 -1 0.1 0.3 0.3 0.4 0.5 0.5 0.3 0.2 0.7 0.7
When computing the landmark loss, how do I ignore the other samples? Did you rewrite the loss layer? Looking at the prototxt you do not seem to have rewritten it, or did I miss something? Thanks.

The GPU performance is nearly the same as the CPU?

Hi, I have used your code, selecting CPU or GPU through a macro, for testing (not training). But I find that these two ways perform nearly the same.
By the way, can I use your model directly to detect multiple faces without any training?
Thank you!

O-Net error?

Hello! I am trying to get it running under Windows. It runs normally and then closes unexpectedly when entering the O-Net phase, using the MTCNN::detection_TEST function. Is anyone experiencing the same, or do you know what could be going wrong @foreverYoungGitHub? I will try to debug it later :)
Cheers!

How to generate training data from the CelebA database

Hello:
Could you explain how you generate the following files used in your generate_hdf5.py from the CelebA database?
label_path = '../dataset/label.txt'
landmark_path = '../dataset/landmark.txt'
regression_box_path = '../dataset/regression_box.txt'
crop_image_path = '../dataset/crop_image.txt'
train_file_path = '../dataset/train_24.hd5'
I am a Caffe beginner; looking forward to your reply. Thanks.
