
mtcnn's Introduction

MTCNN

Repository for "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks", implemented with Caffe, C++ interface. Compared with Cascade CNN, MTCNN integrates the detection net and calibration net into one net. Moreover, face alignment is also applied in the same net.

Result

Face Detection

The results of each procedure in MTCNN are contained in the result folder. The final results are shown in the following. These two pictures are taken from FDDB. MTCNN fails to detect a face that is annotated in FDDB in the left picture, while it detects a face in the right picture that is not annotated in FDDB. (Better than the benchmark in some cases.)

Face Alignment

Time Cost

The average time cost is 0.197 s/frame, which is faster than Cascade CNN. The result was measured on a 1080p live video.

Accuracy

The accuracy on FDDB is higher than 0.9. The model contained in this repository can be used as a pre-trained model to further improve the result.

How to train

An HDF5 dataset is necessary to train a Caffe model with multiple labels. A sample script that generates the .hdf5 file is listed here.

In this sample, you need to prepare 4 txt files, which contain the labels (0 or 1), the landmarks (as ratios within the cropped image), the regression boxes (as ratios), and the cropped image paths. A dataset that contains landmark information, such as CelebA, is needed to generate samples with these multiple attributes. Then change the paths in the sample code; the output HDF5 file is written to train_file_path. Note that the image size needs to be changed to generate suitable data for your net. A minimal sketch of such a script is given after the path list below.

label_path = '../dataset/label.txt'
landmark_path = '../dataset/landmark.txt'
regression_box_path = '../dataset/regression_box.txt'
crop_image_path = '../dataset/crop_image.txt'
train_file_path = '../dataset/train_24.hd5'
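A minimal sketch of such a generation script (assuming each txt file holds one sample per line: a 0/1 label, 10 landmark ratios, 4 regression-box ratios, and an image path, respectively; the 24x24 size and the preprocessing are illustrative and not necessarily identical to the repository's script):

import numpy as np
import h5py
import cv2

label_path = '../dataset/label.txt'
landmark_path = '../dataset/landmark.txt'
regression_box_path = '../dataset/regression_box.txt'
crop_image_path = '../dataset/crop_image.txt'
train_file_path = '../dataset/train_24.hd5'
image_size = 24  # change to 12 / 24 / 48 to match the net being trained

label = np.loadtxt(label_path).reshape(-1, 1).astype(np.float32)            # N x 1
landmark = np.loadtxt(landmark_path).astype(np.float32)                     # N x 10
regression_box = np.loadtxt(regression_box_path).astype(np.float32)         # N x 4

with open(crop_image_path) as f:
    paths = [line.strip() for line in f if line.strip()]

# Caffe expects N x C x H x W; values are scaled to [0, 1] here as an example.
data = np.zeros((len(paths), 3, image_size, image_size), dtype=np.float32)
for i, p in enumerate(paths):
    img = cv2.imread(p)
    img = cv2.resize(img, (image_size, image_size)).astype(np.float32) / 255.0
    data[i] = img.transpose(2, 0, 1)  # HWC -> CHW

# One 15-value row per sample: label, 4 regression ratios, 10 landmark ratios.
labels = np.concatenate((label, regression_box, landmark), axis=1)

with h5py.File(train_file_path, 'w') as f:
    f['data'] = data
    f['labels'] = labels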

Then write the path of the HDF5 file into a txt file and reference that txt file in the prototxt:

hdf5_data_param {
   source: "/Users/Young/Documents/Programming/MTCNN/MTCNN_train/test_48.txt"
   batch_size: 100
 }
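The txt passed as source is simply a list of HDF5 file paths, one per line (this is how Caffe's HDF5 data layer locates its files). For the training file generated above it would contain the single line:

../dataset/train_24.hd5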

Run the shell file and train the model.

Train strategy

The images used to train the CNN are divided into 3 groups: positive faces, negative faces and parts of faces. When the Intersection-over-Union (IoU) ratio between a cropped image and any ground truth provided by the dataset is higher than 0.7, lower than 0.3, or between 0.3 and 0.7, the crop belongs to the positive faces, the negative faces or the parts of faces, respectively. (You can choose the thresholds yourself; a sketch of this split follows the list below.) The general training process is shown in the following figure, while the detailed training steps are also listed.

  1. Crop images from the dataset randomly, and divide them into positive faces, negative faces and parts of faces based on the IoU between the ground truth and the cropped image;
  2. Train the P-Net on the randomly cropped images;
  3. Crop images from the dataset based on the bounding boxes detected by the P-Net, divide them, and use them to fine-tune the P-Net;
  4. Train the R-Net on the data used to fine-tune the P-Net;
  5. Crop images from the dataset based on the bounding boxes detected by the R-Net, divide them, and use them to fine-tune the R-Net;
  6. Train the O-Net on the data used to fine-tune the P-Net and the R-Net;
  7. Crop images from the dataset based on the bounding boxes detected by the O-Net, divide them, and use them to fine-tune the O-Net.

Example labels for the 3 types of data (a sketch packing them into HDF5 label rows follows the lists):

positive face:

  • face detection: 1;
  • face landmark: [0.1,0.2,0.3,0.4,0.5,0.1,0.2,0.3,0.4,0.5];
  • face regression: [0.1,0.1,0.1,0.1].

part of the face:

  • face detection: 1;
  • face landmark: [0.1,0.2,0.3,0.4,0.5,0.1,0.2,0.3,0.4,0.5];
  • face regression: [0.4,0.4,0.4,0.4].

negative face:

  • face detection: 0;
  • face landmark: [0,0,0,0,0,0,0,0,0,0];
  • face regression: [0,0,0,0].
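A sketch of how these three label types could be packed into the 15-value rows (1 detection label + 4 regression ratios + 10 landmark ratios) stored in the HDF5 labels dataset; the order matches the concatenation used in the generation script above:

import numpy as np

def make_label(face, regression, landmark):
    # One 15-value row: detection label, 4 regression ratios, 10 landmark ratios
    return np.array([face] + list(regression) + list(landmark), dtype=np.float32)

positive = make_label(1, [0.1, 0.1, 0.1, 0.1],
                      [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5])
part     = make_label(1, [0.4, 0.4, 0.4, 0.4],
                      [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5])
negative = make_label(0, [0, 0, 0, 0], [0] * 10)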

You can also train the face detection and regression on a dataset without landmark labels first. That model is then used as the starting point to train the face landmarks.

mtcnn's People

Contributors

foreverYoungGitHub


mtcnn's Issues

Can I know how you suppress loss that are not used?

According to the MTCNN paper, some of the losses are not used for certain samples. For example, for a negative example, no bounding box or landmark points are predicted, so the regression loss and the landmark loss are not used. Can I know which part of your code does that? Thanks.
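(A numpy-style sketch of the idea behind this question, not taken from this repository's code: the paper's per-sample loss selection can be expressed as masks that zero out the losses a sample type does not contribute.)

def masked_loss(sample_type, cls_loss, box_loss, landmark_loss, has_landmark):
    # sample_type: 1 = positive, 0 = negative, -1 = part face (a common
    # convention in MTCNN reimplementations, assumed here)
    use_cls = 1.0 if sample_type in (1, 0) else 0.0   # classification: positives and negatives
    use_box = 1.0 if sample_type in (1, -1) else 0.0  # box regression: positives and part faces
    use_lmk = 1.0 if has_landmark else 0.0            # landmarks: annotated samples only
    return use_cls * cls_loss + use_box * box_loss + use_lmk * landmark_loss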

license?

hello! nice implementation, is this under the BSD 3-Clause license?

generate hdf5 file?

labels = np.concatenate((label, regression_box, landmark), axis = 1)
with h5py.File(train_file_path, 'w') as f:
    f['data'] = a
    f['labels'] = labels
    f['regression'] = regression_box
    f['landmark'] = landmark

There is no need to use f['regression'] = regression_box and f['landmark'] = landmark,
because labels = np.concatenate((label, regression_box, landmark), axis = 1) already contains the regression and landmark values.

Is that right?

Fine tuned landmark

Hi @foreverYoungGitHub,

Based on your sentence in README "You can also train the face detection and regression for the dataset without landmark label. The model is then used to train the face landmark."
Can we fine-tune the O-Net using only landmark data?

Thanks

regression box bug?

            bbox.height = bounding_box_[j].height + regression_box_temp_[4*j+2] * bounding_box_[j].height;
            bbox.width = bounding_box_[j].width + regression_box_temp_[4*j+3] * bounding_box_[j].width;

Looking at the MATLAB code, the regression box should correspond to the two coordinate pairs x1,y1 and x2,y2, not to height and width. This makes the final bbox position slightly skewed; after correcting it, the results are better.
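(A small Python sketch of the interpretation suggested in this issue, i.e. applying the four regression values as offsets to the x1, y1, x2, y2 corners scaled by the box size; the actual fix would of course go into MTCNN.cpp.)

def apply_regression(box, reg):
    # box = (x1, y1, x2, y2); reg = (dx1, dy1, dx2, dy2) as ratios of the box size
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + reg[0] * w, y1 + reg[1] * h,
            x2 + reg[2] * w, y2 + reg[3] * h)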

train data label

Hi Liuyang, could you give us a sample of the labels for the three types of images?

Training

Cool project! I'm looking at using either this project or your Cascade CNN detector for a general object detector. I had a couple questions, hoping you can help :)

  1. How fast is MTCNN on CPU?
  2. Is it possible to upload your training scripts? Or samples of the input for the lmdb databases that Caffe needs? I'll need to create my own training data -- not quite sure how to do it.

Thanks!

what's the format of label.txt

Hi ,

I wonder what the format of label.txt, landmark.txt, regression_box.txt and crop_image.txt is.
Could you kindly tell me?

Thank you very much!

Why does the loss of the landmark task not decrease?

I have trained your code many times, but the loss of the landmark task does not converge. I don't know what is wrong. When I only train the face classification and the bounding-box regression, the losses of both tasks decrease. Why?

How does the pts loss layer ignore negative samples?

When preparing the data, my final format is the following (each line: image path, 1 label, 4 box values, 10 landmark values):
pos1.jpg 1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
neg1.jpg 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
part1.jpg -1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
landmark1.jpg -1 -1 -1 -1 -1 0.1 0.3 0.3 0.4 0.5 0.5 0.3 0.2 0.7 0.7
When computing the landmark loss, how do I ignore the other samples? Did you rewrite the loss layer? Looking at the prototxt you do not seem to have rewritten it, or did I miss something? Thanks.

little offset in face alignment

For trump.jpg, compared with your result, there is a small offset in the face alignment phase when main.cpp calls MTCNN.detection_TEST(). Could you tell me how I can improve the face alignment precision?

Train the network

I want to train MTCNN using my own dataset. Can you please give me some hints on how to do that? For example:

  1. Format of the dataset, like naming the folder and image.
  2. Train the network to produce the model

train data label

@foreverYoungGitHub Hi, my dear Liuyang. Based on your latest reply: if an image is H*W, the ground truth is (x0,y0,x1,y1), where (x0,y0) is the top-left point of the ground truth, and a bounding box (x2,y2,x3,y3) has IoU with the ground truth > 0.65, so we consider it a positive example. Is the training label
( (x0-x2)/W, (y0-y2)/H, (x1-x3)/W, (y1-y3)/H ) ?
But if this is right, the training labels are all near 0, and we also set the negative regression values to (0, 0, 0, 0). Is that OK? I mean, both the negative and the positive samples' regression values would be near 0.
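(For reference, a sketch of the convention used in several other MTCNN reimplementations, where the offsets are normalized by the crop's own width and height rather than the whole image; whether this repository uses the same convention is not confirmed here.)

def regression_target(gt_box, crop_box):
    # gt_box = (x0, y0, x1, y1), crop_box = (x2, y2, x3, y3), same coordinates
    x0, y0, x1, y1 = gt_box
    x2, y2, x3, y3 = crop_box
    w, h = x3 - x2, y3 - y2
    return ((x0 - x2) / w, (y0 - y2) / h, (x1 - x3) / w, (y1 - y3) / h)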

how to use the mtcnn.cpp

Hi:
I use the detection code to detect and align faces, but I get an error, and the code only outputs the
face detection result, not the alignment result.
Can you provide a complete example?
Thanks

about the label

You give an example of the labels of the positive, part and negative face images: positive is 1, part is 1, negative is 0.
Why do the positive and the part face have the same label? Some other programmers set the part label to -1.
Is it practical to train models this way?

How to ignore the redundant labels

When training face detection there are 3 kinds of training data and 2 tasks: non-face, face and part face, with classification labels 0, 1 and -1 respectively. How do you ignore the -1 classification label during classification training, and how do you ignore the regression labels of non-faces during the regression task?

What is the format of the text files?

label_path = '../dataset/label.txt'

landmark_path = '../dataset/landmark.txt'

regression_box_path = '../dataset/regression_box.txt'

crop_image_path = '../dataset/crop_image.txt'
Could you send a sample? Then I could train by following your pipeline.

problem of training models

Sorry to bother you. I have trained the three models det1, det2 and det3 separately, but the detection performance is not as good as yours; the landmark locations are especially inaccurate.
I crop the images and then generate HDF5 files of different sizes, 12x12, 24x24 and 48x48, and then train det1, det2 and det3 on the corresponding 12x12, 24x24 and 48x48 HDF5 files.
Am I doing something wrong? Should I train the det2 model based on the trained det1 model?

bulk detect version?

MTCNN detection is slow on a mobile device (CPU only): a 720p image takes about 500 ms on average (Samsung S7 Edge).
Could we implement a bulk (batched) version? Would it be faster?

memory leak?

In celeba_crop.cpp,
char *cstr = new char[path[i].length() + 1];
is never deleted.

Which version of Caffe is used?

When I run the detection on my PC (Windows),
I find that some functions such as Forward() do not have the same definition as in my Caffe.
It gives these errors:
Error 11 error C2661: 'caffe::Net::Net' : no overloaded function takes 2 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 30 1 MTCNN
Error 26 error C2661: 'caffe::Net::Forward' : no overloaded function takes 0 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 346 1 MTCNN
Error 27 error C2661: 'caffe::Net::Forward' : no overloaded function takes 0 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 382 1 MTCNN
Error 12 error C2660: 'std::shared_ptr<caffe::Net>::reset' : function does not take 1 arguments D:\my programs13.0\MTCNN_foreverYoung\detection\MTCNN.cpp 30 1 MTCNN

linking error

I'm using the caffe-1.0 release version, but a linking error comes up.
What is the original Caffe version you used?
@foreverYoungGitHub

CMakeFiles/MTCNN.dir/MTCNN.cpp.o: In function `MTCNN::MTCNN(std::vector<std::string, std::allocator<std::string> >, std::vector<std::string, std::allocator<std::string> >)': MTCNN.cpp:(.text+0x6df): undefined reference to `caffe::Net::Net(std::string const&, caffe::Phase, int, std::vector<std::string, std::allocator<std::string> > const*)'
collect2: error: ld returned 1 exit status
make[2]: *** [MTCNN] Error 1
make[1]: *** [CMakeFiles/MTCNN.dir/all] Error 2
make: *** [all] Error 2

Face Alignment

Can I know if the code does face alignment after detection?
If so, can I know where the face alignment code starts?

GPU version

When I try to run this code in the GPU version, it always shows errors like this (even when I try on a different GPU server).

...
}
I1030 17:39:38.964071 13238 layer_factory.hpp:77] Creating layer input
I1030 17:39:38.964112 13238 net.cpp:84] Creating Layer input
I1030 17:39:38.964131 13238 net.cpp:380] input -> data
I1030 17:39:38.965209 13238 net.cpp:122] Setting up input
I1030 17:39:38.966150 13238 net.cpp:129] Top shape: 1 3 12 12 (432)
I1030 17:39:38.966163 13238 net.cpp:137] Memory required for data: 1728
I1030 17:39:38.966179 13238 layer_factory.hpp:77] Creating layer conv1
I1030 17:39:38.966215 13238 net.cpp:84] Creating Layer conv1
I1030 17:39:38.966230 13238 net.cpp:406] conv1 <- data
I1030 17:39:38.966775 13238 net.cpp:380] conv1 -> conv1
Segmentation fault

How many training examples do you use?

How many training examples do you use? I extract 6 non-face patches, 2 part-face patches and 2 positive patches from every image, so I get 1 billion training samples in total. Is this too many?

About plotting the PR curve

Hello! I would like to ask how the PR curve provided by the author was produced. I fed the bboxes and scores output by the O-Net into the WIDER FACE eval-tools, and the result was very poor... so how should the bboxes and scores be chosen?

train data proportion

Hi @foreverYoungGitHub, in your reply: "For example, positive : part : negative = 1 : 3 : 3 at the beginning, while it will change to positive : part : negative = 1 : 5 : 3 in the next iteration."
Do you mean that within one training run we need to change the data proportion of every batch? Or do you mean that we use 1:3:3 to train a model A.caffemodel, then use this model to generate training data with the proportion set to 1:5:3 and fine-tune on A.caffemodel to get B.caffemodel?

image label calculate

Hi Liuyang, many thanks. But how do you calculate the face regression training labels? As you give, face regression: [0.1,0.1,0.1,0.1]. What is the calculation method that produces these values 0.1 0.1 0.1 0.1?

When I use the CelebA database to train the P-Net network, it reports an error

Hi:
When I use the celebA database to train the P_Net network, the following error occurs:
I0520 22:22:14.954217 3971 net.cpp:84] Creating Layer loss_label
I0520 22:22:14.954236 3971 net.cpp:406] loss_label <- conv4-1
I0520 22:22:14.954241 3971 net.cpp:406] loss_label <- label
I0520 22:22:14.954246 3971 net.cpp:380] loss_label -> loss_label
I0520 22:22:14.954262 3971 layer_factory.hpp:77] Creating layer loss_label
F0520 22:22:14.954509 3971 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (4900 vs. 100) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be NHW, with integer values in {0, 1, ..., C-1}.
*** Check failure stack trace: ***
@ 0x7fefd7991daa (unknown)
@ 0x7fefd7991ce4 (unknown)
@ 0x7fefd79916e6 (unknown)
@ 0x7fefd7994687 (unknown)
@ 0x7fefd804eee0 caffe::SoftmaxWithLossLayer<>::Reshape()
@ 0x7fefd7fc1bd5 caffe::Net<>::Init()
@ 0x7fefd7fc3ad2 caffe::Net<>::Net()
@ 0x7fefd810e0d0 caffe::Solver<>::InitTrainNet()
@ 0x7fefd810f023 caffe::Solver<>::Init()
@ 0x7fefd810f2ff caffe::Solver<>::Solver()
@ 0x7fefd7fa2a31 caffe::Creator_SGDSolver<>()
@ 0x40ee6e caffe::SolverRegistry<>::CreateSolver()
@ 0x407efd train()
@ 0x40590c main
@ 0x7fefd699af45 (unknown)
@ 0x40617b (unknown)
@ (nil) (unknown)
Aborted (core dumped)
label.txt contains only the labels of the positive samples.
Looking forward to your reply.

run MTCNN error

Thanks for your work!
I used cmake to compile the code, but when I run the executable file MTCNN, I get this backtrace:

*** Error in `./MTCNN': free(): invalid next size (fast): 0x00000000023e9bd0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f49b12f17e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fe0a)[0x7f49b12f9e0a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f49b12fd98c]
./MTCNN[0x416950]
./MTCNN[0x414d54]
./MTCNN[0x411edc]
./MTCNN[0x40eea1]
./MTCNN[0x40c5b9]
./MTCNN[0x407af5]
./MTCNN[0x407191]
./MTCNN[0x40638b]
./MTCNN[0x40429d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f49b129a830]
./MTCNN[0x403e49]
How can I deal with this?

Training data

Dear author, can you provide a link to your prepared training data on Baidu Pan? Thanks.

When computing the landmark loss layer, how do I ignore the other three types of samples in the batch?

When preparing the data, my final format is the following (each line: image path, 1 label, 4 box values, 10 landmark values):
pos1.jpg 1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
neg1.jpg 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
part1.jpg -1 0.1 0.2 0.3 0.4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
landmark1.jpg -1 -1 -1 -1 -1 0.1 0.3 0.3 0.4 0.5 0.5 0.3 0.2 0.7 0.7
When computing the landmark loss, how do I ignore the other samples? Did you rewrite the loss layer? Looking at the prototxt you do not seem to have rewritten it, or did I miss something? Thanks.

The GPU performance is nearly the same as the CPU?

Hi, I have used your code, selecting CPU or GPU through a macro, for testing (not training). But I find that these two ways perform nearly the same.
By the way, can I use your model directly to detect multiple faces without any training?
Thank you!

O-Net error?

Hello! I am trying to get it running under Windows. It runs normally and then closes unexpectedly when entering the O-Net phase, using the MTCNN::detection_TEST function. Is anyone experiencing the same, or do you know what could be going wrong @foreverYoungGitHub? I will try to debug it later :)
Cheers!

How to generate training data from the CelebA database

Hello:
Could you explain how you generate the following files used in your generate_hdf5.py from the CelebA database?
label_path = '../dataset/label.txt'
landmark_path = '../dataset/landmark.txt'
regression_box_path = '../dataset/regression_box.txt'
crop_image_path = '../dataset/crop_image.txt'
train_file_path = '../dataset/train_24.hd5'
I am a Caffe beginner; looking forward to your reply. Thanks.
