tfwu / facedetection-convnet-3d

Source code for our ECCV16 paper, Face Detection with End-to-End Integration of a ConvNet and a 3D Model

License: Other


facedetection-convnet-3d's Introduction

Face Detection with End-to-End Integration of a ConvNet and a 3D Model

Reproducing all experimental results in the paper

Yunzhu Li, Benyuan Sun, Tianfu Wu and Yizhou Wang, "Face Detection with End-to-End Integration of a ConvNet and a 3D Model", ECCV 2016 (https://arxiv.org/abs/1606.00850)

The code was mainly written by Y.Z. Li ([email protected]) and B.Y. Sun ([email protected]). Please feel free to report issues to them.

The code is based on the mxnet package (https://github.com/dmlc/mxnet/).

If you find the code useful in your projects, please consider citing the paper:

@inproceedings{FaceDetection-ConvNet-3D,
  author    = {Yunzhu Li and Benyuan Sun and Tianfu Wu and Yizhou Wang},
  title     = {Face Detection with End-to-End Integration of a ConvNet and a 3D Model},
  booktitle = {ECCV},
  year      = {2016}
}

Compile

Please refer to https://github.com/dmlc/mxnet/ for instructions on how to compile.

Prepare training data

Download the AFLW dataset and generate a list for the training data in the form: ID file_path width height resize_factor number_of_faces [a list of information for each face]

The information for different faces should be separated by spaces and in the form: x y width height(of bounding box) x y width height(of projected bounding box) number_of_keypoints [keypoint_name keypoint_x keypoint_y projected_keypoint_x projected_keypoint_y](for every keypoint) ellipse_x ellipse_y ellipse_radius ellipse_minoraxes ellipse_majoraxes [9 parameters of scale * rotation matrix] [3 translation parameters]

Note: the projected information is not used at the moment, so it can be replaced by any number.
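As an illustration of the list format above, here is a hypothetical Python sketch that serializes one record; the field values and dictionary keys are made up for the example and are not taken from the real AFLW annotation tooling.

```python
# Hypothetical sketch of emitting one training-list line in the format above.
# The dictionary keys (bbox, proj_bbox, keypoints, ...) are illustrative names.
def format_face(face):
    parts = list(map(str, face["bbox"] + face["proj_bbox"]))
    parts.append(str(len(face["keypoints"])))
    for kp in face["keypoints"]:
        parts += [kp["name"], str(kp["x"]), str(kp["y"]),
                  str(kp["proj_x"]), str(kp["proj_y"])]
    parts += map(str, face["ellipse"])          # x y radius minor major
    parts += map(str, face["scale_rotation"])   # 9 parameters
    parts += map(str, face["translation"])      # 3 parameters
    return " ".join(parts)

def format_record(img_id, path, width, height, resize_factor, faces):
    head = [str(img_id), path, str(width), str(height),
            str(resize_factor), str(len(faces))]
    return " ".join(head + [format_face(f) for f in faces])
```

One such line per image, with faces concatenated after the six header fields, matches the layout described above.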

Training procedure

  1. run Path_To_The_Code/ALFW/vgg16_rpn.py
  2. To finetune on FDDB dataset, run Path_To_The_Code/ALFW/fddb_finetune.py

Prediction procedure

AFW: run Path_To_The_Code/afw_predict.py
FDDB: run Path_To_The_Code/predict_final.py

facedetection-convnet-3d's People

Contributors: tfwu

facedetection-convnet-3d's Issues

The Smooth l1 Loss of Key-point Locations

Hi, thanks for the work.

I've just found that the code does not do things the same way the paper describes.
For instance, the smooth L1 loss on key-point locations is not the same. In the paper, only the m predicted labels contribute to the loss. In the code, since proj_label is set to all zeros except at the 9 * m locations around the m key points, every location contributes to the loss.

Can anyone explain this for me?
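For reference, a minimal NumPy sketch of the masked smooth L1 loss the paper describes, where only locations with a nonzero weight contribute; the function names and shapes here are illustrative, not taken from the repository.

```python
import numpy as np

# Smooth L1 (Huber-style) penalty: quadratic near zero, linear beyond |x| = 1.
def smooth_l1(x):
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x * x, absx - 0.5)

# Masked variant: weight is zero outside the key-point locations, so those
# positions add nothing to the loss, matching the paper's description.
def masked_smooth_l1_loss(pred, label, weight):
    per_elem = weight * smooth_l1(pred - label)
    denom = max(np.count_nonzero(weight), 1)
    return per_elem.sum() / denom
```

With an all-ones weight this reduces to the unmasked behavior the issue describes, which may be the source of the discrepancy.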

How to install

Can you tell me which file I should download? The code here or dmlc/mxnet? @tfwu

The speed of this method

Can you tell me the speed of detecting faces in a picture, or how many pictures can be processed per second? I am looking forward to hearing from you. @tfwu

Data dimension of proj_label and proj_weight

In file data.py, the shapes of proj_label and proj_weight are (out_height, out_width, self.num_class, 2) initially. After proj_label = np.expand_dims(proj_label, axis=0) and proj_weight = np.expand_dims(proj_weight, axis=0), they should be (1, out_height, out_width, self.num_class, 2).

In the evaluation part of file solver.py, the shape of the output of the proj_regression_loss layer is (1, out_height, out_width, 20). You directly feed proj_label, proj_weight and that output to the function smoothl1_metric. Then you traverse the data through size = label.shape[0]. I wonder whether you flattened the data somewhere, or whether the value of size should be 1.

Could you please explain this to me? @YunzhuLi
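A small NumPy demonstration of the shape point raised above; out_height, out_width and num_class here are placeholder sizes, not the real values from data.py.

```python
import numpy as np

# Placeholder sizes to illustrate the shape question above.
out_height, out_width, num_class = 4, 4, 10

proj_label = np.zeros((out_height, out_width, num_class, 2))
# After expand_dims a leading batch axis of length 1 is added, so any loop
# driven by label.shape[0] iterates exactly once.
proj_label = np.expand_dims(proj_label, axis=0)
print(proj_label.shape)  # (1, 4, 4, 10, 2)
```

So unless the arrays are flattened elsewhere, `size = label.shape[0]` would indeed be 1, as the issue suggests.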

Some layers lack of initialization code?

Thanks for the released code. However, I found that some layers' parameters lack initialization code, so I uncommented the code block in init_vgg16.py:

for name, shape in zip(arg_names, arg_shapes):
    if name in ['roi_warping_fc1_weight', 'roi_warping_fc1_bias',
                'roi_warping_fc2_weight', 'roi_warping_fc2_bias',
                'offset_predict_weight', 'offset_predict_bias']:
        fan_in, fan_out = np.prod(shape[1:]), shape[0]
        factor = fan_in
        scale = np.sqrt(2.34 / factor)
        tempt = np.random.uniform(-scale, scale, size=shape)
        fc_args[name] = mx.nd.array(tempt, ctx)
    elif name in ['roi_warping_bn1_gamma', 'roi_warping_bn2_gamma']:
        fc_args[name] = mx.nd.ones(shape, ctx)
    elif name in ['roi_warping_bn1_beta', 'roi_warping_bn2_beta']:
        fc_args[name] = mx.nd.zeros(shape, ctx)

and changed the code as below (adding some layer params to the name list):

for name, shape in zip(arg_names, arg_shapes):
    if name in ['roi_warping_fc1_weight', 'roi_warping_fc1_bias', 'conv_proposal_weight',
                'roi_warping_fc2_weight', 'roi_warping_fc2_bias', 'conv_proposal_bias',
                'proposal_cls_score_weight', 'proposal_cls_score_bias',
                'param3d_pred_weight', 'param3d_pred_bias',
                'offset_predict_weight', 'offset_predict_bias']:
        fan_in, fan_out = np.prod(shape[1:]), shape[0]
        factor = fan_in
        scale = np.sqrt(2.34 / factor)

After this little change, the code runs without error.

However, did I initialize the layer parameters in the right way? Thanks!

General testing procedure?

Hi, I'm running your ConvNet-3D model on my own database. It would be much easier for me to do testing if you could kindly share a general testing procedure with us.

I'm currently going through afw_predict.py, since I thought AFW is generally the easiest dataset to test on, but it seems your code needs a file named "img_rects.txt" to run. So I'm currently unable to do the testing. Hope you can help me with that.

Thanks!

Question about face3dproj_forward

My question is mainly about this function:

	MSHADOW_XINLINE DType Eval(index_t i, index_t j) const {
		using namespace std;
		const index_t dim = j % 2;
		const index_t nk = j / 2;
		const index_t w = i % src_width_;
		const index_t h = (i / src_width_) % src_height_;
		const index_t n = i / src_width_ / src_height_;

		DType a0 = data_src_.Eval(n * num_parameters_ * src_height_ +
								  (0 + dim * 4) * src_height_ + h, w);
		DType a1 = data_src_.Eval(n * num_parameters_ * src_height_ + 
								  (1 + dim * 4) * src_height_ + h, w);
		DType a2 = data_src_.Eval(n * num_parameters_ * src_height_ +
								  (2 + dim * 4) * src_height_ + h, w);
		DType a3 = data_src_.Eval(n * num_parameters_ * src_height_ +
								  (3 + dim * 4) * src_height_ + h, w);

		DType x0 = mface_src_.Eval(nk, 0);
		DType x1 = mface_src_.Eval(nk, 1);
		DType x2 = mface_src_.Eval(nk, 2);

		return a0 * x0 + a1 * x1 + a2 * x2 + a3 + \
			   (dim == 0 ? static_cast<DType>(h) / spatial_scale_ :
						   static_cast<DType>(w) / spatial_scale_);
	}

Let's set the input data shape to (1, 3, 300, 300); then the shape of the input to the "face3d_proj" layer should be (1, 8, 152, 152), and the shape of the output is (1, 152, 152, 20).

I guess the authors use the 8 parameters and the mean face to predict 10 key points for each point in the 152x152 map.

I am not familiar with mshadow. Could anyone explain how the function Eval posted above works, especially how it maps the indices (dim, nk, w, h, n)?
Thanks!
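One way to read the index arithmetic is the sketch below: it re-expresses the decoding of (i, j) and the affine projection in Python, with params laid out as a 4D array for clarity (the C++ kernel reads it through flattened 2D Eval calls). This is an illustrative reading of the posted code, not the real kernel, and the variable names just mirror the C++.

```python
import numpy as np

# Decode the flattened output indices (i, j) into (n, h, w, keypoint, dim):
# i runs over batch * height * width, j over 2 * num_keypoints.
def decode_indices(i, j, src_width, src_height):
    dim = j % 2              # 0 -> x coordinate, 1 -> y coordinate
    nk = j // 2              # which key point on the mean face
    w = i % src_width
    h = (i // src_width) % src_height
    n = i // (src_width * src_height)
    return n, h, w, nk, dim

# Apply one row of the per-location affine transform to a mean-face 3D point,
# then add the location's own coordinate scaled back to input resolution.
def eval_point(params, mean_face, i, j, src_width, src_height, spatial_scale):
    # params: (n, 8, H, W) affine parameters; mean_face: (num_keypoints, 3)
    n, h, w, nk, dim = decode_indices(i, j, src_width, src_height)
    a0, a1, a2, a3 = params[n, dim * 4:dim * 4 + 4, h, w]
    x0, x1, x2 = mean_face[nk]
    offset = (h if dim == 0 else w) / spatial_scale
    return a0 * x0 + a1 * x1 + a2 * x2 + a3 + offset
```

Under this reading, channels 0..3 of the 8-parameter map give the affine row for one coordinate and channels 4..7 the other, which is consistent with 10 key points producing the 20-channel output mentioned above.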
