tfwu / facedetection-convnet-3d

Source code for our ECCV16 paper, Face Detection with End-to-End Integration of a ConvNet and a 3D Model

License: Other


facedetection-convnet-3d's Introduction

Face Detection with End-to-End Integration of a ConvNet and a 3D Model

Reproducing all experimental results in the paper

Yunzhu Li, Benyuan Sun, Tianfu Wu and Yizhou Wang, "Face Detection with End-to-End Integration of a ConvNet and a 3D Model", ECCV 2016 (https://arxiv.org/abs/1606.00850)

The code was mainly written by Y.Z. Li ([email protected]) and B.Y. Sun ([email protected]). Please feel free to report issues to them.

The code is based on the mxnet package (https://github.com/dmlc/mxnet/).

If you find the code useful in your projects, please consider citing the paper:

@inproceedings{FaceDetection-ConvNet-3D,
  author    = {Yunzhu Li and Benyuan Sun and Tianfu Wu and Yizhou Wang},
  title     = {Face Detection with End-to-End Integration of a ConvNet and a 3D Model},
  booktitle = {ECCV},
  year      = {2016}
}

Compile

Please refer to https://github.com/dmlc/mxnet/ for instructions on how to compile.

Prepare training data

Download the AFLW dataset and generate a list for the training data in the form: ID file_path width height resize_factor number_of_faces [a list of information for each face]

The information for different faces should be separated by spaces and in the form: x y width height(of bounding box) x y width height(of projected bounding box) number_of_keypoints [keypoint_name keypoint_x keypoint_y projected_keypoint_x projected_keypoint_y](for every keypoint) ellipse_x ellipse_y ellipse_radius ellipse_minoraxes ellipse_majoraxes [9 parameters of scale * rotation matrix] [3 translation parameters]

Note: the projected information is not used at the moment, so it can be replaced by any number.
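As an illustration of the list format above, here is a hypothetical Python sketch that serializes one record; the field values and dictionary keys are made up for the example and are not taken from the real AFLW annotation tooling.

```python
# Hypothetical sketch of emitting one training-list line in the format above.
# The dictionary keys (bbox, proj_bbox, keypoints, ...) are illustrative names.
def format_face(face):
    parts = list(map(str, face["bbox"] + face["proj_bbox"]))
    parts.append(str(len(face["keypoints"])))
    for kp in face["keypoints"]:
        parts += [kp["name"], str(kp["x"]), str(kp["y"]),
                  str(kp["proj_x"]), str(kp["proj_y"])]
    parts += map(str, face["ellipse"])          # x y radius minor major
    parts += map(str, face["scale_rotation"])   # 9 parameters
    parts += map(str, face["translation"])      # 3 parameters
    return " ".join(parts)

def format_record(img_id, path, width, height, resize_factor, faces):
    head = [str(img_id), path, str(width), str(height),
            str(resize_factor), str(len(faces))]
    return " ".join(head + [format_face(f) for f in faces])
```

One such line per image, with faces concatenated after the six header fields, matches the layout described above.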

Training procedure

  1. run Path_To_The_Code/ALFW/vgg16_rpn.py
  2. To finetune on FDDB dataset, run Path_To_The_Code/ALFW/fddb_finetune.py

Prediction procedure

AFW: run Path_To_The_Code/afw_predict.py
FDDB: run Path_To_The_Code/predict_final.py

facedetection-convnet-3d's People

Contributors: tfwu

facedetection-convnet-3d's Issues

The Smooth l1 Loss of Key-point Locations

Hi, thanks for the work.

I've just found that the code does not do things the same way the paper describes.
For instance, the smooth L1 loss on key-point locations is not the same. In the paper, only the m predicted labels contribute to the loss. In the code, since proj_label is set to all zeros except at the 9 * m locations around the m key points, every location contributes to the loss.

Can anyone explain this for me?
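For reference, a minimal NumPy sketch of the masked smooth L1 loss the paper describes, where only locations with a nonzero weight contribute; the function names and shapes here are illustrative, not taken from the repository.

```python
import numpy as np

# Smooth L1 (Huber-style) penalty: quadratic near zero, linear beyond |x| = 1.
def smooth_l1(x):
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x * x, absx - 0.5)

# Masked variant: weight is zero outside the key-point locations, so those
# positions add nothing to the loss, matching the paper's description.
def masked_smooth_l1_loss(pred, label, weight):
    per_elem = weight * smooth_l1(pred - label)
    denom = max(np.count_nonzero(weight), 1)
    return per_elem.sum() / denom
```

With an all-ones weight this reduces to the unmasked behavior the issue describes, which may be the source of the discrepancy.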

How to install

Can you tell me which file I should download? The code here or dmlc/mxnet? @tfwu

The speed of this method

Can you tell me the speed of detecting faces in a picture, or how many pictures can be processed per second? I am looking forward to hearing from you. @tfwu

Data dimension of proj_label and proj_weight

In file data.py, the shapes of proj_label and proj_weight are (out_height, out_width, self.num_class, 2) initially. After proj_label = np.expand_dims(proj_label, axis=0) and proj_weight = np.expand_dims(proj_weight, axis=0), they should be (1, out_height, out_width, self.num_class, 2).

In the evaluation part of file solver.py, the shape of the output of the proj_regression_loss layer is (1, out_height, out_width, 20). You directly feed proj_label, proj_weight and that output to the function smoothl1_metric. Then you traverse the data through size = label.shape[0]. I wonder whether you flattened the data somewhere, or whether the value of size should be 1.

Could you please explain this to me? @YunzhuLi
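A small NumPy demonstration of the shape point raised above; out_height, out_width and num_class here are placeholder sizes, not the real values from data.py.

```python
import numpy as np

# Placeholder sizes to illustrate the shape question above.
out_height, out_width, num_class = 4, 4, 10

proj_label = np.zeros((out_height, out_width, num_class, 2))
# After expand_dims a leading batch axis of length 1 is added, so any loop
# driven by label.shape[0] iterates exactly once.
proj_label = np.expand_dims(proj_label, axis=0)
print(proj_label.shape)  # (1, 4, 4, 10, 2)
```

So unless the arrays are flattened elsewhere, `size = label.shape[0]` would indeed be 1, as the issue suggests.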

Some layers lack of initialization code?

Thanks for the released code. However, I found that some layers' parameters lack initialization code, so I uncommented the code block in init_vgg16.py:

for name, shape in zip(arg_names, arg_shapes):
    if name in ['roi_warping_fc1_weight', 'roi_warping_fc1_bias',
                'roi_warping_fc2_weight', 'roi_warping_fc2_bias',
                'offset_predict_weight', 'offset_predict_bias']:
        fan_in, fan_out = np.prod(shape[1:]), shape[0]
        factor = fan_in
        scale = np.sqrt(2.34 / factor)
        tempt = np.random.uniform(-scale, scale, size=shape)
        fc_args[name] = mx.nd.array(tempt, ctx)
    elif name in ['roi_warping_bn1_gamma', 'roi_warping_bn2_gamma']:
        fc_args[name] = mx.nd.ones(shape, ctx)
    elif name in ['roi_warping_bn1_beta', 'roi_warping_bn2_beta']:
        fc_args[name] = mx.nd.zeros(shape, ctx)

and changed the code as below (adding some layer params to the name list):

for name, shape in zip(arg_names, arg_shapes):
    if name in ['roi_warping_fc1_weight', 'roi_warping_fc1_bias', 'conv_proposal_weight',
                'roi_warping_fc2_weight', 'roi_warping_fc2_bias', 'conv_proposal_bias',
                'proposal_cls_score_weight', 'proposal_cls_score_bias',
                'param3d_pred_weight', 'param3d_pred_bias',
                'offset_predict_weight', 'offset_predict_bias']:
        fan_in, fan_out = np.prod(shape[1:]), shape[0]
        factor = fan_in
        scale = np.sqrt(2.34 / factor)

After this little change, the code runs without error.

However, did I initialize the layer parameters in the right way? Thanks!

General testing procedure?

Hi, I'm running your ConvNet-3D model on my own database. It would be much easier for me to do testing if you could kindly share a general testing procedure with us.

I'm currently going through afw_predict.py, since I thought AFW is generally the easiest dataset to test on, but it seems your code needs a file named "img_rects.txt" to run. So I'm currently unable to do the testing. Hope you can help me with that.

Thanks!

Question about face3dproj_forward

My question is mainly about this function:

	MSHADOW_XINLINE DType Eval(index_t i, index_t j) const {
		using namespace std;
		const index_t dim = j % 2;
		const index_t nk = j / 2;
		const index_t w = i % src_width_;
		const index_t h = (i / src_width_) % src_height_;
		const index_t n = i / src_width_ / src_height_;

		DType a0 = data_src_.Eval(n * num_parameters_ * src_height_ +
								  (0 + dim * 4) * src_height_ + h, w);
		DType a1 = data_src_.Eval(n * num_parameters_ * src_height_ + 
								  (1 + dim * 4) * src_height_ + h, w);
		DType a2 = data_src_.Eval(n * num_parameters_ * src_height_ +
								  (2 + dim * 4) * src_height_ + h, w);
		DType a3 = data_src_.Eval(n * num_parameters_ * src_height_ +
								  (3 + dim * 4) * src_height_ + h, w);

		DType x0 = mface_src_.Eval(nk, 0);
		DType x1 = mface_src_.Eval(nk, 1);
		DType x2 = mface_src_.Eval(nk, 2);

		return a0 * x0 + a1 * x1 + a2 * x2 + a3 + \
			   (dim == 0 ? static_cast<DType>(h) / spatial_scale_ :
						   static_cast<DType>(w) / spatial_scale_);
	}

Let's set the input data shape to (1, 3, 300, 300); then the shape of the input to the "face3d_proj" layer should be (1, 8, 152, 152), and the shape of the output is (1, 152, 152, 20).

I guess the authors use the 8 parameters and the mean face to predict 10 key points for each point in the 152x152 map.

I am not familiar with mshadow. Could anyone explain how the function Eval posted above works, especially how it maps the indices (dim, nk, w, h, n)?
Thanks!
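One way to read the index arithmetic is the sketch below: it re-expresses the decoding of (i, j) and the affine projection in Python, with params laid out as a 4D array for clarity (the C++ kernel reads it through flattened 2D Eval calls). This is an illustrative reading of the posted code, not the real kernel, and the variable names just mirror the C++.

```python
import numpy as np

# Decode the flattened output indices (i, j) into (n, h, w, keypoint, dim):
# i runs over batch * height * width, j over 2 * num_keypoints.
def decode_indices(i, j, src_width, src_height):
    dim = j % 2              # 0 -> x coordinate, 1 -> y coordinate
    nk = j // 2              # which key point on the mean face
    w = i % src_width
    h = (i // src_width) % src_height
    n = i // (src_width * src_height)
    return n, h, w, nk, dim

# Apply one row of the per-location affine transform to a mean-face 3D point,
# then add the location's own coordinate scaled back to input resolution.
def eval_point(params, mean_face, i, j, src_width, src_height, spatial_scale):
    # params: (n, 8, H, W) affine parameters; mean_face: (num_keypoints, 3)
    n, h, w, nk, dim = decode_indices(i, j, src_width, src_height)
    a0, a1, a2, a3 = params[n, dim * 4:dim * 4 + 4, h, w]
    x0, x1, x2 = mean_face[nk]
    offset = (h if dim == 0 else w) / spatial_scale
    return a0 * x0 + a1 * x1 + a2 * x2 + a3 + offset
```

Under this reading, channels 0..3 of the 8-parameter map give the affine row for one coordinate and channels 4..7 the other, which is consistent with 10 key points producing the 20-channel output mentioned above.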
