
Comments (4)

iacopomasi commented on July 17, 2024

Hi @melgor,
I will try to reply to all your questions. I think there is some confusion and I hope this clears things up.

  1. In this question, I do not understand if you refer to the testing part or the training part of FPN. I will try to describe the training part. For FPN training, what we did was to use an off-the-shelf landmark detector (OpenFace [3]) to estimate a 6DoF pose for each input image in the training set, given a generic 3D model with labeled 3D landmarks in correspondence with the 2D ones. A novel part of the work is that we augment the set with simple 2D similarity transformations to generate very tough samples, on which standard landmark detectors are likely to fail (Fig. 4 in the paper). In our case, since we know the transformation parameters (we generated them), we can map the pose from the original, "easy" input image onto the perturbed image, getting a new "labeled" pose for free.
    Note that in this process we did not make use of 3D augmentation [A] when training FPN.
    3D augmentation to generate multiple views was used when training the recognition network, but that is covered in other papers [B,C].

  2. See reply above. We used OpenFace [3] but other methods can work as well. In order to make the method robust, you need to perform 2D augmentation as described in the paper.

  3. When training the FPN model, the input images are always 224x224. Our 2D augmentation for FPN training is stochastic; that is, we randomly sample the transformation parameters by varying rotation angle, scale, and translation. So at training time, the faces "are always moving in the 224x224 support". The faces are jittered using a 2D similarity transformation (s, R, t) plus some amount of blur to simulate low-resolution faces in videos (a rough sketch of this augmentation is given after this list). At test time, we fix the crop so that the 224x224 input image roughly contains the entire head.

  4. The testing for recognition is pretty complex and does not use only 3D renderings but combines both 2D and 3D alignments. Images are processed with FPN to get the 6DoF pose, then we render images in 3D following [A,B,C]. Note that in the rendering, if faces are far from frontal, we avoid frontalizing them. So basically we augment with new 3D views only near the input pose.
    Moreover, using the pose estimated by FPN, we also compute a 2D similarity transformation for recognition to align the images in 2D. A note: the reference points for the 2D similarity transform are different for frontal faces and profile faces. Frontal faces are aligned with canonical points on the eyes, nose, and mouth; for profile faces, we use the visible eye and the tip of the nose.
    When all the images are aligned (2D+3D) with FPN as mentioned above, they are fed into the recognition network [C] and their features are pooled with averaging and some other tricks (PCA + power normalization) into a single compact descriptor. Experimentally we observed that most of the recognition power is in the 2D images, but the 3D views also improve results when performing the descriptor pooling. For more info on this, check [C].
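To make points 1 and 3 concrete, here is a rough sketch of that 2D augmentation (this is only an illustration, not our actual code: the parameter ranges, the blur amounts, and the way the label is carried over are placeholders):

```python
import numpy as np
import cv2

def random_similarity_jitter(img, out_size=224, max_rot_deg=30.0,
                             scale_range=(0.7, 1.3), max_shift=30, max_blur=3):
    """Jitter a face crop with a random 2D similarity (s, R, t) plus blur;
    return the jittered crop and the 2x3 transform so any 2D annotation or
    pose label can be remapped onto it. Parameter ranges are illustrative."""
    h, w = img.shape[:2]
    angle = np.random.uniform(-max_rot_deg, max_rot_deg)
    scale = np.random.uniform(*scale_range)
    tx, ty = np.random.uniform(-max_shift, max_shift, size=2)

    # 2x3 similarity matrix: rotation + scale about the center, then a shift.
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
    M[0, 2] += tx
    M[1, 2] += ty
    jittered = cv2.warpAffine(img, M, (out_size, out_size),
                              borderMode=cv2.BORDER_REPLICATE)

    # Optional blur to simulate low-resolution video frames.
    k = 2 * np.random.randint(0, max_blur + 1) + 1
    if k > 1:
        jittered = cv2.GaussianBlur(jittered, (k, k), 0)
    return jittered, M

def warp_points(pts, M):
    """Map Nx2 points (e.g. the projected landmarks used to fit the pose)
    through the known 2x3 similarity M. Re-fitting the 6DoF pose to these
    remapped points gives the "free" label for the jittered crop."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ M.T
```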

I hope this helps.

[3] https://github.com/TadasBaltrusaitis/OpenFace
[A] https://github.com/iacopomasi/face_specific_augm
[B] Masi et al., "Do We Really Need to Collect Millions of Faces for Effective Face Recognition?", ECCV 2016
[C] Masi et al., "Rapid Synthesis of Massive Face Sets for Improved Face Recognition", FG 2017


melgor commented on July 17, 2024

Thanks for the answer, it will help me understand the whole process for sure!


twmht commented on July 17, 2024

@iacopomasi

One more question about the face recognition. As you mentioned:

Note that in the rendering, if faces are far from frontal, we avoid frontalizing them. So basically we augment with new 3D views only near the input pose.

So is the result of the 3D alignment only the frontalized face, and not multiple views of the input? If the input is not near frontal, do you skip the 3D alignment?

Moreover, after the 3D alignment, how do you get the five 2D landmarks?

By the way, in most face recognition papers they only do 2D alignment. But in your paper, I didn't see the performance gain from the additional 3D alignment.


iacopomasi commented on July 17, 2024

So is the result of the 3D alignment only the frontalized face, and not multiple views of the input?

It is both; that is, we render multiple views.

If the input is not near frontal, do you skip the 3D alignment?

No, we don't. If the input is near-profile, we do not frontalize it; we only render views close to the input pose.
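Very roughly, the decision from the yaw estimated by FPN looks like the sketch below (the threshold and the reference-point coordinates are placeholders, not the exact values we used):

```python
import numpy as np

# Canonical 2D reference points in the aligned 224x224 template.
# These coordinates are placeholders for illustration only.
FRONTAL_REF = np.float32([[70, 95], [154, 95], [112, 135], [112, 170]])  # eyes, nose, mouth
PROFILE_REF = np.float32([[140, 95], [185, 135]])                        # visible eye, nose tip

def choose_alignment(yaw_deg, profile_thresh=60.0):
    """Gate frontalization and pick the 2D reference points from the yaw
    estimated by FPN. The threshold is illustrative."""
    if abs(yaw_deg) < profile_thresh:
        # Near-frontal input: render nearby 3D views (frontalization included)
        # and align in 2D to eyes / nose / mouth.
        return {"frontalize": True, "ref_points": FRONTAL_REF}
    # Near-profile input: skip frontalization, render only views close to the
    # input pose, and align in 2D to the visible eye + nose tip.
    return {"frontalize": False, "ref_points": PROFILE_REF}

# Given matching points detected on the input image, the per-image 2D
# similarity can then be estimated, e.g. with cv2.estimateAffinePartial2D.
```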

Moreover, after the 3D alignment, how do you get the five 2D landmarks?

By definition, if you do 3D alignment, the faces are already aligned to a 3D shape, up to any alignment errors, so the landmarks should always be in the same coordinate system for each rendered view.

By the way, in most face recognition papers they only do 2D alignment. But in your paper, I didn't see the performance gain from the additional 3D alignment.

Yes, this is true, and it is a trade-off of the methods. In our case, we got an improvement by feeding both 2D-aligned images and 3D-rendered images and averaging the feature vectors afterwards. We got a further boost in results by applying PCA and signed square rooting, as usually done in Fisher Vector representations.
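For reference, the pooling step amounts to something like this sketch (the PCA is assumed to be learned offline; the actual dimensionalities and details are in [C]):

```python
import numpy as np

def pool_template(features, pca_mean, pca_basis):
    """Pool per-view CNN features (one row per 2D-aligned or 3D-rendered
    image of the same subject) into a single compact descriptor:
    average pooling, PCA projection, signed square rooting
    (power normalization), and L2 normalization."""
    x = features.mean(axis=0)               # average over all aligned views
    x = (x - pca_mean) @ pca_basis          # PCA projection (learned offline)
    x = np.sign(x) * np.sqrt(np.abs(x))     # signed square root
    return x / (np.linalg.norm(x) + 1e-12)  # L2 normalization
```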
