Comments (6)

vitoralbiero commented on May 25, 2024

Hello Sungjun Ethan Yoon,

Thank you again for your interest in our work!

  1. Yes, our model predicts rotation vectors (not Euler angles) and translation vectors in h_i^img. We are going to release a new version of the paper to reflect this correction.
    We do not convert to Euler angles inside the img2pose model; the rotation vector is converted to Euler angles only for validation on AFLW2000-3D and BIWI.
    The conversion from rotation vector to Euler angles is in both notebooks and in the comment you mentioned [1].
    Note that Euler angles have a drawback: the yaw is limited to (-90, 90). Apart from the validation on these two datasets, we therefore prefer to use rotation vectors in our pipeline.

  2. You are right. Our model predicts better poses than much of the GT data, as you pointed out in the example [3]. We attribute this to the generalization capabilities of deep networks: even with noisy labels, the model still improves over the GT data.

    • Yes, the lmdb files contain the global pose (h_i^img), but they also contain the local pose (h_i^prop). Because of augmentations, during training (data_loader_augmenter.py) we use the GT landmarks and bboxes to recalculate the GT global poses. During validation (data_loader_lmdb.py), we use the GT local pose to obtain the GT global pose. So both data loaders output poses relative to the entire image (h_i^img), and you are correctly obtaining the GT pose in [3b].
    • What determines the size of the face is the tz component of the translation vector. You can easily test how this affects the face size by changing this value (pose_pred[5]). If you decrease the tz component, the face gets larger, as it is now closer to the camera.
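
The effect of tz can be sketched with a plain pinhole projection. This is an illustration, not the img2pose rendering code: `projected_width`, the focal length, the face width, and the pose values below are hypothetical and chosen only for the example. The one thing taken from the thread is that index 5 of the pose vector holds tz.

```python
def projected_width(face_width, tz, focal_length):
    """Width in pixels of an object of size face_width at depth tz (pinhole model)."""
    if tz <= 0:
        raise ValueError("tz must be positive (face in front of the camera)")
    return focal_length * face_width / tz


# Hypothetical 6DoF pose: rotation vector (rx, ry, rz) followed by (tx, ty, tz).
pose_pred = [0.1, -0.2, 0.05, 0.0, 0.0, 4.0]
focal_length = 1000.0  # assumed focal length in pixels
face_width = 0.6       # assumed face width, in the same units as tz

before = projected_width(face_width, pose_pred[5], focal_length)
pose_pred[5] /= 2.0    # halve tz: the face moves closer to the camera
after = projected_width(face_width, pose_pred[5], focal_length)
print(before, after)   # the second value is twice the first
```

Under this model the projected size is inversely proportional to tz, which is why decreasing tz makes the face larger.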

I hope this helps clear up your questions.

from img2pose.

lucaskyle commented on May 25, 2024

Nice work, nice paper!

Here is my question.
I checked the local pose to global pose processing.
It seems you did not consider any distortion in the WIDER FACE dataset.
Actually, the dataset does not provide any information about that.

I tested this a lot:
local_pose_to_global_pose with distortion taken into account and without it.
The results were quite different.

If the face position in the image, the camera intrinsics, and the distortion info matter this much (meaning that without this info you can hardly make correct GT annotations), how can we compare against the wiki_test_dataset?

vitoralbiero commented on May 25, 2024

Hello @lucaskyle, thanks for your interest in our work!

We don't take into consideration any type of distortion when converting the poses.
In our tests, we were able to achieve reliable GT without adding camera distortion, except for some outliers.
Could you provide examples where local_pose_to_global_pose gave different results depending on the distortion?
Also, what dataset are you referring to as wiki_test_dataset?

Thanks!

lucaskyle commented on May 25, 2024

I understand.
When using the solvePnP method, the camera distortion coefficients are an optional input.
If the input images are perfectly undistorted, you don't have to worry about that.
But when you use WIDER FACE data to train head pose, I don't see any step that undistorts the images.

Training:
face landmarks ---> solvePnP (should consider distortion) ---> HP_local ---> GT HP_global
Testing:
model results ---> HP_global ---> HP_local (should consider GT distortion) ---> vs. GT HP_local (coming from landmarks)

WIDER FACE doesn't provide any camera distortion information, so we can't get a very accurate HP_local.
Also, BIWI just crops every face image from the larger image (I guess), and we don't know the camera information there either.

I think neither the training data nor the testing data is very reliable without considering distortion,
because the images do not come from the same camera.
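
To give a sense of what is at stake, here is a minimal sketch of the radial part of the distortion model that OpenCV's solvePnP accepts through its optional distortion coefficients. The intrinsics and the coefficient k1 below are made-up values, not parameters of WIDER FACE or BIWI (neither dataset provides them), and the helper names are hypothetical. The sketch only shows that the pixel shift grows toward the image border, so landmarks of faces far from the center would be affected most.

```python
import math


def distort(x, y, k1, k2=0.0):
    """Apply radial distortion to a normalized image point (x, y)."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


def pixel_shift(u, v, fx, fy, cx, cy, k1, k2=0.0):
    """Distance in pixels between a point and its radially distorted position."""
    x = (u - cx) / fx          # pixel -> normalized camera coordinates
    y = (v - cy) / fy
    xd, yd = distort(x, y, k1, k2)
    ud = fx * xd + cx          # back to pixel coordinates
    vd = fy * yd + cy
    return math.hypot(ud - u, vd - v)


# Assumed intrinsics for a 640x480 image and an assumed mild barrel distortion.
FX, FY, CX, CY, K1 = 500.0, 500.0, 320.0, 240.0, -0.1

shift_center = pixel_shift(320.0, 240.0, FX, FY, CX, CY, K1)
shift_near = pixel_shift(350.0, 260.0, FX, FY, CX, CY, K1)
shift_corner = pixel_shift(620.0, 460.0, FX, FY, CX, CY, K1)
print(shift_center, shift_near, shift_corner)
```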

vitoralbiero commented on May 25, 2024

I understand the pipeline you are suggesting, but unfortunately we do not have the camera distortion information needed for it.
However, I think that even without this info our annotations are reliable enough: when we tested on AFLW2000-3D and BIWI, we got SoTA predictions.

When I asked if you have an example where distortion is affecting the GT, I meant a visual example, where you were able to add the camera distortion parameters and get a better GT.

I think that by wiki dataset you mean the BIWI dataset. For BIWI and AFLW2000-3D, we use the provided GT Euler angles for rotation comparison, as other papers also do. For AFLW2000-3D, we use the provided landmarks to get the GT translation; most other papers do not predict translation and so do not have this comparison.
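
Comparing against GT Euler angles requires converting the predicted rotation vector first. A minimal sketch of such a conversion follows, using Rodrigues' formula and the decomposition R = Rz·Ry·Rx. The function names and the angle convention are assumptions made for illustration; the convention used in the img2pose notebooks may differ (axis order, signs), so this is not a drop-in replacement for the evaluation code. It also shows the range limitation of Euler angles mentioned earlier in the thread: the middle angle always comes back within [-90, 90].

```python
import math


def rotvec_to_matrix(rx, ry, rz):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    k = (rx / theta, ry / theta, rz / theta)           # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    K = [[0.0, -k[2], k[1]],                           # cross-product matrix of k
         [k[2], 0.0, -k[0]],
         [-k[1], k[0], 0.0]]
    return [[c * (i == j) + s * K[i][j] + (1.0 - c) * k[i] * k[j]
             for j in range(3)] for i in range(3)]


def matrix_to_euler(R):
    """Angles about (x, y, z) in degrees, assuming R = Rz @ Ry @ Rx."""
    sin_y = max(-1.0, min(1.0, -R[2][0]))              # clamp against rounding
    about_y = math.asin(sin_y)                         # always in [-90, 90] degrees
    about_x = math.atan2(R[2][1], R[2][2])
    about_z = math.atan2(R[1][0], R[0][0])
    return tuple(math.degrees(a) for a in (about_x, about_y, about_z))


def rotvec_to_euler(rx, ry, rz):
    return matrix_to_euler(rotvec_to_matrix(rx, ry, rz))


# A 45 degree turn about y is recovered as is.
print(rotvec_to_euler(0.0, math.radians(45.0), 0.0))
# A 120 degree turn about y is a valid rotation vector, but the angle about y
# comes back as about 60 degrees, with the remainder absorbed by the other axes.
print(rotvec_to_euler(0.0, math.radians(120.0), 0.0))
```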

lucaskyle commented on May 25, 2024

Thank you for your explanation.
I understand.
