Dear Vítor Albiero, Thanks for your helpful comments in the previous

Questions related to the prediction values and rendered results about img2pose HOT 6 CLOSED

from img2pose.

Comments (6)

vitoralbiero commented on May 25, 2024

Hello Sungjun Ethan Yoon,

Thank you again for your interest in our work!

Yes, our model predicts rotation vectors (not Euler angles) and translation vectors in h_i^img. We are going to release a new version of the paper to reflect this correction.
We do not convert to Euler angles inside the img2pose model, as the rotation vector is only converted to Euler angles for validation on AFLW2000-3D and BIWI.
The conversion from rotation vector to Euler angles is shown in both notebooks and in the comment you mentioned [1].
Note that Euler angles suffer from a drawback, where the yaw is limited to (-90, 90), thus apart from the validation on these two datasets, we prefer to use rotation vectors in our pipeline instead.
You are right. Our model has better predictions than much of the GT data, as you pointed out in the example [3]. We attribute this to the generalization capabilities of deep networks, where even with noisy labels, the model still improves over the GT data.
- Yes, the lmdb files contain the global pose (h_i^img), but they also contain the local pose (h_i^prop). Because of augmentations, during training (data_loader_augmenter.py) we use the GT landmarks and bboxes to recalculate the GT global poses. During validation (data_loader_lmdb.py), we use the GT local pose to obtain the GT global pose. So both data loaders output poses relative to the entire image (h_i^img), and you are correctly obtaining the GT pose in [3b].
- What determines the size of the face is the t_z component of the translation vector. You can easily test how this affects the face size by changing this value (pose_pred[5]). If you decrease the t_z component, you will see that the face gets larger, as it is now closer to the camera.
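The two points above (rotation vector versus Euler angles, and the role of t_z) can be checked with a small standalone sketch. This is not the code from the img2pose notebooks: the pose ordering (rx, ry, rz, tx, ty, tz), the Euler convention R = Rz(roll) Ry(yaw) Rx(pitch), the pinhole camera, and the focal length are all assumptions made for illustration. It uses only the standard library; in practice a library routine such as OpenCV's Rodrigues conversion would normally replace the hand-written formula.

```python
import math

def rotvec_to_matrix(rvec):
    """Rodrigues' formula: axis-angle rotation vector -> 3x3 rotation matrix."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:  # no rotation
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def matrix_to_euler_deg(R):
    """Return (pitch, yaw, roll) in degrees.

    Assumed convention: R = Rz(roll) @ Ry(yaw) @ Rx(pitch).
    The yaw comes from asin, so it can never leave [-90, 90].
    """
    yaw = math.asin(max(-1.0, min(1.0, -R[2][0])))
    pitch = math.atan2(R[2][1], R[2][2])
    roll = math.atan2(R[1][0], R[0][0])
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))

def projected_width_px(face_width, t_z, focal_length):
    """Pinhole model: apparent width in pixels of an object at depth t_z."""
    return focal_length * face_width / t_z

# Hypothetical pose in the assumed order (rx, ry, rz, tx, ty, tz); pose[5] is t_z.
pose = [0.0, math.radians(30.0), 0.0, 0.0, 0.0, 10.0]
pitch, yaw, roll = matrix_to_euler_deg(rotvec_to_matrix(pose[:3]))

# Halving t_z doubles the projected width: the face looks closer to the camera.
far = projected_width_px(1.0, pose[5], focal_length=1000.0)
near = projected_width_px(1.0, pose[5] / 2.0, focal_length=1000.0)

# A 120 degree turn about the y axis is reported with a yaw of about 60 degrees
# under this convention, which illustrates the Euler-angle limitation.
_, yaw_120, _ = matrix_to_euler_deg(rotvec_to_matrix((0.0, math.radians(120.0), 0.0)))
```

The rotation vector itself represents the 120 degree case without ambiguity, which is consistent with preferring rotation vectors outside of benchmark evaluation.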

I hope this helps clear up your questions.

lucaskyle commented on May 25, 2024

Nice work, nice paper!

Here is my question.
I checked the local-pose-to-global-pose processing.
It seems you didn't consider any distortion in the WIDER FACE dataset.
Actually, the dataset doesn't provide any information about that.

I tested this a lot:
local_pose_to_global_pose with distortion taken into account and without it.
The results were quite different.

If the face position in the image, the camera intrinsics, and the distortion info matter this much (meaning that without this info you can hardly make correct GT annotations), how can we compare on the wiki_test_dataset?

vitoralbiero commented on May 25, 2024

Hello @lucaskyle, thanks for your interest in our work!

We don't take into consideration any type of distortion when converting the poses.
In our tests, we were able to achieve reliable GT without adding camera distortion, except for some outliers.
Could you provide examples where you said the local_pose_to_global_pose worked differently depending on the distortion?
Also, what dataset are you referring to as wiki_test_dataset?

Thanks!

lucaskyle commented on May 25, 2024

I understand.
When using the solvePnP method, the camera distortion coefficients are an optional input.
As long as the input images are perfectly undistorted, you don't have to worry about that.
But when you use WIDER FACE data to train head pose, I don't see any step that undistorts the images.

training:
face landmarks ---> solvePnP (should consider distortion) ---> getHP_local ---> GTHP_global
testing:
model results ---> HP_global ---> get HP_local (should consider GT distortion) ---> vs. GTHP_local (coming from landmarks)

WIDER FACE doesn't provide any camera distortion info, so we can't get a very accurate HP_local.
Also, BIWI just crops every face image from the larger image (I guess), so we don't know the camera information there either.

I think neither the training data nor the testing data is quite reliable without considering distortion,
because the images do not come from the same camera.
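To get a feel for how much lens distortion could move a landmark before it reaches solvePnP, here is a minimal sketch of a radial distortion model (the radial terms of the Brown-Conrady model). All numbers are hypothetical: the intrinsics (fx, fy, cx, cy) and the coefficient k1 are made up for illustration, since neither WIDER FACE nor this thread provides real values.

```python
def distort_point(u, v, fx, fy, cx, cy, k1, k2=0.0):
    """Apply radial distortion to an ideal (undistorted) pixel position."""
    # Normalize to camera coordinates.
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    # Radial scaling grows with the distance from the principal point.
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return (fx * x * scale + cx, fy * y * scale + cy)

# Hypothetical 1280x720 camera with mild barrel distortion (k1 < 0).
cam = dict(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0, k1=-0.1)

center_u, _ = distort_point(700.0, 360.0, **cam)   # landmark near the image center
edge_u, _ = distort_point(1240.0, 360.0, **cam)    # landmark near the image border

center_shift = abs(center_u - 700.0)   # well below one pixel
edge_shift = abs(edge_u - 1240.0)      # roughly 21.6 pixels
```

Under these assumed values, the displacement is negligible near the image center and grows to tens of pixels near the border, so any effect on a landmark-based pose fit would depend strongly on where the face sits in the frame and on the actual lens.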

vitoralbiero commented on May 25, 2024

I understand the pipeline you are suggesting, but unfortunately, we do not have camera distortion information to do that.
However, I think that even without this info, our annotations are reliable enough: when we tested on AFLW2000-3D and BIWI, we got SoTA predictions.

When I asked if you have an example where distortion is affecting the GT, I meant a visual example, where you were able to add the camera distortion parameters and get a better GT.

I think when you say wiki dataset you mean the BIWI dataset. For BIWI and AFLW2000-3D, we use the provided GT Euler angles for rotation comparison, which other papers also do. For AFLW2000-3D, we use the provided landmarks to get the GT translation, whereas most other papers do not predict translation and do not have this comparison.
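A rotation comparison on Euler angles like the one described above is often summarized as a per-angle mean absolute error. The sketch below assumes that metric (the thread does not name one) and assumes angles in degrees in the order (pitch, yaw, roll); the function names are hypothetical and not taken from the img2pose code.

```python
def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def euler_mae(pred, gt):
    """Per-angle mean absolute error over a list of (pitch, yaw, roll) tuples."""
    sums = [0.0, 0.0, 0.0]
    for p, g in zip(pred, gt):
        for i in range(3):
            sums[i] += angle_diff_deg(p[i], g[i])
    return tuple(s / len(pred) for s in sums)

# Made-up predictions and ground truth, for illustration only.
pred = [(10.0, 20.0, 30.0), (0.0, 0.0, 0.0)]
gt = [(12.0, 20.0, 27.0), (0.0, 4.0, 1.0)]
mae_pitch, mae_yaw, mae_roll = euler_mae(pred, gt)
```

Wrapping the difference keeps values such as 179 and -179 degrees two degrees apart rather than 358.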

lucaskyle commented on May 25, 2024

Thank you for your explanation.
I understand.
