Comments (6)
Hello Sungjun Ethan Yoon,
Thank you again for your interest in our work!
- Yes, our model predicts rotation vectors (not Euler angles) and translation vectors in h_img. We are going to release a new version of the paper to reflect this correction.
We do not convert to Euler angles inside the img2pose model; the rotation vector is only converted to Euler angles for validation on AFLW2000-3D and BIWI.
That conversion from rotation vector to Euler angles appears in both notebooks and in the comment you mentioned [1].
Note that Euler angles suffer from a drawback: the yaw is limited to (-90, 90). So, apart from the validation on these two datasets, we prefer to use rotation vectors throughout our pipeline.
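For reference, here is a minimal sketch of the rotation-vector-to-Euler-angles conversion. It uses one common Tait-Bryan convention; the exact convention in the repo's conversion code may differ, so treat this as illustrative rather than as the repo's implementation. The arcsin in the yaw extraction is exactly why yaw is confined to (-90, 90).

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (axis * angle) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-8:
        return np.eye(3)
    k = np.asarray(rvec, dtype=float) / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def matrix_to_euler(R):
    """Rotation matrix -> (pitch, yaw, roll) in degrees.
    The arcsin confines yaw to (-90, 90) by construction."""
    yaw = np.degrees(np.arcsin(np.clip(-R[2, 0], -1.0, 1.0)))
    pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return pitch, yaw, roll
```

For example, a rotation vector of magnitude pi/6 about the y-axis comes back as 30 degrees of yaw with zero pitch and roll.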
- You are right: our model often produces better predictions than the GT, as you pointed out in the example [3]. We attribute this to the generalization capability of deep networks, where even with noisy labels the model still improves over the GT data.
- Yes, the lmdb files contain the global pose (h_img), but they also contain the local pose (h_prop). Because of augmentations, during training (data_loader_augmenter.py) we use the GT landmarks and bboxes to recalculate the GT global poses, while during validation (data_loader_lmdb.py) we use the GT local pose to obtain the GT global pose. So both data loaders output poses relative to the entire image (h_img), and you are correctly obtaining the GT pose in [3b].
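The idea behind recovering a global pose from a local one can be illustrated with a toy sketch. The helper below is hypothetical (it is not the repo's conversion code) and assumes simple pinhole intrinsics, with one focal length and principal point for the bbox and one for the full image: the translation is re-expressed so that the projected face center and apparent size are preserved when the intrinsics are swapped.

```python
import numpy as np

def local_to_global_translation(t_box, f_box, c_box, f_img, c_img):
    """Hypothetical sketch: re-express a bbox-relative translation under
    full-image intrinsics, keeping the projected center and size fixed.
    c_box is the bbox principal point in full-image pixel coordinates."""
    tx, ty, tz = t_box
    # Pixel location of the face center under the bbox intrinsics.
    u = f_box * tx / tz + c_box[0]
    v = f_box * ty / tz + c_box[1]
    # Depth rescales with the focal-length ratio to keep apparent size.
    tz_img = tz * f_img / f_box
    # Recover the translation that projects to the same pixel under
    # the full-image intrinsics.
    tx_img = (u - c_img[0]) * tz_img / f_img
    ty_img = (v - c_img[1]) * tz_img / f_img
    return np.array([tx_img, ty_img, tz_img])
```

Projecting the returned translation with the full-image intrinsics lands on the same pixel as projecting the bbox-relative translation with the bbox intrinsics, which is the invariant such a conversion has to maintain.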
- What determines the size of the face is the tz component of the translation vector. You can easily test how this affects the face size by changing that value (pose_pred[5]): if you decrease tz, the face gets larger, as it is now closer to the camera.
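The tz effect follows directly from the pinhole model, where apparent size scales as focal / tz. A toy illustration (the focal length and face width below are assumed values, not numbers from the repo):

```python
# Pinhole model: the image-plane size of a face of fixed physical
# width scales as focal / tz, so a smaller tz means a larger face.
def apparent_size(face_width, tz, focal=600.0):
    return focal * face_width / tz

near = apparent_size(0.15, tz=2.0)  # face close to the camera
far = apparent_size(0.15, tz=4.0)   # same face at twice the depth
# Halving tz doubles the apparent size: near == 2 * far
```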
I hope this helps clear up your questions.
from img2pose.
Nice work and a nice paper!
Here is my question.
I checked the local-pose-to-global-pose processing, and it seems you didn't consider any distortion in the WIDER FACE dataset; in fact, the dataset doesn't provide any such information.
I tested this a lot: running local_pose_to_global_pose with and without distortion gives quite different results.
If the face position in the image and the camera intrinsics/distortion matter this much (meaning that without this information you can hardly produce correct GT annotations), how can we compare on the wiki_test_dataset?
Hello @lucaskyle, thanks for your interest in our work!
We don't take into consideration any type of distortion when converting the poses.
In our tests, we were able to achieve reliable GT without adding camera distortion, except for some outliers.
Could you share the examples where you found that local_pose_to_global_pose behaved differently depending on the distortion?
Also, what dataset are you referring to as wiki_test_dataset?
Thanks!
I understand.
When calling the solvePnP method, there is a camera distortion matrix as an optional input. If the input images are perfectly undistorted, you don't have to worry about it, but when you use WIDER FACE data to train head pose, I don't see any step that undistorts the images.
Training:
face landmarks ---> solvePnP (should consider distortion) ---> get HP_local ---> GT HP_global
Testing:
model results ---> HP_global ---> get HP_local (should consider GT distortion) ---> vs. GT HP_local (coming from landmarks)
WIDER FACE doesn't offer any camera distortion parameters, so we can't get a very accurate HP_local. Also, BIWI faces are (I guess) cropped from larger images, and we don't know that camera's information either.
I think neither the training data nor the testing data is quite reliable without considering distortion, because they do not come from the same camera.
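To put a rough number on this concern: under a one-parameter radial (Brown) distortion model, the pixel displacement grows with the square of the distance from the principal point, so faces near the image border are affected far more than faces near the center. The k1 coefficient and focal length below are assumed values for illustration only.

```python
def radial_shift_px(xn, k1, focal):
    """Pixel displacement of a point at normalized coordinate (xn, 0)
    under a one-parameter radial (Brown) distortion model:
    x_distorted = xn * (1 + k1 * r^2)."""
    r2 = xn * xn
    xd = xn * (1.0 + k1 * r2)
    return abs(xd - xn) * focal

# Assumed values: focal = 800 px, mild barrel distortion k1 = -0.1.
center_shift = radial_shift_px(0.05, k1=-0.1, focal=800.0)  # near center
border_shift = radial_shift_px(0.50, k1=-0.1, focal=800.0)  # near border
```

With these assumed parameters the near-center landmark moves by about a hundredth of a pixel while the off-center one moves by several pixels, which is why ignoring distortion mostly affects faces far from the image center.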
I understand the pipeline you are suggesting, but unfortunately we do not have the camera distortion information to do that.
However, even without this information our annotations are reliable enough: when we tested on AFLW2000-3D and BIWI, we obtained state-of-the-art predictions.
When I asked whether you have an example where distortion affects the GT, I meant a visual example in which adding the camera distortion parameters yields a better GT.
I think by wiki dataset you mean the BIWI dataset. For BIWI and AFLW2000-3D, we use the provided GT Euler angles for the rotation comparison, as other papers also do. For AFLW2000-3D, we use the provided landmarks to obtain the GT translation; most other papers do not predict translation and so do not have this comparison.
Thank you for your explanation.
I understand now.