
Comments (6)

tangchuangqi commented on August 25, 2024

I also have the same concern. I think cam_t_m2c and cam_R_m2c are not available during the evaluation.
cam_t_m2c and cam_R_m2c are used to generate the new_target in the Loss_refiner, and I found that the new_target is very close to the model_points after the transformation.
So I have an idea: it may be OK to use the model_points to replace the new_target. I'm going to verify the idea in the coming days.


ekdnltrla commented on August 25, 2024

@tangchuangqi
Thank you for your answer.
I have a question about 'eval_linemod'.
The outputs of the estimator and refiner are 'pred_r', 'pred_t', 'pred_c', and 'idx', so I assumed they are the rotation (quaternion), translation, confidence, and index.
But the result on a LINEMOD example after refinement is different from the label of the data.
I'm really confused. Is some additional calculation needed to get the rotation and translation we expect?


j96w commented on August 25, 2024

Hey guys, I think you misunderstand how we use cam_R_m2c and cam_t_m2c. cam_R_m2c and cam_t_m2c are the ground-truth pose we used to build the target (the model points rotated by cam_R_m2c and translated by cam_t_m2c). During our evaluation, the target is only used to calculate the distance between our prediction and the target. For real-time testing, you don't need this target or this distance, because you can't have them.

You should just use the pred_r, pred_t, pred_c output by the network and choose the result with the maximum confidence (pred_r[argmax(pred_c)], pred_t[argmax(pred_c)]) as your pose estimation prediction. For the next refinement iteration, you only need to inversely apply your previous pose estimate to the input point cloud (generated from the depth) and feed it into the network to get the pose of the second iteration. After that, you should compose the current result with your previous estimate (please follow 'eval_ycb' to do this). Repeat this process until you finish the refinement, as sketched below. The whole evaluation process needs neither cam_R_m2c and cam_t_m2c nor the target.
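A minimal NumPy sketch of this select-then-refine loop (names like `select_best_pose` and the `refiner(...)` call signature are my assumptions, not the repo's exact API):

```python
import numpy as np

def quaternion_to_matrix(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])

def select_best_pose(pred_r, pred_t, pred_c):
    """Keep only the per-point prediction with the highest confidence."""
    best = np.argmax(pred_c)
    return pred_r[best], pred_t[best]

def refine(refiner, cloud, emb, idx, r0, t0, iterations=2):
    """Hypothetical refinement loop: re-express the cloud in the current
    estimate's frame, predict a residual pose, and accumulate it."""
    R, t = quaternion_to_matrix(r0), np.asarray(t0, dtype=float)
    for _ in range(iterations):
        cloud_local = (cloud - t) @ R            # inverse transform: R^T (p - t)
        dr, dt = refiner(cloud_local, emb, idx)  # residual pose (assumed API)
        t = R @ np.asarray(dt) + t               # compose translation first
        R = R @ quaternion_to_matrix(dr)         # then compose rotation
    return R, t
```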

@ekdnltrla for your question about 'eval_linemod':
The final result differs from the label of the data because, instead of adding the residual pose onto the previous pose estimate, we invert the predicted residual pose and use it to inversely transform the target before calculating the distance. If you want the same numbers as the label of the data, please follow 'eval_ycb', where you can see how to accumulate the residual poses into a final pose output, as shown below.
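In 4x4 homogeneous form, the accumulation done in 'eval_ycb' amounts to one matrix product per iteration. A sketch with placeholder values (not the repo's exact code):

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack a rotation matrix and translation into a 4x4 pose matrix."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

R_prev, t_prev = np.eye(3), np.zeros(3)               # placeholder previous estimate
R_res, t_res = np.eye(3), np.array([0.0, 0.0, 0.01])  # placeholder predicted residual

# compose the residual onto the running estimate, one product per iteration
T_final = to_homogeneous(R_prev, t_prev) @ to_homogeneous(R_res, t_res)
R_final, t_final = T_final[:3, :3], T_final[:3, 3]
```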

@tangchuangqi for your idea of using the model_points to replace the new_target:
Sorry, you can't do that. The new_target is the target rotated by the inverse of your predicted rotation and translated by the inverse of your predicted translation; it is a new target for the next pose estimation iteration. The reason you think it's very close to the model points is mainly that the initial pose estimate is very accurate, so this inverse transformation brings the target back somewhere close to the original model points, but it is still not the model points.
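Concretely, the relationship looks like this (a NumPy sketch of the idea behind Loss_refiner, with toy values; the repo does the equivalent in batched torch ops):

```python
import numpy as np

rng = np.random.default_rng(0)
model_points = rng.normal(size=(500, 3))           # object model points
R_gt, t_gt = np.eye(3), np.array([0.0, 0.0, 1.0])  # toy ground-truth pose
target = model_points @ R_gt.T + t_gt              # target = R_gt p + t_gt

# a nearly (but not exactly) correct prediction
R_pred, t_pred = R_gt, t_gt + np.array([0.002, 0.0, 0.0])

# new_target: the target inversely transformed by the predicted pose
new_target = (target - t_pred) @ R_pred            # R_pred^T (target - t_pred)

# close to model_points only because the prediction is accurate, never equal
print(np.abs(new_target - model_points).max())
```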


j96w commented on August 25, 2024

I have cleaned up 'eval_linemod' to make it easier to understand and added a comment showing where the final pose output is (the same format as the ground-truth label of the dataset). Again, the 'target' (the model points transformed by cam_R_m2c and cam_t_m2c) is not required during the pose estimation process and is only used to calculate the distance between the prediction and the ground truth.


ekdnltrla commented on August 25, 2024

@j96w
Thank you for your kindness!
With your answer I could understand the code and get the result I wanted.

And there's one thing I noticed while training on the LINEMOD dataset.
In the code "dataset.py", there are some steps that compute "cloud" using "target_t", unlike the YCB dataset.
Maybe because of this, my result was strange. After removing those steps, I got the test result I expected.
If I have misunderstood something, please let me know your opinion.

Thank you.


j96w commented on August 25, 2024

Those steps convert the distance metric to meters (YCB doesn't need that, since its original metric is already meters). Just keep in mind that if you change the distance metric, you probably also need to adjust the hyperparameter w during training to reach the best performance.
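For reference, the conversion is just a rescale; a sketch with toy values (the 1000.0 factor assumes LINEMOD's millimeter units):

```python
import numpy as np

cloud = np.array([[120.0, -45.0, 980.0]])    # example back-projected point, in mm
target_t = np.array([100.0, -50.0, 1000.0])  # example ground-truth translation, in mm

# LINEMOD is in millimeters; rescale to meters so the distance loss
# (and the hyperparameter w) behaves as it does on YCB
cloud = cloud / 1000.0
target_t = target_t / 1000.0
```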

