
Comments (6)

fcole commented on July 27, 2024

Hi, yes, the code could be easier to understand, sorry about that. To run the full model, you basically need to fill in the dictionary of values specified here:

return {'img': img,

these correspond to the various buffers mentioned in the loss definitions in the paper. You don't need to create your own HDF5s etc., as long as you can create a dictionary that includes those buffers. Hope that helps.
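For illustration, here is a minimal sketch of such a dictionary, using the buffer names that come up later in this thread (the shapes and dtypes are assumptions, not confirmed against the repo):

import numpy as np

h, w = 288, 512  # example resolution; use your own image size

# Hypothetical skeleton of the input dictionary; fill each buffer with real data.
inputs = {
    'img_1': np.zeros((h, w, 3), np.float32),       # RGB, values in [0, 1]
    'gt_depth': np.zeros((h, w), np.float32),       # ground-truth depth (training only)
    'lr_error': np.zeros((h, w), np.float32),       # left-right consistency error (C_lr)
    'human_mask': np.zeros((h, w), np.float32),     # 1 = human, 0 = background
    'angle_prior': np.zeros((h, w), np.float32),    # parallax-angle prior (C_pa)
    'pp_depth': np.zeros((h, w), np.float32),       # depth from motion parallax (P+P)
    'flow': np.zeros((h, w, 2), np.float32),        # optical flow, same H x W as img_1
    'T_1_G': np.eye(4, dtype=np.float32),           # global -> reference camera, 4x4
    'T_2_G': np.eye(4, dtype=np.float32),           # global -> source camera, 4x4
    'intrinsic': np.eye(3, dtype=np.float32),       # 3x3 camera intrinsics
    'keypoints_img': np.zeros((h, w), np.float32),  # normalized keypoint rendering
}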


rmbashirov commented on July 27, 2024

How do I get gt_depth, human_mask, flow, and keypoints_img in your format on my own data, so that I can run inference with your full model?


rmbashirov commented on July 27, 2024

OK, I realise that providing a full pipeline for running the full model on arbitrary data is almost impossible for you.

Could you instead provide the inference results of your full model on the MC dataset?


fcole commented on July 27, 2024

Unfortunately, we don't have permission to share image-like results (e.g., depth buffers) from the MC dataset. Sorry about that.

For inference, you shouldn't need gt_depth, and the model with keypoint input performs only marginally better than the model without it, so the only things you really need are the flow and the human mask.
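If only the flow and the human mask are needed, one way to produce the mask is an off-the-shelf segmentation model. The sketch below uses torchvision's Mask R-CNN; that choice is mine for illustration and not necessarily what the authors used:

import numpy as np
import torch
import torchvision

# Load a pretrained Mask R-CNN once; COCO class 1 is "person".
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()

def human_mask(img_float01):
    """img_float01: HxWx3 float32 RGB in [0, 1] -> HxW float32 mask (1 = human)."""
    tensor = torch.from_numpy(img_float01).permute(2, 0, 1)
    with torch.no_grad():
        out = model([tensor])[0]
    mask = np.zeros(img_float01.shape[:2], dtype=np.float32)
    for label, score, m in zip(out['labels'], out['scores'], out['masks']):
        if label.item() == 1 and score.item() > 0.5:  # keep confident person detections
            mask = np.maximum(mask, (m[0].numpy() > 0.5).astype(np.float32))
    return mask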


Tetsujinfr commented on July 27, 2024

Thanks for the pointers above. I looked at the load_tum_hdf5 function and have a few questions.

A) The code reads 11 objects:

- img_1: is it a simple 24-bit RGB numpy matrix of the image we are trying to infer on?

- gt_depth: I can ignore it since I just want to run inference, but can I simply comment out this piece of code and all downstream references to this object, or do I need to fake a dummy input?

- lr_error: what is that? Can I ignore it the same way as gt_depth? It looks like it is used to compute the confidence map, which seems to be a key input to your model, no?

- human_mask: I assume this is a binary mask of the same size as img_1, right? What format is expected: 0.0 = transparent and 1.0 = opaque, i.e. the mask shape? (An RGB black/white image, I assume?)

- angle_prior: what is that? Is it the second image? It looks like it is used to compute the confidence map, which seems to be a key input to your model, no?

- pp_depth: what is that? It looks like it is used to compute the confidence map, which seems to be a key input to your model, no?

- flow: the output of FlowNet2, I assume, but is it a 24-bit RGB image or the raw flow data structure of FlowNet's .flo files? Does it need to have the exact same height × width as img_1?

- T_1_G: what is that? It looks like it is used to compute the confidence map, which seems to be a key input to your model, no?

- T_2_G: same question as for T_1_G.

- intrinsic: same question as for T_1_G.

- keypoints_img: can I just input a keypoint image from OpenPose, for instance? Do the points need to be single pixels? Is there a particular colouring scheme for each point that needs to be followed, or can I just use OpenPose's colouring?

Thanks a lot for your guidance on this.


zhengqili commented on July 27, 2024

Hi, I am the first author of this paper.

- img_1: should be an RGB image with values between 0 and 1.
- lr_error: the left-right consistency error, corresponding to C_lr in Eq. 5 of the supplementary material: http://www.cs.cornell.edu/~zl548/images/mannequin_depth_cvpr2019_supp_doc.pdf
- human_mask: the binary mask, where 1 indicates human and 0 indicates background.
- angle_prior: C_pa in Eq. 5 of the supplementary material.
- pp_depth: depth from motion parallax using the P+P representation in Eq. 4 of the supplementary material.
- T_1_G: the 4×4 homogeneous transformation matrix from global coordinates to the reference image, as described in the paper.
- T_2_G: the 4×4 homogeneous transformation matrix from global coordinates to the source image.
- intrinsic: the 3×3 camera intrinsic matrix.
- keypoints_img: You can use any keypoint detection algorithm you want, but you have to normalize the keypoint indices based on Mask R-CNN. In particular, in https://github.com/roytseng-tw/Detectron.pytorch/blob/master/lib/utils/vis.py, lines 198-199 read:

i1 = kp_lines[l][0]
i2 = kp_lines[l][1]

and you need to normalize these indices using the following code:

final_i1_value = (i1 + 1.0) / 18.0
final_i2_value = (i2 + 1.0) / 18.0
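To make that normalization concrete, here is a hypothetical sketch of building a keypoints_img from detected keypoints (the function name, the (x, y, index) input format, and the single-pixel rendering are my assumptions; the paper may splat larger dots):

import numpy as np

def render_keypoints_img(keypoints, h, w):
    """keypoints: iterable of (x, y, i) with Mask R-CNN keypoint indices
    i in 0..16. Returns an HxW float image where each keypoint pixel holds
    its normalized index (i + 1) / 18 and the background is 0."""
    img = np.zeros((h, w), dtype=np.float32)
    for x, y, i in keypoints:
        x, y = int(round(x)), int(round(y))
        if 0 <= y < h and 0 <= x < w:
            img[y, x] = (i + 1.0) / 18.0  # same normalization as above
    return img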

Please send me an email ([email protected]) for further questions, since I seldom reply on GitHub.

