OreoChocolate / MUREN
The official code for Relational Context Learning for Human-Object Interaction Detection, CVPR 2023.
Home Page: http://cvlab.postech.ac.kr/research/MUREN/
Exciting work! Could you tell us how many epochs the model was trained for? I would like to know about the convergence. Thanks!
Why can't I reproduce the reported results when training locally?
Is the final model the same as the one currently on GitHub?
In muren.py, in the call multiplex_context = self.MURE(output_human, output_obj, output_rel, (memory, tgt_mask, memory_mask, tgt_key_padding_mask, memory_key_padding_mask, pos)), could you explain where tgt_mask, memory_mask, tgt_key_padding_mask, and memory_key_padding_mask each come from and what they represent?
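For reference, these four arguments follow the naming conventions of PyTorch's transformer/attention modules. Assuming MUREN's decoder is a DETR-style decoder built on torch.nn.MultiheadAttention (an assumption, not a statement about the authors' code), their documented meanings are sketched below.

# Generic PyTorch sketch of the four mask arguments, assuming a DETR-style
# decoder; how muren.py actually constructs them is not shown here.
import torch
import torch.nn as nn

num_queries, batch, d_model, hw = 16, 2, 256, 100
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8)

tgt = torch.zeros(num_queries, batch, d_model)   # decoder queries (e.g. HOI queries)
memory = torch.rand(hw, batch, d_model)          # flattened encoder features

# tgt_mask / memory_mask: optional (L_tgt, L_tgt) / (L_tgt, L_mem) attention masks;
# DETR-style detectors usually pass None for both.
tgt_mask = None
memory_mask = None

# tgt_key_padding_mask: (batch, L_tgt) booleans, True where a query is padding.
tgt_key_padding_mask = None
# memory_key_padding_mask: (batch, L_mem) booleans, True at padded image positions;
# in DETR this is the flattened padding mask created when batching images of
# different sizes.
memory_key_padding_mask = torch.zeros(batch, hw, dtype=torch.bool)

out = decoder_layer(
    tgt, memory,
    tgt_mask=tgt_mask,
    memory_mask=memory_mask,
    tgt_key_padding_mask=tgt_key_padding_mask,
    memory_key_padding_mask=memory_key_padding_mask,
)
print(out.shape)  # torch.Size([16, 2, 256])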
Thanks for your nice work. Could you please let us know when the source code will be released?
There is a bug in the following lines:
origin_sub_box = target['orig_boxes'][kept_box_indices.index(hoi['subject_id'])]
obj_box = target['boxes'][kept_box_indices.index(hoi['object_id'])]
origin_obj_box = target['orig_boxes'][kept_box_indices.index(hoi['object_id'])]
Hello, thanks for sharing your code.
I have a question.
Like other repositories, this code only provides generate_vcoco_official.py, and no script for visualization is included.
Can you share some tips on how you ran inference and visualized the results?
Thank you!
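Until an official demo script is available, a rough single-image inference helper along the following lines may be useful. Everything in it beyond plain PyTorch/torchvision (how the model is built, the checkpoint, the output keys) is an assumption about a typical DETR-style HOI detector, not the actual MUREN interface.

# Rough single-image inference helper for a DETR-style HOI detector such as MUREN.
# The way the model is built and the names of its output keys are assumptions;
# adapt them to the repository's code.
import torch
import torchvision.transforms as T
from PIL import Image

# Standard DETR-style preprocessing: resize and ImageNet normalization.
_transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def run_inference(model, image_path, device='cuda'):
    """Run a trained HOI model on one image and return its raw outputs."""
    img = Image.open(image_path).convert('RGB')
    inputs = _transform(img).unsqueeze(0).to(device)
    model.to(device).eval()
    outputs = model(inputs)
    # Typical DETR-style HOI heads return per-query class/verb logits and
    # normalized cxcywh boxes; threshold the scores and rescale the boxes to
    # the original image size before drawing them.
    return outputs

# Usage (hypothetical): build the model and load the checkpoint exactly as the
# training/evaluation entry point does, then call
#   outputs = run_inference(model, 'demo.jpg')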
Hi,
I wanted to use your code to train on a custom dataset. What is the expected bounding-box format, and should the boxes be normalized before loading the data for training?
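For what it's worth, most DETR-style HOI codebases (QPIC and its descendants) store annotations as absolute [x1, y1, x2, y2] pixel boxes and convert them to normalized [cx, cy, w, h] inside the dataset class, so normalization is handled during loading rather than in the annotation files. The sketch below shows that conversion; whether MUREN's loader does exactly this is an assumption.

# Sketch of the usual DETR-style box convention (assumption: MUREN follows the
# same QPIC/DETR-style pipeline). Annotations: absolute [x1, y1, x2, y2] pixels;
# training targets: normalized [cx, cy, w, h].
import torch

def xyxy_to_normalized_cxcywh(boxes: torch.Tensor, img_w: int, img_h: int) -> torch.Tensor:
    """boxes: (N, 4) absolute [x1, y1, x2, y2] -> (N, 4) normalized [cx, cy, w, h]."""
    x1, y1, x2, y2 = boxes.unbind(-1)
    cxcywh = torch.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], dim=-1)
    scale = torch.tensor([img_w, img_h, img_w, img_h], dtype=cxcywh.dtype)
    return cxcywh / scale

boxes = torch.tensor([[48.0, 240.0, 195.0, 371.0]])   # one example box in pixels
print(xyxy_to_normalized_cxcywh(boxes, img_w=640, img_h=480))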
On V-COCO, I trained using the commands in the repository and only achieved a result of 64.1. When I tested with the 'eval' command, the mAP was only 65.9.
---------Reporting Role AP (%)------------------
hold-obj: AP = 58.44 (#pos = 3608)
sit-instr: AP = 59.50 (#pos = 1916)
ride-instr: AP = 73.68 (#pos = 556)
look-obj: AP = 48.34 (#pos = 3347)
hit-instr: AP = 80.03 (#pos = 349)
hit-obj: AP = 69.51 (#pos = 349)
eat-obj: AP = 71.48 (#pos = 521)
eat-instr: AP = 76.79 (#pos = 521)
jump-instr: AP = 77.71 (#pos = 635)
lay-instr: AP = 58.62 (#pos = 387)
talk_on_phone-instr: AP = 56.47 (#pos = 285)
carry-obj: AP = 48.91 (#pos = 472)
throw-obj: AP = 57.31 (#pos = 244)
catch-obj: AP = 57.66 (#pos = 246)
cut-instr: AP = 50.66 (#pos = 269)
cut-obj: AP = 65.60 (#pos = 269)
work_on_computer-instr: AP = 77.11 (#pos = 410)
ski-instr: AP = 56.09 (#pos = 424)
surf-instr: AP = 80.34 (#pos = 486)
skateboard-instr: AP = 88.40 (#pos = 417)
drink-instr: AP = 59.21 (#pos = 82)
kick-obj: AP = 79.46 (#pos = 180)
point-instr: AP = 8.20 (#pos = 31)
read-obj: AP = 51.02 (#pos = 111)
snowboard-instr: AP = 80.16 (#pos = 277)
Average Role [scenario_1] AP = 63.63
Average Role [scenario_1] AP = 65.94, omitting the action "point"
---------Reporting Role AP (%)------------------
hold-obj: AP = 61.83 (#pos = 3608)
sit-instr: AP = 62.22 (#pos = 1916)
ride-instr: AP = 74.57 (#pos = 556)
look-obj: AP = 53.29 (#pos = 3347)
hit-instr: AP = 81.17 (#pos = 349)
hit-obj: AP = 71.86 (#pos = 349)
eat-obj: AP = 75.43 (#pos = 521)
eat-instr: AP = 77.01 (#pos = 521)
jump-instr: AP = 78.17 (#pos = 635)
lay-instr: AP = 61.32 (#pos = 387)
talk_on_phone-instr: AP = 58.56 (#pos = 285)
carry-obj: AP = 50.48 (#pos = 472)
throw-obj: AP = 59.77 (#pos = 244)
catch-obj: AP = 62.53 (#pos = 246)
cut-instr: AP = 51.62 (#pos = 269)
cut-obj: AP = 67.81 (#pos = 269)
work_on_computer-instr: AP = 78.73 (#pos = 410)
ski-instr: AP = 61.23 (#pos = 424)
surf-instr: AP = 80.91 (#pos = 486)
skateboard-instr: AP = 88.89 (#pos = 417)
drink-instr: AP = 59.94 (#pos = 82)
kick-obj: AP = 83.20 (#pos = 180)
point-instr: AP = 8.24 (#pos = 31)
read-obj: AP = 56.72 (#pos = 111)
snowboard-instr: AP = 81.60 (#pos = 277)
Average Role [scenario_2] AP = 65.88
Average Role [scenario_2] AP = 68.29, omitting the action "point"
Is my understanding of the metrics incorrect? Thank you very much for your reply.
Thank you very much for your work. According to your model diagram, the human loss corresponds to the yellow box and the object loss to the red box. How is the interaction loss designed? I see in the paper that you use focal loss for interaction classification; I would like to know where this focal loss is implemented in the GitHub code. Thank you.
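For reference, the sigmoid focal loss that most DETR-style HOI detectors use for multi-label verb/interaction classification is the standard formulation from Lin et al. (2017), sketched below; this is a generic version, not necessarily line-for-line what MUREN implements. torchvision also ships it as torchvision.ops.sigmoid_focal_loss.

# Standard sigmoid focal loss as commonly used for the verb/interaction
# classification head in DETR-style HOI models; a generic sketch, not MUREN's
# exact implementation.
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                       alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """logits, targets: (N, num_classes); targets are 0/1 multi-label vectors."""
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = prob * targets + (1 - prob) * (1 - targets)   # probability of the true label
    loss = ce * (1 - p_t) ** gamma                      # down-weight easy examples
    if alpha >= 0:
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    return loss.mean()

loss = sigmoid_focal_loss(torch.randn(8, 29), torch.randint(0, 2, (8, 29)).float())
print(loss)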
Could you provide a script for testing new images?