OreoChocolate / MUREN
The official code for Relational Context Learning for Human-Object Interaction Detection, CVPR 2023.
Home Page: http://cvlab.postech.ac.kr/research/MUREN/
Exciting work! Could you tell us how many epochs the model was trained for? I would like to know about the convergence. Thanks!
Why can't I reproduce the reported results when training locally?
Is the final model the same as the one currently on GitHub?
In muren.py, in the call multiplex_context = self.MURE(output_human, output_obj, output_rel, (memory, tgt_mask, memory_mask, tgt_key_padding_mask, memory_key_padding_mask, pos)), could you explain where tgt_mask, memory_mask, tgt_key_padding_mask, and memory_key_padding_mask each come from and what they represent?
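For reference, these four arguments follow the naming conventions of PyTorch's transformer/attention modules. Assuming MUREN's decoder is a DETR-style decoder built on torch.nn.MultiheadAttention (an assumption, not a statement about the authors' code), their documented meanings are sketched below.

# Generic PyTorch sketch of the four mask arguments, assuming a DETR-style
# decoder; how muren.py actually constructs them is not shown here.
import torch
import torch.nn as nn

num_queries, batch, d_model, hw = 16, 2, 256, 100
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8)

tgt = torch.zeros(num_queries, batch, d_model)   # decoder queries (e.g. HOI queries)
memory = torch.rand(hw, batch, d_model)          # flattened encoder features

# tgt_mask / memory_mask: optional (L_tgt, L_tgt) / (L_tgt, L_mem) attention masks;
# DETR-style detectors usually pass None for both.
tgt_mask = None
memory_mask = None

# tgt_key_padding_mask: (batch, L_tgt) booleans, True where a query is padding.
tgt_key_padding_mask = None
# memory_key_padding_mask: (batch, L_mem) booleans, True at padded image positions;
# in DETR this is the flattened padding mask created when batching images of
# different sizes.
memory_key_padding_mask = torch.zeros(batch, hw, dtype=torch.bool)

out = decoder_layer(
    tgt, memory,
    tgt_mask=tgt_mask,
    memory_mask=memory_mask,
    tgt_key_padding_mask=tgt_key_padding_mask,
    memory_key_padding_mask=memory_key_padding_mask,
)
print(out.shape)  # torch.Size([16, 2, 256])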
Thanks for your nice work. Could you please let us know when the source code will be released?
There is a bug in the following lines:
origin_sub_box = target['orig_boxes'][kept_box_indices.index(hoi['subject_id'])]
obj_box = target['boxes'][kept_box_indices.index(hoi['object_id'])]
origin_obj_box = target['orig_boxes'][kept_box_indices.index(hoi['object_id'])]
Hello, thanks for sharing your code.
I have a question.
Like other repositories, this code only provides generate_vcoco_official.py, and no script for visualization is included.
Can you share some tips on how you ran inference and visualized the results?
Thank you!
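Until an official demo script is available, a rough single-image inference helper along the following lines may be useful. Everything in it beyond plain PyTorch/torchvision (how the model is built, the checkpoint, the output keys) is an assumption about a typical DETR-style HOI detector, not the actual MUREN interface.

# Rough single-image inference helper for a DETR-style HOI detector such as MUREN.
# The way the model is built and the names of its output keys are assumptions;
# adapt them to the repository's code.
import torch
import torchvision.transforms as T
from PIL import Image

# Standard DETR-style preprocessing: resize and ImageNet normalization.
_transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def run_inference(model, image_path, device='cuda'):
    """Run a trained HOI model on one image and return its raw outputs."""
    img = Image.open(image_path).convert('RGB')
    inputs = _transform(img).unsqueeze(0).to(device)
    model.to(device).eval()
    outputs = model(inputs)
    # Typical DETR-style HOI heads return per-query class/verb logits and
    # normalized cxcywh boxes; threshold the scores and rescale the boxes to
    # the original image size before drawing them.
    return outputs

# Usage (hypothetical): build the model and load the checkpoint exactly as the
# training/evaluation entry point does, then call
#   outputs = run_inference(model, 'demo.jpg')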
Hi,
I wanted to use your code to train on a custom dataset. What is the expected bounding-box format, and should the boxes be normalized before loading the data for training?
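For what it's worth, most DETR-style HOI codebases (QPIC and its descendants) store annotations as absolute [x1, y1, x2, y2] pixel boxes and convert them to normalized [cx, cy, w, h] inside the dataset class, so normalization is handled during loading rather than in the annotation files. The sketch below shows that conversion; whether MUREN's loader does exactly this is an assumption.

# Sketch of the usual DETR-style box convention (assumption: MUREN follows the
# same QPIC/DETR-style pipeline). Annotations: absolute [x1, y1, x2, y2] pixels;
# training targets: normalized [cx, cy, w, h].
import torch

def xyxy_to_normalized_cxcywh(boxes: torch.Tensor, img_w: int, img_h: int) -> torch.Tensor:
    """boxes: (N, 4) absolute [x1, y1, x2, y2] -> (N, 4) normalized [cx, cy, w, h]."""
    x1, y1, x2, y2 = boxes.unbind(-1)
    cxcywh = torch.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], dim=-1)
    scale = torch.tensor([img_w, img_h, img_w, img_h], dtype=cxcywh.dtype)
    return cxcywh / scale

boxes = torch.tensor([[48.0, 240.0, 195.0, 371.0]])   # one example box in pixels
print(xyxy_to_normalized_cxcywh(boxes, img_w=640, img_h=480))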
On V-COCO, I trained using the commands in the repository and only achieved a result of 64.1. When I tested with the 'eval' command, the mAP was only 65.9.
---------Reporting Role AP (%)------------------
hold-obj: AP = 58.44 (#pos = 3608)
sit-instr: AP = 59.50 (#pos = 1916)
ride-instr: AP = 73.68 (#pos = 556)
look-obj: AP = 48.34 (#pos = 3347)
hit-instr: AP = 80.03 (#pos = 349)
hit-obj: AP = 69.51 (#pos = 349)
eat-obj: AP = 71.48 (#pos = 521)
eat-instr: AP = 76.79 (#pos = 521)
jump-instr: AP = 77.71 (#pos = 635)
lay-instr: AP = 58.62 (#pos = 387)
talk_on_phone-instr: AP = 56.47 (#pos = 285)
carry-obj: AP = 48.91 (#pos = 472)
throw-obj: AP = 57.31 (#pos = 244)
catch-obj: AP = 57.66 (#pos = 246)
cut-instr: AP = 50.66 (#pos = 269)
cut-obj: AP = 65.60 (#pos = 269)
work_on_computer-instr: AP = 77.11 (#pos = 410)
ski-instr: AP = 56.09 (#pos = 424)
surf-instr: AP = 80.34 (#pos = 486)
skateboard-instr: AP = 88.40 (#pos = 417)
drink-instr: AP = 59.21 (#pos = 82)
kick-obj: AP = 79.46 (#pos = 180)
point-instr: AP = 8.20 (#pos = 31)
read-obj: AP = 51.02 (#pos = 111)
snowboard-instr: AP = 80.16 (#pos = 277)
Average Role [scenario_1] AP = 63.63
Average Role [scenario_1] AP = 65.94, omitting the action "point"
---------Reporting Role AP (%)------------------
hold-obj: AP = 61.83 (#pos = 3608)
sit-instr: AP = 62.22 (#pos = 1916)
ride-instr: AP = 74.57 (#pos = 556)
look-obj: AP = 53.29 (#pos = 3347)
hit-instr: AP = 81.17 (#pos = 349)
hit-obj: AP = 71.86 (#pos = 349)
eat-obj: AP = 75.43 (#pos = 521)
eat-instr: AP = 77.01 (#pos = 521)
jump-instr: AP = 78.17 (#pos = 635)
lay-instr: AP = 61.32 (#pos = 387)
talk_on_phone-instr: AP = 58.56 (#pos = 285)
carry-obj: AP = 50.48 (#pos = 472)
throw-obj: AP = 59.77 (#pos = 244)
catch-obj: AP = 62.53 (#pos = 246)
cut-instr: AP = 51.62 (#pos = 269)
cut-obj: AP = 67.81 (#pos = 269)
work_on_computer-instr: AP = 78.73 (#pos = 410)
ski-instr: AP = 61.23 (#pos = 424)
surf-instr: AP = 80.91 (#pos = 486)
skateboard-instr: AP = 88.89 (#pos = 417)
drink-instr: AP = 59.94 (#pos = 82)
kick-obj: AP = 83.20 (#pos = 180)
point-instr: AP = 8.24 (#pos = 31)
read-obj: AP = 56.72 (#pos = 111)
snowboard-instr: AP = 81.60 (#pos = 277)
Average Role [scenario_2] AP = 65.88
Average Role [scenario_2] AP = 68.29, omitting the action "point"
Is my understanding of the metrics incorrect? Thank you very much for your reply.
Thank you very much for your work. According to your model diagram, the human loss corresponds to the yellow box and the object loss to the red box. How is the interaction loss designed? I see in the paper that you use focal loss for interaction classification; I would like to know where this focal loss is implemented in the GitHub code. Thank you.
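For reference, the sigmoid focal loss that most DETR-style HOI detectors use for multi-label verb/interaction classification is the standard formulation from Lin et al. (2017), sketched below; this is a generic version, not necessarily line-for-line what MUREN implements. torchvision also ships it as torchvision.ops.sigmoid_focal_loss.

# Standard sigmoid focal loss as commonly used for the verb/interaction
# classification head in DETR-style HOI models; a generic sketch, not MUREN's
# exact implementation.
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                       alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """logits, targets: (N, num_classes); targets are 0/1 multi-label vectors."""
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = prob * targets + (1 - prob) * (1 - targets)   # probability of the true label
    loss = ce * (1 - p_t) ** gamma                      # down-weight easy examples
    if alpha >= 0:
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    return loss.mean()

loss = sigmoid_focal_loss(torch.randn(8, 29), torch.randint(0, 2, (8, 29)).float())
print(loss)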
Could you provide a script for testing new images?