muren's People

Contributors

oreochocolate


muren's Issues

About convergence

Exciting work! Could you tell me how many epochs the model was trained for? I would like to know about its convergence. Thanks!

code issues

A question about muren.py: in the call multiplex_context = self.MURE(output_human, output_obj, output_rel, (memory, tgt_mask, memory_mask, tgt_key_padding_mask, memory_key_padding_mask, pos)), what do the arguments tgt_mask, memory_mask, tgt_key_padding_mask, and memory_key_padding_mask each correspond to?
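
For reference, these argument names follow the convention of PyTorch's nn.TransformerDecoder, which DETR-style models typically build on; the sketch below shows what each mask does under that assumption (shapes and semantics are PyTorch's, not confirmed against muren.py).

import torch
import torch.nn as nn

# Toy dimensions: batch of 2 images, 100 queries, 36 encoder tokens.
d_model, n_queries, n_tokens, batch = 256, 100, 36, 2

decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)

tgt = torch.zeros(n_queries, batch, d_model)    # decoder queries
memory = torch.randn(n_tokens, batch, d_model)  # encoder output (image features)

# tgt_mask: (n_queries, n_queries) attention mask among the queries themselves;
# DETR-style decoders usually leave it None so all queries attend to each other.
# memory_mask: (n_queries, n_tokens) attention mask from queries to encoder tokens.
# tgt_key_padding_mask: (batch, n_queries); True marks padded queries to ignore.
# memory_key_padding_mask: (batch, n_tokens); True marks padded image positions
# introduced when batching images of different sizes.
memory_key_padding_mask = torch.zeros(batch, n_tokens, dtype=torch.bool)

out = decoder(tgt, memory,
              tgt_mask=None,
              memory_mask=None,
              tgt_key_padding_mask=None,
              memory_key_padding_mask=memory_key_padding_mask)
print(out.shape)  # torch.Size([100, 2, 256])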

orig_boxes?

There is a bug:
origin_sub_box = target['orig_boxes'][kept_box_indices.index(hoi['subject_id'])]
obj_box = target['boxes'][kept_box_indices.index(hoi['object_id'])]
origin_obj_box = target['orig_boxes'][kept_box_indices.index(hoi['object_id'])]
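
For context, a sketch of what this lookup assumes (names are taken from the snippet above; the fallback at the end is hypothetical, not the repository's fix): kept_box_indices maps original annotation ids to positions in the filtered box list, so 'boxes' and 'orig_boxes' must be filtered with the same mapping for the indices to line up.

# Hypothetical illustration; 'target' is the annotation dict from the snippet.
sub_idx = kept_box_indices.index(hoi['subject_id'])
obj_idx = kept_box_indices.index(hoi['object_id'])

# If target lacks 'orig_boxes', or it was filtered differently from 'boxes',
# the original lines fail; one defensive option is to fall back to 'boxes':
orig_boxes = target.get('orig_boxes', target['boxes'])
origin_sub_box = orig_boxes[sub_idx]
origin_obj_box = orig_boxes[obj_idx]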

How to visualize for model result

Hello, thanks for sharing your code.

I have a question.

Not only your code but other codebases as well provide only generate_vcoco_official.py; no file for visualization is provided.

Can you share some tips on how you did the inference?

Thank you!
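
Since no visualization script ships with the repository, here is a minimal sketch of one way to draw HOI predictions; the prediction structure and every name below are hypothetical assumptions, not the repository's API.

import cv2  # OpenCV, for drawing

def draw_hoi(image_bgr, human_box, object_box, verb, score):
    # Boxes are assumed to be absolute pixel [x1, y1, x2, y2].
    hx1, hy1, hx2, hy2 = map(int, human_box)
    ox1, oy1, ox2, oy2 = map(int, object_box)
    cv2.rectangle(image_bgr, (hx1, hy1), (hx2, hy2), (0, 255, 0), 2)  # human: green
    cv2.rectangle(image_bgr, (ox1, oy1), (ox2, oy2), (0, 0, 255), 2)  # object: red
    cv2.line(image_bgr, ((hx1 + hx2) // 2, (hy1 + hy2) // 2),
             ((ox1 + ox2) // 2, (oy1 + oy2) // 2), (255, 0, 0), 2)    # pair link
    cv2.putText(image_bgr, "%s %.2f" % (verb, score), (hx1, max(hy1 - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return image_bgr

# Usage: run inference the same way generate_vcoco_official.py does, keep
# triplets above a score threshold, then draw each one:
#   img = cv2.imread(path)
#   for det in predictions:  # hypothetical structure
#       img = draw_hoi(img, det['human_box'], det['object_box'],
#                      det['verb'], det['score'])
#   cv2.imwrite('vis.jpg', img)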

bbox formats

Hi,

I want to use your code to train on a custom dataset. What is the expected bbox format, and should the boxes be normalized before loading the data for training?
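
For reference, DETR-style detectors typically expect boxes as normalized [cx, cy, w, h]; whether this repository follows that convention is an assumption. A minimal conversion from absolute [x1, y1, x2, y2]:

import torch

def xyxy_to_normalized_cxcywh(boxes, img_w, img_h):
    # boxes: tensor of shape (N, 4) in absolute [x1, y1, x2, y2] pixels.
    x1, y1, x2, y2 = boxes.unbind(-1)
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return torch.stack([cx, cy, w, h], dim=-1)

# Example: a 100x50 box at the top-left corner of a 640x480 image.
print(xyxy_to_normalized_cxcywh(torch.tensor([[0., 0., 100., 50.]]), 640, 480))
# tensor([[0.0781, 0.0521, 0.1562, 0.1042]])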

Why Can't I Reproduce These Results?

On V-COCO, I trained with the commands in the repository and only achieved 64.1 mAP. When I tested with the 'eval' command, the mAP was only 65.9.

---------Reporting Role AP (%)------------------
hold-obj: AP = 58.44 (#pos = 3608)
sit-instr: AP = 59.50 (#pos = 1916)
ride-instr: AP = 73.68 (#pos = 556)
look-obj: AP = 48.34 (#pos = 3347)
hit-instr: AP = 80.03 (#pos = 349)
hit-obj: AP = 69.51 (#pos = 349)
eat-obj: AP = 71.48 (#pos = 521)
eat-instr: AP = 76.79 (#pos = 521)
jump-instr: AP = 77.71 (#pos = 635)
lay-instr: AP = 58.62 (#pos = 387)
talk_on_phone-instr: AP = 56.47 (#pos = 285)
carry-obj: AP = 48.91 (#pos = 472)
throw-obj: AP = 57.31 (#pos = 244)
catch-obj: AP = 57.66 (#pos = 246)
cut-instr: AP = 50.66 (#pos = 269)
cut-obj: AP = 65.60 (#pos = 269)
work_on_computer-instr: AP = 77.11 (#pos = 410)
ski-instr: AP = 56.09 (#pos = 424)
surf-instr: AP = 80.34 (#pos = 486)
skateboard-instr: AP = 88.40 (#pos = 417)
drink-instr: AP = 59.21 (#pos = 82)
kick-obj: AP = 79.46 (#pos = 180)
point-instr: AP = 8.20 (#pos = 31)
read-obj: AP = 51.02 (#pos = 111)
snowboard-instr: AP = 80.16 (#pos = 277)
Average Role [scenario_1] AP = 63.63
Average Role [scenario_1] AP = 65.94, omitting the action "point"

---------Reporting Role AP (%)------------------
hold-obj: AP = 61.83 (#pos = 3608)
sit-instr: AP = 62.22 (#pos = 1916)
ride-instr: AP = 74.57 (#pos = 556)
look-obj: AP = 53.29 (#pos = 3347)
hit-instr: AP = 81.17 (#pos = 349)
hit-obj: AP = 71.86 (#pos = 349)
eat-obj: AP = 75.43 (#pos = 521)
eat-instr: AP = 77.01 (#pos = 521)
jump-instr: AP = 78.17 (#pos = 635)
lay-instr: AP = 61.32 (#pos = 387)
talk_on_phone-instr: AP = 58.56 (#pos = 285)
carry-obj: AP = 50.48 (#pos = 472)
throw-obj: AP = 59.77 (#pos = 244)
catch-obj: AP = 62.53 (#pos = 246)
cut-instr: AP = 51.62 (#pos = 269)
cut-obj: AP = 67.81 (#pos = 269)
work_on_computer-instr: AP = 78.73 (#pos = 410)
ski-instr: AP = 61.23 (#pos = 424)
surf-instr: AP = 80.91 (#pos = 486)
skateboard-instr: AP = 88.89 (#pos = 417)
drink-instr: AP = 59.94 (#pos = 82)
kick-obj: AP = 83.20 (#pos = 180)
point-instr: AP = 8.24 (#pos = 31)
read-obj: AP = 56.72 (#pos = 111)
snowboard-instr: AP = 81.60 (#pos = 277)
Average Role [scenario_2] AP = 65.88
Average Role [scenario_2] AP = 68.29, omitting the action "point"

Is my understanding of the metrics incorrect? Thank you very much for your reply.

Focal loss for interaction classification

[image: model diagram]
Thank you very much for your work. According to your model diagram, the human loss corresponds to the yellow box and the object loss corresponds to the red box. How is the interaction loss designed? I see that the paper uses focal loss for interaction classification; where in the GitHub code is this focal loss implemented? Thank you.
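
For reference, a minimal sketch of sigmoid focal loss (Lin et al., 2017) as commonly used for multi-label interaction classification; this illustrates the technique the paper names, not necessarily the repository's exact implementation.

import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # logits, targets: (N, num_interaction_classes); targets are 0/1 multi-label.
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = prob * targets + (1 - prob) * (1 - targets)  # prob. of the true label
    loss = ce * (1 - p_t) ** gamma                     # down-weight easy examples
    if alpha >= 0:
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    return loss.mean()

# Example: 2 predictions over 3 interaction classes.
logits = torch.tensor([[2.0, -1.0, 0.5], [-0.5, 1.5, -2.0]])
targets = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(sigmoid_focal_loss(logits, targets))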

category_id

Hi,

Why are there two sets of category ids, as shown in the red and blue boxes? Where can I find what each category id maps to?

[image: annotation excerpt with the two category id fields marked in red and blue]

Thank You.
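
A plausible explanation, offered as an assumption about this repository (it is a common pattern in HOI codebases built on COCO): COCO's 80 object categories carry non-contiguous ids from 1 to 90, so code often stores both the raw COCO id (used in annotation and evaluation files) and a contiguous index (used by the classifier head), with a mapping between them.

# Hypothetical illustration with a small subset of the real COCO mapping;
# note that id 12 is skipped in COCO, so 'stop sign' is 13.
coco_id_to_name = {1: 'person', 2: 'bicycle', 3: 'car', 13: 'stop sign'}
contiguous_to_coco_id = {i: cid for i, cid in enumerate(sorted(coco_id_to_name))}
coco_id_to_contiguous = {cid: i for i, cid in contiguous_to_coco_id.items()}
print(contiguous_to_coco_id)  # {0: 1, 1: 2, 2: 3, 3: 13}
print(coco_id_to_contiguous)  # {1: 0, 2: 1, 3: 2, 13: 3}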
