
Comments (9)

tgc1997 commented on July 19, 2024

Hi takuyara, we extracted our features with I3D, InceptionResNetV2, and BUTD.

from rmn.

takuyara commented on July 19, 2024

Thanks for your quick reply!


PipiZong commented on July 19, 2024

Hi tgc,

I read the I3D code you provided above. I am wondering how to set max_interval and overlap to get 26 equally spaced features for each video. Or is there no need to set these two parameters, and we just need to extract around 209 frames as the input to the I3D model?


tgc1997 commented on July 19, 2024

We first set max_interval=64, overlap=8 to extract features and then sample 26 of them.
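To make that concrete, here is a minimal sketch of how such windowing and sampling could work. The exact semantics of max_interval and overlap in the authors' extraction script are an assumption here (adjacent windows sharing `overlap` frames, so the stride is 64 - 8 = 56), and `segment_video` / `sample_uniform` are hypothetical helper names:

```python
import numpy as np

def segment_video(num_frames, max_interval=64, overlap=8):
    # Assumed reading: sliding windows of up to `max_interval` frames,
    # adjacent windows sharing `overlap` frames (stride = max_interval - overlap).
    stride = max_interval - overlap
    segments, start = [], 0
    while True:
        end = min(start + max_interval, num_frames)
        segments.append((start, end))  # frame range covered by this window
        if end == num_frames:
            break
        start += stride
    return segments

def sample_uniform(items, k=26):
    # Pick k (nearly) equally spaced items from the segment list.
    idx = np.linspace(0, len(items) - 1, num=k).round().astype(int)
    return [items[i] for i in idx]

segs = segment_video(1500)        # e.g. a 1500-frame video
feats = sample_uniform(segs)      # 26 windows -> run I3D on each window
print(len(segs), len(feats))      # -> 27 26
```

Each sampled window would then be passed through I3D to yield one 1024-d feature, giving the 26x1024 tensor per video.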


PipiZong commented on July 19, 2024

Hi tgc,

Thanks for your reply! Sorry to bother you again, but I still have two questions about this:

  1. Why do you need to set max_interval and overlap? If you just input 209 frames as the clip, you get exactly a 1x26x1024 feature from 'features = get_features(clip, i3d_rgb)'. So are these two parameters used to save computation cost? If not, how should they be determined?
  2. For each video, the 2D features (IRV2, 1x26x1536) are extracted from 26 frames (images), while the I3D features are extracted from 26 segments, which do not coincide with those 26 frames. Is it reasonable to concatenate the two kinds of features along this dimension (26)? For example, 1x26x2560 cannot be interpreted as "one video has 26 frames, and each frame has 2560 features".


tgc1997 commented on July 19, 2024

  1. Some videos may have fewer than 209 frames, and a fixed 209 frames may miss important information in videos much longer than that. These two parameters are also used to save computation cost.
  2. It is difficult to perfectly align 2D and 3D features; if you know how to do it, you are welcome to comment.


PipiZong commented on July 19, 2024

Thanks for your explanations!


PipiZong commented on July 19, 2024

Sorry to bother you again! I am still confused about the steps for determining max_interval and overlap. Could you please give an example, say when we have two videos, one 10 minutes long at 25 FPS and another 8 minutes long at 30 FPS? Many thanks!
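As a back-of-the-envelope sketch (not the authors' exact script), and assuming "overlap" means adjacent windows share 8 frames so that window starts advance by 64 - 8 = 56 frames, the two hypothetical videos in the question would yield the following window counts before the final sampling of 26:

```python
import math

def num_segments(num_frames, max_interval=64, overlap=8):
    # Assumed semantics: windows of max_interval frames whose starts
    # advance by (max_interval - overlap) frames; the last window is clipped.
    stride = max_interval - overlap
    if num_frames <= max_interval:
        return 1
    return math.ceil((num_frames - max_interval) / stride) + 1

for minutes, fps in [(10, 25), (8, 30)]:
    frames = minutes * 60 * fps
    print(frames, num_segments(frames))
# -> 15000 268
# -> 14400 257
```

Both videos produce far more than 26 windows, from which 26 would then be sampled (nearly) uniformly, so the per-video feature shape is the same regardless of duration or frame rate.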


monic7 commented on July 19, 2024

Hi, I have tried to extract features using I3D, IRV2, and BUTD as you mentioned, but I am not able to get the same features as you. The features I obtained seem very different from those in the provided h5 file.

How can I get the same features as those in the h5 file?

How were the equally spaced frames selected? Is it by the following method: index = [int(ceil(i*len(l)/26)) for i in range(26)]?
Are the equally spaced frames only needed for IRV2 and BUTD, while I3D takes the whole video as input to generate 43 x 1024 for MSVD videos, which is then equally spaced by the same method as above?

May I know what other steps are required during extraction?
Thank you very much!
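For reference, the selection rule quoted in the question computes the indices below. This only illustrates what that one-liner does under Python 3; whether it matches the released h5 features is exactly the open question here:

```python
import math

def equally_spaced_indices(n, k=26):
    # The rule quoted above, with Python 3 true division.
    # Under Python 2, i*n/k would floor before ceil, changing the result.
    return [int(math.ceil(i * n / k)) for i in range(k)]

idx = equally_spaced_indices(100)   # a hypothetical 100-frame video
print(idx[0], idx[1], idx[-1])      # -> 0 4 97
```

All indices stay in range (the largest is ceil(25*n/26) < n), so the rule never over-runs the frame list.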

