
Comments (9)

tgc1997 commented on July 19, 2024

Hi takuyara, we extracted our features with I3D, InceptionResNetV2, and BUTD.

from rmn.

takuyara commented on July 19, 2024

Thanks for your quick reply!


PipiZong commented on July 19, 2024

Hi tgc,

I read the I3D code you provided above. I am wondering how to set max_interval and overlap to get 26 equally spaced features for each video. Or is there no need to set these two parameters, and we just need to extract around 209 frames as the input to the I3D model?


tgc1997 commented on July 19, 2024

We first set max_interval=64, overlap=8 to extract features and then sample 26 of them.
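To make that concrete, here is a minimal sketch of how such windowing and sampling could work. The exact semantics of max_interval and overlap in the authors' extraction script are an assumption here (adjacent windows sharing `overlap` frames, so the stride is 64 - 8 = 56), and `segment_video` / `sample_uniform` are hypothetical helper names:

```python
import numpy as np

def segment_video(num_frames, max_interval=64, overlap=8):
    # Assumed reading: sliding windows of up to `max_interval` frames,
    # adjacent windows sharing `overlap` frames (stride = max_interval - overlap).
    stride = max_interval - overlap
    segments, start = [], 0
    while True:
        end = min(start + max_interval, num_frames)
        segments.append((start, end))  # frame range covered by this window
        if end == num_frames:
            break
        start += stride
    return segments

def sample_uniform(items, k=26):
    # Pick k (nearly) equally spaced items from the segment list.
    idx = np.linspace(0, len(items) - 1, num=k).round().astype(int)
    return [items[i] for i in idx]

segs = segment_video(1500)        # e.g. a 1500-frame video
feats = sample_uniform(segs)      # 26 windows -> run I3D on each window
print(len(segs), len(feats))      # -> 27 26
```

Each sampled window would then be passed through I3D to yield one 1024-d feature, giving the 26x1024 tensor per video.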


PipiZong commented on July 19, 2024

Hi tgc,

Thanks for your reply! Sorry to bother you again, but I still have two questions about this:

  1. Why do you need to set max_interval and overlap? If you just input 209 frames as the clip, you get exactly a 1x26x1024 feature from 'features = get_features(clip, i3d_rgb)'. So are these two parameters used to save computation cost? If not, how should they be determined?
  2. For each video, the 2D features (IRV2, 1x26x1536) are extracted from 26 frames (images), while the I3D features are extracted from 26 segments, which do not coincide with those 26 frames. Is it reasonable to concatenate the two kinds of features along this dimension (26)? For example, 1x26x2560 cannot be interpreted as "one video has 26 frames, and each frame has 2560 features".


tgc1997 commented on July 19, 2024

  1. Some videos may have fewer than 209 frames, and a fixed 209 frames may miss important information in videos much longer than that. These two parameters are also used to save computation cost.
  2. It is difficult to perfectly align 2D and 3D features; if you know how to do it, you are welcome to comment.


PipiZong commented on July 19, 2024

Thanks for your explanations!


PipiZong commented on July 19, 2024

Sorry to bother you again! I am still confused about the steps for determining max_interval and overlap. Could you please give an example, say when we have two videos, one 10 minutes long at 25 FPS and another 8 minutes long at 30 FPS? Many thanks!
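As a back-of-the-envelope sketch (not the authors' exact script), and assuming "overlap" means adjacent windows share 8 frames so that window starts advance by 64 - 8 = 56 frames, the two hypothetical videos in the question would yield the following window counts before the final sampling of 26:

```python
import math

def num_segments(num_frames, max_interval=64, overlap=8):
    # Assumed semantics: windows of max_interval frames whose starts
    # advance by (max_interval - overlap) frames; the last window is clipped.
    stride = max_interval - overlap
    if num_frames <= max_interval:
        return 1
    return math.ceil((num_frames - max_interval) / stride) + 1

for minutes, fps in [(10, 25), (8, 30)]:
    frames = minutes * 60 * fps
    print(frames, num_segments(frames))
# -> 15000 268
# -> 14400 257
```

Both videos produce far more than 26 windows, from which 26 would then be sampled (nearly) uniformly, so the per-video feature shape is the same regardless of duration or frame rate.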


monic7 commented on July 19, 2024

Hi, I have tried to extract features using I3D, IRV2, and BUTD as you mentioned, but I am not able to get the same features as you. The features I obtained seem very different from those in the provided h5 file.

How can I get the same features as those in the h5 file?

How were the equally spaced frames selected? Is it by the following method: index = [int(ceil(i*len(l)/26)) for i in range(26)]?
Are the equally spaced frames only needed for IRV2 and BUTD, while I3D takes the whole video as input to generate 43 x 1024 for MSVD videos, which is then equally spaced by the same method as above?

May I know what other steps are required during extraction?
Thank you very much!
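For reference, the selection rule quoted in the question computes the indices below. This only illustrates what that one-liner does under Python 3; whether it matches the released h5 features is exactly the open question here:

```python
import math

def equally_spaced_indices(n, k=26):
    # The rule quoted above, with Python 3 true division.
    # Under Python 2, i*n/k would floor before ceil, changing the result.
    return [int(math.ceil(i * n / k)) for i in range(k)]

idx = equally_spaced_indices(100)   # a hypothetical 100-frame video
print(idx[0], idx[1], idx[-1])      # -> 0 4 97
```

All indices stay in range (the largest is ceil(25*n/26) < n), so the rule never over-runs the frame list.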

