
Comments (10)

aelnouby commented on July 22, 2024

Hi everyone,

Thanks for your question and sorry for the late response. The IMU signal corresponds to 10-second clips; this is a typo in the appendix that will be fixed in the coming revision of the paper. For the aligned video, we sample 2 frames at the center of the window.
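
A minimal sketch of "sample 2 frames at the center of the window". The fps value and the exact selection rule are assumptions for illustration, not taken from the paper or code:

```python
# Hedged sketch: pick n_frames frame indices at the center of a clip window.
# fps=30 and the centering rule are assumptions, not from the ImageBind paper.
def center_frame_indices(clip_seconds=10.0, fps=30.0, n_frames=2):
    total = int(clip_seconds * fps)        # total frames in the clip
    start = total // 2 - n_frames // 2     # start just left of the midpoint
    return list(range(start, start + n_frames))

print(center_frame_indices())  # [149, 150]
```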

from imagebind.

artemisp commented on July 22, 2024

Oh yes of course, I did not mean for this to be a final answer - just trying to help out/start a discussion since it has been a while without a response 🥲.

Yes, they do provide source code, but once again, the embedding dimension is 1000, corresponding to 5-second clips.

For my use case I tried the following to account for the 2x factor: padding with zeros, grabbing 10-second clips, and the "repeat" method; the repeat method seemed to work best. I hope this helps get your application moving.
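
The padding workarounds above can be sketched as follows for a 5-second (T=1000) IMU clip that must become T=2000. `pad_imu` is a hypothetical helper for illustration, not a function from the ImageBind repo:

```python
import numpy as np

def pad_imu(sig, target_t=2000, mode="repeat"):
    """Extend an IMU clip sig of shape (channels, time) to target_t steps.
    Hypothetical helper sketching the workarounds discussed in this thread."""
    c, t = sig.shape
    if t >= target_t:
        return sig[:, :target_t]           # long enough: truncate
    if mode == "zero":
        out = np.zeros((c, target_t), dtype=sig.dtype)
        out[:, :t] = sig                   # zero-pad the tail
        return out
    if mode == "repeat":
        reps = -(-target_t // t)           # ceil division
        return np.tile(sig, (1, reps))[:, :target_t]  # tile the clip
    raise ValueError(mode)

x = np.random.randn(6, 1000)               # 6 IMU channels, 5 s at 200 Hz
print(pad_imu(x, mode="repeat").shape)     # (6, 2000)
```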


aelimame commented on July 22, 2024

Based on that sample from the Ego-4D dataset (https://ego4d-data.org/docs/data/imu/), the sample rate is 200 Hz (5 ms per time step). If only T=2000 works, does this mean they expect the clips to correspond to a 10-second video segment?

However they mention this in the paper:

> For each video, We select all time-stamps that contains a synchronized IMU signal as well as aligned narrations. We sample 5 second clips around each time-stamp.

So, it seems a 2x factor is lost somewhere?
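
The 2x discrepancy in one line of arithmetic, using the 200 Hz rate from the Ego4D docs:

```python
# T=2000 at the Ego4D IMU rate of 200 Hz implies 10-second windows,
# not the 5-second clips stated in the paper: hence the 2x factor.
sample_rate_hz = 200                  # 5 ms per time step (Ego4D docs)
timesteps = 2000                      # the T the model accepts
print(timesteps / sample_rate_hz)     # 10.0 seconds
```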


artemisp commented on July 22, 2024

I agree - I am just making the conjecture that since we want image-IMU alignments for training, if this is the procedure for image padding, it could work for IMU padding to maintain the alignment - even though it is nowhere to be found in the code/paper. It is worth a try. Another option would be to sample 10s - but it seems to directly contradict the paper.

Grabbing a 10s video clip and aligning it with the 5s IMU could make sense - given that there may be a small 1-2s misalignment between IMUs and Videos due to various factors (e.g. latency).

Now....this is all a guess! I tried this method for action recognition (see IMU2CLIP paper) and it seemed to work decently. However, I cannot say for sure if it is the right way to go.


artemisp commented on July 22, 2024

It seems that we are supposed to use repeated padding?

PadIm2Video(pad_type="repeat", ntimes=2)


aelimame commented on July 22, 2024

> It seems that we are supposed to use repeated padding?
>
> PadIm2Video(pad_type="repeat", ntimes=2)

But that's for the image-to-video transformation (the forward() method). It seems to convert a single image into an n-time-step video: either copying the same image to create a video of the given image (pad_type="repeat"), or using zeros/black images (pad_type="zero") to create the video sequence.

So not related to the IMU processing really.
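
A rough sketch of what PadIm2Video appears to do, per the description above: turn one image (C, H, W) into a short video (C, T, H, W). The axis order and the exact "zero" behaviour are assumptions, not copied from the ImageBind repo:

```python
import numpy as np

def pad_im2video(img, pad_type="repeat", ntimes=2):
    """Sketch of the PadIm2Video idea: one image -> ntimes video frames.
    Shapes and the "zero" semantics are assumptions for illustration."""
    if pad_type == "repeat":
        frames = [img] * ntimes                               # copy the image
    elif pad_type == "zero":
        frames = [img] + [np.zeros_like(img)] * (ntimes - 1)  # black frames
    else:
        raise ValueError(pad_type)
    return np.stack(frames, axis=1)                           # new time axis

img = np.random.rand(3, 224, 224)
print(pad_im2video(img).shape)  # (3, 2, 224, 224)
```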


aelimame commented on July 22, 2024

> I agree - I am just making the conjecture that since we want image-IMU alignments for training, if this is the procedure for image padding, it could work for IMU padding to maintain the alignment - even though it is nowhere to be found in the code/paper. It is worth a try. Another option would be to sample 10s - but it seems to directly contradict the paper.
>
> Grabbing a 10s video clip and aligning it with the 5s IMU could make sense - given that there may be a small 1-2s misalignment between IMUs and Videos due to various factors (e.g. latency).
>
> Now....this is all a guess! I tried this method for action recognition (see IMU2CLIP paper) and it seemed to work decently. However, I cannot say for sure if it is the right way to go.

Yeah sure, this is all hypothesis waiting for the FAIR guys to validate...

Thanks for sharing that paper, it looks interesting. Do they also provide source code?


beitong95 commented on July 22, 2024

Hi, I was wondering what normalization method is used on the IMU data in ImageBind. It seems the data from Ego4D is raw IMU data; however, in Figure 7, the IMU data appears to be clipped to [-1, 1].
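
One plausible normalization consistent with values landing in [-1, 1] as in Figure 7. This is a guess, not confirmed by the paper or code: scale each channel by its max absolute value, then clip.

```python
import numpy as np

def normalize_imu(sig, clip_val=1.0):
    """Hypothetical normalization, NOT confirmed by ImageBind: per-channel
    max-abs scaling followed by clipping to [-clip_val, clip_val]."""
    scale = np.abs(sig).max(axis=-1, keepdims=True) + 1e-8
    return np.clip(sig / scale, -clip_val, clip_val)

raw = np.random.randn(6, 2000) * 9.81      # raw accelerometer-scale values
out = normalize_imu(raw)
print(out.min() >= -1.0 and out.max() <= 1.0)  # True
```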


zainhas commented on July 22, 2024

@beitong95 Good point. Another issue with the preprocessing is that it doesn't work for any input longer or shorter than 2000 points. In my current implementation I've just padded up to 2k, or cut down and taken only the first 2k data points, to generate embeddings. It would be good to know the details of how the model was trained so that the embeddings are more reliable!
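
The pad-or-truncate workaround described above can be sketched as a small hypothetical helper (not from the ImageBind repo), assuming an input of shape (channels, time):

```python
import numpy as np

def fit_to_2000(sig, target_t=2000):
    """Sketch of the workaround above: zero-pad short inputs to target_t
    points, truncate longer ones to the first target_t points."""
    c, t = sig.shape
    if t >= target_t:
        return sig[:, :target_t]               # keep the first 2k points
    out = np.zeros((c, target_t), dtype=sig.dtype)
    out[:, :t] = sig                           # zero-pad the remainder
    return out

print(fit_to_2000(np.ones((6, 3000))).shape)   # (6, 2000)
print(fit_to_2000(np.ones((6, 500))).shape)    # (6, 2000)
```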


RitvikKapila commented on July 22, 2024

Hi, I had a question similar to that of @beitong95: how is the IMU input preprocessed and/or normalized before being fed to the model? Is there a load_and_transform function provided for IMU? Thanks.

