The model file's size is more than 4g, so what's the minimum requirement of gpu? I hav

the minimum requirement of gpu? about imagebind HOT 8 OPEN

facebookresearch commented on July 22, 2024 7

the minimum requirement of gpu?

from imagebind.

Comments (8)

aelnouby commented on July 22, 2024 1

We will work on releasing smaller checkpoints in the coming couple of weeks.

z-x-x136 commented on July 22, 2024

We will work on releasing smaller checkpoints in the coming couple of weeks.
My graphics card is an NVIDIA GeForce GTX 1650 Ti. Will a version that runs on my graphics card be available in the next few weeks?

TashaSkyUp commented on July 22, 2024

I ran it on a 970 4gb, just need to use autocast and half.
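
As a rough sanity check on why half precision can make this fit on a 4 GB card, here is a small back-of-the-envelope sketch. It assumes the checkpoint is about 4.5 GB of float32 weights (the thread only says "more than 4g", so that figure is hypothetical) and it counts weights only, not activations or CUDA runtime overhead.

```python
# Back-of-the-envelope weight memory at different precisions.
# ASSUMPTION: the checkpoint holds ~4.5 GB of float32 weights; the thread
# only says the file is "more than 4g", so this number is illustrative.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(checkpoint_bytes: int, stored_dtype: str, target_dtype: str) -> float:
    """Estimate weight memory in GB (1 GB = 1e9 bytes) after casting to target_dtype."""
    n_params = checkpoint_bytes / BYTES_PER_PARAM[stored_dtype]
    return n_params * BYTES_PER_PARAM[target_dtype] / 1e9


checkpoint_bytes = int(4.5e9)  # hypothetical checkpoint size
fp32_gb = weight_memory_gb(checkpoint_bytes, "float32", "float32")
fp16_gb = weight_memory_gb(checkpoint_bytes, "float32", "float16")

print(f"float32 weights: ~{fp32_gb:.2f} GB")
print(f"float16 weights: ~{fp16_gb:.2f} GB")
```

Under that assumption the float16 weights take roughly half the space, which leaves some headroom on a 4 GB card for activations during inference.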

tanluDIMA commented on July 22, 2024

I ran it on a 970 4gb, just need to use autocast and half.

Encouraging! Is that only for inference, or also for training? Could you shed a bit more light on your implementation?

TashaSkyUp commented on July 22, 2024

Just inference.
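
For readers wondering what "autocast and half" might look like for inference, here is a minimal PyTorch sketch. It is not the commenter's actual code: `TinyEncoder` is a hypothetical stand-in for the loaded model, and `embed` is an illustrative helper name. The weights are cast to float16 only when a CUDA device is present; on CPU the sketch falls back to bfloat16 autocast so it still runs.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the real model; swap in the loaded checkpoint."""

    def __init__(self, in_dim: int = 32, out_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.GELU(), nn.Linear(64, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def embed(model: nn.Module, batch: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Run inference in reduced precision and return float32 embeddings on the CPU."""
    model = model.to(device).eval()
    batch = batch.to(device)
    if device.type == "cuda":
        # Store weights and inputs in float16 to roughly halve weight memory.
        model = model.half()
        batch = batch.half()
        amp_dtype = torch.float16
    else:
        # CPU fallback: keep float32 weights and let autocast use bfloat16.
        amp_dtype = torch.bfloat16
    # inference_mode disables autograd bookkeeping; autocast picks per-op precision.
    with torch.inference_mode(), torch.autocast(device_type=device.type, dtype=amp_dtype):
        out = model(batch)
    return out.float().cpu()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
embeddings = embed(TinyEncoder(), torch.randn(4, 32), device)
print(embeddings.shape, embeddings.dtype)
```

Since no gradients or optimizer state are kept, this pattern only covers inference, which matches the reply above.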

abhimanyu891998 commented on July 22, 2024

Hey @TashaSkyUp, I have been experimenting with ImageBind for videos. I extract clips (5 seconds each), audio, and subtitles from a video, and I want them all in the same embedding space. I tried this with the vanilla ImageBind implementation on my 3080ti GPU with 24GB memory, but embedding generation takes too long: for an 8-minute video, it takes 40 minutes to generate the embeddings for the video clips, text segments, and audio clips (one of each per 5-second segment of the video).

I was wondering if I could use your implementation to speed up inference, or if you know of a way to quantize the ImageBind model to accelerate this process? Or maybe I am doing something wrong. I just wanted your advice on it. Thanks again!
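
To put the numbers above in perspective, here is a small sketch that works out the per-segment cost from the figures in this comment (8-minute video, 5-second segments, 40 minutes total) and shows one way to group segments into batches. The helper names and the batch size of 16 are hypothetical, and whether batching actually helps depends on where the time goes (decoding and preprocessing versus the forward pass), which the thread does not say.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def num_segments(video_seconds: float, segment_seconds: float) -> int:
    """Number of fixed-length segments needed to cover the video."""
    return math.ceil(video_seconds / segment_seconds)


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Figures taken from the comment: 8-minute video, 5-second segments, 40 minutes total.
segments = num_segments(8 * 60, 5)
seconds_per_segment = (40 * 60) / segments

# Hypothetical batch size; tune it to the available GPU memory.
batches = list(chunked(list(range(segments)), 16))

print(f"{segments} segments, ~{seconds_per_segment:.0f} s per segment (all three modalities)")
print(f"{len(batches)} forward passes per modality at batch size 16")
```

Spending tens of seconds per 5-second segment suggests profiling the data loading and preprocessing steps before reaching for quantization.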

LinB203 commented on July 22, 2024

Hi, I'd like to recommend our work, LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. We have open-sourced all training and validation code.
For video, only 16 V100s are needed; if you turn on gradient accumulation, 8 V100s are fine. For depth maps and infrared maps, only 8 V100s are needed.

anas2908 commented on July 22, 2024

I have videos that are nearly 8 minutes long, and I want to create audio and video embeddings for them. What changes do I need to make in the code?

Hey @TashaSkyUp, I have been experimenting with ImageBind for videos. I extract clips (5 seconds each), audio, and subtitles from a video, and I want them all in the same embedding space. I tried this with the vanilla ImageBind implementation on my 3080ti GPU with 24GB memory, but embedding generation takes too long: for an 8-minute video, it takes 40 minutes to generate the embeddings for the video clips, text segments, and audio clips (one of each per 5-second segment of the video).

I was wondering if I could use your implementation to speed up inference, or if you know of a way to quantize the ImageBind model to accelerate this process? Or maybe I am doing something wrong. I just wanted your advice on it. Thanks again!
