Comments (10)
Agreed. How can you guys spend all that time training the model and writing the paper and setting up the demo website and not spend a few hours giving working example scripts to show us how to use it?
from imagebind.
I am also interested in this. Any news? Also, how can you retrieve an image based on image and audio/text? I am referring to the embedding space arithmetic examples in Figure 4 in the paper. Do you just sum the image embeddings with the audio/text embedding and perform cosine similarity with all the image embeddings and get the most similar image? Thanks!
We made a quick attempt: https://github.com/sail-sg/BindDiffusion
See also Anything2Image and InternGPT; it is implemented with Diffusers.
I'm rather new to diffusion, but does ImageBind provide any sort of decoder? I thought it was just training an encoder, and if that's the case, how are these diffusion methods working?
Maybe this could help Zeqiang-Lai/Anything2Image#4
I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.
I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.
But the model can be downloaded and loaded in the script.
I am also interested in this. Any news?
Also, how can you retrieve an image based on image and audio/text? I am referring to the embedding space arithmetic examples in Figure 4 in the paper.
Do you just sum the image embeddings with the audio/text embedding and perform cosine similarity with all the image embeddings and get the most similar image?
Thanks!
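The procedure described in this question (sum the two query embeddings, then rank all image embeddings by cosine similarity) can be sketched as below. This is only a minimal illustration of what the question proposes, not a confirmed reproduction of how Figure 4 was produced. The function names `l2_normalize` and `retrieve_by_sum` are hypothetical, and the 3-dimensional toy vectors stand in for real model embeddings. Normalizing each query before summing is an assumption, made so that neither modality dominates the combined query.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def retrieve_by_sum(query_a, query_b, gallery, top_k=1):
    """Rank gallery rows by cosine similarity to normalize(a) + normalize(b).

    Hypothetical helper: query_a and query_b are 1-D embeddings from two
    modalities (e.g. image and audio); gallery is a 2-D array with one
    image embedding per row.
    """
    combined = l2_normalize(query_a) + l2_normalize(query_b)
    scores = l2_normalize(gallery) @ l2_normalize(combined)
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]


# Toy embeddings (made-up data, not real model outputs).
gallery = np.array([
    [1.0, 0.0, 0.0],  # 0: resembles only the image query
    [0.0, 1.0, 0.0],  # 1: resembles only the audio query
    [1.0, 1.0, 0.0],  # 2: resembles both
    [0.0, 0.0, 1.0],  # 3: unrelated
])
image_query = np.array([1.0, 0.0, 0.0])
audio_query = np.array([0.0, 1.0, 0.0])

indices, scores = retrieve_by_sum(image_query, audio_query, gallery, top_k=1)
print(indices, scores)
```

In this toy gallery the best match is row 2, the one that resembles both queries. With real embeddings, a weighted sum instead of a plain sum would be one way to shift the result toward either modality.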
See also Anything2Image; it is implemented with Diffusers.
This works well with a nice Gradio GUI.
I'm rather new to diffusion, but does ImageBind provide any sort of decoder? I thought it was just training an encoder, and if that's the case, how are these diffusion methods working?
I'm rather new to diffusion, but does ImageBind provide any sort of decoder? I thought it was just training an encoder, and if that's the case, how are these diffusion methods working?
Maybe this could help Zeqiang-Lai/Anything2Image#4
This is great!!
I'm also looking for "Image+Text --> Image". For example, take a photo and ask for some edit to the person in the photo (e.g. adding makeup).