Comments (10)

SoftologyPro commented on June 23, 2024 10

Agreed. How can you guys spend all that time training the model and writing the paper and setting up the demo website and not spend a few hours giving working example scripts to show us how to use it?

from imagebind.

ikuinen commented on June 23, 2024 5

> I am also interested in this. Any news? Also, how can you retrieve an image based on image and audio/text? I am referring to the embedding space arithmetic examples in Figure 4 in the paper. Do you just sum the image embeddings with the audio/text embedding and perform cosine similarity with all the image embeddings and get the most similar image? Thanks!

We made a quick attempt: https://github.com/sail-sg/BindDiffusion

Zeqiang-Lai commented on June 23, 2024 2

See also Anything2Image and InternGPT, it is implemented with Diffusers.

Zeqiang-Lai commented on June 23, 2024 2

> I'm rather new to diffusion, but does Imagebind provide any sort of decoder? I thought it was just training an encoder, and if that's the case how are these diffusion methods working?

Maybe this could help Zeqiang-Lai/Anything2Image#4

echo-lalia commented on June 23, 2024 1

I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.

WilTay1 commented on June 23, 2024

> I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.

But the model can be downloaded and loaded in the script.

bakachan19 commented on June 23, 2024

I am also interested in this. Any news?
Also, how can you retrieve an image based on image and audio/text? I am referring to the embedding space arithmetic examples in Figure 4 in the paper.
Do you just sum the image embeddings with the audio/text embedding and perform cosine similarity with all the image embeddings and get the most similar image?
Thanks!
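
For reference, the procedure described in the question can be sketched as below. This is only a toy illustration of "sum the embeddings, then rank the gallery by cosine similarity"; it is not confirmed to be how Figure 4 in the paper was produced. The function names (`normalize`, `combine`, `retrieve`) and the 3-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def combine(image_emb, other_emb):
    """Sum two unit-normalized embeddings and re-normalize the result.

    Assumption: plain addition with equal weights, as asked in the question.
    """
    a = normalize(image_emb)
    b = normalize(other_emb)
    return normalize([x + y for x, y in zip(a, b)])


def retrieve(query, gallery):
    """Return the index of the gallery embedding most similar to the query,
    together with all cosine similarity scores."""
    scores = [
        sum(q * g for q, g in zip(query, normalize(item)))
        for item in gallery
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy stand-ins for real embeddings (hypothetical values).
image_emb = [1.0, 0.0, 0.0]
audio_emb = [0.0, 1.0, 0.0]
gallery = [
    [1.0, 0.0, 0.0],  # matches the image only
    [0.7, 0.7, 0.0],  # matches both image and audio
    [0.0, 0.0, 1.0],  # matches neither
]

query = combine(image_emb, audio_emb)
best_index, scores = retrieve(query, gallery)
print(best_index, scores)
```

In this toy gallery the second entry wins because it points in the same direction as the summed query. With real embeddings, the gallery would hold the image embeddings of the candidate set, and the relative weighting of the two modalities is a choice you would likely need to tune.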

SoftologyPro commented on June 23, 2024

> See also Anything2Image, it is implemented with Diffusers.

This works well with a nice gradio GUI interface.

ChloeL19 commented on June 23, 2024

I'm rather new to diffusion, but does Imagebind provide any sort of decoder? I thought it was just training an encoder, and if that's the case how are these diffusion methods working?

celster commented on June 23, 2024

> > I'm rather new to diffusion, but does Imagebind provide any sort of decoder? I thought it was just training an encoder, and if that's the case how are these diffusion methods working?
>
> Maybe this could help Zeqiang-Lai/Anything2Image#4

This is great!!
I'm also looking for "Image+Text --> Image". For example, take a photo and ask the model to apply some change to the person in the photo (e.g. makeup).
