
Comments (8)

xiankgx commented on May 14, 2024

Also, here we multiply with a scale without first doing l2norm.

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L986

which is ok if we use XClip because we are doing l2norm here.

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L180

But, we are not doing l2norm when using OpenAI CLIP.

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L274-L275
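To illustrate the concern, here is a minimal pure-Python sketch. It is not the repository's code: `l2norm`, `magnitude`, and the `scale` value are illustrative assumptions. It shows that multiplying by a fixed scale only gives a predictable magnitude when the embedding has unit length first. In PyTorch, the normalization step would typically be `torch.nn.functional.normalize(t, dim=-1)`.

```python
import math

def l2norm(vec, eps=1e-12):
    """Return vec rescaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def magnitude(vec):
    """Euclidean length of vec."""
    return math.sqrt(sum(x * x for x in vec))

# Hypothetical scale; the value is an assumption for illustration only.
scale = 4.0

# Two embeddings pointing in the same direction but with different lengths.
raw_a = [3.0, 4.0]    # length 5
raw_b = [30.0, 40.0]  # length 50

# Scaling without l2norm: the result still depends on the raw length.
scaled_raw_a = [x * scale for x in raw_a]
scaled_raw_b = [x * scale for x in raw_b]

# Scaling after l2norm: both end up with length equal to `scale`.
scaled_unit_a = [x * scale for x in l2norm(raw_a)]
scaled_unit_b = [x * scale for x in l2norm(raw_b)]

print(magnitude(scaled_raw_a), magnitude(scaled_raw_b))
print(magnitude(scaled_unit_a), magnitude(scaled_unit_b))
```

Without the normalization, the scaled magnitudes differ by the same factor of ten as the raw embeddings, so whatever consumes the scaled embedding sees inputs of inconsistent size.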

from dalle2-pytorch.

xiankgx commented on May 14, 2024

Maybe we can ask crowsonkb for advice.


lucidrains commented on May 14, 2024

@xiankgx good idea! i've added it here (14e63a3), although i think the whole l2norm clamping thing is not proven out yet


lucidrains commented on May 14, 2024

> Also, here we multiply with a scale without first doing l2norm.
>
> https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L986
>
> which is ok if we use XClip because we are doing l2norm here.
>
> https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L180
>
> But, we are not doing l2norm when using OpenAI CLIP.
>
> https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L213

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L213 ohh, this isn't OpenAIClip, it is actually from CoCa (https://arxiv.org/abs/2205.01917), which debuted yesterday. i think it is a better version of clip

however, it is unclear from the CoCa paper whether they l2normed for cosine similarity contrastive learning

in the paper, it seems they use layernorms on both image and text cls tokens, but not sure if the l2norm is present
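As a side note on why the distinction matters, here is a small pure-Python sketch. The token values and helper names are assumptions for illustration, and nothing here is taken from the CoCa paper or from this repository. Layer normalization without learned gain or bias gives each vector zero mean and unit variance, so its Euclidean length is about the square root of the dimension, not 1. A separate l2norm is still needed if unit-length embeddings are wanted for cosine similarity.

```python
import math

def layernorm(vec, eps=1e-5):
    """Layer normalization without learned gain/bias: zero mean, unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def l2norm(vec, eps=1e-12):
    """Return vec rescaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def magnitude(vec):
    """Euclidean length of vec."""
    return math.sqrt(sum(x * x for x in vec))

# Made-up 8-dimensional "cls token"; the values are arbitrary.
cls_token = [0.5, -1.0, 2.0, 3.5, -0.25, 1.25, 0.0, -2.0]

ln_out = layernorm(cls_token)  # length is close to sqrt(8), not 1
unit_out = l2norm(ln_out)      # length is 1

print(magnitude(ln_out), math.sqrt(len(cls_token)))
print(magnitude(unit_out))
```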


xiankgx commented on May 14, 2024

Sorry, wrong line quote.


xiankgx commented on May 14, 2024

Lol, don't take my word for it, I'm a newbie in diffusion models.


lucidrains commented on May 14, 2024

> newbie

@xiankgx same, i think we all are, except for a few researchers around the world and maybe @crowsonkb lol

you are right! https://github.com/openai/CLIP/blob/main/clip/model.py#L364 they normalized it outside of the encoding functions, let me fix it now 🙏
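A rough sketch of what normalizing outside the encoding functions can look like in a wrapper. The encoders below are hypothetical stand-ins that return unnormalized vectors, and `embed_image` / `embed_text` are illustrative names. This is not the actual fix in this repository, nor the OpenAI CLIP API.

```python
import math

def l2norm(vec, eps=1e-12):
    """Return vec rescaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def encode_image(pixels):
    """Hypothetical stand-in for an encoder returning an unnormalized embedding."""
    return [2.0 * p for p in pixels]

def encode_text(tokens):
    """Hypothetical stand-in for an encoder returning an unnormalized embedding."""
    return [0.5 * t for t in tokens]

def embed_image(pixels):
    """Wrapper: apply l2norm after the encoder call."""
    return l2norm(encode_image(pixels))

def embed_text(tokens):
    """Wrapper: apply l2norm after the encoder call."""
    return l2norm(encode_text(tokens))

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))

# The two stand-in embeddings point in the same direction, so after
# normalization their dot product is a cosine similarity close to 1.
image_embed = embed_image([1.0, 2.0, 2.0])
text_embed = embed_text([2.0, 4.0, 4.0])
similarity = dot(image_embed, text_embed)
print(similarity)
```

With the normalization in the wrapper, downstream code can assume unit-length embeddings no matter which encoder backs it.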


lucidrains commented on May 14, 2024

https://github.com/lucidrains/DALLE2-pytorch/releases/tag/0.1.4

