Comments (8)
All CLIP implementations I know of provide a preprocess function; we should use it to convert from image to tensor.
However, I wouldn't recommend doing any CLIP forward pass for training, and would instead use precomputed CLIP embeddings.
The prior training takes as input the CLIP text and CLIP image embeddings.
The generator (decoder) training takes as input the CLIP image embedding.
So for training we don't need to do any CLIP forward pass at all.
The only time we may want to run a CLIP forward pass is at inference time.
from dalle2-pytorch.
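A minimal sketch of the training setup described above, where CLIP embeddings were computed offline and training never calls CLIP. The tensor names, shapes, and random stand-in data here are all hypothetical, purely for illustration:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical: CLIP text/image embeddings precomputed offline and loaded
# as tensors; random data stands in for the real embeddings
n, dim = 1024, 512
text_embeds = torch.randn(n, dim)
image_embeds = torch.randn(n, dim)

# prior training consumes (text_embed, image_embed) pairs directly,
# so no CLIP forward pass happens inside the training loop
prior_loader = DataLoader(TensorDataset(text_embeds, image_embeds), batch_size=32)

for text_embed, image_embed in prior_loader:
    # prior(text_embed, image_embed) training step would go here
    break
```

The decoder loop would look the same but consume only `image_embeds` alongside the images.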
@xiankgx yeah, this repository normalizes to -1 to +1, but I've seen some generative models out there (back when I trained a lot of GANs) that use 0 to 1. I don't actually know if I've ever read a paper that did a proper comparison between the two.
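The two conventions, and the round trip between them, can be sketched in a few lines (the tensor names here are just for illustration):

```python
import torch

# a toy image tensor already scaled to [0, 1] (as after torchvision's ToTensor)
img01 = torch.rand(3, 64, 64)

# the convention this repo uses for DDPMs: rescale [0, 1] -> [-1, 1]
img_pm1 = img01 * 2 - 1

# and the inverse, which is what's needed before re-applying CLIP's own
# normalization later in the pipeline
img_back = (img_pm1 + 1) / 2
```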
We should be very careful when passing images around, since different CLIP implementations expect different things, and the image is also used as the prediction target (x_start) for the decoder.
@rom1504 yeah, I think the issue is that the decoder will be trained on images that are simply normalized to -1 to 1, but CLIP uses https://github.com/openai/CLIP/blob/main/clip/clip.py#L85 (though we can always just do this within the embed_image forward function for the CLIP adapter).
I think what will have to happen is that on the CLIP image-embedding forward, we unnormalize the image (back to 0 to 1) and then apply CLIP's own normalization.
ah I see, makes sense
@xiankgx yes! we definitely need to keep an eye on normalization
I don't understand.
Doesn't the decoder take as input (image, clip embedding)?
If so, how is the normalization used in the CLIP forward pass related to the decoder training process?
Basically, images usually start off normalized to the range 0 to 1 (shrunk from 0 to 255).
For DDPMs, we normalize them to -1 to 1 using 2 * image - 1.
For CLIP, if we do all the embedding processing externally, there is no problem. However, the decoder currently takes in the image and does both the DDPM processing and the CLIP image-embedding derivation, so I just have to make sure to unnormalize the image before renormalizing it with the statistics CLIP was trained on, before passing it into the attention net. You can double-check my work here! https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L228