Comments (9)
You mean you want to apply DiffAE not to RGB images but to a matrix of image features, e.g. VQ-VAE-like features? And what you expect to gain from this is the meaningful z_sem that DiffAE may provide?
If so, it seems possible, and you don't need a ground-truth z_sem for it. You just need to train DiffAE on top of that image feature space instead of training DiffAE on the RGB space as usual. In this case, DiffAE learns to reconstruct the image features, i.e. 64x64x256, and at the same time learns to come up with a useful z_sem.
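A minimal sketch of that setup, assuming a simplified one-step noising process and stand-in modules (none of these names come from the actual diffae codebase):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: train a DiffAE-style model on a feature map
# (e.g. 64x64x256 from a pretrained autoencoder) instead of RGB pixels.
class FeatureSpaceDiffAE(nn.Module):
    def __init__(self, feat_channels: int, z_dim: int):
        super().__init__()
        # Stand-in semantic encoder: feature map -> z_sem vector.
        self.semantic_encoder = nn.Sequential(
            nn.Conv2d(feat_channels, 32, 3, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, z_dim))
        # Stand-in conditional denoiser (the real model uses a UNet).
        self.cond_proj = nn.Linear(z_dim, feat_channels)
        self.denoiser = nn.Conv2d(feat_channels, feat_channels, 3, padding=1)

    def loss(self, feats: torch.Tensor, noise_scale: float) -> torch.Tensor:
        z_sem = self.semantic_encoder(feats)            # z_sem from clean features
        noise = torch.randn_like(feats)
        x_t = feats + noise_scale * noise               # simplified forward process
        cond = self.cond_proj(z_sem)[:, :, None, None]  # condition on z_sem
        pred = self.denoiser(x_t + cond)                # predict the added noise
        return ((pred - noise) ** 2).mean()

model = FeatureSpaceDiffAE(feat_channels=8, z_dim=16)
feats = torch.randn(2, 8, 16, 16)  # pretend features from a pretrained autoencoder
loss = model.loss(feats, noise_scale=0.5)
```

Training then just minimizes this loss over the feature dataset; z_sem falls out of the encoder exactly as in the RGB case.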
from diffae.
Hi @phizaz,
Yes, I am looking for something similar. I need to reconstruct the image using the diffusion model. The conditioning, i.e. z_sem, should come from the image feature space, while the DDIM should work in RGB space.
I have a few doubts:
- Do I have to train only the Diffusion Autoencoder, or the DDIM as well?
- If we train only the Diffusion Autoencoder, will the z_sem it generates be compatible with a pre-trained DDIM?
- What about losses like LPIPS if we train in feature space instead of RGB?
- As I understand it, Diffusion Autoencoder training uses both an autoencoder and a diffusion model. If we train a model using the config file 'ffhq128_autoenc_130M', it will use both the autoencoder and the diffusion model. Am I right?
A few terms need to be clarified first.
- You already have an autoencoder that provides the feature space on which everything else will be built.
- A diffusion autoencoder is itself a kind of DDIM, so it's not quite right to mention DiffAE and DDIM separately. In any case, I don't think you need another DDIM besides the DiffAE.
- You mentioned a "pretrained DDIM", and I'm not sure what you mean by it.
- To be clear, the word "autoencoder" in DiffAE is NOT the same as in the first point. You need to be careful with the terms here.
Here is my mental picture of how it should look:
- You have a pretrained autoencoder that provides the image feature space.
- You train DiffAE on the image feature space with some loss function; I don't think you need LPIPS.
You shouldn't need any more than these two components.
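Wired together, the two components might look like this toy sketch (identity stand-ins, not the actual diffae API):

```python
# Toy stand-in for a DiffAE working in feature space. A real DiffAE would
# encode features into (z_sem, x_T) and decode them back via the DDIM.
class ToyDiffAE:
    def encode(self, feats):
        return feats, feats   # pretend (z_sem, x_T)

    def decode(self, z_sem, x_t):
        return z_sem          # pretend denoised reconstruction

def reconstruct(rgb, feat_encoder, diffae, feat_decoder):
    feats = feat_encoder(rgb)           # 1) pretrained autoencoder, frozen
    z_sem, x_t = diffae.encode(feats)   # 2) DiffAE codes in feature space
    feats_rec = diffae.decode(z_sem, x_t)
    return feat_decoder(feats_rec)      # 3) back to RGB with the AE's decoder

# With identity stand-ins, the pipeline reproduces its input.
out = reconstruct([1.0, 2.0], lambda x: x, ToyDiffAE(), lambda x: x)
```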
- I have an autoencoder that maps an RGB image into a feature space.
- Thanks for clarifying.
- Pretrained DDIM: I was referring to the model produced by the 'ffhq128_ddpm_130M' config.
- Thanks for removing the confusion.
If I train DiffAE on the image feature space, then I will need the decoder from my original autoencoder to map the generated feature vector back to RGB space.
Is there any way I can train only the semantic encoder of DiffAE, keeping the DDIM part fixed? That way, the semantic encoder would take the image feature space -> generate z_sem (a 512-dimensional vector, via model.encode()) -> z_sem would be used to manipulate the conditional DDIM, which still works in RGB space.
> Is there any way I can only train the semantic encoder of DiffAE, keeping the DDIM part fixed?
I think you mean training only the semantic encoder while keeping the DDIM part fixed. Assuming we have a DiffAE pretrained on a potentially related dataset, it might be possible, but I'm not sure; I haven't run any experiment on this.
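For what it's worth, freezing the DDIM part while training only the encoder is mechanically simple in PyTorch. A toy sketch with stand-in modules (hypothetical names, nothing from the actual codebase):

```python
import torch
import torch.nn as nn

# Stand-in modules; a real DiffAE has a semantic encoder and a conditional UNet.
encoder = nn.Linear(8, 4)  # semantic encoder, to be trained
unet = nn.Linear(4, 8)     # "DDIM part", to be kept fixed

for p in unet.parameters():
    p.requires_grad_(False)  # freeze the pretrained DDIM

opt = torch.optim.SGD(encoder.parameters(), lr=0.1)  # optimize encoder only

x = torch.randn(2, 8)
frozen_weight = unet.weight.clone()
loss = ((unet(encoder(x)) - x) ** 2).mean()  # toy reconstruction loss
opt.zero_grad()
loss.backward()  # gradients flow through the frozen unet into the encoder
opt.step()
```

After the step, only the encoder's weights have moved; the frozen part is unchanged. Whether the resulting z_sem works well with a fixed pretrained DDIM is the open question above.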
Dear Authors,
Thanks for sharing this work. I have a question about how the Semantic Encoder (shown in Figure 2) is trained. I cannot find a loss dedicated to training the Semantic Encoder. The paper only shows Eqns. (6) and (9), but these two losses are used to train the "conditional DDIM" and the "latent DDIM," not the "Semantic Encoder."
The semantic encoder is trained end-to-end, which means the training signal propagates from the reconstruction loss, through the diffusion model (the UNet), and then arrives at the semantic encoder.
This means the encoder is encouraged to encode information useful for denoising the whole image, while only a corrupted version of the image is available to the UNet at that time.
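This gradient path can be illustrated with a toy setup (stand-in linear modules, not the real architecture): the loss is computed only on the denoiser's output, yet backpropagation still reaches the encoder's weights.

```python
import torch
import torch.nn as nn

# Toy end-to-end setup: reconstruction loss -> "UNet" -> semantic encoder.
encoder = nn.Linear(8, 4)       # semantic encoder, sees the clean input
denoiser = nn.Linear(8 + 4, 8)  # UNet stand-in, sees corrupted input + z_sem

x0 = torch.randn(2, 8)                   # clean "image"
x_t = x0 + 0.5 * torch.randn(2, 8)       # corrupted version given to the denoiser
z_sem = encoder(x0)
pred = denoiser(torch.cat([x_t, z_sem], dim=1))
loss = ((pred - x0) ** 2).mean()         # loss is only on the reconstruction
loss.backward()                          # ...yet the encoder receives gradients
```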
I see. Thanks for the reply.
Then, in this case, how can we ensure the representation will have two separate parts, one for linear semantics and the other for stochastic details (as mentioned in the Abstract)? There seems to be no explicit regularization to encourage this kind of disentanglement. Any insight into this?
z_sem should only encode the semantic information, leaving the stochastic part to be the job of x_T.
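A toy illustration of this division of roles, with a hypothetical decode function (not the real API): decoding a fixed z_sem with different x_T samples changes only small details of the output.

```python
import random

# Hypothetical decoder: z_sem fixes the "semantics" (here, the mean level),
# while x_T contributes only small stochastic detail on top.
def decode(z_sem, x_t, detail_weight=0.05):
    return [z + detail_weight * n for z, n in zip(z_sem, x_t)]

random.seed(0)
z_sem = [1.0, 2.0, 3.0]
a = decode(z_sem, [random.gauss(0, 1) for _ in z_sem])
b = decode(z_sem, [random.gauss(0, 1) for _ in z_sem])
# a and b share the same semantics but differ in their stochastic details.
max_diff = max(abs(x - y) for x, y in zip(a, b))
```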