diffae's People

Contributors

chenxwh, nessessence, phizaz

diffae's Issues

Some error in the evaluation stage

Thanks for your amazing work! I am trying to train on a customized dataset. After two days of training, I got an error in the LPIPS calculation; the traceback is as follows:

File "/share_graphics_ai/linminxuan/Workspace/diffusion-models/diffae/experiment.py", line 938, in train
  trainer.fit(model)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
  self._run(model)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 917, in _run
  self._dispatch()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 985, in _dispatch
  self.accelerator.start_training(self)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
  self.training_type_plugin.start_training(trainer)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
  self._results = trainer.run_stage()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 995, in run_stage
  return self._run_train()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 1044, in _run_train
  self.fit_loop.run()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/loops/base.py", line 111, in run
  self.advance(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
  epoch_output = self.epoch_loop.run(train_dataloader)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/loops/base.py", line 111, in run
  self.advance(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 150, in advance
  "on_train_batch_end", processed_batch_end_outputs, batch, self.iteration_count, self._dataloader_idx
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 1226, in call_hook
  output = hook_fx(*args, **kwargs)
File "/share_graphics_ai/linminxuan/Workspace/diffusion-models/diffae/experiment.py", line 431, in on_train_batch_end
  self.evaluate_scores()
File "/share_graphics_ai/linminxuan/Workspace/diffusion-models/diffae/experiment.py", line 622, in evaluate_scores
  lpips(self.model, '')
File "/share_graphics_ai/linminxuan/Workspace/diffusion-models/diffae/experiment.py", line 611, in lpips
  latent_sampler=self.eval_latent_sampler)
File "/share_graphics_ai/linminxuan/Workspace/diffusion-models/diffae/metrics.py", line 111, in evaluate_lpips
  latent_sampler=latent_sampler)
TypeError: render_condition() got an unexpected keyword argument 'latent_sampler'

There is no "latent_sampler" keyword argument in the "render_condition" function; I guess latent_sampler should only be used in the "render_uncondition" case. Should I delete this key?
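
That reading seems consistent with the traceback. A minimal workaround sketch for metrics.py (the argument names here are guesses based on the traceback, not the exact repo code) would be to pass latent_sampler only on the unconditional path:

# sketch only; the real call sites in metrics.py may differ
if cond is not None:
    pred = render_condition(conf, model, x_T, sampler=sampler, cond=cond)
else:
    pred = render_uncondition(conf, model, x_T, sampler=sampler,
                              latent_sampler=latent_sampler)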

Please provide a license

Hi, congratulations on your excellent work!
I wonder whether you can provide a license for your code.
Thanks.

Error "AssertionError: 32 != 4" while training

Epoch 1: 14% 2500/17500 [51:03<5:06:23, 1.23s/it, loss=0.0144, v_num=]Traceback (most recent call last):
File "/content/diffae/run_ffhq256.py", line 10, in
train(conf, gpus=gpus, nodes=nodes)
File "/content/diffae/experiment.py", line 937, in train
trainer.fit(model)
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
call._call_and_handle_interrupt(
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
results = self._run_stage()
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
self._run_train()
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
self.fit_loop.run()
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 230, in advance
self.trainer._call_lightning_module_hook("on_train_batch_end", batch_end_outputs, batch, batch_idx)
File "/usr/local/lib/python3.9/dist-packages/pytorch_lightning/trainer/trainer.py", line 1356, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/content/diffae/experiment.py", line 429, in on_train_batch_end
self.log_sample(x_start=imgs)
File "/content/diffae/experiment.py", line 569, in log_sample
do(self.model, '', use_xstart=True, save_real=True)
File "/content/diffae/experiment.py", line 498, in do
gen = self.eval_sampler.sample(model=model,
File "/content/diffae/diffusion/base.py", line 208, in sample
return self.ddim_sample_loop(model,
File "/content/diffae/diffusion/base.py", line 735, in ddim_sample_loop
for sample in self.ddim_sample_loop_progressive(
File "/content/diffae/diffusion/base.py", line 795, in ddim_sample_loop_progressive
out = self.ddim_sample(
File "/content/diffae/diffusion/base.py", line 600, in ddim_sample
out = self.p_mean_variance(
File "/content/diffae/diffusion/diffusion.py", line 96, in p_mean_variance
return super().p_mean_variance(self._wrap_model(model), *args,
File "/content/diffae/diffusion/base.py", line 307, in p_mean_variance
model_forward = model.forward(x=x,
File "/content/diffae/diffusion/diffusion.py", line 153, in forward
return self.model(x=x, t=do(t), t_cond=t_cond, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/diffae/model/unet_autoenc.py", line 149, in forward
assert len(x) == len(x_start), f'{len(x)} != {len(x_start)}'
AssertionError: 32 != 4
Epoch 1: 14%|█▍ | 2500/17500 [51:21<5:08:09, 1.23s/it, loss=0.0144, v_num=]

I got this error while trying to resume training ffhq256. How do I fix it?

Train AutoEncoder Only

Hi,
Can we train the autoencoder only, by fixing the ddim?
I want to train an autoencoder on a feature vector of size 64x64x256 and expect to get the z_sem which can work with the pretrained ddim.
The feature vector was generated from an image using a different U-Net architecture. The feature vector contains all the information of the original image, since we can easily map it back to the original image using the decoder of that U-Net model.
Now using the original image, I got the z_sem from the pre-trained diffae autoencoder, which can be used as a ground truth.
Is there a way to train only the autoencoder with the feature vector and the ground-truth z_sem?

Question about AdaGN with conditioned Z

Hello! Nice work!
May I ask a relatively stupid question about HOW $z_{sem}$ is added to the UNet?
Let's say h is the previous layer's output. In your paper, the $z_{sem}$ conditioning looks like:

$out = Affine(z_{sem}) * (h * MLP_1(\phi _1 (t) ) + MLP_2 (\phi _2 (t)) )$

That seems quite weird! Why choose a multiplication here?

I came up with a different reading of the $z_{sem}$ conditioning:

$temp = (h * MLP_1(\phi _1 (t) ) + MLP_2 (\phi _2 (t)) )$

$Affine(z_{sem})=s, c$

$out = s * temp + c$

Is this understanding right?

I don't know what the blue "times" in appendix Figure 7 (a) means. I suspect it is not a multiplication?
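
For reference, the formula in the paper (as far as I recall) is AdaGN(h, t, z_sem) = z_s · (t_s · GroupNorm(h) + t_b), with (t_s, t_b) produced from the timestep embedding and z_s = Affine(z_sem), i.e. z_sem contributes a multiplicative, channel-wise scale only. A minimal PyTorch sketch of that reading (layer names and sizes here are illustrative assumptions, not the repo's exact code):

import torch
from torch import nn

class AdaGN(nn.Module):
    # out = z_s * (t_s * GroupNorm(h) + t_b)
    def __init__(self, channels, t_dim, z_dim, groups=32):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels)
        self.time_mlp = nn.Linear(t_dim, 2 * channels)   # -> (t_s, t_b)
        self.z_affine = nn.Linear(z_dim, channels)       # -> z_s (scale only)

    def forward(self, h, t_emb, z_sem):
        t_s, t_b = self.time_mlp(t_emb).chunk(2, dim=1)  # each [B, C]
        z_s = self.z_affine(z_sem)                       # [B, C]
        h = self.norm(h)
        h = t_s[..., None, None] * h + t_b[..., None, None]
        return z_s[..., None, None] * h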

FFHQ Custom class finetuning

Hi Konpat and Diffae team,

Thank you so much for your work! I am trying to create a new class for head pose (actually three values: pitch, yaw, roll) and fine-tune it on the FFHQ model. It's not a binary classification, so I assume the d2c method of having positives and negatives won't work (I have concrete values for each); I need to switch the training loss function from binary cross-entropy to MSE. Do I understand correctly that I need to fine-tune the ffhq256_autoenc/latent.ckpt checkpoint? Is that the checkpoint of the encoder?

Thanks,
Richard

How to get Figure 10 in the paper?

Could you give an example of how to reproduce Figure 10 in the paper?
From my understanding, the stochastic subcode is randomly sampled from the DDIM.
Could you show the code?
Looking forward to it.

EMA model lagging behind

Thanks so much for the awesome work!
After around 1 million images my model samples already look pretty good, but the EMA samples are still very noisy (FID of 450 vs. 76). Does this make sense?
I thought the EMA model is generally supposed to give better and more stable results.
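
For context, a standard EMA update looks like the sketch below (a generic formulation, not necessarily this repo's exact code); with a decay close to 1 the EMA weights need on the order of 1/(1 - decay) updates to catch up, so very early in training the EMA samples can indeed lag far behind the online model.

import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.9999):
    # ema_param <- decay * ema_param + (1 - decay) * param
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)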

Request for Download Link

Hi,
Could you please provide the download link for DiffAE (with latent DPM, can sample): [FFHQ256]? Just like the links you provided in the DiffAE Manipulation demo for Colab (simplified version).ipynb.
Thank you!

Nice work! About "Predictive power of the semantic subcode".

Hi, Nice work!!🎉🎉
Do you have a plan to release the evaluation code for "Predictive power of the semantic subcode" (Table 8 in the paper)?
I don't know how to reproduce this result.
I would appreciate it if you could provide the code!!
Thank you!

About Stochastic encoder

Thanks for your excellent work! It is very inspiring!
I have a question about the stochastic encoder.
Equation 8 in the paper is described as the reverse of Equation 1. Equation 8 uses the U-Net ϵθ(x_t, t, z) trained during training to generate x_{t+1} from x_t. However, as far as I can see, ϵθ was trained for denoising, i.e. to go from x_t to x_{t-1}.
More specifically, ϵθ is used to predict the noise that already exists in x_t; why does the stochastic encoder use this currently-predicted noise to map the picture to the latent space?
Thanks for your answering!

How does the model ensure semantic information goes to z rather than x_t?

Dear diffae team,

Thank you for sharing this great work, I really enjoy it.

I played with the model a bit. When I fix z and randomly sample x_t, the output images are almost the same, with some small variances. How does the model ensure that semantic information goes to z rather than x_t (the starting Gaussian noise)? Maybe because the model is trained with a reconstruction loss, z is the only source of target-image information, and x_t is randomly sampled even for the same target image, so the model learns to 'ignore' x_t and x_t therefore contains little semantic information.

In this repo, Justin takes the UNet from Stable Diffusion and the image encoder from CLIP, and trains the model using an image reconstruction loss. If my intuition in the last paragraph is correct, then given the same image as input and a randomly sampled x_t, the output images should be almost the same. But this is not the case in Justin's model: changing x_t causes a large change (including semantic change) in the output image. This means the x_t in Justin's model contains more semantic information than the x_t in the DiffAE model. Could you tell me why this is happening?

Thank you for your help.

Best Wishes,

Zongze

Question about using the Latent DDIM to Generate $z_{sem}$

Hello DiffAE team, thank you for this work. It's absolutely amazing.
I've been playing around with the codebase, and seem to be lost with how to generate the latent z_sem on its own.

I've been using the third checkpoint FFHQ256:

DiffAE (with latent DPM, can sample): FFHQ256

and seem to be a bit lost with how the Lit (Lightning) module operates. If I'm not mistaken, this checkpoint should allow sampling z_sem directly, like how an image is sampled unconditionally in a vanilla DPM, but I can't seem to find the right method to call.

Also, this checkpoint should allow calling the model.render() function, but I'm hit with a NotImplementedError.

Am I missing something about the module itself? It would be great if I could just use conf = ffhq256_autoenc_latent() to build the model, sample z_sem directly, and also play around with the model.render() function.

Any type of help would be much appreciated.

Cheers,

Tom

Inference DiffAE as AutoEncoder

Hi, thank you for sharing this nice work.

Can you share some example code for how to use DiffAE as an autoencoder?

Some .ipynb file would be great.

Thanks.
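
For anyone landing here: the repo's autoencoding.ipynb (referenced elsewhere in this thread) covers this. A rough sketch of the flow, with module and method names written from memory, so treat them as assumptions and check them against the notebook:

import torch
from templates import ffhq256_autoenc
from experiment import LitModel

conf = ffhq256_autoenc()
model = LitModel(conf)
state = torch.load('checkpoints/ffhq256_autoenc/last.ckpt', map_location='cpu')  # hypothetical path
model.load_state_dict(state['state_dict'], strict=False)
model.ema_model.eval()

# img: a [B, 3, 256, 256] tensor normalized to [-1, 1]
cond = model.encode(img)                        # semantic code z_sem
xT = model.encode_stochastic(img, cond, T=250)  # stochastic subcode
rec = model.render(xT, cond, T=20)              # reconstruction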

Deterministic Reconstruction Error

Upon reading the paper, specifically the part about the deterministic reconstruction with ddim,
I cannot seem to understand why the reconstruction is not exact.
You talk about exact reconstruction and also mention reconstruction loss.
Why, if the formulas are deterministic and the output of the network
is also deterministic, is the reconstruction not exact?
Thanks in advance, Anthony Mendil.

About fine-tuning diffae

Thanks for the excellent work.
I want to train diffae on my dataset. Due to equipment and time constraints, I want to fine-tune from the model you released instead of training from scratch. Have you tried fine-tuning a model you have already trained on another dataset? Do you think this is feasible? If so, taking ffhq256 as an example, how many samples would I need for fine-tuning?

stochastic subcode xT

Dear diffae group,

Thank you for sharing this great work.

In Section 3.1, you mention that 'for training, the stochastic subcode xT is not needed.' Do you mean x_T is frozen during training? From time T to T-1, we need x_T according to Equation 6, right?

Thank you for your help.

Best Wishes,

Zongze

no definition for ModelCheckpoint

no definition for ModelCheckpoint:

checkpoint = ModelCheckpoint(dirpath=f'{conf.logdir}',
                                 save_last=True,
                                 save_top_k=1,
                                 every_n_train_steps=conf.save_every_samples //
                                 conf.batch_size_effective)
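
ModelCheckpoint is PyTorch Lightning's checkpoint callback; if it is undefined at this point, importing it at the top of experiment.py should resolve the NameError:

from pytorch_lightning.callbacks import ModelCheckpoint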

About the performance of running "sbatch run_ffhq256.py"

Thanks for the excellent work.
I have tried to run sbatch run_ffhq256.py with 1 node and 8 GPUs.
Compared to the released model last.ckpt,
I found some differences in performance.

My evaluation code was based on https://github.com/phizaz/diffae/blob/master/autoencoding.ipynb, with a change in how xT is obtained.

xT = torch.randn(len(cond),3,conf.img_size,conf.img_size,device=cond.device)

For the released model, I got:
[image]

For my trained model, I got:
[image]

The difference is even larger when I test both on other images (a different dataset).

For the released model, I got:
[image]

For my trained model, I got:
[image]

It seems the released model is very robust across different situations, but my version is not.

Did I make a mistake anywhere in the code?

reconstruct blur/noise image

Thank you for your work.
Can I use the autoencoder to remove noise/blur from images?
In the paper I can see the following example:

[screenshot from the paper]
So can I do something similar:
insert a noisy image and get a reconstructed, denoised image?
Thank you.

about datasets

Hi, thanks for your great work. When I try to train the model using python run_ffhq128.py, it shows me an error like:
[image]
My file tree is as follows:
[image]
Could you please help me find out what's wrong with my file tree?
Thanks a lot, and I hope for your reply.

Sampling without noise during training

First of all, thanks for your great work!
I am trying to understand how to properly use deterministic noising and denoising.
From your code (in particular the file diffae/diffusion/base.py) I gather that
there are functions to do this:
ddim_reverse_sample_loop for the deterministic noising and ddim_sample_loop for
the deterministic denoising.
However, it seems like these are never used during training but only after training.

For the noising during training, only the function q_sample is used, and the noise
parameter is always set to None, so that random Gaussian noise is added.
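
For reference, what q_sample computes is the standard closed-form forward noising; training only needs a random (x_t, t, noise) triple per step, which is why the deterministic loops are not used there. A generic DDPM sketch (not the repo's exact code):

import torch

def q_sample(x0, t, alphas_cumprod, noise=None):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    if noise is None:
        noise = torch.randn_like(x0)  # this is the non-deterministic part
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise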

So it seems like during training the noising process is not deterministic, but then after training
you use the deterministic functions mentioned before.

Is this observation correct? I find it hard to understand the reason for this difference.
Is it desired that the generation is non-deterministic during training and only deterministic after training?

More precisely, I was expecting that the ddim_reverse_sample_loop would also be used
during training so that the latent variables of the same images are also equal during training
(due to the deterministic property of ddim).

I would very much appreciate it if you could clear up my confusion.
Thanks in advance, Anthony Mendil.

network architecture

Dear Diffae team,

Thank you for sharing this great work, I really enjoy it.

I understand that the UNet architecture you used is based on the guided diffusion model. Unfortunately, they do not provide a figure visualizing the network architecture, so it is very hard for me to understand it. Would you mind providing a figure explaining the structure of the UNet you used? From the code, the UNet seems to contain 3 blocks (input, middle, output).

Thank you for your help.

Best Wishes,

Zongze

How many epochs required for training?

When digging into your code, I found that training is based on a total number of iterations (let's call it max_steps). Based on your code in experiment.py, it is computed as max_steps = conf.total_samples // conf.batch_size_effective.

total_samples is predefined in templates.py (e.g. 130_000_000 for ffhq128) and batch_size_effective is set to 128 by default. For this example, max_steps = 1_015_625. As FFHQ128 includes 70,000 samples, the number of required epochs is 1_015_625 / (70_000 / 128) ≈ 1857 (that is a huge number of epochs to train :( )

Could you let me know whether I am correct?
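
The arithmetic checks out; as a quick sanity check:

total_samples = 130_000_000        # value from the templates for ffhq128
batch_size_effective = 128
dataset_size = 70_000              # FFHQ

max_steps = total_samples // batch_size_effective       # 1_015_625
steps_per_epoch = dataset_size / batch_size_effective    # ~546.9
print(max_steps, max_steps / steps_per_epoch)            # ~1857 epochs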

A question about adding z_sem to the conditional DDIM

Dear diffae team,

Thank you for your great work. I have a question about an implementation detail.
I am wondering how you add the encoded z_sem into the conditional DDIM.
[image]
1. Does it mean you first feed z_sem into a linear layer to change its shape and then use it to multiply (the bottleneck output + timestep embedding)?

[image]
2. Or is it concat(output, z_sem), like cGANs do?
3. If it is not case 2, what is concatenated in the middle?
I read the code and tried to find where you define the model.render function and which part of the code does this, but I didn't find it.
I would appreciate it if you could kindly tell me where this code is.

Best Regards

Invert an image to latent space and recover it

I have trained a model with my own dataset. Can I do latent inversion on my own dataset?

  1. Invert an image into the latent space.

  2. Then reconstruct the same image from the latent vector.

Can anyone help me with how to do this?

Thanks.

Best wishes.

Training head rotation classifier

Hi Konpat and team of DiffAE,

I'm still trying to train a head rotation classifier, using data from FFHQ-Aging (they labeled the head pose).
I am using:

  • a 1-layer linear regression from cond to head pose (512 → 3)
  • MSE loss with a learning rate of 0.001
  • training data: 1200 images and head poses
  • a batch size of 128 (tried everything from 16 to 512)
  • normalized cond, but not the head-pose outputs. Would you recommend normalizing the outputs as well?
  • accumulate_grad_batches=7
  • gradient_clip_val=0.5
  • 300 epochs

For some reason, the loss keeps being stuck around 65, which seems too large. Do you have any idea what I am doing wrong and how I can improve this loss? Would you recommend using more data (perhaps all 70k images)? In your examples, 1k labeled images were enough, so I'm not sure increasing the size of the training data will help.
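
For concreteness, a minimal sketch of the probe described above (hyperparameters are simply the ones listed in this issue, not a recommendation). Note that with unnormalized angle targets in degrees, an MSE around 65 corresponds to roughly an 8-degree RMSE per axis, so normalizing the targets mainly changes the scale of the number rather than the quality of the fit.

import torch
from torch import nn

probe = nn.Linear(512, 3)          # cond (z_sem) -> (pitch, yaw, roll)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(cond, pose):
    # cond: [B, 512] normalized z_sem, pose: [B, 3] head-pose targets
    opt.zero_grad()
    loss = loss_fn(probe(cond), pose)
    loss.backward()
    opt.step()
    return loss.item()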

Many thanks,
Rich

Training autoenc and Training latent DDM

Might I ask a stupid question: what is the difference between training the autoenc and training the latent DDM? As far as I understand, these two are trained at the same time. Can you enlighten me a little bit?

What's the loss to train the semantic-autoenc?

Dear diffae team,

Thank you for your great work. I have a question about an implementation detail.
I'm a little lost on the loss used to train the autoencoder.
From my understanding, the encoder's input is an image and its output is a vector containing semantic information.
I see the L1 loss in the templates:
[image]
So what are the two terms used to calculate this loss? Maybe one is the z_sem produced by the encoder, but what is the other term?
Or does the encoder not have its own loss, and is it just trained jointly with the DDIM, backpropagating the DDIM's loss?
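
For what it's worth, my reading is the latter: the encoder has no separate target and is trained end-to-end through the conditional DDIM's noise-prediction loss (the two items compared are the predicted noise and the noise actually added in the forward process), so gradients reach the encoder only through its conditioning. A hedged sketch with an assumed unet(x_t, t, z_sem) call signature; whether the pixel-space loss is L1 or MSE depends on the config:

import torch
import torch.nn.functional as F

def diffae_train_loss(encoder, unet, x0, alphas_cumprod):
    z_sem = encoder(x0)                                    # semantic code from the image
    t = torch.randint(0, len(alphas_cumprod), (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward noising
    eps_pred = unet(x_t, t, z_sem)                         # conditional DDIM predicts the noise
    return F.l1_loss(eps_pred, noise)                      # predicted noise vs. actual noise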

DiffAE without sampling + other questions

Hello, thanks for the great work, it is quite interesting. I have a couple questions

  • If we are not interested in the sampling ability of DiffAE, and merely in reconstruction, it is sufficient to just train DiffAE without the latent DPM (commented as 'train the autoenc model' in some of the provided training scripts), correct?
  • I saw in another issue (although I can't find it anymore) that you performed some experiments on regularizing z_sem. Did you observe any performance issues besides a less meaningful and interpretable semantic space?
  • I'm a bit confused about how the semantic encoder is trained: is a reconstruction loss simply calculated between the input and output images, with the gradient passed through the U-Net and into the semantic encoder?
  • I don't see any problems in doing this, but I am wondering if you have any comments about training DiffAE within some feature space (image encodings, for example).

Thank you for your input. If you have time constraints, the first two questions are my largest interests, although I am curious about your thoughts on all of them :)

generative process backward deterministically to obtain the noise map xT

Thanks for the excellent work.
I am a beginner in diffusion models and have recently come into contact with them. I saw this content in your paper.
With DDIM, it is possible to run the generative process backward deterministically to obtain the noise map xT, which represents the latent variable or encoding of a given image x0. In this context, DDIM can be thought of as an image decoder that decodes the latent code xT back to the input image. This process can yield a very accurate reconstruction; however, xT still does not contain high-level semantics as would be expected from a meaningful representation.

I also saw the part about the DDIM reverse in your code. If I just want to get xT from x0, can a regular DDIM do it? Does it require special training or model adjustments?
I would very much appreciate it if you could clear up my confusion.
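
For reference, no special training is needed: DDIM inversion reuses the same pretrained noise predictor and simply runs the deterministic update forward in t. A generic sketch of one reverse step (not the repo's exact implementation):

import torch

@torch.no_grad()
def ddim_reverse_step(eps_model, x_t, t, t_next, alphas_cumprod):
    # predict x0 from the current sample, then re-noise it deterministically to t_next > t
    a_t = alphas_cumprod[t]
    a_next = alphas_cumprod[t_next]
    eps = eps_model(x_t, t)
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps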

Reconstruct an image with only z_{sem}, with x_T sampled from N(0, I)

Hi, thanks for sharing this nice work.

Could you share some example code for how to reconstruct images with DiffAE when only z_{sem} is encoded from the original images and x_T is sampled from N(0, I) for decoding?

It's probably just a small change to autoencoding.ipynb, but I ran into some problems when trying to do it.

Thanks a lot.
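
A minimal sketch of that change, reusing the helper names from autoencoding.ipynb (written from memory, so double-check them) and the torch.randn line quoted in another issue above:

import torch

cond = model.encode(img)                        # keep only the semantic code z_sem
xT = torch.randn(len(cond), 3, conf.img_size, conf.img_size,
                 device=cond.device)            # replace encode_stochastic with pure noise
pred = model.render(xT, cond, T=100)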

Missing evaluate_interpolate_fid

In experiment.py, the function intp_fid calls the evaluate_interpolate_fid method, which doesn't exist in the publicly available repo; an error is also thrown because .is_interpolate() is not part of the TrainMode class. I guess this was not intended.
Adding def is_interpolate(self): return None does fix the run error, but I am putting it out here.
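
The workaround mentioned above, written out as a stub on the TrainMode class (returning False instead of None also works and may read more clearly; either avoids the AttributeError):

def is_interpolate(self):
    # stub: interpolation evaluation is not part of the public repo
    return False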

Changing dimension of z_sem

Thanks for open-sourcing this repo!

I wanted to confirm which config parameters need to be changed in order to test different sizes for z_sem in the CelebA experiment. Am I right in assuming I need to change the following 3 lines?

  1. TrainConfig -> net_beatgans_embed_channels
    net_beatgans_embed_channels: int = 512
  2. TrainConfig -> style_ch
    style_ch: int = 512
  3. autoenc_base -> conf.net_beatgans_embed_channels
        conf.net_beatgans_embed_channels = 512

Am I missing any other locations that need to be changed (e.g., for ddpm) or am I changing too many config parameters?

When I tried changing only TrainConfig -> net_beatgans_embed_channels I ran into tensor multiplication dim errors for training the latent DDIM.
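
A hedged sketch of the idea: pick the new size once and apply it everywhere that currently hard-codes 512, since the encoder output, the UNet conditioning width, and the latent DDIM input size all have to agree (only the attributes listed above are taken from the repo; the rest is an assumption):

z_dim = 256                                  # hypothetical new z_sem size

conf.net_beatgans_embed_channels = z_dim     # conditioning / embedding width
conf.style_ch = z_dim                        # encoder output size
# any latent DDIM trained on top must also be built for z_dim-sized vectors,
# which is the likely source of the dim-mismatch errors mentioned above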

torchmetrics version

The pytorch-lightning package doesn't pin a specific version of torchmetrics.
This causes import errors, since torchmetrics moves classes around.

The lightning version specified in requirements.txt works with torchmetrics==0.6.2.

FFHQ1024

Dear DiffAE team,

Thank you for sharing this great work, I really like it.

Have you tried to train a model on the FFHQ dataset at high resolution (512, 1024)? Do you have plans to release high-resolution checkpoints?

Thank you again for your help.

Best Wishes,

Zongze

Render method is non-differentiable

Hi! Thanks for your amazing work. I find diff-ae quite interesting and am trying to apply it to my research as well. However, I found an issue that makes it impossible to wrap an outer optimization loop around the diff-ae sampler used as a simple image generator (similar to, e.g., a GAN model). In particular, I'm interested in manipulating the attributes with a beta tensor where beta.requires_grad = True, like this:

cond2 = cond * beta + cond_new * (1 - beta)

Note that cond and cond_new are calculated beforehand without gradients. The problem I faced is that when I generate an image from this cond2, which carries gradients, in this way:

gen_img = model.render(xT, cond2, T=100)

At this step, no gradients can flow back through model.render(), so the optimization loop that is supposed to update beta cannot update anything.

So, my question is: why is model.render() not differentiable, and is there any option that allows gradients to flow back through this method?

Image reconstruction problem

Hello, thanks for your code and pre-trained models!
I am trying to run the Manipulate.ipynb code to reconstruct the source image, but it seems the reconstruction is not working very well. Is there something wrong with my code?
Here's my testing code:
[image]
Here's the result:
[image]

Total training time

Hello,
Thank you for your work!
However, it was not clear from the paper how much time it took to train the model(s). Is it counted (on average) in days, or just hours (i.e. less than a day)?

Can z_sem be the latent of a VQGAN?

Hi, if I train a VQGAN on one dataset, can its latent code be thought of as a semantic code (ignoring that the shape of the VQGAN latent code and your z_sem are different)?

how to interpolate faces

Can you tell me where to find the code to interpolate faces, or show me how to do it? According to the page, I should use a weighted sum,
but I'm not very good at math and I don't know what to add here.
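
As a rough sketch of the usual recipe: a plain weighted sum (lerp) for the semantic codes and a spherical interpolation (slerp) for the Gaussian noise maps. The model.encode / model.render names follow the helpers discussed elsewhere in this thread and are assumptions here:

import torch

def lerp(a, b, alpha):
    # plain weighted sum, used for the semantic codes z_sem
    return (1 - alpha) * a + alpha * b

def slerp(a, b, alpha):
    # spherical interpolation, commonly used for the noise maps xT
    a_n = a / a.norm(dim=-1, keepdim=True)
    b_n = b / b.norm(dim=-1, keepdim=True)
    omega = torch.acos((a_n * b_n).sum(-1, keepdim=True).clamp(-1, 1))
    so = torch.sin(omega)
    return (torch.sin((1 - alpha) * omega) / so) * a + (torch.sin(alpha * omega) / so) * b

# hypothetical usage:
# cond_mix = lerp(cond1, cond2, 0.5)
# xT_mix = slerp(xT1.flatten(1), xT2.flatten(1), 0.5).view_as(xT1)
# img = model.render(xT_mix, cond_mix, T=100)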

Not compatible on A100 or RTX3090

Hi, thanks for your great work!
I ran your code on a V100 successfully; however, it seems incompatible when running on an A100 or RTX 3090. If your team has tested the code on any of the above hardware, could you please provide a corresponding list of environments, like a requirements.txt?
Thank you a lot!

How to eval the model?

Hi, do you have a plan to release the evaluation code?

Or can you explain how to evaluate the model?
