
min-snr-diffusion-training's People

Contributors

tiankaihang


min-snr-diffusion-training's Issues

v-prediction implicit weighting (appendix math)

First, congratulations on a very interesting paper!

I'm attempting to follow the math in the appendix, which derives implicit weighting schemes for different objectives. I follow the noise-prediction math in its entirety.

For v-prediction, I can follow up to the 5th line (looking at the paper version on arXiv, v2: https://arxiv.org/abs/2303.09556), but from there I arrive at a different result (abbreviating the syntax slightly):

$$
\begin{aligned}
\ell &= \left\| \frac{\alpha_t^2 + \sigma_t^2}{\sigma_t} \left( x_0 - \hat{x}_{\theta} \right) \right\|^2_2 \\
&= \frac{1}{\sigma_t^2} \left\| \left( \alpha_t^2 + \sigma_t^2 \right) \left( x_0 - \hat{x}_{\theta} \right) \right\|^2_2 \\
&= \frac{(\alpha_t^2 + \sigma_t^2)^2}{\sigma_t^2} \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \frac{\sigma_t^2}{\sigma_t^2} \cdot \frac{(\alpha_t^2 + \sigma_t^2)^2}{\sigma_t^2} \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \sigma_t^2 \left( \frac{\alpha_t^2 + \sigma_t^2}{\sigma_t^2} \right)^2 \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \sigma_t^2 \left( \frac{\alpha_t^2}{\sigma_t^2} + 1 \right)^2 \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \sigma_t^2 \left( \mathrm{SNR}_t + 1 \right)^2 \left\| x_0 - \hat{x}_{\theta} \right\|^2_2
\end{aligned}
$$

Should I not be squaring the factors pulled out of the squared $L_2$ objective? If I do not square them, then I do not get $\sigma_t^2$ in the denominator and cannot complete the SNR term. However, if I do square them, it seems I must also square $(\alpha_t^2 + \sigma_t^2)$, which I believe differs from the authors' result, if I am understanding their work correctly. The noise-prediction proof has a similar step, and there the terms pulled out of the squared $L_2$ distance are squared.
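For clarity, the identity I am relying on when pulling factors out of the norm is simply:

$$\| c \, x \|^2_2 = c^2 \, \| x \|^2_2 \quad \text{for any scalar } c .$$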

I hope I have not made a trivial error, but it is certainly possible, and I apologize if that is the case.

An unrelated question, but have you compared the weighting scheme you propose against the implicit weighting scheme of v-prediction?

Thanks for your help, and congrats again on a very interesting result!

How to train the model with the "Ours" UNet?

Hi,
I'm very interested in your work and would like to reproduce the result.
However, I could not find the script for training the UNet.
Could you point out where it is?

Thanks

How to Train CelebA 64x64 unconditionally and ImageNet 64x64

I greatly appreciate your work and am interested in reproducing the results for CelebA 64x64 (unconditional) and ImageNet 64x64 UNet models. I noticed there are architectural differences between these models, and I am unsure which hyperparameters need adjustment to align with the findings presented in your paper.

Could you please provide the scripts or any suggestions on how to proceed with these models?

Thanks,

Correct min_snr weighting for v-prediction objective

Hello,

I just wanted to confirm that the formulation of the loss weight for the v-prediction objective in the code is correct.

In guided_diffusion/gaussian_diffusion.py, lines 861 to 864 we have:
elif self.mse_loss_weight_type.startswith("vmin_snr_"):
    k = float(self.mse_loss_weight_type.split('vmin_snr_')[-1])
    # min{snr, k}
    mse_loss_weight = th.stack([snr, k * th.ones_like(t)], dim=1).min(dim=1)[0] / (snr + 1)

Should the snr inside the stack also have 1 added to it? Without it, the loss weights are always < 1, the weighting for SNRs near 0 is also near 0, and the weight at zero terminal SNR would be exactly 0.
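In other words, I would have expected the stacked term to be snr + 1, along these lines (my own sketch of the change, not code from the repo):

# min{snr + 1, k} / (snr + 1): the weight saturates at 1 for low-SNR timesteps
# instead of collapsing to 0
mse_loss_weight = th.stack([snr + 1, k * th.ones_like(t)], dim=1).min(dim=1)[0] / (snr + 1)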

Thank you.

Division by (SNR + 1)?

Hello @TiankaiHang,

Your paper describes dividing by (SNR + 1) for the velocity objective, but when I read your code I could not find this division.

Could you explain this to me? Is it necessary to use loss_weight = minimum(snr, 5) / (snr + 1) for the velocity objective?
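For reference, what I mean is roughly the following (my own sketch, with gamma = 5 as in the paper):

import torch

def min_snr_v_loss_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # min{SNR_t, gamma} / (SNR_t + 1): the per-timestep weight I understand
    # the paper to prescribe for the velocity (v-prediction) objective.
    return snr.clamp(max=gamma) / (snr + 1.0)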

Thanks.

Image Size

Is it possible to generate images larger than 64x64 with this model? The baseline ViT models included for training are only the 32 and 64 base and large variants, and when --image_size 256 was set, I received a dimension mismatch error.

Inquiry about ImageNet 256x256 preparation

Good evening, Dr. Hang,

Thank you so much for sharing this excellent work.
Could you please provide some details about the implementation of the ImageNet 256x256 preparation? The following is my code:

#####################################
from PIL import Image

def crop_image_to_256x256(image_path):
    with Image.open(image_path) as img:
        width, height = img.size
        left = (width - 256) / 2
        top = (height - 256) / 2
        right = (width + 256) / 2
        bottom = (height + 256) / 2
        img_cropped = img.crop((left, top, right, bottom))
        return img_cropped

from diffusers import AutoencoderKL

def compress_image_with_autoencoderkl(image):
    # Initialize the AutoencoderKL model
    model_id = "CompVis/autoencoder-kl-stable-diffusion"
    autoencoder = AutoencoderKL.from_pretrained(model_id)

    # Encode and decode the image
    image_tensor = autoencoder.feature_extractor(images=[image], return_tensors="pt").pixel_values
    encoded_images = autoencoder.encode(image_tensor)
    decoded_images = autoencoder.decode(encoded_images)

    return decoded_images

#####################################
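For comparison, here is a more compact sketch of the preparation I have in mind, assuming the standard diffusers AutoencoderKL API (the checkpoint name and the 0.18215 latent scaling below are my own assumptions, not taken from your repo):

import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

# Center-crop/resize to 256x256 and map pixels to [-1, 1], the usual VAE input range.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

# NOTE: checkpoint name is my guess; any Stable-Diffusion-compatible KL VAE should work.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def encode_to_latent(image_path):
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)            # (1, 3, 256, 256)
    latent = vae.encode(x).latent_dist.sample()
    return latent * 0.18215                     # conventional SD latent scaling factor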

I want to be consistent with yours in order to use your repo. Can you help me confirm the above code, or could you kindly provide the code for this ImageNet data preparation?
Thank you very much for your time and assistance. I appreciate any response.

Best regards,
Ziqiang

wandb fail to log

The wandb implementation fails to log the loss; the step value passed to wandb is always limited to roughly 1-15, or something like that.

Compatibility with rescaled betas for zero terminal SNR

Hello, as ByteDance explained in their paper advocating for zero terminal SNR, the terminal SNR is zero, which leads to a division by zero in your SNR-based loss weighting.

I am curious what your proposed solution would be. Mine is to set the MSE weight to 1 for these timesteps.
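Roughly, my proposal would look something like this (a sketch against the min(SNR, gamma)/SNR style of weighting, not code from your repo):

import torch

def safe_min_snr_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # min{SNR_t, gamma} / SNR_t, but falling back to a weight of 1 at timesteps
    # where SNR_t == 0 (the terminal timestep under zero-terminal-SNR schedules).
    weight = snr.clamp(max=gamma) / snr.clamp(min=1e-8)
    return torch.where(snr == 0, torch.ones_like(snr), weight)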

Please advise.
