
min-snr-diffusion-training's People

Contributors

tiankaihang


min-snr-diffusion-training's Issues

v-prediction implicit weighting (appendix math)

First, congratulations on a very interesting paper!

I'm attempting to follow the math in the appendix, which derives implicit weighting schemes for different objectives. I follow the noise-prediction math in its entirety.

For v-prediction, I can follow up to the 5th line (looking at the paper version on arXiv, v2: https://arxiv.org/abs/2303.09556), but from there I arrive at a different result (abbreviating the syntax slightly):

$$
\begin{aligned}
\ell &= \left\| \frac{\alpha_t^2 + \sigma_t^2}{\sigma_t} \left( x_0 - \hat{x}_{\theta} \right) \right\|^2_2 \\
&= \frac{1}{\sigma_t^2} \left\| \left( \alpha_t^2 + \sigma_t^2 \right) \left( x_0 - \hat{x}_{\theta} \right) \right\|^2_2 \\
&= \frac{(\alpha_t^2 + \sigma_t^2)^2}{\sigma_t^2} \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \frac{\sigma_t^2}{\sigma_t^2} \cdot \frac{(\alpha_t^2 + \sigma_t^2)^2}{\sigma_t^2} \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \sigma_t^2 \left( \frac{\alpha_t^2 + \sigma_t^2}{\sigma_t^2} \right)^2 \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \sigma_t^2 \left( \frac{\alpha_t^2}{\sigma_t^2} + 1 \right)^2 \left\| x_0 - \hat{x}_{\theta} \right\|^2_2 \\
&= \sigma_t^2 \left( \mathrm{SNR}_t + 1 \right)^2 \left\| x_0 - \hat{x}_{\theta} \right\|^2_2
\end{aligned}
$$

Should I not be squaring the factors pulled out of the squared $L_2$ objective? If I do not square them, then I do not get $\sigma_t^2$ in the denominator and cannot complete the SNR term. However, if I do square them, it seems I must also square $(\alpha_t^2 + \sigma_t^2)$, which I believe differs from the authors' result, if I am understanding their work correctly. The noise-prediction proof has a similar step, and there the terms pulled out of the squared $L_2$ distance are squared.
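For clarity, the identity I am relying on when pulling factors out of the norm is simply:

$$\| c \, x \|^2_2 = c^2 \, \| x \|^2_2 \quad \text{for any scalar } c .$$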

I hope I have not made a trivial error, but it is certainly possible, and I apologize if that is the case.

An unrelated question, but have you compared the weighting scheme you propose against the implicit weighting scheme of v-prediction?

Thanks for your help, and congrats again on a very interesting result!

How to train the model with the "Ours" UNet?

Hi,
I'm very interested in your work and would like to reproduce the result.
However, I could not find the script for training the UNet.
Could you point out where it is?

Thanks

How to Train CelebA 64x64 unconditionally and ImageNet 64x64

I greatly appreciate your work and am interested in reproducing the results for CelebA 64x64 (unconditional) and ImageNet 64x64 UNet models. I noticed there are architectural differences between these models, and I am unsure which hyperparameters need adjustment to align with the findings presented in your paper.

Could you please provide the scripts or any suggestions on how to proceed with these models?

Thanks,

Correct min_snr weighting for v-prediction objective

Hello,

I just wanted to confirm that the formulation of the loss weight for the v-prediction objective in the code is correct.

In guided_diffusion/gaussian_diffusion.py, lines 861 to 864 we have:
elif self.mse_loss_weight_type.startswith("vmin_snr_"):
    k = float(self.mse_loss_weight_type.split('vmin_snr_')[-1])
    # min{snr, k}
    mse_loss_weight = th.stack([snr, k * th.ones_like(t)], dim=1).min(dim=1)[0] / (snr + 1)

Should the snr inside the stack also have 1 added to it? Without it, the loss weights are always < 1, the weighting for SNRs near 0 is also near 0, and the weight at zero terminal SNR would be exactly 0.
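In other words, I would have expected the stacked term to be snr + 1, along these lines (my own sketch of the change, not code from the repo):

# min{snr + 1, k} / (snr + 1): the weight saturates at 1 for low-SNR timesteps
# instead of collapsing to 0
mse_loss_weight = th.stack([snr + 1, k * th.ones_like(t)], dim=1).min(dim=1)[0] / (snr + 1)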

Thank you.

Division by (SNR + 1)?

Hello @TiankaiHang,

Your paper describes dividing by (SNR + 1) for the velocity objective, but when I read your code I could not find this division.

Could you explain this to me? Is it necessary to use loss_weight = minimum(snr, 5) / (snr + 1) for the velocity objective?
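For reference, what I mean is roughly the following (my own sketch, with gamma = 5 as in the paper):

import torch

def min_snr_v_loss_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # min{SNR_t, gamma} / (SNR_t + 1): the per-timestep weight I understand
    # the paper to prescribe for the velocity (v-prediction) objective.
    return snr.clamp(max=gamma) / (snr + 1.0)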

Thanks.

Image Size

Is it possible to generate images larger than 64x64 with this model? The baseline ViT models included for training are only the 32 and 64 base and large variants, and when --image_size 256 was set, I received a dimension mismatch error.

Inquiry about ImageNet 256x256 preparation

Good evening, Dr. Hang,

Thank you so much for sharing this excellent work.
Could you please provide some details about the implementation of the ImageNet 256x256 preparation? The following is my code:

#####################################
from PIL import Image

def crop_image_to_256x256(image_path):
    with Image.open(image_path) as img:
        width, height = img.size
        left = (width - 256) / 2
        top = (height - 256) / 2
        right = (width + 256) / 2
        bottom = (height + 256) / 2
        img_cropped = img.crop((left, top, right, bottom))
        return img_cropped

from diffusers import AutoencoderKL

def compress_image_with_autoencoderkl(image):
    # Initialize the AutoencoderKL model
    model_id = "CompVis/autoencoder-kl-stable-diffusion"
    autoencoder = AutoencoderKL.from_pretrained(model_id)

    # Encode and decode the image
    image_tensor = autoencoder.feature_extractor(images=[image], return_tensors="pt").pixel_values
    encoded_images = autoencoder.encode(image_tensor)
    decoded_images = autoencoder.decode(encoded_images)

    return decoded_images

#####################################
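For comparison, here is a more compact sketch of the preparation I have in mind, assuming the standard diffusers AutoencoderKL API (the checkpoint name and the 0.18215 latent scaling below are my own assumptions, not taken from your repo):

import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

# Center-crop/resize to 256x256 and map pixels to [-1, 1], the usual VAE input range.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

# NOTE: checkpoint name is my guess; any Stable-Diffusion-compatible KL VAE should work.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def encode_to_latent(image_path):
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)            # (1, 3, 256, 256)
    latent = vae.encode(x).latent_dist.sample()
    return latent * 0.18215                     # conventional SD latent scaling factor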

I want to be consistent with yours in order to use your repo. Can you help me confirm the above code, or could you kindly provide the code for this ImageNet data preparation?
Thank you very much for your time and assistance. I appreciate any response.

Best regards,
Ziqiang

wandb fail to log

The wandb implementation fails to log the loss; the step value passed to wandb is always limited to roughly 1-15, or something like that.

Compatibility with rescaled betas for zero terminal SNR

Hello, as ByteDance explained in their paper advocating for zero terminal SNR, the terminal SNR is zero, which leads to a division by zero in your SNR-based loss weighting.

I am curious what your proposed solution would be. Mine is to set the MSE weight to 1 for these timesteps.
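Roughly, my proposal would look something like this (a sketch against the min(SNR, gamma)/SNR style of weighting, not code from your repo):

import torch

def safe_min_snr_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # min{SNR_t, gamma} / SNR_t, but falling back to a weight of 1 at timesteps
    # where SNR_t == 0 (the terminal timestep under zero-terminal-SNR schedules).
    weight = snr.clamp(max=gamma) / snr.clamp(min=1e-8)
    return torch.where(snr == 0, torch.ones_like(snr), weight)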

Please advise.
