tiankaihang / min-snr-diffusion-training
[ICCV 2023] Efficient Diffusion Training via Min-SNR Weighting Strategy
First, congratulations on a very interesting paper!
I'm attempting to follow the math in the appendix, which derives implicit weighting schemes for different objectives. I follow the noise-prediction math in its entirety.
For v-prediction, I follow up to the 5th line (looking at the paper version on arXiv, v2: https://arxiv.org/abs/2303.09556), but then I arrive at a different result (abbreviating the syntax slightly):
Should I not be squaring the factors pulled out of the squared norm?
I hope I have not made a trivial error, but it is certainly possible, and I apologize if that is the case.
An unrelated question, but have you compared the weighting scheme you propose against the implicit weighting scheme of v-prediction?
Thanks for your help, and congrats again on a very interesting result!
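For readers following along: the implicit weighting of v-prediction can be sketched in standard DDPM notation (this is a reconstruction of the usual argument, not a quote from the paper's appendix). With $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, $v_t = \sqrt{\bar\alpha_t}\,\epsilon - \sqrt{1-\bar\alpha_t}\,x_0$, and $\mathrm{SNR}(t) = \bar\alpha_t/(1-\bar\alpha_t)$:

```latex
\begin{align}
v_t &= \frac{\sqrt{\bar\alpha_t}\, x_t - x_0}{\sqrt{1-\bar\alpha_t}}, \qquad
\hat v_t = \frac{\sqrt{\bar\alpha_t}\, x_t - \hat x_0}{\sqrt{1-\bar\alpha_t}}, \\
\|v_t - \hat v_t\|^2 &= \frac{1}{1-\bar\alpha_t}\,\|x_0 - \hat x_0\|^2
  = \big(\mathrm{SNR}(t) + 1\big)\,\|x_0 - \hat x_0\|^2 ,
\end{align}
```

i.e. training on v with uniform weight is equivalent to an $\mathrm{SNR}(t)+1$ weight in $x_0$-space, since $\mathrm{SNR}(t)+1 = 1/(1-\bar\alpha_t)$.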
Hi,
I'm very interested in your work and would like to reproduce the results.
However, I could not find the script for training the UNet.
Could you point me to where it is?
Thanks
I greatly appreciate your work and am interested in reproducing the results for CelebA 64x64 (unconditional) and ImageNet 64x64 UNet models. I noticed there are architectural differences between these models, and I am unsure which hyperparameters need adjustment to align with the findings presented in your paper.
Could you please provide the scripts or any suggestions on how to proceed with these models?
Thanks,
Hello,
I just wanted to confirm the formulation for the loss weight using the v-prediction objective in the code is correct.
In guided_diffusion/gaussian_diffusion.py, lines 861 to 864 we have:
```python
elif self.mse_loss_weight_type.startswith("vmin_snr_"):
    k = float(self.mse_loss_weight_type.split('vmin_snr_')[-1])
    # min{snr, k}
    mse_loss_weight = th.stack([snr, k * th.ones_like(t)], dim=1).min(dim=1)[0] / (snr + 1)
```
Should the snr inside the stack also be snr + 1? Without it, the loss weights are always < 1, the weighting for SNRs near 0 is also near 0, and the weight at zero terminal SNR would be exactly 0.
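To make the difference concrete, here is a small numeric sketch (pure Python; the function names are mine) comparing the weight as implemented against the variant with snr + 1 inside the min:

```python
k = 5.0  # the truncation constant parsed from "vmin_snr_5"

def weight_as_implemented(snr):
    # repo code: min(snr, k) / (snr + 1)
    return min(snr, k) / (snr + 1)

def weight_with_plus_one(snr):
    # variant raised in the question: min(snr + 1, k) / (snr + 1)
    return min(snr + 1, k) / (snr + 1)

for snr in (0.0, 0.5, 5.0, 100.0):
    print(f"snr={snr:6.1f}  implemented={weight_as_implemented(snr):.4f}  "
          f"with_plus_one={weight_with_plus_one(snr):.4f}")
```

At snr = 0 the implemented weight is 0 while the variant gives 1, which is exactly the zero-terminal-SNR case mentioned above.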
Thank you.
Hello @TiankaiHang,
Your paper talks about dividing by (SNR + 1) for the velocity objective, but when I read your code I could not find this division.
Could you explain it to me? Is it necessary to compute loss_weight = minimum(snr, 5) / (snr + 1)
for the velocity objective?
Thanks.
Is it possible to generate images larger than 64x64 with this model? The baseline ViT models included for training are only the 32 and 64 base and large variants, and when I set --image_size 256 I received a dimension-mismatch error.
Good evening, Dr. Hang,
Thank you so much for sharing this excellent work.
Could you please provide some details about the implementation of the ImageNet 256x256 preparation? The following is my code:
#####################################
```python
from PIL import Image

def crop_image_to_256x256(image_path):
    with Image.open(image_path) as img:
        width, height = img.size
        left = (width - 256) / 2
        top = (height - 256) / 2
        right = (width + 256) / 2
        bottom = (height + 256) / 2
        img_cropped = img.crop((left, top, right, bottom))
        return img_cropped

from diffusers import AutoencoderKL

def compress_image_with_autoencoderkl(image):
    # Initialize the AutoencoderKL model
    model_id = "CompVis/autoencoder-kl-stable-diffusion"
    autoencoder = AutoencoderKL.from_pretrained(model_id)
    # Encode and decode the image
    image_tensor = autoencoder.feature_extractor(images=[image], return_tensors="pt").pixel_values
    encoded_images = autoencoder.encode(image_tensor)
    decoded_images = autoencoder.decode(encoded_images)
    return decoded_images
```
#####################################
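One caveat for readers: a plain center crop fails for images with either side smaller than 256, and common ImageNet pipelines resize the shorter side first. A hedged sketch of that variant (my own code, not necessarily the authors' preprocessing):

```python
from PIL import Image

def center_crop_256(img: Image.Image) -> Image.Image:
    # Resize so the shorter side is 256, then center-crop to 256x256.
    # (A plain center crop fails when either side is < 256.)
    w, h = img.size
    scale = 256 / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    w, h = img.size
    left = (w - 256) // 2
    top = (h - 256) // 2
    return img.crop((left, top, left + 256, top + 256))
```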
I want to be consistent with yours, in order to use your repo. Can you help me confirm the above code? Or could you kindly provide the data-preparation code for ImageNet?
Thank you very much for your time and assistance. I appreciate any response.
Best regards,
Ziqiang
The wandb integration fails to log the loss; the step passed to wandb is always limited to roughly 1-15 or so.
Hello, as ByteDance explained in their paper advocating zero terminal SNR, the final sigma value is zero, which leads to a divide-by-zero in your SNR computation.
I am curious what your proposed solution would be. Mine is to set the MSE weight to 1 for these timesteps.
Please advise.
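A minimal sketch of that fallback (my own naming; it mirrors the min(snr, k)/snr epsilon-prediction weight, which is 0/0 at snr = 0 under a zero-terminal-SNR schedule):

```python
def min_snr_eps_weight(snr, k=5.0):
    # min(snr, k) / snr is 0/0 at snr == 0 (zero-terminal-SNR schedules);
    # the suggestion above: fall back to a weight of 1 at those timesteps.
    if snr == 0.0:
        return 1.0
    return min(snr, k) / snr
```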