p2-weighting's Issues

Bug in image_sample.py

Hi, first of all thanks a lot for sharing your code!

I believe that in image_sample.py the line count += 1 is missing (possibly at line 88), because otherwise subsequent batches will always overwrite the first batch and sampling will run forever.

f"{str(count * args.batch_size + i).zfill(5)}.png")

Kind regards, Jonas

What is the training batch size?

Hi, I could not find details about the batch size in the original paper.
Did you train all 256x256 models on a single GPU with batch_size = 8, or did you train them on multiple GPUs with larger batch sizes (for example, the original DDPM paper mentions 64/128)?
Thank you very much

How should I adjust the hyperparameters to train a strong DDPM on CelebA for a human face inpainting task?

I'm very grateful for your work on this GitHub repository! However, I have some questions about the hyperparameters that I want to ask you.
I know your repository is heavily based on guided-diffusion, and I want to use guided-diffusion to train a DDPM on the CelebA dataset, which consists of 202,599 aligned and cropped human face images (each 218 (height) x 178 (width) pixels), for a face image synthesis task, and then use the saved model for a face image inpainting task.

Could you give me some suggestions on how to adjust the hyperparameters used in guided-diffusion so that I can train a DDPM as strong as the models provided by guided-diffusion, for face image synthesis and then inpainting?
I also wonder why the models provided by guided-diffusion are so big, in particular why the 256x256 diffusion model (not class conditional) is about 2.1 GB, and how it was trained.
By the way, regarding the dataset, do I need to resize my images to 256x256 first?

I know your repository adds two extra hyperparameters (p2_gamma and p2_k), but I still want to train my own DDPM with guided-diffusion first, because I want to learn DDPM from the basics before moving on. Hence, I'm asking how to adjust the other hyperparameters besides p2_gamma and p2_k.

I hope the trained DDPM learns the features of human faces well enough to be used for the face image synthesis task and the face image inpainting task (i.e. recovering the masked parts of a masked face image).

I want to know how to adjust the values of the hyperparameters in diffusion_defaults() and model_and_diffusion_defaults() of script_util.py and in create_argparser() of image_train.py in guided-diffusion to train a strong denoising model on my CelebA dataset.

I have tried some combinations of the guided-diffusion hyperparameters, but the face inpainting results of both saved model files ema_0.9999_XXX.pt and modelXXX.pt are bad.
I mainly used RePaint to perform the sampling for the face inpainting task. As described in the README of RePaint, it uses the pretrained model celeba256_250000.pt (downloaded via download.sh and trained with guided-diffusion); that model is large (about 2.1 GB) and its sampling results are not bad. However, I don't know why the model is so big or how it was trained.

In addition, because of my limited GPU memory, I set num_channels to only 64. I want to know whether this hyperparameter affects the performance of the trained DDPM. Should I try to set it larger?

In conclusion, I hope you can give me some suggestions on how to adjust the guided-diffusion hyperparameters so that I can get a strong DDPM for the face image inpainting task.

Thanks a lot for any help!
P.S. I set the hyperparameter values directly in the guided-diffusion code, not through flags.
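As a purely illustrative starting point (assumptions to tune, not settings confirmed by the authors): one could reuse the flag set from the FFHQ training command quoted in a later issue on this page, point --data_dir at a folder of CelebA images pre-resized/cropped to 256x256 (the path below is hypothetical), and adjust from there:

python scripts/image_train.py --data_dir ./celeba256 --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --lr 2e-5 --batch_size 8 --rescale_learned_sigmas True

The same flags (minus the p2_gamma/p2_k additions of this repository) should also exist in plain guided-diffusion.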

About training on FFHQ

Thanks for your work.

I have two questions about training p2-weighting on the FFHQ dataset.

  1. Is it correct to set p2_gamma = 1 and p2_k = 1 when training on a face dataset (FFHQ)?
  2. I notice you chose a lightweight UNet based on ADM. Do you think using a larger UNet would further improve the quality of the generated face images? If so, which hyperparameters should I modify?

Thanks again and look forward to your reply.

Kind Regards

About the re-weighted loss

weight = _extract_into_tensor(1 / (self.p2_k + self.snr)**self.p2_gamma, t, target.shape)

I found that you multiply the final loss by this weight, whose denominator is greater than 1 since self.p2_k >= 1 and self.snr > 0. Therefore, the weight is smaller than 1. I wonder how the total weight of your method can be greater than the DDPM baseline when the SNR is in the interval [1e-2, 1e0]?

SNR value

In equation (4) of the paper you give one formulation of the SNR, but in the code you use a different formulation for this value.
Which is the correct one?
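(If Equation 4 defines SNR(t) = alpha_bar_t / (1 - alpha_bar_t), which is my reading and an assumption here, the two formulations look algebraically identical; a quick numerical check:)

import numpy as np

alphas_cumprod = np.linspace(1e-4, 1 - 1e-4, 1000)
snr_code = 1.0 / (1 - alphas_cumprod) - 1            # formulation used in the code
snr_paper = alphas_cumprod / (1 - alphas_cumprod)    # SNR(t) = alpha_bar / (1 - alpha_bar)
print(np.allclose(snr_code, snr_paper))              # True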

Use CUDA_VISIBLE_DEVICES=x

Thanks a lot for this work. But when I use CUDA_VISIBLE_DEVICES=x, it still runs on GPU 0. Why is that?


About the perceptual distance

I want to calculate the perceptual distance for specific input images. Could you please provide the detailed implementation of this part?
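(A hedged sketch of one way to measure a perceptual distance between two images, using the lpips package; the choice of metric is an assumption here, not the authors' confirmed implementation:)

import torch
import lpips  # pip install lpips

# LPIPS perceptual distance; net='vgg' or net='alex'
loss_fn = lpips.LPIPS(net='vgg')

# two images as float tensors in [-1, 1], shape (N, 3, H, W); random placeholders here
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1

print(loss_fn(img0, img1).item())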

About normalized weight

Thanks for your work.

I have one question about training with p2-weighting.
How can I find the normalized-weight part in the code, as it appears in the paper?
Or are the results in the paper based on unnormalized weights?

Thanks again and look forward to your reply.

The shape of weights curve

Hello @jychoi118,

I read your paper and code, and I cannot find the relationship between the shape of the weight curves in section D and the equation in your code: 1 / (self.p2_k + self.snr)**self.p2_gamma.

For example, I'm using the cosine scheduler with 1000 timesteps. The SNR is calculated as snr = 1.0 / (1 - self.alphas_cumprod) - 1. And then the weights are calculated as 1 / (1.0 + self.snr)**1.0.

This is a chart of SNR as a function of diffusion steps:
[plot omitted]

This is a chart of unnormalized weights as a function of diffusion steps:
[plot omitted]

This is a chart of unnormalized weights as a function of signal-to-noise ratio (SNR):
[plot omitted]

This is the original chart of unnormalized weights from Figure A:
[screenshot omitted]

The full code:

import numpy as np
import matplotlib.pyplot as plt

def cosine_beta_schedule(timesteps, s=0.008):
    # cosine schedule: alpha_bar follows a squared cosine, betas derived from its ratios
    steps = timesteps + 1
    x = np.linspace(0.0, timesteps, steps)
    alphas_cumprod = np.cos(((x / timesteps) + s) / (1 + s) * np.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas


class NoiseScheduler():
    def __init__(self, timesteps):
        super().__init__()
        self.timesteps = timesteps

        betas = cosine_beta_schedule(timesteps)

        alphas = 1.0 - betas
        self.alphas_cumprod = np.cumprod(alphas, axis=0)
        # SNR(t) = alpha_bar_t / (1 - alpha_bar_t), written here as 1 / (1 - alpha_bar_t) - 1
        self.snr = 1.0 / (1 - self.alphas_cumprod) - 1
        # P2 weight factor 1 / (k + SNR)^gamma with k = 1, gamma = 1
        self.P2_weights = 1 / (1.0 + self.snr)**1.0


noise_scheduler = NoiseScheduler(timesteps=1000)

# SNR as a function of diffusion step
plt.plot(np.arange(noise_scheduler.timesteps), noise_scheduler.snr)
plt.show()

# unnormalized P2 weight as a function of diffusion step
plt.plot(np.arange(noise_scheduler.timesteps), noise_scheduler.P2_weights)
plt.show()

# unnormalized P2 weight as a function of SNR
plt.plot(noise_scheduler.snr, noise_scheduler.P2_weights)
plt.show()

Thanks.

FID score for FFHQ

I tried to reproduce the FID score for FFHQ using code from torch-fidelity. However, the score I get is higher than the one reported in the paper. I notice the evaluations folder does not include an FFHQ reference batch. Could you please provide the FID evaluation code and the real images used in the evaluation?
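(For reference, a minimal sketch of how such an FID computation with torch-fidelity might look; the directory paths are placeholders, and the choice of reference images is exactly what is in question here:)

import torch_fidelity

# FID between a folder of generated samples and a folder of real FFHQ images
metrics = torch_fidelity.calculate_metrics(
    input1="generated_samples",   # placeholder path to generated images
    input2="ffhq_reference",      # placeholder path to the real reference images
    cuda=True,
    fid=True,
)
print(metrics["frechet_inception_distance"])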

About p2-weighting

weight = _extract_into_tensor(1 / (self.p2_k + self.snr)**self.p2_gamma, t, target.shape)

Why is the numerator 1 in the code, rather than the lambda mentioned in Equation 8?

How to prepare the dataset?

Hello, I would like to use this project to train on my own dataset, but I'm not quite sure about the required format and directory structure of the dataset, so I'm unsure how to prepare it. Could you please provide some guidance? Thank you very much!
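(A hedged sketch of one way to prepare such a folder, assuming the loader simply takes a flat directory of image files via --data_dir; the paths, target size, and filename handling below are assumptions, not confirmed details of this repository:)

import os
from PIL import Image

src_dir, dst_dir, size = "raw_images", "datasets/my_dataset_256", 256  # placeholder paths
os.makedirs(dst_dir, exist_ok=True)

# assumes src_dir contains only image files
for name in os.listdir(src_dir):
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    # center-crop to a square, then resize to the training resolution
    w, h = img.size
    s = min(w, h)
    img = img.crop(((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2))
    img = img.resize((size, size), Image.LANCZOS)
    img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".png"))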

Thank you for your contribution, but I can't make sense of the three curves in Figure 3.

Curve 1: Figure 3, left (signal-to-noise ratio (SNR) of linear and cosine noise schedules, for reference). The relationship is given in your paper:

[equation image omitted]

However, when I use the linear schedule locally, with Diffusion Steps = 1000 and the SNR computed by Equation 4, the curve looks like this:

[plots omitted]

Curve 2: Figure 3, right (weights of P2 weighting and the baseline with a linear schedule). The relationship is given in your paper:

[equation image omitted]

However, I used the linear schedule locally with Diffusion Steps = 1000, the SNR from Equation 4, the baseline weight from [equation image omitted], and the P2 weight from [equation image omitted], setting k = 1 and γ = 1.
Also, I understand that you used a normalization operation, so I scaled the weights to the range 0 to 1 using the maximum and minimum values.
However, the curve I get is this:

[plot omitted]

These curves do not match the results given in your paper at all. Did I do any step wrong? I am very interested in this article and look forward to your reply, thanks!
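(For reference, here is the computation described above in a self-contained form, so it is easier to pinpoint where it diverges from the paper; the linear-schedule endpoints 1e-4 and 0.02 are the usual defaults and are an assumption here, as is the max normalization:)

import numpy as np
import matplotlib.pyplot as plt

T = 1000
betas = np.linspace(1e-4, 0.02, T)               # linear schedule (assumed endpoints)
alphas_cumprod = np.cumprod(1.0 - betas)

snr = alphas_cumprod / (1.0 - alphas_cumprod)    # Equation 4
p2 = 1.0 / (1.0 + snr) ** 1.0                    # code's weight factor with k = 1, gamma = 1

plt.semilogy(np.arange(T), snr)                  # SNR vs. diffusion step
plt.show()
plt.plot(np.arange(T), p2 / p2.max())            # max-normalized P2 factor vs. diffusion step
plt.show()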

How many iterations are needed for your pretrained model?

Thanks for your work!

I want to train on my own dataset, but I notice that using model050000.pt gives bad results.
Could you tell me how many iterations are needed to reach results similar to your pretrained models?
And does your code support training on multiple GPUs?
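(If it helps: the upstream guided-diffusion README launches multi-GPU training via MPI, so if this repository inherits that behaviour, the launch would presumably look like the following, where -n sets the number of GPUs and the remaining flags are the same as for single-GPU training; this is an assumption, not a confirmed recipe:)

mpiexec -n 4 python scripts/image_train.py --data_dir ./data --image_size 256 (plus the remaining model/diffusion flags)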

ffhq train

I would like to know the --use_fp16 setting you used when training on FFHQ. I would be grateful if you could provide the FFHQ training settings in more detail than in the paper, in a form similar to the command below. I also wonder whether this setup is consistent with your training.

python scripts/image_train.py --data_dir ./data --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --lr 2e-5 --batch_size 8 --rescale_learned_sigmas True --p2_gamma 0.5 --p2_k 1 --log_dir logs 

train on FFHQ 512

Have you tried to train the model on FFHQ 512x512? Are there any suggested parameter settings?

About the formula in the appendix

Hi,
I would like to consult you about the formula in A.2 (Derivation) in the paper. Regarding "which is a differential of log-SNR(t) regarding time-step t": the derivative of log-SNR(t) should be SNR'(t)/SNR(t), not -SNR(t)/SNR'(t).
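(For reference, the chain rule gives d/dt log SNR(t) = SNR'(t) / SNR(t).)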

If I misunderstood, please correct me as soon as possible. Thank you very much.

Questions regarding training

First of all, thank you for the great work.

I have two questions.

1. I was training the model on my custom dataset and the training seems to go on forever. I have checkpoints up to model1530000.pt. Does this run forever? When does it stop? I have already been running it for 4-5 days on a single A100 GPU.

2. Also, it seems that I get three .pt files:
ema_0.9999_1530000.pt
model1530000.pt
opt1530000.pt

Which one should I use for inference?

Thank you in advance!
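(For what it's worth: in guided-diffusion style repositories the EMA checkpoint is typically the one used for sampling, with opt*.pt only needed to resume training. A hedged example of how sampling is usually launched, with a hypothetical path and example values:

python scripts/image_sample.py --model_path logs/ema_0.9999_1530000.pt --image_size 256 --num_samples 100 --batch_size 4 (plus the same model/diffusion flags used for training)

This is an assumption about this repository's workflow, not a confirmed answer.)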
