Hi, I am also confused about the weight initialization in different implementations.
Each implementation has its own initialization style
In the official DDPM repo, the convs before residual connections and the final conv are initialized with zeros, while other convs are initialized with zero-mean uniform distributions.
In the ADM guided-diffusion repo, the convs before residual connections and the final conv are also initialized with zeros, while others are initialized by PyTorch default.
In the Score-Based SDE repo, the implementation covers both DDPM-style and NCSN-style initialization.
In this repo, I think the initialization is similar to the Score-Based SDE one, but it still differs from the three codebases mentioned above.
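To put rough numbers on the gap between "PyTorch default" and a zero-mean uniform scheme, here is a small sketch comparing the uniform bounds for a single conv layer. It assumes the DDPM-style default is variance scaling with fan_avg and a uniform distribution (my reading of that code, not something stated above), and the 128-channel 3x3 layer and the helper names are just illustrative choices.

```python
import math

def conv_fans(in_channels, out_channels, kernel_size):
    """Fan-in and fan-out of a square 2D convolution."""
    receptive_field = kernel_size * kernel_size
    return in_channels * receptive_field, out_channels * receptive_field

def pytorch_default_bound(fan_in):
    # nn.Conv2d / nn.Linear weights default to kaiming_uniform_(a=sqrt(5)),
    # which reduces to U(-1/sqrt(fan_in), 1/sqrt(fan_in)).
    return 1.0 / math.sqrt(fan_in)

def fan_avg_uniform_bound(fan_in, fan_out, scale=1.0):
    # ASSUMPTION: DDPM-style default = variance scaling, mode "fan_avg",
    # uniform distribution. Variance is scale / fan_avg, so the uniform
    # bound is sqrt(3 * scale / fan_avg).
    fan_avg = (fan_in + fan_out) / 2.0
    return math.sqrt(3.0 * scale / fan_avg)

# Hypothetical layer: 3x3 conv, 128 -> 128 channels.
fan_in, fan_out = conv_fans(128, 128, 3)
default_bound = pytorch_default_bound(fan_in)
ddpm_style_bound = fan_avg_uniform_bound(fan_in, fan_out)

print(f"PyTorch default bound: {default_bound:.4f}")
print(f"fan_avg uniform bound: {ddpm_style_bound:.4f}")
print(f"ratio: {ddpm_style_bound / default_bound:.3f}")
```

Under these assumptions, when fan_in equals fan_out the fan_avg uniform weights are wider than the PyTorch default by a factor of sqrt(3), so the two schemes start from noticeably different weight scales even though both are zero-mean uniform.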
My experiments and observations
Recently, I tried to train diffusion models (DDPM, DDIM, EDM, ...) with the original basic UNet (35.7M parameters) on CIFAR-10. Here are some observations:
- I can successfully reproduce the FIDs reported by DDPM and DDIM without any custom weight initialization. All parameters are initialized by PyTorch default.
- However, my optimal learning rate differs from the one in the official repo (1e-4 vs 2e-4). When I tried the official one (2e-4), the FID got far worse.
- I trained the EDM model with my 35.7M mini network (no custom initialization) and my own learning rate, and the results are reasonable (better than DDIM).
- However, when I trained with the learning rate proposed by EDM (10e-4), the FID got far worse. To confirm this, I replaced networks.py with my network and ran the official EDM code; the FID was still bad.
It seems that the mathematical diffusion model (training + sampler) can be decoupled as an independent component, but the neural network (and its initialization) may be strongly coupled with the hyper-parameters (?).
I wonder whether this is really the case, and why the initialization and hyper-parameters matter so much.