Comments (4)
Hi @Liujingxiu23.
I think a well-chosen noise schedule is significantly dataset-dependent only at extremely small numbers of iterations.
The authors note in the paper that to get good audio reconstruction quality with fewer than 1000 iterations, you should start the noise schedule with small beta values, since those have the most impact on removing static noise. With that in mind, to extract the "pretrained" 12-, 25-, 50- and 100-iteration schedules I used an exponential-type approach (see the 25-iteration graph I attached). Since during training the schedule is always the constant `linspace(1e-6, 1e-2, steps=1000)`, it doesn't matter what kind of data you train on: the lower-iteration denoising trajectory will always be more or less the same, thanks to the strong conditioning on the mel-spectrogram and the direct noise level.
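For illustration, a schedule of that exponential type can be sketched roughly like this (a minimal sketch of the idea with illustrative endpoints, not the exact pretrained schedules):

```python
# Rough sketch of an exponential-type low-iteration beta schedule
# (endpoints are illustrative; the shipped pretrained schedules differ).
import numpy as np

def exponential_betas(n_iter, beta_start=1e-6, beta_end=1e-2):
    # Log-spaced betas: the schedule starts with very small values,
    # which matter most for removing static noise.
    return np.exp(np.linspace(np.log(beta_start), np.log(beta_end), n_iter))

betas_25 = exponential_betas(25)
```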
Of course, at 6 iterations I assume it wouldn't work so well on a new dataset, since 6 points is a very small number for reconstructing the right trajectory.
This is my view.
@ivanvovk Thank you very much for your reply.
The following is my understanding; could you help me check whether it is right?
1. The noise schedule is designed rather than trained, and the trajectory is more or less the same across iteration counts. If I want good results with iters=6 on my own dataset, I should try different beta settings whose trajectory is similar to that of the other iteration schemes.
2. At the inference stage, the iterations run from 25, 24, 23, ... down to 1; the noise levels are 0.3096, 0.5343, 0.6959, ..., 1.0, and the corresponding beta values are 0.66428, 0.41055, 0.25373, ..., 0.000007 (from large to small). Does that mean that, going from pure Gaussian noise to a coarse wave and then to a refined wave, the injected noise goes from large to small, following the beta values?
There is another question about the generation of waves:
In the paper, once y_recon is obtained, y_{t-1} is computed as follows.
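The original equation image is not shown here; written out, it should be the standard DDPM posterior mean and variance of q(y_{t-1} | y_t, y_0), with the predicted y_recon in place of y_0:

```latex
y_{t-1} \sim \mathcal{N}\!\left(\tilde{\mu}_t,\ \tilde{\beta}_t I\right), \quad
\tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\, y_{\mathrm{recon}}
 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, y_t, \quad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t
```

where \bar{\alpha}_t = \prod_{s \le t} \alpha_s.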
In the lmnt-com version of the code, the corresponding part is simple:

```python
sigma = ((1.0 - alpha_cum[n-1]) / (1.0 - alpha_cum[n]) * beta[n])**0.5
audio += sigma * noise
```
But in this version, not only the log-variance but also the mean is computed and used to obtain the new sample:

```python
model_mean, model_log_variance = self.p_mean_variance(mels, y, t, clip_denoised)
eps = torch.randn_like(y) if t > 0 else torch.zeros_like(y)
return model_mean + eps * (0.5 * model_log_variance).exp()
```
I do not understand what the mean and log-variance are, or why you compute the sample in this way.
@Liujingxiu23 Commenting on your questions:
- Once again, the noise schedule (betas) is set to a constant 1000 values from 1e-6 to 1e-2 (in diffusion order). It sets the noise levels with which we destroy the original distribution during the forward diffusion process. At the training stage, conditioned on the mel-spectrogram and these noise levels, the model learns the perturbations made to the data point by approximating the exact injected noise. During inference we want to refine the reverse trajectory of this destruction. Basically, the trajectory variance is linear (during training we set the betas to `linspace(1e-6, 1e-2, 1000)`). In practice, when constructing lower-iteration schemes, it turns out that for good perceptual quality of waveform restoration the diffusion noise schedule should start with small betas (which is why I built the schedules in an exponential way). Even so, 6 iterations is a very small number for reconstructing the fine-grained structure of a waveform; however, the authors show that if you run a grid search, you may find a suitable schedule.
- I didn't quite get this question. `betas` are ordered in ascending manner for the diffusion process (descending for generation). `alphas` are computed as `1 - betas`, so they are in descending order for diffusion (ascending for generation). Noise levels are computed as `sqrt(alphas_cumprod)`; the cumulative product doesn't change the order, since the `alphas` lie in the range [0, 1], and the square root is monotonic, so the noise levels, like the `alphas`, are ordered in descending manner for diffusion (ascending for generation). See the sketch after this list.
- To reconstruct the reverse denoising process, you need to know the Gaussian transitions of the diffusion process: their mean and variance. Thanks to the model architecture, you can do this analytically by estimating the denoising posteriors q(y_{t-1} | y_t, y_0). The lmnt-com code base is equivalent: they wrote it as a single formula, the way the paper suggests, but that syntax hides the probabilistic logic. See the original DDPM paper and the issue created by the main WaveGrad author, Nanxin Chen.
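To make the ordering concrete, here is a minimal sketch (my own illustration, not a snippet from either repo); it also checks numerically that the lmnt-com `sigma` equals the posterior standard deviation used in the mean/variance version:

```python
# Minimal sketch of the ordering above and of why the two code versions agree.
import torch

betas = torch.linspace(1e-6, 1e-2, 1000)       # ascending in diffusion order
alphas = 1.0 - betas                           # hence descending
alphas_cumprod = torch.cumprod(alphas, dim=0)  # factors in (0, 1): still descending
noise_levels = alphas_cumprod.sqrt()           # sqrt is monotonic: still descending

# Posterior variance of q(y_{t-1} | y_t, y_0) at some interior step n:
n = 500
posterior_var = (1.0 - alphas_cumprod[n - 1]) / (1.0 - alphas_cumprod[n]) * betas[n]

# lmnt-com's sigma is exactly its square root, so `sigma * noise` there
# matches `eps * (0.5 * model_log_variance).exp()` here.
sigma = posterior_var.sqrt()
assert torch.isclose(sigma, (0.5 * posterior_var.log()).exp())
```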
Closing this issue due to inactivity. Feel free to open a new issue or reopen this one if you have any other questions.