
Comments (4)

ivanvovk commented on June 22, 2024

Hi @Liujingxiu23.

I think a well-chosen noise schedule is significantly dataset-dependent only at an extremely small number of iterations.

The authors note in the paper that to get good audio reconstruction quality with fewer than 1000 iterations, you should start the noise schedule with small beta values, since those have the most impact on removing static noise. With that in mind, to extract the "pretrained" 12-, 25-, 50- and 100-iteration schemes I used an exponential-type approach (see the 25-iteration graph I attached). Since during training you always set the schedule to a constant linspace(1e-6, 1e-2, steps=1000), it doesn't matter what type of data you train on: the lower-iteration denoising trajectory will always be more or less the same, thanks to the strong conditioning on the mel-spectrogram and the explicit noise level.
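A minimal sketch of the two schedules described above, assuming NumPy and using geomspace as one possible "exponential-type" construction (the exact schedule I used may differ):

```python
import numpy as np

# Training-time schedule: 1000 linearly spaced betas, as described above.
betas_train = np.linspace(1e-6, 1e-2, 1000)

# Hypothetical exponential-type 25-iteration schedule: betas grow
# geometrically from a small starting value, so the first reverse steps
# use small betas and mostly remove static noise.
n_iters = 25
betas_fast = np.geomspace(1e-6, 1e-1, n_iters)

# Noise levels (sqrt of the cumulative alpha product) trace the trajectory:
# they start near 1.0 and decrease monotonically.
alphas = 1.0 - betas_fast
noise_levels = np.sqrt(np.cumprod(alphas))
```

The endpoints (1e-6, 1e-1) for the fast schedule are illustrative; in practice they would be tuned per iteration count.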

Of course, with 6 iterations I assume it wouldn't work as well on a new dataset, since 6 points is a very small number for reconstructing the right trajectory.

This is my view.

from wavegrad.

Liujingxiu23 commented on June 22, 2024

@ivanvovk Thank you very much for your reply.
The following is my understanding; could you help me check whether it is right?
1. The noise schedule is designed, not trained, along a trajectory that is more or less the same. If I want a good result with iters=6 on my own database, I should try different beta settings that produce a trajectory similar to the other iteration counts.
2. At inference, the iteration runs from iter 25, 24, 23, ... down to 1; the noise level goes 0.3096, 0.5343, 0.6959, ..., 1.0, and the corresponding beta values are 0.66428, 0.41055, 0.25373, ..., 0.000007 (from large to small).
Does that mean that, going from pure Gaussian noise to a coarse wave and then to a refined wave, the injected noise goes from large to small, following the beta values?

There is another question about the generation of waves:
In the paper, once y_recon is obtained, y_t is computed as:
[screenshot of the paper's equation for computing y_{t-1} from the reconstruction]
In the lmnt-com code version, the related code is simply:
sigma = ((1.0 - alpha_cum[n-1]) / (1.0 - alpha_cum[n]) * beta[n])**0.5
audio += sigma * noise

But in this version, not only the log-variance but also the mean value is computed and used to get the new y_t:
model_mean, model_log_variance = self.p_mean_variance(mels, y, t, clip_denoised)
eps = torch.randn_like(y) if t > 0 else torch.zeros_like(y)
return model_mean + eps * (0.5 * model_log_variance).exp()

I do not understand what the mean and log-variance are, and why you compute y_t in this way.


ivanvovk commented on June 22, 2024

@Liujingxiu23 Answering your questions:

  1. Once again, the noise schedule (betas) is set to a constant 1000 values from 1e-6 to 1e-2 (in diffusion order). It sets the noise levels with which we destroy the original distribution during the forward diffusion process. At the training stage, conditioned on the mel-spectrogram and these noise levels, the model learns the perturbations made to a data point by approximating the exact injected noise. During inference we want to refine the reverse trajectory of that destruction. Basically, the trajectory variance is linear (during training we set the betas to linspace(1e-6, 1e-2, 1000)). In practice, when constructing lower-iteration schemes, it turns out that for good perceptual quality of the restored waveform the diffusion noise schedule should start with small betas (which is why I built the schedules in an exponential way). Even so, 6 iterations is a very small number for reconstructing the fine-grained structure of a waveform; however, the authors show that if you run a grid search, you may find a suitable one.

  2. I didn't quite get this question. The betas are in ascending order for the diffusion process (descending for generation). The alphas are computed as 1 - betas, so they are in descending order for diffusion (ascending for generation). The noise levels are computed as sqrt(alphas_cumprod); the cumulative product doesn't change the ordering, since the alphas lie in [0, 1], and sqrt preserves order as well, so the noise levels, like the alphas, are in descending order for diffusion (ascending for generation).
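The orderings above can be verified directly with a short NumPy check over the constant training schedule:

```python
import numpy as np

betas = np.linspace(1e-6, 1e-2, 1000)       # ascending in diffusion order
alphas = 1.0 - betas                        # therefore descending
noise_levels = np.sqrt(np.cumprod(alphas))  # also descending: every factor is in (0, 1)

assert np.all(np.diff(betas) > 0)           # betas ascend
assert np.all(np.diff(alphas) < 0)          # alphas descend
assert np.all(np.diff(noise_levels) < 0)    # noise levels descend with them
```

Read in generation order (from the last index back to the first), all three sequences simply reverse.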

  3. To reconstruct the reverse denoising process, you need to know the Gaussian transitions of the diffusion process: their mean and variance. Thanks to the model architecture, you can do that analytically by estimating the denoising posteriors q(y_{t-1} | y_t, y_0). The code base of the lmnt-com guys is equivalent; they wrote it as a single formula, as the paper suggests, but that syntax hides the probabilistic logic. See the original DDPM paper and the issue created by the main WaveGrad author, Nanxin Chen.
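A quick numerical sketch of the equivalence: the lmnt-com sigma formula quoted above is exactly the square root of the DDPM posterior variance of q(y_{t-1} | y_t, y_0), so exp(0.5 * model_log_variance) yields the same noise scale (the step index here is arbitrary, for illustration):

```python
import numpy as np

betas = np.linspace(1e-6, 1e-2, 1000)
alphas = 1.0 - betas
alpha_cum = np.cumprod(alphas)

t = 500  # arbitrary interior step, for illustration

# lmnt-com style: noise scale written as a single formula.
sigma = ((1.0 - alpha_cum[t - 1]) / (1.0 - alpha_cum[t]) * betas[t]) ** 0.5

# Posterior style: variance of q(y_{t-1} | y_t, y_0), kept in log space
# as p_mean_variance does.
posterior_variance = betas[t] * (1.0 - alpha_cum[t - 1]) / (1.0 - alpha_cum[t])
model_log_variance = np.log(posterior_variance)

# The two noise scales coincide: sigma == exp(0.5 * log_variance).
assert np.isclose(sigma, np.exp(0.5 * model_log_variance))
```

The model_mean term plays the role of the deterministic part of the update (the `audio` variable before noise is added in the lmnt-com snippet); only the bookkeeping differs.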


ivanvovk commented on June 22, 2024

Closing the issue due to inactivity. Feel free to open a new issue or reopen this one if you have further questions.

