Comments (9)

ivanvovk commented on June 26, 2024

@enhuiz thanks for the feedback once again. I agree with you that it should be exponents = 1e-4 ** exponents! However, I think the model can still interpret it to some extent, since we concatenate sin and cos, which gives a more or less unique (locally) encoding for noise levels in the range [0, 1]. But obviously, what happens currently is not good. Please report back on your experiment with the second approach! I will try it too.
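A minimal sketch of the noise-level encoding discussed above, written in plain Python rather than the repository's PyTorch so that it runs standalone. The frequencies use the agreed 1e-4 ** exponents form with exponents = i / half_dim, and the sin and cos halves are concatenated. The function name, n_channels and linear_scale (5000 here) are hypothetical choices for illustration, not values confirmed in this thread.

```python
import math


def noise_level_encoding(noise_level, n_channels=8, linear_scale=5000.0):
    """Encode a scalar noise level in [0, 1] as [sin(...), cos(...)].

    Hypothetical helper: n_channels and linear_scale are illustrative
    assumptions, not values taken from the thread.
    """
    half_dim = n_channels // 2
    # exponents = i / half_dim; 1e-4 ** exponents gives frequencies that
    # decay geometrically from 1.0 towards 1e-4.
    freqs = [1e-4 ** (i / half_dim) for i in range(half_dim)]
    args = [linear_scale * noise_level * f for f in freqs]
    # Concatenate the sin half and the cos half of the encoding.
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]
```

At noise level 0.0 every argument is zero, so the encoding is four zeros (sin) followed by four ones (cos); nearby noise levels such as 0.3 and 0.31 already map to clearly different vectors because of the large linear scale.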

from wavegrad.

enhuiz commented on June 26, 2024

Hi @ivanvovk, I have tried fixing the positional encoding and retraining the model. Although grad_norm got lower, the test results seem much worse (judging from both the test loss curve and the generated samples).

[Image: loss curves]

I suspected there could be other issues, so I checked the rest of the code and found a mismatch between the implementation and the paper (formula 11):

outputs = continuous_sqrt_alpha_cumprod * y_0 + (1 - continuous_sqrt_alpha_cumprod**2) * eps

A sqrt() seems to be missing here. I'll fix it and try again.
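A minimal plain-Python sketch of the corrected sampling step, keeping the thread's name continuous_sqrt_alpha_cumprod for the scalar sqrt(alpha_bar); the helper names q_sample and output_variance are hypothetical. The second function illustrates why the missing sqrt() matters: with unit-variance y_0 and independent standard normal eps, the corrected coefficients keep the output variance at exactly 1, while the (1 - a**2) coefficient from the snippet above shrinks it.

```python
import math


def q_sample(y_0, continuous_sqrt_alpha_cumprod, eps):
    """Element-wise sqrt(a_bar) * y_0 + sqrt(1 - a_bar) * eps."""
    a = continuous_sqrt_alpha_cumprod  # a = sqrt(a_bar)
    noise_scale = math.sqrt(1.0 - a ** 2)  # the sqrt() that was missing
    return [a * y + noise_scale * e for y, e in zip(y_0, eps)]


def output_variance(continuous_sqrt_alpha_cumprod, fixed=True):
    """Output variance for unit-variance y_0 and independent N(0, 1) eps."""
    a = continuous_sqrt_alpha_cumprod
    # fixed=False reproduces the coefficient without sqrt() from the snippet.
    noise_scale = math.sqrt(1.0 - a ** 2) if fixed else (1.0 - a ** 2)
    return a ** 2 + noise_scale ** 2
```

For example, with continuous_sqrt_alpha_cumprod = 0.6 the corrected version gives a variance of 0.36 + 0.64 = 1.0, whereas the version without sqrt() gives 0.36 + 0.64² = 0.7696.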

ivanvovk commented on June 26, 2024

@enhuiz yeah, I fixed the PE and got the same problems. And you're right, the sqrt() is missing here; it needs to be fixed as well.

ivanvovk commented on June 26, 2024

@enhuiz it seems that for me the sqrt() fix solves the problem, and the test samples now look good. How does it work for you?

enhuiz commented on June 26, 2024

[Image: loss curves after the sqrt() fix]

The loss curve looks better than the previous one: l1_spec_test_batch_loss and the total loss are lower, while l1_test_batch_loss is higher, which is acceptable since it is measured on the raw waveform. The training grad norm and total loss are both lower.

I think I'm still at an early stage of training. I use niters=1000 for training and niters=50 for testing, and the audio quality of the fixed version does not seem significantly better than the previous one.

samples-at-12k.zip

enhuiz commented on June 26, 2024

noise_level = torch.FloatTensor([self.sqrt_alphas_cumprod_prev[t]]).repeat(batch_size, 1).to(mels)

I found that changing this t to t+1 helps remove the noise in the generated samples after fixing the PE and the sqrt(); you can check the following samples:

samples-at-12k.zip

I think we need the current sqrt cumprod here instead of the previous one.
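A small sketch of why t + 1 picks the current value, assuming (this construction is not shown in the thread) that sqrt_alphas_cumprod_prev is built by prepending 1.0 to the cumulative product of alphas before taking the square root. Under that assumption, entry t holds the previous step's value and entry t + 1 holds the current step's sqrt(alpha_cumprod). The helper names and the toy beta schedule are hypothetical, and plain nested lists stand in for the (batch_size, 1) tensor produced by .repeat(batch_size, 1).

```python
import math


def build_schedules(betas):
    """Return (alphas_cumprod, sqrt_alphas_cumprod_prev) for a beta schedule."""
    alphas_cumprod = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    # Assumed construction: prepend 1.0, so index 0 means "no noise yet"
    # and every later entry is shifted one position to the right.
    sqrt_alphas_cumprod_prev = [math.sqrt(x) for x in [1.0] + alphas_cumprod]
    return alphas_cumprod, sqrt_alphas_cumprod_prev


def noise_level_for_step(sqrt_alphas_cumprod_prev, t, batch_size):
    """Per-example noise level of shape (batch_size, 1) for step t.

    Uses index t + 1, i.e. the current step's sqrt cumprod.
    """
    return [[sqrt_alphas_cumprod_prev[t + 1]] for _ in range(batch_size)]
```

With the shifted array, indexing by t would feed the model the noise level of step t - 1, which matches the mismatch described above.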

ivanvovk commented on June 26, 2024

@enhuiz yes, I agree! For me this change improved the quality even more! Somehow I missed it when writing the implementation... Thanks for revealing all these bugs, man, I really appreciate it. Now it seems to work fine.

enhuiz commented on June 26, 2024

@ivanvovk Good to know, you're welcome!

ivanvovk commented on June 26, 2024

Closing the issue since it is solved.
