Comments (6)

ivanvovk commented on June 22, 2024

@janvainer Hey, thanks, man. Yeah, the samples are of somewhat lower quality than those presented on the paper's demo page. However, the authors trained on their personal proprietary dataset, where the female speaker had a much lower pitch than Linda (it is always hard to train on LJ). I also noticed that the fewer iterations you run, the less accurately the model reconstructs the higher frequencies. But I also think there might be some issues in the diffusion calculations. I can suggest looking at lucidrains' code and reusing the forward and backward DDPM calculations with the improved cosine schedule (maybe this can help): https://github.com/lucidrains/denoising-diffusion-pytorch. His repo follows the paper https://arxiv.org/pdf/2102.09672.pdf. I am going to return to this WaveGrad repo and finally get it to its best quality once all my other projects are finished, but that will probably be delayed until summer. Also, you can check Mozilla's TTS library; I remember some people there were interested in WaveGrad and even added it to their codebase: https://github.com/mozilla/TTS. Hope it helps.
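
(For reference, the improved cosine noise schedule from the paper linked above can be sketched roughly like this in PyTorch. This is a minimal illustration under the discrete-DDPM formulation, not code taken from lucidrains' repo or from this WaveGrad implementation, and the function and variable names are hypothetical.)

```python
# Sketch of the cosine schedule from "Improved Denoising Diffusion
# Probabilistic Models" (https://arxiv.org/abs/2102.09672).
# Names here are illustrative, not taken from either repository.
import math
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    # alpha_bar(t) follows a squared-cosine curve; betas come from the ratio
    # of consecutive alpha_bar values and are clipped to avoid singularities.
    t = torch.linspace(0, timesteps, timesteps + 1) / timesteps
    alphas_bar = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    alphas_bar = alphas_bar / alphas_bar[0]
    betas = 1 - alphas_bar[1:] / alphas_bar[:-1]
    return betas.clamp(0, 0.999)

def q_sample(x0: torch.Tensor, t: int, alphas_bar: torch.Tensor) -> torch.Tensor:
    # Forward diffusion q(x_t | x_0): scale the clean waveform by
    # sqrt(alpha_bar_t) and add Gaussian noise scaled by sqrt(1 - alpha_bar_t).
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise

# Usage: derive the cumulative products the forward process needs.
betas = cosine_beta_schedule(1000)
alphas_bar = torch.cumprod(1 - betas, dim=0)  # cumulative noise level per step
```

(Note that WaveGrad itself conditions on a continuous noise level rather than a discrete step index, so a schedule like this would need to be adapted when plugging it into the forward/backward calculations.)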

janvainer commented on June 22, 2024

Thanks for the swift response :) I will check the diffusion calculations. I also tried the Mozilla version, but the quality of the synthesized audio seemed a bit lower to me, at least for the WaveGrad vocoder combined with Tacotron 2. There is this weird high-frequency noise.

On a side note, I am getting an increasing L1 test batch loss while the L1 spectrogram test batch loss is going down. Did you experience the same behavior?

[Image: training curves showing the waveform L1 test batch loss increasing while the spectrogram L1 test batch loss decreases.]

ivanvovk commented on June 22, 2024

@janvainer Yes, actually, I remember that in my experiments the waveform loss was not representative at all; the spectral loss was more informative. I think such behavior is okay, so don't pay attention to it.
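
(To illustrate why the two curves can diverge: an L1 on raw samples is very sensitive to small phase or time shifts that are barely audible, while an L1 on log-mel magnitudes discards phase and tracks perceived quality more closely. A minimal sketch with hypothetical names, not this repo's actual loss code:)

```python
# Illustrative sketch only; names and parameters are hypothetical.
import torch
import torchaudio

def waveform_l1(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # L1 on raw samples: a tiny, inaudible time/phase shift can inflate this.
    return (pred - target).abs().mean()

def spectral_l1(pred: torch.Tensor, target: torch.Tensor,
                sample_rate: int = 22050, n_mels: int = 80) -> torch.Tensor:
    # L1 on log-mel magnitudes: phase-insensitive, closer to perceived quality.
    # (In practice the MelSpectrogram transform would be created once, not per call.)
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=n_mels)
    eps = 1e-5
    return (torch.log(mel(pred) + eps) - torch.log(mel(target) + eps)).abs().mean()
```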

janvainer commented on June 22, 2024

Ok thanks! :)

yijingshihenxiule commented on June 22, 2024

Hello, @janvainer! I have just trained the model and the audio samples are still very noisy (approx. 12 hours, 25K epochs on a single GPU, batch size 96). Could you show me your training results? And when should the samples start to sound good? Thanks!

janvainer commented on June 22, 2024

Hi, unfortunately I no longer have the results, but I remember training on 4 GPUs for several days.
