Comments (6)

ivanvovk commented on June 22, 2024

@janvainer Hey, thanks, man. Yeah, the samples are of somewhat lower quality than those presented on the paper's demo page. However, the authors trained on their personal proprietary dataset, where the female speaker had a much lower pitch than Linda (it is always hard to train on LJ). I also noticed that the fewer iterations you run, the less accurately the model reconstructs the higher frequencies. But I also think there might be some issues in the diffusion calculations. I can suggest looking at lucidrains' code and reusing the forward and backward DDPM calculations with the improved cosine schedule (maybe this can help): https://github.com/lucidrains/denoising-diffusion-pytorch. His repo follows the paper https://arxiv.org/pdf/2102.09672.pdf. I am going to return to this WaveGrad repo and finally get it to its best quality once all my other projects are finished, but that will probably be delayed until summer. Also, you can check Mozilla's TTS library; I remember some people there were interested in WaveGrad and even added it to their codebase: https://github.com/mozilla/TTS. Hope it helps.
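
(For reference, the improved cosine noise schedule from the paper linked above can be sketched roughly like this in PyTorch. This is a minimal illustration under the discrete-DDPM formulation, not code taken from lucidrains' repo or from this WaveGrad implementation, and the function and variable names are hypothetical.)

```python
# Sketch of the cosine schedule from "Improved Denoising Diffusion
# Probabilistic Models" (https://arxiv.org/abs/2102.09672).
# Names here are illustrative, not taken from either repository.
import math
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    # alpha_bar(t) follows a squared-cosine curve; betas come from the ratio
    # of consecutive alpha_bar values and are clipped to avoid singularities.
    t = torch.linspace(0, timesteps, timesteps + 1) / timesteps
    alphas_bar = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    alphas_bar = alphas_bar / alphas_bar[0]
    betas = 1 - alphas_bar[1:] / alphas_bar[:-1]
    return betas.clamp(0, 0.999)

def q_sample(x0: torch.Tensor, t: int, alphas_bar: torch.Tensor) -> torch.Tensor:
    # Forward diffusion q(x_t | x_0): scale the clean waveform by
    # sqrt(alpha_bar_t) and add Gaussian noise scaled by sqrt(1 - alpha_bar_t).
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise

# Usage: derive the cumulative products the forward process needs.
betas = cosine_beta_schedule(1000)
alphas_bar = torch.cumprod(1 - betas, dim=0)  # cumulative noise level per step
```

(Note that WaveGrad itself conditions on a continuous noise level rather than a discrete step index, so a schedule like this would need to be adapted when plugging it into the forward/backward calculations.)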

janvainer commented on June 22, 2024

Thanks for the swift response :) I will check the diffusion calculations. I also tried the Mozilla version, but the quality of the synthesized audio seemed a bit lower to me, at least for the WaveGrad vocoder combined with Tacotron 2. There is this weird high-frequency noise.

On a side note, I am getting an increasing L1 test batch loss while the L1 spectrogram test batch loss is going down. Did you experience the same behavior?

[Image: training curves showing the waveform L1 test batch loss increasing while the spectrogram L1 test batch loss decreases.]

ivanvovk commented on June 22, 2024

@janvainer Yes, actually, I remember that in my experiments the waveform loss was not representative at all; the spectral loss was more informative. I think such behavior is okay, so don't pay attention to it.
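
(To illustrate why the two curves can diverge: an L1 on raw samples is very sensitive to small phase or time shifts that are barely audible, while an L1 on log-mel magnitudes discards phase and tracks perceived quality more closely. A minimal sketch with hypothetical names, not this repo's actual loss code:)

```python
# Illustrative sketch only; names and parameters are hypothetical.
import torch
import torchaudio

def waveform_l1(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # L1 on raw samples: a tiny, inaudible time/phase shift can inflate this.
    return (pred - target).abs().mean()

def spectral_l1(pred: torch.Tensor, target: torch.Tensor,
                sample_rate: int = 22050, n_mels: int = 80) -> torch.Tensor:
    # L1 on log-mel magnitudes: phase-insensitive, closer to perceived quality.
    # (In practice the MelSpectrogram transform would be created once, not per call.)
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=n_mels)
    eps = 1e-5
    return (torch.log(mel(pred) + eps) - torch.log(mel(target) + eps)).abs().mean()
```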

janvainer commented on June 22, 2024

Ok thanks! :)

yijingshihenxiule commented on June 22, 2024

Hello, @janvainer! I have just trained the model and the audio samples are still very noisy (approx. 12 hours, 25K epochs on a single GPU, batch size 96). Could you show me your training results? And when should the samples start to sound good? Thanks!

janvainer commented on June 22, 2024

Hi, unfortunately I no longer have the results, but I remember training on 4 GPUs for several days.
