Exponents calculation in positional encoding about wavegrad HOT 9 CLOSED

ivanvovk commented on June 26, 2024

from wavegrad.

Comments (9)

ivanvovk commented on June 26, 2024

@enhuiz thanks for another piece of feedback. I agree that it should be exponents = 1e-4 ** exponents! However, I think the current encoding is still interpretable for the model to some extent, since we concatenate sin and cos, which gives a more or less individual (locally) encoding for noise levels in the range [0, 1]. But obviously, what happens currently is not good. Please report on your experiment with the second approach! I will try it too.
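For reference, here is a minimal pure-Python sketch of a sinusoidal encoding of a scalar noise level using the corrected line `exponents = 1e-4 ** exponents`. The function name, the `linear_scale` value, and the list-based layout are assumptions made for illustration; this is not the repository's actual module.

```python
import math


def positional_encoding(noise_level, n_channels, linear_scale=5000.0):
    """Encode a scalar noise level in [0, 1] as sin/cos features.

    `linear_scale` is an assumed value chosen only for this sketch.
    """
    half_dim = n_channels // 2
    # Normalized exponents in [0, 1), one per frequency channel.
    exponents = [i / half_dim for i in range(half_dim)]
    # Corrected form from the thread: frequencies decay geometrically
    # from 1 towards 1e-4 as the channel index grows.
    frequencies = [1e-4 ** e for e in exponents]
    angles = [linear_scale * noise_level * f for f in frequencies]
    # Concatenate the sin and cos halves, as described in the comment above.
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


encoding = positional_encoding(0.5, n_channels=8)
```

With `n_channels=8` this yields four sin features followed by four cos features, so nearby noise levels get distinct but smoothly varying codes.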

enhuiz commented on June 26, 2024

Hi @ivanvovk, I have tried fixing the positional encoding and retraining the model. Although the grad_norm gets lower, the test results seem much worse (judging from both the test loss curve and the generated samples).

I guessed there could be some other issues, so I checked the other parts of the code and found a mismatch between the implementation and the paper (formula 11).

WaveGrad/model/diffusion_process.py

Line 103 in d230621

 outputs = continuous_sqrt_alpha_cumprod * y_0 + (1 - continuous_sqrt_alpha_cumprod**2) * eps 

A sqrt() seems to be missing here. I'll fix it and try again.
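To make the mismatch concrete: the standard forward diffusion step scales the noise by the square root of one minus the cumulative alpha product, so the signal and noise coefficients satisfy signal² + noise² = 1. Below is a small pure-Python sketch of that step; the function name `q_sample` and the list-based inputs are hypothetical and stand in for the tensor code quoted above.

```python
import math


def q_sample(y_0, eps, sqrt_alpha_cumprod):
    """Diffuse clean samples y_0 with noise eps at a given noise level.

    `sqrt_alpha_cumprod` plays the role of continuous_sqrt_alpha_cumprod.
    """
    # The noise coefficient is sqrt(1 - alpha_cumprod); the quoted line
    # used (1 - alpha_cumprod) without the square root.
    noise_scale = math.sqrt(1.0 - sqrt_alpha_cumprod ** 2)
    return [sqrt_alpha_cumprod * y + noise_scale * e for y, e in zip(y_0, eps)]


# Illustrative values only: a signal scale of 0.6 gives a noise scale of about 0.8.
noisy = q_sample([1.0, -1.0], [1.0, 1.0], sqrt_alpha_cumprod=0.6)
```

Without the square root the noise scale in this example would be 0.64 instead of 0.8, so the model would be trained on inputs that are less noisy than the noise level it is conditioned on suggests.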

ivanvovk commented on June 26, 2024

@enhuiz yeah, I fixed the PE and got the same problems. And yeah, you're right, sqrt() is missing here; that needs to be fixed as well.

ivanvovk commented on June 26, 2024

@enhuiz it seems that for me the sqrt() update solves the problem, and now the test samples look good. How does it work for you?

enhuiz commented on June 26, 2024

The loss curve looks better than the previous one: l1_spec_test_batch_loss and the total loss are lower, while l1_test_batch_loss is higher, which is acceptable as it is measured on the audio waveform. Training grad and total loss are both lower.

I think I'm still at an early stage. I use niters=1000 to train and niters=50 to test; the audio quality of the fixed version does not seem significantly better than the previous one.

samples-at-12k.zip

enhuiz commented on June 26, 2024

WaveGrad/model/diffusion_process.py

Line 116 in d230621

 noise_level = torch.FloatTensor([self.sqrt_alphas_cumprod_prev[t]]).repeat(batch_size, 1).to(mels) 

I find that changing this t to t+1 helps remove the noise in the generated samples after fixing the PE and the sqrt. You can check the following samples:

samples-at-12k.zip

I guess we need the current sqrt cumprod here instead of the previous one.
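The off-by-one can be illustrated with a small pure-Python sketch. It assumes that `sqrt_alphas_cumprod_prev` is the `sqrt_alphas_cumprod` array with a leading 1.0 prepended, which is a common layout but is not confirmed in this thread; the beta schedule is made up for illustration.

```python
import math

# Hypothetical three-step noise schedule, for illustration only.
betas = [0.1, 0.2, 0.3]

# Cumulative product of (1 - beta) up to and including each step.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)

sqrt_alphas_cumprod = [math.sqrt(a) for a in alphas_cumprod]
# Assumed layout: entry t holds the value for step t - 1, with 1.0 at index 0.
sqrt_alphas_cumprod_prev = [1.0] + sqrt_alphas_cumprod

t = 1
previous_level = sqrt_alphas_cumprod_prev[t]      # noise level of step t - 1
current_level = sqrt_alphas_cumprod_prev[t + 1]   # noise level of step t
```

Under that assumption, indexing with `t` conditions the model on the previous step's noise level, while `t + 1` picks the level that matches the sample currently being denoised.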

ivanvovk commented on June 26, 2024

@enhuiz yes, I agree! For me this change improved the quality even more! Somehow I missed it when writing the implementation... Thanks for uncovering all these bugs, man, I really appreciate it. Now it seems to work fine.

enhuiz commented on June 26, 2024

@ivanvovk Good to know, you're welcome!

ivanvovk commented on June 26, 2024

Closing the issue since it is solved.
