v-manhlt3 / disentangle-vae-for-vc

Python 99.41% Shell 0.59%

disentangle-vae-for-vc's People


glitteringau powei-c jesper-jung vsumin diogolimamarques mehdi-mirzapour

disentangle-vae-for-vc's Issues

Sample rate of audio for training

Thanks for making this repo public. I have a question: I have trained several TTS models such as Tacotron 2, FastSpeech, and HiFi-GAN, and almost all of them use mel-spectrograms computed from WAV files at sr=22050 Hz. Did you try training your model at that sample rate, and if so, what was the quality of the synthesized audio?
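Moving between sample rates mostly comes down to the mel-spectrogram settings. The sketch below assumes a hop length of 256 samples at both rates (a common TTS/vocoder setting, not necessarily what this repo uses; the speaker IDs in a later issue, VCTK-Corpus_wav16_..., suggest the repo's data is 16 kHz). It shows how the same 64-frame training segment (--samples_length=64) covers a different duration at each rate, and how the highest frequency a mel filterbank can cover (the Nyquist limit, sr/2) changes.

```python
# Compare mel-frame timing and the Nyquist limit at two sample rates.
# ASSUMPTION: hop_length=256 at both rates; this repo's actual STFT
# settings are not stated in this thread.

def frames_to_seconds(n_frames: int, hop_length: int, sr: int) -> float:
    """Approximate duration covered by n_frames mel frames (ignoring edge padding)."""
    return n_frames * hop_length / sr


def nyquist_hz(sr: int) -> float:
    """Upper bound for the mel filterbank's fmax at a given sample rate."""
    return sr / 2


if __name__ == "__main__":
    for sr in (16000, 22050):
        dur = frames_to_seconds(64, 256, sr)
        print(f"sr={sr}: 64 frames ~ {dur:.3f} s, fmax <= {nyquist_hz(sr):.0f} Hz")
```

Under these assumptions, 64 frames span about 1.024 s at 16 kHz (fmax up to 8000 Hz) but only about 0.743 s at 22050 Hz (fmax up to 11025 Hz). Switching rates therefore generally means recomputing the features and pairing the model with a vocoder trained on the same feature settings.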

Reproduce issue

Great work!

I just want to ask whether the script is the latest version, because I trained from scratch but the resulting model fails to convert. Are these the hyperparameters used to produce the provided model? How many epochs did you train for? I noticed that you selected the model from epoch 1560; is there any guidance on how to select a model?

Thank you!

Training time

Hello, how long does training take? I have been running for about 12 hours and only reached 80 epochs. When can I stop training?
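A rough answer follows from the pace reported above. The sketch assumes every epoch takes about the same wall-clock time on the same hardware, which may not hold if data loading or logging intervals vary.

```python
# Linear extrapolation of wall-clock training time from one observation.
# ASSUMPTION: constant time per epoch on the same hardware.

def estimated_hours(target_epochs: int, observed_hours: float = 12.0,
                    observed_epochs: int = 80) -> float:
    """Hours needed to reach target_epochs at the observed rate."""
    return observed_hours * target_epochs / observed_epochs


if __name__ == "__main__":
    for epochs in (1560, 2000):
        h = estimated_hours(epochs)
        print(f"{epochs} epochs ~ {h:.0f} h ({h / 24:.1f} days)")
```

At 12 hours per 80 epochs, reaching epoch 1560 (the checkpoint mentioned in the previous issue) would take roughly 234 hours, and 2000 epochs roughly 300 hours (12.5 days).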

about speaker style

Hello, I trained for 2000 epochs, but the resulting model does not effectively convert speaker style. My parameters match those given in your paper. What could be the reason?

The parameters:

Training:
--train true
--dataset_fp=/VCTK_mel1
--latent-size=32
--epochs=2000
--report-interval=250
--lr=1e-4
--samples_length=64
--batch-size=8
--mse_cof=10
--style_cof=0.1
--speaker_size=4

Conversion:
--convert true
--dataset_fp=/VCTK_mel1
--latent-size=32
--samples_length=64
--batch-size=8
--mse_cof=10
--style_cof=0.1
--speaker_size=4
--src_spk=$src_spk
--trg_spk=$trg_spk

Do not backpropagate through style embedding when training?

Hi. I am confused by the forward function of DisentangledVAE. I wonder why you detach the style embeddings from the computational graph. I would appreciate your clarification. Please let me know if I misunderstood anything.
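To make the effect of detaching concrete: in PyTorch (assumed here, since the question refers to detaching from the computational graph), `.detach()` returns a tensor that shares data but is cut out of the autograd graph, so no gradient from downstream losses reaches whatever produced it. The toy sketch below uses two hypothetical scalar weights standing in for a style encoder and a decoder; they are not this repo's modules. With the style embedding detached, the decoder still gets a gradient while the encoder gets none from this loss. Whether the real encoder is trained through some other loss term is a separate question for the authors.

```python
import torch

# Toy stand-ins (HYPOTHETICAL, not the repo's DisentangledVAE):
# enc_w plays the role of a style encoder, dec_w the role of a decoder.
def gradients(detach_style: bool):
    enc_w = torch.tensor(2.0, requires_grad=True)
    dec_w = torch.tensor(3.0, requires_grad=True)
    x = torch.tensor(1.5)

    style = enc_w * x                # "style embedding" produced by the encoder
    if detach_style:
        style = style.detach()       # cut the graph: no gradient flows to enc_w
    recon = dec_w * style            # decoder consumes the style embedding
    loss = (recon - 1.0) ** 2
    loss.backward()

    enc_grad = None if enc_w.grad is None else enc_w.grad.item()
    return enc_grad, dec_w.grad.item()


if __name__ == "__main__":
    print(gradients(detach_style=False))  # (72.0, 48.0)
    print(gradients(detach_style=True))   # (None, 48.0)
```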

Questions about Disentangle-VAE training

Hi, I am very interested in your work and decided to train this model. In the training.sh script, --style_cof appears twice; which value should I use? Also, training.sh sets --samples-length=128, but your paper uses 64. Which one should I follow?
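If the Python entry point called by training.sh parses its flags with the standard argparse module (an assumption; the parser is not shown in this thread), the duplicate flag has a mechanical answer: with the default store action, a later occurrence overwrites an earlier one, so the last --style_cof is the value actually used. That tells you what the script did, not what the authors intended. argparse also derives the attribute name by replacing hyphens with underscores, so --samples-length maps to args.samples_length, but a spelling is only accepted if the parser registers it. The parser, defaults, and second --style_cof value below are hypothetical.

```python
import argparse

# HYPOTHETICAL parser mirroring two flags from this thread; the repo's
# real defaults and option spellings may differ.
def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser()
    p.add_argument("--style_cof", type=float, default=0.1)
    # Two spellings registered as aliases; dest comes from the first long
    # option with "-" converted to "_", i.e. args.samples_length.
    p.add_argument("--samples-length", "--samples_length", type=int, default=64)
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--style_cof", "0.1", "--samples-length", "128", "--style_cof", "1.0"]
    )
    # With the default "store" action, the last occurrence wins.
    print(args.style_cof, args.samples_length)  # 1.0 128
```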

Loss for KL

Hi, nice work. I have a question:
why does the code here return content_mu1, content_logvar1, content_mu2, content_logvar2 instead of q_z1_mu, q_z1_logvar, q_z2_mu, q_z2_logvar?
The step function here computes the loss from these values, but according to equation 7 in the paper, the latent vector z should be the concatenated one.
I am confused about this and would appreciate a reply.
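One fact that may help frame this question: if the posterior is a diagonal Gaussian and the prior is a standard normal (assumptions; the paper's exact prior is not restated in this thread), the KL divergence has the closed form -1/2 Σ(1 + log σ² − μ² − σ²). Because both distributions factorize across dimensions, the KL for a concatenated z = [z_content; z_style] equals the sum of the KLs of its parts. A loss built from the content terms alone therefore matches the KL over the concatenated z only if the style part's KL is also added somewhere else. The training log in a later issue does report a separate "Z Style KL" value. A minimal check in plain Python, with illustrative numbers:

```python
import math

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions:
    -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


# Illustrative values only; the dimensions are arbitrary.
content_mu, content_logvar = [0.5, -1.0], [0.1, -0.3]
style_mu, style_logvar = [0.2], [0.0]

if __name__ == "__main__":
    kl_concat = gaussian_kl(content_mu + style_mu, content_logvar + style_logvar)
    kl_parts = gaussian_kl(content_mu, content_logvar) + gaussian_kl(style_mu, style_logvar)
    print(abs(kl_concat - kl_parts) < 1e-12)  # True
```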

Reproduce_issue 2

Hi, I tried batch size 8 and trained the model to epoch 1300, but it still does not work. Roughly from which epoch does the model start converting successfully?

By the way, --style_cof appears twice in the training.sh script.

For your convenience, here is the configuration I used:

batch_size:8
hidden_size:"400"
speaker_size:4
latent_size:32
lr:0.0001
epochs:20000
no_cuda:false
dataset:"VCTK"
seed:1
log_interval:500
report_interval:50
sample_size:64
do_not_resume:false
normalize:false
beta_cof:0.1
mse_cof:10
kl_cof:10
style_cof:0.1
samples_length:128
alpha:0.01
dataset_fp:"datasets/VCTK_mel"
log_dir:"results_bs8"
src_spk:"VCTK-Corpus_wav16_p225"
trg_spk:"VCTK-Corpus_wav16_p226"
train:true
convert:false

Here is the log output for the latest epoch:
====> Epoch: 1346 Average loss: 1187.7294
recons loss1 epoch_1346: 201.34361554669906
recons loss2 epoch_1346: 201.35837986851598
recons loss1 hat epoch_1346: 198.02147375708228
recons loss2 hat epoch_1346: 198.03397934372362
Z1 KL loss epoch_1346: 75.65964660781998
Z2 kL loss epoch_1346: 75.63799599862314
Z Style KL epoch_1346: 0.0009613697808068078
kl coef: 10

Let me know if you need any other information. Thank you!
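On the model-selection question raised in these issues, one simple starting point is to collect the per-epoch values printed above across many epochs and compare checkpoints. The sketch below parses lines in exactly the format shown in this log; the regular expressions are written against this excerpt and are an assumption about the full log format. Lower reconstruction loss does not necessarily mean better speaker conversion, so listening to converted samples is still needed.

```python
import re

EPOCH_RE = re.compile(r"^====> Epoch: (\d+) Average loss: ([-+\d.eE]+)$")
METRIC_RE = re.compile(r"^(.+?) epoch_(\d+): ([-+\d.eE]+)$")


def parse_log(text: str) -> dict:
    """Map epoch -> {metric name: value} for lines in the format shown above."""
    epochs = {}
    for line in text.splitlines():
        line = line.strip()
        if m := EPOCH_RE.match(line):
            epochs.setdefault(int(m[1]), {})["average loss"] = float(m[2])
        elif m := METRIC_RE.match(line):
            epochs.setdefault(int(m[2]), {})[m[1]] = float(m[3])
        # Lines such as "kl coef: 10" carry no epoch tag and are skipped.
    return epochs


SAMPLE = """====> Epoch: 1346 Average loss: 1187.7294
recons loss1 epoch_1346: 201.34361554669906
Z Style KL epoch_1346: 0.0009613697808068078
kl coef: 10"""

if __name__ == "__main__":
    stats = parse_log(SAMPLE)[1346]
    print(stats["average loss"], stats["recons loss1"])
```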
