
disentangle-vae-for-vc's People

Contributors

v-manhlt3

disentangle-vae-for-vc's Issues

Sample rate of audio for training

Thanks for making your repo public. I have a question: I have trained several TTS models (Tacotron2, FastSpeech, HiFi-GAN, ...), and almost all of them use mel-spectrograms computed from audio at sr=22050 Hz. Did you try training your model at that sample rate, and what was the quality of the synthesized audio?
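For context, the time span of a mel-spectrogram segment depends on the sample rate. If the hop length is fixed in samples, the same number of frames covers a different duration. This sketch assumes a hop length of 256 samples, which is hypothetical and not taken from this repo. It uses samples_length=64 from the flags quoted in the issues on this page. The speaker IDs in the configuration further down (VCTK-Corpus_wav16_...) suggest the provided setup uses 16 kHz audio, but that is only an inference.

```python
def segment_seconds(n_frames: int, sample_rate: int, hop_length: int) -> float:
    """Approximate time span of n_frames mel frames (ignores the analysis-window overhang)."""
    return n_frames * hop_length / sample_rate


HOP_LENGTH = 256  # hypothetical hop size in samples; not taken from this repository
SAMPLES_LENGTH = 64  # frames per training segment, as in the --samples_length=64 flag

for sr in (16000, 22050):
    seconds = segment_seconds(SAMPLES_LENGTH, sr, HOP_LENGTH)
    print(f"sr={sr} Hz: {SAMPLES_LENGTH} frames span about {seconds:.3f} s")
```

Under these assumed settings, a 64-frame segment covers about 1.024 s at 16 kHz but only about 0.743 s at 22.05 kHz. Switching sample rates may therefore also require revisiting the segment length or hop size.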

Reproduce issue

Great work!

I just want to ask whether the script is the latest version, because I tried to train from scratch but the model fails to convert. Are these hyperparameters the ones used to produce the provided model? How many epochs did you run? I noticed that you selected the model from epoch 1560. Is there any guideline for model selection?

Thank you!

Training time

Hello, how long does training take? I have been running for about 12 hours and have only reached 80 epochs. When can I stop training?
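This is a back-of-the-envelope estimate that assumes each epoch takes roughly the same time. At the reported pace of about 12 hours for 80 epochs (about 9 minutes per epoch), the estimate extrapolates linearly to the epoch counts mentioned elsewhere on this page: 1300, 1560 and 2000. Actual speed depends on hardware, batch size and data loading. The helper name is hypothetical.

```python
def estimate_hours(hours_so_far: float, epochs_so_far: int, target_epochs: int) -> float:
    """Linear extrapolation of wall-clock time, assuming constant time per epoch."""
    return hours_so_far * target_epochs / epochs_so_far


# Epoch counts taken from the issues on this page.
for target in (1300, 1560, 2000):
    hours = estimate_hours(12, 80, target)
    print(f"{target} epochs: about {hours:.0f} h ({hours / 24:.1f} days)")
```

At this pace, 2000 epochs would take about 300 hours, or 12.5 days.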

About speaker style

Hello, I trained for 2000 epochs, but the conversion results do not effectively convert the speaker style. My parameters are the same as those given in your paper. What could be the reason?

The parameters:

Training:
--train true
--dataset_fp=/VCTK_mel1
--latent-size=32
--epochs=2000
--report-interval=250
--lr=1e-4
--samples_length=64
--batch-size=8
--mse_cof=10
--style_cof=0.1
--speaker_size=4

Conversion:
--convert true
--dataset_fp=/VCTK_mel1
--latent-size=32
--samples_length=64
--batch-size=8
--mse_cof=10
--style_cof=0.1
--speaker_size=4
--src_spk=$src_spk
--trg_spk=$trg_spk
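When conversion gives poor results, one quick check is whether the conversion run reuses the training run's model settings. Here is a minimal sketch of that check. The parse_flags helper is hypothetical and not part of the repo. It converts dashes to underscores, which is how argparse derives attribute names. It only compares strings and does not know which flags actually affect the model.

```python
def parse_flags(tokens):
    """Map ['--key=value', '--key', 'value', ...] to {'key': 'value'}.

    Dashes in flag names become underscores, mirroring how argparse
    derives attribute names (e.g. --latent-size -> latent_size).
    """
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if "=" in token:
            name, value = token.split("=", 1)
            i += 1
        else:
            name, value = token, tokens[i + 1]
            i += 2
        flags[name.lstrip("-").replace("-", "_")] = value
    return flags


# The two flag lists from the issue above.
train = parse_flags([
    "--train", "true", "--dataset_fp=/VCTK_mel1", "--latent-size=32",
    "--epochs=2000", "--report-interval=250", "--lr=1e-4", "--samples_length=64",
    "--batch-size=8", "--mse_cof=10", "--style_cof=0.1", "--speaker_size=4",
])
convert = parse_flags([
    "--convert", "true", "--dataset_fp=/VCTK_mel1", "--latent-size=32",
    "--samples_length=64", "--batch-size=8", "--mse_cof=10", "--style_cof=0.1",
    "--speaker_size=4", "--src_spk=$src_spk", "--trg_spk=$trg_spk",
])

# Settings present in both runs whose values differ.
shared = sorted(set(train) & set(convert))
mismatches = {k: (train[k], convert[k]) for k in shared if train[k] != convert[k]}
print("shared:", shared)
print("mismatches:", mismatches)
```

For the flags listed, every shared setting matches. A train/convert mismatch among these flags therefore does not explain the result.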

Questions about Disentangle-VAE training

Hi, I am very interested in your work and decided to train this model. By the way, --style_cof appears twice in the training.sh script. Which value should I use? Also, training.sh sets --samples_length=128, but your paper uses 64. Which should I follow?
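Suppose training.sh passes its flags to a script that uses Python's argparse with the default store action. That is an assumption, so check the script. In that case a repeated flag does not raise an error. The last occurrence overwrites earlier ones, so the --style_cof that appears later in training.sh is the value that takes effect. The parser and values below are placeholders, not the ones in training.sh.

```python
import argparse

# Hypothetical parser mirroring two of the flags discussed above.
parser = argparse.ArgumentParser()
parser.add_argument("--style_cof", type=float, default=0.1)
parser.add_argument("--samples_length", type=int, default=64)

# Placeholder values: --style_cof is given twice, as reported for training.sh.
args = parser.parse_args(
    ["--style_cof", "0.1", "--samples_length", "128", "--style_cof", "1.0"]
)

print(args.style_cof)       # 1.0: the last occurrence wins
print(args.samples_length)  # 128: the command line overrides the default
```

This only tells you which value the script actually uses. It does not tell you which value matches the paper.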

Loss for KL

Hi, this is nice work.
I have a question.
Why does this return content_mu1, content_logvar1, content_mu2, content_logvar2 instead of q_z1_mu, q_z1_logvar, q_z2_mu, q_z2_logvar?
The step function updates the loss here, and according to equation 7 in the paper, the latent vector z should be the concatenated one.
I am confused about this. It would be great if you could reply.
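This sketch may help frame the question. It assumes that the prior on the concatenated latent z = [z_content; z_style] is a standard normal N(0, I) and that the posterior is a diagonal Gaussian. Both are assumptions, and equation 7 itself is not reproduced here. Under them, the KL term factorises over dimensions. The KL of the concatenated vector then equals the content KL plus the style KL. A loss built only from the content statistics would therefore match a KL on the concatenated z only if the style KL is added separately. The training log in the next issue does print a separate "Z Style KL" line. The variable names and numbers below are hypothetical.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions.

    Per dimension: 0.5 * (mu^2 + sigma^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


# Hypothetical posterior parameters for one utterance.
content_mu, content_logvar = [0.5, -1.0, 0.2], [-0.3, 0.1, 0.0]
style_mu, style_logvar = [0.8, 0.0], [-1.0, 0.4]

# Concatenate along the latent dimension: z = [z_content; z_style].
z_mu = content_mu + style_mu
z_logvar = content_logvar + style_logvar

kl_concat = kl_to_standard_normal(z_mu, z_logvar)
kl_split = (kl_to_standard_normal(content_mu, content_logvar)
            + kl_to_standard_normal(style_mu, style_logvar))
print(kl_concat, kl_split)  # equal up to floating-point rounding
```

If the actual prior is not a factorised standard normal, for example a learned or speaker-conditioned prior, this additivity no longer holds automatically.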

Reproduce_issue 2

Hi, I tried batch size 8 and obtained the model at epoch 1300. It still does not work. Roughly from which epoch does the model start converting successfully?

By the way, --style_cof appears twice in the training.sh script.

For your convenience, here is the configuration I used:

batch_size:8
hidden_size:"400"
speaker_size:4
latent_size:32
lr:0.0001
epochs:20000
no_cuda:false
dataset:"VCTK"
seed:1
log_interval:500
report_interval:50
sample_size:64
do_not_resume:false
normalize:false
beta_cof:0.1
mse_cof:10
kl_cof:10
style_cof:0.1
samples_length:128
alpha:0.01
dataset_fp:"datasets/VCTK_mel"
log_dir:"results_bs8"
src_spk:"VCTK-Corpus_wav16_p225"
trg_spk:"VCTK-Corpus_wav16_p226"
train:true
convert:false
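One way to spot divergences is to compare the configuration dump above against a set of reference values. The only source for the reference values here is the flag list in the speaker-style issue, whose author states that those flags match the paper. The helper names are hypothetical. Numbers are compared as floats so that 0.0001 and 1e-4 count as equal.

```python
def parse_config(text):
    """Parse 'key:value' lines (as in the dump above) into a dict of strings."""
    config = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip('"')
    return config


def as_number(value):
    """Return value as a float when possible, so '0.0001' equals '1e-4'."""
    try:
        return float(value)
    except ValueError:
        return value


# Subset of the configuration dump above.
dump = """
batch_size:8
hidden_size:"400"
speaker_size:4
latent_size:32
lr:0.0001
mse_cof:10
style_cof:0.1
samples_length:128
dataset_fp:"datasets/VCTK_mel"
"""

# Reference values: the flags from the speaker-style issue (stated there to match the paper).
reference = {
    "batch_size": "8", "latent_size": "32", "lr": "1e-4", "samples_length": "64",
    "mse_cof": "10", "style_cof": "0.1", "speaker_size": "4",
}

config = parse_config(dump)
diffs = {
    key: (config[key], ref)
    for key, ref in reference.items()
    if key in config and as_number(config[key]) != as_number(ref)
}
print(diffs)  # {'samples_length': ('128', '64')}
```

Only samples_length differs: 128 here versus 64 in the reference flags. This is the same discrepancy raised in the Disentangle-VAE training question.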

This is the latest epoch info:
====> Epoch: 1346 Average loss: 1187.7294
recons loss1 epoch_1346: 201.34361554669906
recons loss2 epoch_1346: 201.35837986851598
recons loss1 hat epoch_1346: 198.02147375708228
recons loss2 hat epoch_1346: 198.03397934372362
Z1 KL loss epoch_1346: 75.65964660781998
Z2 kL loss epoch_1346: 75.63799599862314
Z Style KL epoch_1346: 0.0009613697808068078
kl coef: 10

Let me know if you need any other information. Thank you!
