disentangle-vae-for-vc's Issues
Sample rate of audio for training
Thanks for making this repo public. I have a question: I have trained several TTS models such as tacotron2, fastspeech, and hifigan, and almost all of them use mel-spectrograms extracted from wav files at sr=22050 Hz. Did you try training your model at that sample rate, and if so, what was the quality of the synthesized wav?
Reproduce issue
A good work!
I just want to ask whether the script is the latest one, because I tried to train from scratch but the conversion failed. Are the hyper-parameters the ones used to produce the provided model? How many epochs did you run? I noticed that you selected the model from epoch 1560; is there any way to guide the model selection?
Thank you!
About training time
Hello, how long should training run? I have been training for about 12 hours and have only reached 80 epochs. When can I stop the run?
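As a rough guide to the numbers in this question, the per-epoch cost can be extrapolated to the epoch counts mentioned in the other issues on this page (the 1560-epoch checkpoint and the 2000-epoch run). This is only a back-of-the-envelope sketch, assuming the time per epoch stays constant on the same hardware; it says nothing about when conversion quality becomes acceptable.

```python
# Figures reported in the question above.
hours_elapsed = 12.0
epochs_done = 80

# Assumption: every epoch costs the same wall-clock time.
hours_per_epoch = hours_elapsed / epochs_done

# Epoch counts taken from the other issues on this page.
estimates = {target: hours_per_epoch * target for target in (1560, 2000)}

for target, hours in estimates.items():
    print(f"{target} epochs: about {hours:.0f} hours ({hours / 24:.1f} days)")
```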
about speaker style
Hello, I trained for 2000 epochs, but the conversion results do not effectively convert the speaker style. The parameters are consistent with those given in your paper. What could be the reason?
The training parameters:
--train true
--dataset_fp=/VCTK_mel1
--latent-size=32
--epochs=2000
--report-interval=250
--lr=1e-4
--samples_length=64
--batch-size=8
--mse_cof=10
--style_cof=0.1
--speaker_size=4
The conversion parameters:
--convert true
--dataset_fp=/VCTK_mel1
--latent-size=32
--samples_length=64
--batch-size=8
--mse_cof=10
--style_cof=0.1
--speaker_size=4
--src_spk=$src_spk
--trg_spk=$trg_spk
Do not backpropagate through style embedding when training?
Questions about Disentangle-VAE training
Hi, I am very interested in your work and decided to train this model. By the way, in the training.sh script, --style_cof appears twice. Which value should I use? Also, the training.sh script sets --samples-length=128, but in your paper it is 64. Which one should I use?
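On the duplicated flag: if training.sh passes its options to a Python script that parses them with the standard `argparse` module (an assumption; the repository's parser is not shown here), a repeated option using the default store action keeps the last value given. A minimal sketch, using a hypothetical parser that defines only `--style_cof` and illustrative values:

```python
import argparse

# Hypothetical parser: only the option discussed above is defined.
parser = argparse.ArgumentParser()
parser.add_argument("--style_cof", type=float, default=0.1)

# Simulate a command line where the flag appears twice (values are illustrative).
args = parser.parse_args(["--style_cof=0.1", "--style_cof=1.0"])

# With the default "store" action, the later occurrence overwrites the earlier one.
print(args.style_cof)
```

If the script does use `argparse` this way, the second occurrence in training.sh would be the one in effect, which may be worth checking against the value reported in the paper.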
Loss for KL
Hi, this is nice work.
I have a question.
Why does the code here return content_mu1, content_logvar1, content_mu2, content_logvar2 instead of q_z1_mu, q_z1_logvar, q_z2_mu, q_z2_logvar?
Here, the step function updates the loss, and according to equation 7 in the paper, the latent vector z should be the concatenated one.
I am confused about this. It would be great if you could reply.
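For reference on the KL question: with a diagonal Gaussian posterior and a standard normal prior, the KL term is a sum over latent dimensions, so the KL of a concatenated latent equals the sum of the KLs of its parts. A minimal sketch, assuming that setup (the names `content_mu`, `style_mu` and all numbers are illustrative, not taken from the repository):

```python
import math

def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

# Illustrative posterior parameters for a 2-d content part and a 1-d style part.
content_mu, content_logvar = [0.3, -1.2], [-0.5, 0.1]
style_mu, style_logvar = [0.8], [-0.2]

kl_content = kl_diag_gaussian(content_mu, content_logvar)
kl_style = kl_diag_gaussian(style_mu, style_logvar)

# KL of the concatenated latent [content; style].
kl_concat = kl_diag_gaussian(content_mu + style_mu, content_logvar + style_logvar)

print(kl_content + kl_style, kl_concat)
```

Under that assumption, computing the content and style KL terms separately differs from computing the KL of the concatenated latent only in how the two terms are weighted in the total loss.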
Reproduce_issue 2
Hi, I tried a batch size of 8 and obtained the model at epoch 1300. It still does not work. From approximately which epoch can the model convert successfully?
Btw, in the training.sh script, --style_cof appears twice.
For your convenience, the configuration I used is listed below:
batch_size:8
hidden_size:"400"
speaker_size:4
latent_size:32
lr:0.0001
epochs:20000
no_cuda:false
dataset:"VCTK"
seed:1
log_interval:500
report_interval:50
sample_size:64
do_not_resume:false
normalize:false
beta_cof:0.1
mse_cof:10
kl_cof:10
style_cof:0.1
samples_length:128
alpha:0.01
dataset_fp:"datasets/VCTK_mel"
log_dir:"results_bs8"
src_spk:"VCTK-Corpus_wav16_p225"
trg_spk:"VCTK-Corpus_wav16_p226"
train:true
convert:false
This is the latest epoch info:
====> Epoch: 1346 Average loss: 1187.7294
recons loss1 epoch_1346: 201.34361554669906
recons loss2 epoch_1346: 201.35837986851598
recons loss1 hat epoch_1346: 198.02147375708228
recons loss2 hat epoch_1346: 198.03397934372362
Z1 KL loss epoch_1346: 75.65964660781998
Z2 kL loss epoch_1346: 75.63799599862314
Z Style KL epoch_1346: 0.0009613697808068078
kl coef: 10
Let me know if you need any other information. Thank you!