Comments (20)
Coming soon!
from tacotron.
@xuerq I'm running a sanity-check test. I'll share with you as soon as it's done.
What error value have you achieved so far? Could you please post convergence plots? I might be wrong, but I think the current model needs some work before it will be able to synthesize speech from text. For a single wav file, one needs to get the error below 0.08 on average to hear good speech (reached after 400 epochs, for a total of 2400 weight-update steps, since I put 6 identical files in the list). I also had to change the default learning rate to achieve that. For two wav files, I was not able to train it to speak both files: optimization got stuck near 0.10, despite all my efforts to find a good learning rate/optimizer. In text-to-text sequence-to-sequence models, if one can't get them to reproduce a few training samples exactly, that usually means they won't work for larger sets either, although there are some exceptions. So debugging the model on simple cases is probably needed. Maybe do what the paper describes as "ablation experiments", using a simple GRU encoder, and see if it works.
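For anyone comparing step counts across these reports, here is a minimal sketch of the epoch-to-step arithmetic behind the 2400 figure. It assumes one weight update per batch and a batch size of 1, which is only my reading of the numbers above; `weight_updates` is a hypothetical helper, not something from the repo.

```python
import math

def weight_updates(num_files, epochs, batch_size=1):
    """Total optimizer steps, assuming one update per batch and a full pass per epoch."""
    # batch_size=1 is an assumption that makes 6 files x 400 epochs = 2400 steps.
    steps_per_epoch = math.ceil(num_files / batch_size)
    return steps_per_epoch * epochs

# 6 identical files in the list, 400 epochs.
steps = weight_updates(num_files=6, epochs=400)
print(steps)
```

With a larger batch size the same number of epochs gives fewer updates, so epoch counts alone are hard to compare between runs.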
I trained on a single wav file, with 2 identical files in the list, and changed the dropouts to 1.0 and the learning rate to 0.01. I trained for 1350k steps (I think it was more than 1000 epochs) and the loss came down to 0.057 (18h 41m on a GTX 1080).
I then tried to generate sound with the model using the same text: 3/5 of the file is silent, and the remainder has some low-quality speech. I was trying to overfit the network and see what it would generate.
One thing I did not understand: while the loss is 0.057 on the training data, the evaluation script shows a loss of around 0.58 with the same text and wav. Maybe someone can explain the difference between the losses?
The fact that 3/5 of the generated file is silent looks fine, because we intended to reconstruct those parts (zero paddings). The training curve looks good, too. When I was training on the whole dataset, the training curve looked messy. Simply put, it kept hovering around 0.2.
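To illustrate the zero-padding point, here is a small pure-Python sketch. It assumes targets are right-padded with all-zero frames and that the loss is a mean absolute error averaged over every frame, padding included; I have not checked either assumption against the repo, and `pad_frames` and `mean_abs_error` are hypothetical names.

```python
def pad_frames(frames, max_len, n_bins):
    """Right-pad a list of frames with all-zero frames up to max_len."""
    padding = [[0.0] * n_bins for _ in range(max_len - len(frames))]
    return frames + padding

def mean_abs_error(pred, target):
    """Mean absolute error over every frame and bin, padded frames included."""
    total = sum(abs(p - t) for pf, tf in zip(pred, target) for p, t in zip(pf, tf))
    count = sum(len(tf) for tf in target)
    return total / count

# A 2-frame toy "utterance" padded to 5 frames: 3/5 of the target is silence.
target = pad_frames([[1.0, 2.0], [3.0, 4.0]], max_len=5, n_bins=2)
silent_fraction = sum(1 for frame in target if not any(frame)) / len(target)

# A model that outputs only zeros already matches the padded frames exactly,
# so all of its error comes from the 2 real frames.
all_zero_pred = [[0.0, 0.0] for _ in target]
loss = mean_abs_error(all_zero_pred, target)
```

Under these assumptions a model is rewarded for reproducing trailing silence, so a low average loss does not by itself show that the speech frames are well fit.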
The silence was at the beginning of the file, not at the end.
I believe the curve is messy because you are using a dropout of 0.5 and a learning rate of 0.0001. It should converge in time, and the spikes will gradually get smaller.
I trained with a single file for about 2000 epochs and got this, where loss1 is the seq2seq loss, and loss2 is the spectrogram loss.
Total training loss was about 0.017.
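For readers unsure how the two losses relate to the total, a minimal sketch follows. It assumes both terms are L1 (mean absolute error) losses summed with equal weight, as described in the Tacotron paper; whether this repo computes its total exactly this way is an assumption on my part, and the function names and toy values are hypothetical.

```python
def l1(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def total_loss(mel_pred, mel_true, mag_pred, mag_true):
    """Return (loss1, loss2, loss1 + loss2) under the equal-weight assumption."""
    loss1 = l1(mel_pred, mel_true)  # seq2seq (decoder output) loss
    loss2 = l1(mag_pred, mag_true)  # spectrogram (post-processing output) loss
    return loss1, loss2, loss1 + loss2

# Toy values only, chosen so the arithmetic is easy to follow.
loss1, loss2, total = total_loss(
    mel_pred=[0.5, 1.0], mel_true=[0.25, 1.0],
    mag_pred=[2.0, 0.0], mag_true=[1.5, 0.5],
)
```

If the total is such a sum, it helps to state which number is being quoted when comparing runs, since the total is larger than either component.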
I trained the model on the full data for about 130 epochs. The best loss I got was about 0.14. The loss curves are as follows:
Here is the synthesized audio: http://pan.baidu.com/s/1skMStGT
@Spotlight0xff I kept all the hyperparameters unchanged.
@candlewill how long did it take your machine to reach 180k steps?
@minsangkim142 It took about five days with two Tesla M40 24GB GPUs (only one was used for computation).
New synthesized speech samples here: http://pan.baidu.com/s/1miohdVy
It was trained on a small dataset: only Revelation from the Bible was used. Epoch 2000. Best loss 0.53.
Some human-like voice can be heard, though I can't make out what he(?) is saying. (I think that's natural, because the data is far from enough.)
I've recently revised the code. When did you start training?
@candlewill @Kyubyong any new updates? Thanks!
Does it learn attention when you use only one sample for training?
I'm worried that it is just memorizing the whole speech sample rather than predicting it from the text input.
@candlewill Hi, do you have any suggestions for training the model? I listened to the samples from http://pan.baidu.com/s/1miohdVy. Though the results are not good, they are less noisy than what I synthesized. I would really appreciate your answer.
I trained the Tacotron model on 3 audio files, but the loss is very high. The data is Vietnamese.
What is the default number of epochs, and where is it set in the code?
@ashupednekar Did you find it?