Comments (16)
Yes, I trained the model with one sample and with two samples. With one sample the speech is clear; with two samples the result sounds like a hybrid of the two.
You can download sample results and original files here:
https://www.dropbox.com/s/po2gnncbh4k5e1h/results.zip?dl=0
So this is only speculation, but it seems the decoder works well, while the encoder might have some issues.
I also observed that the output-sequence update in eval always produces zeros:
if j < timesteps - 1:
    outputs_shifted[:, j + 1] = _outputs[:, j, :]
outputs[:, j, :] = _outputs[:, j, :]
Here outputs[:, j, :] is all zeros at every timestep.
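For context, that loop runs the decoder autoregressively: the frame predicted at step j is written into outputs_shifted so it becomes the input at step j + 1. Below is a minimal NumPy sketch of just that shifting logic; decode_step is a dummy stand-in for the real network, not the repo's code.

```python
import numpy as np

def decode_step(shifted):
    # Dummy stand-in for the decoder network, chosen so the feedback
    # recursion is visible: it just returns the shifted input plus one.
    return shifted + 1.0

batch, timesteps, dim = 2, 4, 3
outputs = np.zeros((batch, timesteps, dim))
outputs_shifted = np.zeros((batch, timesteps, dim))

for j in range(timesteps):
    _outputs = decode_step(outputs_shifted)       # predict all frames
    if j < timesteps - 1:
        # Feed frame j's prediction in as the input for frame j + 1.
        outputs_shifted[:, j + 1] = _outputs[:, j, :]
    outputs[:, j, :] = _outputs[:, j, :]

print(outputs[0, :, 0])   # -> [1. 2. 3. 4.]
```

Note that if the network's predictions _outputs come back as zeros at some step, every later fed-back input is zero as well, so a single bad step can make the whole sequence collapse, consistent with the symptom described above.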
from tacotron.
https://www.dropbox.com/s/it887p8jw61i5iq/model.zip?dl=0 you can download the model from this link. Note, however, that it most likely won't generate other texts, only that sample, since it was trained on a single sample in order to test the model's capabilities.
@onyedikilo you can change that in the part of the code that does the train/test splitting. It currently allocates some data for testing, and when you put only one line in the file, that line gets set aside for testing, leaving zero training samples. So either add a few lines with the same file, or change that part of the code.
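The failure mode can be illustrated in a few lines of Python; the variable names here (lines, num_test) are made up for illustration, not the repo's actual ones:

```python
# A one-line transcript: reserving lines for testing empties the
# training set entirely.
lines = ["sample_01|Some text."]    # single transcript entry
num_test = 1                        # lines the split reserves for eval

train_lines = lines[:-num_test]     # -> [] : nothing left to train on
print(len(train_lines))             # -> 0

# One possible guard: skip the split when there is too little data.
safe_train = lines if len(lines) <= num_test else lines[:-num_test]
print(len(safe_train))              # -> 1
```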
As I mentioned in the README file, I haven't achieved any promising results yet. I'll share them if there are any updates.
I ran the system with simple setups to test its capabilities:
- a single wav file + a single text file
- two wav files with different texts
Results:
With 1 file, the system manages to generate a wav with recognizable speech, but the quality is average.
With 2 files, the system generates recognizable speech for one sample and speech-like sounds for the second.
In both cases optimization gets stuck after 200 or so iterations, and no improvement can be achieved by changing the learning rate or the optimizer. Disabling dropout helps a bit, but not much.
With 2 wav files, the loss tends to oscillate widely between epochs, like:
0.34
0.85
0.33
0.84
...
and it keeps progressing like that, even with very small learning rates.
Conclusions so far: the work is very impressive. The system can generate audio and it is actually aware of the text context, so this part works to some degree, which is already a big achievement. Still, further work is needed. I'd recommend debugging the system on such simple cases first, before trying it on a large corpus; it has to be able to reproduce its training results on small sets.
Do you mean you trained with one sample or two? Would you share your results or a pretrained file?
@Durham : Is it possible to ask you for the pretrained model?
I listened to your results and one of them is really clear.
@Durham : When training on a single file, I had to put the same line twice into text.txt and change batch_size to 1. Did you do the same? Using only one line results in batches = 0, and it loops forever through a handled exception which I don't recall at the moment.
After running train.py and eval.py exactly as they are in the repository, I got one model per epoch (I ran 10 epochs without modifying the original code). The lowest loss was achieved by model_epoch_09_gs_11889_loss_0.19, and the final model is model_epoch_10_gs_13210_loss_0.20.
You can see how it behaved after 14 hrs and 20 mins of training.
I used the latest model (model_epoch_10_gs_13210_loss_0.20) for eval.py, and these are some results: https://www.dropbox.com/sh/ak79vftzhywbmj5/AAAwbjdTE09RWYcCOH5SujYAa?dl=0
If you listen to them, @Durham is right when he says the model should work first on small datasets before it is trained on the whole dataset.
@Kyubyong, or anyone here, do you have any idea where we should pay attention to improve the model?
Trained to 0.06 loss; you can download a sample wav from this link: https://we.tl/HE9sOliX4W
@onyedikilo
Did you train on just one single file? After how many hours/epochs did you achieve that loss?
If you input a different text, does it sound good?
@basuam I trained on a single wav file, used 2 identical entries in the list, changed dropout to 1.0 and the learning rate to 0.01. I trained for 1350k steps (I think it was more than 1000 epochs) and the loss came down to 0.057 (18h 41m on a GTX 1080). The quality was bad: 3/5 of the generated sample was silent, and not all of the sentence was there, only a few words. I then changed the rate to 0.001 and trained 178 more epochs while changing dropout to 0.5. Now the sample was much better, though the loss did not change. But one thing I noticed this time: the evaluation loss was around 0.06; I believe it was 0.2 before, or something like that. Either the loss functions are different for eval and training, or I am missing something here; I believe the losses should be the same, since the test data is the same and the network is the same. I haven't had much time to read the code, but I am planning to in a couple of days.
As for another text, I didn't try it. The machine is busy training on a much bigger dataset at the moment. But I saved the model.
@onyedikilo Thank you for the prompt reply. I'm also reading and trying to understand the code. And as you said, if the training set and the testing set are the same (the same single wav file), the loss MUST be the same. If it is not, then something is wrong in the code.
@basuam I think the loss should not be the same. During training, the ground truth is fed in as the previous frame, while during eval, the predicted previous frame is fed in.
@onyedikilo @basuam Just like @Durham said, a seq2seq model is fed the ground truth during training but uses the previous decoder output at each step during inference. This causes exposure bias, which the CBHG post-processing module can help mitigate. We could also try feeding the previous decoder output back in every once in a while during training (scheduled sampling) to reduce this error. This is likely the reason why @onyedikilo's training and evaluation losses differed.
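The scheduled-sampling idea mentioned above can be sketched in a few lines of NumPy; decoder_step, the frame shapes, and all names here are placeholders for illustration, not the repo's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def decoder_step(prev_frame):
    # Placeholder decoder: a real model would also condition on the
    # encoder states and its own recurrent hidden state.
    return 0.9 * prev_frame

def decoder_inputs(ground_truth, sample_prob):
    """Build the decoder input sequence, occasionally replacing the
    ground-truth previous frame with the model's own prediction
    (scheduled sampling). sample_prob = 0.0 is pure teacher forcing."""
    timesteps, dim = ground_truth.shape
    inputs = np.zeros_like(ground_truth)
    prev = np.zeros(dim)                      # initial <GO> frame
    for t in range(timesteps):
        inputs[t] = prev
        pred = decoder_step(inputs[t])
        # With probability sample_prob, feed back the prediction;
        # otherwise use the ground-truth frame (teacher forcing).
        prev = pred if rng.random() < sample_prob else ground_truth[t]
    return inputs

gt = np.ones((5, 3))
forced = decoder_inputs(gt, sample_prob=0.0)  # shifted ground truth
```

With sample_prob = 0.0 the inputs are exactly the ground truth shifted by one frame (training-time behavior); raising sample_prob moves the input distribution toward what the decoder sees at inference time, which is the mismatch behind the train/eval loss gap discussed here.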
@basuam Is the loss figure you uploaded the epoch loss or the batch loss?