Comments (25)
That makes sense, thanks!
Will keep you posted and summarize (for future readers) what I have done
from flowtron.
Ciao Dario,
Our paper details how we trained the LibriTTS model. You will not be able to exactly match our training because the LSH model was trained on LJSpeech and two proprietary datasets. Nonetheless, you should be able to reproduce our results by following the steps in the paper, substituting the LSH dataset with the LJS dataset. Post issues on this repo if you have them.
- Cuda compilation tools, release 10.0, V10.0.130
- Ubuntu 16.04.6 LTS
- https://github.com/NVIDIA/flowtron/blob/master/config.json#L10
- https://arxiv.org/abs/2005.05957
from flowtron.
Ciao Rafael,
Thank you for your answer.
I decided to train on LibriTTS with a warm start from your pretrained LibriTTS model.
1 Flow
As suggested I started with 1 flow.
After more than 1 million steps, the training and validation loss look good, together with the attention weights:
[Plots: Training Loss | Validation Loss | Attention Weights]
Results
After running inference at different steps, I found that the checkpoints that "sounded" best were the ones at approximately step 580,000 (which is also where the validation loss reaches its minimum).
Still, the output wasn't satisfactory, but it was at least intelligible.
2 Flows
I am now training with 2 flows. I started from the checkpoint at step 580,000 and set the appropriate include_layers to null, and so far this is how the training is going:
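For future readers, the relevant config changes for this warm start might look roughly like the following. This is a sketch assuming flowtron's config.json layout; the checkpoint path is an illustrative example, not the exact path used here.

```json
{
  "train_config": {
    "checkpoint_path": "outdir/model_580000",
    "include_layers": null
  },
  "model_config": {
    "n_flows": 2
  }
}
```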
[Plots: Training Loss | Validation Loss | Attention Weights 0 | Attention Weights 1]
Results
When I run inference on the early steps of this 2-flow training (step 10,000), the output is still "ok":
[Audio: Step 10,000 - Output 1 | Step 10,000 - Output 2]
At step 240,000, even though the losses are lower, the inference results are bad:
[Audio: Step 240,000 - Output 1 | Step 240,000 - Output 2]
My questions:
- Is it expected that during training of the 2-flow network the output momentarily gets worse?
- Why are the attention weights so bad at inference time, when they are not bad during training? (See Tensorboard plots)
Thanks a lot again @rafaelvalle
from flowtron.
- Yes, because the model does not yet know how to attend with the most recently added flow.
- During training we perform a forward pass, and the first flow step knows how to attend to the inputs. When we perform inference, the last flow step (closest to z) is the first to attend to the inputs, but judging from your Attention Weights 1 image, this flow step does not know how to attend yet.
Try inference again once your Attention Weights 1 look better.
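The ordering point above can be made concrete with a toy sketch (plain Python, not Flowtron's actual code): training pushes the mels through the flows in order to produce z, while inference inverts the chain, so the newest flow (closest to z) acts first and must already attend well.

```python
# Toy illustration of flow ordering in a normalizing-flow TTS model.
flows = ["flow_0", "flow_1"]

def training_order(flows):
    # mel -> flow_0 -> flow_1 -> z
    return list(flows)

def inference_order(flows):
    # z -> flow_1 -> flow_0 -> mel; the last-added flow runs first
    return list(reversed(flows))

print(training_order(flows))   # ['flow_0', 'flow_1']
print(inference_order(flows))  # ['flow_1', 'flow_0']
```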
from flowtron.
Ok, I have been running the training with 2 flows now for a while.
This is what I see on TensorBoard
[Plots: Attention Weights 1 | Attention Weights 0 | Validation Loss | Training Loss]
I would say that everything looks great.
When I run inference, everything looks (and sounds) bad:
[Plots: Attention Weights 0 | Attention Weights 1]
@rafaelvalle What would you recommend?
Things looked and sounded better at the end of training with 1 flow.
Thanks
from flowtron.
Confirm that during inference the hyperparams in config.json match what is used during training.
As a sanity check, generate a few sentences from the training data.
Then check if the issue is sentence or speaker dependent.
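One way to do the first check is to diff the parsed configs programmatically. The helper below is hypothetical (not part of the repo) and is shown here with inline dicts; in practice you would json.load the training and inference config files.

```python
import json  # in practice: train_cfg = json.load(open("config_train.json"))

def diff_configs(train_cfg, infer_cfg, sections=("data_config", "model_config")):
    """Return mismatched (section, key) pairs between two parsed config dicts."""
    mismatches = []
    for section in sections:
        a, b = train_cfg.get(section, {}), infer_cfg.get(section, {})
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                mismatches.append((section, key))
    return mismatches

train_cfg = {"model_config": {"n_flows": 2, "n_speakers": 123}}
infer_cfg = {"model_config": {"n_flows": 1, "n_speakers": 123}}
print(diff_configs(train_cfg, infer_cfg))  # [('model_config', 'n_flows')]
```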
from flowtron.
The config.json is the same.
A couple of training sentences with speakers 40 and 887:
Speaker 40
[Plots: Attention Weights 0 | Attention Weights 1]
Speaker 887
[Plots: Attention Weights 0 | Attention Weights 1]
Better but not good.
It seems to be sentence dependent.
from flowtron.
If you're not already, make sure to add punctuation to the phrases.
from flowtron.
I did add punctuation.
Should I just train longer?
from flowtron.
Did you try a lower value of sigma?
from flowtron.
I was already running it with sigma=0.5
from flowtron.
Try something even more conservative, 0.25.
Is this model trained with speaker embeddings?
Also, can you share the phrases you've been evaluating?
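For intuition on why a lower sigma helps: a flow-based model like this samples its latent z from a zero-mean Gaussian whose standard deviation is sigma, so a smaller sigma keeps z closer to the mode and trades prosodic variation for stability. A toy sketch in plain Python (not Flowtron code):

```python
import random
import statistics

def sample_latent(n, sigma, seed=0):
    """Draw n samples from N(0, sigma), as flow-based inference does for z."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

z_default = sample_latent(10000, 1.0)
z_conservative = sample_latent(10000, 0.25)
# A smaller sigma concentrates the latent near zero (the mode), producing
# less varied, more conservative samples at inference time.
print(statistics.pstdev(z_conservative) < statistics.pstdev(z_default))  # True
```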
from flowtron.
What happens if you set n_frames to be 6 times the number of tokens?
from flowtron.
Yes, the model is trained with speaker embeddings.
I set sigma as low as 0.25, as you suggested. Here are some examples:
"I was good enough not to contradict this startling assertion." -i 887 -s 0.25
"Then one begins to appraise." -i 1116 -s 0.25
"Now let us return to your particular world." -i 40 -s 0.25
And in the inference.py script I added the computation for n_frames:
text = trainset.get_text(text).cuda()
n_frames = len(text) * 6
Still bad results.
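As a sanity check on what that budget means in seconds: with LJSpeech-style mel parameters (22050 Hz sampling rate, hop length 256, so roughly 86 frames per second; these values are assumed defaults, not confirmed from this model's config), 6 frames per token gives each token about 70 ms of audio.

```python
# Assumed mel parameters (common TTS defaults):
sampling_rate = 22050
hop_length = 256  # samples per mel frame -> ~86 frames per second

def frames_to_seconds(n_frames):
    return n_frames * hop_length / sampling_rate

# n_frames = 6 * number of tokens, e.g. a 60-token phrase:
n_tokens = 60
budget = frames_to_seconds(6 * n_tokens)
print(round(budget, 2))  # 4.18 seconds of audio for the whole phrase
```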
from flowtron.
Try these modifications to the phrases:
"I was good enough to contradict this startling assertion."
"Now let us return your particular world."
from flowtron.
Speaker 40: "Now let us return your particular world."
[Plots: Attention Weights 0 | Attention Weights 1]
Speaker 887: "I was good enough to contradict this startling assertion."
[Plots: Attention Weights 0 | Attention Weights 1]
from flowtron.
That's very surprising. Give us some time to look into it.
from flowtron.
Thanks a lot! I really appreciate your help.
Please let me know if I can be more involved in the investigation.
from flowtron.
One thing: there are differences in the output when running inference on different checkpoints.
Still, none of them is good enough, though there are of course significant fluctuations between checkpoints.
from flowtron.
Are the speaker ids you're sharing the LibriTTS ids? The model should have about 123 speakers.
from flowtron.
Yes, they are from the LibriTTS ids list.
from flowtron.
I synthesized the 3 phrases with our LibriTTS-100 model trained with speaker embeddings, using sigma=0.75 and n_frames = 1000.
Your attention weights during training look really good and your validation loss is similar to what we reached.
Can you share your model weights?
from flowtron.
Those phrases sound like what I'd like to hear.
I uploaded the checkpoint I used here.
There is one small difference in the dataset: speaker 40 has a few sentences that were removed.
This is the config file.
This is the training files list.
from flowtron.
@rafaelvalle did you manage to run the inference using the weights I shared?
Thanks
from flowtron.
Yes, using your model I get results similar to yours.
Will take a look at your model once the paper deadlines are over.
from flowtron.