Comments (25)
I think warmstart has a bug: it isn't passed the include_layers parameter.
original:
if warmstart_checkpoint_path != "":
    model = warmstart(warmstart_checkpoint_path, model)
corrected:
if warmstart_checkpoint_path != "":
    model = warmstart(warmstart_checkpoint_path, model, include_layers)
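The filtering that the include_layers parameter implies can be sketched roughly like this (filter_state_dict is a hypothetical helper for illustration; the actual warmstart in Flowtron's train.py may differ):

```python
def filter_state_dict(pretrained, include_layers=None):
    """Keep only checkpoint parameters whose names start with one of
    the prefixes in include_layers. With include_layers set to None
    (null in config.json), every weight is kept."""
    if include_layers is None:
        return dict(pretrained)
    return {name: weight for name, weight in pretrained.items()
            if any(name.startswith(layer) for layer in include_layers)}
```

For example, filter_state_dict(state_dict, ["encoder", "embedding"]) would carry over only the encoder and embedding weights, which is the typical setting when warmstarting from a Tacotron 2 checkpoint.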
from flowtron.
@altsoph Yes my issue has been solved. I retrained a better taco2 model for warmstart, which has no bias in the attention plot. For English training, maybe you could try to warmstart with the official pretrained taco2 model.
Make sure you're training the steps of flow progressively, that is, train a Flowtron with 1 step of flow until it's able to attend to the text. Once this happens, use this model to warmstart a Flowtron with 2 steps of flow and so on.
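The progressive schedule described above can be sketched as a sequence of config overrides (the stage dicts, checkpoint paths, and stage_config helper below are illustrative assumptions, not code from the repo):

```python
# Hypothetical progressive-training schedule: each stage warmstarts
# from the previous converged model. Paths are illustrative.
stages = [
    # Stage 1: 1 step of flow, warmstarted from a Tacotron 2 checkpoint,
    # carrying over only text/speaker embeddings and the encoder.
    {"n_flows": 1,
     "warmstart_checkpoint_path": "models/tacotron2_statedict.pt",
     "include_layers": ["speaker", "encoder", "embedding"]},
    # Stage 2: 2 steps of flow, warmstarted from the converged 1-flow
    # model; include_layers is null so that ALL weights are loaded.
    {"n_flows": 2,
     "warmstart_checkpoint_path": "outdir/model_1flow_final",
     "include_layers": None},
]

def stage_config(base, stage):
    """Merge one stage's overrides into a copy of the base config."""
    cfg = dict(base)
    cfg["model_config"] = dict(base["model_config"], n_flows=stage["n_flows"])
    cfg["train_config"] = dict(
        base["train_config"],
        warmstart_checkpoint_path=stage["warmstart_checkpoint_path"],
        include_layers=stage["include_layers"])
    return cfg
```

Each stage is then trained until the model attends to the text before moving on to the next one.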
@altsoph When you set warmstart_checkpoint_path to the Flowtron with 1 step of flow that is able to attend, make sure you set include_layers to null, otherwise it won't load all weights from the pre-trained model.
Same issue here. I trained on a non-English dataset (>10 h) using the fixed token/speaker embeddings from a multi-speaker tacotron2, and started training with n_flows=1. (I did not reuse the text encoder from tacotron2; probably I should also apply this strategy.) The model is able to attend for the first flow, but cannot learn the second flow.
@blx0102 When you started training the 2nd step of flow, did you set include_layers to null?
Does your attention always have that bias on the last token, or is it something that shows up over time?
Is this a model with speaker embeddings?
@rafaelvalle Thanks for your reply.
- Yes, I did set include_layers to None when starting the 2nd step of flow.
- Attention almost always has the bias on the last token; the bias is visible from about 50k steps.
- Yes, I trained with 2 speakers, and the embedding dim remains 128.
I too am experiencing the same issue. I began training from the NVIDIA model with my own 20-hour single-speaker dataset and ignored the speaker layer. I trained to 1.3M steps; attention is good and inference sounds decent. I used a 1e-4 and then a 1e-5 learning rate. Then I tried flow 2 with a warm start from the previous run's checkpoint and include_layers set to null, with both 1e-4 and 1e-5 rates, and I still get the same horizontal lines in attention that eventually turn into a mostly blank screen.
@jhjungCode Nice find! This does not solve their issue, but it suggests that they were using include_layers=None.
@blx0102 @youmebangbang please try resuming from a 1 step of flow checkpoint that does not have a bias on the attention maps.
@youmebangbang can you show similar plots to the ones we have here?
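A rough way to check whether a checkpoint's attention maps show the last-token bias discussed above (last_token_bias is a hypothetical helper, not a Flowtron API; attention here is a frames-by-tokens matrix of weights, each row summing to ~1):

```python
def last_token_bias(attention, threshold=0.5):
    """Fraction of output frames whose attention mass on the LAST
    input token exceeds `threshold` -- a crude proxy for the bias
    visible in the attention plots."""
    biased_frames = sum(1 for row in attention if row[-1] > threshold)
    return biased_frames / len(attention)
```

A healthy 1-flow checkpoint should score near zero here; a checkpoint that dumps most of its attention on the last token is a poor candidate for warmstarting the 2-flow model.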
@rafaelvalle I apologize, I do not understand: how does one resume from a checkpoint without bias in the attention?
Do you have a checkpoint in which the model has good attention and no bias?
@blx0102 BTW, were you able to solve this issue? I'm experiencing the same kind of problems (even on the original LJS data)
@blx0102 Okay, thanks! I'm going to train it for Russian, but I decided to check how it works from scratch on English first. It didn't work, so I suspected there is some trick :)
@rafaelvalle I trained a 1-flow model to attend without problems, but still can't recover the attention after warmstart with 2 flows. I'll keep trying, thanks :)
Hello! First, thanks for your work. As I really like WaveGlow, I'm excited about Flowtron. I'm having a similar problem while training Flowtron on my Brazilian Portuguese dataset. For 1 flow I can get decent attention alignments, with quite good synthesized speech. Here is one alignment map:
When I try to use it to warmstart a 2-flow model, the attention on the first flow gets worse over time and the attention map for the second flow is just noise.
I'm setting include_layers to null, using 3 speakers with 10 hours each, and I've tried with and without speaker embeddings. I'm not using phoneme representations, so I set p_arpabet to 0.0.
So, do you have any suggestions of steps I might take to train the model?
Another question I have is about batch_size: I see people using only a batch size of 1, is there any reason for that? Should I try different batch sizes?
@tunnermann can you share the config.json file you're using to train the 2nd flow?
Thanks for your reply, here it is:
{
    "train_config": {
        "output_directory": "outdir",
        "epochs": 10000000,
        "learning_rate": 1e-4,
        "weight_decay": 1e-6,
        "sigma": 1.0,
        "iters_per_checkpoint": 10000,
        "batch_size": 1,
        "seed": 1234,
        "checkpoint_path": "",
        "ignore_layers": [],
        "include_layers": [],
        "warmstart_checkpoint_path": "models/model_1_flow_3_speakers_2",
        "with_tensorboard": true,
        "fp16_run": true
    },
    "data_config": {
        "training_files": "../dataset/metadata_flowtron.csv",
        "validation_files": "../dataset/JC/metadata_flowtron_val.csv",
        "text_cleaners": ["basic_cleaners"],
        "p_arpabet": 0.0,
        "cmudict_path": "data/cmudict_dictionary",
        "sampling_rate": 22050,
        "filter_length": 1024,
        "hop_length": 256,
        "win_length": 1024,
        "mel_fmin": 0.0,
        "mel_fmax": 8000.0,
        "max_wav_value": 32768.0
    },
    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321"
    },
    "model_config": {
        "n_speakers": 3,
        "n_speaker_dim": 128,
        "n_text": 185,
        "n_text_dim": 512,
        "n_flows": 2,
        "n_mel_channels": 80,
        "n_attn_channels": 640,
        "n_hidden": 1024,
        "n_lstm_layers": 2,
        "mel_encoder_n_hidden": 512,
        "n_components": 0,
        "mean_scale": 0.0,
        "fixed_gaussian": true,
        "dummy_speaker_embedding": false,
        "use_gate_layer": true
    }
}
@tunnermann Can you share the loss curves for 1 step of flow and 2 steps of flow?
Is there a point at which your 1-step-of-flow model attends properly to the text but without attending to the first token like your attention plot shows?
Sure, here are the loss curves:
1-Flow:
2-Flow(I lost the one with more steps, but it looks similar):
No, I can't find any 1 step model that does not attend to the first token.
Another thing I noticed now is that I may be doing something wrong in inference with 2 flows. In the tensorboard images the attention in attention_weights_0 is still good (attention_weights_1 is noise), but when I do inference it is not, like in the image below (sid1_sigma05_attnlayer1.png):
What about the validation loss?
Here you go, 1-flow:
2-flow:
Inference keeps getting worse the more I train, even though the image of the attention in tensorboard does not change much.
Keep training while the validation loss is going down.
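One generic way to turn "keep training while the validation loss is going down" into a stopping rule (a standard early-stopping sketch, not Flowtron code):

```python
def should_stop(val_losses, patience=5):
    """Stop once the validation loss has not improved over the last
    `patience` evaluations, compared to the best loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```

A patience of several evaluations tolerates the short noisy plateaus that are common in flow-model losses instead of stopping at the first uptick.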
Thanks for your help; just knowing there is nothing obviously wrong with what I'm doing already clears things up for me. I will continue with this training as you suggested, then I will try generating a 1-flow model without the first-token attention problem, and if it is still not solved I will try using phonemes to help with learning attention.
Yes, the attention map on the first flow should remain correct while the attention map on the second flow improves and the validation loss goes down. Let us know when you're able to get good results.