Comments (25)
I think warmstart has a bug: it isn't passed the include_layers parameter.
original:
if warmstart_checkpoint_path != "":
    model = warmstart(warmstart_checkpoint_path, model)
corrected:
if warmstart_checkpoint_path != "":
    model = warmstart(warmstart_checkpoint_path, model, include_layers)
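The filtering that the include_layers parameter implies can be sketched roughly like this (filter_state_dict is a hypothetical helper for illustration; the actual warmstart in Flowtron's train.py may differ):

```python
def filter_state_dict(pretrained, include_layers=None):
    """Keep only checkpoint parameters whose names start with one of
    the prefixes in include_layers. With include_layers set to None
    (null in config.json), every weight is kept."""
    if include_layers is None:
        return dict(pretrained)
    return {name: weight for name, weight in pretrained.items()
            if any(name.startswith(layer) for layer in include_layers)}
```

For example, filter_state_dict(state_dict, ["encoder", "embedding"]) would carry over only the encoder and embedding weights, which is the typical setting when warmstarting from a Tacotron 2 checkpoint.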
from flowtron.
@altsoph Yes my issue has been solved. I retrained a better taco2 model for warmstart, which has no bias in the attention plot. For English training, maybe you could try to warmstart with the official pretrained taco2 model.
Make sure you're training the steps of flow progressively, that is, train a Flowtron with 1 step of flow until it's able to attend to the text. Once this happens, use this model to warmstart a Flowtron with 2 steps of flow and so on.
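The progressive schedule described above can be sketched as a sequence of config overrides (the stage dicts, checkpoint paths, and stage_config helper below are illustrative assumptions, not code from the repo):

```python
# Hypothetical progressive-training schedule: each stage warmstarts
# from the previous converged model. Paths are illustrative.
stages = [
    # Stage 1: 1 step of flow, warmstarted from a Tacotron 2 checkpoint,
    # carrying over only text/speaker embeddings and the encoder.
    {"n_flows": 1,
     "warmstart_checkpoint_path": "models/tacotron2_statedict.pt",
     "include_layers": ["speaker", "encoder", "embedding"]},
    # Stage 2: 2 steps of flow, warmstarted from the converged 1-flow
    # model; include_layers is null so that ALL weights are loaded.
    {"n_flows": 2,
     "warmstart_checkpoint_path": "outdir/model_1flow_final",
     "include_layers": None},
]

def stage_config(base, stage):
    """Merge one stage's overrides into a copy of the base config."""
    cfg = dict(base)
    cfg["model_config"] = dict(base["model_config"], n_flows=stage["n_flows"])
    cfg["train_config"] = dict(
        base["train_config"],
        warmstart_checkpoint_path=stage["warmstart_checkpoint_path"],
        include_layers=stage["include_layers"])
    return cfg
```

Each stage is then trained until the model attends to the text before moving on to the next one.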
@altsoph When you set warmstart_checkpoint_path to the Flowtron with 1 step of flow that is able to attend, make sure you set include_layers to null, otherwise it won't load all weights from the pre-trained model.
Same issue here. I trained on a non-English dataset (>10 h) using the fixed token/speaker embeddings from a multi-speaker tacotron2, and started training with n_flows=1. (I did not reuse the text encoder from tacotron2; probably I should also apply this strategy.) The model is able to attend for the first flow, but cannot learn the second flow.
@blx0102 When you started training the 2nd step of flow, did you set include_layers to null?
Does your attention always have that bias on the last token, or is it something that shows up over time?
Is this a model with speaker embeddings?
@rafaelvalle Thanks for your reply.
- Yes, I did set include_layers to None when starting the 2nd step of flow.
- Attention almost always has the bias on the last token; the bias is visible from about 50k steps.
- Yes, I trained with 2 speakers, and the embedding dim remains 128.
I too am experiencing the same issue. I began training from the NVIDIA model with my own 20-hour single-speaker dataset and ignored the speaker layer. I trained to 1.3M steps; attention is good and inference sounds decent. I used a 1e-4 and then a 1e-5 learning rate. Then I tried flow 2 with a warm start from the previous run's checkpoint and include_layers set to null, with both 1e-4 and 1e-5 rates, and I still get the same horizontal lines in attention that eventually turn into a mostly blank screen.
@jhjungCode Nice find! This does not solve their issue, but it suggests that they were using include_layers=None.
@blx0102 @youmebangbang please try resuming from a 1 step of flow checkpoint that does not have a bias on the attention maps.
@youmebangbang can you show similar plots to the ones we have here?
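A rough way to check whether a checkpoint's attention maps show the last-token bias discussed above (last_token_bias is a hypothetical helper, not a Flowtron API; attention here is a frames-by-tokens matrix of weights, each row summing to ~1):

```python
def last_token_bias(attention, threshold=0.5):
    """Fraction of output frames whose attention mass on the LAST
    input token exceeds `threshold` -- a crude proxy for the bias
    visible in the attention plots."""
    biased_frames = sum(1 for row in attention if row[-1] > threshold)
    return biased_frames / len(attention)
```

A healthy 1-flow checkpoint should score near zero here; a checkpoint that dumps most of its attention on the last token is a poor candidate for warmstarting the 2-flow model.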
@rafaelvalle I apologize, I do not understand: how does one resume from a checkpoint without bias in the attention?
Do you have a checkpoint in which the model has good attention and no bias?
@blx0102 BTW, were you able to solve this issue? I'm experiencing the same kind of problems (even on the original LJS data)
@blx0102 Okay, thanks! I'm going to train it for Russian, but I decided to check how it works from scratch on English first. It didn't work, so I suspected there is some trick :)
@rafaelvalle I trained a 1-flow model to attend without problems, but still can't recover the attention after warmstart with 2 flows. I'll keep trying, thanks :)
Hello! First, thanks for your work. As I really like WaveGlow, I'm excited about Flowtron. I'm having a similar problem while training Flowtron on my Brazilian Portuguese dataset. For 1 flow I can get decent attention alignments, with quite good synthesized speech. Here is one alignment map:
When I try to use it to warmstart a 2-flow model, the attention on the first flow gets worse over time and the attention map for the second flow is just noise.
I'm setting include_layers to null, using 3 speakers with 10 hours each, and I've tried with and without speaker embeddings. I'm not using phoneme representations, so I set p_arpabet to 0.0.
So, do you have any suggestions of steps I might take to train the model?
Another question I have is about batch_size: I see people using only a batch size of 1, is there any reason for that? Should I try different batch sizes?
@tunnermann can you share the config.json file you're using to train the 2nd flow?
Thanks for your reply, here it is:
{
    "train_config": {
        "output_directory": "outdir",
        "epochs": 10000000,
        "learning_rate": 1e-4,
        "weight_decay": 1e-6,
        "sigma": 1.0,
        "iters_per_checkpoint": 10000,
        "batch_size": 1,
        "seed": 1234,
        "checkpoint_path": "",
        "ignore_layers": [],
        "include_layers": [],
        "warmstart_checkpoint_path": "models/model_1_flow_3_speakers_2",
        "with_tensorboard": true,
        "fp16_run": true
    },
    "data_config": {
        "training_files": "../dataset/metadata_flowtron.csv",
        "validation_files": "../dataset/JC/metadata_flowtron_val.csv",
        "text_cleaners": ["basic_cleaners"],
        "p_arpabet": 0.0,
        "cmudict_path": "data/cmudict_dictionary",
        "sampling_rate": 22050,
        "filter_length": 1024,
        "hop_length": 256,
        "win_length": 1024,
        "mel_fmin": 0.0,
        "mel_fmax": 8000.0,
        "max_wav_value": 32768.0
    },
    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321"
    },
    "model_config": {
        "n_speakers": 3,
        "n_speaker_dim": 128,
        "n_text": 185,
        "n_text_dim": 512,
        "n_flows": 2,
        "n_mel_channels": 80,
        "n_attn_channels": 640,
        "n_hidden": 1024,
        "n_lstm_layers": 2,
        "mel_encoder_n_hidden": 512,
        "n_components": 0,
        "mean_scale": 0.0,
        "fixed_gaussian": true,
        "dummy_speaker_embedding": false,
        "use_gate_layer": true
    }
}
@tunnermann Can you share the loss curves for 1 step of flow and 2 steps of flow?
Is there a point at which your 1-step-of-flow model attends properly to the text but without attending to the first token like your attention plot shows?
Sure, here are the loss curves:
1-Flow:
2-Flow(I lost the one with more steps, but it looks similar):
No, I can't find any 1 step model that does not attend to the first token.
Another thing I noticed now is that I may be doing something wrong in inference with 2 flows. In the tensorboard images the attention in attention_weights_0 is still good (attention_weights_1 is noise), but when I do inference it is not, like in the image below (sid1_sigma05_attnlayer1.png):
What about the validation loss?
Here you go, 1-flow:
2-flow:
Inference keeps getting worse the more I train, even though the image of the attention in tensorboard does not change much.
Keep training while the validation loss is going down.
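One generic way to turn "keep training while the validation loss is going down" into a stopping rule (a standard early-stopping sketch, not Flowtron code):

```python
def should_stop(val_losses, patience=5):
    """Stop once the validation loss has not improved over the last
    `patience` evaluations, compared to the best loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```

A patience of several evaluations tolerates the short noisy plateaus that are common in flow-model losses instead of stopping at the first uptick.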
Thanks for your help; just knowing there is nothing obviously wrong with what I'm doing already clears things up for me. I will continue with this training as you suggested, then I will try generating a 1-flow model without the first-token attention problem, and if it is still not solved I will try using phonemes to help with learning attention.
Yes, the attention map on the first flow should remain correct while the attention map on the second flow improves and the validation loss goes down. Let us know when you're able to get good results.