
Comments (25)

jhjungCode commented on May 27, 2024

I think warmstart has an error: it isn't passed the include_layers parameter.
_________________________________________________original
if warmstart_checkpoint_path != "":
model = warmstart(warmstart_checkpoint_path, model)


_________________________________________________corrected
if warmstart_checkpoint_path != "":
model = warmstart(warmstart_checkpoint_path, model, include_layers)
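
For reference, here is a minimal sketch of a warmstart helper that honors include_layers. The key-filtering logic and the assumption that the checkpoint stores the model under a 'model' entry are illustrative, not the exact Flowtron implementation:

import torch

def warmstart(checkpoint_path, model, include_layers=None):
    # Load the pre-trained weights on CPU to avoid device mismatches.
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    pretrained = checkpoint['model']  # assumes the checkpoint keeps the model under 'model'
    state_dict = pretrained.state_dict() if hasattr(pretrained, 'state_dict') else pretrained

    if include_layers:
        # Keep only the weights whose names start with one of include_layers;
        # with include_layers=None everything is copied over.
        state_dict = {k: v for k, v in state_dict.items()
                      if any(k.startswith(layer) for layer in include_layers)}

    # Merge into the current model's weights, so layers missing from the
    # checkpoint keep their fresh initialization.
    model_dict = model.state_dict()
    model_dict.update(state_dict)
    model.load_state_dict(model_dict)
    return model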


blx0102 commented on May 27, 2024

@altsoph Yes, my issue has been solved. I retrained a better taco2 model for warmstart, which has no bias in the attention plot. For English training, maybe you could try to warmstart with the official pretrained taco2 model.

rafaelvalle commented on May 27, 2024

Make sure you're training the steps of flow progressively, that is, train a Flowtron with 1 step of flow until it's able to attend to the text. Once this happens, use this model to warmstart a Flowtron with 2 steps of flow and so on.
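
A hedged sketch of that two-stage schedule, expressed as config overrides (the checkpoint paths and the stage-1 include_layers value are placeholders, not required settings):

# Stage 1: train a single step of flow until it attends to the text.
stage1_overrides = {
    "model_config": {"n_flows": 1},
    "train_config": {"warmstart_checkpoint_path": "tacotron2_checkpoint.pt",  # placeholder
                     "include_layers": ["speaker", "encoder", "embedding"]},  # example only
}

# Stage 2: warmstart 2 steps of flow from the stage-1 checkpoint,
# loading all of its weights (include_layers = null in the JSON config).
stage2_overrides = {
    "model_config": {"n_flows": 2},
    "train_config": {"warmstart_checkpoint_path": "outdir/model_1flow_final",  # placeholder
                     "include_layers": None},
}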

rafaelvalle commented on May 27, 2024

@altsoph When you warmstart (via warmstart_checkpoint_path) from the Flowtron with 1 step of flow that is able to attend, make sure you set include_layers to null, otherwise it won't load all the weights from the pre-trained model.

KingStorm commented on May 27, 2024

Same issue here. I used a non-English speaker dataset (> 10 h) with the fixed token/speaker embeddings from a multi-speaker Tacotron 2, and started training with n_flow=1. (I did not reuse the text encoder from Tacotron 2; probably I should also apply that strategy.) The model is able to attend for the first flow, but cannot learn the second flow.

rafaelvalle commented on May 27, 2024

@blx0102 When you started training the 2nd step of flow, did you set include_layers to null?
Does your attention always have that bias on the last token, or is it something that shows up over time?
Is this a model with speaker embeddings?

blx0102 commented on May 27, 2024

@rafaelvalle Thanks for your reply.

  1. Yes, I did set include_layers to None when I started the 2nd step of flow.
  2. Attention almost always has the bias on the last token. The bias can be seen from about 50k steps.
  3. Yes, I trained with 2 speakers, and the embedding dim remains 128.

youmebangbang commented on May 27, 2024

I too am experiencing the same issue. I began training from the NVIDIA model with my own 20-hour single-speaker dataset and ignored the speaker layer. I trained to 1.3M steps; attention is good and inference sounds decent. I used a 1e-4 and then a 1e-5 learning rate. Then I tried flow 2 with a warm start from the previous run's checkpoint and include_layers set to none, with both 1e-4 and 1e-5 rates, and I still get the same horizontal lines in the attention that eventually turn into a mostly blank screen.

rafaelvalle commented on May 27, 2024

@jhjungCode Nice find! This does not solve their issue, but it suggests that they were using include_layers=None.

rafaelvalle commented on May 27, 2024

@blx0102 @youmebangbang please try resuming from a 1 step of flow checkpoint that does not have a bias on the attention maps.
@youmebangbang can you show similar plots to the ones we have here?

youmebangbang commented on May 27, 2024

@rafaelvalle I apologize, I do not understand: how does one resume from a checkpoint without bias on the attention?

rafaelvalle commented on May 27, 2024

Do you have a checkpoint in which the model has good attention and no bias?

altsoph commented on May 27, 2024

@blx0102 BTW, were you able to solve this issue? I'm experiencing the same kind of problems (even on the original LJS data)

altsoph commented on May 27, 2024

@blx0102 Okay, thanks! I'm going to train it for Russian, but I decided to check how it works from scratch on English first. It didn't work, so I suspected there is some trick :)

altsoph commented on May 27, 2024

@rafaelvalle I trained a 1-flow model to attend without problems, but still can't recover the attention after warmstart with 2 flows. I'll keep trying, thanks :)

tunnermann commented on May 27, 2024

Hello, first of all, thanks for your work. As I really like WaveGlow, I'm excited about Flowtron. I'm having a similar problem while training Flowtron on my Brazilian Portuguese dataset. For 1 flow I can get decent attention alignments, with quite good synthesized speech. Here is one alignment map:

[attention alignment plot: sid2_sigma0.5_attnlayer0]

When I try to use it to warmstart a 2-flow model, the attention on the first flow gets worse over time and the attention map for the second flow is just noise.

I'm setting include_layers to null and using 3 speakers with 10 hours each; I've tried with and without speaker embeddings. I'm not using phoneme representations, so I set p_arpabet to 0.0.

So, do you have some suggestions of steps I might take to train the model?

Another question I have is about batch_size: I see people using only a batch_size of 1, is there any reason for that? Should I try different batch sizes?

rafaelvalle commented on May 27, 2024

@tunnermann can you share the config.json file you're using to train the 2nd flow?

tunnermann commented on May 27, 2024

Thanks for your reply, here it is:

{

"train_config": {
    "output_directory": "outdir",
    "epochs": 10000000,
    "learning_rate": 1e-4,
    "weight_decay": 1e-6,
    "sigma": 1.0,
    "iters_per_checkpoint": 10000,
    "batch_size": 1,
    "seed": 1234,
    "checkpoint_path": "",
    "ignore_layers": [],
    "include_layers": [],
    "warmstart_checkpoint_path": "models/model_1_flow_3_speakers_2",
    "with_tensorboard": true,
    "fp16_run": true 
},

"data_config": {
    "training_files": "../dataset/metadata_flowtron.csv", 
    "validation_files": "../dataset/JC/metadata_flowtron_val.csv",
    "text_cleaners": ["basic_cleaners"],
    "p_arpabet": 0.0,
    "cmudict_path": "data/cmudict_dictionary",
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
    "max_wav_value": 32768.0
},

"dist_config": {
    "dist_backend": "nccl",
    "dist_url": "tcp://localhost:54321"
},

"model_config": {
    "n_speakers": 3,
    "n_speaker_dim": 128,
    "n_text": 185,
    "n_text_dim": 512,
    "n_flows": 2,
    "n_mel_channels": 80,
    "n_attn_channels": 640,
    "n_hidden": 1024,
    "n_lstm_layers": 2,
    "mel_encoder_n_hidden": 512,
    "n_components": 0,
    "mean_scale": 0.0,
    "fixed_gaussian": true,
    "dummy_speaker_embedding": false,
    "use_gate_layer": true
} 

}

rafaelvalle commented on May 27, 2024

@tunnermann Can you share the loss curves for 1 step of flow and 2 steps of flow?
Is there a moment at which your 1-step-of-flow model attends properly to the text but without attending to the first token like your attention plot shows?

tunnermann commented on May 27, 2024

Sure, here are the loss curves:
1-Flow:
[screenshot: 1-flow loss curve]

2-Flow (I lost the one with more steps, but it looks similar):
[screenshot: 2-flow loss curve]

No, I can't find any 1 step model that does not attend to the first token.

Another thing I noticed now is that I may be doing something wrong in inference with 2 flows. In the TensorBoard images the attention in attention_weights_0 is still good (attention_weights_1 is noise), but when I do inference it is not, as in the image below (sid1_sigma05_attnlayer1.png):
[attention plot: sid1_sigma0.5_attnlayer1]
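
As a sanity check when comparing the TensorBoard attention with inference-time attention, a small plotting helper like the sketch below can be used; attn is assumed to be a 2-D numpy array of attention weights collected at inference (this is not a specific Flowtron API):

import numpy as np
import matplotlib.pyplot as plt

def save_attention_plot(attn, path):
    # attn: attention weights with rows = mel frames, columns = text tokens.
    fig, ax = plt.subplots(figsize=(8, 4))
    im = ax.imshow(attn.T, aspect='auto', origin='lower', interpolation='none')
    ax.set_xlabel('mel frames')
    ax.set_ylabel('text tokens')
    fig.colorbar(im, ax=ax)
    fig.savefig(path, bbox_inches='tight')
    plt.close(fig)

# Example with random data just to exercise the helper:
# save_attention_plot(np.random.rand(400, 60), 'attnlayer1_check.png')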

rafaelvalle commented on May 27, 2024

What about the validation loss?

tunnermann commented on May 27, 2024

Here you go, 1-flow:

[screenshot: 1-flow validation loss curve]

2-flow:

[screenshot: 2-flow validation loss curve]

Inference keeps getting worse the more I train, even though the image of the attention in tensorboard does not change much.

rafaelvalle commented on May 27, 2024

Keep training while the validation loss is going down.

tunnermann commented on May 27, 2024

Thanks for your help; just knowing there is nothing obviously wrong with what I'm doing already clears things up for me. I will continue with this training as you suggested, then I will try generating a 1-flow model without the first-token attention problem, and if it is still not solved I will try using phonemes to help with learning attention.

rafaelvalle commented on May 27, 2024

Yes, the attention map on the first flow should remain correct while the attention map on the second flow improves and the validation loss goes down. Let us know when you're able to get good results.
