
Comments (2)

chankl3579 commented on June 2, 2024

I tried to fine-tune the YourTTS model with my own small dataset and ran into the same error as this issue.
My dataset contains 256 audio clips and I prepared it in LJSpeech format.
Thanks for the solutions suggested above, but since changing the source code is not an option for me, I looked into this problem a little further.

Let me get straight to the point: I think the cause of this error is not multi-speaker vs. single-speaker; the issue occurs when the dataset is relatively small.
I tried training from scratch using only the LJSpeech-1.1 dataset and the error did not occur, so we can tell that the single-speaker format is not the problem.

I then made a subset of LJSpeech containing only the first 1024 samples and trained from scratch again; the error was reproduced in this case.
From the Python log, we can see that the error occurs during the evaluation stage of training.
By default, the evaluation split proportion is 0.01. In this experiment, the size of the evaluation set would be 1024 * 0.01 = 10, which is smaller than the default batch size of 32.
By explicitly setting eval_split_size=32, the problem is solved.
Furthermore, be aware that whenever training data is discarded by MAX_AUDIO_LEN_IN_SECONDS and the evaluation set ends up smaller than the batch size, this problem will also occur.

To conclude, this bug occurs when the actual size of the evaluation set is less than 1x the batch size. The training/evaluation split proportion, the discarding of samples, and inappropriate hyperparameters (such as an inconsistency between BATCH_SIZE and eval_split_max_size) may all cause the problem.
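For reference, here is a minimal sketch of how the evaluation split can be requested when loading samples in a Coqui TTS recipe. It assumes a `dataset_config` already defined in the recipe, and the names and values (BATCH_SIZE, 32) are illustrative, not taken from the recipe verbatim:

```python
# Sketch: control the evaluation split size when loading samples.
# `dataset_config` is assumed to be the BaseDatasetConfig defined in the recipe.
from TTS.tts.datasets import load_tts_samples

BATCH_SIZE = 32

train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    # With the default eval_split_size=0.01, 1024 samples yield only ~10 eval
    # samples, which is smaller than the batch size and triggers the error.
    # A value > 1 is treated as an absolute number of evaluation samples.
    eval_split_size=BATCH_SIZE,
)
assert len(eval_samples) >= BATCH_SIZE, "evaluation set smaller than one batch"
```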


Ca-ressemble-a-du-fake commented on June 2, 2024

I was able to correct this by doing the following:

  • In the LJSpeech formatter I added speaker_name = cols[1] so that the formatter outputs the name of the speaker (stored in the second column of my csv); see the sketch at the end of this comment
  • Brought 4 other datasets with different speakers
  • Used only 16 kHz datasets and changed the sampling rate to 16000.
  • Deleted the already generated speaker embedding files.

Now the training is working, so I believe this recipe must be run against multi-speaker datasets; a single-speaker dataset may not be supported.

Maybe it can work with 22 kHz audio, but I did not test it as I only have 16 kHz multi-speaker datasets and a single one at 22 kHz.
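As a rough sketch of the first bullet above, a custom formatter along these lines can return the speaker name read from the csv. The column order (file_id|speaker|text) is an assumption about this particular metadata.csv, not something mandated by the library:

```python
import os

# Sketch of a formatter that reads the speaker name from metadata.csv,
# in the spirit of the edit described above.
def multispeaker_ljspeech(root_path, meta_file, **kwargs):
    items = []
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            cols = line.strip().split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            items.append({
                "text": cols[2],
                "audio_file": wav_file,
                "speaker_name": cols[1],  # speaker stored in the second column
                "root_path": root_path,
            })
    return items

# It can be passed to load_tts_samples via the `formatter` argument instead of
# patching the built-in ljspeech formatter:
# train_samples, eval_samples = load_tts_samples(dataset_config, formatter=multispeaker_ljspeech)
```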

