
Comments (2)

chankl3579 commented on June 2, 2024

I tried to fine-tune the YourTTS model with my own small dataset and ran into the same error as this issue.
My dataset contains 256 audio clips and I prepared it in LJSpeech format.
Thanks for the solutions suggested above, but since changing the source code is not an option for me, I looked into this problem a little further.

Let me get straight to the point: I think the cause of this error is not multi-speaker vs. single-speaker; the issue occurs when the dataset is relatively small.
I tried training from scratch using only the LJSpeech-1.1 dataset and the error did not occur, so we can tell that the single-speaker format is not the problem.

I then made a subset of LJSpeech containing only the first 1024 samples and trained from scratch again; the error was reproduced in this case.
From the Python log, we can see that the error occurs during the evaluation stage of training.
By default, the evaluation split proportion is 0.01. In this experiment, the size of the evaluation set would be 1024 * 0.01 = 10, which is smaller than the default batch size of 32.
By explicitly setting eval_split_size=32, the problem is solved.
Furthermore, be aware that whenever training data is discarded by MAX_AUDIO_LEN_IN_SECONDS and the evaluation set ends up smaller than the batch size, this problem will also occur.

To conclude, this bug occurs when the actual size of the evaluation set is less than 1x the batch size. The training/evaluation split proportion, the discarding of samples, and inappropriate hyperparameters (such as an inconsistency between BATCH_SIZE and eval_split_max_size) may all cause the problem.
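For reference, here is a minimal sketch of how the evaluation split can be requested when loading samples in a Coqui TTS recipe. It assumes a `dataset_config` already defined in the recipe, and the names and values (BATCH_SIZE, 32) are illustrative, not taken from the recipe verbatim:

```python
# Sketch: control the evaluation split size when loading samples.
# `dataset_config` is assumed to be the BaseDatasetConfig defined in the recipe.
from TTS.tts.datasets import load_tts_samples

BATCH_SIZE = 32

train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    # With the default eval_split_size=0.01, 1024 samples yield only ~10 eval
    # samples, which is smaller than the batch size and triggers the error.
    # A value > 1 is treated as an absolute number of evaluation samples.
    eval_split_size=BATCH_SIZE,
)
assert len(eval_samples) >= BATCH_SIZE, "evaluation set smaller than one batch"
```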


Ca-ressemble-a-du-fake commented on June 2, 2024

I was able to correct this by doing the following:

  • In the LJSpeech formatter I added speaker_name = cols[1] so that the formatter outputs the name of the speaker (stored in the second column of my csv); see the sketch at the end of this comment
  • Brought 4 other datasets with different speakers
  • Used only 16 kHz datasets and changed the sampling rate to 16000.
  • Deleted the already generated speaker embedding files.

Now the training is working, so I believe this recipe must be run against multi-speaker datasets; a single-speaker dataset may not be supported.

Maybe it can work with 22 kHz audio, but I did not test it as I only have 16 kHz multi-speaker datasets and a single one at 22 kHz.
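As a rough sketch of the first bullet above, a custom formatter along these lines can return the speaker name read from the csv. The column order (file_id|speaker|text) is an assumption about this particular metadata.csv, not something mandated by the library:

```python
import os

# Sketch of a formatter that reads the speaker name from metadata.csv,
# in the spirit of the edit described above.
def multispeaker_ljspeech(root_path, meta_file, **kwargs):
    items = []
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            cols = line.strip().split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            items.append({
                "text": cols[2],
                "audio_file": wav_file,
                "speaker_name": cols[1],  # speaker stored in the second column
                "root_path": root_path,
            })
    return items

# It can be passed to load_tts_samples via the `formatter` argument instead of
# patching the built-in ljspeech formatter:
# train_samples, eval_samples = load_tts_samples(dataset_config, formatter=multispeaker_ljspeech)
```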

