Non-English-Tacotron-2-Training-Notebook

Tacotron 2 training notebook supporting Japanese, French, and Mandarin

Overview

This notebook is meant to provide easier access to training Tacotron 2 models in languages other than English. Currently, Japanese (TALQu and neuTalk phonetics), French, and Mandarin pretrained models are included, but the plan is to include more in the future, such as German. For Japanese, it is recommended to use the neuTalk phonetics and pretrained model.

Supported audio

Training audio should be 16-bit, 22,050 Hz, mono WAV files. Filenames must not contain spaces; use only half-width alphanumerics, dashes, and underscores. This means no Japanese or Chinese characters and no diacritics. Keep audio clips to 10 seconds or less to facilitate learning. Based on my tests, I recommend having at least 15 minutes of audio.
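To catch problem files before uploading, the requirements above can be checked with a short script. This is a minimal sketch, not part of the notebook: it uses only Python's standard `wave` module, and the names `check_wav` and `check_folder` are hypothetical helpers introduced here for illustration.

```python
import re
import wave
from pathlib import Path

# Limits taken from the requirements described above.
EXPECTED_RATE = 22050        # Hz
EXPECTED_CHANNELS = 1        # mono
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit
MAX_SECONDS = 10.0
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.wav")


def check_wav(path):
    """Return (problems, duration_in_seconds) for one WAV file (hypothetical helper)."""
    path = Path(path)
    problems = []
    if not NAME_PATTERN.fullmatch(path.name):
        problems.append("filename may only contain A-Z, a-z, 0-9, '-' and '_'")
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
        if rate != EXPECTED_RATE:
            problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"{8 * wav.getsampwidth()}-bit audio, expected 16-bit")
    if duration > MAX_SECONDS:
        problems.append(f"clip is {duration:.1f} s, longer than {MAX_SECONDS:.0f} s")
    return problems, duration


def check_folder(folder):
    """Print problems for every .wav file in a folder and return the total minutes."""
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        problems, duration = check_wav(path)
        total_seconds += duration
        for problem in problems:
            print(f"{path.name}: {problem}")
    return total_seconds / 60.0
```

Calling `check_folder("wavs")` on a local copy of the dataset prints one line per problem and returns the total length in minutes, which can be compared against the 15-minute recommendation.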

Transcriptions

The transcription file should be a text document with each line having the following format: wavs/{name_of_file}.wav|{text}. Use one of the included G2Ps to convert the transcription to the appropriate phonetic input.
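A malformed line in the transcription file is easy to miss by eye. The sketch below checks each line against the `wavs/{name_of_file}.wav|{text}` format described above. It is an illustration only: `parse_transcript_line` and `find_bad_lines` are hypothetical names, and the sample text is a placeholder rather than real G2P output.

```python
def parse_transcript_line(line):
    """Split 'wavs/{name}.wav|{text}' into (wav_path, text).

    Raises ValueError when the line does not follow the expected format.
    """
    line = line.rstrip("\n")
    wav_path, separator, text = line.partition("|")
    if not separator:
        raise ValueError("missing '|' separator")
    if not (wav_path.startswith("wavs/") and wav_path.endswith(".wav")):
        raise ValueError("path must look like wavs/{name_of_file}.wav")
    if not text.strip():
        raise ValueError("text is empty")
    return wav_path, text


def find_bad_lines(lines):
    """Return (line_number, message) pairs for every malformed line."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            parse_transcript_line(line)
        except ValueError as error:
            bad.append((number, str(error)))
    return bad
```

For example, `find_bad_lines(open("list.txt", encoding="utf-8"))` would report the line numbers to fix, where `list.txt` stands in for whatever the transcription file is actually called.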

Training

The steps in the notebook should be fairly self-explanatory, I hope. Upload your audio into the wavs/ folder before starting training. Here are some notes to keep in mind:

  • Batch size should ideally be a divisor of the number of wavs you have. For example, when training a model with 15 wavs I set the batch size to 5.
  • If you have the T4 GPU on Colab, do not set the batch size higher than 14.
  • Output directory for training should be in Google Drive in case you get disconnected.
  • As you train, checkpoints will build up. Delete old ones and empty the trash to keep your Drive storage available.
  • Stop training once the validation loss reaches an appropriate level. As a rough guide, the targets I use are: fewer than 30 files = under 0.07; 30-100 files = under 0.09; 150+ files = under 0.1; more than 30 minutes of data = under 0.14.
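The batch size advice above can be turned into a small calculation: pick the largest divisor of the number of wavs that does not exceed the GPU limit. This is a sketch under that reading of the notes. `suggest_batch_size` is a hypothetical helper, and the default cap of 14 is the T4 limit mentioned above.

```python
def suggest_batch_size(num_wavs, max_batch_size=14):
    """Return the largest divisor of num_wavs that is <= max_batch_size."""
    if num_wavs < 1:
        raise ValueError("num_wavs must be at least 1")
    # Count down from the cap; 1 divides everything, so the loop always returns.
    for candidate in range(min(num_wavs, max_batch_size), 0, -1):
        if num_wavs % candidate == 0:
            return candidate
```

With 15 wavs this gives 5, matching the example in the notes. When the file count is prime and above the cap (17 wavs, say), the only divisor within range is 1, so treat the result as a hint and consider adding or dropping a clip instead.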

Attributions

Contributors

mildemelwe

non-english-tacotron-2-training-notebook's Issues

Cannot download French pretrained model

!gdown -O pretrained_model 1--lPwGhqFkqFZrd04Qhm90ndrepXifCf
Failed to retrieve file url:

Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
Check FAQ in https://github.com/wkentaro/gdown?tab=readme-ov-file#faq.

You may still be able to access the file from the browser:

https://drive.google.com/uc?id=1--lPwGhqFkqFZrd04Qhm90ndrepXifCf

but Gdown can't. Please check connections and permissions.

Notebook crashes after a Colab update (possibly NumPy)

After some kind of Colab update, probably to NumPy, the notebook now crashes.
I tried various things, but I'm not a programmer, so I don't understand the error or how to fix it.
Could you please fix it?

