
mandarin-tts's Introduction

Chinese mandarin text to speech (MTTS)

This is a modularized text-to-speech framework that aims to support fast research and product development. Main features include:

  • all modules are configurable via yaml,
  • speaker embedding, prosody embedding, and multi-stream text embedding are supported and configurable,
  • various vocoders (VocGAN, hifi-GAN, waveglow, melGAN) are supported through adapters, so that comparison across different vocoders can be done easily,
  • duration, pitch, and energy variance predictors are supported, and other variances can be added easily,
  • and more on the road-map.

Contributions are welcome.

Audio samples

Check out the demo here.

Quick start

Install

git clone https://github.com/ranchlai/mandarin-tts.git
cd mandarin-tts
git submodule update --force --recursive --init --remote
pip install -e .

Training

Two examples are provided here: biaobei and aishell3.

To train your own models, first make a copy of an existing example, then prepare the melspectrogram features using wav2mel.py:

cd examples
python wav2mel.py -c ./aishell3/config.yaml -w <aishell3_wav_folder> -m <mel_folder> -d cpu

Then prepare the scp files necessary for training:

cd examples/aishell3
python prepare.py --wav_folder <aishell3_wav_folder>  --mel_folder <mel_folder> --dst_folder ./train/

This will generate the scp files required by config.yaml (in the dataset/train section). You should also check that everything in the config file is correct. Usually you don't need to change the code.
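Before training, it can help to confirm that the generated scp files point at files that actually exist. The following is a minimal sketch, not part of the project: it assumes a Kaldi-style layout of one "<key> <path>" pair per line, which only makes sense for scp files that map utterance ids to file paths (for example the mel features), not for scp files that carry text. The function name and the file name in the usage note are hypothetical.

```python
from pathlib import Path


def find_missing_scp_entries(scp_path):
    """Return (key, path) pairs from an scp file whose path does not exist.

    Assumption: each line has the form "<key> <path>". Lines that do not
    split into exactly two fields (blank or malformed lines) are skipped.
    """
    missing = []
    for line in Path(scp_path).read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue
        key, target = parts
        if not Path(target).exists():
            missing.append((key, target))
    return missing
```

For example, `find_missing_scp_entries("./train/mel.scp")` would return an empty list when every listed feature file is present; the file name here is only an illustration, so use whatever names prepare.py wrote into ./train/.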

Now you can start training:

cd examples/aishell3
python ../../mtts/train.py -c config.yaml -d cuda

For the biaobei dataset, the workflow is the same, except that there is no speaker embedding, but you can add prosody embedding.

More examples will be added. Please stay tuned.

Synthesize

Pretrained mtts checkpoints

Currently two examples are provided, and the corresponding checkpoints/configs are summarized as follows.

dataset checkpoint config
aishell3 link link
biaobei link link

Supported vocoders

Vocoders convert melspectrograms to waveforms. They are added as submodules and will not be trained in this project, so you should download the checkpoints before synthesizing. Vocoders are not needed during training, as you can monitor the training process from the generated melspectrograms and the loss curve. Currently we support the following vocoders:

Vocoder checkpoint github
Waveglow link link
hifi-gan link link
VocGAN link link link
MelGAN link link

All vocoders will be ready after running git submodule update --force --recursive --init --remote. However, you have to download the checkpoint manually and properly set the path in the config.yaml file.
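Because the checkpoint paths are set by hand, a quick check that they resolve to real files can save a failed synthesis run. The sketch below is an illustration only: it walks an already-loaded config (a nested dict/list, such as what a YAML loader returns) and reports string values that look like checkpoint paths but do not exist. The rule for "looks like a checkpoint" (the key contains "checkpoint", or the value ends in one of a few common suffixes) is an assumption, not the project's actual config schema, so adjust it to match your config.yaml.

```python
from pathlib import Path

# Assumed suffixes for checkpoint files; extend as needed.
CHECKPOINT_SUFFIXES = (".pt", ".pth", ".tar", ".ckpt")


def find_missing_checkpoints(config, prefix=""):
    """Return (location, path) pairs for checkpoint-like entries not on disk.

    `config` is a nested structure of dicts and lists. `location` is a
    dotted path to the entry, e.g. "vocoder.hifigan.checkpoint".
    """
    if isinstance(config, dict):
        items = config.items()
    elif isinstance(config, list):
        items = enumerate(config)
    else:
        return []

    missing = []
    for key, value in items:
        location = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            missing.extend(find_missing_checkpoints(value, location))
        elif isinstance(value, str):
            looks_like_checkpoint = (
                "checkpoint" in str(key).lower()
                or value.endswith(CHECKPOINT_SUFFIXES)
            )
            if looks_like_checkpoint and not Path(value).is_file():
                missing.append((location, value))
    return missing
```

The function takes a plain dict so that it does not depend on any particular YAML library; relative paths are resolved against the current working directory, so run it from the folder where you would run synthesis.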

Preparing your input text

The input.txt should be consistent with your setting of emb_type1 to emb_type_n in the config file, i.e., same type, same order.

To facilitate transcription of hanzi to pinyin, you can try:

cd examples/aishell3/
python ../../mtts/text/gp2py.py -t "为适应新的网络传播方式和读者阅读习惯"
>> sil wei4 shi4 ying4 xin1 de5 wang3 luo4 chuan2 bo1 fang1 shi4 he2 du2 zhe3 yue4 du2 xi2 guan4 sil|sil 为 适 应 新 的 网 络 传 播 方 式 和 读 者 阅 读 习 惯 sil

Now you can copy the text to input.txt, and remember to add your self-defined name and the speaker id, separated by '|'.
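Assembling such a line by hand is error-prone, so here is a small hedged sketch of a helper. Two things in it are assumptions rather than facts from this project: the field order (name first, then the text streams in the order gp2py.py printed them, then the speaker id), and the check that all streams have the same number of tokens, which is inferred from the example output above where the pinyin and hanzi streams both have 20 tokens. Compare the result with the example input.txt and the emb_type settings in your config before relying on it.

```python
def make_input_line(name, gp2py_output, speaker_id):
    """Build one '|'-separated line for input.txt.

    `gp2py_output` is the text printed by gp2py.py, i.e. one or more
    space-separated token streams joined by '|'.
    Assumed layout: name | stream_1 | ... | stream_n | speaker_id
    """
    streams = [part.strip().split() for part in gp2py_output.split("|")]

    # Assumption: every stream describes the same token sequence, so the
    # token counts should match (pinyin vs. hanzi in the example above).
    lengths = {len(stream) for stream in streams}
    if len(lengths) != 1:
        raise ValueError(f"streams have different token counts: {sorted(lengths)}")

    fields = [name] + [" ".join(stream) for stream in streams] + [str(speaker_id)]
    return "|".join(fields)


# Short illustrative input; the name "demo" and speaker id 0 are made up.
print(make_input_line("demo", "sil ni3 hao3 sil|sil 你 好 sil", 0))
# prints: demo|sil ni3 hao3 sil|sil 你 好 sil|0
```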

Synthesizing your waves

With the above checkpoints and text ready, you can finally run the synthesis process:

python ../../mtts/synthesize.py  -d cuda --c config.yaml --checkpoint ./checkpoints/checkpoint_1240000.pth.tar -i input.txt

Please check the config.yaml file for the vocoder settings.

If everything runs correctly, the synthesized audio can be found in the output folder.

mandarin-tts's People

Contributors

ranchlai
