This is the recipe for a Kazakh text-to-speech model based on the KazakhTTS corpus.
Our code builds upon ESPnet and requires prior installation of the framework. Please follow the installation guide and put the KazakhTTS folder inside the espnet/egs2/ directory:
cd espnet/egs2
git clone https://github.com/IS2AI/Kazakh_TTS.git
Go to the Kazakh_TTS/tts1 folder and create links to the dependencies:
ln -s ../../TEMPLATE/tts1/path.sh .
ln -s ../../TEMPLATE/asr1/pyscripts .
ln -s ../../TEMPLATE/asr1/scripts .
ln -s ../../../tools/kaldi/egs/wsj/s5/steps .
ln -s ../../TEMPLATE/tts1/tts.sh .
ln -s ../../../tools/kaldi/egs/wsj/s5/utils .
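A broken link here usually surfaces only later, when run.sh fails to find a script. As a minimal sketch, the same links could be created and checked from Python. The helper name create_links is hypothetical and not part of this repository; the link targets are copied from the commands above and assume the recipe sits two levels below espnet/egs2/.

```python
import os
from pathlib import Path

# Link name -> relative target, copied from the ln -s commands above.
LINKS = {
    "path.sh": "../../TEMPLATE/tts1/path.sh",
    "pyscripts": "../../TEMPLATE/asr1/pyscripts",
    "scripts": "../../TEMPLATE/asr1/scripts",
    "steps": "../../../tools/kaldi/egs/wsj/s5/steps",
    "tts.sh": "../../TEMPLATE/tts1/tts.sh",
    "utils": "../../../tools/kaldi/egs/wsj/s5/utils",
}


def create_links(recipe_dir, links=LINKS):
    """Create the symlinks inside recipe_dir (hypothetical helper).

    Returns the names of links whose target does not exist, so that a
    misplaced recipe folder or a missing Kaldi checkout is noticed early.
    """
    recipe_dir = Path(recipe_dir)
    dangling = []
    for name, target in links.items():
        link = recipe_dir / name
        # Leave anything that is already there untouched.
        if not link.is_symlink() and not link.exists():
            os.symlink(target, link)
        # Path.exists() follows symlinks, so it is False for a dangling link.
        if not link.exists():
            dangling.append(name)
    return dangling
```

Calling create_links(".") from inside the tts1 folder should return an empty list when ESPnet and Kaldi are installed in the expected locations.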
Download the KazakhTTS dataset and untar it in a directory of your choice. Specify the path to the dataset inside the KazakhTTS/tts1/local/data.sh script:
db_root=/path-to-speaker-folder
For example, db_root=/home/datasets/ISSAI_KazakhTTS/M1_Iseke/
To train the models, run the script ./run.sh inside the KazakhTTS/tts1/ folder. GPU and RAM specifications can be found in the configuration (conf/) folder.
./run.sh --stage 1 --stop_stage 6 --train_config conf/train.yaml
To train FastSpeech or Transformer models instead, point train_config at the corresponding configuration file in place of conf/train.yaml. Each stage is described in detail in ESPnet's repository.
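When switching between model configurations or resuming from a later stage, it can help to assemble the command in one place. The following is a minimal sketch; build_train_command is a hypothetical helper, and it uses only the run.sh options shown above (--stage, --stop_stage, --train_config).

```python
import shlex


def build_train_command(train_config="conf/train.yaml", stage=1, stop_stage=6):
    """Return the run.sh invocation as an argument list (hypothetical helper)."""
    if stage > stop_stage:
        raise ValueError("stage must not be greater than stop_stage")
    return [
        "./run.sh",
        "--stage", str(stage),
        "--stop_stage", str(stop_stage),
        "--train_config", train_config,
    ]


# Print the default command; pass the list to subprocess.run to execute it
# from inside the tts1 folder.
print(shlex.join(build_train_command()))
```

To train a different model, pass the path of its configuration file as train_config; the file names available depend on what the conf/ folder contains.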
If you want to use pretrained models, download them from the links below and unzip them inside the KazakhTTS/tts1/ folder.
You will also need a pre-trained vocoder to convert the generated mel-spectrograms to waveforms. This repository used ParallelWaveGAN to train the vocoders on the same KazakhTTS corpus.
You can synthesize arbitrary text using the synthesize.py script. Modify the following lines in the script:
## specify the path to the vocoder's checkpoint, e.g.
vocoder_checkpoint="exp/vocoder/checkpoint-400000steps.pkl"
## specify the path to the main model (transformer/tacotron2/fastspeech) and its config file
config_file = "exp/tts_train_raw_char/config.yaml"
model_path = "exp/tts_train_raw_char/train.loss.ave_5best.pth"
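A wrong path in any of these three settings typically shows up only after the models start loading. As a minimal sketch, the paths can be checked beforehand; missing_files is a hypothetical helper, and the paths below are the example values from above, so replace them with the ones you actually set in synthesize.py.

```python
from pathlib import Path

# Example values from the settings above; adjust to your own setup.
REQUIRED_FILES = [
    "exp/vocoder/checkpoint-400000steps.pkl",
    "exp/tts_train_raw_char/config.yaml",
    "exp/tts_train_raw_char/train.loss.ave_5best.pth",
]


def missing_files(paths):
    """Return the subset of paths that are not existing regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Run from inside the tts1 folder, where the exp/ directory lives.
    for path in missing_files(REQUIRED_FILES):
        print(f"missing: {path}")
```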
Now you can run the script using an arbitrary text, for example:
python synthesize.py --text "бүгінде өңірде тағы бес жобаның құрылысы жүргізілуде."
The generated file will be saved in the tts1/synthesized_wavs folder.
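To synthesize several sentences, the script can be called once per sentence. The sketch below is an assumption-laden illustration: build_synthesis_commands and synthesize_all are hypothetical helpers, and only the --text option shown above is used. How synthesize.py names its output files is not stated here, so check whether successive runs overwrite each other before relying on this.

```python
import subprocess
import sys


def build_synthesis_commands(sentences, script="synthesize.py"):
    """Return one argument list per non-empty sentence (hypothetical helper)."""
    commands = []
    for sentence in sentences:
        text = sentence.strip()
        if text:
            # Use the current interpreter so the active environment is reused.
            commands.append([sys.executable, script, "--text", text])
    return commands


def synthesize_all(sentences):
    """Run synthesize.py for each sentence, stopping at the first failure."""
    for command in build_synthesis_commands(sentences):
        subprocess.run(command, check=True)
```

For example, synthesize_all(["бүгінде өңірде тағы бес жобаның құрылысы жүргізілуде."]) runs the same command as the one above when called from inside the tts1 folder.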