Git Product home page Git Product logo

chainer-ettts's Introduction

chainer-ETTTS

This is an implementation of Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention with chainer.

It can train a speech synthesis model with single GTX 1080Ti in a day.

python 3.5.4 + chainer 3.2.0

Demo

Installing dependencies

$ pip3 install -r requirements.txt

Usage

First, you have to prepare dataset. If you want to use the LJSpeech dataset, you can use the following commands.

$ wget http://data.keithito.com/data/speech/LJSpeech-1.0.tar.bz2
$ tar xvf LJSpeech-1.0.tar.bz2
$ python3 ./preprocess_LJSpeech.py ./LJSpeech1.0 ./LJSpeech_data

If you want to use the JSUT dataset, you can use the following commands.

$ wget http://ss-takashi.sakura.ne.jp/corpus/jsut_ver1.1.zip
$ unzip jsut_ver1.1
$ python3 ./preprocess_JSUT.py ./jsut_ver1.1/basic5000 ./JSUT_data

If you want to use original dataset, you can see ./sample_data.

Text2Mel

You can train Text2Mel model with ./Text2Mel/train.py. If you want to train with the LJSpeech dataset, you can use following commands.

$ cd Text2Mel
$ python3 ./train.py --gpu 0 --data ../LJSpeech_data

If you want to use another dataset, you can replace ../LJSpeech_data with the corresponding directory (e.g. ../JSUT_data, ../sample_data).

You can also download pretrained models.

$ wget https://www.dropbox.com/s/6wfisuis3t1q2wr/LJSpeech_text2mel_model
$ wget https://www.dropbox.com/s/wghw567rzolc6oh/JSUT_text2mel_model

The first is the LJSpeech model, which can generate woman English speech. The LJSpeech model is provided under the public domain.

The second is the JSUT model, which can generate woman Japanese speech. The JSUT model is licensed with the CC-BY-SA 4.0.

SSRN

You can train SSRN model with ./SSRN/train.py. If you want to train with LJSpeech dataset, you can use following commands.

$ cd SSRN
$ python3 ./train.py --gpu 0 --data ../LJSpeech_data

If you want to use another dataset, you can replace ../LJSpeech_data with the corresponding directory (e.g. ../JSUT_data, ../sample_data).

You can also download a pretrained model.

$ wget https://www.dropbox.com/s/5kcw54b5l9n5r1q/SSRN_model

This is the LJSpeech model, which can synthesis not only English speech but also Japanese speech etc. The LJSpeech model is provided under the public domain.

Speech synthesis

$ cd Text2Mel
$ python3 ./generate.py --text "no event good life" --model ./results/model

First, you can make mel spectrogram with Text2Mel model. These commands generates mel spectrogram in ./Text2Mel/results/res.npy.

The input text should be written in small capital roman alphabet. If you want to use Japanese characters, use --ja option.

$ cd ../SSRN
$ python3 ./generate.py --mel ../Text2Mel/results/res.npy --model ./results/model

Then, you can make wavefile with SSRN model. These commands generates audio file in ./SSRN/res.wav.

If you want to use pretrained model or other models, you can replace ./results/model with the corresponding file (e.g. ./LJSpeech_text2mel_model, ./SSRN_model)

Bibliography

[1] Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. In ICASSP 2018.

the original paper

[2] Keith Ito. The LJ Speech Dataset.

the English speech dataset

[3] Ryosuke Sonobe, Shinnosuke Takamichi and Hiroshi Saruwatari, "JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis," arXiv preprint, 1711.00354, 2017.

the Japanese speech dataset

[4] http://r9y9.github.io/blog/2017/11/23/dctts/

the blog post about the paper (Japanese)

[5] http://joisino.hatenablog.com/entry/2017/12/24/000000

the blog post related to this repository (Japanese)

chainer-ettts's People

Contributors

joisino avatar

Watchers

James Cloos avatar Ewald Enzinger avatar paper2code - bot avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.