Hi,
I'm currently working on a short tutorial, but it's going to take some time.
First of all, you need to install numpy, librosa, scipy and dynet.
DyNet has to be installed with CUDA support, which means you can't use the pip package. Follow the instructions here: http://dynet.readthedocs.io/en/latest/python.html
Then you need to follow these steps:
- Preprocess your speech corpus. Create two folders, one for train and one for dev. Put your files there and make sure that you have pairs of TXT and WAV files with the same base name, for example speech_001.wav and speech_001.txt.
- Run a corpus import:
python cube/trainer.py --phase=1 --train-folder=<your train folder> --dev-folder=<your dev folder>
- Train the vocoder:
python cube/trainer.py --phase=2 --autobatch --set-mem=7000 --use-gpu --batch-size=4000
This will train the vocoder and output vocoded dev files every 200 timesteps.
For 2 hours of input data, it takes about 19 epochs to converge (1-2 weeks; one epoch takes about one day on a GTX 1080 and half a day on a GTX 1080 Ti).
- Finally, train the encoder:
python cube/trainer.py --phase=3 --autobatch --set-mem=7000 --use-gpu --batch-size=4000
It is much faster than the vocoder (one epoch takes 1 hour for 2 hours of speech).
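Before running phase 1, it can help to check that every WAV file has a matching transcript. The following is a minimal sketch, not part of TTS-Cube: the script name check_pairs.py and the functions find_unpaired and report are hypothetical, and it assumes the lowercase .wav/.txt naming shown in the speech_001 example.

```python
import os
import sys


def find_unpaired(folder):
    """Return (missing_txt, missing_wav) for one corpus folder.

    missing_txt: base names that have a .wav file but no .txt transcript.
    missing_wav: base names that have a .txt transcript but no .wav file.
    Assumes lowercase extensions, as in speech_001.wav / speech_001.txt.
    """
    wav, txt = set(), set()
    for name in os.listdir(folder):
        base, ext = os.path.splitext(name)
        if ext == ".wav":
            wav.add(base)
        elif ext == ".txt":
            txt.add(base)
    return sorted(wav - txt), sorted(txt - wav)


def report(folders):
    """Print every unpaired file and return True if all folders are clean."""
    all_ok = True
    for folder in folders:
        missing_txt, missing_wav = find_unpaired(folder)
        for base in missing_txt:
            print("%s: %s.wav has no matching .txt" % (folder, base))
        for base in missing_wav:
            print("%s: %s.txt has no matching .wav" % (folder, base))
        if missing_txt or missing_wav:
            all_ok = False
    return all_ok


if __name__ == "__main__":
    # Hypothetical usage: python check_pairs.py <train folder> <dev folder>
    if report(sys.argv[1:]):
        print("All files are paired.")
```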
There is no correlation between NLL and the quality of the vocoded speech, so every now and then, listen to the samples and make a copy of your model (data/models/rnn.network).
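Making those copies by hand is easy to forget, so a small helper can add a timestamp to each snapshot. This is a sketch under assumptions: snapshot_model and the data/models/snapshots folder are hypothetical, and only data/models/rnn.network comes from the comment above. Copy when the trainer is not in the middle of writing the file, otherwise the snapshot may be incomplete.

```python
import os
import shutil
import time


def snapshot_model(model_path="data/models/rnn.network",
                   backup_dir="data/models/snapshots"):
    """Copy the current model file to a timestamped backup and return its path.

    model_path is the file named in the thread; backup_dir is a hypothetical
    location. Two snapshots within the same second would share a name.
    """
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)  # works on Python 2 and 3 (no exist_ok)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.basename(model_path)
    target = os.path.join(backup_dir, "%s.%s" % (base, stamp))
    shutil.copy2(model_path, target)  # copy2 also keeps the modification time
    return target
```

After listening to a batch of dev samples you like, calling snapshot_model() keeps that version around even if later epochs sound worse.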
Also, I'm still working on a TTS runtime, so for now, if you want to test the end-to-end system, you have to copy files from one folder to another by hand.
The runtime update should arrive shortly. I'm just finishing up some work on multispeaker training.
Let me know if you need any help.
Best,
Tibi
from tts-cube.
By the way, all the code is Python 2. If you use Python 3, you'll have to change a few things in the dataset reader: UTF-8 conversion is no longer required, so just remove .encode('utf-8') and .decode('utf-8') everywhere.
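One way to make file reading behave the same on Python 2 and Python 3 is to let io.open do the UTF-8 decoding and encoding, so the explicit .encode/.decode calls are not needed on either version. This is a minimal sketch with hypothetical helper names (read_transcript, write_transcript); it is not the repository's dataset reader. On Python 2, write_transcript expects a unicode string.

```python
import io


def read_transcript(path):
    """Read a UTF-8 transcript as text on both Python 2 and Python 3.

    io.open decodes the bytes itself, so no .decode('utf-8') is needed.
    """
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read().strip()


def write_transcript(path, text):
    """Write text as UTF-8; io.open encodes it, so no .encode('utf-8')."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
```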
@tiberiu44 Thanks for sharing this repo. Could you share any pretrained models? And how does it perform on unseen text, in terms of both rhythm and generation speed?
@butterl
The examples on the website (https://tiberiu44.github.io/TTS-Cube/) were generated from unseen data. For 14 seconds of audio, it takes about 3 seconds to generate the spectrogram and another 110 seconds to generate the audio.
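For context, the timings quoted above translate into a real-time factor (processing time divided by audio duration) of roughly 8, with the vocoder accounting for almost all of it. A small sketch of the arithmetic; the function name is illustrative, and the thread does not say which hardware produced these timings.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Seconds of compute per second of audio (> 1 means slower than real time)."""
    return processing_seconds / float(audio_seconds)


# Figures quoted above: 3 s for the spectrogram plus 110 s for the vocoder,
# for 14 s of generated audio.
encoder_s, vocoder_s, audio_s = 3.0, 110.0, 14.0
rtf = real_time_factor(encoder_s + vocoder_s, audio_s)
print("RTF: %.1f" % rtf)  # prints "RTF: 8.1"
print("Vocoder share: %.0f%%" % (100 * vocoder_s / (encoder_s + vocoder_s)))  # prints "Vocoder share: 97%"
```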
There is already a pretrained vocoder in the repo (the encoder is not good enough yet), but it is for a single Romanian voice. If you try to convert a spectrogram computed from another voice, it does not sound right (the vocoder sometimes just skips portions of the speech).
I'm currently training an English model on the LJ Speech Dataset (https://keithito.com/LJ-Speech-Dataset/), but it takes a couple of weeks to reach good vocoding performance. Also, someone else is handling the Romanian models and training a multispeaker model based on SWARA (https://speech.utcluj.ro/swarasc/).
I will share the English model and the single-speaker model I made (I'm still working on getting the encoder to generate better results). I think the SWARA-based models will also be made available.
Hey @tiberiu44, nice work! Any updates on the tutorial you mentioned?
I would also be interested in helping to extend the number of pre-trained models in different languages.
Hi @erikvdplas,
It will be up later next week. Thanks for wanting to contribute.
Best,
Tibi
Hi @erikvdplas
I've just commented on #3 with some updates on the documentation. If anyone could try the first installation steps and let me know if everything works, it would be really great.
Also, @butterl, I've added a couple of pre-trained models. I will include usage instructions in the following updates on the documentation branch.
Thank you
Hi,
I've just added tutorials for installation and training.
Where can I find a pretrained model?
Hi @mrgloom,
I've just updated the repository with pretrained models. The vocoder is generic, but the encoder is only for Romanian. If you want to test end-to-end, you need to copy the files from data/models/ro to data/models and decompress rnn_encoder.network.bz2.
After you pull, in the TTS-Cube folder do:
cp data/models/ro/* data/models/
bunzip2 data/models/rnn_encoder.network.bz2
echo "Acesta este un test." > test.txt
python3 cube/synthesis.py --input-file=test.txt --output-file=test.wav --speaker=anca
This will generate two files: test.wav and test.wav.png (which is the spectral representation).
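On a system without cp and bunzip2 (for example Windows), the same two preparation steps can be done with Python's standard library. A minimal sketch; the function name install_pretrained is hypothetical and not part of TTS-Cube, and the default paths are the ones from the commands above.

```python
import bz2
import glob
import os
import shutil


def install_pretrained(lang_dir="data/models/ro", models_dir="data/models",
                       compressed="rnn_encoder.network.bz2"):
    """Python equivalent of the cp + bunzip2 steps (a sketch).

    Copies every regular file from lang_dir into models_dir, then
    decompresses the encoder archive there. Returns the decompressed path.
    """
    for path in glob.glob(os.path.join(lang_dir, "*")):
        if os.path.isfile(path):  # like cp without -r, skip subdirectories
            shutil.copy2(path, models_dir)
    archive = os.path.join(models_dir, compressed)
    target = archive[:-len(".bz2")]  # strip the .bz2 suffix
    # Stream-decompress so a large model file is not read into memory at once.
    with bz2.BZ2File(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(archive)  # bunzip2 also deletes the .bz2 by default
    return target
```

The echo and synthesis.py steps stay the same afterwards.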
Let me know if it works.
Best,
Tibi