Hi @roodrallec,
Glad to hear you find this project useful.
A) Currently, we don't use the G2P model in any way. I found it more useful to rely on the encoder to learn phonetic transcriptions, because it reflects the distribution of words that are actually used in conversation. Training on CMU (or other lexicons) just adds noise, because the entries in these lexicons don't reflect the actual occurrences of words. Also, there are a lot of exceptions where the pronunciation of a word (in terms of phonetics and accent) depends on context.
B) Training the vocoder takes about a month (2 weeks for Wavenet and another 2 weeks for ParallelWavenet)
C) Training the encoder took me three weeks.
I suggest you just start by training the encoder and use the pretrained models for vocoding. They are multi-speaker, and it is likely that you won't have to train anything else. Just one important thing: the models run at 24 kHz, so your data must be sampled at at least 24 kHz. TTS-Cube will downsample it automatically if it needs to.
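Not part of TTS-Cube itself, but here is a minimal stdlib sketch for verifying that a wav dataset meets the 24 kHz requirement before you start training (the function name and directory layout are just assumptions for illustration):

```python
import wave
from pathlib import Path

TARGET_SR = 24000  # the pre-trained TTS-Cube models run at 24 kHz


def find_low_rate_wavs(data_dir):
    """Return (filename, rate) pairs for wav files sampled below TARGET_SR.

    Files at or above 24 kHz are fine as-is, since TTS-Cube downsamples
    them automatically. Files below 24 kHz should be replaced: upsampling
    cannot recover the missing high-frequency content.
    """
    too_low = []
    for path in sorted(Path(data_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
        if rate < TARGET_SR:
            too_low.append((path.name, rate))
    return too_low
```

Running this once over your corpus is much cheaper than discovering a sample-rate mismatch weeks into encoder training.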
You can try, every now and then, to synthesize new samples using your encoder and the pre-trained vocoder. Just stop when you feel that the results are good enough. One observation: the encoder model will stay for a long time in a state where it generates muffled sounds. After a while, the results will improve a lot.
Yes, it is fast enough for real-time TTS, provided that you have a GPU. I got good results (faster-than-real-time synthesis) on a GTX 1060.
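If you want to check this on your own hardware, the usual metric is the real-time factor (RTF). A small sketch, where `synthesize` stands in for whichever synthesis entry point you use (it is a hypothetical callable, not a TTS-Cube API):

```python
import time


def real_time_factor(synthesize, text):
    """Measure the real-time factor (RTF) of a synthesis function.

    `synthesize` is any callable returning (samples, sample_rate).
    RTF = wall-clock synthesis time / duration of the generated audio,
    so RTF < 1.0 means synthesis is faster than real time.
    """
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds
```

On a GTX 1060 you would expect an RTF below 1.0 for the full encoder + vocoder pipeline; a CPU-only setup will typically be well above it.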
from tts-cube.