
tts-cube's People

Contributors

gcioroiu, roodrallec, rscctest, tiberiu44


tts-cube's Issues

audio samples on English dataset

Hello,

Thank you for the wonderful repository.

I read that you're currently training on the LJSpeech dataset for English TTS.

Do you have any updates on audio samples?

Also, would you be able to provide some rough training stats (number of GPUs used, hours needed per pass through the data, etc.)?

Thanks again for the awesome repository and open-source effort.

Negative loss when training step 2

Here is my output from training step 2:

/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Found 4995 training files and 5 development files
	Rendering devset
		1/5 processing file data/processed/dev/0000001 
		2/5 processing file data/processed/dev/0000002 
		3/5 processing file data/processed/dev/0000003 
		4/5 processing file data/processed/dev/0000004 
		5/5 processing file data/processed/dev/0000005 

Starting epoch 1
Shuffling training data
	1/4995 processing file data/processed/train/0000007
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.29it/s]
 avg loss=0.9230358004570007 execution time=0.8138909339904785
	2/4995 processing file data/processed/train/0000008
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.47it/s]
 avg loss=0.8607208132743835 execution time=0.714789867401123
...
avg loss=0.1945137083530426 execution time=0.626471757888794
	17/4995 processing file data/processed/train/0000022
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.46it/s]
 avg loss=0.0572626106441021 execution time=0.7513647079467773
	18/4995 processing file data/processed/train/0000023
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.68it/s]
 avg loss=-0.061442214995622635 execution time=0.6261122226715088
	19/4995 processing file data/processed/train/0000024
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.47it/s]
 avg loss=-0.18586862087249756 execution time=0.7162132263183594
	20/4995 processing file data/processed/train/0000025
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.68it/s]
 avg loss=0.06383810192346573 execution time=0.6265075206756592
	21/4995 processing file data/processed/train/0000026
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.67it/s]
 avg loss=-0.20782051980495453 execution time=0.628434419631958
	22/4995 processing file data/processed/train/0000027
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.46it/s]
 avg loss=-0.31225016713142395 execution time=0.7171187400817871
	23/4995 processing file data/processed/train/0000028
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.30it/s]
 avg loss=-0.5820147395133972 execution time=0.8073093891143799
	24/4995 processing file data/processed/train/0000029
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.68it/s]
 avg loss=-0.46214190125465393 execution time=0.6245768070220947
	25/4995 processing file data/processed/train/0000030
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.46it/s]
 avg loss=-0.7601555585861206 execution time=0.720339298248291
	26/4995 processing file data/processed/train/0000031
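
A note on the negative values: a negative average loss is not necessarily a bug. If the training objective is the negative log-likelihood of a continuous output distribution, as in many neural vocoders, the density can exceed 1, so its log is positive and the NLL goes negative. A minimal sketch of the effect, assuming a Gaussian output (this is illustrative, not TTS-Cube's actual loss code):

import math

def gaussian_nll(x, mu, sigma):
    # negative log of the Gaussian density N(x; mu, sigma^2)
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

print(gaussian_nll(0.0, 0.0, 1.0))  # ~0.919: density < 1, NLL positive
print(gaussian_nll(0.0, 0.0, 0.1))  # ~-1.384: density > 1, NLL legitimately negative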

some words are missing during synthesis

I have trained an encoder on custom Telugu-language data for about 4 days, but during inference some words are not synthesized and the audio skips over them. Do you suggest any hyperparameter adjustments, or something else, to make the synthesizer work correctly? I am using the LJSpeech vocoder, and I trained on the last version of the repo before the new g2p pull. The loss is around 1.8 to 2.3 and has stayed in that range for the past 20 hours.
Thank you

Install fails on Arch during pip command

A similar issue happens with the Tortoise TTS install...

On my first attempt I may have installed it in an unusual directory from a previous app install... and it got a similar error...

[negatron@Negatron]-[~]

git clone https://github.com/tiberiu44/TTS-Cube.git
Cloning into 'TTS-Cube'...
remote: Enumerating objects: 2345, done.
remote: Counting objects: 100% (958/958), done.
remote: Compressing objects: 100% (286/286), done.
remote: Total 2345 (delta 723), reused 897 (delta 672), pack-reused 1387
Receiving objects: 100% (2345/2345), 697.70 MiB | 1.30 MiB/s, done.
Resolving deltas: 100% (1553/1553), done.
Updating files: 100% (258/258), done.
[negatron@Negatron]-[~]
cd TTS-Cube
pip3 install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy==1.15.0
Using cached numpy-1.15.0.zip (4.5 MB)
Preparing metadata (setup.py) ... done
Collecting librosa==0.6.1
Using cached librosa-0.6.1.tar.gz (1.6 MB)
Preparing metadata (setup.py) ... done
Collecting scipy==1.1.0
Using cached scipy-1.1.0.tar.gz (15.6 MB)
Preparing metadata (setup.py) ... done
Collecting pysptk==0.1.11
Using cached pysptk-0.1.11.tar.gz (402 kB)
Preparing metadata (setup.py) ... done
Collecting Cython==0.27.3
Using cached Cython-0.27.3.tar.gz (1.8 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [50 lines of output]
Unable to find pgen, not compiling formal grammar.
running egg_info
creating /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info
writing /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/dependency_links.txt
writing entry points to /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/entry_points.txt
writing top-level names to /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/setup.py", line 229, in
setup(
File "/usr/lib/python3.10/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 308, in run
self.find_sources()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 316, in find_sources
mm.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 560, in run
self.add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 597, in add_defaults
sdist.add_defaults(self)
File "/usr/lib/python3.10/site-packages/setuptools/command/sdist.py", line 106, in add_defaults
super().add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 252, in add_defaults
self._add_defaults_ext()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
build_ext = self.get_finalized_command('build_ext')
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 305, in get_finalized_command
cmd_obj.ensure_finalized()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 111, in ensure_finalized
self.finalize_options()
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/Cython/Distutils/build_ext.py", line 18, in finalize_options
self.distribution.ext_modules[:] = cythonize(
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/Cython/Build/Dependencies.py", line 913, in cythonize
module_list, module_metadata = create_extension_list(
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/Cython/Build/Dependencies.py", line 742, in create_extension_list
elif isinstance(patterns, basestring) or not isinstance(patterns, collections.Iterable):
AttributeError: module 'collections' has no attribute 'Iterable'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[negatron@Negatron]-[~/TTS-Cube]

cd TTS-Cube
sudo pip3 install -r requirements.txt
cd: no such file or directory: TTS-Cube
[sudo] password for negatron:
Collecting numpy==1.15.0
Downloading numpy-1.15.0.zip (4.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.5/4.5 MB 1.3 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting librosa==0.6.1
Downloading librosa-0.6.1.tar.gz (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 527.5 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting scipy==1.1.0
Downloading scipy-1.1.0.tar.gz (15.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.6/15.6 MB 1.3 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting pysptk==0.1.11
Downloading pysptk-0.1.11.tar.gz (402 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 402.5/402.5 kB 253.1 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting Cython==0.27.3
Downloading Cython-0.27.3.tar.gz (1.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 1.5 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [50 lines of output]
Unable to find pgen, not compiling formal grammar.
running egg_info
creating /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info
writing /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/dependency_links.txt
writing entry points to /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/entry_points.txt
writing top-level names to /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/setup.py", line 229, in
setup(
File "/usr/lib/python3.10/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 308, in run
self.find_sources()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 316, in find_sources
mm.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 560, in run
self.add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 597, in add_defaults
sdist.add_defaults(self)
File "/usr/lib/python3.10/site-packages/setuptools/command/sdist.py", line 106, in add_defaults
super().add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 252, in add_defaults
self._add_defaults_ext()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
build_ext = self.get_finalized_command('build_ext')
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 305, in get_finalized_command
cmd_obj.ensure_finalized()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 111, in ensure_finalized
self.finalize_options()
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/Cython/Distutils/build_ext.py", line 18, in finalize_options
self.distribution.ext_modules[:] = cythonize(
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/Cython/Build/Dependencies.py", line 913, in cythonize
module_list, module_metadata = create_extension_list(
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/Cython/Build/Dependencies.py", line 742, in create_extension_list
elif isinstance(patterns, basestring) or not isinstance(patterns, collections.Iterable):
AttributeError: module 'collections' has no attribute 'Iterable'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[negatron@Negatron]-[~/TTS-Cube]
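
The AttributeError at the bottom of both runs points to a Python-version mismatch rather than anything Arch-specific: Cython 0.27.3 predates Python 3.10, which removed the deprecated aliases from the top-level collections module (they have lived in collections.abc since Python 3.3). A minimal sketch of the breakage and the modern spelling; installing under an older Python or pinning a newer Cython would likely avoid it:

import collections
import collections.abc

# Old spelling used by Cython 0.27.3 -- raises AttributeError on Python >= 3.10:
try:
    collections.Iterable
except AttributeError as e:
    print(e)  # module 'collections' has no attribute 'Iterable'

# Modern spelling that works across Python 3:
print(isinstance(["*.pyx"], collections.abc.Iterable))  # True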

Fine-tuning/Speaker adaptation

First off, this is great work. Can't wait to play around with the code 👍

In the training instructions, I see that you do have multispeaker support. Is it possible to "fine-tune" from an existing checkpoint with another dataset using --resume? Has anyone tried it and seen whether the results are good?

MelGAN vocoder is fast, let's integrate it?

Hello, I've made the required changes to integrate TTS-Cube with the MelGAN vocoder.

Inference is really fast: with WaveNet it took 1.5 hours to vocode a few sentences; now it takes literally seconds.

I trained the MelGAN vocoder for 325 epochs (106,170 iterations); it took about a day, and the output is already understandable.

The problem is that the encoder takes very long to train. I've been training for days and days and it still says whatever it wants. I wish I had a faster GPU.

It speaks, just not exactly what is in the text file.

The datasets are Japanese and Russian. I want to build a common (multilingual) model in the future (just for fun).

Is there interest from others in reproducing my experiment on your own datasets? I can share my code.

Training times

Hello Tiberiu,
Thanks for making the code for TTS-Cube available, I'm finding it really useful for my thesis.

However, I'm currently looking to train TTS-Cube on audio and transcripts extracted from interviews. What kind of training times were you seeing for

A) training the G2P model;
B) training the Vocoder;
C) training the encoder.

Also, once trained, is synthesis fast enough for real-time TTS?

How to use the G2P model

Hi, I have trained a G2P model on CMUdict as described in Step 0. I found 3 files (en-g2p-bestAcc.network, en-g2p.encodings, en-g2p-last.network) in data/model/.
But how do I use them to convert graphemes to phonemes? Is the conversion applied automatically in Steps 1/2/3?
Thank you.

What is BeeCoder?

Hi, I'm trying to understand your BeeCoder vocoder: is it just an MLP with a fixed lookback window?
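
For reference, the architecture the question describes, an MLP predicting the next sample from a fixed window of past samples, would look roughly like the sketch below. This is a guess at what is being asked about, not BeeCoder's actual code; the window size and layer width are made up:

import numpy as np

WINDOW = 256  # hypothetical fixed lookback

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (WINDOW, 128))  # input window -> hidden layer
w2 = rng.normal(0.0, 0.01, 128)            # hidden layer -> next-sample prediction

def predict_next(past_samples):
    # past_samples: the last WINDOW samples of the waveform
    h = np.tanh(past_samples @ W1)
    return h @ w2

signal = rng.normal(0.0, 1.0, 1024)
print(predict_next(signal[-WINDOW:]))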

colab notebook missing command to enter the github folder

I tried using the Colab notebook; it works, but there is an error in the paths.

After installing the dependencies, the code assumes it's in the repository folder.

Add a line to enter the folder,
%cd TTS-Cube
before
!git submodule update --init --recursive

or change the paths to include it.
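
For completeness, a plain-Python equivalent of the suggested fix, hypothetical and only useful where the %cd magic is unavailable:

import os
import subprocess

os.chdir("TTS-Cube")  # enter the cloned repository before touching submodules
subprocess.run(["git", "submodule", "update", "--init", "--recursive"], check=True)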

how to try it

Hello, thanks for sharing. Could you write up how to train and evaluate this model?

what should the development set's content be in a speech dataset and g2p?

This is my first GitHub issue, so please forgive me if there are any mistakes.
The problem I'm having right now is simply not understanding what should go in the development folder of a training set.
What I've done:
I've downloaded the M-AILABS Italian training set and split the CSV into txt files, one corresponding to each wav file; that covers the training set. My question is: what should I put in the other folder?
The README says there should not be more than 5 files in there, but when I start training with an empty dev folder it gives me an error about a lab file that was not found.
I have the same doubt about the g2p feature, but since I'm not going to use it, that's secondary for me, as is adding custom annotations to the lab files (in fact, I haven't added any).
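
For anyone facing the same question, a sketch of the kind of split being described is below. The pipe-separated metadata format (LJSpeech/M-AILABS style) and the output layout are assumptions, not the repo's documented pipeline:

import os

os.makedirs("data/processed/train", exist_ok=True)
os.makedirs("data/processed/dev", exist_ok=True)

with open("metadata.csv", encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("|") for line in f if line.strip()]

for i, row in enumerate(rows):
    wav_id, text = row[0], row[-1]                    # id and (normalized) transcript
    split = "dev" if i >= len(rows) - 5 else "train"  # hold out 5 files for dev
    with open(f"data/processed/{split}/{wav_id}.txt", "w", encoding="utf-8") as out:
        out.write(text)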

Add requirements.txt

No idea what the dependencies are for this repo. Apparently it requires Python 2 (due to the use of xrange) and an old version of SciPy ("module scipy.misc has no attribute 'toimage'"). Also, did the folder structure change? trainer.py cannot find 'data/processed/train'; I had to manually correct it to '../data/processed/train' (many times, because it is hardcoded all over the place).

You may be on to something, but this repo is unusable by others as it is.
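
For what it's worth, both breakages have well-known Python 3 counterparts. A sketch, assuming the call sites can be ported rather than pinning old versions (Pillow's Image.fromarray is the commonly suggested replacement for the removed scipy.misc.toimage):

import numpy as np
from PIL import Image  # Pillow

data = np.random.rand(64, 64)

# Python 2's xrange is simply range in Python 3.
for i in range(data.shape[0]):
    pass  # per-row processing would go here

# scipy.misc.toimage was deprecated and then removed from SciPy;
# scale to uint8 yourself and hand the array to Pillow instead.
img = Image.fromarray((data * 255).astype(np.uint8))
img.save("out.png")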

English model and hardware requirements

Hello Tiberiu,

I'd love to test TTS-Cube, but unfortunately I don't currently have access to a good GPU (and I don't think I could train a TTS model on a laptop with a 940MX). Do you have a pretrained English model? It seems you were working on one, but I don't know its current status.

Also, do you have an idea of the hardware requirements for running synthesis? For example, the NVIDIA Jetson Nano seems like a nice platform for self-hosted TTS, but I'm not sure it's powerful enough to run TTS-Cube.
