
tts-cube's People

Contributors

gcioroiu, roodrallec, rscctest, tiberiu44


tts-cube's Issues

audio samples on English dataset

Hello,

Thank you for the wonderful repository.

I read that you're currently training on the LJSpeech dataset for English TTS.

Do you have any updates on audio samples?

Also, would you be able to provide some rough training stats (number of GPUs used, hours needed per pass through the data, etc.)?

Thanks again for the awesome repository and open-source effort.

Negative loss when training step 2

Here is my output from training step 2:

/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Found 4995 training files and 5 development files
	Rendering devset
		1/5 processing file data/processed/dev/0000001 
		2/5 processing file data/processed/dev/0000002 
		3/5 processing file data/processed/dev/0000003 
		4/5 processing file data/processed/dev/0000004 
		5/5 processing file data/processed/dev/0000005 

Starting epoch 1
Shuffling training data
	1/4995 processing file data/processed/train/0000007
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.29it/s]
 avg loss=0.9230358004570007 execution time=0.8138909339904785
	2/4995 processing file data/processed/train/0000008
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.47it/s]
 avg loss=0.8607208132743835 execution time=0.714789867401123
...
avg loss=0.1945137083530426 execution time=0.626471757888794
	17/4995 processing file data/processed/train/0000022
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.46it/s]
 avg loss=0.0572626106441021 execution time=0.7513647079467773
	18/4995 processing file data/processed/train/0000023
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.68it/s]
 avg loss=-0.061442214995622635 execution time=0.6261122226715088
	19/4995 processing file data/processed/train/0000024
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.47it/s]
 avg loss=-0.18586862087249756 execution time=0.7162132263183594
	20/4995 processing file data/processed/train/0000025
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.68it/s]
 avg loss=0.06383810192346573 execution time=0.6265075206756592
	21/4995 processing file data/processed/train/0000026
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.67it/s]
 avg loss=-0.20782051980495453 execution time=0.628434419631958
	22/4995 processing file data/processed/train/0000027
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.46it/s]
 avg loss=-0.31225016713142395 execution time=0.7171187400817871
	23/4995 processing file data/processed/train/0000028
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.30it/s]
 avg loss=-0.5820147395133972 execution time=0.8073093891143799
	24/4995 processing file data/processed/train/0000029
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.68it/s]
 avg loss=-0.46214190125465393 execution time=0.6245768070220947
	25/4995 processing file data/processed/train/0000030
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.46it/s]
 avg loss=-0.7601555585861206 execution time=0.720339298248291
	26/4995 processing file data/processed/train/0000031
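
A note on the negative values: a negative average loss is not necessarily a bug. If the training objective is the negative log-likelihood of a continuous output distribution, as in many neural vocoders, the density can exceed 1, so its log is positive and the NLL goes negative. A minimal sketch of the effect, assuming a Gaussian output (this is illustrative, not TTS-Cube's actual loss code):

import math

def gaussian_nll(x, mu, sigma):
    # negative log of the Gaussian density N(x; mu, sigma^2)
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

print(gaussian_nll(0.0, 0.0, 1.0))  # ~0.919: density < 1, NLL positive
print(gaussian_nll(0.0, 0.0, 0.1))  # ~-1.384: density > 1, NLL legitimately negative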

some words are missing during synthesis

I have trained an encoder on custom Telugu-language data for about 4 days, but during inference some words are not synthesized and the audio skips over them. Do you suggest any hyperparameter adjustments, or something else, to make the synthesizer work correctly? I am using the LJSpeech vocoder, and I trained on the last version of the repo before the new g2p pull. The loss is around 1.8 to 2.3 and has stayed in that range for the past 20 hours.
Thank you

Install fails on Arch during pip command

A similar issue happens with the Tortoise TTS install...

On my first attempt I may have installed it in an unusual directory from a previous app install... and it got a similar error...

[negatron@Negatron]-[~]

git clone https://github.com/tiberiu44/TTS-Cube.git
Cloning into 'TTS-Cube'...
remote: Enumerating objects: 2345, done.
remote: Counting objects: 100% (958/958), done.
remote: Compressing objects: 100% (286/286), done.
remote: Total 2345 (delta 723), reused 897 (delta 672), pack-reused 1387
Receiving objects: 100% (2345/2345), 697.70 MiB | 1.30 MiB/s, done.
Resolving deltas: 100% (1553/1553), done.
Updating files: 100% (258/258), done.
[negatron@Negatron]-[~]
cd TTS-Cube
pip3 install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy==1.15.0
Using cached numpy-1.15.0.zip (4.5 MB)
Preparing metadata (setup.py) ... done
Collecting librosa==0.6.1
Using cached librosa-0.6.1.tar.gz (1.6 MB)
Preparing metadata (setup.py) ... done
Collecting scipy==1.1.0
Using cached scipy-1.1.0.tar.gz (15.6 MB)
Preparing metadata (setup.py) ... done
Collecting pysptk==0.1.11
Using cached pysptk-0.1.11.tar.gz (402 kB)
Preparing metadata (setup.py) ... done
Collecting Cython==0.27.3
Using cached Cython-0.27.3.tar.gz (1.8 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [50 lines of output]
Unable to find pgen, not compiling formal grammar.
running egg_info
creating /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info
writing /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/dependency_links.txt
writing entry points to /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/entry_points.txt
writing top-level names to /tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-5nzjjh0x/Cython.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/setup.py", line 229, in
setup(
File "/usr/lib/python3.10/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 308, in run
self.find_sources()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 316, in find_sources
mm.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 560, in run
self.add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 597, in add_defaults
sdist.add_defaults(self)
File "/usr/lib/python3.10/site-packages/setuptools/command/sdist.py", line 106, in add_defaults
super().add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 252, in add_defaults
self._add_defaults_ext()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
build_ext = self.get_finalized_command('build_ext')
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 305, in get_finalized_command
cmd_obj.ensure_finalized()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 111, in ensure_finalized
self.finalize_options()
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/Cython/Distutils/build_ext.py", line 18, in finalize_options
self.distribution.ext_modules[:] = cythonize(
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/Cython/Build/Dependencies.py", line 913, in cythonize
module_list, module_metadata = create_extension_list(
File "/tmp/pip-install-f_dvzi41/cython_59458e29aa504995a3935636c2cacd7a/Cython/Build/Dependencies.py", line 742, in create_extension_list
elif isinstance(patterns, basestring) or not isinstance(patterns, collections.Iterable):
AttributeError: module 'collections' has no attribute 'Iterable'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[negatron@Negatron]-[~/TTS-Cube]

cd TTS-Cube
sudo pip3 install -r requirements.txt
cd: no such file or directory: TTS-Cube
[sudo] password for negatron:
Collecting numpy==1.15.0
Downloading numpy-1.15.0.zip (4.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.5/4.5 MB 1.3 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting librosa==0.6.1
Downloading librosa-0.6.1.tar.gz (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 527.5 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting scipy==1.1.0
Downloading scipy-1.1.0.tar.gz (15.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.6/15.6 MB 1.3 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting pysptk==0.1.11
Downloading pysptk-0.1.11.tar.gz (402 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 402.5/402.5 kB 253.1 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting Cython==0.27.3
Downloading Cython-0.27.3.tar.gz (1.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 1.5 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [50 lines of output]
Unable to find pgen, not compiling formal grammar.
running egg_info
creating /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info
writing /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/dependency_links.txt
writing entry points to /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/entry_points.txt
writing top-level names to /tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-nrfid8ec/Cython.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/setup.py", line 229, in
setup(
File "/usr/lib/python3.10/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 308, in run
self.find_sources()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 316, in find_sources
mm.run()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 560, in run
self.add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 597, in add_defaults
sdist.add_defaults(self)
File "/usr/lib/python3.10/site-packages/setuptools/command/sdist.py", line 106, in add_defaults
super().add_defaults()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 252, in add_defaults
self._add_defaults_ext()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
build_ext = self.get_finalized_command('build_ext')
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 305, in get_finalized_command
cmd_obj.ensure_finalized()
File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 111, in ensure_finalized
self.finalize_options()
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/Cython/Distutils/build_ext.py", line 18, in finalize_options
self.distribution.ext_modules[:] = cythonize(
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/Cython/Build/Dependencies.py", line 913, in cythonize
module_list, module_metadata = create_extension_list(
File "/tmp/pip-install-3w_rwq84/cython_2213ce863fd24e27ae02ae2fbe17f21d/Cython/Build/Dependencies.py", line 742, in create_extension_list
elif isinstance(patterns, basestring) or not isinstance(patterns, collections.Iterable):
AttributeError: module 'collections' has no attribute 'Iterable'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[negatron@Negatron]-[~/TTS-Cube]
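
The AttributeError at the bottom of both runs points to a Python-version mismatch rather than anything Arch-specific: Cython 0.27.3 predates Python 3.10, which removed the deprecated aliases from the top-level collections module (they have lived in collections.abc since Python 3.3). A minimal sketch of the breakage and the modern spelling; installing under an older Python or pinning a newer Cython would likely avoid it:

import collections
import collections.abc

# Old spelling used by Cython 0.27.3 -- raises AttributeError on Python >= 3.10:
try:
    collections.Iterable
except AttributeError as e:
    print(e)  # module 'collections' has no attribute 'Iterable'

# Modern spelling that works across Python 3:
print(isinstance(["*.pyx"], collections.abc.Iterable))  # True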

Fine-tuning/Speaker adaptation

First off, this is great work. Can't wait to play around with the code 👍

In the training instructions, I see that you do have multispeaker support. Is it possible to "fine-tune" from an existing checkpoint with another dataset using --resume? Has anyone tried it and seen whether the results are good?

MelGAN vocoder is fast, let's integrate it?

Hello, I've made the required changes to integrate TTS-Cube with the MelGAN vocoder.

Inference is really fast: with WaveNet it took 1.5 hours to vocode a few sentences; now it takes literally seconds.

I trained the MelGAN vocoder for 325 epochs (106,170 iterations); it took about a day, and the output is already understandable.

The problem is that the encoder takes very long to train. I've been training for days and days and it still says whatever it wants. I wish I had a faster GPU.

It speaks, just not exactly what is in the text file.

The datasets are Japanese and Russian. I want to build a common (multilingual) model in the future (just for fun).

Is there interest from others in reproducing my experiment on your own datasets? I can share my code.

Training times

Hello Tiberiu,
Thanks for making the code for TTS-Cube available, I'm finding it really useful for my thesis.

However, I'm currently looking to train TTS-Cube on audio and transcripts extracted from interviews. What kind of training times were you seeing for

A) training the G2P model;
B) training the Vocoder;
C) training the encoder.

Also, once trained, is synthesis fast enough for real-time TTS?

How to use the G2P model

Hi, I have trained a G2P model on CMUdict as described in Step 0. I found 3 files (en-g2p-bestAcc.network, en-g2p.encodings, en-g2p-last.network) in data/model/.
But how do I use them to convert graphemes to phonemes? Is the conversion applied automatically in Steps 1/2/3?
Thank you.

What is BeeCoder?

Hi, I'm trying to understand your BeeCoder vocoder: is it just an MLP with a fixed lookback window?
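
For reference, the architecture the question describes, an MLP predicting the next sample from a fixed window of past samples, would look roughly like the sketch below. This is a guess at what is being asked about, not BeeCoder's actual code; the window size and layer width are made up:

import numpy as np

WINDOW = 256  # hypothetical fixed lookback

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (WINDOW, 128))  # input window -> hidden layer
w2 = rng.normal(0.0, 0.01, 128)            # hidden layer -> next-sample prediction

def predict_next(past_samples):
    # past_samples: the last WINDOW samples of the waveform
    h = np.tanh(past_samples @ W1)
    return h @ w2

signal = rng.normal(0.0, 1.0, 1024)
print(predict_next(signal[-WINDOW:]))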

colab notebook missing command to enter the github folder

I tried using the Colab notebook; it works, but there is an error in the paths.

After installing the dependencies, the code assumes it's in the repository folder.

Add a line to enter the folder,
%cd TTS-Cube
before
!git submodule update --init --recursive

or change the paths to include it.
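
For completeness, a plain-Python equivalent of the suggested fix, hypothetical and only useful where the %cd magic is unavailable:

import os
import subprocess

os.chdir("TTS-Cube")  # enter the cloned repository before touching submodules
subprocess.run(["git", "submodule", "update", "--init", "--recursive"], check=True)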

how to try it

Hello, thanks for sharing. Could you write up how to train and evaluate this model?

what should the development set's content be in a speech dataset and g2p?

This is my first GitHub issue, so please forgive me if there are any mistakes.
The problem I'm having right now is simply not understanding what should go in the development folder of a training set.
What I've done:
I've downloaded the M-AILABS Italian training set and split the CSV into txt files, one corresponding to each wav file; that covers the training set. My question is: what should I put in the other folder?
The README says there should not be more than 5 files in there, but when I start training with an empty dev folder it gives me an error about a lab file that was not found.
I have the same doubt about the g2p feature, but since I'm not going to use it, that's secondary for me, as is adding custom annotations to the lab files (in fact, I haven't added any).
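
For anyone facing the same question, a sketch of the kind of split being described is below. The pipe-separated metadata format (LJSpeech/M-AILABS style) and the output layout are assumptions, not the repo's documented pipeline:

import os

os.makedirs("data/processed/train", exist_ok=True)
os.makedirs("data/processed/dev", exist_ok=True)

with open("metadata.csv", encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("|") for line in f if line.strip()]

for i, row in enumerate(rows):
    wav_id, text = row[0], row[-1]                    # id and (normalized) transcript
    split = "dev" if i >= len(rows) - 5 else "train"  # hold out 5 files for dev
    with open(f"data/processed/{split}/{wav_id}.txt", "w", encoding="utf-8") as out:
        out.write(text)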

Add requirements.txt

No idea what the dependencies are for this repo. Apparently it requires Python 2 (due to the use of xrange) and an old version of SciPy ("module scipy.misc has no attribute 'toimage'"). Also, did the folder structure change? trainer.py cannot find 'data/processed/train'; I had to manually correct it to '../data/processed/train' (many times, because it is hardcoded all over the place).

You may be on to something, but this repo is unusable by others as it is.
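
For what it's worth, both breakages have well-known Python 3 counterparts. A sketch, assuming the call sites can be ported rather than pinning old versions (Pillow's Image.fromarray is the commonly suggested replacement for the removed scipy.misc.toimage):

import numpy as np
from PIL import Image  # Pillow

data = np.random.rand(64, 64)

# Python 2's xrange is simply range in Python 3.
for i in range(data.shape[0]):
    pass  # per-row processing would go here

# scipy.misc.toimage was deprecated and then removed from SciPy;
# scale to uint8 yourself and hand the array to Pillow instead.
img = Image.fromarray((data * 255).astype(np.uint8))
img.save("out.png")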

English model and hardware requirements

Hello Tiberiu,

I'd love to test TTS-Cube, but unfortunately I don't currently have access to a good GPU (and I don't think I could train a TTS model on a laptop with a 940MX). Do you have a pretrained English model? It seems you were working on one, but I don't know its current status.

Also, do you have an idea of the hardware requirements for running synthesis? For example, the NVIDIA Jetson Nano seems like a nice platform for self-hosted TTS, but I'm not sure it's powerful enough to run TTS-Cube.
