barronalex / tacotron
Implementation of Google's Tacotron in TensorFlow
Hi all,
Just wondering if anyone has noticed that the memory usage is high?
My training data files are:
925M Jun 16 08:09 mels.npy
56K Jun 16 08:09 meta.pkl
7.9K Jun 16 08:09 speech_lens.npy
21K Jun 16 13:45 stft_mean.npy
5.8G Jun 16 08:09 stfts.npy
21K Jun 16 13:45 stft_std.npy
7.9K Jun 16 08:09 text_lens.npy
771K Jun 16 08:09 texts.npy
However, `top` shows that training uses more than 20 GB resident memory (RES).
Thanks
Jian
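For reference, one common reason for this kind of footprint is that `np.load` reads the whole array (the 5.8 GB `stfts.npy` here) into RAM at once. A hedged workaround, assuming the loader can be changed, is to memory-map the file so pages are only faulted in as slices are read. The file name below is a small stand-in, not the real dataset:

```python
import numpy as np

# Create a small stand-in for a large array like stfts.npy
# (demo file; the repo's data loader calls np.load on the real files).
np.save("stfts_demo.npy", np.zeros((100, 1025), dtype=np.float32))

# mmap_mode='r' maps the file instead of reading it fully into RAM;
# only the slices you actually touch get materialized.
stfts = np.load("stfts_demo.npy", mmap_mode="r")
batch = np.array(stfts[0:32])  # copies just this batch into memory
```

Whether this is compatible with the repo's queue-based input pipeline is untested; it is a sketch of the general technique.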
I cannot download the file http://data.cstr.ed.ac.uk/blizzard2011/lessac/prompts.data
Could a friend help me by downloading it and sharing it via Baidu (or Google)?
Thanks
In models/tacotron.py, annealing_rate = 1 and the initial learning rate init_lr = 0.0005 are defined. Does that mean there is no decay function in this model?
Am I right?
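For intuition, here is a hedged sketch of what an exponential annealing schedule looks like; the exact formula in models/tacotron.py may differ, and `decay_steps` below is an assumed parameter. The point is that with annealing_rate = 1 the learning rate stays constant, i.e. effectively no decay:

```python
# Hypothetical exponential annealing schedule (not necessarily the
# repo's exact formula): lr = init_lr * annealing_rate ** (step / decay_steps).
def annealed_lr(init_lr, annealing_rate, step, decay_steps=1000):
    return init_lr * annealing_rate ** (step / decay_steps)

constant = annealed_lr(0.0005, 1.0, 50000)   # annealing_rate = 1 -> no decay
decayed = annealed_lr(0.0005, 0.5, 2000)     # halved every 1000 steps
```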
What about tensorflow/tensorflow#7868 ?
Has anyone run into this issue? During training, a "loss exploded" exception is thrown and training stops.
Hi,
For some reason, the samples generated in train.py are very long (~50 seconds).
Do you also have this issue?
Hello,
We have been experimenting with a multispeaker dataset, and since we started hearing some understandable words/sentences during training, we tried to synthesize others via test.py.
The problem, as far as we understand it, is that test.py cannot use the speaker embedding, and testing usually breaks in CBHG in the ops file.
Could you please look into the general problem of testing with a multispeaker dataset such as VCTK?
Thank you, very promising result.
Dataset: arctic only
Parameters: everything is default
Training time: 14 hours
Hi, just asking for educational purposes: do you think it is easier to implement the paper "Tacotron: Towards End-to-End Speech Synthesis" or Tacotron 2, "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"?
Right after inference, one file is generated and saved to ./log/nancy/tacotron/test. As a beginner, I don't understand how to make use of this file or how I can get an audio file for some input text. Please help.
Thanks
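If the saved file is a predicted magnitude spectrogram rather than a wav, a waveform can be recovered with Griffin-Lim, which is what the original Tacotron paper uses for vocoding. Below is a minimal self-contained numpy sketch of the algorithm, demonstrated on a sine wave; the repo's own audio utilities use their own STFT parameters, so treat the `N_FFT`/`HOP` values here as placeholders:

```python
import numpy as np

N_FFT, HOP = 512, 128  # placeholder STFT parameters

def stft(x):
    win = np.hanning(N_FFT)
    return np.array([np.fft.rfft(x[i:i + N_FFT] * win)
                     for i in range(0, len(x) - N_FFT + 1, HOP)])

def istft(S):
    win = np.hanning(N_FFT)
    out = np.zeros(HOP * (len(S) - 1) + N_FFT)
    norm = np.zeros_like(out)
    for i, spec in enumerate(S):
        out[i * HOP:i * HOP + N_FFT] += np.fft.irfft(spec, N_FFT) * win
        norm[i * HOP:i * HOP + N_FFT] += win ** 2
    return out / np.maximum(norm, 1e-8)  # overlap-add with window normalization

def griffin_lim(mag, n_iter=30):
    # Start from random phase, then alternate between the time domain
    # and the magnitude constraint until the phase is consistent.
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase)
        phase = np.exp(1j * np.angle(stft(x)))
    return istft(mag * phase)

# Demo: drop the phase of a sine wave, recover a waveform from magnitude only.
x = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)
mag = np.abs(stft(x))
y = griffin_lim(mag)
err = np.linalg.norm(np.abs(stft(y)) - mag) / np.linalg.norm(mag)
```

The resulting waveform can then be written out with any wav writer.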
I replaced the open calls with codecs.open(txt_file, 'r', 'utf-8') to support non-English characters.
Everything works fine in training. But if I use non-English characters in prompts.txt with test.py, I get an error:
"xxxx" is not a valid scope name
If I remove the non-English characters, the script runs without errors, but no speech is generated, only noise.
Only `tf.errors.OutOfRangeError` is raised, which I think is intentional?
c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158]
Out of range: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
[[Node: batch = QueueDequeueUpToV2[component_types=[DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
Hi, I was trying to freeze the model, but since the code uses TensorFlow's queue-based data loader, I cannot easily find the input nodes to feed data through sess.run(output, feed_dict). Has anyone frozen this model? Did you add extra nodes to rebuild the inputs? Thanks.
Hello,
I downloaded the Nancy weights and tried 'python3 test.py < prompts.txt' with one sentence in the prompts.txt file. I get the following error. Is this a tensorflow version issue? I'm on Ubuntu 16.04 and my TF version is 1.12.0:
Traceback (most recent call last):
File "test.py", line 91, in <module>
test(model, config, prompts)
File "test.py", line 31, in test
model = model(config, batch_inputs, train=False)
File "/home/kbalak18/Tacotron/models/tacotron.py", line 191, in __init__
self.seq2seq_output, self.output = self.inference(inputs, train)
File "/home/kbalak18/Tacotron/models/tacotron.py", line 137, in inference
(seq2seq_output, _), attention_state, _ = decoder.dynamic_decode(dec, maximum_iterations=config.max_decode_iter)
File "/home/kbalak18/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 209, in dynamic_decode
zero_outputs = _create_zero_outputs(decoder.output_size,
File "/home/kbalak18/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/basic_decoder.py", line 101, in output_size
sample_id=self._helper.sample_ids_shape)
File "/home/kbalak18/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py", line 144, in sample_ids_shape
return self._sample_ids_shape
AttributeError: 'InferenceHelper' object has no attribute '_sample_ids_shape'
I'm trying to reproduce some of the results I obtained during training by using the test.py script. Continuing to dig into this, but wondering if anyone else has come across the same issue?
It sounds like some maximum length in terms of time, but what is the unit for this variable? The default value of 108000 looks like milliseconds maybe?
How long did it take you to train this model from scratch on your mac using one GPU?
I'm not sure if these are larger than what you are using, but have a look at the datasets used by speech-to-text-wavenet
Hi, I am new to TTS.
When I run test.py, I only get a directory, weights/arctic/tacotron, but there is nothing in it.
How can I generate a wav from the text?
I recently trained a French model using a dataset with a total of 10 hours (https://datashare.is.ed.ac.uk/handle/10283/2353). The quality of the results is low compared to English with the "nancy" dataset at the same loss values. I wanted to know whether there are limitations on the total number of hours of training data, or any other constraints on the data. In other words, how can we improve our results?
Hi,
Could you provide a pre-trained model if possible?
Thanks
I tried to generate a wav file with test.py after training for a day, but the result was mostly noise, with no speech.
I used the same words as in the training data. Speech renders correctly during training, but the same phrase through test.py results in just noise.
Anyone have any idea what might be the cause of it?
I am wondering if the attention RNN described in the paper is included in this implementation. If so, could someone point out the lines in the code where it is used? The reason I ask is that the paper seems to keep track of the attention states and use them as input to predict the next timestep's attention. This makes sense, as they do a similar thing in the Chorowski et al. paper (this mechanism is also used in the Tacotron 2 paper) to keep the attention moving forward. But I could very well be wrong in interpreting this.
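For what it's worth, here is a minimal numpy sketch of the mechanism being asked about: location-sensitive attention in the style of Chorowski et al., where the previous alignment is convolved into "location features" and added to the score. This is an illustration of the technique, not a claim about what this repo actually implements, and all weights below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 10, 8
memory = rng.standard_normal((T, d))   # encoder outputs (one row per input step)
W = rng.standard_normal((d, d))        # projects the memory
V = rng.standard_normal((d, d))        # projects the decoder query
U = rng.standard_normal(3)             # 1-D conv filter over the previous alignment
w = rng.standard_normal(d)             # score vector

def attend(query, prev_align):
    # Convolving the previous alignment gives "location features";
    # feeding them into the score is what nudges attention forward.
    loc = np.convolve(prev_align, U, mode="same")
    scores = np.tanh(memory @ W + query @ V + loc[:, None]) @ w
    align = softmax(scores)
    context = align @ memory
    return align, context

align = np.full(T, 1.0 / T)            # uniform initial alignment
query = rng.standard_normal(d)
align, context = attend(query, align)
```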
Executing the following command gives me an unresolved-host problem:
./download_weights.sh
--2018-06-09 01:51:24-- https://www.dropbox.com/s/8lq7y9bhglthdjm/tacotron_weights.zip
Resolving www.dropbox.com (www.dropbox.com)... 162.125.66.1, 2620:100:6022:1::a27d:4201
Connecting to www.dropbox.com (www.dropbox.com)|162.125.66.1|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com/cd/0/get/AIZ13hjTdc5cinjHjaaGngEl_PmJy2e_bgfsmJzaO9yBbKZm0YzGBpahyElIrgVnwgpU4-52DgPgZC6i8kX3vLj2xocAaVijus2AayncSXD2sZW0N4h8a4RxvwbvvBW1P6df0HF_SzSrSqBtl5kptiTQDjWWvFsVmfhczQHDMCN972zP_P_EOaEhbJaaWBGqWVo/file [following]
--2018-06-09 01:51:24-- https://uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com/cd/0/get/AIZ13hjTdc5cinjHjaaGngEl_PmJy2e_bgfsmJzaO9yBbKZm0YzGBpahyElIrgVnwgpU4-52DgPgZC6i8kX3vLj2xocAaVijus2AayncSXD2sZW0N4h8a4RxvwbvvBW1P6df0HF_SzSrSqBtl5kptiTQDjWWvFsVmfhczQHDMCN972zP_P_EOaEhbJaaWBGqWVo/file
Resolving uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com (uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com)... failed: Name or service not known.
wget: unable to resolve host address ‘uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com’
mv: cannot stat 'tacotron_weights.zip': No such file or directory
unzip: cannot find or open weights/nancy/tacotron_weights.zip, weights/nancy/tacotron_weights.zip.zip or weights/nancy/tacotron_weights.zip.ZIP.
Why not use the mean of the loss, like this? (https://github.com/barronalex/Tacotron/blob/master/models/tacotron.py#L136)
seq2seq_loss = tf.reduce_mean(tf.abs(seq2seq_output - mel))
output_loss = tf.reduce_mean(tf.abs(output - linear))
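One argument for the mean: it makes the loss value independent of batch size and spectrogram length, so losses are comparable across runs, while a summed loss scales with tensor size (and effectively scales the gradient/learning rate). A quick numpy check of that point, independent of whatever reduction the repo actually uses:

```python
import numpy as np

def l1_mean(pred, target):
    return np.mean(np.abs(pred - target))

def l1_sum(pred, target):
    return np.sum(np.abs(pred - target))

# Two tensors with identical per-element error (every element off by 1),
# but different batch size and sequence length.
small = np.ones((2, 10)), np.zeros((2, 10))
big = np.ones((4, 50)), np.zeros((4, 50))

# Mean loss is the same for both; summed loss grows with tensor size.
mean_small, mean_big = l1_mean(*small), l1_mean(*big)
sum_small, sum_big = l1_sum(*small), l1_sum(*big)
```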