barronalex / tacotron
Implementation of Google's Tacotron in TensorFlow
Hi all,
Just wondering if anyone has noticed that the memory usage is high?
My training data files are:
925M Jun 16 08:09 mels.npy
56K Jun 16 08:09 meta.pkl
7.9K Jun 16 08:09 speech_lens.npy
21K Jun 16 13:45 stft_mean.npy
5.8G Jun 16 08:09 stfts.npy
21K Jun 16 13:45 stft_std.npy
7.9K Jun 16 08:09 text_lens.npy
771K Jun 16 08:09 texts.npy
However, `top` shows that training uses more than 20 GB resident memory (RES).
Thanks
Jian
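For reference, one common reason for this kind of footprint is that `np.load` reads the whole array (the 5.8 GB `stfts.npy` here) into RAM at once. A hedged workaround, assuming the loader can be changed, is to memory-map the file so pages are only faulted in as slices are read. The file name below is a small stand-in, not the real dataset:

```python
import numpy as np

# Create a small stand-in for a large array like stfts.npy
# (demo file; the repo's data loader calls np.load on the real files).
np.save("stfts_demo.npy", np.zeros((100, 1025), dtype=np.float32))

# mmap_mode='r' maps the file instead of reading it fully into RAM;
# only the slices you actually touch get materialized.
stfts = np.load("stfts_demo.npy", mmap_mode="r")
batch = np.array(stfts[0:32])  # copies just this batch into memory
```

Whether this is compatible with the repo's queue-based input pipeline is untested; it is a sketch of the general technique.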
I cannot download the file http://data.cstr.ed.ac.uk/blizzard2011/lessac/prompts.data
Could a friend help me by downloading it and sharing it via Baidu (or Google)?
Thanks
In models/tacotron.py, annealing_rate = 1 and the initial learning rate init_lr = 0.0005 are defined. Does that mean there is no decay function in this model?
Am I right?
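For intuition, here is a hedged sketch of what an exponential annealing schedule looks like; the exact formula in models/tacotron.py may differ, and `decay_steps` below is an assumed parameter. The point is that with annealing_rate = 1 the learning rate stays constant, i.e. effectively no decay:

```python
# Hypothetical exponential annealing schedule (not necessarily the
# repo's exact formula): lr = init_lr * annealing_rate ** (step / decay_steps).
def annealed_lr(init_lr, annealing_rate, step, decay_steps=1000):
    return init_lr * annealing_rate ** (step / decay_steps)

constant = annealed_lr(0.0005, 1.0, 50000)   # annealing_rate = 1 -> no decay
decayed = annealed_lr(0.0005, 0.5, 2000)     # halved every 1000 steps
```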
What about tensorflow/tensorflow#7868 ?
Has anyone run into this issue? During training, a "loss exploded" exception is thrown and training stops.
Hi,
For some reason, the samples generated in train.py are very long (~50 seconds).
Do you also have this issue?
Hello,
We have been experimenting with a multispeaker dataset, and since we started hearing some understandable words/sentences during training, we tried to synthesize others via test.py.
The problem, as far as we understand it, is that test.py cannot use the speaker embedding, and testing usually breaks in CBHG in the ops file.
Could you please look into the general problem of testing with a multispeaker dataset such as VCTK?
Thank you, very promising result.
Dataset: arctic only
Parameters: everything is default
Training time: 14 hours
Hi, just asking for educational purposes: do you think it is easier to implement the paper "Tacotron: Towards End-to-End Speech Synthesis" or Tacotron 2, "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"?
Right after inference, one file is generated and saved to ./log/nancy/tacotron/test. As a beginner, I don't understand how to make use of this file or how I can get an audio file for some input text. Please help.
Thanks
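If the saved file is a predicted magnitude spectrogram rather than a wav, a waveform can be recovered with Griffin-Lim, which is what the original Tacotron paper uses for vocoding. Below is a minimal self-contained numpy sketch of the algorithm, demonstrated on a sine wave; the repo's own audio utilities use their own STFT parameters, so treat the `N_FFT`/`HOP` values here as placeholders:

```python
import numpy as np

N_FFT, HOP = 512, 128  # placeholder STFT parameters

def stft(x):
    win = np.hanning(N_FFT)
    return np.array([np.fft.rfft(x[i:i + N_FFT] * win)
                     for i in range(0, len(x) - N_FFT + 1, HOP)])

def istft(S):
    win = np.hanning(N_FFT)
    out = np.zeros(HOP * (len(S) - 1) + N_FFT)
    norm = np.zeros_like(out)
    for i, spec in enumerate(S):
        out[i * HOP:i * HOP + N_FFT] += np.fft.irfft(spec, N_FFT) * win
        norm[i * HOP:i * HOP + N_FFT] += win ** 2
    return out / np.maximum(norm, 1e-8)  # overlap-add with window normalization

def griffin_lim(mag, n_iter=30):
    # Start from random phase, then alternate between the time domain
    # and the magnitude constraint until the phase is consistent.
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase)
        phase = np.exp(1j * np.angle(stft(x)))
    return istft(mag * phase)

# Demo: drop the phase of a sine wave, recover a waveform from magnitude only.
x = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)
mag = np.abs(stft(x))
y = griffin_lim(mag)
err = np.linalg.norm(np.abs(stft(y)) - mag) / np.linalg.norm(mag)
```

The resulting waveform can then be written out with any wav writer.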
I replaced the open calls with codecs.open(txt_file, 'r', 'utf-8') to support non-English characters.
Everything works fine in training. But if I use non-English characters in prompts.txt with test.py, I get an error:
"xxxx" is not a valid scope name
If I remove the non-English characters, the script runs without errors, but no speech is generated, only noise.
Only `tf.errors.OutOfRangeError` is raised, which I think is intentional?
c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158]
Out of range: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
[[Node: batch = QueueDequeueUpToV2[component_types=[DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
Hi, I was trying to freeze the model, but since the code uses TensorFlow's queue-based data loader, I cannot easily find the input nodes to feed data through sess.run(output, feed_dict). Has anyone frozen this model? Did you add extra nodes to rebuild the inputs? Thanks.
Hello,
I downloaded the Nancy weights and tried 'python3 test.py < prompts.txt' with one sentence in the prompts.txt file. I get the following error. Is this a tensorflow version issue? I'm on Ubuntu 16.04 and my TF version is 1.12.0:
Traceback (most recent call last):
File "test.py", line 91, in <module>
test(model, config, prompts)
File "test.py", line 31, in test
model = model(config, batch_inputs, train=False)
File "/home/kbalak18/Tacotron/models/tacotron.py", line 191, in __init__
self.seq2seq_output, self.output = self.inference(inputs, train)
File "/home/kbalak18/Tacotron/models/tacotron.py", line 137, in inference
(seq2seq_output, _), attention_state, _ = decoder.dynamic_decode(dec, maximum_iterations=config.max_decode_iter)
File "/home/kbalak18/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 209, in dynamic_decode
zero_outputs = _create_zero_outputs(decoder.output_size,
File "/home/kbalak18/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/basic_decoder.py", line 101, in output_size
sample_id=self._helper.sample_ids_shape)
File "/home/kbalak18/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py", line 144, in sample_ids_shape
return self._sample_ids_shape
AttributeError: 'InferenceHelper' object has no attribute '_sample_ids_shape'
I'm trying to reproduce some of the results I obtained during training by using the test.py script. Continuing to dig into this, but wondering if anyone else has come across the same issue?
It sounds like some maximum length in terms of time, but what is the unit for this variable? The default value of 108000 looks like milliseconds maybe?
How long did it take you to train this model from scratch on your mac using one GPU?
I'm not sure if these are larger than what you are using, but have a look at the datasets used by speech-to-text-wavenet
Hi, I am new to TTS.
When I run test.py, I only get a directory, weights/arctic/tacotron, but there is nothing in it.
How can I generate a wav from the text?
I recently trained a French model using a dataset with a total of 10 hours (https://datashare.is.ed.ac.uk/handle/10283/2353). The quality of the results is low compared to English with the "nancy" dataset at the same loss values. I wanted to know whether there are limitations on the total number of hours of training data, or any other constraints on the data. In other words, how can we improve our results?
Hi,
Could you provide a pre-trained model if possible?
Thanks
I tried to generate a wav file with test.py after training for a day, but the result was mostly noise, with no speech.
I used the same words as in the training data. Speech renders correctly during training, but the same phrase through test.py results in just noise.
Anyone have any idea what might be the cause of it?
I am wondering if the attention RNN described in the paper is included in this implementation. If so, could someone point out the lines in the code where it is used? The reason I ask is that the paper seems to keep track of the attention states and use them as input to predict the next timestep's attention. This makes sense, as they do a similar thing in the Chorowski et al. paper (this mechanism is also used in the Tacotron 2 paper) to keep the attention moving forward. But I could very well be wrong in interpreting this.
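For what it's worth, here is a minimal numpy sketch of the mechanism being asked about: location-sensitive attention in the style of Chorowski et al., where the previous alignment is convolved into "location features" and added to the score. This is an illustration of the technique, not a claim about what this repo actually implements, and all weights below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 10, 8
memory = rng.standard_normal((T, d))   # encoder outputs (one row per input step)
W = rng.standard_normal((d, d))        # projects the memory
V = rng.standard_normal((d, d))        # projects the decoder query
U = rng.standard_normal(3)             # 1-D conv filter over the previous alignment
w = rng.standard_normal(d)             # score vector

def attend(query, prev_align):
    # Convolving the previous alignment gives "location features";
    # feeding them into the score is what nudges attention forward.
    loc = np.convolve(prev_align, U, mode="same")
    scores = np.tanh(memory @ W + query @ V + loc[:, None]) @ w
    align = softmax(scores)
    context = align @ memory
    return align, context

align = np.full(T, 1.0 / T)            # uniform initial alignment
query = rng.standard_normal(d)
align, context = attend(query, align)
```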
Executing the following command gives me an unresolved-host problem:
./download_weights.sh
--2018-06-09 01:51:24-- https://www.dropbox.com/s/8lq7y9bhglthdjm/tacotron_weights.zip
Resolving www.dropbox.com (www.dropbox.com)... 162.125.66.1, 2620:100:6022:1::a27d:4201
Connecting to www.dropbox.com (www.dropbox.com)|162.125.66.1|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com/cd/0/get/AIZ13hjTdc5cinjHjaaGngEl_PmJy2e_bgfsmJzaO9yBbKZm0YzGBpahyElIrgVnwgpU4-52DgPgZC6i8kX3vLj2xocAaVijus2AayncSXD2sZW0N4h8a4RxvwbvvBW1P6df0HF_SzSrSqBtl5kptiTQDjWWvFsVmfhczQHDMCN972zP_P_EOaEhbJaaWBGqWVo/file [following]
--2018-06-09 01:51:24-- https://uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com/cd/0/get/AIZ13hjTdc5cinjHjaaGngEl_PmJy2e_bgfsmJzaO9yBbKZm0YzGBpahyElIrgVnwgpU4-52DgPgZC6i8kX3vLj2xocAaVijus2AayncSXD2sZW0N4h8a4RxvwbvvBW1P6df0HF_SzSrSqBtl5kptiTQDjWWvFsVmfhczQHDMCN972zP_P_EOaEhbJaaWBGqWVo/file
Resolving uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com (uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com)... failed: Name or service not known.
wget: unable to resolve host address ‘uce63a8aeb49c0def2795b1fbf40.dl.dropboxusercontent.com’
mv: cannot stat 'tacotron_weights.zip': No such file or directory
unzip: cannot find or open weights/nancy/tacotron_weights.zip, weights/nancy/tacotron_weights.zip.zip or weights/nancy/tacotron_weights.zip.ZIP.
Why not use the mean of the loss, like this? (https://github.com/barronalex/Tacotron/blob/master/models/tacotron.py#L136)
seq2seq_loss = tf.reduce_mean(tf.abs(seq2seq_output - mel))
output_loss = tf.reduce_mean(tf.abs(output - linear))
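One argument for the mean: it makes the loss value independent of batch size and spectrogram length, so losses are comparable across runs, while a summed loss scales with tensor size (and effectively scales the gradient/learning rate). A quick numpy check of that point, independent of whatever reduction the repo actually uses:

```python
import numpy as np

def l1_mean(pred, target):
    return np.mean(np.abs(pred - target))

def l1_sum(pred, target):
    return np.sum(np.abs(pred - target))

# Two tensors with identical per-element error (every element off by 1),
# but different batch size and sequence length.
small = np.ones((2, 10)), np.zeros((2, 10))
big = np.ones((4, 50)), np.zeros((4, 50))

# Mean loss is the same for both; summed loss grows with tensor size.
mean_small, mean_big = l1_mean(*small), l1_mean(*big)
sum_small, sum_big = l1_sum(*small), l1_sum(*big)
```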