cmusphinx / g2p-seq2seq
G2P with Tensorflow
License: Other
I run
python /home/ubuntu/g2p-seq2seq/g2p_seq2seq/g2p.py --train cmudict.dict --num_layers 4 --size 64 --model model
I get
WER : 0.964269283852
Accuracy : 0.0357307161478
If the vocabulary file is already loaded, do not re-read it.
It does not read anything anymore.
Hi,
thanks so much for this great project!
I have it running in --decode mode, but I run into this error in --interactive mode, where I receive this message:
$ sudo g2p-seq2seq --interactive --model g2p-seq2seq-cmudict
Creating 2 layers of 512 units.
Reading model parameters from g2p-seq2seq-cmudict
> hello
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/bin/g2p-seq2seq", line 11, in <module>
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/app.py", line 78, in main
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/g2p_seq2seq-5.0.0a0-py3.5.egg/g2p_seq2seq/g2p.py", line 308, in interactive
TypeError: decoding str is not supported
Sorry if this is a newbie error. Any help much appreciated :)
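For reference, a minimal sketch, assuming the failure comes from decoding a string that is already unicode under Python 3 (the traceback points at the interactive input handling), of reading a word so it works under both Python 2 and 3; the helper name is illustrative, not the project's actual code:

import sys

def read_word(prompt="> "):
    line = input(prompt) if sys.version_info[0] >= 3 else raw_input(prompt)
    # Python 3 input() already returns unicode; decoding it again raises
    # "TypeError: decoding str is not supported", so only decode raw bytes.
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    return line.strip()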
g2p-seq2seq --evaluate NEWARABIC/test.wordlist --model NEWARABIC
Creating 2 layers of 64 units.
Reading model parameters from NEWARABIC
Beginning calculation word error rate (WER) on test sample.
Words : 0
Errors: 0
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 9, in
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 81, in main
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 348, in evaluate
ZeroDivisionError: float division by zero
When I decode the same wordlist, it works fine.
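The log above shows Words: 0, so the division by zero comes from an empty test sample. A minimal sketch, with hypothetical names, of guarding the WER computation so this case reports a clear message instead:

def word_error_rate(errors, words):
    # Guard against an empty test sample (the log above shows Words: 0).
    if words == 0:
        raise ValueError("Test sample is empty; check that the --evaluate "
                         "file contains word/pronunciation pairs.")
    return float(errors) / words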
Would be interesting to compare with a similar CNTK model:
https://github.com/Microsoft/CNTK/blob/master/Examples/SequenceToSequence/Miscellaneous/G2P/G2P.cntk
I am trying to do everything right but this error still persists.
Creating 2 layers of 64 units.
Created model with fresh parameters.
global step 200 learning rate 0.5000 step-time 3.09 perplexity 1.57
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 9, in
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 67, in main
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 217, in train
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 253, in __run_evals
File "/usr/local/lib/python2.7/dist-packages/tensorflow/models/rnn/translate/seq2seq_model.py", line 250, in get_batch
encoder_input, decoder_input = random.choice(data[bucket_id])
File "/usr/lib/python2.7/random.py", line 275, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
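The IndexError comes from random.choice being called on an empty bucket of the validation set, i.e. no held-out word falls into that length range. A minimal sketch, with an illustrative helper name, of skipping empty buckets before sampling:

import random

def sample_pair(data, bucket_id):
    bucket = data[bucket_id]
    if not bucket:
        return None  # no evaluation data in this length bucket, skip it
    return random.choice(bucket)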
I want to test the accuracy of the trained model on cmudict.
Are there any standard training, validation, and test dictionaries for this task?
How is it compared in papers for fair evaluation if there are no standard partitions?
Thanks a lot for this code.
create_vocabulary(ph_vocab_path, train_ph)
create_vocabulary(gr_vocab_path, train_gr)
# Initialize vocabularies.
ph_vocab = initialize_vocabulary(ph_vocab_path, False)
gr_vocab = initialize_vocabulary(gr_vocab_path, False)
Why do you need to initialize the vocabulary right after you created it? The logic should be more straightforward: first build the vocabulary, then save it; then there is no need to reload it again.
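A minimal sketch of that flow, with illustrative names rather than the project's actual API: build the vocabulary in memory, write it to disk, and return it, so nothing has to be re-read right after creation:

def build_and_save_vocabulary(symbol_lists, vocab_path):
    vocab = {}
    for symbols in symbol_lists:
        for symbol in symbols:
            if symbol not in vocab:
                vocab[symbol] = len(vocab)
    # Persist one symbol per line, in id order, and hand the mapping back.
    with open(vocab_path, "w") as out:
        for symbol in sorted(vocab, key=vocab.get):
            out.write(symbol + "\n")
    return vocab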
I ran with 2 layers and 512 units but got nowhere close to the reported results.
Is this execution correct?
python -u g2p.py --train ../../cmudict/cmudict.dict --size 512
Preparing G2P data
Creating vocabularies in /tmp
Creating vocabulary /tmp/vocab.phoneme
Creating vocabulary /tmp/vocab.grapheme
Reading development and training data.
Creating 2 layers of 512 units.
Reading model parameters from /tmp/translate.ckpt-200
global step 400 learning rate 0.5000 step-time 2.78 perplexity 7.83
eval: bucket 0 perplexity 4.88
eval: bucket 1 perplexity 6.30
eval: bucket 2 perplexity 12.34
global step 600 learning rate 0.5000 step-time 2.71 perplexity 4.34
eval: bucket 0 perplexity 2.48
eval: bucket 1 perplexity 2.96
eval: bucket 2 perplexity 4.78
global step 800 learning rate 0.5000 step-time 2.63 perplexity 2.72
eval: bucket 0 perplexity 1.75
eval: bucket 1 perplexity 2.15
eval: bucket 2 perplexity 3.45
global step 1000 learning rate 0.5000 step-time 2.56 perplexity 2.26
eval: bucket 0 perplexity 1.65
eval: bucket 1 perplexity 1.84
eval: bucket 2 perplexity 3.17
global step 1200 learning rate 0.5000 step-time 2.68 perplexity 2.00
eval: bucket 0 perplexity 1.29
eval: bucket 1 perplexity 1.69
eval: bucket 2 perplexity 2.57
global step 1400 learning rate 0.5000 step-time 2.86 perplexity 1.84
eval: bucket 0 perplexity 1.48
eval: bucket 1 perplexity 1.70
eval: bucket 2 perplexity 2.15
global step 1600 learning rate 0.5000 step-time 3.40 perplexity 1.76
eval: bucket 0 perplexity 1.65
eval: bucket 1 perplexity 1.67
eval: bucket 2 perplexity 2.18
global step 1800 learning rate 0.5000 step-time 3.65 perplexity 1.71
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.79
eval: bucket 2 perplexity 2.04
global step 2000 learning rate 0.5000 step-time 2.68 perplexity 1.56
eval: bucket 0 perplexity 1.30
eval: bucket 1 perplexity 1.53
eval: bucket 2 perplexity 1.83
global step 2200 learning rate 0.5000 step-time 3.33 perplexity 1.61
eval: bucket 0 perplexity 1.50
eval: bucket 1 perplexity 1.66
eval: bucket 2 perplexity 1.70
global step 2400 learning rate 0.5000 step-time 3.01 perplexity 1.52
eval: bucket 0 perplexity 1.29
eval: bucket 1 perplexity 1.47
eval: bucket 2 perplexity 1.79
global step 2600 learning rate 0.5000 step-time 3.09 perplexity 1.53
eval: bucket 0 perplexity 1.34
eval: bucket 1 perplexity 1.57
eval: bucket 2 perplexity 1.90
global step 2800 learning rate 0.5000 step-time 2.92 perplexity 1.49
eval: bucket 0 perplexity 1.35
eval: bucket 1 perplexity 1.67
eval: bucket 2 perplexity 1.85
global step 3000 learning rate 0.5000 step-time 2.82 perplexity 1.44
eval: bucket 0 perplexity 1.39
eval: bucket 1 perplexity 1.55
eval: bucket 2 perplexity 1.81
global step 3200 learning rate 0.5000 step-time 2.68 perplexity 1.43
eval: bucket 0 perplexity 1.49
eval: bucket 1 perplexity 1.35
eval: bucket 2 perplexity 1.87
global step 3400 learning rate 0.5000 step-time 2.90 perplexity 1.41
eval: bucket 0 perplexity 1.35
eval: bucket 1 perplexity 1.56
eval: bucket 2 perplexity 1.73
global step 3600 learning rate 0.5000 step-time 2.79 perplexity 1.40
eval: bucket 0 perplexity 1.27
eval: bucket 1 perplexity 1.32
eval: bucket 2 perplexity 1.59
global step 3800 learning rate 0.5000 step-time 2.87 perplexity 1.38
eval: bucket 0 perplexity 1.52
eval: bucket 1 perplexity 1.46
eval: bucket 2 perplexity 1.52
global step 4000 learning rate 0.5000 step-time 2.74 perplexity 1.36
eval: bucket 0 perplexity 1.49
eval: bucket 1 perplexity 1.41
eval: bucket 2 perplexity 1.83
global step 4200 learning rate 0.5000 step-time 2.80 perplexity 1.37
eval: bucket 0 perplexity 1.23
eval: bucket 1 perplexity 1.36
eval: bucket 2 perplexity 1.58
global step 4400 learning rate 0.5000 step-time 2.94 perplexity 1.36
eval: bucket 0 perplexity 1.58
eval: bucket 1 perplexity 1.53
eval: bucket 2 perplexity 1.73
global step 4600 learning rate 0.5000 step-time 3.16 perplexity 1.35
eval: bucket 0 perplexity 1.25
eval: bucket 1 perplexity 1.54
eval: bucket 2 perplexity 1.58
global step 4800 learning rate 0.5000 step-time 2.74 perplexity 1.33
eval: bucket 0 perplexity 1.44
eval: bucket 1 perplexity 1.60
eval: bucket 2 perplexity 1.72
global step 5000 learning rate 0.5000 step-time 2.77 perplexity 1.33
eval: bucket 0 perplexity 1.36
eval: bucket 1 perplexity 1.38
eval: bucket 2 perplexity 1.60
global step 5200 learning rate 0.5000 step-time 2.97 perplexity 1.32
eval: bucket 0 perplexity 1.29
eval: bucket 1 perplexity 1.41
eval: bucket 2 perplexity 1.66
global step 5400 learning rate 0.5000 step-time 2.77 perplexity 1.30
eval: bucket 0 perplexity 1.31
eval: bucket 1 perplexity 1.52
eval: bucket 2 perplexity 1.45
global step 5600 learning rate 0.5000 step-time 2.80 perplexity 1.30
eval: bucket 0 perplexity 1.31
eval: bucket 1 perplexity 1.28
eval: bucket 2 perplexity 1.75
global step 5800 learning rate 0.5000 step-time 2.64 perplexity 1.29
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.33
eval: bucket 2 perplexity 1.41
global step 6000 learning rate 0.5000 step-time 2.76 perplexity 1.28
eval: bucket 0 perplexity 1.26
eval: bucket 1 perplexity 1.39
eval: bucket 2 perplexity 1.48
global step 6200 learning rate 0.5000 step-time 2.55 perplexity 1.28
eval: bucket 0 perplexity 1.37
eval: bucket 1 perplexity 1.37
eval: bucket 2 perplexity 1.67
global step 6400 learning rate 0.5000 step-time 2.68 perplexity 1.26
eval: bucket 0 perplexity 1.23
eval: bucket 1 perplexity 1.50
eval: bucket 2 perplexity 1.44
global step 6600 learning rate 0.5000 step-time 2.98 perplexity 1.26
eval: bucket 0 perplexity 1.12
eval: bucket 1 perplexity 1.54
eval: bucket 2 perplexity 1.47
global step 6800 learning rate 0.5000 step-time 2.87 perplexity 1.26
eval: bucket 0 perplexity 1.22
eval: bucket 1 perplexity 1.29
eval: bucket 2 perplexity 1.56
global step 7000 learning rate 0.5000 step-time 2.81 perplexity 1.26
eval: bucket 0 perplexity 1.22
eval: bucket 1 perplexity 1.45
eval: bucket 2 perplexity 1.54
global step 7200 learning rate 0.5000 step-time 2.76 perplexity 1.25
eval: bucket 0 perplexity 1.35
eval: bucket 1 perplexity 1.46
eval: bucket 2 perplexity 1.40
global step 7400 learning rate 0.5000 step-time 3.06 perplexity 1.24
eval: bucket 0 perplexity 1.18
eval: bucket 1 perplexity 1.26
eval: bucket 2 perplexity 1.48
global step 7600 learning rate 0.5000 step-time 3.15 perplexity 1.25
eval: bucket 0 perplexity 1.47
eval: bucket 1 perplexity 1.31
eval: bucket 2 perplexity 1.50
global step 7800 learning rate 0.5000 step-time 3.13 perplexity 1.24
eval: bucket 0 perplexity 1.50
eval: bucket 1 perplexity 1.43
eval: bucket 2 perplexity 1.46
global step 8000 learning rate 0.5000 step-time 2.76 perplexity 1.23
eval: bucket 0 perplexity 1.39
eval: bucket 1 perplexity 1.37
eval: bucket 2 perplexity 1.47
global step 8200 learning rate 0.5000 step-time 2.64 perplexity 1.22
eval: bucket 0 perplexity 1.30
eval: bucket 1 perplexity 1.25
eval: bucket 2 perplexity 1.59
global step 8400 learning rate 0.5000 step-time 2.38 perplexity 1.23
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.43
eval: bucket 2 perplexity 1.45
global step 8600 learning rate 0.5000 step-time 2.53 perplexity 1.21
eval: bucket 0 perplexity 1.42
eval: bucket 1 perplexity 1.33
eval: bucket 2 perplexity 1.39
global step 8800 learning rate 0.5000 step-time 2.58 perplexity 1.21
eval: bucket 0 perplexity 1.21
eval: bucket 1 perplexity 1.31
eval: bucket 2 perplexity 1.50
global step 9000 learning rate 0.5000 step-time 2.88 perplexity 1.21
eval: bucket 0 perplexity 1.36
eval: bucket 1 perplexity 1.30
eval: bucket 2 perplexity 1.57
global step 9200 learning rate 0.5000 step-time 3.03 perplexity 1.21
eval: bucket 0 perplexity 1.47
eval: bucket 1 perplexity 1.45
eval: bucket 2 perplexity 1.38
global step 9400 learning rate 0.5000 step-time 2.77 perplexity 1.20
eval: bucket 0 perplexity 1.39
eval: bucket 1 perplexity 1.29
eval: bucket 2 perplexity 1.55
global step 9600 learning rate 0.5000 step-time 2.86 perplexity 1.19
eval: bucket 0 perplexity 1.53
eval: bucket 1 perplexity 1.35
eval: bucket 2 perplexity 1.46
global step 9800 learning rate 0.5000 step-time 2.87 perplexity 1.19
eval: bucket 0 perplexity 1.43
eval: bucket 1 perplexity 1.43
eval: bucket 2 perplexity 1.80
global step 10000 learning rate 0.5000 step-time 2.74 perplexity 1.18
eval: bucket 0 perplexity 1.36
eval: bucket 1 perplexity 1.50
eval: bucket 2 perplexity 1.45
Training process stopped.
Beginning calculation word error rate (WER) on test sample.
WER : 0.469490521327
Accuracy : 0.530509478673
Currently you have to create the model directory manually.
I used the following command to train a G2P model:
python g2p.py --train /home/cmudict.dict --model /home/MyModel --max_steps 8400
Here is the log:
Preparing G2P data
Creating vocabularies in /home/MyModel
Creating vocabulary /home/MyModel/vocab.phoneme
Creating vocabulary /home/MyModel/vocab.grapheme
Reading development and training data.
Creating 2 layers of 64 units.
........
Reading model parameters from /home/MyModel/translate.ckpt-8200
global step 8400 learning rate 0.4901 step-time 3.43 perplexity 1.37
eval: bucket 0 perplexity 1.46
eval: bucket 1 perplexity 1.29
eval: bucket 2 perplexity 1.47
Training process stopped.
Beginning calculation word error rate (WER) on test sample.
WER : 0.4961492891
Accuracy : 0.5038507109
In the MyModel directory there are many generated files, but there is no "model" file.
translate.ckpt-200
translate.ckpt-200.meta
translate.ckpt-400
translate.ckpt-400.meta
translate.ckpt-600
translate.ckpt-600.meta
translate.ckpt-7200
translate.ckpt-7200.meta
translate.ckpt-7400
translate.ckpt-7400.meta
translate.ckpt-7600
translate.ckpt-7600.meta
translate.ckpt-7800
translate.ckpt-7800.meta
translate.ckpt-8000
translate.ckpt-8000.meta
translate.ckpt-8200
translate.ckpt-8200.meta
translate.ckpt-8400
translate.ckpt-8400.meta
model.params
vocab.phoneme
vocab.grapheme
translate.ckpt-8600
translate.ckpt-8600.meta
translate.ckpt-8800
checkpoint
translate.ckpt-8800.meta
Where do I get that "model" file, or do I have to rename translate.ckpt-8800 to "model"?
In g2p.py I added a time.time() call around the command
self.model.saver.restore(self.session, os.path.join(self.model_dir,
"model"))
to see how long it takes to load a pre-trained model to decode words. With a model trained with 512 nodes I get:
Time to load model: 2.53336596489
with only 64 nodes I don't get much savings:
Time to load model: 2.50763916969
which, according to the Python time module, is reported in seconds. That seems really slow. I am using the CPU instead of the GPU because, if we are to include a similar NN model in our software, we won't have any GPU power on our servers. Still, when I compare it with a current OpenFst implementation of an n-gram model, that one takes only 300 ms (0.3 s) to load in C++.
It may be faster if I can restore the saved model from C++, but I have to look into writing code to allow that.
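For reference, a minimal sketch of the timing described above, wrapping the checkpoint restore in time.time(); the attribute names follow the snippet from the issue and may not match the current code:

import os
import time

def timed_restore(model, session, model_dir):
    start = time.time()
    model.saver.restore(session, os.path.join(model_dir, "model"))
    print("Time to load model: %.2f s" % (time.time() - start))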
The file might disappear between the exists check and the open anyway, so the check is redundant. Just open the files and proceed, and raise an error if the open fails.
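A minimal sketch of the suggested change, with an illustrative function name: skip the exists check and just try the open, turning a failure into a clear error:

def open_dictionary(path):
    try:
        return open(path, "r")
    except IOError as exc:
        # The file may have disappeared or never existed; report it once here.
        raise RuntimeError("Cannot open dictionary file %s: %s" % (path, exc))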
As we discussed, what if we bias word pronunciations by word frequency?
So, I was training a new model based on the CMUSphinx dictionary with the --max_steps 10000 --size 512 --num_layers 3 --learning_rate 0.5 options. After the training finished, I got this output from the trained model:
a
M HH HH HH HH UH UH UH UH UH
b
M M UH UH UH UH UH UH UH UH
c
M M M UH UH UH UH UH UH UH
d
M M M M UH UH UH UH UH UH
hello
M HH HH HH HH UW UW UW UW M M M M M M
aa
HH HH HH HH HH HH HH HH UH UH
Is there anything wrong with my approach? This was the last output while training the model:
global step 10000 learning rate 0.4000 step-time 3.48 perplexity 1.15
eval: bucket 0 perplexity 1.40
eval: bucket 1 perplexity 1.25
eval: bucket 2 perplexity 1.34
Never convert an integer to a string only to convert it back to an integer later; this is very inefficient.
Never join list items into a string only to split them and join them again.
Remove the code which is not used.
Is it possible to get phone error rate in addition to word error rate?
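One way to do that, sketched below with an illustrative helper: compute the Levenshtein distance between the reference and predicted phoneme sequences and divide by the reference length (the reference is assumed non-empty):

def phoneme_error_rate(reference, hypothesis):
    # reference and hypothesis are lists of phoneme symbols
    dist = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        dist[i][0] = i
    for j in range(len(hypothesis) + 1):
        dist[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return float(dist[-1][-1]) / len(reference)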
Hi,
is it possible to output grapheme-phoneme alignment data from this model?
many thanks,
Daniel
When you convert letter and phoneme symbols to numerical ids, isn't it confusing for the model to train with integers for classes? Would it be better to have one-hot encoding or maybe even letter embeddings to make distances between letters or phonemes more meaningful?
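For what it's worth, the integer ids are normally not used as distances directly: the TensorFlow seq2seq models typically map each id to a learned embedding vector. A minimal sketch of that lookup (TF 1.x API, illustrative sizes):

import tensorflow as tf

vocab_size, embedding_size = 40, 64
embeddings = tf.Variable(
    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))
letter_ids = tf.constant([5, 12, 3])  # integer ids for three symbols
# Each id becomes a dense vector; distances are learned, not implied by the id.
letter_vectors = tf.nn.embedding_lookup(embeddings, letter_ids)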
Many steps have the same perplexity before training ends. The number of steps could thus be reduced significantly if we stop once we see the same perplexity 4 times or so; it should not affect the accuracy.
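A minimal sketch of that stopping rule, with illustrative names: keep the evaluation perplexities and stop once the last few are essentially unchanged:

def should_stop(perplexity_history, patience=4, tolerance=1e-3):
    # Stop when the last `patience` evaluations differ by less than `tolerance`.
    if len(perplexity_history) < patience:
        return False
    recent = perplexity_history[-patience:]
    return max(recent) - min(recent) < tolerance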
Square brackets like "[model_folder_path]" are reserved for optional arguments; required arguments are usually simply underlined.
There is no need to call print("> ", end="") twice.
Print numbers with 2 digits after the decimal point.
Pronunciation variants of the same word should not be split between the train and test sets.
sam@speechws13:~/g2p-seq2seq-master$ g2p-seq2seq --interactive --model g2p-seq2seq-cmudict/g2p-seq2seq-cmudict/modle
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 9, in <module>
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
return ep.load()
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2229, in load
return self.resolve()
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/__init__.py", line 23, in <module>
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 36, in <module>
ImportError: No module named data_utils
How can I fix this issue?
Thank you.
Move
train_gr, train_ph = data_utils.split_to_grapheme_phoneme(train_dic)
valid_gr, valid_ph = data_utils.split_to_grapheme_phoneme(valid_dic)
test_gr, test_ph = data_utils.split_to_grapheme_phoneme(test_dic)
from the main function to the train function.
Is there any plan to replace online lmtool with g2p-seq2seq or start a new service like that ?
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
[shmyrev@alpha g2p_seq2seq]$ cat > word list
hello
world
how
are
you
[shmyrev@alpha g2p_seq2seq]$ python g2p.py --model /home/shmyrev/cmudict-g2p-model --decode word.list
HH EH L OW
W ER L D
HH AW
AA R
The last word is missing.
Moreover, each line should contain the word itself, not just the phonemes, so that the output is a ready-to-use dictionary (a sketch of this follows the example):
[shmyrev@alpha g2p_seq2seq]$ python g2p.py --model /home/shmyrev/cmudict-g2p-model --decode word.list
hello HH EH L OW
world W ER L D
how HH AW
are AA R
you Y UW
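A minimal sketch of that output format, assuming a decode_word callable that returns the phoneme list for a word (illustrative names, not the project's API):

def decode_wordlist(wordlist_path, decode_word):
    with open(wordlist_path) as inp:
        for line in inp:
            word = line.strip()
            if word:
                # Echo the word together with its pronunciation so the output
                # can be used directly as a dictionary.
                print("%s %s" % (word, " ".join(decode_word(word))))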
To make them read vocabulary file only once.
Dear All,
I got an encoding error in the test phase (the training and interactive phases were fine). My training dictionary is a mixture of cmudict (ASCII) and Chinese (UTF-8) lexicons. What should I do? Should I convert all cmudict entries to UTF-8?
Thanks a lot in advance!
global step 91200 learning rate 0.2425 step-time 0.13 perplexity 1.02
Training done.
Creating 2 layers of 512 units.
Reading model parameters from g2p-seq2seq-oc16
Beginning calculation word error rate (WER) on test sample.
Traceback (most recent call last):
File "/home/liao/anaconda3/envs/python2.7/bin/g2p-seq2seq", line 9, in
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 67, in main
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 234, in train
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 347, in evaluate
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 323, in calc_error
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 279, in decode_word
UnicodeEncodeError: 'ascii' codec can't encode character u'\u86c8' in position 9: ordinal not in range(128)
瘦西湖 sh ou4 x i1 h u2
睃 s uo1
supercuts S UW1 P ER0 K AH2 T S
电报机 d ian4 b ao4 j i1
galka G AE1 L K AH0
知 zh ix4
Unipus Y UW1 N IH0 P AH0 S
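A minimal sketch, assuming Python 2, of loading such a mixed dictionary as unicode throughout; the code that prints these entries may additionally need an explicit .encode("utf-8") instead of relying on the default ascii codec (illustrative helper, not the project's loader):

import codecs

def load_dictionary(path):
    entries = []
    with codecs.open(path, "r", encoding="utf-8") as inp:
        for line in inp:
            parts = line.strip().split()
            if len(parts) > 1:
                entries.append((parts[0], parts[1:]))  # (word, phonemes)
    return entries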
Exceptions should be raised in only one place; right now the check for extra symbols is done in several places.
Fix LICENSE.txt
To make them dictionary-independent
Test dictionary is simply ignored
whether we can get alternate pronunciations
If you only need the direct and reversed dictionaries, it is better to change this method:
def initialize_vocabulary(vocabulary_path):
  """Initialize vocabulary from file.

  We assume the vocabulary is stored one-item-per-line, so a file:
    d
    c
  will result in a vocabulary {"d": 0, "c": 1}, and this function will
  also return the reversed-vocabulary ["d", "c"].
To this method with optional reverse parameter:
def load_vocab(vocabulary_path, reverse = False)
This method should return only one vocabulary, direct or reversed, based on the optional flag.
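A minimal sketch of that load_vocab signature, returning either mapping based on the flag:

def load_vocab(vocabulary_path, reverse=False):
    with open(vocabulary_path) as inp:
        symbols = [line.strip() for line in inp]
    if reverse:
        return symbols  # id -> symbol
    return dict((sym, idx) for idx, sym in enumerate(symbols))  # symbol -> id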
Traceback (most recent call last):
File "g2p.py", line 442, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "g2p.py", line 425, in main
g2p_model.train(g2p_params, FLAGS.train, FLAGS.valid, FLAGS.test)
File "g2p.py", line 243, in train
self.__run_evals()
File "g2p.py", line 269, in __run_evals
self.valid_set, bucket_id)
File "/usr/lib/python2.7/site-packages/tensorflow/models/rnn/translate/seq2seq_model.py", line 252, in get_batch
encoder_input, decoder_input = random.choice(data[bucket_id])
File "/usr/lib64/python2.7/random.py", line 274, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
Here you can do it in a single pass, and there is no need for the intermediate list (a sketch follows the snippet):
lst = []
for line in inp_dictionary:
    lst.append(line.strip().split())
graphemes, phonemes = [], []
for line in lst:
    if len(line) > 1:
        graphemes.append(list(line[0]))
        phonemes.append(line[1:])
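A minimal sketch of the single-pass version, splitting each line once and appending directly (inp_dictionary as in the snippet above):

graphemes, phonemes = [], []
for line in inp_dictionary:
    items = line.strip().split()
    if len(items) > 1:
        graphemes.append(list(items[0]))
        phonemes.append(items[1:])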
SGD seems to converge slowly. Can we have an option for RMSProp, Adadelta, and Adagrad? This should be easy to implement with the respective TensorFlow optimizers.
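A minimal sketch, not the project's code, of selecting one of those optimizers by name using the tf.train classes (TF 1.x API):

import tensorflow as tf

def make_optimizer(name, learning_rate):
    optimizers = {
        "sgd": tf.train.GradientDescentOptimizer,
        "rmsprop": tf.train.RMSPropOptimizer,
        "adadelta": tf.train.AdadeltaOptimizer,
        "adagrad": tf.train.AdagradOptimizer,
    }
    return optimizers[name](learning_rate)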
Also update the error rates in the README. The Phonetisaurus error rate on this set is also 24.4%; Phonetisaurus on the latest cmudict is 33.89%. Provide our results on the latest cmudict.
Does it support the Arabic language? If not, what alternative method could be used for text-to-phoneme conversion of Arabic text?
Data which is not important must be removed from git and added to the .gitignore file, for example train/model.
Create a toy dictionary with 20 lines and train a g2p model with 2 units in the hidden layer.