xxbb1234021 / speech_recognition
Chinese speech recognition (中文语音识别)
Hello, I'm a beginner and have a few questions.
Does test.word.txt contain the transcripts of all the wav files? Does the order have to match? Also, when testing, should wave_files be file names, or something else? Calling it with file names doesn't seem to work for me.
Hello, I'd like to ask: training requires inputs x and y, where x is produced by preprocessing an utterance. How is y processed? For a digit-classification problem you only need to map y to the values 0-9, but what should a whole sentence be turned into? A sentence contains many Chinese characters; how do they correspond to labels? I hope you can resolve my confusion.
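A common answer: for character-level CTC training, y is the transcript itself, mapped to integer indices through a vocabulary built from all the training labels, so every character in the sentence becomes one label. A minimal sketch of the idea; the helper names are mine, not this repo's API:

```python
def build_vocab(transcripts):
    """Collect every distinct character across all transcripts
    and give each one a stable integer id (spaces are separators,
    not labels)."""
    chars = sorted({ch for line in transcripts for ch in line if ch != " "})
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    return chars, char_to_id

def encode_label(transcript, char_to_id):
    """Turn one sentence into the integer sequence that CTC
    training consumes as y."""
    return [char_to_id[ch] for ch in transcript if ch != " "]

chars, char_to_id = build_vocab(["你好 世界", "世界 你好"])
y = encode_label("你好 世界", char_to_id)
print(len(chars), y)
```

The decoder then reverses the same mapping, which is why the vocabulary used at test time must match the one used in training.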
I trained with the data under train for more than two days, then wanted to test with the data under test, but just changing config.ini doesn't let me test.
Hi, I don't quite understand the paths inside your conf folder. What exactly is label_file? And is wave_path taken from the Tsinghua dataset? Many thanks.
I modified the parts of the code that use the GPU. Two places:
# with tf.device('/gpu:0'):
with tf.device('/cpu:0'):
# gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
# self.sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
self.sess = tf.Session()
After the changes, I get this error:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [36] rhs shape= [1788]
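This mismatch usually means the graph was rebuilt with a different vocabulary than the checkpoint was trained with: the final layer has one unit per character plus one for the CTC blank, so 1788 corresponds to the 1787-character training vocabulary, while 36 comes from whatever small label set was passed at test time. A quick sanity check of the sizing rule (this is the usual CTC convention, not code from this repo):

```python
def expected_output_dim(label_lines):
    """Size of the network's final layer: one unit per distinct
    character in the labels, plus one for the CTC blank."""
    vocab = {ch for line in label_lines for ch in line if ch != " "}
    return len(vocab) + 1

train_labels = ["你好 世界", "今天 天气 不错"]  # stand-in for the full label file
test_labels = ["你好"]                           # a single test transcript

# Restoring a checkpoint only works when both sides agree on this number:
print(expected_output_dim(train_labels), expected_output_dim(test_labels))
```

If the numbers differ, rebuild the test graph from the same label file used in training before calling saver.restore.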
I want to test an audio file, but test.py rebuilds the bi_rnn model. How can I restore the model from a checkpoint file? Thanks.
ModuleNotFoundError: No module named 'python_speech_features'. Where does this module come from? Please let me know.
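`python_speech_features` is a third-party MFCC/filterbank package on PyPI, installable with `pip install python_speech_features`. A stdlib-only way to check whether it is importable before running train.py:

```python
import importlib.util

def has_module(name):
    """Return True when `name` can be imported in this environment."""
    return importlib.util.find_spec(name) is not None

if not has_module("python_speech_features"):
    print("missing; run: pip install python_speech_features")
```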
Don't believe these results. The so-called "accuracy" is obtained by cheating. Test it yourself: with any recording of your own voice, even if you read the label text aloud, you will find the predictions are garbage no matter what device you record with. I really don't see the point of producing a result this way. If you come across this speech recognizer, steer clear; it is useless, don't waste your time.
读入语音文件: /opt/wav/test/D13/D13_992.wav
开始识别语音数据......
语音原始文本: 山东省 烟台 奥尔 呼 斯 药业 有限公司 近日 研制 成功 外用 降血压 新药 利 压 平 霜
识别出来的文本: 局内但内但阿内碗内碗但内碗但碗内碗局碗章来琼章罔汁碗章局罔碗汁碗章内汁局汁陈迷内碗扬但陈肥碗肥碗来内碗电罔汁来肥来据来罔碗汁章汁碗汁扭汁罔汁碗来碗来汁语汁语碗语碗罔电局琼电琼电章琼来碗汁碗内碗内电无碗章碗汁碗内碗内汁碗来内来陈汁陈内电阿语碗汁碗来碗来罔来罔来陈来陈罔电碗电碗电章碗来碗局碗局引罔来汁来碗来局支章汁碗汁电来碗殖电汁琼很章祖汁来内来罔电罔来罔来锦来肥电碗著碗章碗汁碗汁单来碗来电汁语汁陈碗陈来很碗肥汁碗罔电罔电来电汁西支音
读入语音文件: /opt/wav/test/D13/D13_823.wav
开始识别语音数据......
语音原始文本: 五月 的 一天 下大雨 阳 台上 漏 进 许多 雨水 可 又 没有 排 水洞 只好 一盆盆 往 外 端
识别出来的文本: 电内电内碗罔碗汁碗章来章碗局碗汁局汁碗来碗章碗章碗章碗章碗琼来碗章碗扬碗电碗电罔引碗局碗局来很来碗来碗电扬电扬碗罔碗肥碗内碗章碗局碗局碗章汁碗章碗罔琼汁来汁锦汁碗来局汁碗汁碗汁语碗很碗汁碗汁碗汁碗汁来汁碗汁语扬碗罔碗支碗汁碗汁碗局碗肥碗局碗汁碗来碗来碗罔碗扬碗肥引碗汁碗章碗罔碗罔来罔来支电碗来碗来单内碗汁肥很汁肥汁章碗汁碗汁碗汁碗局碗罔碗局汁局罔局很局汁局引支章碗罔琼罔汁来
Hi, building on your model I added new audio files and corresponding label files, but that enlarges the vocabulary, so loading your model fails because the layer6 nodes no longer match the new vocabulary. If I want to do transfer learning on top of your model, how should I proceed?
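One standard recipe when only the output vocabulary changed: restore every variable except the final layer (and its optimizer slots), re-initialize that layer at the new size, then fine-tune. The restore-side filtering is just name matching; 'h6'/'b6' match the layer names seen in this repo's tracebacks, but treat the exact prefixes as an assumption:

```python
def restorable(var_names, skip_prefixes=("h6", "b6")):
    """Drop variables belonging to the vocabulary-sized output layer
    (including Adam slots like 'b6/Adam_1'); everything else can be
    restored from the old checkpoint unchanged."""
    return [n for n in var_names
            if not n.split("/")[0].startswith(skip_prefixes)]

names = ["h1/weights", "b1/bias", "h6/weights", "b6/Adam_1"]
keep = restorable(names)
print(keep)
# In TF1 you would then build the saver over the filtered list, e.g.:
#   restore_vars = [v for v in tf.global_variables() if v.op.name in keep]
#   tf.train.Saver(var_list=restore_vars).restore(sess, ckpt)
```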
Where can I find your trained model for testing?
tensorflow.python.framework.errors_impl.NotFoundError: FindFirstFile failed for: /home/kevin/workspaces/python/speech-new-test : 系统找不到指定的路径。 (The system cannot find the path specified.)
; No such process
I get this kind of error when testing with your model. How can I solve it?
Training on CPU is too slow. Could you share your trained model?
The results all look like this:
循环次数: 69 损失: 220.8472399030413
错误率: 0.87767243
语音原始文本: 风平浪静 的 漆黑 午夜 王云 州 王新 槐 王 羊子 首先 发现 怪物 闪 着 红光 的 眼睛
识别出来的文本: 这 人龚龚
This model goes directly from audio to characters, and the training error rate reaches 0.8. The author mentioned an improved version of the code, but that one goes audio to pinyin to characters, which is not the same thing.
My data is just the test set; I use it for both training and testing. Why is the error rate still so high?
For example, the specifics of data collection, and the config files.
Also, could the result be exposed as just a callable DLL that returns the recognition result and a similarity score?
Thanks.
config.read(os.path.join(current_dir, "conf", "%s.ini" % file_name))
File "C:\Users\zxchong\AppData\Local\Programs\Python\Python36\lib\configparser.py", line 697, in read
self._read(fp, filename)
File "C:\Users\zxchong\AppData\Local\Programs\Python\Python36\lib\configparser.py", line 1015, in _read
for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 56: illegal multibyte sequence
Could someone tell me what this problem is?
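The traceback shows configparser decoding the .ini file with the Windows default codec (gbk) while the file is saved as UTF-8; byte 0xae then fails to decode. `ConfigParser.read` accepts an `encoding` argument, so passing it explicitly fixes this. A self-contained demonstration:

```python
import configparser
import os
import tempfile

# Write a UTF-8 ini containing non-ASCII text, then read it back with an
# explicit encoding instead of the platform default (gbk on Chinese
# Windows, which is what triggers the UnicodeDecodeError).
cfg_text = "[FILE_DATA]\nwav_path=/opt/wav/语音/test/\n"
path = os.path.join(tempfile.mkdtemp(), "speech.ini")
with open(path, "w", encoding="utf-8") as f:
    f.write(cfg_text)

config = configparser.ConfigParser()
config.read(path, encoding="utf-8")   # the key fix: pass encoding explicitly
print(config["FILE_DATA"]["wav_path"])
```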
A question: inside the function audiofile_to_input_vector, why are the feature values manipulated the way they are? Is there any material that could help me understand it?
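In DeepSpeech-style pipelines, audiofile_to_input_vector typically computes MFCCs and then stacks each frame together with several past and future frames ("numcontext") so every time step carries local context. The stacking step alone can be sketched with NumPy (function and parameter names are mine):

```python
import numpy as np

def stack_context(features, numcontext):
    """For each time step, concatenate the frame with `numcontext`
    frames on each side, zero-padded at the edges."""
    n_frames, n_ceps = features.shape
    padded = np.pad(features, ((numcontext, numcontext), (0, 0)))
    windows = [padded[t:t + 2 * numcontext + 1].reshape(-1)
               for t in range(n_frames)]
    return np.array(windows)

feats = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames x 3 coefficients
out = stack_context(feats, numcontext=1)
print(out.shape)
```

The python_speech_features documentation and the original DeepSpeech paper cover the MFCC half of the function.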
But when training is done and I test, the text is not recognized:
wav_files[0]: data_test\VIVOSDEV01_R117.wav
Testing......
trans_array_to_text_ch len: 65
words [' ', 'A', 'B', 'C', 'D', 'E', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Y', 'À', 'Á', 'Â', 'Ã', 'È', 'É', 'Ê', 'Ì', 'Í', 'Ò', 'Ó', 'Ô', 'Õ', 'Ù', 'Ú', 'Ý', 'Ă', 'Đ', 'Ĩ', 'Ũ', 'Ơ', 'Ư', 'Ạ', 'Ả', 'Ấ', 'Ầ', 'Ẩ', 'Ẫ', 'Ậ', 'Ắ', 'Ằ', 'Ẳ', 'Ặ', 'Ẹ', 'Ẻ', 'Ẽ', 'Ế', 'Ề', 'Ể', 'Ễ', 'Ệ', 'Ỉ', 'Ị', 'Ọ', 'Ỏ', 'Ố', 'Ồ', 'Ổ', 'Ỗ', 'Ộ', 'Ớ', 'Ờ', 'Ở', 'Ỡ', 'Ợ', 'Ụ', 'Ủ', 'Ứ', 'Ừ', 'Ử', 'Ữ', 'Ự', 'Ỳ', 'Ỷ', 'Ỹ'] value [21 67 21 24 48 24 48 24 48 24 4 24 4 24 4 16 4 24 21 16 21 24 21 4
21 4 21 4 24 4 21 48 21 4 24 16 21 16 21 16 21 16 4 87 21 16 21 16
24 21 24 4 24 4 16 45 16 24 4 16 4 16 4 46 16] results XỎXÁẦÁẦÁẦÁDÁDÁDRDÁXRXÁXDXDXDÁDXẦXDÁRXRXRXRDỸXRXRÁXÁDÁDRẠRÁDRDRDẢR
orig: VIỆC MỘT QUYỂN TIỂU THUYẾT NHƯ THẾ
decoded_str: XỎXÁẦÁẦÁẦÁDÁDÁDRDÁXRXÁXDXDXDÁDXẦXDÁRXRXRXRDỸXRXRÁXÁDÁDRẠRÁDRDRDẢR
The error is as follows:
Original stack trace for 'b1/Initializer/random_normal/RandomStandardNormal':
File "C:/Users/Administrator/Desktop/speech_recognition-master/test.py", line 18, in <module>
bi_rnn.build_test()
File "C:\Users\Administrator\Desktop\speech_recognition-master\model.py", line 334, in build_test
self.bi_rnn_layer()
File "C:\Users\Administrator\Desktop\speech_recognition-master\model.py", line 77, in bi_rnn_layer
b1 = self.variable_on_device('b1', [n_hidden_1], tf.random_normal_initializer(stddev=b_stddev))
File "C:\Users\Administrator\Desktop\speech_recognition-master\model.py", line 348, in variable_on_device
var = tf.get_variable(name=name, shape=shape, initializer=initializer)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1496, in get_variable
aggregation=aggregation)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1239, in get_variable
aggregation=aggregation)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 562, in get_variable
aggregation=aggregation)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 514, in _true_getter
aggregation=aggregation)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 929, in _get_single_variable
aggregation=aggregation)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 259, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 220, in _variable_v1_call
shape=shape)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 198, in
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2511, in default_variable_creator
shape=shape)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 263, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 1568, in __init__
shape=shape)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 1698, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 901, in
partition_info=partition_info)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\init_ops.py", line 323, in __call__
shape, self.mean, self.stddev, dtype, seed=self.seed)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\random_ops.py", line 79, in random_normal
shape_tensor, dtype, seed=seed1, seed2=seed2)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_random_ops.py", line 763, in random_standard_normal
seed2=seed2, name=name)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
Process finished with exit code 1
Can anyone tell me what this problem is?
读入语音文件: E:\ASR\DeepSpeechRecognition-master\data\data_thchs30\test\D11_773.wav
开始识别语音数据......
语音原始文本: ** 婴幼儿 营养 专家 还 与 厦门市 同行 就 婴幼儿 的 生长 发育 进行 专题 探讨
识别出来的文本: 唢商唢诊礼输见输见前见前见输见输见诊输诊前姑锐含自慧自慧锐慧诊锐诊姑诊锐诊歪诊姑锐诊姑诊伸诊伸诊锐诊见 姑至自锐诊锐伸姑侧姑诊姑锐姑自姑自锐自锐究慧诊伸诊姑诊泥泉诊抛诊姑诊姑锐姑诊泥姑泥泡泥僻锐究锐靠究姑锐泡锐姑泡姑锐肿锐诊慧诊慧诊锐诊泡诊天诊姑锐姑锐姑锐姑僻输究靠究泥僻姑锐唢锐伸肿伸姑诊姑诊僻诊伸锐姑伸姑伸翻伸泥档伸姑伸姑诊姑泡姑伸桩究途玄泡锐泡姑诊姑诊姑伸右见前见诊姑归置
Hello, when I run your test.py the code reports an error and I don't know what's going on. This is my test code:
wav_files = ['/home/zh/sda2/语音转文本/speech_recognition/data/test/D8/D8_999.wav']
txt_labels = ['国务委员 兼 国务院 秘书长 罗干 民政部 部长 多吉 才 让 也 一同 前往 延安 看望 人民群众']
words_size, words, word_num_map = utils.create_dict(txt_labels)
bi_rnn = BiRNN(wav_files, txt_labels, words_size, words, word_num_map)
bi_rnn.build_target_wav_file_test(wav_files, txt_labels)
Config file:
[FILE_DATA]
wav_path=/opt/wav/test/
label_file=/home/zh/sda2/语音转文本/speech_recognition/model/test.word.txt
savedir=/home/zh/sda2/语音转文本/speech_recognition/model
savefile=speech.cpkt
tensorboardfile=/home/zh/sda2/语音转文本/speech_recognition/model
Contents of the checkpoint file:
model_checkpoint_path: "/home/zh/sda2/语音转文本/speech_recognition/model/speech.cpkt-101"
all_model_checkpoint_paths: "/home/zh/sda2/语音转文本/speech_recognition/model/speech.cpkt-101"
But I get the following error:
Traceback (most recent call last):
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [36] rhs shape= [1788]
[[Node: save/Assign_14 = Assign[T=DT_FLOAT, _class=["loc:@b6"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](b6/Adam_1, save/RestoreV2/_29)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 24, in <module>
bi_rnn.build_target_wav_file_test(wav_files, txt_labels)
File "/scratch2/hzhou/AI_voice/speech_recognition/model.py", line 344, in build_target_wav_file_test
self.init_session()
File "/scratch2/hzhou/AI_voice/speech_recognition/model.py", line 199, in init_session
self.saver.restore(self.sess, ckpt)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1752, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [36] rhs shape= [1788]
[[Node: save/Assign_14 = Assign[T=DT_FLOAT, _class=["loc:@b6"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](b6/Adam_1, save/RestoreV2/_29)]]
Caused by op 'save/Assign_14', defined at:
File "test.py", line 24, in <module>
bi_rnn.build_target_wav_file_test(wav_files, txt_labels)
File "/scratch2/hzhou/AI_voice/speech_recognition/model.py", line 344, in build_target_wav_file_test
self.init_session()
File "/scratch2/hzhou/AI_voice/speech_recognition/model.py", line 186, in init_session
self.saver = tf.train.Saver(max_to_keep=1) # 生成saver
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1284, in __init__
self.build()
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1296, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1333, in _build
build_save=build_save, build_restore=build_restore)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 781, in _build_internal
restore_sequentially, reshape)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 422, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 113, in restore
self.op.get_shape().is_fully_defined())
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 219, in assign
validate_shape=validate_shape)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
use_locking=use_locking, name=name)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [36] rhs shape= [1788]
[[Node: save/Assign_14 = Assign[T=DT_FLOAT, _class=["loc:@b6"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](b6/Adam_1, save/RestoreV2/_29)]]
Is the problem with the model you provided, or something else? Please give me some guidance, thank you.
Hello, may I ask which algorithm your model is based on? Is it DeepSpeech?
Hello author,
how can I continue training after adding more speech material?
After swapping in the new material, running train.py reports an error:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [317] rhs shape= [212]
I didn't use the full dataset for training, but the training error rate is always zero. Why is that?
I'm testing with the test set from thchs30, but at run time it doesn't iterate over all the audio in test.
The whole test run is as follows:
C:\Users\zxchong\Desktop\speech_recognition-master> python test.py
C:\Users\zxchong\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
F:\xunlei\data\data_thchs30\test\D11_750.wav 东北军 的 一些 爱国 将士 马 占 山 李杜 唐 聚 伍 苏 炳 艾 邓 铁梅 等 也 奋起 抗战
wav: 2132 label 2132
字表大小: 1787
ckpt: C:\Users\zxchong\Desktop\speech_recognition-master\voice\model\speech.cpkt-101
101
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_773.wav
开始识别语音数据......
语音原始文本: ** 婴幼儿 营养 专家 还 与 厦门市 同行 就 婴幼儿 的 生长 发育 进行 专题 探讨
识别出来的文本: ** 婴幼儿 营养 专家 还 厦门市 同行 就 婴幼儿 的 生长 发育 进行 专题 探讨
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_774.wav
开始识别语音数据......
语音原始文本: 像 张 王村 八十年代 四百 户 拥有 耕牛 七百 多头 而今 五百 五十多 农户 仅 拥有 耕牛 十 一头
识别出来的文本: 像 张 王村 八十年代 四百 户 拥有 耕牛 七百 多头 今 五百 五十多 农户 仅 拥有 耕牛 十 一头
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_775.wav
开始识别语音数据......
语音原始文本: 看 羊 狗 跑前跑后 一只 惊 飞 的 山雀 惹得 它 汪汪汪 咬 几 声 嗡 嗡嗡 的 在 山间 回荡
识别出来的文本: 看 羊 狗 跑前跑后 一只 惊 飞 的 山雀 惹得 汪汪汪 咬 几 声 省 嗡 嗡嗡 的 在 山间 回荡
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_776.wav
开始识别语音数据......
语音原始文本: 承运人 有权 要求 托运 人 填写 航空 货运单 托运 人 有权 要求 承运人 接受 该 航空 货运单
识别出来的文本: 承运人 有权 要求 托运 人 填写 航空 货运单 托运 人 有权 要求 承运人 接受 该 航空 货运单
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_779.wav
开始识别语音数据......
语音原始文本: 当然 他们 不 像 鳄鱼 那样 吞下 石块 当做 压 舱 石 而是 磨成 粉 以后 才 服用
识别出来的文本: 当 他 不 鳄鱼 鳄样 那 吞下 石块 做 压 石 到 多声 草 用
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_780.wav
开始识别语音数据......
语音原始文本: 刀 光 枪 影 白骨 如 麻 我 仿佛 听到 五 十年前 三十万 冤魂 的 呐喊 痛 贯 心肝
识别出来的文本: 刀 光 枪 影 白骨 如 麻 我 仿佛 听到 五 十年前 三如万 冤魂 呐喊 痛 贯 心肝
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_781.wav
开始识别语音数据......
语音原始文本: 不仅 要 宣传 少生 还要 宣传 晚婚晚育 优生优育 宣传 生 男生 女 都一样
识别出来的文本: 不仅 要 宣传 少生 还要 宣传 晚婚晚育 优生优育 宣传 生 男生 女 都一样
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_782.wav
开始识别语音数据......
语音原始文本: 七 我国 培育 出 首 株 抗病毒 转基因 小麦 为 小麦 抗病 育种 奠定 了 坚实 基础
识别出来的文本: 七 我国 培育 出 首 株 抗病毒 转基因 小麦 为 为 小麦 抗病 育种 奠定 了 坚实 基础
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_783.wav
开始识别语音数据......
语音原始文本: 天空 一些 云 忙 走 月亮 陷进 云 围 时 云和 烟 样 和 煤 山 样 快要 燃烧 似地
识别出来的文本: 天空 一些 云 忙 走 月亮 陷进 云 围 时 云和 烟 样 和 煤 山 样 快要 燃烧 似地
读入语音文件: F:\xunlei\data\data_thchs30\test\D11_785.wav
开始识别语音数据......
语音原始文本: 北京 丰台区 农民 自己 花钱 筹办 万 佛 延寿 寺 迎春 庙会 吸引 了 区内 六十 支 秧歌队 参赛
识别出来的文本: 北京 丰台区 农民 自己 花钱 筹办 万 佛 延寿 寺 迎春 庙会 吸引 了 区内 六十 支 秧歌队 参赛
PS C:\Users\zxchong\Desktop\speech_recognition-master>
Does anyone know what the problem is? Please take a look, thank you.
The precision and recall are amazing for such a small corpus, though part of that is TensorFlow's own contribution.
The author delivered code with a clear architecture, concise and fully functional.
For me, this code is the best one compared with several other similar projects.
This friend's code is very honest work. Thanks!
By comparison, there is some AS*-something project whose training code simply won't run, and the trained model it provides instead isn't much use.
I've tried several ways to test my own audio file, including using the last 4 lines in test.py (actually it's 5 lines) and changing the file location in conf.ini, and none of them work for me. I think it's a code problem but I can't find the cause yet. You did a really good job building this Mandarin speech model; please fix the test problem and make it better!
Why can't the trained model recognize audio files that weren't in the training set? For example, I've trained on the Tsinghua Chinese corpus (thchs30); how do I recognize a sentence I record myself? As soon as I switch to a different audio file, testing throws an error.
Hello, when testing I was forced by my environment to run on CPU, so I changed the following code in model.py:
def variable_on_device(self, name, shape, initializer):
with tf.device('/cpu:0'):
var = tf.get_variable(name=name, shape=shape, initializer=initializer)
return var
Then when I run it, I get the following error:
Traceback (most recent call last):
File "tf_speech.py", line 27, in <module>
re = X.speech_to_text('./test_voice/D4_750.wav')
File "tf_speech.py", line 22, in speech_to_text
res = bi_rnn.build_target_wav_file_test(wav_files, self.text_labels)
File "/home/zh/sda2/sg-ai/Detox_AI/tf_recong/model.py", line 213, in build_target_wav_file_test
self.init_session()
File "/home/zh/sda2/sg-ai/Detox_AI/tf_recong/model.py", line 184, in init_session
ckpt = tf.train.latest_checkpoint(self.savedir)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1805, in latest_checkpoint
ckpt = get_checkpoint_state(checkpoint_dir, latest_filename)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1061, in get_checkpoint_state
text_format.Merge(file_content, ckpt)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 536, in Merge
descriptor_pool=descriptor_pool)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 590, in MergeLines
return parser.MergeLines(lines, message)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 623, in MergeLines
self._ParseOrMerge(lines, message)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 638, in _ParseOrMerge
self._MergeField(tokenizer, message)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 763, in _MergeField
merger(tokenizer, message, field)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 888, in _MergeScalarField
value = tokenizer.ConsumeString()
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1251, in ConsumeString
the_bytes = self.ConsumeByteString()
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1266, in ConsumeByteString
the_list = [self._ConsumeSingleByteString()]
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1291, in _ConsumeSingleByteString
result = text_encoding.CUnescape(text[1:-1])
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_encoding.py", line 103, in CUnescape
result = ''.join(_cescape_highbit_to_str[ord(c)] for c in result)
File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/google/protobuf/text_encoding.py", line 103, in <genexpr>
result = ''.join(_cescape_highbit_to_str[ord(c)] for c in result)
IndexError: list index out of range
How can I get this to run on the CPU? Thanks.
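The protobuf parser here is choking on the `checkpoint` index file, most likely because the saved path contains non-ASCII characters (语音转文本) written in an encoding the text parser doesn't expect. Two workarounds: move the model directory to an ASCII-only path and regenerate the `checkpoint` file, or skip `tf.train.latest_checkpoint` and pass the .cpkt path to `saver.restore` directly. A sketch of pulling the path out of the checkpoint file by hand (the regex approach is mine, not this repo's code):

```python
import re

def parse_checkpoint_path(checkpoint_text):
    """Pull model_checkpoint_path out of a TensorFlow `checkpoint`
    file read as UTF-8, instead of letting the protobuf text parser
    guess the encoding (which is what raises the IndexError here)."""
    m = re.search(r'model_checkpoint_path:\s*"([^"]+)"', checkpoint_text)
    return m.group(1) if m else None

sample = 'model_checkpoint_path: "/model/语音转文本/speech.cpkt-101"\n'
print(parse_checkpoint_path(sample))
# with open(os.path.join(savedir, "checkpoint"), encoding="utf-8") as f:
#     ckpt = parse_checkpoint_path(f.read())
# self.saver.restore(self.sess, ckpt)   # bypasses tf.train.latest_checkpoint
```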
Once my model has finished training, can it be used to recognize short phrases (2-5 Chinese characters long)?
Does a label always have to be passed in?
I couldn't find test.word.txt in the Tsinghua dataset; could you upload it? One more question: can I train on a GPU? Training is really slow.
Hello, I tested your model with the data you provided and recognition is very accurate, but when I read the same text from your audio aloud myself, record it, and run recognition, it is not accurate at all. Why is that? Could it be caused by a sample-rate mismatch in the data? I recorded at 16 kHz.
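Even at a nominal 16 kHz, a recording whose actual rate, channel count, or bit depth differs from the training data (THCHS-30 is 16 kHz mono 16-bit) will shift the MFCC frames and hurt accuracy, so resampling to the training rate is a first sanity check. A naive linear-interpolation resampler in NumPy, just to show the idea (use scipy or librosa for real audio):

```python
import numpy as np

def resample(signal, orig_sr, target_sr):
    """Naive linear-interpolation resampling; fine for a sanity check,
    but use a proper polyphase resampler for production audio."""
    duration = len(signal) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(signal), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, signal)

sig = np.sin(np.linspace(0, 2 * np.pi, 44100, endpoint=False))  # 1 s at 44.1 kHz
out = resample(sig, 44100, 16000)
print(len(out))
```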
Hello, after I finished training with all the D_**.wav files, accuracy on the training samples is high but accuracy on the test samples is very low. Have you run into this problem? Thanks!
root@zp:/home/zp/Desktop/voice# python3 test.py
./train_data/wav/test/D21/D21_847.wav **队 首先 出场 的 是 赖 亚文 孙悦 李艳 吴 咏梅 崔 咏梅 和 何 琦 这 也是 首选 阵容
wav: 2132 label 2132
字表大小: 1787
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1305, in _run_fn
self._extend_graph()
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'h6/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: h6/Adam_1 = VariableV2_class=["loc:@h6"], container="", dtype=DT_FLOAT, shape=[512,1788], shared_name="", _device="/device:GPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 18, in <module>
bi_rnn.build_test()
File "/home/zp/Desktop/voice/model.py", line 336, in build_test
self.init_session()
File "/home/zp/Desktop/voice/model.py", line 192, in init_session
self.sess.run(tf.global_variables_initializer())
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'h6/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: h6/Adam_1 = VariableV2_class=["loc:@h6"], container="", dtype=DT_FLOAT, shape=[512,1788], shared_name="", _device="/device:GPU:0"]]
Caused by op 'h6/Adam_1', defined at:
File "test.py", line 18, in <module>
bi_rnn.build_test()
File "/home/zp/Desktop/voice/model.py", line 335, in build_test
self.loss()
File "/home/zp/Desktop/voice/model.py", line 156, in loss
self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(self.avg_loss)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 424, in minimize
name=name)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 600, in apply_gradients
self._create_slots(var_list)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/adam.py", line 132, in _create_slots
self._zeros_slot(v, "v", self._name)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 1150, in _zeros_slot
new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 181, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 155, in create_slot_with_initializer
dtype)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 65, in _create_slot_var
validate_shape=validate_shape)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
constraint=constraint)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
constraint=constraint)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
constraint=constraint)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 786, in _get_single_variable
use_resource=use_resource)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 2220, in variable
use_resource=use_resource)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 2210, in
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 2193, in default_variable_creator
constraint=constraint)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 235, in __init__
constraint=constraint)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 349, in _init_from_args
name=name)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 137, in variable_op_v2
shared_name=shared_name)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 1255, in variable_v2
shared_name=shared_name, name=name)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'h6/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: h6/Adam_1 = VariableV2_class=["loc:@h6"], container="", dtype=DT_FLOAT, shape=[512,1788], shared_name="", _device="/device:GPU:0"]]
After training for 25 epochs, I tested the model by changing the configured data path to test. The following error appears:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1787] rhs shape= [2664]