z-yq / tensorflowasr
458 stars · 22 watchers · 110 forks · 272.62 MB

A project dedicated to making CPU/on-device models approach GPU-model performance; the real-time factor (RTF) on CPU is below 0.1.

License: Apache License 2.0

Python 66.72% C++ 32.32% CMake 0.04% Shell 0.01% C 0.91%
transformer bert tensorflow2 automatic-speech-recognition state-of-the-art ctc listen-attend-and-spell transducers cpp tensorflow-cpp

tensorflowasr's People

Contributors: z-yq

tensorflowasr's Issues

Problem converting the punctuation prediction model to TFLite

Hello, I tried to convert the H5 model file you shared via Baidu Netdisk, but the conversion complains that the model config is missing. Could you save an H5 file that contains both the model graph structure and the weights? Thank you.

ValueError: No model config found in the file at models/model_0.h5.

train failed

(tf2) root@adminer-X10SRA:~/debug# python train_am.py
2020-11-03 14:07:11.171313: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-03 14:07:11.213020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.635GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-03 14:07:11.213266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.214429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-03 14:07:11.215292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-03 14:07:11.215560: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-03 14:07:11.216746: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-03 14:07:11.217707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-03 14:07:11.220552: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:11.222021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-11-03 14:07:11,222 - root - INFO - valid gpus:1
2020-11-03 14:07:11.236938: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-03 14:07:11.243922: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2500075000 Hz
2020-11-03 14:07:11.244808: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb304000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-03 14:07:11.244827: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-03 14:07:11.386453: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5642b12f8f90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-03 14:07:11.386519: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-11-03 14:07:11.388405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.635GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-03 14:07:11.388517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.388569: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-03 14:07:11.388614: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-03 14:07:11.388660: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-03 14:07:11.388705: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-03 14:07:11.388751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-03 14:07:11.388798: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:11.392179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-11-03 14:07:11.392285: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.395987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-03 14:07:11.396028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-11-03 14:07:11.396044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-11-03 14:07:11.399441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9620 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
not found state file
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
2020-11-03 14:07:12,898 - tensorflow - INFO - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
2020-11-03 14:07:12.948908: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:13.979204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:14,960 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:14,961 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,066 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,067 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,169 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,169 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,257 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,258 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,346 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,346 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[Train] [Epoch 1/2] | | 0/216 [00:00<?, ?batch/s]

# Data generator from the training pipeline: it loops forever, pulling one
# batch per iteration and attaching a guided-attention matrix for the LAS head.
def generator(self, train=True):
    while 1:
        x, wavs, bert_feature, input_length, words_label, words_label_length, \
            phone_label, phone_label_length, py_label, py_label_length, \
            txt_label, txt_label_length = self.generate(train)

        # Guided-attention target, sized to the longest input/text in the batch.
        guide_matrix = self.guided_attention(input_length, txt_label_length,
                                             np.max(input_length),
                                             txt_label_length.max())
        yield (x, wavs, bert_feature, input_length, words_label, words_label_length,
               phone_label, phone_label_length, py_label, py_label_length,
               txt_label, txt_label_length, guide_matrix)

No new data comes out of the generator after this point.

Redundant LSTM computation during transducer decoding

In the perform_greedy method of the Transducer class, the pred_net inside the loop decodes with an LSTM, and on every iteration it re-runs the LSTM over the entire decoded sequence. Only the newest step actually needs to be computed to decide whether the output is 0, i.e. blank, so caching the LSTM states would save a lot of time on long sequences.
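A minimal sketch of the suggested caching (not the project's actual perform_greedy; CachedPredNet, pred_net_cell, and blank_id are names invented here): carry the prediction network's LSTM states across steps so each new token passes through the LSTM exactly once instead of re-running the whole decoded prefix.

import tensorflow as tf

class CachedPredNet:
    """Greedy-decode helper that carries LSTM states across steps."""
    def __init__(self, pred_net_cell: tf.keras.layers.LSTMCell, blank_id: int = 0):
        self.cell = pred_net_cell
        self.states = self.cell.get_initial_state(batch_size=1, dtype=tf.float32)
        self.blank_id = blank_id

    def step(self, last_token_embedding):
        # One LSTM step on the newest token only; self.states carries the
        # whole decoded history, so no prefix is recomputed.
        out, self.states = self.cell(last_token_embedding, self.states)
        return out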

Possible problem in mel_layer

Hello:

Based on your framework, I built a TDNN-Transducer network using dilated Conv1D layers. Training fails with the following error:

indices[1] = [1 , 776, 22] does not index into shape [4, 776, 23]

The error originates here:

last_grads_blank = -1 * tf.scatter_nd(
    tf.concat([tf.reshape(tf.range(batch_size, dtype=tf.int32),
                          shape=[batch_size, 1]), indices], axis=1),
    tf.ones(batch_size, dtype=tf.float32),
    [batch_size, input_max_len, target_max_len])

Printing the intermediate values:

input_max_len = 776

logit_length = [398, 777, 378, 777]

indices = 
[[0 397 12]
 [1 776 22]
 [2 377 12]
 [3 776 22]]

[batch_size, input_max_len, target_max_len]  = [4, 776, 23]

tf.scatter_nd() requires the maximum value in each column of indices to be strictly less than the corresponding entry of [batch_size, input_max_len, target_max_len]; otherwise it raises the index error above.
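A standalone repro of that constraint, using the shape and index values from the printout above:

import tensorflow as tf

batch_size, input_max_len, target_max_len = 4, 776, 23
shape = [batch_size, input_max_len, target_max_len]

ok = tf.constant([[0, 397, 12], [2, 377, 12]], dtype=tf.int32)
print(tf.scatter_nd(ok, tf.ones(2), shape).shape)  # fine: 397 and 377 < 776

bad = tf.constant([[1, 776, 22]], dtype=tf.int32)  # 776 == input_max_len
# tf.scatter_nd(bad, tf.ones(1), shape) raises:
# InvalidArgumentError: indices[0] = [1, 776, 22] does not index into shape [4, 776, 23]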

As for why the values end up equal (in the example the indices contain 776, the same as input_max_len): logit_length shows the network produced logits whose time dimension is 777, while the supplied length is 776. The logits' time dimension should really be 776, so the indices would be computed as 776 - 1 = 775, which satisfies the constraint.

As can be seen here, input_max_len is computed during batch generation from features extracted the traditional way, but when mel_layer is used, the raw waveform is fed in directly. In other words, traditional feature extraction and mel_layer's on-the-fly processing of raw audio produce inconsistent frame counts: mel_layer yields one extra frame.

Setting use_mel_layer to False makes training run normally, which confirms my diagnosis.

Note that the error is not immediate; it appears only after a dozen or so batches have computed normally.

Program stops after successfully saving a checkpoint, with the GPU still occupied

Environment: Ubuntu 16, TensorFlow 2.2
Dataset: AISHELL-1
Config file: conformer.yml, model_config.name=ConformerCTC\ConformerLAS
Command: python train_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml

2020-11-16 15:36:09,744 - numba.core.ssa - DEBUG - on stmt: $478.2 = cast(value=$const478.1)
2020-11-16 15:36:09,744 - numba.core.ssa - DEBUG - on stmt: return $478.2
INFO:tensorflow:batch_all_reduce: 203 all-reduces with algorithm = nccl, num_packs = 1
2020-11-16 15:36:18,367 - tensorflow - INFO - batch_all_reduce: 203 all-reduces with algorithm = nccl, num_packs = 1
[Train] [Epoch 123/2] |▋ | 500/14016 [34:39<12:23:38, 3.30s/batch, Successfully Saved Checkpoint]

Has anyone else run into this problem?
Does anyone know what causes it?

Using Tester for large-scale evaluation

Can I download one of your pretrained models, e.g. ConformerCTC(S), and use it directly for testing?
With your pretrained ConformerCTC(S) and my prepared test_list, running python eval_am.py --data_config ./asr/configs/am_data.yml --model_config ./asr/configs/ConformerS.yml produces the following error:
outputs = call_fn(inputs, *args, **kwargs)
TypeError: call() got multiple values for argument 'training'
How can I fix this?

No reference in the code

Did you copy the code from my repository https://github.com/usimarit/TiramisuASR?
I looked into the code, and it looks like the code I wrote.
It would be nice if you could add references to wherever you learned from or took source code, or any other kind of knowledge involved in this repo. Other authors would appreciate it.

transducer data process

Hello:
I want to train an RNN-Transducer. Where should TextFeaturizer's prepand_blank() be used? Thank you.

Random "generator" data-fetch errors and a fix

Hello:

A small question.

Training on CPU (Linux 16.04) with data from a few AISHELL-1 speakers (2100 utterances, just for verifying the code), training ConformerTransducer with all other parameters left at their defaults.

2020-10-05 10:28:11,241 - root - INFO - trainer resume failed
[Train] [Epoch 1/2] |                    | 7/2096 [00:36<2:07:57,  3.68s/batch, transducer_loss=373.089]
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:47,972 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:48,185 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7648c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
...
[Train] [Epoch 1/2] |████▊               | 500/2096 [09:30<26:07,  1.02batch/s, Successfully Saved Checkpoint]
...
[Train] [Epoch 1/2] |█████▏              | 547/2096 [10:39<23:14,  1.11batch/s, transducer_loss=85.972]
...
ValueError: `generator` yielded an element of shape (0,) where an element of shape (None, None, 80, 1) was expected.

The error appeared at step 547, but it is not tied to any fixed step; it occurs at random.

After looking into how the data is processed internally, I found that the data-processing script applies the following filter:

if len(data) < 400:                                        # shorter than 25 ms at 16 kHz
    continue
elif len(data) > self.speech_featurizer.sample_rate * 7:   # longer than 7 s
    continue

That is, 16 kHz audio shorter than 25 ms or longer than 7 s is discarded. When every clip drawn for a batch is longer than 7 s, the entire batch is dropped, the generator yields an empty element, and the error above follows.

The fix is simple: just increase the number 7. A possible guard is sketched below.
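A hypothetical alternative to raising the cap (generate_nonempty is a helper name invented here, not the project's code): make the generator re-draw whenever filtering empties a batch, so it never yields a (0,)-shaped element.

def generate_nonempty(self, train=True):
    # Retry until at least one clip survives the duration filter.
    while True:
        batch = self.generate(train)
        if len(batch[0]) > 0:   # batch[0] is x; empty when everything was filtered
            return batch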

Now the question: I can understand discarding clips under 25 ms, but why also discard clips over 7 s? Is it that audio longer than 7 s degrades recognition quality, so it is left out?

When you processed the AISHELL-2 dataset, did you discard all clips longer than 7 s?

Also, what is going on with the tensorflow WARNING part? I don't understand it.
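For what it's worth, the WARNING itself names a mitigation: tf.function re-traces whenever it sees new tensor shapes or new Python arguments, and tracing is expensive. A toy illustration (not the project's code) of the option the warning suggests:

import tensorflow as tf

# Relaxed shapes let one trace serve many input shapes, silencing the
# retracing warning at some cost in shape specialization.
@tf.function(experimental_relax_shapes=True)
def forward(x):
    return tf.reduce_sum(x)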

Warning when running train_am.py

I get the following warning when running the command below. Does anyone else see this?
python train_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml

WARNING:tensorflow:7 out of the last 9 calls to <function MultiHeadAttention.call at 0x7f2b902fb840> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-11-17 10:08:31,362 - tensorflow - WARNING - 7 out of the last 9 calls to <function MultiHeadAttention.call at 0x7f2b902fb840> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

About inference speed

Hello, does the inference speed you report refer to a single forward pass? Recognizing one utterance is still quite slow on my side. Since each output depends on the previous output, isn't the total number of inference steps the number of frames, making the total time the frame count times the per-step milliseconds you report?
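A back-of-the-envelope version of that concern, with hypothetical numbers (only the 10 ms stride and reduction_factor 4 come from the configs shown later in this page):

audio_s = 5.0                              # a 5-second utterance
stride_ms = 10 * 4                         # 10 ms hop x reduction_factor 4
steps = int(audio_s * 1000 / stride_ms)    # 125 autoregressive decode steps
per_step_ms = 30                           # hypothetical per-step cost
total_ms = steps * per_step_ms             # 3750 ms
rtf = total_ms / (audio_s * 1000)          # RTF = 0.75, not the per-step figure alone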

Some questions about the positional encoding used in the Conformer model

This is a great project. As far as I know, you are the first to implement today's state-of-the-art ASR models together in TensorFlow.
Reading the Conformer implementation raised two questions. First, you add positional encoding in every Conformer block, which differs from other implementations I have seen. Does this reduce the model's error rate?
Second, you use only the convolution structure of the Conformer, without relative positional encoding. What effect does that have on the model?

Could the author explain what causes this error?

Traceback (most recent call last):
File "run-test.py", line 82, in <module>
asr = ASR(am_config, lm_config, punc_config)
File "run-test.py", line 12, in __init__
self.lm = LM(lm_config, punc_config)
File "/home/w/ASR/LMmodel/trm_lm.py", line 16, in __init__
self.am_featurizer = TextFeaturizer(config['am_token'])
File "/home/w/ASR/utils/text_featurizers.py", line 78, in __init__
self.stop = self.endid()
File "/home/w/ASR/utils/text_featurizers.py", line 90, in endid
return self.token_to_index['']
KeyError: ''

Incompatible shapes in lm_runner

tensorflow.python.framework.errors_impl.InvalidArgumentError: 4 root error(s) found.
  (0) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
         [[Adamax/concat_8/_2086]]
  (1) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
         [[replica_2/Cast_5/_1460]]
  (2) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
         [[replica_1/transformer/decoder/sequential_5/dense_42/Tensordot/Prod/_1302]]
  (3) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]

This corresponds to def bert_feature_loss(self, real, pred):

My current workaround: if real and pred have different shapes, skip that step.
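A sketch of that workaround (hypothetical; the mean-absolute-error body is a placeholder, not the project's actual loss): guard bert_feature_loss with a shape check and contribute zero loss on mismatched steps.

import tensorflow as tf

def bert_feature_loss(self, real, pred):
    # e.g. real [1, 52, 768] vs. pred [1, 51, 768]: skip rather than crash.
    if real.shape[1] != pred.shape[1]:
        return tf.constant(0.0)
    return tf.reduce_mean(tf.abs(real - pred))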

OOV problem with the punctuation restoration model

Hello, when using the punctuation restoration model I run into out-of-vocabulary words. The vocabulary contains no "unk"-style token, so any Chinese character missing from the table raises an error.

Request: a command-word model, or an example tutorial

Almost every speech library out there is really a speech training library; none is directly usable, so the names oversell them.
It would help to have a separate library that provides concrete speech-recognition functionality (features, not research-oriented trainers): a VAD function and analysis functions that take audio and return whether it contains speech, plus the recognition result.
The library's functions could be: speech to command/phrase, speech to phonemes, speech to sentence.

What I need most right now is speech-to-phoneme on a microcontroller; the smaller the trained model the better, and it must stay under 1 MB. The inference library needs C/C++ support.
For reference, Espressif's ESP32 MCU has only 4 MB of flash and under 500 KB of RAM, yet the official (closed-source) speech recognition manages speech-to-phoneme.

Questions about the relationship between train_loss, ctc_loss, and translate_loss

Hello, I looked at how the losses are computed in the code.
My understanding is that ctc_loss is the encoder+decoder loss, translate_loss is the translator loss, and train_loss = ctc_loss + translate_loss * 2, with train_loss driving end-to-end training. Is that right?
But the losses printed during training do not follow this rule: at one point train_loss=210.327, ctc_loss=3.709, translate_loss=1.432. What am I misunderstanding? Thanks!
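The arithmetic behind the question: if the stated relationship held, the printed values should satisfy it, but they are far apart.

ctc_loss, translate_loss = 3.709, 1.432
expected = ctc_loss + 2 * translate_loss   # 6.573
# The printed train_loss was 210.327, so something else dominates train_loss.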

How to do streaming?

Hello:
About streaming:
I see that the current test inference feeds the entire audio file into the model at once, which does not look like streaming.
How can ConformerTransducer or DeepSpeech2Transducer do streaming recognition?

Thanks!

About AMmodel/am_tokens.txt

Thanks for sharing.
A beginner question: am_tokens.txt contains 1301 pinyin entries, so the AM model decoder's output should correspondingly be a 1304-dimensional vector. How were the pinyin entries in am_tokens.txt chosen?

Hi, I re-implemented Conformer based on your project; training with CTC loss on a Chinese dataset, the loss fluctuates

Hi, thanks for your project. I re-implemented Conformer for TF 1.15. When I train it with CTC loss on a 1000-hour Chinese audio dataset, the loss fluctuates and will not decline.
The conv-sampling is a 3-layer Conv2D with 144 filters and kernel_size=3, reduction_factor=4.
The Conformer is configured like BERT-base, but with T5-style relative positional encoding.
The optimizer is Adam with weight decay, with parameters at the BERT-base defaults.

Could you help me with this?

About the ConformerTransducer structure

Hello, I have another question. In your code, ConformerTransducer's TransducerPrediction is implemented with an LSTM. Can TransducerPrediction be implemented with a Conformer block instead? Thank you.

Problem training the language model

I tried to train a language model with this project.
I modified configs/lm_data.yml as follows:

train_list: './common.all.1w'
eval_list: './common.all.1w'
...

bert:
  config_json: './LMmodel/bert/bert_config.json'
  bert_ckpt: './LMmodel/bert/bert_model.ckpt'
  bert_vocab: './LMmodel/bert/vocab.txt'

Then running python train_lm.py always fails. Here is the error log:

2020-12-22 23:56:06,255 - root - INFO - start training language model
Traceback (most recent call last):
  File "train_lm.py", line 90, in <module>
    train.train()
  File "train_lm.py", line 45, in train
    self.runner.set_datasets(train, test)
  File "/home/user/TensorflowASR/trainer/base_runners.py", line 164, in set_datasets
    self.train_datasets=self.strategy.experimental_distribute_dataset(train)
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 805, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 638, in _experimental_distribute_dataset
    return input_lib.get_distributed_dataset(
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 84, in
get_distributed_dataset
    return DistributedDataset(
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 659, in __init__
    with ops.colocate_with(dataset._variant_tensor):
AttributeError: 'generator' object has no attribute '_variant_tensor'

I tried both the CPU and GPU builds of TF 2.2.0 and get the same error log.
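A hedged guess at the cause (my_generator and the type/shape specs below are placeholders, not the project's real pipeline): experimental_distribute_dataset expects a tf.data.Dataset, and a plain Python generator has no _variant_tensor attribute. Wrapping the generator would look like:

import tensorflow as tf

def my_generator():
    # Placeholder for the project's LM batch generator.
    yield [[1, 2, 3]], [[2, 3, 4]]

dataset = tf.data.Dataset.from_generator(
    my_generator,
    output_types=(tf.int32, tf.int32),           # adjust to the real batch tuple
    output_shapes=([None, None], [None, None]),
)
strategy = tf.distribute.MirroredStrategy()
dist_dataset = strategy.experimental_distribute_dataset(dataset)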

Some audio is not recognized

Hi, first of all, great project.
I run inference through the C++ code. Why do some audio files get recognized successfully while others produce no result at all, with no error either? Is it a problem with the audio?

Streaming recognition

I want to build a streaming-recognition demo that captures microphone data in real time. My idea: buffer the incoming microphone data, send it for recognition at a fixed interval (say every 0.5 s), and clear the buffer once the end of an utterance is detected; a sketch follows below. Is this reasonable, and is there a better approach?
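A minimal sketch of that buffering loop (recognize and is_endpoint are hypothetical stand-ins for the model call and an end-of-utterance check; not the project's API):

import numpy as np

SAMPLE_RATE = 16000
HOP = int(0.5 * SAMPLE_RATE)        # decode every 0.5 s of new audio

buffer = np.zeros(0, dtype=np.float32)
next_decode = HOP

def on_audio(chunk):
    """Called with each block of microphone samples."""
    global buffer, next_decode
    buffer = np.concatenate([buffer, chunk.astype(np.float32)])
    if len(buffer) >= next_decode:
        text = recognize(buffer)    # hypothetical: decode everything cached so far
        next_decode += HOP
        if is_endpoint(buffer):     # hypothetical: utterance finished, clear the cache
            buffer = np.zeros(0, dtype=np.float32)
            next_decode = HOP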

Language model training is too slow

I plan to train a language model on top of TensorflowASR, with a dataset of about 10 million sentences, using the default train_lm.py and with the BERT model downloaded as documented.

At the current speed, a single epoch would take over a hundred days.
I am using a single 3090 Ti.

训练loss不下降

您好,我用conformerS模型在AISHELL-1数据集上训练了20个epoch,但是loss一直不下降,准确度为0,似乎没有训练效果。
config用的都是项目默认的配置,只有数据位置改了一下

data.config

speech_config:
  mel_layer_type: Melspectrogram #Spectrogram/Melspectrogram/leaf
  mel_layer_trainable: False #leaf support train
  add_wav_info: False
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  reduction_factor: 4 #should keep the same with model_config, DS2 : time_reduction_factor *= s[0] for s in 'conv_strides'
  train_list: '/remote-home/jzhan/Datasets/AISHELL-1/train/transcripts.txt'
  eval_list: '/remote-home/jzhan/Datasets/AISHELL-1/dev/transcripts.txt'
  wav_max_duration: 30 # s
  only_chinese: True
  streaming: False
  streaming_bucket: 0.5 #s
  pinyin_map: './asr/configs/dict/pinyin2phone.map'
inp_config:
  vocabulary: './asr/configs/dict/pinyin.txt'
  blank_at_zero: False
  beam_width: 1
tar_config:
  vocabulary: './asr/configs/dict/lm_tokens.txt'
  blank_at_zero: False
  beam_width: 1
augments_config:
  noise:
    active: False
    sample_rate: 16000
    SNR: [0,15]
    noises: './noise'
  masking:
    active: False
    zone: (0.1,0.9)
    mask_ratio: 0.3
    mask_with_noise: False
  pitch:
    active: False
    zone: (0.0,1.0)
    sample_rate: 16000
    factor: (-1,3)
  speed:
    active: False
    factor: (0.9,1.2)
  hz:
    active: False
optimizer_config:
  lr: 0.001
  warmup_steps: 10000
  beta1: 0.9
  beta2: 0.98
  epsilon: 0.000001
running_config:
  batch_size: 32
  train_steps_per_batches: 10
  eval_steps_per_batches: 10
  num_epochs: 20
  outdir: './models'
  log_interval_steps: 300
  eval_interval_steps: 500
  save_interval_steps: 500

conformerS.config

model_config:
  name: OfflineConformerCTC
  dmodel: 144
  reduction_factor: 4
  num_blocks: 13
  head_size: 36
  num_heads: 4
  kernel_size: 32
  fc_factor: 0.5
  dropout: 0.1
  ctcdecoder_num_blocks: 1
  ctcdecoder_kernel_size: 32
  ctcdecoder_fc_factor: 0.5
  ctcdecoder_dropout: 0.1
  translator_num_blocks: 2
  translator_kernel_size: 32
  translator_fc_factor: 0.5
  translator_dropout: 0.1

Training results: (screenshots attached in the original issue, not reproduced here)

Fix model input error

After updating the code, training a transducer model fails with the following error:

 Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'>

After some analysis, the problem is here.

The expected call signature is:
call(self, features, predicted=None, training=False)

where features is a signal or feature tensor and predicted is the label tensor.

But the model is invoked like this:
logits = self.model([wavs, pred_inp], training=True)
logits = self.model([features, pred_inp], training=True)

Here features arrives as a list, so the transpose inside mel_layer fails on it; the square brackets should be removed.
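The fix as described, sketched with the variable names from the call sites above: pass the two inputs as separate positional arguments so features reaches mel_layer as a tensor.

# Remove the list brackets so `call(self, features, predicted, training)`
# receives two tensors instead of one list.
logits = self.model(wavs, pred_inp, training=True)      # raw-audio path
logits = self.model(features, pred_inp, training=True)  # precomputed-feature path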
