z-yq / TensorflowASR
A project dedicated to making CPU/edge-device models approach GPU-model performance, with a real-time factor (RTF) below 0.1 on CPU.
License: Apache License 2.0
I want to use it on a Windows system. Is there anything I need to pay attention to?
Hello, I have another question. In your code, ConformerTransducer's TransducerPrediction is implemented with an LSTM. Can TransducerPrediction be implemented with a Conformer block instead? Thank you.
Can character-level timestamps be produced, i.e. the start time and duration of each character?
Thanks for open-sourcing such great work. Does it support single-machine multi-GPU and multi-machine multi-GPU training?
Hi, first of all, kudos to the project.
I run inference with the C++ code. Why are some audio clips recognized successfully while others produce no result after running, with no error reported? Is it a problem with the audio itself?
Right now almost every speech library feels like a speech training library; none of them are directly usable, so the name doesn't match the reality.
I hope a separate library can be split out that provides concrete speech-recognition functionality (functionality, not an academically oriented trainer): VAD and analysis functions that take audio as input and return whether it contains speech, plus the recognition result.
The library's features could be: speech-to-command/phrase, speech-to-phoneme, and speech-to-sentence.
What I need most right now is speech-to-phoneme on a microcontroller; the smaller the trained library the better, no larger than 1 MB, and the runtime needs to support C/C++.
For reference, Espressif's ESP32 MCU has only 4 MB of flash and under 500 KB of RAM, yet the official speech recognition manages speech-to-phoneme, though it is closed-source.
hello:
I want to train an RNN-Transducer. Where should prepand_blank() of TextFeaturizer be used? Thank you.
(tf2) root@adminer-X10SRA:~/debug# python train_am.py
2020-11-03 14:07:11.171313: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-03 14:07:11.213020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.635GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-03 14:07:11.213266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.214429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-03 14:07:11.215292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-03 14:07:11.215560: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-03 14:07:11.216746: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-03 14:07:11.217707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-03 14:07:11.220552: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:11.222021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-11-03 14:07:11,222 - root - INFO - valid gpus:1
2020-11-03 14:07:11.236938: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-03 14:07:11.243922: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2500075000 Hz
2020-11-03 14:07:11.244808: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb304000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-03 14:07:11.244827: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-03 14:07:11.386453: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5642b12f8f90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-03 14:07:11.386519: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-11-03 14:07:11.388405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.635GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-03 14:07:11.388517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.388569: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-03 14:07:11.388614: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-03 14:07:11.388660: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-03 14:07:11.388705: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-03 14:07:11.388751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-03 14:07:11.388798: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:11.392179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-11-03 14:07:11.392285: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.395987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-03 14:07:11.396028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-11-03 14:07:11.396044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-11-03 14:07:11.399441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9620 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
not found state file
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
2020-11-03 14:07:12,898 - tensorflow - INFO - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
2020-11-03 14:07:12.948908: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:13.979204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:14,960 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:14,961 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,066 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,067 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,169 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,169 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,257 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,258 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,346 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,346 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[Train] [Epoch 1/2] | | 0/216 [00:00<?, ?batch/s]
def generator(self, train=True):
    while 1:
        x, wavs, bert_feature, input_length, words_label, words_label_length, phone_label, phone_label_length, py_label, py_label_length, txt_label, txt_label_length = self.generate(train)
        guide_matrix = self.guided_attention(input_length, txt_label_length, np.max(input_length),
                                             txt_label_length.max())
        yield x, wavs, bert_feature, input_length, words_label, words_label_length, phone_label, phone_label_length, py_label, py_label_length, txt_label, txt_label_length, guide_matrix
No new data arrives.
Hi, I looked at how the loss is computed in the code.
My understanding is that ctc_loss is the encoder+decoder loss, translate_loss is the translator loss, train_loss = ctc_loss + translate_loss * 2, and train_loss is then used to train the whole model end to end. Is that right?
But the losses printed during training don't follow this rule. For example, at one point train_loss=210.327, ctc_loss=3.709, translate_loss=1.432. Is my understanding wrong? I'd appreciate an explanation, thanks!
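The combination described in the question can be sketched in plain Python (an illustrative reconstruction of the questioner's reading, not the project's actual code):

```python
# Illustrative sketch of the loss combination described above; the names
# and the weight of 2 come from the question, not from the project's code.
def combined_loss(ctc_loss, translate_loss, translate_weight=2.0):
    # train_loss = ctc_loss + translate_weight * translate_loss
    return ctc_loss + translate_weight * translate_loss

# Plugging in the reported values: 3.709 + 2 * 1.432 = 6.573, nowhere near
# the printed train_loss of 210.327 -- which is exactly the discrepancy
# the questioner observes.
loss = combined_loss(3.709, 1.432)
```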
Environment: Ubuntu 16, TensorFlow 2.2
Dataset: aishell1
Config: conformer.yml, model_config.name=ConformerCTC\ConformerLAS
Command: python train_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml
2020-11-16 15:36:09,744 - numba.core.ssa - DEBUG - on stmt: $478.2 = cast(value=$const478.1)
2020-11-16 15:36:09,744 - numba.core.ssa - DEBUG - on stmt: return $478.2
INFO:tensorflow:batch_all_reduce: 203 all-reduces with algorithm = nccl, num_packs = 1
2020-11-16 15:36:18,367 - tensorflow - INFO - batch_all_reduce: 203 all-reduces with algorithm = nccl, num_packs = 1
[Train] [Epoch 123/2] |▋ | 500/14016 [34:39<12:23:38, 3.30s/batch, Successfully Saved Checkpoint]
Has anyone else run into this problem?
Does anyone know what causes it?
What is the motivation for the bert_feature_loss in the language model?
hi:
I get an error when running pip install -r requirements.txt.
It should be keras-bert and tensorflow-addons.
Hi, I trained the conformerS model on the AISHELL-1 dataset for 20 epochs, but the loss never goes down and the accuracy is 0; training seems to have no effect.
The config is the project default; only the data paths were changed.
data.config
speech_config:
  mel_layer_type: Melspectrogram # Spectrogram/Melspectrogram/leaf
  mel_layer_trainable: False # leaf supports training
  add_wav_info: False
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  reduction_factor: 4 # should stay the same as in model_config; DS2: time_reduction_factor *= s[0] for s in 'conv_strides'
  train_list: '/remote-home/jzhan/Datasets/AISHELL-1/train/transcripts.txt'
  eval_list: '/remote-home/jzhan/Datasets/AISHELL-1/dev/transcripts.txt'
  wav_max_duration: 30 # s
  only_chinese: True
  streaming: False
  streaming_bucket: 0.5 # s
  pinyin_map: './asr/configs/dict/pinyin2phone.map'
inp_config:
  vocabulary: './asr/configs/dict/pinyin.txt'
  blank_at_zero: False
  beam_width: 1
tar_config:
  vocabulary: './asr/configs/dict/lm_tokens.txt'
  blank_at_zero: False
  beam_width: 1
augments_config:
  noise:
    active: False
    sample_rate: 16000
    SNR: [0,15]
    noises: './noise'
  masking:
    active: False
    zone: (0.1,0.9)
    mask_ratio: 0.3
    mask_with_noise: False
  pitch:
    active: False
    zone: (0.0,1.0)
    sample_rate: 16000
    factor: (-1,3)
  speed:
    active: False
    factor: (0.9,1.2)
  hz:
    active: False
optimizer_config:
  lr: 0.001
  warmup_steps: 10000
  beta1: 0.9
  beta2: 0.98
  epsilon: 0.000001
running_config:
  batch_size: 32
  train_steps_per_batches: 10
  eval_steps_per_batches: 10
  num_epochs: 20
  outdir: './models'
  log_interval_steps: 300
  eval_interval_steps: 500
  save_interval_steps: 500
conformerS.config
model_config:
  name: OfflineConformerCTC
  dmodel: 144
  reduction_factor: 4
  num_blocks: 13
  head_size: 36
  num_heads: 4
  kernel_size: 32
  fc_factor: 0.5
  dropout: 0.1
  ctcdecoder_num_blocks: 1
  ctcdecoder_kernel_size: 32
  ctcdecoder_fc_factor: 0.5
  ctcdecoder_dropout: 0.1
  translator_num_blocks: 2
  translator_kernel_size: 32
  translator_fc_factor: 0.5
  translator_dropout: 0.1
Hi, I tried converting the H5 model file you shared on Baidu Pan, but it reports that the model config is missing. Could you save an H5 file that contains both the model graph and the weights? Thank you.
ValueError: No model config found in the file at models/model_0.h5.
I have noticed that enable_tflite_convertible is commented as 'not support true' in all models' configs.
Any plan to implement this feature?
Can I directly download your trained model, e.g. ConformerCTC(S), for testing?
Using your trained ConformerCTC(S) with my prepared test_list, running python eval_am.py --data_config ./asr/configs/am_data.yml --model_config ./asr/configs/ConformerS.yml gives the following error:
outputs = call_fn(inputs, *args, **kwargs)
TypeError: call() got multiple values for argument 'training'
How can I solve this?
Please update the QR code ;)
This is a great project. As far as I know, you are the first to implement the current state-of-the-art ASR models together in TensorFlow.
While reading the Conformer implementation, two things puzzled me. First, you add positional encoding to every block in the Conformer, which differs from the implementations I've seen elsewhere. Does this lower the model's error rate?
Second, you only use the convolution structure of the Conformer without relative positional encoding. What effect does that have on the model?
tensorflow.python.framework.errors_impl.InvalidArgumentError: 4 root error(s) found.
(0) Invalid argument: Incompatible shapes: [1,51,768] vs. [1,52,768]
[[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
[[Adamax/concat_8/_2086]]
(1) Invalid argument: Incompatible shapes: [1,51,768] vs. [1,52,768]
[[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
[[replica_2/Cast_5/_1460]]
(2) Invalid argument: Incompatible shapes: [1,51,768] vs. [1,52,768]
[[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
[[replica_1/transformer/decoder/sequential_5/dense_42/Tensordot/Prod/_1302]]
(3) Invalid argument: Incompatible shapes: [1,51,768] vs. [1,52,768]
[[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
This corresponds to def bert_feature_loss(self, real, pred).
My current workaround is to skip the loss whenever real and pred have different shapes.
This happens while training the AM model.
Dataset: aishell 1
Configs: am_data.yml, conformerS.yml
Has anyone run into this problem?
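The workaround described above can be sketched like this (a hypothetical reconstruction with illustrative names, using NumPy; the real bert_feature_loss may use a different distance):

```python
import numpy as np

def bert_feature_loss(real, pred, skip_on_mismatch=True):
    """Hypothetical shape-guarded loss; real/pred are [batch, time, 768]."""
    if real.shape != pred.shape:
        if skip_on_mismatch:
            return 0.0          # the workaround above: contribute nothing
        # alternative: truncate both to the common time length
        t = min(real.shape[1], pred.shape[1])
        real, pred = real[:, :t], pred[:, :t]
    return float(np.mean((real - pred) ** 2))

real = np.zeros((1, 52, 768))   # shapes mimic the [1,52,768] vs [1,51,768]
pred = np.ones((1, 51, 768))    # mismatch from the error message
```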
Thank you for this excellent work.
Could you provide a complete test example, from audio in to text out? I'm a beginner and have no idea where to start from this code.
Thanks.
I'd like to join the group chat; could you post a new QR code?
I plan to train a language model based on TensorflowASR with a dataset of about 10 million sentences, using the default train_lm.py, and I downloaded the BERT model as instructed.
At the current speed, one epoch would take over a hundred days.
I am using a single 3090 Ti.
Hello, the link in this blog post to the automatic-punctuation language model open-sourced on 2020/12/1 is broken. Could you please update the link? Thanks, and sorry for the trouble.
Hi, is there a script for training on TPU?
The above are the installed package versions. When I run "python train_am.py", the error "tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed" happens.
Thanks for your help.
Hi! Could you provide a link for training the VAD model?
When I run the command below, I get the following warning. Does anyone else see this?
python train_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml
WARNING:tensorflow:7 out of the last 9 calls to <function MultiHeadAttention.call at 0x7f2b902fb840> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-11-17 10:08:31,362 - tensorflow - WARNING - 7 out of the last 9 calls to <function MultiHeadAttention.call at 0x7f2b902fb840> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Hi, is the inference speed you report for a single forward pass? Decoding one utterance here is still quite slow: each output depends on the previous one, so the total inference time is the number of frames times the per-step milliseconds you list, right?
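The per-frame dependency asked about can be sketched as a simplified greedy decode loop (illustrative only; `predict` stands in for the prediction+joint networks, and a real transducer may emit several tokens per frame):

```python
BLANK = 0

def greedy_decode(frames, predict):
    """Simplified greedy transducer decoding: one step per encoder frame,
    each step conditioned on the previously emitted non-blank token."""
    tokens, prev = [], BLANK
    for frame in frames:
        tok = predict(frame, prev)   # depends on the last output token
        if tok != BLANK:
            tokens.append(tok)
            prev = tok
    return tokens

# Because step t needs the token from step t-1, the frames cannot be
# decoded in parallel, which is why latency grows with utterance length.
```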
Thanks for sharing.
A beginner's question: I see 1301 pinyin in am_tokens.txt, so the AM model decoder's output should also be a 1304-dimensional vector. How were the pinyin in am_tokens.txt chosen?
Hi,
I see the MultiHeadAttention in the Conformer needs a mask. Is there code to generate the mask? Thank you.
Hi, when using the punctuation-restoration model I hit out-of-vocabulary words. The vocabulary has no "unk" token, so Chinese characters not in the vocab raise an error.
pinyin.txt and lm_tokens.txt are missing from the dict directory; please add them.
The README requires tensorflow-gpu and doesn't separately explain a GPU-free setup. How can I run it on CPU only? Thanks.
How do I load the pretrained model, and what should the directory structure look like?
Traceback (most recent call last):
  File "run-test.py", line 82, in <module>
    asr = ASR(am_config, lm_config, punc_config)
  File "run-test.py", line 12, in __init__
    self.lm = LM(lm_config, punc_config)
  File "/home/w/ASR/LMmodel/trm_lm.py", line 16, in __init__
    self.am_featurizer = TextFeaturizer(config['am_token'])
  File "/home/w/ASR/utils/text_featurizers.py", line 78, in __init__
    self.stop = self.endid()
  File "/home/w/ASR/utils/text_featurizers.py", line 90, in endid
    return self.token_to_index['']
KeyError: ''
Hi:
Based on your framework I built a TDNNTransducer network using 1-D dilated convolutions (dilated Conv1D). When I try to train it I get this error:
indices[1] = [1, 776, 22] does not index into shape [4, 776, 23]
The error occurs here:
last_grads_blank = -1 * tf.scatter_nd(tf.concat([tf.reshape(tf.range(batch_size, dtype=tf.int32), shape=[batch_size, 1]), indices], axis=1), tf.ones(batch_size, dtype=tf.float32), [batch_size, input_max_len, target_max_len])
Printing the intermediate values:
input_max_len = 776
logit_length = [398, 777, 378, 777]
indices =
[[0 397 12]
 [1 776 22]
 [2 377 12]
 [3 776 22]]
[batch_size, input_max_len, target_max_len] = [4, 776, 23]
tf.scatter_nd() requires the maximum value in each column of indices to be smaller than the corresponding entry of [batch_size, input_max_len, target_max_len]; otherwise it raises the index error above.
As for why the values can be equal (in the example an index is 776, equal to input_max_len): from logit_length you can see the network computes logits with a time dimension of 777, while the given value is 776. The logits' time dimension should actually be 776, so computing indices would give 776 - 1 = 775, which satisfies the constraint.
input_max_len is computed when the batch is generated, using conventional acoustic-feature extraction. When mel_layer is used, the raw waveform is fed in directly, and the number of feature frames produced by the mel_layer differs from conventional extraction: the mel_layer yields one extra frame.
After setting use_mel_layer to False, training works, which confirms my diagnosis.
Also, this error does not appear immediately; it shows up after a dozen or so batches have run normally.
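The tf.scatter_nd constraint described above can be checked with a small NumPy sketch using the printed values (NumPy stands in for TensorFlow here):

```python
import numpy as np

# Values copied from the debug printout above.
batch_size, input_max_len, target_max_len = 4, 776, 23
logit_length = np.array([398, 777, 378, 777])

# The time index used in `indices` is logit_length - 1; tf.scatter_nd
# requires every index to be strictly less than the matching output dim.
time_index = logit_length - 1            # [397, 776, 377, 776]
in_bounds = bool(np.all(time_index < input_max_len))
# in_bounds is False: 776 is not < 776, reproducing the reported error.
# If the logits' time dimension were 776 as expected, the largest index
# would be 776 - 1 = 775 and the scatter would be valid.
```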
I tried to train a language model with this project.
I modified configs/lm_data.yml:
train_list: './common.all.1w'
eval_list: './common.all.1w'
...
bert:
  config_json: './LMmodel/bert/bert_config.json'
  bert_ckpt: './LMmodel/bert/bert_model.ckpt'
  bert_vocab: './LMmodel/bert/vocab.txt'
Then running python train_lm.py always fails. Here is the error log:
2020-12-22 23:56:06,255 - root - INFO - start training language model
Traceback (most recent call last):
  File "train_lm.py", line 90, in <module>
    train.train()
  File "train_lm.py", line 45, in train
    self.runner.set_datasets(train, test)
  File "/home/user/TensorflowASR/trainer/base_runners.py", line 164, in set_datasets
    self.train_datasets=self.strategy.experimental_distribute_dataset(train)
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 805, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 638, in _experimental_distribute_dataset
    return input_lib.get_distributed_dataset(
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 84, in get_distributed_dataset
    return DistributedDataset(
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 659, in __init__
    with ops.colocate_with(dataset._variant_tensor):
AttributeError: 'generator' object has no attribute '_variant_tensor'
I got the same error with both the CPU and GPU builds of tf 2.2.0.
For example, for speech saying "2020年12月12号" (December 12, 2020), the recognition result is: "年。月。号。/S"
Hi:
A small question.
Training on CPU, Linux 16.04, with data from a few speakers of aishell_1 (2100 audio clips, just to verify the code); training ConformerTransducer with the other parameters left at their defaults.
2020-10-05 10:28:11,241 - root - INFO - trainer resume failed
[Train] [Epoch 1/2] | | 7/2096 [00:36<2:07:57, 3.68s/batch, transducer_loss=373.089]
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:47,972 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:48,185 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7648c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
...
[Train] [Epoch 1/2] |████▊ | 500/2096 [09:30<26:07, 1.02batch/s, Successfully Saved Checkpoint]
...
[Train] [Epoch 1/2] |█████▏ | 547/2096 [10:39<23:14, 1.11batch/s, transducer_loss=85.972]
...
ValueError: `generator` yielded an element of shape (0,) where an element of shape (None, None, 80, 1) was expected.
The error appears at step 547, but not always at a fixed step; it occurs randomly.
After looking into the internal data pipeline, I found the data-processing script applies this filter:
if len(data) < 400:
    continue
elif len(data) > self.speech_featurizer.sample_rate * 7:
    continue
That is, audio (16 kHz sampling) shorter than 25 ms or longer than 7 s is dropped. When every clip in a batch is longer than 7 s, all of them are dropped, the generator yields nothing, and that causes the error above.
The fix is simple: make the number 7 bigger.
Now the question: dropping data under 25 ms I can understand, but why drop clips over 7 s? Does audio longer than 7 s degrade recognition, so you don't use it?
When you processed the AISHELL-2 dataset, did you drop all audio longer than 7 s?
Also, what is the tensorflow - WARNING part about? I don't understand it.
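A sketch of the duration filter quoted above, with a guard for the all-dropped batch (names follow the quoted snippet; max_duration_s is the hard-coded 7 turned into a parameter, as the suggested fix implies):

```python
import numpy as np

def filter_batch(batch, sample_rate=16000, max_duration_s=7):
    """Drop clips shorter than 400 samples (25 ms) or longer than the cap."""
    kept = [data for data in batch
            if 400 <= len(data) <= sample_rate * max_duration_s]
    if not kept:
        # every clip was filtered out -- without this guard the generator
        # would yield an empty array, triggering the shape (0,) error above
        return None
    return kept

short = np.zeros(100)        # ~6 ms: dropped
long = np.zeros(16000 * 8)   # 8 s: dropped at the default 7 s cap
ok = np.zeros(16000 * 3)     # 3 s: kept
```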
I want to build a streaming-recognition demo that captures microphone data in real time. My idea: buffer the incoming microphone data, send it for recognition at a fixed interval (e.g. every 0.5 s), and clear the buffer once the end of an utterance is detected. Is this reasonable, and is there a better approach?
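The buffering scheme proposed above can be sketched like this (purely illustrative; `recognize` and `is_endpoint` are placeholders for the ASR model and a VAD/endpoint detector):

```python
def stream_decode(chunks, recognize, is_endpoint):
    """Feed ~0.5 s chunks; re-decode the growing buffer each time and
    flush it when the endpoint detector fires."""
    buffer, results = [], []
    for chunk in chunks:
        buffer.extend(chunk)
        text = recognize(buffer)     # partial hypothesis for the buffer
        if is_endpoint(chunk):       # utterance finished: emit and clear
            results.append(text)
            buffer = []
    return results
```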
hi, thanks for your project. I re-implemented Conformer for TF 1.15. When I train it with CTC loss on a 1000-hour Chinese audio dataset, the loss fluctuates and does not decline.
The conv subsampling is 3 Conv2D layers with 144 filters and kernel_size=3, reduction_factor=4.
The Conformer is just like BERT-base but with T5-style relative position encoding.
The optimizer is Adam with weight decay; params are the BERT-base defaults.
Could you help me with this?
After updating the code, I hit this error when training the transducer model:
Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'>
After some analysis, the problem is here.
The expected input is:
call(self, features, predicted=None, training=False)
where features is a signal or feature tensor and predicted is the label tensor.
But the model is actually called like this:
logits = self.model([wavs, pred_inp], training=True)
logits = self.model([features, pred_inp], training=True)
So features arrives as a list, which makes the transpose on the tensor inside mel_layer fail. The brackets should be removed.
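The mismatch described above can be reproduced with a tiny stand-in (illustrative names only; the real call goes through Keras):

```python
def call(features, predicted=None, training=False):
    # stand-in for the model: the mel layer would transpose `features`,
    # which fails when it receives a Python list instead of a tensor
    if isinstance(features, list):
        raise TypeError("features must be a tensor, got a list")
    return features, predicted

wavs, pred_inp = "wavs", "pred_inp"

# buggy:  call([wavs, pred_inp], training=True)  -> features is a list
# fixed:  drop the brackets so each input binds to its own parameter
out = call(wavs, pred_inp, training=True)
```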
Did you copy the code from my repository https://github.com/usimarit/TiramisuASR?
I looked into the code, and it looks like the code I wrote.
It would be nice if you could add references to where you learned from or took source code, or any other kind of knowledge involved in this repo. Other authors would appreciate it.
Hi:
About streaming:
The current test inference feeds the entire audio into the model at once, which doesn't seem to be streaming.
So how can ConformerTransducer or DeepSpeech2Transducer do streaming recognition?
Thanks!