yeyupiaoling / ppasr

An end-to-end Chinese speech recognition framework built on PaddlePaddle, from getting started to real-world practice: very simple introductory examples and practical enterprise-grade projects. Supports the currently most popular DeepSpeech2, Conformer, and Squeezeformer models.

License: Apache License 2.0

Python 97.37% CSS 0.29% JavaScript 1.38% HTML 0.96%
asr paddlepaddle deep-learning chinese speech-to-text speech speech-recognition streaming-asr conformer squeezeformer

ppasr's Introduction


PPASR: Streaming and Non-Streaming Speech Recognition

This project is split into three stage branches: beginner, advanced, and final. The current branch is the final-stage V2 version; if you want the final-stage V1 version, use the r1.x branch. PPASR stands for PaddlePaddle Automatic Speech Recognition. It is a speech recognition framework implemented with PaddlePaddle that aims to be a simple, practical speech recognition project. It can be deployed on servers and Nvidia Jetson devices, with plans to support Android and other mobile devices in the future. Don't forget to star the repo!

Everyone is welcome to scan the QR code to join the Knowledge Planet or the QQ group for discussion. The Knowledge Planet provides the model files for this project and for the author's other related projects, as well as other resources.

Knowledge Planet QQ Group

Online Usage

  1. Train and predict on the AI Studio platform

  2. Online Demo

  3. InsCode


The environment used by this project:

  • Anaconda 3
  • Python 3.8
  • PaddlePaddle 2.5.1
  • Windows 10 or Ubuntu 18.04

Project Overview

  1. This project supports the streaming-capable models deepspeech2, conformer, squeezeformer, and efficient_conformer. Every model supports both streaming and non-streaming recognition, controlled by the streaming parameter in the configuration file (see the sketch after this list).
  2. This project supports two decoders: the beam search decoder ctc_beam_search and the greedy decoder ctc_greedy. The beam search decoder ctc_beam_search is more accurate.
  3. A series of pre-trained models is provided for download below. After downloading a pre-trained model, copy all of its files into the project root directory and run the model export step before the model can be used for speech recognition.
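As a minimal sketch of how this looks inside a model configuration file (only the streaming parameter and the two decoder names are confirmed by this page; the exact file layout is an assumption for illustration):

# Illustrative config excerpt, not the exact file layout
streaming: True            # True: streaming recognition; False: non-streaming
decoder: ctc_beam_search   # or ctc_greedy; beam search is more accurate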

Update History

  • 2023.01.28: Restructured the configuration files and added support for the efficient_conformer model.
  • 2022.12.05: Added support for automatic mixed-precision training and for exporting quantized models.
  • 2022.11.26: Added support for the Squeezeformer model.
  • 2022.11.01: Changed the Conformer model's decoder to BiTransformerDecoder and added the SpecSubAugmentor data augmentor.
  • 2022.10.29: Officially released the final-stage V2 version.

Video Tutorials

Quick Start

This section shows how to quickly perform speech recognition with PPASR. The prerequisite is that PPASR is installed; see the quick-install documentation. You do not need to download models manually during execution; everything is handled automatically.
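If PPASR is not installed yet, the quick-install documentation has the details; as a minimal sketch (assuming the package is published on PyPI under the name ppasr, and using the PaddlePaddle version from the environment list above; on GPU machines substitute paddlepaddle-gpu):

python -m pip install paddlepaddle==2.5.1 ppasr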

  1. Short-audio recognition
from ppasr.predict import PPASRPredictor

predictor = PPASRPredictor(model_tag='conformer_streaming_fbank_wenetspeech')

# Recognize a short audio file in a single call.
wav_path = 'dataset/test.wav'
result = predictor.predict(audio_data=wav_path, use_pun=False)
score, text = result['score'], result['text']
print(f"Recognition result: {text}, score: {int(score)}")
  2. Long-audio recognition
from ppasr.predict import PPASRPredictor

predictor = PPASRPredictor(model_tag='conformer_streaming_fbank_wenetspeech')

# Recognize a long recording with the long-audio API.
wav_path = 'dataset/test_long.wav'
result = predictor.predict_long(audio_data=wav_path, use_pun=False)
score, text = result['score'], result['text']
print(f"Recognition result: {text}, score: {score}")
  3. Simulated streaming recognition
import time
import wave

from ppasr.predict import PPASRPredictor

predictor = PPASRPredictor(model_tag='conformer_streaming_fbank_wenetspeech')

# Recognition interval in seconds
interval_time = 0.5
CHUNK = int(16000 * interval_time)
# Read the audio data
wav_path = 'dataset/test.wav'
wf = wave.open(wav_path, 'rb')
data = wf.readframes(CHUNK)
# Feed the audio chunk by chunk, as if it were arriving in real time
while data != b'':
    start = time.time()
    # Read one chunk ahead so we know whether the current chunk is the last one
    d = wf.readframes(CHUNK)
    result = predictor.predict_stream(audio_data=data, use_pun=False, is_end=d == b'')
    data = d
    if result is None: continue
    score, text = result['score'], result['text']
    print(f"[Streaming result] time: {int((time.time() - start) * 1000)}ms, result: {text}, score: {int(score)}")
# Reset the streaming recognizer
predictor.reset_stream()
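A note on the chunking above: at the 16 kHz sample rate the example assumes, CHUNK = int(16000 * 0.5) = 8000 frames, so each predict_stream() call receives half a second of audio. The loop reads one chunk ahead before predicting so that is_end can be set to True exactly on the final chunk; reset_stream() then clears the internal streaming state before the next utterance.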

Model Downloads

  1. Pre-trained models on WenetSpeech (10,000 hours):

Model | Streaming | Features | Language | Test-set CER | Download
conformer | True | fbank | Mandarin | 0.03579 (aishell_test), 0.11081 (test_net), 0.16031 (test_meeting) | Join the Knowledge Planet to obtain
deepspeech2 | True | fbank | Mandarin | 0.05379 (aishell_test) | Join the Knowledge Planet to obtain

  2. Pre-trained models on WenetSpeech (10,000 hours) plus Chinese speech datasets (3,000+ hours):

Model | Streaming | Features | Language | Test-set CER | Download
conformer | True | fbank | Mandarin | 0.02923 (aishell_test), 0.11876 (test_net), 0.18346 (test_meeting) | Join the Knowledge Planet to obtain

  3. Pre-trained models on AIShell (179 hours):

Model | Streaming | Features | Language | Test-set CER | Download
squeezeformer | True | fbank | Mandarin | 0.04675 | Join the Knowledge Planet to obtain
conformer | True | fbank | Mandarin | 0.04178 | Join the Knowledge Planet to obtain
efficient_conformer | True | fbank | Mandarin | 0.04143 | Join the Knowledge Planet to obtain
deepspeech2 | True | fbank | Mandarin | 0.09732 | Join the Knowledge Planet to obtain

  4. Pre-trained models on Librispeech (960 hours):

Model | Streaming | Features | Language | Test-set WER | Download
squeezeformer | True | fbank | English | 0.13033 | Join the Knowledge Planet to obtain
conformer | True | fbank | English | 0.08109 | Join the Knowledge Planet to obtain
efficient_conformer | True | fbank | English | (not given) | Join the Knowledge Planet to obtain
deepspeech2 | True | fbank | English | 0.15294 | Join the Knowledge Planet to obtain

Notes:

  1. The character/word error rates here were computed with the eval.py program using the ctc_beam_search beam search decoding method.
  2. Inference models are not provided: copy all of the downloaded files into the project root directory and run export_model.py to export an inference model (example commands below).
  3. Due to limited compute, only streaming models are provided here, but every model supports both streaming and non-streaming recognition, controlled by the streaming parameter in the configuration file.
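For reference, the evaluation and export steps look like the following; the --resume_model path is copied from a log elsewhere on this page, so adjust it to wherever your downloaded checkpoint actually lives (each script's --help lists the full set of arguments):

python eval.py
python export_model.py --resume_model=models/deepspeech2/epoch_50/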

If you have questions, feel free to open an issue to discuss.

Documentation

Related Projects

Special Thanks

Support the Author


Tip one yuan to support the author

References

ppasr's People

Contributors

liuxiaocs7, qiaoruntao, vincentchen-github, yeyupiaoling


ppasr's Issues

Prediction on the test sample fails after exporting a pre-downloaded model

When I use the pre-downloaded model to export and then predict on test.wav, the output is:

D:\anaconda3\python.exe F:/PPASR-master/infer_path.py
D:\anaconda3\lib\site-packages\librosa\core\constantq.py:1058: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
D:\anaconda3\lib\site-packages\pydub-0.25.1-py3.9.egg\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

缺少 paddlespeech-ctcdecoders 库,请安装,如果是Windows系统,只能使用ctc_greedy。
【注意】已自动切换为ctc_greedy解码器。

-----------  Configuration Arguments -----------
alpha: 2.2
beam_size: 300
beta: 4.3
cutoff_prob: 0.99
cutoff_top_n: 40
decoder: ctc_beam_search
feature_method: linear
is_long_audio: False
lang_model_path: lm/zh_giga.no_cna_cmn.prune01244.klm
model_dir: models/deepspeech2/infer/
pun_model_dir: models/pun_models/
real_time_demo: False
to_an: False
use_gpu: True
use_model: deepspeech2
use_pun: False
vocab_path: dataset/vocabulary.txt
wav_path: dataset/test.wav
------------------------------------------------
消耗时间:164ms, 识别结果: 逐屈肮霸故罅咽罅鳟物鳟马悖鳟忑敌茧忑龙茧忑物裁唱勤疡掩物婷忑马窗马物鳟疡层悖忑混忑恐物层, 得分: 0

The output of the model export is:
-----------  Configuration Arguments -----------
dataset_vocab: dataset/vocabulary.txt
feature_method: linear
mean_std_path: dataset/mean_std.npz
resume_model: models/deepspeech2/epoch_50/
save_model: models/
use_model: deepspeech2
------------------------------------------------
W0121 01:57:48.756676 15560 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.2, Runtime API Version: 10.2
W0121 01:57:48.765652 15560 device_context.cc:465] device: 0, cuDNN Version: 7.6.
D:\anaconda3\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for conv.conv1.conv.weight. conv.conv1.conv.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
[... the same "Skip loading for ... is not found in the provided dict." UserWarning repeats for every parameter of the model: the conv.conv1/conv.conv2 weights and biases, all weights, biases, and norm parameters of rnn.rnn.0 through rnn.rnn.4, and finally output.weight and output.bias ...]
[2022-01-21 01:57:51.640273] 成功恢复模型参数和优化方法参数:models/deepspeech2/epoch_50/model.pdparams
D:\anaconda3\lib\site-packages\paddle\fluid\layers\utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
  return (isinstance(seq, collections.Sequence) and
预测模型已保存:models/deepspeech2\infer

Does this mean that my model export had serious problems, which in turn caused my prediction results to go wrong? How can I fix it?

An error occurred when using a pretrained model to train on the WenetSpeech dataset

Hello~
I copied the pretrained model from "PPASR_大数据集/models/deepspeech2/best_model" to "/PPASR/models/deepspeech2/last_model", then started training, but an error occurred. I wonder what the reason for this problem is?

AssertionError: Variable Shape not match, Variable [ linear_0.w_0_moment1_0 ] need tensor with shape (1024, 5451) but load set tensor with shape (1024, 6436)
(screenshot attached)

inference

I found that the accuracy may be worse than before. What do you guys think?

The vocabulary.txt and mean_std.npz files

Hello, I downloaded your pre-trained model and want to export it for prediction. How do I obtain these two files?
add_arg('dataset_vocab', str, 'dataset/vocabulary.txt', '数据字典的路径')
add_arg('mean_std_path', str, 'dataset/mean_std.npz', '数据集的均值和标准值的npy文件路径')

python train.py exits without showing any error, and no models folder is created.

(asr) D:\ASR>python train.py
-----------  Configuration Arguments -----------
alpha: 2.2
augment_conf_path: conf/augmentation.json
batch_size: 32
beam_size: 300
beta: 4.3
cutoff_prob: 0.99
cutoff_top_n: 40
dataset_vocab: dataset/vocabulary.txt
decoder: ctc_beam_search
feature_method: linear
lang_model_path: lm/zh_giga.no_cna_cmn.prune01244.klm
learning_rate: 5e-05
max_duration: 20
mean_std_path: dataset/mean_std.npz
metrics_type: cer
min_duration: 0.5
num_epoch: 65
num_proc_bsearch: 10
num_workers: 8
pretrained_model: None
resume_model: None
save_model_path: models/
test_manifest: dataset/manifest.test
train_manifest: dataset/manifest.train
use_model: deepspeech2
------------------------------------------------
dataset/manifest.noise不存在,已经忽略噪声增强操作!
[2022-02-12 12:55:37.853290] 数据增强配置:{'type': 'speed', 'aug_type': 'audio', 'params': {'min_speed_rate': 0.9, 'max_speed_rate': 1.1, 'num_rates': 3}, 'prob': 1.0}
[2022-02-12 12:55:37.853290] 数据增强配置:{'type': 'shift', 'aug_type': 'audio', 'params': {'min_shift_ms': -5, 'max_shift_ms': 5}, 'prob': 1.0}
[2022-02-12 12:55:37.853290] 数据增强配置:{'type': 'volume', 'aug_type': 'audio', 'params': {'min_gain_dBFS': -15, 'max_gain_dBFS': 15}, 'prob': 1.0}
[2022-02-12 12:55:37.861290] 数据增强配置:{'type': 'specaug', 'aug_type': 'feature', 'params': {'W': 0, 'warp_mode': 'PIL', 'F': 10, 'n_freq_masks': 2, 'T': 50, 'n_time_masks': 2, 'p': 1.0, 'adaptive_number_ratio': 0, 'adaptive_size_ratio': 0, 'max_n_time_masks': 20, 'replace_with_zero': True}, 'prob': 1.0}
D:\XXXX\lib\site-packages\paddle\fluid\reader.py:355: UserWarning: DataLoader with multi-process mode is not supported on MacOs and Windows currently. Please use signle-process mode with num_workers = 0 instead
  warnings.warn(
W0212 12:55:37.974066 17176 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 10.2
W0212 12:55:37.990077 17176 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022-02-12 12:55:41.438553] 训练数据:13361
[2022-02-12 12:55:42.453420] Train epoch: [1/65], batch: [0/417], loss: 855.68878, learning rate: 0.00005000, eta: 7:38:27
**This is the spot where it inexplicably exits.**
(asr) D:\ASR>

Problem encountered while generating data

Hello, when running this project on AI Studio I hit the problem below during the data-processing step. What is the reason?
Also, in the data split there is a small bug in the audio paths written to test.txt: they start with aset/ instead of dataset/. Please fix it when you have time.


IsADirectoryError Traceback (most recent call last)
/tmp/ipykernel_189/311691653.py in <module>
7 num_workers=1)
8
----> 9 trainer.create_data(annotation_path='dataset/annotation/')

~/PPASR/ppasr/trainer.py in create_data(self, annotation_path, noise_manifest_path, noise_path, num_samples, count_threshold, is_change_frame_rate, max_test_manifest)
108 test_manifest_path=self.test_manifest,
109 is_change_frame_rate=is_change_frame_rate,
--> 110 max_test_manifest=max_test_manifest)
111 print('=' * 70)
112 print('开始生成噪声数据列表...')

~/PPASR/ppasr/utils/utils.py in create_manifest(annotation_path, train_manifest_path, test_manifest_path, is_change_frame_rate, max_test_manifest)
53 for annotation_text in os.listdir(annotation_path):
54 annotation_text_path = os.path.join(annotation_path, annotation_text)
---> 55 with open(annotation_text_path, 'r', encoding='utf-8') as f:
56 lines = f.readlines()
57 for line in tqdm(lines):

IsADirectoryError: [Errno 21] Is a directory: 'dataset/annotation/.ipynb_checkpoints'

scipy==1.6.1 ??

ERROR: Could not find a version that satisfies the requirement scipy==1.6.1 (from versions: 0.8.0, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.16.1, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 1.0.0b1, 1.0.0rc1, 1.0.0rc2, 1.0.0, 1.0.1, 1.1.0rc1, 1.1.0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0rc1, 1.4.0rc2, 1.4.0, 1.4.1, 1.5.0rc1, 1.5.0rc2, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4)

Error when generating the data list

Hi, I deployed on AI Studio and ran into the following problem when running create_data. I hope you can help me out, thanks!

annotation_path: dataset/annotation/
count_threshold: 2
dataset_vocab: dataset/vocabulary.txt
feature_method: linear
is_change_frame_rate: True
max_test_manifest: 10000
mean_std_path: dataset/mean_std.npz
noise_manifest_path: dataset/manifest.noise
noise_path: dataset/audio/noise
num_samples: 1000000
num_workers: 8
test_manifest: dataset/manifest.test
train_manifest: dataset/manifest.train
------------------------------------------------
开始生成数据列表...
  0%|                                                                                                                                          | 0/7176 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "create_data.py", line 39, in <module>
    max_test_manifest=args.max_test_manifest)
  File "/home/aistudio/ppasr/trainer.py", line 110, in create_data
    max_test_manifest=max_test_manifest)
  File "/home/aistudio/ppasr/utils/utils.py", line 61, in create_manifest
    change_rate(audio_path)
  File "/home/aistudio/ppasr/utils/utils.py", line 105, in change_rate
    data, sr = soundfile.read(audio_path)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 257, in read
    subtype, endian, format, closefd) as f:
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 1184, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'aset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav': System error.

Error during GPU prediction: Aborted at 1637208819 (unix time) try "date -d @1637208819" if you are using GNU date

Hello, teacher.

The preceding steps (data preparation, model training, evaluation, and model export) all completed normally.

But in the quick-prediction step, after running python infer_path.py --wav_path=./dataset/test.wav, the following error appears:

root@pp:~/PPASR# python export_model.py --resume_model=models/deepspeech2/epoch_50/
/usr/local/lib/python3.7/dist-packages/numba/types/__init__.py:110: DeprecationWarning: `np.long` is a deprecated alias for `np.compat.long`. To silence this warning, use `np.compat.long` by itself. In the likely event your code does not need to work on Python 2 you can use the builtin `int` for which `np.compat.long` is itself an alias. Doing this will not modify any behaviour and is safe. When replacing `np.long`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  long_ = _make_signed(np.long)
/usr/local/lib/python3.7/dist-packages/numba/types/__init__.py:111: DeprecationWarning: `np.long` is a deprecated alias for `np.compat.long`. To silence this warning, use `np.compat.long` by itself. In the likely event your code does not need to work on Python 2 you can use the builtin `int` for which `np.compat.long` is itself an alias. Doing this will not modify any behaviour and is safe. When replacing `np.long`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  ulong = _make_unsigned(np.long)
/usr/local/lib/python3.7/dist-packages/librosa/cache.py:49: DeprecationWarning: The 'cachedir' attribute has been deprecated in version 0.12 and will be remo
root@pp:~/PPASR# python infer_path.py --wav_path=./dataset/test.wav
/usr/local/lib/python3.7/dist-packages/numba/types/__init__.py:110: DeprecationWarning: `np.long` is a deprecated alias for `np.compat.long`. To silence this warning, use `np.compat.long` by itself. In the likely event your code does not need to work on Python 2 you can use the builtin `int` for which `np.compat.long` is itself an alias. Doing this will not modify any behaviour and is safe. When replacing `np.long`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  long_ = _make_signed(np.long)
/usr/local/lib/python3.7/dist-packages/numba/types/__init__.py:111: DeprecationWarning: `np.long` is a deprecated alias for `np.compat.long`. To silence this warning, use `np.compat.long` by itself. In the likely event your code does not need to work on Python 2 you can use the builtin `int` for which `np.compat.long` is itself an alias. Doing this will not modify any behaviour and is safe. When replacing `np.long`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  ulong = _make_unsigned(np.long)
/usr/local/lib/python3.7/dist-packages/librosa/cache.py:49: DeprecationWarning: The 'cachedir' attribute has been deprecated in version 0.12 and will be removed in version 0.14.
Use os.path.join(memory.location, 'joblib') attribute instead.
  if self.cachedir is not None and self.level >= level:
/usr/local/lib/python3.7/dist-packages/librosa/cache.py:49: DeprecationWarning: The 'cachedir' attribute has been deprecated in version 0.12 and will be removed in version 0.14.
Use os.path.join(memory.location, 'joblib') attribute instead.
  if self.cachedir is not None and self.level >= level:
-----------  Configuration Arguments -----------
alpha: 1.2
beam_size: 10
beta: 0.35
cutoff_prob: 1.0
cutoff_top_n: 40
decoder: ctc_greedy
is_long_audio: False
lang_model_path: lm/zh_giga.no_cna_cmn.prune01244.klm
model_dir: models/deepspeech2/infer/
to_an: True
use_gpu: True
vocab_path: dataset/vocabulary.txt
wav_path: ./dataset/test.wav
------------------------------------------------


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::framework::SignalHandle(char const*, int)
1   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1637207454 (unix time) try "date -d @1637207454" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 10680 (TID 0x7f258fdb6740) from PID 0 ***]

Segmentation fault

The environment is as follows:

OS: Ubuntu 18.04 64-bit
GPU: NVIDIA P100 (single card)
Driver: 10.2.89
RAM: 60 GB
Python: 3.7.5
PaddlePaddle: 2.1.3 (installed via pip)
Project: PPASR
Model: thchs_30 (34 hours)
noise model: none

NVIDIA info is as follows:

root@pp:~/PPASR# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
root@pp:~/PPASR# nvidia-smi
Thu Nov 18 12:05:23 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:08.0 Off |                    0 |
| N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Teacher, I hit this problem in both the PaddlePaddle-DeepSpeech and PPASR projects when predicting on the GPU, while predicting on the CPU works fine and training on the GPU is also normal.

What causes this problem, and could you tell me how to solve it?

Thank you for taking time out of your busy schedule to look at this.

Long audio files cannot be recognized

Hi, I am using my own wav audio files, but recognition always fails. Short audio works, but the output has no punctuation, and the long-audio demo file seems to be gone.

VisualDL shows "no visualization results to display"

Hello!
While training on WenetSpeech I wanted to check the training logs, but after running visualdl --logdir=./log --host=0.0.0.0 or visualdl --logdir=log --host=0.0.0.0, the VisualDL page says there are no visualization results to display. Apart from changing some training parameters, I have not modified the cloned code at all. I am not very familiar with the PaddlePaddle framework; what could be causing this?
(screenshots attached)

Training on my own dataset

Hello, I trained on a dataset I prepared myself and found that the CER plateaus at around 0.24. The recognition quality is mediocre, worse than what I get from training on thchs30.
I prepared roughly 1,000 hours of data for training.
When training on our own dataset, do we need to adjust certain parameters?

Language model questions

I want to strengthen the existing language model for proper nouns, and I have two ideas:
one is to directly modify the n-gram probabilities in the language model, but klm files do not seem to support modification;
the other is to add text to the original corpus and retrain, but I do not know where the corpus comes from (it seems to be a Baidu-internal corpus).

So I would like to ask: is there a good way to implement this?
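One rough way to compare language models before and after adding corpus text is to score the new terms with the kenlm Python bindings. A minimal sketch under that assumption, using the .klm file referenced throughout this project and space-separated characters to match a character-level Chinese LM:

import kenlm

# Load the binary KenLM model used by the CTC beam search decoder.
lm = kenlm.Model('lm/zh_giga.no_cna_cmn.prune01244.klm')
# score() returns the total log10 probability of the sequence; a higher
# (less negative) value means the LM considers it more likely.
print(lm.score('深 度 学 习', bos=True, eos=True))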

Question about WenetSpeech training time

Hello!
I am training on the WenetSpeech dataset following the project instructions, currently on a single 3090. According to the log, finishing all 65 epochs will take at least half a year, and GPU utilization is not very high either. I am a beginner; could you advise where the main bottleneck might be, and what optimizations (including hardware upgrades) are available?
(screenshot attached)

How to call the model for real-time speech recognition

The changelog mentions "2021.11.30: fully switched to a streaming speech recognition model", so real-time speech recognition should be supported?

Is there corresponding reference code for calling it? Thanks 🙏

GPU memory usage keeps growing while training on the wenetspeech dataset

Hi!
I noticed that during a single epoch of training on the wenetspeech dataset, GPU memory usage keeps growing, from about 2 GB at the start to 21 GB at the end. Is this normal? My training parameters are set as follows:

add_arg('batch_size',       int,    64,                       '训练的批量大小')
add_arg('num_workers',      int,    32,                        '读取数据的线程数量')
add_arg('num_epoch',        int,    65,                       '训练的轮数')
add_arg('learning_rate',    int,    5e-5,                     '初始学习率的大小')
add_arg('min_duration',     int,    0.5,                      '过滤最短的音频长度')
add_arg('max_duration',     int,    30,                       '过滤最长的音频长度,当为-1的时候不限制长度')
add_arg('alpha',            float,  2.2,                      '集束搜索的LM系数')
add_arg('beta',             float,  4.3,                      '集束搜索的WC系数')
add_arg('beam_size',        int,    300,                      '集束搜索的大小,范围:[5, 500]')
add_arg('num_proc_bsearch', int,    10,                       '集束搜索方法使用CPU数量')
add_arg('cutoff_prob',      float,  0.99,                     '剪枝的概率')
add_arg('cutoff_top_n',     int,    40,                       '剪枝的最大值')
add_arg('use_model',        str,    'deepspeech2',              '所使用的模型')
add_arg('train_manifest',   str,    'dataset/manifest.train',   '训练数据的数据列表路径')
add_arg('test_manifest',    str,    'dataset/manifest.test',    '测试数据的数据列表路径')
add_arg('dataset_vocab',    str,    'dataset/vocabulary.txt',   '数据字典的路径')
add_arg('mean_std_path',    str,    'dataset/mean_std.npz',     '数据集的均值和标准值的npy文件路径')
add_arg('augment_conf_path',str,    'conf/augmentation.json',   '数据增强的配置文件,为json格式')
add_arg('save_model_path',  str,    'models/',                  '模型保存的路径')
add_arg('decoder',          str,    'ctc_greedy',               '结果解码方法', choices=['ctc_beam_search', 'ctc_greedy'])
add_arg('lang_model_path',  str,    'lm/zh_giga.no_cna_cmn.prune01244.klm',        "语言模型文件路径")
add_arg('resume_model',     str,    None,                       '恢复训练,当为None则不使用预训练模型')
add_arg('pretrained_model', str,    None,                       '预训练模型的路径,当为None则不使用预训练模型')

Running train.py does not use the GPU for training

Hello author, after I run train.py, training runs on the CPU instead of the GPU. Checking with paddle.fluid.install_check.run_check() reports that it runs fine on GPU and CPU. Hope you can help, thanks!

Asking for help: the training setup for wenetspeech

Could you share how many GPUs (and which model) you used to train the 10,000-hour wenetspeech dataset, and roughly how long it took? I would like to estimate my own situation. Many thanks!

Why does the loss suddenly become nan during training?

I have recently been training a PPASR model using the WenetSpeech, Aishell, Free ST-Chinese-Mandarin-Corpus, and THCHS-30 datasets, but during training the loss suddenly becomes nan while training keeps running. I tried lowering the learning rate, but it did not help. How can I determine the cause? My training configuration is below.
I trained on a single GPU inside a Docker container with GPU access.

-----------  Configuration Arguments -----------
alpha: 2.2
augment_conf_path: conf/augmentation.json
batch_size: 128
beam_size: 300
beta: 4.3
cutoff_prob: 0.99
cutoff_top_n: 40
dataset_vocab: dataset/vocabulary.txt
decoder: ctc_beam_search
lang_model_path: lm/zhidao_giga.klm
learning_rate: 5e-05
max_duration: 20
mean_std_path: dataset/mean_std.npz
min_duration: 0.5
num_epoch: 65
num_proc_bsearch: 10
num_workers: 6
pretrained_model: None
resume_model: None
save_model_path: models/
test_manifest: dataset/manifest.test
train_manifest: dataset/manifest.train
use_model: deepspeech2
------------------------------------------------

The image below shows the loss becoming nan during training:
(screenshot attached)

Failed to install the packages in requirements

visualdl, cn2an, zhconv, paddlespeech_feat, webrtcvad: none of these packages install successfully, and I cannot find them on the Anaconda site either. Could you give me some pointers? Thanks.

What should I do when single-machine multi-GPU training fails?

(paddle_env) D:\python\PPASR__TEST\PPASR>python -m paddle.distributed.launch --gpus '0,1' train.py
-----------  Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: '0,1'
heter_devices:
heter_worker_num: None
heter_workers:
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: log
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers:
training_script: train.py
training_script_args: []
worker_num: None
workers:
------------------------------------------------
WARNING 2022-02-16 18:25:24,619 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2022-02-16 18:25:24,621 launch_utils.py:528] Local start 2 processes. First process distributed environment info (Only For Debug):
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:55021               |
    |                     PADDLE_TRAINERS_NUM                        2                      |
    |                PADDLE_TRAINER_ENDPOINTS         127.0.0.1:55021,127.0.0.1:55022       |
    |                     PADDLE_RANK_IN_NODE                        0                      |
    |                 PADDLE_LOCAL_DEVICE_IDS                       '0                      |
    |                 PADDLE_WORLD_DEVICE_IDS                      '0,1'                    |
    |                     FLAGS_selected_gpus                       '0                      |
    |             FLAGS_selected_accelerators                       '0                      |
    +=======================================================================================+

INFO 2022-02-16 18:25:24,621 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
子目录或文件 -p 已经存在。
处理: -p 时出错。
子目录或文件 log 已经存在。
处理: log 时出错。
子目录或文件 -p 已经存在。
处理: -p 时出错。
子目录或文件 log 已经存在。
处理: log 时出错。
launch proc_id:16392 idx:0
launch proc_id:18756 idx:1
Traceback (most recent call last):
  File "train.py", line 4, in <module>
    from ppasr.trainer import PPASRTrainer
  File "D:\python\PPASR__TEST\PPASR\ppasr\trainer.py", line 11, in <module>
    import paddle
  File "D:\python\anaconda3\envs\paddle_env\lib\site-packages\paddle\__init__.py", line 293, in <module>
    from .hapi import Model  # noqa: F401
  File "D:\python\anaconda3\envs\paddle_env\lib\site-packages\paddle\hapi\__init__.py", line 25, in <module>
    logger.setup_logger()
  File "D:\python\anaconda3\envs\paddle_env\lib\site-packages\paddle\hapi\logger.py", line 47, in setup_logger
    local_rank = ParallelEnv().local_rank
  File "D:\python\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\parallel.py", line 121, in __init__
    self._device_id = int(selected_gpus[0])
ValueError: invalid literal for int() with base 10: "'0"
INFO 2022-02-16 18:25:30,788 launch_utils.py:341] terminate all the procs
ERROR 2022-02-16 18:25:30,788 launch_utils.py:604] ABORT!!! Out of all 2 trainers, the trainer process with rank=[0, 1] was aborted. Please check its log.
INFO 2022-02-16 18:25:33,790 launch_utils.py:341] terminate all the procs
INFO 2022-02-16 18:25:33,790 launch.py:311] Local processes completed.

mean_std.npz is not generated after running create_data.py

Hello author, after I run create_data.py, mean_std.npz is not generated, and train.py then errors out because it is missing. Hope you can help, thanks!

E:\PyCharm2020\PycharmProjects\PPASR\Virtualenv_Environment\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
  0%|          | 0/13388 [00:00<?, ?it/s]-----------  Configuration Arguments -----------
annotation_path: dataset/annotation/
count_threshold: 2
dataset_vocab: dataset/vocabulary.txt
feature_method: linear
is_change_frame_rate: True
max_test_manifest: 10000
mean_std_path: dataset/mean_std.npz
noise_manifest_path: dataset/manifest.noise
noise_path: dataset/audio/noise
num_samples: 1000000
num_workers: 8
test_manifest: dataset/manifest.test
train_manifest: dataset/manifest.train
------------------------------------------------
开始生成数据列表...
100%|██████████| 13388/13388 [00:27<00:00, 489.01it/s]
完成生成数据列表,数据集总长度为34.16小时!
======================================================================
开始生成噪声数据列表...
噪声音频文件为空,已跳过!
======================================================================
开始生成数据字典...
100%|██████████| 13361/13361 [00:00<00:00, 16198.03it/s]
100%|██████████| 27/27 [00:00<00:00, 13728.48it/s]
数据字典生成完成!
======================================================================
开始抽取1000000条数据计算均值和标准值...
E:\PyCharm2020\PycharmProjects\PPASR\Virtualenv_Environment\lib\site-packages\paddle\fluid\reader.py:356: UserWarning: DataLoader with multi-process mode is not supported on MacOs and Windows currently. Please use signle-process mode with num_workers = 0 instead
  "DataLoader with multi-process mode is not supported on MacOs and Windows currently." \
100%|██████████| 209/209 [01:45<00:00,  1.50it/s]
进程已结束,退出代码 -1073741819 (0xC0000005)

test cer is always 1.0

Hello, I trained on the thchs30 training set for 50 epochs, but the test cer stays at 1.0 the whole time. Why is that?
I used the default parameters without making any adjustments.

(screenshot attached)

The machine is:
ubuntu16.04
GPU: Titan Xp
12 GB VRAM

With the GPU, it hangs for a long time and finally terminates with Process finished with exit code -1073741819 (0xC0000005)

E:\Users\ikun\anaconda3\envs\speech\python.exe E:/Users/ikun/PycharmProjects/Speech_AI/MASR-master/infer_path.py
E:\Users\ikun\anaconda3\envs\speech\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
----------- Configuration Arguments -----------
alpha: 2.2
beam_size: 30
beta: 4.3
cutoff_prob: 0.99
cutoff_top_n: 40
decoder: ctc_beam_search
is_long_audio: False
lang_model_path: lm/zh_giga.no_cna_cmn.prune01244.klm
model_path: models/deepspeech2/inference.pt
real_time_demo: False
to_an: False
use_gpu: True
use_model: deepspeech2
vocab_path: dataset/vocabulary.txt
wav_path: ./dataset/040.wav

==================================================================
缺少 paddlespeech-ctcdecoders 库,请根据文档安装,如果是Windows系统,只能使用ctc_greedy。
【注意】已自动切换为ctc_greedy解码器。

W0111 22:20:02.530877 1384 analysis_predictor.cc:1353] Deprecated. Please use CreatePredictor instead.

Process finished with exit code -1073741819 (0xC0000005)

Punctuation

May I ask how to add punctuation based on pauses? Do I need to add word segmentation?

The model for the 1300 dataset produces garbled output; what could be the reason?

----------- Configuration Arguments -----------
alpha: 1.2
beam_size: 10
beta: 0.35
cutoff_prob: 1.0
cutoff_top_n: 40
decoder: ctc_beam_search
is_long_audio: False
lang_model_path: D:\dnf�\zh_giga.no_cna_cmn.prune01244.klm
model_dir: D:\dnf\PPASR\infer\deepspeech2\infer
real_time_demo: False
to_an: True
use_gpu: True
use_model: deepspeech2
vocab_path: D:\dnf\PPASR\dataset\zh_vocab.txt
wav_path: C:\Users\qiegewala\Music\A2_2.wav

[4234, 5841, 1048, 3128, 4782, 2775, 4081, 3728, 2775, 5412, 5065, 3134, 1792, 3134, 2951, 1566, 1458, 1566, 1792, 1566, 5167, 1930, 3465, 5412, 1566, 4012, 2951, 5168, 1566, 2951, 5250, 48, 2290, 48, 2951, 5168, 1566, 5168, 5250, 2951, 2290, 2951, 5168, 769, 5168, 1458, 2951, 5177, 4497, 1566, 5658, 2760, 337, 3128, 2760]
消耗时间:1741ms, 识别结果: 冼悕仍肪霈烯葳酌烯袢嚟婶曰婶怂旗朴旗曰旗埵柯谌袢旗呻怂阊旗怂踟电盼电怂阊旗阊踟怂盼怂阊引阊朴怂徂馊旗跱趴星肪趴, 得分: 0

windows10, paddlepaddle-gpu==2.1.3 cudatoolkit=10.2, PaddlePaddle 2.2.0

MFCC dimensionality

I noticed the code uses 128 dimensions: mfccs = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=128, n_fft=512, hop_length=128).astype("float32"), but most implementations use 13 dimensions. Where does this 128 come from, 128 triangular filters? What is the intent?
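For reference, a minimal runnable sketch of the call under discussion, on synthetic audio (n_mfcc simply controls how many MFCC coefficients librosa returns per frame):

import numpy as np
import librosa

wav = np.random.randn(16000).astype("float32")  # 1 s of synthetic audio at 16 kHz
mfcc_128 = librosa.feature.mfcc(y=wav, sr=16000, n_mfcc=128, n_fft=512, hop_length=128)
mfcc_13 = librosa.feature.mfcc(y=wav, sr=16000, n_mfcc=13, n_fft=512, hop_length=128)
print(mfcc_128.shape, mfcc_13.shape)  # (128, num_frames) vs (13, num_frames)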

TypeError: object of type 'NoneType' has no len()

Running create_data.py reports an error:
Traceback (most recent call last):
  File "create_data.py", line 39, in <module>
    max_test_manifest=args.max_test_manifest)
  File "/usr/local/PPASR/ppasr/trainer.py", line 138, in create_data
    num_workers=self.num_workers)
  File "/usr/local/PPASR/ppasr/utils/utils.py", line 196, in compute_mean_std
    num_workers=num_workers)
  File "/usr/local/PPASR/ppasr/data_utils/normalizer.py", line 40, in __init__
    self._compute_mean_std(manifest_path, num_samples, num_workers)
  File "/usr/local/PPASR/ppasr/data_utils/normalizer.py", line 94, in _compute_mean_std
    for i in range(len(means)):

Decompression fails after downloading the dataset

Hi, I set filepath to the absolute path of the dataset and then ran aishell.py, which fails with the error EOFError: Compressed file ended before the end-of-stream marker was reached. What is the problem?

Input shape problem of the final linear layer during training

AssertionError: Variable Shape not match, Variable [ linear_0.w_0_moment1_0 ] need tensor with shape (1024, 563) but load set tensor with shape (1024, 564)
After create_data.py the vocab size is 563, but the very first epoch of training fails with the error above. Digging further,
in the _check_match(key, param) method in python3.7/site-packages/paddle/fluid/dygraph/layers.py, state_dict.get(key, None) has shape 564 when key is output.bias or output.weight, but the last few params in for key, param in self.state_dict().items(): still have the vocab shape 563, which causes the conflict.
Could you please take a look? Much appreciated.

Training loss is nan?

[2021-07-14 02:09:40.075967] Train epoch: 0, batch: 750/34799, loss: 6.70559, learning rate: 0.0001, train time: 4.319s
完成第750的保存
[2021-07-14 02:10:20.735872] Train epoch: 0, batch: 760/34799, loss: 6.64842, learning rate: 0.0001, train time: 4.218s
完成第760的保存
[2021-07-14 02:11:05.361007] Train epoch: 0, batch: 770/34799, loss: 6.76281, learning rate: 0.0001, train time: 6.372s
完成第770的保存
[2021-07-14 02:11:53.009762] Train epoch: 0, batch: 780/34799, loss: 6.61592, learning rate: 0.0001, train time: 3.155s
完成第780的保存
[2021-07-14 02:12:37.086634] Train epoch: 0, batch: 790/34799, loss: nan, learning rate: 0.0001, train time: 3.768s
完成第790的保存
[2021-07-14 02:13:21.365857] Train epoch: 0, batch: 800/34799, loss: nan, learning rate: 0.0001, train time: 3.402s
完成第800的保存
[2021-07-14 02:14:00.991322] Train epoch: 0, batch: 810/34799, loss: nan, learning rate: 0.0001, train time: 3.514s
完成第810的保存
[2021-07-14 02:14:50.522749] Train epoch: 0, batch: 820/34799, loss: nan, learning rate: 0.0001, train time: 3.567s
完成第820的保存
[2021-07-14 02:15:37.339313] Train epoch: 0, batch: 830/34799, loss: nan, learning rate: 0.0001, train time: 3.885s
How to deal with it?

Error when running the code on AI Studio

from ppasr.trainer import PPASRTrainer

trainer = PPASRTrainer(mean_std_path="dataset/mean_std.npz",
                       train_manifest="dataset/manifest.train",
                       test_manifest="dataset/manifest.test",
                       dataset_vocab="dataset/vocabulary.txt",
                       num_workers=2)
trainer.create_data(annotation_path="dataset/annotation/")

The error message is as follows:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_17791/1498183113.py in <module>
      7                        num_workers=2)
      8 
----> 9 trainer.create_data(annotation_path="dataset/annotation/")

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ppasr/trainer.py in create_data(self, annotation_path, noise_manifest_path, noise_path, num_samples, count_threshold, is_change_frame_rate, max_test_manifest)
    108                         test_manifest_path=self.test_manifest,
    109                         is_change_frame_rate=is_change_frame_rate,
--> 110                         max_test_manifest=max_test_manifest)
    111         print('=' * 70)
    112         print('开始生成噪声数据列表...')

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ppasr/utils/utils.py in create_manifest(annotation_path, train_manifest_path, test_manifest_path, is_change_frame_rate, max_test_manifest)
     59             # 重新调整音频格式并保存
     60             if is_change_frame_rate:
---> 61                 change_rate(audio_path)
     62             # 获取音频长度
     63             audio_data, samplerate = soundfile.read(audio_path)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ppasr/utils/utils.py in change_rate(audio_path)
    103 # 改变音频采样率为16000Hz
    104 def change_rate(audio_path):
--> 105     data, sr = soundfile.read(audio_path)
    106     if sr != 16000:
    107         data = librosa.resample(data, sr, target_sr=16000)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py in read(file, frames, start, stop, dtype, always_2d, fill_value, out, samplerate, channels, format, subtype, endian, closefd)
    255     """
    256     with SoundFile(file, 'r', samplerate, channels,
--> 257                    subtype, endian, format, closefd) as f:
    258         frames = f._prepare_read(start, stop, frames)
    259         data = f.read(frames, dtype, always_2d, fill_value, out)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py in __init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
    627         self._info = _create_info_struct(file, mode, samplerate, channels,
    628                                          format, subtype, endian)
--> 629         self._file = self._open(file, mode_int, closefd)
    630         if set(mode).issuperset('r+') and self.seekable():
    631             # Move write position to 0 (like in Python file objects)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py in _open(self, file, mode_int, closefd)
   1182             raise TypeError("Invalid file: {0!r}".format(self.name))
   1183         _error_check(_snd.sf_error(file_ptr),
-> 1184                      "Error opening {0!r}: ".format(self.name))
   1185         if mode_int == _snd.SFM_WRITE:
   1186             # Due to a bug in libsndfile version <= 1.0.25, frames != 0

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py in _error_check(err, prefix)
   1355     if err != 0:
   1356         err_str = _snd.sf_error_number(err)
-> 1357         raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
   1358 
   1359 

RuntimeError: Error opening 'aset/audio/data_aishell/wav/test/S0764/BAC009S0764W0201.wav': System error.

How does speech synthesis use the free audio under speaker_audio?

"Put the audio of the desired speakers into the tools/generate_audio/speaker_audio directory. You can use the dataset/test.wav file, or find audio from several speakers and put it into tools/generate_audio/speaker_audio. Developers can also try putting their own audio into this directory, so the trained model will recognize the developer's voice better. The sample rate should preferably be 16000 Hz."

Boss, I don't see any code that uses the audio under the speaker_audio directory. The speakers of the synthesized audio all seem to come from models/fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt in the downloaded model. Is this still to be developed, or did I misread? Please clarify, thanks!
