asr_syllable's Introduction

ASR_Syllable

=======================Research on CNN-Based Acoustic Models for Speech Recognition========================

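This project summarizes what I learned about DCNN-CTC between my first year and the first semester of my second year of graduate study. It proposes the MCNN-CTC and DenseNet-CTC acoustic models. The final experimental results are shown below: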

1) Thchs30_TrainingResults

Training and fine-tuning curves on Thchs30

2) Thchs30_Results

Experimental results on Thchs30

3) Stcmds_Results

Experimental results on Stcmds

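Acoustic Models

1) DCNN-CTC acoustic model

This model is a modification of speech_model-05, which builds a speech recognition acoustic model with DCNN-CTC. The model for the STcmds dataset was adapted from it in the same way. The results are shown in the figures above.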
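
As a rough illustration of how a CTC-trained acoustic model's per-frame outputs can be turned into a label sequence, here is a minimal sketch of greedy (best-path) CTC decoding. It is not taken from speech_model-05: the function name `ctc_greedy_decode`, the toy probabilities, and the choice of the last index as the blank label are all assumptions made for the example.

```python
def ctc_greedy_decode(frame_probs, blank):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the most probable label index for every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    prev = None
    for label in best:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Toy example: 3 classes, index 2 is assumed to be the blank label.
frame_probs = [
    [0.8, 0.1, 0.1],  # label 0
    [0.7, 0.2, 0.1],  # label 0 (repeat, merged)
    [0.1, 0.1, 0.8],  # blank
    [0.6, 0.3, 0.1],  # label 0 (new, separated by a blank)
    [0.2, 0.7, 0.1],  # label 1
    [0.1, 0.8, 0.1],  # label 1 (repeat, merged)
    [0.1, 0.1, 0.8],  # blank
]
print(ctc_greedy_decode(frame_probs, blank=2))
```

The blank between the second and fourth frames is what lets the same label appear twice in a row in the output.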

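2) MCNN-CTC acoustic model

The experiments for this model are run with the speech_model_10 script; the results appear in figure 2) above. Overall, MCNN-CTC performs better than DCNN-CTC.

3) DenseNet-CTC acoustic model

The experiments for this model are run with DenseNet. On the Thchs30 dataset it reaches a CER of roughly 30%; you can try the experiment yourself.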
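
For readers unfamiliar with the metric, an error rate such as CER is usually computed as the total edit distance between reference and hypothesis sequences divided by the total reference length. The sketch below shows that calculation under those assumptions; it is not the project's evaluation code, and the helper names and the pinyin tokens are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Total edit distance divided by total reference length."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_len = sum(len(r) for r in refs)
    return total_edits / total_len


# Hypothetical token sequences: one deletion and one substitution
# over six reference tokens.
refs = [["jin1", "tian1", "tian1", "qi4"], ["ni3", "hao3"]]
hyps = [["jin1", "tian1", "qi4"], ["ni3", "hao4"]]
print(error_rate(refs, hyps))
```

The same functions work on strings of characters or lists of syllables, so the unit of the rate depends only on how the sequences are tokenized.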

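4) Attention-CTC acoustic model

Building on DCNN-CTC, this model applies attention at the fully connected layer. The results may improve on DCNN-CTC; see the speech_model_06 script for details. The core of the implementation is as follows: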
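NN(Attention)-CTC:
# Assumes the Keras functional API; `reshape` is taken to be the reshaped output of the convolutional layers.
from keras.layers import Dense, BatchNormalization, Dropout, multiply

dense1 = Dense(units=512, activation='relu', use_bias=True, kernel_initializer='he_normal')(reshape)
# Softmax over the 512 features gives one attention weight per feature.
attention_prob = Dense(units=512, activation='softmax', name='attention_vec')(dense1)
# Re-weight the features element-wise by their attention weights.
attention_mul = multiply([dense1, attention_prob])

dense1 = BatchNormalization(epsilon=0.0002)(attention_mul)
dense1 = Dropout(0.3)(dense1)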
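
To make the arithmetic of the fragment above concrete, the following dependency-free sketch reproduces the gating step for a single frame: a linear layer produces scores, a softmax turns them into weights, and the features are multiplied element-wise by those weights. The function names, the 3-dimensional features, and the identity weight matrix are assumptions chosen to keep the example small; in the model the weights are learned and the feature size is 512.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_gate(features, weight, bias):
    """Gate one frame's features with softmax attention weights."""
    # Linear layer: one score per output feature.
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weight, bias)]
    probs = softmax(scores)
    # Element-wise re-weighting, as in multiply([dense1, attention_prob]).
    gated = [f * p for f, p in zip(features, probs)]
    return gated, probs


# Hypothetical values: identity weights and zero bias, so scores == features.
features = [1.0, 2.0, 3.0]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
gated, probs = attention_gate(features, weight, bias)
print(probs)
print(gated)
```

Because the softmax weights sum to 1 across the features, the gated activations are smaller in magnitude than the inputs, which may be one reason the original fragment follows the multiplication with batch normalization.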

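Transfer Learning

Retraining fine-tunes the initial model further and can improve its accuracy. See the train_modelSpeech script for the training code. Here all network layers are fine-tuned, and the results improve on the initial model; see figure 1) for details.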

Citation

W Zhang, M H Zhai, Z L Huang, et al. Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks[C]. https://doi.org/10.1007/978-3-030-27529-7_29

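Related Projects and Links

Personal blog: summaries of my recent studies
Reference link
ASR_WORD: builds a speech recognition acoustic model using characters as the modeling unit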

asr_syllable's Issues

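About the language model

Hello, and thank you for your research. Have you considered converting pinyin to Chinese characters? I looked at that part of the code, and the results are not very good. I am currently working on acoustic modeling with initials and finals as the basic units, and I am not sure how to approach the speech-to-character step.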
