
Comments (7)

Evanston0624 commented on June 10, 2024

Thanks to nl8590687 for the advice and pointers on building an ASR service. As for TF-Lite: as long as the model can be converted, hardware acceleration is available on supported Android versions. I believe the same applies to Core ML (I have not looked into that part yet).

I have implemented inference on Android with an ASRT-trained model (converted to TF-Lite).

https://github.com/Evanston0624/ASRT_model_Android/tree/main
The README will be added later; the main code is at:
https://github.com/Evanston0624/ASRT_model_Android/tree/main/app/src/main/java/com/example/myapplication

The repository above mainly implements:

  1. Data preprocessing (loading the audio format > spectrogram > padding)
  2. Loading the model
  3. Invoking the model to get its output
  4. A hand-written ctc_decode (this differs from the Keras ctc_decoder that ASRT calls, but my tests on small samples gave correct results; a sketch of the idea follows the notes below)

**For small samples, the outputs of the first three stages match the values produced in Python.
**The buffer-based data flow may still have issues, and the input validation is probably incomplete.
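
For step 4, the core idea of my ctc_decode can be sketched in Python as a plain greedy (best-path) CTC decoder: take the argmax at each frame, collapse repeated tokens, and drop the blank. This is only a simplified sketch, not the exact Java code in the repository; the blank is assumed to be the last class index, which matches the Keras convention ASRT uses.

import numpy as np

def ctc_greedy_decode(probs, blank_index=None):
    # probs: (time_steps, num_classes) per-frame class probabilities
    # blank_index: CTC blank; assumed to be the last class here
    if blank_index is None:
        blank_index = probs.shape[-1] - 1
    best_path = np.argmax(probs, axis=-1)  # most likely class per frame
    decoded = []
    prev = None
    for token in best_path:
        if token != prev and token != blank_index:
            decoded.append(int(token))  # collapse repeats, drop blanks
        prev = token
    return decoded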

Not yet implemented:

  • Converting phonemes into words
  • Runtime performance testing

The code for converting the ASRT-trained model to TF-Lite will be posted later; a rough sketch of the conversion is below.
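
The conversion itself is essentially the standard TensorFlow Lite converter applied to the loaded Keras model. A minimal sketch, assuming you can obtain the underlying tf.keras.Model from the ASRT objects (the exact attribute depends on the ASRT version, so it is left as a parameter here):

import tensorflow as tf

def convert_to_tflite(keras_model, output_path='speech_model251bn.tflite'):
    # keras_model: the tf.keras.Model built by SpeechModel251BN, with weights loaded
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    # converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
    tflite_model = converter.convert()
    with open(output_path, 'wb') as f:
        f.write(tflite_model)
    return output_path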


Evanston0624 commented on June 10, 2024

Addendum:
The same model predicts correctly when run with predict_speech_file.py:

import os

from speech_model import ModelSpeech
from model_zoo.speech_model.keras_backend import SpeechModel251BN
from speech_features import Spectrogram
from language_model3 import ModelLanguage

os.environ["CUDA_VISIBLE_DEVICES"] = ""

AUDIO_LENGTH = 1600
AUDIO_FEATURE_LENGTH = 200
CHANNELS = 1
# The default pinyin output size is 1428, i.e. 1427 pinyin tokens + 1 blank; this model uses 1431
OUTPUT_SIZE = 1431
sm251bn = SpeechModel251BN(
    input_shape=(AUDIO_LENGTH, AUDIO_FEATURE_LENGTH, CHANNELS),
    output_size=OUTPUT_SIZE
)
feat = Spectrogram()
ms = ModelSpeech(sm251bn, feat, max_label_length=64)
now_path = os.path.abspath(os.getcwd())

ms.load_model(now_path+'/save_models/SpeechModel251bn_cv/' + 'SpeechModel251bn_epoch40.model.base.h5')

res = ms.recognize_speech_from_file('test1.wav')
print('*[Info] Acoustic model speech recognition result:\n', res)


Evanston0624 commented on June 10, 2024

Update:
Reading the audio file with ASRT's original read_wav_data (from utils.ops) works:

def load_audio(audio_path):
    from utils.ops import read_wav_data
    wav_signal, sample_rate, _, _ = read_wav_data(audio_path)
    return wav_signal, sample_rate

For the spectrogram conversion I have switched back to the original run method of the Spectrogram class:

# load audio
from speech_features import Spectrogram
data_pre = Spectrogram()
# Load the audio data directly from the file and convert it to the required format
audio_path = 'test1.wav'  # replace with the path to your audio file
wav_signal, sample_rate = load_audio(audio_path)

# audio preprocessing
# audio_features = data_pre.onnx_run(wavsignal=wav_signal, fs=sample_rate)
audio_features = data_pre.run(wavsignal=wav_signal, fs=sample_rate)
# adaptive_padding is my own helper (sketched below), not part of ASRT
audio_features = adaptive_padding(input_data=audio_features, target_length=1600)
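
adaptive_padding pads the feature matrix along the time axis up to the model's fixed input length of 1600 frames. Roughly, it looks like this (a simplified sketch; the real helper may differ in details):

import numpy as np

def adaptive_padding(input_data, target_length=1600):
    # input_data: (time_steps, feature_dim) spectrogram features
    # Zero-pad along the time axis, or truncate, to exactly target_length frames
    current_length = input_data.shape[0]
    if current_length >= target_length:
        return input_data[:target_length]
    pad_width = target_length - current_length
    return np.pad(input_data, ((0, pad_width), (0, 0)), mode='constant')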


nl8590687 commented on June 10, 2024

I don't recommend running it directly on the phone: both the compute performance and the installation and configuration of the dependency environment get fairly complicated. The best approach is to deploy the model on a server and have the phone call it through an API. For a detailed walkthrough, see the related articles on the AI柠檬 blog.
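
To illustrate that server-plus-API pattern, the client side can be as simple as posting the audio to an HTTP endpoint. The URL and payload fields below are placeholders, not ASRT's actual API contract; check the ASRT server documentation for the real request format.

import base64
import json
import wave

import requests

def recognize_via_api(wav_path, api_url='http://your-server:20001/recognize'):
    # Hypothetical endpoint and payload, for illustration only
    with wave.open(wav_path, 'rb') as wav:
        payload = {
            'sample_rate': wav.getframerate(),
            'channels': wav.getnchannels(),
            'byte_width': wav.getsampwidth(),
            'samples': base64.b64encode(wav.readframes(wav.getnframes())).decode('ascii'),
        }
    response = requests.post(api_url, data=json.dumps(payload),
                             headers={'Content-Type': 'application/json'})
    return response.json()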


nl8590687 commented on June 10, 2024

If you really do want to deploy on the phone, that works too, but then you will need to reimplement the inference yourself with a framework supported by the target platform.


Evanston0624 commented on June 10, 2024

Hello. Our server's response time under multi-user calls was not as good as expected. I later wrote my own multi-process program that uses TCP and UDP sockets to handle registration and transfer the voice packets, but with groups of concurrent users the response time still fell short of expectations. (The problem may well lie in our hardware configuration, network, etc.)

That is why I want to do on-device inference with ONNX and TF-Lite. I just tested it and it can already produce results; I will post the code later (the Python test code). The app itself will probably be developed in Java, so the work breaks down roughly as follows:

  • Reading the audio file
  • Converting it to a spectrogram
  • TF-Lite model inference
  • ONNX model inference
  • CTC decoding

I think that as long as the differences between the data formats on the Python side and on the Java side are pinned down, it should run correctly.
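
For the TF-Lite inference step listed above, the Python-side test essentially comes down to the standard tf.lite.Interpreter flow. A minimal sketch, assuming a single input tensor of shape (1, 1600, 200, 1) and a single output, matching the shapes used earlier in this thread:

import tensorflow as tf

def tflite_infer(tflite_path, audio_features):
    # audio_features: (1600, 200) padded spectrogram, as produced above
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Reshape to (batch, time, features, channels) and match the expected dtype
    x = audio_features.reshape(1, 1600, 200, 1).astype(input_details[0]['dtype'])
    interpreter.set_tensor(input_details[0]['index'], x)
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]['index'])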


nl8590687 commented on June 10, 2024

A single process has only one computation-graph resource, so slow responses under concurrent multi-user calls are entirely normal. What you need is multi-instance cluster deployment with load balancing, not just a different communication protocol. Deploying AI models is inherently resource-intensive.
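
To make the multi-instance idea concrete: run several independent copies of the recognition service, each with its own loaded model, and spread requests across them. In practice a load balancer such as nginx usually sits in front of the instances; purely as an illustration of the dispatch logic, with placeholder instance URLs:

import itertools

import requests

# Each URL is a separately running recognition service with its own model copy
INSTANCE_URLS = [
    'http://10.0.0.1:20001/recognize',
    'http://10.0.0.2:20001/recognize',
    'http://10.0.0.3:20001/recognize',
]
_round_robin = itertools.cycle(INSTANCE_URLS)

def dispatch(payload):
    # Send each recognition request to the next instance in round-robin order
    url = next(_round_robin)
    return requests.post(url, json=payload, timeout=10).json()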

