ws_speech_server

Overview

This is a WebSocket server app that provides access to speech synthesis and speech recognition services.

It is mainly a helper for sip-lab, letting it use speech synthesis/recognition engines such as Google TTS/STT, Whisper, etc. during tests.

At the moment, only the following engines are supported: 'dtmf-ss', 'dtmf-sr', 'bfsk-ss', 'bfsk-sr', 'google-ss' and 'google-sr'.

(ss = speech synth, sr = speech recog)
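Because the suffix of each engine name tells which kind of command it serves, a client can check a name before sending it. The following is a minimal TypeScript sketch, assuming only the list and the suffix convention stated above; the helper names engineKind and isUsableFor are hypothetical and not part of ws_speech_server. Note that the command examples further down use names such as "dtmf-gen" and "dtmf-det", which do not follow this convention and would be rejected by this check.

```typescript
// Hypothetical client-side helper (not part of ws_speech_server).
// Convention from the README: "-ss" = speech synth, "-sr" = speech recog.
type EngineKind = "synth" | "recog";

// Engine names listed in the Overview as currently supported.
const SUPPORTED_ENGINES: readonly string[] = [
  "dtmf-ss", "dtmf-sr",
  "bfsk-ss", "bfsk-sr",
  "google-ss", "google-sr",
];

// Returns the kind implied by the name's suffix, or null when the
// name does not end in "-ss" or "-sr".
function engineKind(engine: string): EngineKind | null {
  const suffix = engine.slice(-3);
  if (suffix === "-ss") return "synth";
  if (suffix === "-sr") return "recog";
  return null;
}

// True when the engine is in the supported list AND its suffix matches
// the kind of command that will use it (synth engines for
// start_speech_synth, recog engines for start_speech_recog).
function isUsableFor(engine: string, kind: EngineKind): boolean {
  return SUPPORTED_ENGINES.indexOf(engine) !== -1 && engineKind(engine) === kind;
}
```

Deriving the kind from the suffix keeps the list as the single source of truth: adding an engine only means adding its name.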

Build

npm i
npm run build

If the build fails with something like:

$ npm run build

> [email protected] build
> npx rescript build                                                                       

rescript: [1/2] src/SpeechAgent.cmj
FAILED: src/SpeechAgent.cmj
                                             
  We've found a bug for you!
  /root/tmp/ws_speech_server/src/SpeechAgent.res:2:6-9
                                             
  1 │ open Types                     
  2 │ open Nact                        
  3 │ //open Commands                                                                      
  4 │ open Synther                  
                                                                                           
  The module or file Nact can't be found.
  - If it's a third-party dependency:                                                      
    - Did you list it in bsconfig.json?                                                    
    - Did you run `rescript build` instead of `rescript build -with-deps`
      (latter builds third-parties)?
  - Did you include the file's directory in bsconfig.json?
                                             
FAILED: cannot make progress due to previous errors.

clean and rebuild:

npm run clean
npm run build

Starting

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials/file
cp config/default.js.sample config/default.js # adjust if necessary
node src/App.bs.js

Commands

ws_speech_server accepts the following commands, sent as JSON messages over the WebSocket connection:

start_speech_synth
start_speech_recog
stop_speech_synth
stop_speech_recog

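Examples (shown as JavaScript object literals with comments; the messages actually sent are their JSON serialization):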

{
  cmd: "start_speech_synth",
  args: {
    sampleRate: 8000, // 8000 | 16000 | 32000 | 44100 | 48000
    engine: "dtmf-gen", // dtmf-gen | gss
    voice: "dtmf",
    language: "dtmf",
    text: '1234',
    times: 1, // number of times the text should be played
  }
}

{
  cmd: "start_speech_recog",
  args: {
    sampleRate: 8000, // 8000 | 16000 | 32000 | 44100 | 48000
    engine: "dtmf-det", // dtmf-det | gsr
    language: "dtmf",
  }
}
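Since the examples above are object literals, a client has to serialize them with JSON.stringify before sending. The TypeScript sketch below types the two documented start commands using the field names and sample rates from the examples. It assumes the stop commands carry no args, because their payload is not shown here. encodeCommand is a hypothetical helper, and its check on times is a client-side sanity check, not documented server behavior.

```typescript
// Sketch of typed command messages; field names come from the examples.
type SampleRate = 8000 | 16000 | 32000 | 44100 | 48000;

interface SynthArgs {
  sampleRate: SampleRate;
  engine: string;   // engine name, as in the examples
  voice: string;
  language: string;
  text: string;
  times: number;    // number of times the text should be played
}

interface RecogArgs {
  sampleRate: SampleRate;
  engine: string;
  language: string;
}

type Command =
  | { cmd: "start_speech_synth"; args: SynthArgs }
  | { cmd: "start_speech_recog"; args: RecogArgs }
  | { cmd: "stop_speech_synth" }   // assumption: no args
  | { cmd: "stop_speech_recog" };  // assumption: no args

// Hypothetical helper: serialize a command into the JSON text to send
// on the WebSocket. Rejects a non-positive or fractional `times`
// (a client-side check only, not necessarily enforced by the server).
function encodeCommand(c: Command): string {
  if (
    c.cmd === "start_speech_synth" &&
    (Math.floor(c.args.times) !== c.args.times || c.args.times < 1)
  ) {
    throw new Error("times must be a positive integer");
  }
  return JSON.stringify(c);
}

// Usage: the recognition example above, ready to pass to ws.send().
const example = encodeCommand({
  cmd: "start_speech_recog",
  args: { sampleRate: 8000, engine: "dtmf-det", language: "dtmf" },
});
```

Using a union type for sampleRate lets the compiler reject unsupported rates at build time instead of leaving the server to handle them.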

Messages

ws_speech_server sends the following event messages over the WebSocket connection:

synth_complete: sent when the audio output started by start_speech_synth reaches its end
speech: sent when the recognition started by start_speech_recog detects speech

Examples:

{"evt": "synth_complete"}

{"evt": "speech", "data": {"transcript":"abcd","timestamp":0.46}}
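On the client side, incoming frames can be decoded into a small tagged union. This is a minimal TypeScript sketch (parseEvent is a hypothetical name) that accepts only the two events documented above and returns null for anything else, including malformed JSON. The meaning and units of timestamp are not documented here, so the value is passed through unchanged.

```typescript
// Sketch of a parser for the documented server events.
interface SpeechData {
  transcript: string;
  timestamp: number; // passed through as-is; units not documented
}

type ServerEvent =
  | { evt: "synth_complete" }
  | { evt: "speech"; data: SpeechData };

// Hypothetical helper: returns a typed event, or null when the frame is
// not valid JSON or is not one of the two documented events.
function parseEvent(raw: string): ServerEvent | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch (e) {
    return null; // not valid JSON
  }
  if (typeof msg !== "object" || msg === null) return null;

  const m = msg as { evt?: unknown; data?: unknown };
  if (m.evt === "synth_complete") return { evt: "synth_complete" };

  if (m.evt === "speech" && typeof m.data === "object" && m.data !== null) {
    const d = m.data as { transcript?: unknown; timestamp?: unknown };
    if (typeof d.transcript === "string" && typeof d.timestamp === "number") {
      return { evt: "speech", data: { transcript: d.transcript, timestamp: d.timestamp } };
    }
  }
  return null; // unknown event or unexpected shape
}
```

Returning null instead of throwing lets a message loop simply skip events it does not understand, which is convenient if new event types are added later.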

Testing

See manual tests here

reason-nact

We use reason-nact (actually "rescript-nact"). However, it cannot be used with the latest ReScript 11, so we will stay on ReScript 9.

This means we cannot use more recent modules that require ReScript 11, such as https://github.com/glennsl/rescript-json-combinators.
