ws_speech_server

Overview

This is a WebSocket server app that provides access to speech synthesis and speech recognition services.

It is mostly a helper for sip-lab, letting it use speech synthesis/recognition engines such as Google TTS/STT or Whisper during tests.

At the moment, only the following engines are supported: 'dtmf-ss', 'dtmf-sr', 'bfsk-ss', 'bfsk-sr', 'google-ss' and 'google-sr'.

(ss=speech-synth, sr=speech-recog)

Build

npm i
npm run build

If the build fails with something like:

$ npm run build

> [email protected] build
> npx rescript build

rescript: [1/2] src/SpeechAgent.cmj
FAILED: src/SpeechAgent.cmj

  We've found a bug for you!
  /root/tmp/ws_speech_server/src/SpeechAgent.res:2:6-9

  1 │ open Types
  2 │ open Nact
  3 │ //open Commands
  4 │ open Synther

  The module or file Nact can't be found.
  - If it's a third-party dependency:
    - Did you list it in bsconfig.json?
    - Did you run `rescript build` instead of `rescript build -with-deps`
      (latter builds third-parties)?
  - Did you include the file's directory in bsconfig.json?

FAILED: cannot make progress due to previous errors.

clean the build output and rebuild:

npm run clean
npm run build

Starting

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials/file
cp config/default.js.sample config/default.js # adjust if necessary
node src/App.bs.js

Commands

ws_speech_server accepts the following commands, sent as JSON over the WebSocket connection:

  • start_speech_synth
  • start_speech_recog
  • stop_speech_synth
  • stop_speech_recog

Examples:

{
  cmd: "start_speech_synth",
  args: {
    sampleRate: 8000, // 8000 | 16000 | 32000 | 44100 | 48000
    engine: "dtmf-gen", // dtmf-gen | gss
    voice: "dtmf",
    language: "dtmf",
    text: '1234',
    times: 1, // number of times the text should be played
  }
}

{
  cmd: "start_speech_recog",
  args: {
    sampleRate: 8000, // 8000 | 16000 | 32000 | 44100 | 48000
    engine: "dtmf-det", // dtmf-det | gsr
    language: "dtmf",
  }
}
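
The examples above are written as JavaScript object literals (unquoted keys, inline comments), so they need to be serialized before being sent as JSON. A minimal sketch of one way to do that follows. The helper name `buildCommand` and the client-side sample-rate check are illustrative assumptions, not part of ws_speech_server; the field names and values are copied from the examples above.

```javascript
// Hypothetical helper, not part of ws_speech_server.
// Sample rates are the ones listed in the example comments above.
const SAMPLE_RATES = [8000, 16000, 32000, 44100, 48000];

function buildCommand(cmd, args) {
  // Client-side sanity check (an assumption; the server may validate differently).
  if (!SAMPLE_RATES.includes(args.sampleRate)) {
    throw new RangeError(`unsupported sampleRate: ${args.sampleRate}`);
  }
  // JSON.stringify emits quoted keys and no comments, i.e. valid JSON.
  return JSON.stringify({ cmd, args });
}

const startSynth = buildCommand("start_speech_synth", {
  sampleRate: 8000,
  engine: "dtmf-gen",
  voice: "dtmf",
  language: "dtmf",
  text: "1234",
  times: 1,
});

console.log(startSynth);
```

The resulting string can then be passed to the `send` method of whichever WebSocket client is in use.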

Messages

ws_speech_server sends the following messages over the WebSocket connection:

  • synth_complete (when a start_speech_synth command reaches the end of its audio output)
  • speech (when a start_speech_recog command detects speech)

Examples:

{"evt": "synth_complete"}

{"evt": "speech", "data": {"transcript":"abcd","timestamp":0.46}}
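
On the client side, these messages can be dispatched on their `evt` field. The following is a rough sketch under the assumption that each message arrives as a single JSON text frame; `handleServerMessage` and the handler names are hypothetical and do not come from this project.

```javascript
// Hypothetical dispatcher, not part of ws_speech_server.
// `raw` is the payload of one WebSocket message (string or Buffer).
function handleServerMessage(raw, handlers) {
  const msg = JSON.parse(String(raw));
  switch (msg.evt) {
    case "synth_complete":
      handlers.onSynthComplete();
      return true;
    case "speech":
      // Field names follow the "speech" example above.
      handlers.onSpeech(msg.data.transcript, msg.data.timestamp);
      return true;
    default:
      // Unknown events are ignored in this sketch.
      return false;
  }
}

const received = [];
const handlers = {
  onSynthComplete: () => received.push("done"),
  onSpeech: (transcript, timestamp) => received.push(`${transcript}@${timestamp}`),
};

handleServerMessage('{"evt": "synth_complete"}', handlers);
handleServerMessage('{"evt": "speech", "data": {"transcript":"abcd","timestamp":0.46}}', handlers);
console.log(received);
```

Returning a boolean lets the caller log or count messages the sketch does not recognize instead of failing on them.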

Testing

See manual tests here

reason-nact

We use reason-nact (strictly speaking, "rescript-nact"). It cannot be used with the latest ReScript 11, so we are staying on ReScript 9.

This means we cannot use more recent modules that require ReScript 11, such as https://github.com/glennsl/rescript-json-combinators.
