
cognitive-services-speech-sdk-rs's Introduction

cognitive-services-speech-sdk-rs



Rust bindings for the Microsoft Cognitive Services Speech SDK. Provides a thin abstraction around the native C API and is heavily inspired by the official Go library. Provides speech-to-text, text-to-speech and Bot Framework dialog management capabilities.

Pull requests welcome!

Speech to text

use cognitive_services_speech_sdk_rs as msspeech;
use log::*;
use std::env;

async fn speech_to_text() {
    let filename = env::var("WAVFILENAME").unwrap();
    let audio_config = msspeech::audio::AudioConfig::from_wav_file_input(&filename).unwrap();

    let speech_config = msspeech::speech::SpeechConfig::from_subscription(
        env::var("MSSubscriptionKey").unwrap(),
        env::var("MSServiceRegion").unwrap(),
    )
    .unwrap();
    let mut speech_recognizer =
        msspeech::speech::SpeechRecognizer::from_config(speech_config, audio_config).unwrap();

    speech_recognizer
        .set_session_started_cb(|event| info!("set_session_started_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_session_stopped_cb(|event| info!("set_session_stopped_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_speech_start_detected_cb(|event| info!("set_speech_start_detected_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_speech_end_detected_cb(|event| info!("set_speech_end_detected_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_recognizing_cb(|event| info!("set_recognizing_cb {:?}", event.result.text))
        .unwrap();

    speech_recognizer
        .set_recognized_cb(|event| info!("set_recognized_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_canceled_cb(|event| info!("set_canceled_cb {:?}", event))
        .unwrap();

    let result = speech_recognizer.recognize_once_async().await.unwrap();
    info!("got recognition {:?}", result);
}
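A minimal driver for the example above might look as follows. This is a sketch that assumes the tokio and env_logger crates are added as dependencies; the example reads the WAVFILENAME, MSSubscriptionKey and MSServiceRegion environment variables, so export them before running.

#[tokio::main]
async fn main() {
    // Initialize logging so the info!/error! calls in the callbacks above produce output.
    env_logger::init();
    speech_to_text().await;
}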

Text to speech

use cognitive_services_speech_sdk_rs as msspeech;
use log::*;
use std::env;

async fn text_to_speech() {
    let pull_stream = msspeech::audio::PullAudioOutputStream::create_pull_stream().unwrap();
    let audio_config = msspeech::audio::AudioConfig::from_stream_output(&pull_stream).unwrap();

    let speech_config = msspeech::speech::SpeechConfig::from_subscription(
        env::var("MSSubscriptionKey").unwrap(),
        env::var("MSServiceRegion").unwrap(),
    )
    .unwrap();
    let mut speech_synthesizer =
        msspeech::speech::SpeechSynthesizer::from_config(speech_config, audio_config).unwrap();

    speech_synthesizer
        .set_synthesizer_started_cb(|event| info!("synthesizer_started_cb {:?}", event))
        .unwrap();

    speech_synthesizer
        .set_synthesizer_synthesizing_cb(|event| info!("synthesizer_synthesizing_cb {:?}", event))
        .unwrap();

    speech_synthesizer
        .set_synthesizer_completed_cb(|event| info!("synthesizer_completed_cb {:?}", event))
        .unwrap();

    speech_synthesizer
        .set_synthesizer_canceled_cb(|event| info!("synthesizer_canceled_cb {:?}", event))
        .unwrap();

    match speech_synthesizer.speak_text_async("Hello Rust!").await {
        Err(err) => error!("speak_text_async error {:?}", err),
        Ok(speech_audio_bytes) => {
            info!("speech_audio_bytes {:?}", speech_audio_bytes);
        }
    }
}
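To persist the synthesized audio, the value returned on the Ok branch can be written to disk. The sketch below assumes the result exposes the raw audio via an audio_data field (mirroring AudioData in the official Go SDK); verify the exact field or accessor name against the crate documentation.

use cognitive_services_speech_sdk_rs as msspeech;

async fn save_synthesized_audio(
    speech_synthesizer: &mut msspeech::speech::SpeechSynthesizer,
) -> std::io::Result<()> {
    if let Ok(result) = speech_synthesizer.speak_text_async("Hello Rust!").await {
        // Assumed field: raw audio bytes in the format configured on the synthesizer.
        std::fs::write("hello_rust.wav", &result.audio_data)?;
    }
    Ok(())
}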

For more, see the GitHub integration tests (tests folder) and samples (examples folder).

Build prerequisites

Builds are currently supported on Linux (see below for MacOS). The build uses Clang and the Microsoft Speech SDK shared libraries; details can be found here.

Install the following prerequisites before running cargo build:

sudo apt-get update 
sudo apt-get install clang build-essential libssl1.0.0 libasound2 wget

The build generates Rust bindings for the Speech SDK native functions. Prebuilt bindings are already included in the ffi/bindings.rs file, so in most cases it is not necessary to regenerate them. Set the following to skip bindings regeneration:

export MS_COG_SVC_SPEECH_SKIP_BINDGEN=1
cargo build

Added in this version

This version (0.2.0) brings the following goodies:

  • Build support for ARM architecture.
  • Upgrade of Microsoft Speech SDK version to 1.22.0.
  • Preview of Embedded Speech Config (details here). See also examples/recognizer/embedded_recognize_once_async_from_file. The EmbeddedSpeechConfig class is not yet available in the public release (there are no tutorials or docs on how to create embedded speech models for this API), but Microsoft will be revealing this information in the near future (initially for selected customers only). This will hopefully make it possible to run embedded speech models (possibly on ARM devices) in offline mode, enabling some very interesting applications of this library.

Version 0.2.1 adds, on top of that, support for building on MacOS (target architecture aarch64); see below.

Version 0.2.2 adds MacOS support for target architecture arm.

How To Build On MacOS

MacOS arm, aarch64 and x86_64 architectures are supported.

In order to build on MacOS, download the respective binaries of the MS Speech SDK (v1.23.0) from here. You can also download the latest MacOS Speech SDK from the Microsoft page, but that will be the latest version of the MS Speech SDK, which might not be tested against and might not work well with the current version of cognitive-services-speech-sdk-rs.

Once downloaded, extract the content of the zip file (subfolder MicrosoftCognitiveServicesSpeech.xcframework/macos-arm64_x86_64) into a dedicated folder, e.g. /Users/xxx/speechsdk. The content of the directory should look as follows:

➜  cd /Users/xxx/speechsdk 
➜  speechsdk ls -la
total 416
drwxr-xr-x   6 xxx  staff     192 Sep 17 19:55 .
drwxr-x---+ 66 xxx  staff    2112 Sep 17 23:21 ..
drwxr-xr-x   7 xxx  staff     224 Sep 17 17:15 MicrosoftCognitiveServicesSpeech.xcframework
-rw-r--r--   1 xxx  staff    1582 Jul 26 11:10 REDIST.txt
-rw-r--r--   1 xxx  staff  191072 Jul 26 11:10 ThirdPartyNotices.md
-rw-r--r--   1 xxx  staff   14893 Jul 26 11:10 license.md
➜  speechsdk

Run the following commands to build:

export MACOS_SPEECHSDK_ROOT=/Users/xxx/speechsdk
cargo build

The Speech SDK libraries are linked dynamically at build and run time. When running the application, use the following environment variable to point to the custom library location:

export DYLD_FALLBACK_FRAMEWORK_PATH=/Users/xxx/speechsdk/MicrosoftCognitiveServicesSpeech.xcframework/macos-arm64_x86_64

Then run your application utilizing cognitive-services-speech-sdk-rs, or one of the examples, e.g.:

cargo run --example recognizer

cognitive-services-speech-sdk-rs's People

Contributors

adambezecny, mzachar, nfmccrina


cognitive-services-speech-sdk-rs's Issues

Using async inside callbacks

Hi,

First of all, thanks for this package!

I'm still learning Rust, so sorry if these are silly questions.

  1. How do I execute async calls inside callbacks?

Currently, I achieve it using a new Tokio runtime:

speech_recognizer
    .set_recognized_cb(move |event| {
        println!("Event!");
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        let tx = tx.clone();

        rt.block_on(async move {
            println!("here 3 {:?}", event);
            if tx.send(event.result.text).await.is_err() {
                println!("tx not working!")
            }
        });
    })
    .unwrap();

Is there a better way to do it?
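One alternative, sketched below under the assumption that the recognized text is a String, is to use an unbounded Tokio channel: UnboundedSender::send is synchronous, so the callback does not need to build a nested runtime, and the async side simply awaits the receiver.

use tokio::sync::mpsc;

let (tx, mut rx) = mpsc::unbounded_channel::<String>();

speech_recognizer
    .set_recognized_cb(move |event| {
        // send() on an unbounded channel is synchronous and non-blocking.
        if tx.send(event.result.text).is_err() {
            println!("receiver dropped");
        }
    })
    .unwrap();

// Elsewhere, in async code:
while let Some(text) = rx.recv().await {
    println!("recognized: {}", text);
}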

  2. How can I continue listening to the microphone indefinitely?

Could a loop with sleep do it?

loop {
    tokio::time::sleep(Duration::from_millis(100)).await;
}
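For indefinite microphone recognition, a sleep loop like the one above does keep the task alive, but continuous recognition may read more clearly. The sketch below assumes the crate exposes start_continuous_recognition_async / stop_continuous_recognition_async (mirroring the Go SDK's StartContinuousRecognitionAsync); verify the exact method names against the crate documentation.

use cognitive_services_speech_sdk_rs as msspeech;
use std::time::Duration;

async fn recognize_for(
    speech_recognizer: &mut msspeech::speech::SpeechRecognizer,
    window: Duration,
) {
    // Assumed API: start continuous recognition; results arrive via the
    // callbacks registered on the recognizer.
    speech_recognizer
        .start_continuous_recognition_async()
        .await
        .unwrap();

    tokio::time::sleep(window).await;

    // Assumed API: stop continuous recognition.
    speech_recognizer
        .stop_continuous_recognition_async()
        .await
        .unwrap();
}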

Thanks for your time!

macOS support M1

I'm trying with macOS (M1)... and I get this error:

  cargo:rustc-link-search=framework=/Users/jorgeucano/speechsdk/MicrosoftCognitiveServicesSpeech.xcframework/macos-arm64_x86_64
  cargo:rustc-link-lib=framework=MicrosoftCognitiveServicesSpeech

  --- stderr
  c_api/wrapper.h:7:10: fatal error: 'speechapi_c.h' file not found
  c_api/wrapper.h:7:10: fatal error: 'speechapi_c.h' file not found, err: true
  thread 'main' panicked at 'Unable to generate bindings: ()', build.rs:155:10
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Do you know why I got this error?
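As noted above, MacOS (aarch64) build support was only added in version 0.2.1, which may explain this failure on older versions. With a version that supports MacOS, the "How To Build On MacOS" section applies: extract the Speech SDK and point the build at it so the build script can find speechapi_c.h, e.g.:

export MACOS_SPEECHSDK_ROOT=/Users/xxx/speechsdk
cargo build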
