
Vosk-Browser

A somewhat opinionated speech recognition library for the browser using a WebAssembly build of Vosk

This library picks up the work done by Denis Treskunov and packages an updated Vosk WebAssembly build as an easy-to-use browser library.

Note: WebAssembly builds can target NodeJS, the browser's main thread or web workers. This library explicitly compiles Vosk to be used in a WebWorker context. If you want to use Vosk in a NodeJS application it is recommended to use the official node bindings.

Live Demo

Check out the demo, which runs in-browser speech recognition on microphone input or audio files in 13 languages.

Installation

You can install vosk-browser as a module:

$ npm i vosk-browser

You can also use a CDN like jsdelivr to add the library to your page, which will be accessible via the global variable Vosk:

<script type="application/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/vosk.js"></script>

Usage

See the README in ./lib for API reference documentation, or check out the examples folder for some ways of using the library.

Basic example

This is one of the simplest examples; it assumes vosk-browser is loaded via a script tag. It loads the model named model.tar.gz located in the same path as the script and starts listening to the microphone. Recognition results are logged to the console.

async function init() {
    const model = await Vosk.createModel('model.tar.gz');

    const recognizer = new model.KaldiRecognizer();
    recognizer.on("result", (message) => {
        console.log(`Result: ${message.result.text}`);
    });
    recognizer.on("partialresult", (message) => {
        console.log(`Partial result: ${message.result.partial}`);
    });
    
    const mediaStream = await navigator.mediaDevices.getUserMedia({
        video: false,
        audio: {
            echoCancellation: true,
            noiseSuppression: true,
            channelCount: 1,
            sampleRate: 16000
        },
    });
    
    const audioContext = new AudioContext();
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
    recognizerNode.onaudioprocess = (event) => {
        try {
            recognizer.acceptWaveform(event.inputBuffer);
        } catch (error) {
            console.error('acceptWaveform failed', error);
        }
    };
    const source = audioContext.createMediaStreamSource(mediaStream);
    source.connect(recognizerNode);
}

window.onload = init;
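Note that createScriptProcessor is deprecated in favour of AudioWorkletNode. Below is a hedged sketch of equivalent wiring, not an official recipe: the processor file name is arbitrary, and acceptWaveformFloat(samples, sampleRate), which takes raw Float32 samples, should be verified against the API of the vosk-browser version you use.

```javascript
// recognizer-processor.js -- AudioWorklet module running on the audio thread
class RecognizerProcessor extends AudioWorkletProcessor {
    process(inputs) {
        const samples = inputs[0][0]; // mono input, Float32Array of 128 samples
        if (samples) {
            // Copy before posting: the underlying buffer is reused each call
            this.port.postMessage(samples.slice(0));
        }
        return true; // keep the processor alive
    }
}
registerProcessor('recognizer-processor', RecognizerProcessor);

// main thread -- forward samples from the worklet to the recognizer
// (assumes `recognizer` and `mediaStream` created as in the example above)
async function startWorklet(recognizer, mediaStream) {
    const audioContext = new AudioContext();
    await audioContext.audioWorklet.addModule('recognizer-processor.js');
    const workletNode = new AudioWorkletNode(audioContext, 'recognizer-processor');
    workletNode.port.onmessage = (event) => {
        // assumed API: raw Float32 samples plus their actual sample rate
        recognizer.acceptWaveformFloat(event.data, audioContext.sampleRate);
    };
    audioContext.createMediaStreamSource(mediaStream).connect(workletNode);
}
```

The two halves live in separate files: the class runs on the audio rendering thread, while the message handler runs on the main thread and forwards chunks to the recognizer's worker.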

Todos

  • Write tests
  • Automate npm publish
  • Automate demo publishing
  • Example with speaker model
  • Better documentation

vosk-browser's People

Contributors

ccoreilly, clement-mim, dnaaun, stevennyman, xadh00m, yahweasel, zavalyshyn


vosk-browser's Issues

Not working

I have downloaded the zip file from GitHub. The examples are not working, either from local files or from the Glitch demo.

Online demo created

Not sure if this is of any use, but I created a small online demo using this tool when I was experimenting with it. You can view it online at

https://captioner.richardson.co.nz/

And the source code for it is at: https://github.com/Rodeoclash/captioner

It might be possible to adapt this for an official demo if you're interested (although it is lacking a few things at the moment, i.e. it only works on video and currently the videos have no audio when playing).

Unable to load model

Hi,

Thanks for this work. I am using Chrome. The model file model.tar.gz is placed in the same folder. It never gets past the "Loading..." message!

How do you start the demo locally?

I navigated to the modern-vanilla directory and launched python3 -m http.server.

Can you share the output of the browser console?

ERROR (VoskAPI:Model():src/model.cc:122) Folder '/vosk/model_tar_gz' does not contain model files. Make sure you specified the model path properly in Model constructor. If you are not sure about relative path, use absolute path specification.
put_char @ 82049aad-16de-4cf3-9fcf-0c277f01fe02:41

Module not found: Error: Can't resolve 'worker_threads' in

I used vosk-browser in one of my webpack projects and it throws this error:
Module not found: Error: Can't resolve 'worker_threads' in ....

Steps I followed:

  • npm run rollup
  • and then in the lib npm pack .
  • copied it to my webpack project and did a webpack start.

Can't run

I keep getting "invalid base URL" when I try to load any of the example scripts.

Webpage is not loading

I have little coding experience, but I followed all the guidelines to launch the demo app from the examples/react folder. I ran npm install, npm build and a few other commands to resolve errors for webpack 5. However, when I finally ran npm run start, the vosk-browser demo failed to launch even though no errors were detected. The page is empty.

C:\Users\CNata\Downloads\vosk-browser-master\examples\react>npm run start

> [email protected] start
> react-scripts start

(node:17120) [DEP_WEBPACK_DEV_SERVER_ON_AFTER_SETUP_MIDDLEWARE] DeprecationWarning: 'onAfterSetupMiddleware' option is deprecated. Please use the 'setupMiddlewares' option.
(Use `node --trace-deprecation ...` to show where the warning was created)
(node:17120) [DEP_WEBPACK_DEV_SERVER_ON_BEFORE_SETUP_MIDDLEWARE] DeprecationWarning: 'onBeforeSetupMiddleware' option is deprecated. Please use the 'setupMiddlewares' option.
Starting the development server...
Compiled successfully!

You can now view vosk-browser-react-demo in the browser.

  Local:            http://localhost:3000/vosk-browser
  On Your Network:  http://192.168.56.1:3000/vosk-browser

Note that the development build is not optimized.
To create a production build, use npm run build.

webpack compiled successfully
Files successfully emitted, waiting for typecheck results...
Issues checking in progress...
No issues found.

Improve Error Handling for processAudioChunk and createModel

Hi,

First of all thanks for this wonderful package I really enjoy using it and find it super useful.

I'm working on a project which utilizes vosk-browser, and noticed that the KaldiRecognizer.on method only supports two event types, result and partialresult, as part of the TS definitions.

While browsing the worker.ts code I noticed that the processAudioChunk method also handles an error.
This error is then caught (worker.ts:72) and emitted back to the model.

I want to subscribe to these errors, but the current KaldiRecognizer.on interface only allows result and partialresult as input.

This is a small change and I can contribute myself if you would allow me to do so.

Add ServerMessageError to the RecognizerMessage type:

export type RecognizerMessage =
  | ServerMessagePartialResult
  | ServerMessageResult
  | ServerMessageError;

Update:
Another issue: when I tried to reproduce an error during processAudioChunk execution, the result of this method is an error object { error: errorMessage }. I believe it should throw an error instead of returning an object, so that the catch in the handleMessage method builds the correct error object. The returned error object doesn't contain an event field, so it is dispatched from the model as undefined and can't be caught by consumers of the npm module.

this.dispatchEvent(new CustomEvent(message.event, { detail: message }));

This is also the case for the createModel method which doesn't resolve the promise in case of fetch failure.
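The proposed change can be sketched in isolation. The function bodies below are illustrative stand-ins, not the actual worker.ts source: the point is that throwing inside processAudioChunk lets a single catch in handleMessage attach a well-defined event field to every failure.

```javascript
// Illustrative stand-in for the worker's chunk handler: throw on failure
// instead of returning { error }, so all failures funnel through one catch.
function processAudioChunk(chunk) {
    if (!chunk || chunk.length === 0) {
        throw new Error('empty audio chunk'); // was: return { error: 'empty audio chunk' }
    }
    return { event: 'result', result: { text: '...' } };
}

function handleMessage(chunk) {
    try {
        return processAudioChunk(chunk);
    } catch (error) {
        // Every failure path now yields a message with a defined event field,
        // so dispatchEvent(new CustomEvent(message.event, ...)) can route it.
        return { event: 'error', error: error.message };
    }
}
```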

Regards,
Barak
Software Engineer @ Microsoft

Running the project prompts: "You can run npm install --save worker_threads"

Please help me. Using vosk-browser produces this error:

Module not found: Can't resolve imported dependency "worker_threads"
Did you forget to install it? You can run: npm install --save worker_threads

App • WARNING • Compilation succeeded but there are warning(s). Please check the log above.

Suggestion: Editing text, interactive clickable transcript, export functions

Hello!

Just wanted to say that I love what you're doing here! ❤️ The demo is amazing, and I can't wait to see how this project pans out.

A while ago, I proposed creating a FLOSS version of Otter.ai and Sonix over at Open Source Ideas:
open-source-ideas/ideas#288

I'm not sure if this is what you're envisioning for this project, but it would be interesting to have the ability to play back the audio and have the playback timed with the transcript. Clicking on a word could also toggle the playback to that point. (see Demo #6 of AblePlayer)
Additionally, the ability to edit and export the text would be helpful for people who use transcriptions (for closed captioning, research, journalism, meeting minutes, etc.)

How to reduce delay between results?

Hi, I’ve noticed Vosk waits a split second after the user is done talking before emitting a result, unlike partial results, which fire continuously but aren’t as reliable. Since our application only responds to short, single-word commands, we’d like to reduce Vosk’s result “de-bounce” time to make our application feel more responsive. Do you have any suggestions?

How can I add a new language?

I have downloaded a Ukrainian model and repackaged the archive as tar.gz, but I couldn't load it in the browser. I looked into the other archives and noticed they have a file of unknown type corresponding to each folder in the archive. Is there a way to add a new language?
Thank you.

Recognizer listens before the event 'result' or 'partialresult' is added

Hello!
If I say "Hello" and then run the code below, I get the result "Hello".

this.recognizer.addEventListener('partialresult', this.getPartialResult);
this.recognizer.addEventListener('result', this.getResult);

Expected: the recognizer starts listening only once the event listener is added.

I am creating a feature in which users press and speak.

When the user doesn't need the microphone, I thought to write this, but then the recognizer pauses:

this.mediaStream.getAudioTracks().forEach(track => {
    track.enabled = false;
});

So if, after a long time, the user presses my button again, the code will run track.enabled = true and the recognizer will continue recognizing the previous (stale) audio.
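One hedged alternative for this press-to-talk pattern (a sketch, not the library's prescribed approach): instead of toggling track.enabled, disconnect the source node from the processing node so no audio reaches the recognizer at all while the button is released.

```javascript
// Hypothetical press-to-talk wiring: audio only flows to the recognizer's
// processing node while the button is held. Disconnecting (rather than
// disabling the track) means no stale chunks are queued for later.
function startListening(source, recognizerNode) {
    source.connect(recognizerNode);    // chunks start reaching the recognizer
}

function stopListening(source, recognizerNode) {
    source.disconnect(recognizerNode); // chunks stop immediately
}
```

Here `source` is the MediaStreamAudioSourceNode and `recognizerNode` the processing node from the basic example; AudioNode.disconnect accepts a specific destination, so other connections on the source are untouched.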

Tested on Vue.js

model folder problem, and code modifications

Hi.

While running the code at https://github.com/ccoreilly/vosk-browser/tree/gh-pages using Firefox and a local server, the index.html file would not run. I inspected the source code: the relative model path given was "vosk-browser\models\vosk-model-small-en-us-0.15.tar.gz", which I corrected to "models\vosk-model-small-en-us-0.15.tar.gz".

With the corrected path, index.html ran and generated speech-to-text from the given audio file. (Direct audio through the mic still didn't work: the mic was active, but all it did was send audio chunks and never display results in the textarea.)

If I need to modify the code, I am unable to do that. How do I modify it? Can you provide the source code?

thanks.

Failed to sync file system: Error: FS error

I am getting the following error in both Chrome and Firefox...

Failed to sync file system: Error: FS error
(anonymous) @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:127
Promise.catch (async)
handleMessage @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:126
(anonymous) @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:107

fcedf841-34f4-40cb-8bb0-17f857a1d44c:127 links to the following code:

    class RecognizerWorker {
        constructor() {
            this.recognizers = new Map();
            ctx.addEventListener("message", (event) => this.handleMessage(event));
        }
        handleMessage(event) {
            const message = event.data;
            if (!message) {
                return;
            }
            if (ClientMessage.isLoadMessage(message)) {
                console.debug(JSON.stringify(message));
                const { modelUrl } = message;
                if (!modelUrl) {
                    ctx.postMessage({
                        error: "Missing modelUrl parameter",
                    });
                }
                this.load(modelUrl)
                    .then((result) => {
                    ctx.postMessage({ event: "load", result });
                })
                    .catch((error) => {                                                       // --- IT'S THIS ERROR THAT IS CATCHING  
                    console.error(error);
                    ctx.postMessage({ error: error.message });
                });
                return;

... etc

Do let me know if more details to reproduce the error are needed.

Thank you!!

Build output location

I am able to get the build to complete (when using the modification made in #56), but I cannot find the output files. I run the build by running make in the vosk-browser directory. Where does the build output its files? Do the output files need to be manually extracted from the Docker container?

Delays when transcribing streaming audio

First of all, excellent work. Vosk is great as it is, and this library makes it even better.

I am experiencing a heavy delay on transcription when pulling in a stream from webRTC (partials and fulls).

I suspect maybe it is because of the deprecated "createScriptProcessor" and "onaudioprocess" pieces, but I am unsure.

Here is how I am processing things. If you have any ideas as to why things would be delayed, please let me know. Thank you.

this.recognizeSpeech = async () => {
    console.log("starting recognizeSpeech");
    let audioContext = this.remoteAudioContext;
    let remoteStream = this.incomingAudioStream;
    //
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
    const model = await createModel("./softphone/model.tar.gz");
    const recognizer = new model.KaldiRecognizer(48000);
    recognizer.setWords(true);
    recognizer.on("partialresult", function (message) {
      console.log("PARTIAL: " + message.result.partial);
    });
    recognizerNode.onaudioprocess = async (event) => {
      try {
        recognizer.acceptWaveform(event.inputBuffer);
      } catch (error) {
        console.error("acceptWaveform failed", error);
      }
    };
    this.remoteTrack.connect(recognizerNode).connect(audioContext.destination);
  };

Is it possible to extract text from a large file faster than playback?

This is an awesome project. I was wondering if it is possible to extract the text from a file in less time than the duration of the file? When testing the hosted demo, the text appears to be extracted at about the same rate as if it came from a live source.

There's so much that is new and unfamiliar to me that I'm having a hard time understanding the code. I created a gist that has my attempt to make it work. This is a gist of the beforeUpload function on the Upload component...it extracts text, but I can't quite tell if it's any faster.

Assuming the extraction can be done faster, any idea on how to get an approximate timestamp in the audio file?
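For batch transcription, one possibility is to decode the file once and feed the samples to the recognizer in a loop, so recognition runs as fast as the recognizer can process rather than at playback speed. A hedged sketch follows: chunkSamples is a hypothetical helper, and acceptWaveformFloat(samples, sampleRate), which takes raw Float32 samples, should be checked against your vosk-browser version's API.

```javascript
// Split decoded samples into fixed-size views for the recognizer.
function chunkSamples(samples, chunkSize) {
    const chunks = [];
    for (let i = 0; i < samples.length; i += chunkSize) {
        chunks.push(samples.subarray(i, i + chunkSize)); // view, no copy
    }
    return chunks;
}

// Browser-only usage sketch: decode the whole file, then push chunks
// as fast as the loop runs instead of waiting for real-time playback.
async function transcribeFile(recognizer, fileUrl) {
    const audioContext = new AudioContext();
    const arrayBuffer = await (await fetch(fileUrl)).arrayBuffer();
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
    for (const chunk of chunkSamples(audioBuffer.getChannelData(0), 4096)) {
        recognizer.acceptWaveformFloat(chunk, audioBuffer.sampleRate);
    }
}
```

Assuming word timings are enabled (e.g. via recognizer.setWords(true)), the result events should then carry approximate start/end offsets into the file.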

The ScriptProcessorNode is deprecated. Use AudioWorkletNode instead.

I get this error from the example when using Google Chrome.

async function init() {
    const model = await Vosk.createModel('vosk-model-small-ru-0.15.tar.gz');

    const recognizer = new model.KaldiRecognizer();
    recognizer.on("result", (message) => {
        console.log(`Result: ${message.result.text}`);
        TTS(message.result.text);
    });
    recognizer.on("partialresult", (message) => {
        console.log(`Partial result: ${message.result.partial}`);
    });

    const mediaStream = await navigator.mediaDevices.getUserMedia({
        video: false,
        audio: {
            echoCancellation: true,
            noiseSuppression: true,
            channelCount: 1,
            sampleRate: 16000
        },
    });

    const audioContext = new AudioContext();
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
    recognizerNode.onaudioprocess = (event) => {
        try {
            recognizer.acceptWaveform(event.inputBuffer);
        } catch (error) {
            console.error('acceptWaveform failed', error);
        }
    };
    const source = audioContext.createMediaStreamSource(mediaStream);
    source.connect(recognizerNode);
}

window.onload = init;

Models not loading (downloading) in Firefox 96

When load button is pushed, browser console gives the error:

Failed to sync file system: InvalidStateError: A mutation operation was attempted on a database that did not allow mutations. b28a5120-7502-42f6-9dd0-5e7a322b0752:117:21
    error blob:https://ccoreilly.github.io/b28a5120-7502-42f6-9dd0-5e7a322b0752:117
    handleMessage blob:https://ccoreilly.github.io/b28a5120-7502-42f6-9dd0-5e7a322b0752:166

and does not download the model. Tried for English and Catalan.

OS: Ubuntu 20.04

https://zlib.net/zlib-1.2.11.tar.gz is not found.

make builder is not running.

#23 [19/30] RUN curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz |     tar xz --strip-components=1
#23 sha256:466b94055ed6579c06b1fa768fc518b6b5ac4f293f9d91cbcb111eb106a2c522
#23 0.343   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#23 0.343                                  Dload  Upload   Total   Spent    Left  Speed
  0   315    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
#23 1.454 curl: (22) The requested URL returned error: 404
#23 1.457 
#23 1.457 gzip: stdin: unexpected end of file
#23 1.460 tar: Child returned status 1
#23 1.460 tar: Error is not recoverable: exiting now
#23 ERROR: executor failed running [/bin/sh -c curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz |     tar xz --strip-components=1]: exit code: 2
------
 > [19/30] RUN curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz |     tar xz --strip-components=1:
------
executor failed running [/bin/sh -c curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz |     tar xz --strip-components=1]: exit code: 2
make: *** [builder] Error 1

View timing of words/phonemes?

The type for ServerMessageResult hints at timing information being available from Vosk for words.

export interface ServerMessageResult {
    event: "result";
    recognizerId: string;
    result: {
        result: Array<{      // This is maybe where I could find word timing.
            conf: number;
            start: number;
            end: number;
            word: string;
        }>;
        text: string;
    };
}

...but the message received in the result callback doesn't have a result.result.* value.

Is there some way to get the timing info? I would do wonderful things with it.

Big fan of vosk-web. Thanks, Ciaran!
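One hedged note: Vosk generally only emits per-word entries when word output is enabled on the recognizer, e.g. via setWords(true) as used in the streaming example earlier in this page; otherwise result.result is absent. A small sketch with a formatting helper (the helper is illustrative, not part of the library):

```javascript
// Format the per-word entries from a "result" message. Returns an empty
// array when word output was not enabled (result.result is then absent).
function formatWordTimings(message) {
    return (message.result.result ?? []).map(
        (w) => `${w.word}: ${w.start}s-${w.end}s (conf ${w.conf})`
    );
}

// Usage sketch, assuming `model` and `sampleRate` from the earlier examples:
// const recognizer = new model.KaldiRecognizer(sampleRate);
// recognizer.setWords(true);  // ask Vosk to emit per-word conf/start/end
// recognizer.on('result', (m) => console.log(formatWordTimings(m)));
```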

Can't build

Hello, I am trying to reproduce the work done by arbdevml for issue #49.

I forked the repository. I have Docker and make installed.

I reproduced the described steps:

  • in the vosk-browser folder, run make builder

I have the following output :

#7 [ 4/30] RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi . &&     git checkout 6417ac1dece94783e80dfbac0148604685d27579
#7 sha256:d72b762a9137ae3da9126377d52f4ac1e5fb4134afc31851f4a093636254bbdc
#7 0.474 Cloning into '.'...
#7 19.32 fatal: reference is not a tree: 6417ac1dece94783e80dfbac0148604685d27579
#7 ERROR: executor failed running [/bin/sh -c git clone -b vosk --single-branch https://github.com/alphacep/kaldi . &&     git checkout 6417ac1dece94783e80dfbac0148604685d27579]: exit code: 128
------
 > [ 4/30] RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi . &&     git checkout 6417ac1dece94783e80dfbac0148604685d27579:
------
executor failed running [/bin/sh -c git clone -b vosk --single-branch https://github.com/alphacep/kaldi . &&     git checkout 6417ac1dece94783e80dfbac0148604685d27579]: exit code: 128

I think the kaldi project was updated and the git hash does not exist anymore.

I checked the rest of the Dockerfile and saw that a clone of an Inria repository is needed, but this repository does not seem to be accessible?

Can someone help me? I really want to help, and to have the voice fingerprinting feature with the spk model that I have already experimented with in the Python distribution.

Unable to load model in nodejs

When I run the following code:

let Vosk = require("vosk-browser");
let url = "model.tar.gz";
async function init() {
  const model = await Vosk.createModel(url);
}
init();

I get this error:

this.worker.addEventListener("message", (event) => this.handleMessage(event));
                   ^

TypeError: this.worker.addEventListener is not a function
    at EventTarget.initialize (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:238:25)
    at new Model (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:235:18)
    at Object.<anonymous> (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:354:27)
    at Generator.next (<anonymous>)
    at /Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:28:75
    at new Promise (<anonymous>)
    at __awaiter (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:24:16)
    at Object.createModel (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:353:16)
    at init (/Users/bobby/Desktop/vosk-browser/index.js:5:28)
    at Object.<anonymous> (/Users/bobby/Desktop/vosk-browser/index.js:7:1)

My folder structure is

|
|-- index.js
|-- model.tar.gz
|-- node_modules/

So I would think the program could load the model, but I also get the same error when I set the url to complete gibberish.

Thanks for your help

Cannot build vosk-browser

When I run make:

#7 [ 4/30] RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi . &&     git checkout 6417ac1dece94783e80dfbac0148604685d27579
#7 sha256:d72b762a9137ae3da9126377d52f4ac1e5fb4134afc31851f4a093636254bbdc
#7 0.455 Cloning into '.'...
Updating files: 100% (8265/8265), done.
#7 15.75 fatal: reference is not a tree: 6417ac1dece94783e80dfbac0148604685d27579
#7 ERROR: executor failed running [/bin/sh -c git clone -b vosk --single-branch https://github.com/alphacep/kaldi . &&     git checkout 6417ac1dece94783e80dfbac0148604685d27579]: exit code: 128

I think commit 6417ac1dece94783e80dfbac0148604685d27579 was removed.

recognizer.on result never gets called.

I used the react example code, and it never calls this piece of code. Is there a parameter to set to enable collecting results?

recognizer.on("result", (message: any) => {
      const { result } = message;
      setUtterances((utt: VoskResult[]) => [...utt, result]);
    });

vosk-browser/tree/master/examples/modern-vanilla

Hi Ciaran, great work, the web demo is very impressive !

Is tar.gz the correct file format that this library expects? I cannot get the basic demo above to load. Does it need something else, like an extracted version of the language model?

I spent hours trying to figure out what is wrong; it seems to have issues loading the model.

This is the error I get when trying to run it in Chrome (also tried Opera and Firefox):

cb15bbf8-7209-4922-9c12-0b9e258dbd24:127 Error: HTTP error! status: 404
    at cb15bbf8-7209-4922-9c12-0b9e258dbd24:41:4212557
    at Generator.next ()
    at loop (cb15bbf8-7209-4922-9c12-0b9e258dbd24:41:4211624)
    at cb15bbf8-7209-4922-9c12-0b9e258dbd24:41:4211805

The directory I run it in (a Linux web server running Apache) has this file (English):
vosk-model-small-en-us-0.15.tar.gz

and I updated the index.js demo to call it:
from:
const model = await Vosk.createModel('model.tar.gz');
to
const model = await Vosk.createModel('vosk-model-small-en-us-0.15.tar.gz');

I am really excited about getting this to work, so would really appreciate your help with any basic demo anyone could run locally.

thanks!
Emerson

Vosk model

I am new to JavaScript and wanted to see how the vosk-browser script works using the sample script.
I downloaded a Vosk model, zipped it as tar.gz and put it in the same folder as the script. I tried to check for errors using a button onclick event on an HTML page. I got this in Visual Studio Code:
Setting up persistent storage at /vosk null/4ccd8af6-9ac1-407c-9f6a-436d83146d69:147
File system synced from host to runtime null/4ccd8af6-9ac1-407c-9f6a-436d83146d69:40
Am I supposed to create a folder named "vosk"? I really do not understand.
Thank you for responding.

16kHz sample rate does not work

From the examples, it looks like the required sample rate is 44.1kHz or 48kHz (they both seem to generate accurate transcriptions, not sure which one is better). I tried setting 16kHz for the microphone, audio context, and recognizer, but the transcriptions were not valid at all. I thought the models work with 16kHz; is there a reason why this sample rate doesn't work? The poster of #48 mentioned having to update from 16k to 48kHz in order for the basic example to work.
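A hedged guess at what is happening here: the getUserMedia sampleRate constraint is only a hint, and the AudioContext typically runs at the hardware rate (44100 or 48000) regardless, so declaring 16 kHz to the recognizer while feeding it hardware-rate samples would garble the transcription. A sketch of matching the two, assuming `model` from the basic example:

```javascript
// The AudioContext usually runs at the device's native rate, whatever
// sampleRate constraint was passed to getUserMedia.
const audioContext = new AudioContext();
console.log(audioContext.sampleRate); // typically 44100 or 48000

// Declare the rate the audio actually has and let Vosk resample internally
// (assumes `model` was created as in the basic example).
const recognizer = new model.KaldiRecognizer(audioContext.sampleRate);
```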

Attribution difficult

The NOTICES file doesn't include all dependent software, but every piece of dependent software requires attribution. This makes it extremely difficult for anyone to put together a correct (and legally mandatory) attribution and license notice. I put this one together, which I believe includes all dependencies: https://raw.githubusercontent.com/Yahweasel/ennuicastr/master/src/vosk-browser-license.js .

Moreover, I was surprised to find GSL in the mix. GSL is under the GPL (not the LGPL), so if it's being used, then vosk-browser as a whole is licensed under the GPL. That's no problem for my use, but it should be documented somewhere. Weirdly, though, as far as I can tell, it's not actually using GSL. The kaldi patch seems to add GSL to the configure, but doesn't add any uses of GSL as far as I can tell. If it was some experiment (perhaps from the original porter of vosk?) it should just be removed, to fix this licensing snafu.

Recognizer.removeEventListener

I am currently using Vue.js to run vosk-browser and managed to load the ASR model and Kaldi recognizer by using:

this.recognizer.on("result", (message) => {
    const result = message.result;
    this.full.textContent += result.text + " "
})

The model is working well, however, I am trying to remove the event listener by using:

this.recognizer.removeEventListener("result", (message) => {
    const result = message.result;
    this.full.textContent += result.text + " "
})

Is this the way of doing it?
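For what it's worth, this is standard EventTarget behaviour rather than anything vosk-browser specific: removeEventListener only removes a listener when given the same function reference that was registered, and an inline arrow function creates a new reference each time, so the call above removes nothing. A minimal sketch with a plain EventTarget:

```javascript
// removeEventListener matches listeners by identity, so keep a reference.
const target = new EventTarget();
let calls = 0;
const handler = () => { calls += 1; };

target.addEventListener('result', handler);
target.dispatchEvent(new Event('result'));     // handler runs once
target.removeEventListener('result', handler); // same reference: removed
target.dispatchEvent(new Event('result'));     // handler no longer runs

// A fresh arrow function is a different object, so this removes nothing:
target.removeEventListener('result', () => { calls += 1; }); // no-op
```

Applied to the recognizer: store the handler in a variable (or component property) when subscribing, and pass that same variable to removeEventListener later.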

Result event not triggered on file upload

Hello, I am working on a way to pass an audio file to the recognizer all at once.

I took the react example and edited file-upload.tsx to send the whole file as a buffer to the AudioStreamer "_write" method.
The problem is that the "result" event of the recognizer is not fired after processing.
The "partialresult" event is called with every word but misses timestamps.

Here is the implementation of the "onChange" function in file-upload.tsx:

const onChange = useCallback(
    async ({ file }: UploadChangeParam<UploadFile<any>>) => {

      if (
        recognizer &&
        file.originFileObj &&
        file.percent === 100
      ) {
        const fileUrl = URL.createObjectURL(file.originFileObj);
        const _audioContext = audioContext ?? new AudioContext();
        const arr = await fetch(fileUrl).then((res) => res.arrayBuffer());

        _audioContext.decodeAudioData(arr, (buffer) => {
          let audioStreamer = new AudioStreamer(recognizer);
          audioStreamer._write(buffer, {
            objectMode: true,
          }, () => {
            console.log('done')
          });
        });
      }
    },
    [audioContext, recognizer]
  );

I have also noticed that when uploading a second file it works well: the result event is triggered and includes data from both files.

What am I missing? Is there a way to dispatch a "result" event?

The model isn't loading

I see there is a model.tar.gz in the public folder of the react example. I want to use it for testing purposes, but the model isn't loading.


Build broken by kaldi repo

The kaldi repo no longer has an upstream-1.8.0 branch nor a revision 75ecaef39 (thanks, git, for allowing erasing history). Right now, vosk-browser doesn't build because of these issues.

Recognizer not ready, ignoring (Browser testing)

Hello team!

I am testing your software using apache2 on a virtual machine running Ubuntu Server on Windows.
This is the index page; I was trying to test the microphone input.
<script type="application/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/vosk.js"></script>

<script>
async function init() {
    const model = await Vosk.createModel('https://ccoreilly.github.io/vosk-browser/models/vosk-model-small-en-us-0.15.tar.gz');

    const recognizer = new model.KaldiRecognizer();
    recognizer.on("result", (message) => {
        console.log(`Result: ${message.result.text}`);
    });
    recognizer.on("partialresult", (message) => {
        console.log(`Partial result: ${message.result.partial}`);
    });

    const mediaStream = await navigator.mediaDevices.getUserMedia({
        video: false,
        audio: {
            echoCancellation: true,
            noiseSuppression: true,
            channelCount: 1,
            sampleRate: 16000
        },
    });

    const audioContext = new AudioContext();
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
    recognizerNode.onaudioprocess = (event) => {
        try {
            recognizer.acceptWaveform(event.inputBuffer);
        } catch (error) {
            console.error('acceptWaveform failed', error);
        }
    };
    const source = audioContext.createMediaStreamSource(mediaStream);
    source.connect(recognizerNode);
}

window.onload = init;
</script>

Hola!

The result on console is the repetition of the following lines:

Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): Sending audioChunk vosk.js:333:29
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): process audio chunk with sampleRate 192000 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:269:25
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): process audio chunk with sampleRate 192000 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:269:25
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): Sending audioChunk vosk.js:333:29
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): process audio chunk with sampleRate 192000 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:269:25
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29

Could you please help me?
Thanks in advance

AudioWorklet support via SEPIA Web Audio?

Hi everybody,

I just saw this project and thought it was very interesting and fits quite well to a library I've just released 🙂 .
For my SEPIA Open Assistant project I've built the SEPIA Web Audio Library that can handle custom audio pipelines with AudioWorklet and Web-Worker support. There is pretty good WASM support as well since the resampler for example can use Speex via a WASM module.

The library has a module that interfaces with Vosk via the SEPIA STT-Server (a WebSocket streaming STT server). Currently I prefer to host Vosk on a Raspberry Pi 4 instead of running it on the client, but I'm pretty sure much of the code could be reused 😃 .

Let me know if this sounds interesting to you and I can help to get started!

How does it work

Could you explain how the examples demo works? Please give me some advice, thank you.

information available in the User Agent string will be reduced

A page or script is accessing at least one of navigator.userAgent, navigator.appVersion, and navigator.platform. Starting in Chrome 101, the amount of information available in the User Agent string will be reduced.
To fix this issue, replace the usage of navigator.userAgent, navigator.appVersion, and navigator.platform with feature detection, progressive enhancement, or migrate to navigator.userAgentData.
Note that for performance reasons, only the first access to one of the properties is shown
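
As the warning suggests, UA sniffing can be replaced with feature detection. A minimal sketch, assuming the page only needs to know whether WebAssembly and microphone capture are available (the `canRunVosk` helper and its injected `global` parameter are hypothetical, used here so the check stays testable outside a browser):

```javascript
// Feature detection instead of parsing navigator.userAgent:
// check for the capabilities the app actually needs.
function canRunVosk(global) {
    const nav = global.navigator;
    return typeof global.WebAssembly === "object" &&
        !!(nav && nav.mediaDevices && nav.mediaDevices.getUserMedia);
}
```

In a page you would call it as `canRunVosk(window)`.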

Can't build example/react

First thanks for an amazing contribution.

Second, when trying to build (npm run build), I get an error that recognizer.tsx can't find vosk-browser. I went to node_modules/vosk-browser and ran npm build, which solved the first issue but led to others.

Any ideas?

Thanks again!

Chrome support

Hello,
I want to understand why it does not work properly on Chrome and what I can do to fix or work around it.

How to create an example of the X-vector of the speaker (voice fingerprint)?

Hello.
First of all very big thank you for this project.

I am trying to create an example with a speaker model
to get the X-vector of the speaker (voice fingerprint).

I am using this example: https://github.com/ccoreilly/vosk-browser/blob/master/examples/words-vanilla/index.js

const model = await Vosk.createModel('vosk-model-small-en-in-0.4.tar.gz');
const speakerModel = await Vosk.createSpeakerModel('vosk-model-spk-0.4.zip');

...

const recognizer = new model.KaldiRecognizer(sampleRate, JSON.stringify(['[unk]', 'encen el llum', 'apaga el llum']));
recognizer.setSpkModel(speakerModel);
recognizer.on("result", (message) => {
	const result = message.result;
	if(result.hasOwnProperty('spk'))
		console.info("X-vector:", result.spk);
});

Speaker identification model:
https://alphacephei.com/vosk/models/vosk-model-spk-0.4.zip

Node.js example:
https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/test_speaker.js

Could you offer some advice, please:

  1. How to load vosk-model-spk-0.4.zip?
  2. How to implement the createSpeakerModel and setSpkModel methods?
  3. How to fetch the X-vector of the speaker (voice fingerprint)?

Thank you for your answer.
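
Whatever API ends up producing the x-vector, two speaker fingerprints are typically compared by cosine similarity, as the Node.js demo linked above does. A small sketch of that comparison, independent of the Vosk loading API:

```javascript
// Cosine similarity between two x-vectors (plain numeric arrays).
// Values close to 1 suggest the same speaker; values near 0 suggest
// different speakers (real systems threshold on cosine distance).
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For example, `cosineSimilarity(result.spk, enrolledSpk)` would score a recognized x-vector against a previously stored one.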
