
Comments (21)

ccoreilly commented on June 12, 2024

Hi @msqr1! Great initiative :) Software evolves and needs to be maintained. I do not have time to dedicate to this repository, so it is good that better alternatives emerge and gain traction.

I'll have a deeper look at your work later this week. In the end, users decide based on the developer experience and the features of these libraries, so I'd be interested in what other users like @Yahweasel or @erikh2000 think.

from vosk-browser.

Yahweasel commented on June 12, 2024

The core thing I need out of vosk-browser is to not have an AudioContext-level API. I do all of my own audio capturing and ten other layers of processing. Further, although in my own project I do use threads, so SharedArrayBuffer is a nonissue, it's valuable to have a version that runs synchronously, because some users (including myself) manage their own threads. I would rather have a vosk running synchronously with a Worker thread I created on my own than running asynchronously with a Worker thread created by a library. To excessively toot my own horn, my own libav.js allows the user to load it in a synchronous mode, a worker mode, or a threaded mode, and provides the same API in all three.

Basically: I wouldn't mind a more up-to-date vosk adapter, but as stands, your API is too opinionated for me.

msqr1 commented on June 12, 2024

You're right, I try to make this as easy to use as possible: just some minimal setup and you can start recognizing. I agree that more features should be added, but as this is the first version, I want to make it as fast and easy to set up as possible. Other use cases can be addressed later.

erikh2000 commented on June 12, 2024

@msqr1 I'm interested in your project, but I'm likely to stick with vosk-browser out of inertia and not having any complaints with it. The main thing I saw in Vosklet that I'd like to see in vosk-browser, if practical, is more of the Vosk functions exposed. I had told myself that at some point I'd get vosk-browser building and try to contribute that myself, but I never got around to it.

The faster processing time is intriguing too. What kind of metrics are you seeing?

msqr1 commented on June 12, 2024

I didn't really measure it, ngl, so maybe I should remove that line. But I moved hot computations such as free and input-data mapping to C++, I use a simpler mechanism to communicate between JS and C++, I used the faster new Emscripten WasmFS and the new emmalloc, and I turned on O3, LTO, SIMD, non-trapping float-to-int and many more... As such, I think it should be faster. You're right, I shouldn't claim anything without benchmarks.

erikh2000 commented on June 12, 2024

No worries, @msqr1. I don't expect you to be super-scientific in your claims. I was just curious about what kind of speed increase you might be seeing. Your changes for performance seem promising.

Yahweasel commented on June 12, 2024

FYI, simd will do not a damned thing (other than make it not work on Safari) unless the code is specifically written to use it. wasm simd is broadly compatible with x86 simd, but only the C API, and nobody uses the C API. I would be stunned to learn that that's gaining you anything. I had a simd version of libav.js for years and finally ditched it because it wasn't actually beneficial.

msqr1 commented on June 12, 2024

Well, the thing is kaldi just refuses to compile with simd off, so I have to turn it on. It may or may not do anything though.

Yahweasel commented on June 12, 2024

Oh, well that's just lovely X-D

msqr1 commented on June 12, 2024

Just curious, how do you use a speech recognition library with your libav project? Isn't that for audio formats?

Yahweasel commented on June 12, 2024

I do not. I use both in Ennuicastr.

msqr1 commented on June 12, 2024

I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.

Yahweasel commented on June 12, 2024

> I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.

We're on an issue submitted to a synchronous version of the same API ;)

msqr1 commented on June 12, 2024

I can't see how the recognizer can be synchronous. It can't block the one thread that is controlling it.
Can I take a look at the issue? Maybe there is something I can do. Keep in mind that even if the recognizer is asynchronous, you can bind event listeners to it and call setXXX on it synchronously. The only synchronous part is the recognition process itself:

Yahweasel commented on June 12, 2024

The API of Vosk just takes a chunk at a time. That API is synchronous.
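
For illustration, a chunk-at-a-time synchronous interface of the kind described here might look like the sketch below. All names (`SyncRecognizer`, `FakeRecognizer`, `feedChunks`) are hypothetical and are not the vosk-browser or Vosklet API; the stand-in recognizer only counts samples so that the example runs without a model.

```typescript
// Hypothetical interface: each call takes one chunk and returns before the next.
interface SyncRecognizer {
  // Returns true once the recognizer considers an utterance complete.
  acceptWaveform(chunk: Float32Array): boolean;
  // Returns the text of the completed utterance.
  result(): string;
}

// Stand-in for a real recognizer (assumption): it "completes an utterance"
// after a fixed number of samples and does no actual speech recognition.
class FakeRecognizer implements SyncRecognizer {
  private seen = 0;
  private samplesPerUtterance: number;

  constructor(samplesPerUtterance: number) {
    this.samplesPerUtterance = samplesPerUtterance;
  }

  acceptWaveform(chunk: Float32Array): boolean {
    this.seen += chunk.length;
    return this.seen >= this.samplesPerUtterance;
  }

  result(): string {
    const text = `utterance of ${this.seen} samples`;
    this.seen = 0; // start counting the next utterance
    return text;
  }
}

// The caller owns the loop, so "stopping" is simply not feeding more chunks.
function feedChunks(rec: SyncRecognizer, chunks: Float32Array[]): string[] {
  const results: string[] = [];
  for (const chunk of chunks) {
    if (rec.acceptWaveform(chunk)) {
      results.push(rec.result());
    }
  }
  return results;
}
```

Because each call blocks only for the duration of one chunk, the calling thread regains control between chunks and can do other work or stop feeding audio.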

msqr1 commented on June 12, 2024

I get it, but wouldn't that block itself from other actions? I can surely add an acceptWaveformSync() that recognizes (and blocks) on the same thread and returns the result. Will that fit your use case? Ngl, a fully synchronous API is even easier than the current one. I only need to translate it over without managing task queues and other stuff.

Yahweasel commented on June 12, 2024

My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats. For some people, that's presumably very useful. For me, that's actively unhelpful.

Also, to be clear: you should not be writing your code to fit my use case if that doesn't help you in any way. I'm perfectly happy with vosk-browser, and have no urgent need for a more updated version, though as a general principle I'd like for things to be up to date. I'm only presenting my case on this thread because I was asked to.
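
As a rough sketch of the arrangement described above, a chain of synchronous stages that each take a Float32Array and return a Float32Array could be composed as follows. The stage names (`gain`, `clip`) and `runPipeline` are hypothetical placeholders and are not taken from Ennuicastr or vosk-browser; real stages would be echo cancellation, noise suppression, recognition, and so on.

```typescript
// A stage is a plain synchronous function: raw audio in, raw audio out.
type Stage = (input: Float32Array) => Float32Array;

// Placeholder stage (assumption): scale every sample by a constant factor.
const gain = (factor: number): Stage => (input) => input.map((s) => s * factor);

// Placeholder stage (assumption): clamp samples to the range [-1, 1].
const clip: Stage = (input) => input.map((s) => Math.max(-1, Math.min(1, s)));

// Run one chunk through every stage in order, on the calling thread.
// The caller decides which thread that is, e.g. a Worker it created itself.
function runPipeline(stages: Stage[], chunk: Float32Array): Float32Array {
  return stages.reduce((audio, stage) => stage(audio), chunk);
}
```

A call such as `runPipeline([gain(2), clip], chunk)` returns the processed chunk directly, with no promises or message passing, which is what makes a library-managed worker unnecessary in this design.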

msqr1 commented on June 12, 2024

> My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats.

No, I just wanted to find out how you use it, to see which use cases need a synchronous vosk, so thanks for the information! The above really helped me learn!

Yahweasel commented on June 12, 2024

I can be totally precise: https://github.com/ennuicastr/ennuicastr/blob/3b3830fc979b039c245429a5ec7657594af4a705/awp/ennuicastr-worker.ts#L786

There's my call to acceptWaveformFloat :)

msqr1 commented on June 12, 2024

I completely understand it now :)))))))

msqr1 commented on June 12, 2024

@ccoreilly did you go over it?
