Comments (21)
Hi @msqr1! Great initiative :) Software evolves and needs to be maintained. I do not have time to dedicate to this repository, so it is good that better alternatives emerge and gain traction.
I'll have a deeper look at your work later this week. In the end, users decide based on the developer experience and the features of these libraries, so I'd be interested in what other users like @Yahweasel or @erikh2000 think.
from vosk-browser.
The core thing I need out of vosk-browser is to not have an AudioContext-level API. I do all of my own audio capturing and ten other layers of processing. Further, although in my own project I do use threads, so SharedArrayBuffer is a nonissue, it's valuable to have a version that runs synchronously, because some users (including myself) manage their own threads. I would rather have a vosk running synchronously with a Worker thread I created on my own than running asynchronously with a Worker thread created by a library. To excessively toot my own horn, my own libav.js allows the user to load it in a synchronous mode, a worker mode, or a threaded mode, and provides the same API in all three.
Basically: I wouldn't mind a more up-to-date vosk adapter, but as stands, your API is too opinionated for me.
You're right, I try to make this as easy to use as possible: just some minimal setup and you can start recognizing. I agree that more features should be added, but as this is the first version, I want to make it as fast and easy to set up as possible. Other use cases can be addressed later.
@msqr1 I'm interested in your project, but I'm likely to stick with vosk-browser out of inertia and not having any complaints with it. The main thing I saw in Vosklet that I'd like to see in vosk-browser, if practical, is more of the Vosk functions exposed. I had told myself that at some point I'd get vosk-browser building and try to contribute that myself, but I never got around to it.
The faster processing time is intriguing too. What kind of metrics are you seeing?
I didn't really measure it, ngl, so maybe I should remove that line. But I moved hot computations such as free and mapping input data to C++, I use a simpler mechanism to communicate between JS and C++, I used the faster new Emscripten WasmFS and the new emmalloc, and I turned on O3, LTO, SIMD, non-trapping float-to-int, and many more. As such, I think it should be faster. You're right, I shouldn't claim anything without benchmarks.
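For anyone who wants to put numbers on a claim like this, here is a minimal timing sketch. It is only an illustration: `benchmark` and `processChunk` are hypothetical names, and `processChunk` stands in for whatever recognizer call is being measured. It reports a real-time factor (processing time divided by audio duration), so lower is faster.

```typescript
// Hypothetical harness: `processChunk` is a placeholder for the call under test.
type ChunkProcessor = (chunk: Float32Array) => void;

interface BenchResult {
  audioSeconds: number;   // duration of the audio that was fed in
  wallSeconds: number;    // wall-clock time spent processing it
  realTimeFactor: number; // wallSeconds / audioSeconds; below 1 is faster than real time
}

function benchmark(
  processChunk: ChunkProcessor,
  samples: Float32Array,
  sampleRate: number,
  chunkSize = 4096,
): BenchResult {
  // Date.now() is used for portability; it only has millisecond resolution.
  const start = Date.now();
  for (let offset = 0; offset < samples.length; offset += chunkSize) {
    // subarray() returns a view, so no copying is included in the timing.
    processChunk(samples.subarray(offset, offset + chunkSize));
  }
  const wallSeconds = (Date.now() - start) / 1000;
  const audioSeconds = samples.length / sampleRate;
  return { audioSeconds, wallSeconds, realTimeFactor: wallSeconds / audioSeconds };
}
```

Running the same audio through both libraries with the same chunk size, several times each, would give a comparable pair of real-time factors.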
No worries, @msqr1. I don't expect you to be super-scientific in your claims. I was just curious about what kind of speed increase you might be seeing. Your changes for performance seem promising.
FYI, simd will do not a damned thing (other than make it not work on Safari) unless the code is specifically written to use it. wasm simd is broadly compatible with x86 simd, but only the C API, and nobody uses the C API. I would be stunned to learn that that's gaining you anything. I had a simd version of libav.js for years and finally ditched it because it wasn't actually beneficial.
Well, the thing is kaldi just refuses to compile with simd off, so I have to turn it on. It may or may not do anything though.
Oh, well that's just lovely X-D
Just curious, how do you use a speech recognition library with your libav project? Isn't that for audio formats?
I do not. I use both in Ennuicastr.
I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.
> I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.
We're on an issue submitted to a synchronous version of the same API ;)
I can't see how the recognizer can be synchronous. It can't block the one thread that is controlling it.
Can I take a look at the issue? Maybe there is something I can do. Keep in mind that even if the recognizer is asynchronous, you can bind event listeners to it and call setXXX on it synchronously. The only synchronous part is the recognition process itself:
The API of Vosk just takes a chunk at a time. That API is synchronous.
I get it, but wouldn't that block itself from other actions? I can surely add an acceptWaveformSync() that recognizes on the same thread (blocking it) and returns the result. Will that fit your use case? Ngl, a fully synchronous API is even easier than the current one. I only need to translate it over, without managing task queues and other stuff.
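A rough sketch of why a chunk-at-a-time synchronous call does not prevent stopping: each call blocks only for the duration of one chunk, so the caller regains control between chunks and can simply stop calling. Everything here is hypothetical. `SyncRecognizer`, `FakeRecognizer`, and `recognizeUntilStopped` are illustration names, and `acceptWaveformSync` only echoes the method name floated above; none of this is an existing API of either library.

```typescript
// Hypothetical interface, not a real library API.
interface SyncRecognizer {
  // Processes one chunk on the calling thread and returns when it is done.
  acceptWaveformSync(chunk: Float32Array): string;
}

// Stand-in recognizer so the sketch runs; it only counts samples.
class FakeRecognizer implements SyncRecognizer {
  private total = 0;
  acceptWaveformSync(chunk: Float32Array): string {
    this.total += chunk.length;
    return `samples so far: ${this.total}`;
  }
}

function recognizeUntilStopped(
  recognizer: SyncRecognizer,
  chunks: readonly Float32Array[],
  shouldStop: () => boolean,
): string[] {
  const results: string[] = [];
  for (const chunk of chunks) {
    // The stop check runs between chunks, because each call returns
    // as soon as its single chunk has been processed.
    if (shouldStop()) break;
    results.push(recognizer.acceptWaveformSync(chunk));
  }
  return results;
}
```

In a Worker that the caller owns, the loop would typically be driven by incoming audio messages rather than an array, but the stopping logic is the same: no further chunks, no further blocking.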
My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats. For some people, that's presumably very useful. For me, that's actively unhelpful.
Also, to be clear: you should not be writing your code to fit my use case if that doesn't help you in any way. I'm perfectly happy with vosk-browser, and have no urgent need for a more updated version, though as a general principle I'd like for things to be up to date. I'm only presenting my case on this thread because I was asked to.
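The pipeline shape described above can be sketched roughly as follows. This is an assumption-laden illustration, not the Ennuicastr code: `Stage`, `halveGain`, `clip`, `makeTap`, and `runPipeline` are hypothetical names, and the two arithmetic stages merely stand in for real steps such as echo cancellation or noise suppression.

```typescript
// A stage takes raw audio in and returns raw audio out, synchronously.
type Stage = (audio: Float32Array) => Float32Array;

// Hypothetical placeholder stages; real ones would do actual DSP.
const clip: Stage = (audio) => audio.map((s) => Math.max(-1, Math.min(1, s)));
const halveGain: Stage = (audio) => audio.map((s) => s * 0.5);

// A recognizer-like tap: it inspects the audio and passes it through unchanged.
function makeTap(onChunk: (audio: Float32Array) => void): Stage {
  return (audio) => {
    onChunk(audio);
    return audio;
  };
}

// Runs every stage in order on the calling thread; no stage spawns its own worker.
function runPipeline(stages: readonly Stage[], input: Float32Array): Float32Array {
  return stages.reduce((audio, stage) => stage(audio), input);
}
```

Because every stage has the same synchronous signature, a synchronous recognizer call slots in as one more stage, and the owner of the Worker decides where the whole chain runs.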
> My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats.
No, I just want to find out how you use it, because I want to see what use case a synchronous vosk would be needed for, so thanks for your information! The above really helped me learn!
I can be totally precise: https://github.com/ennuicastr/ennuicastr/blob/3b3830fc979b039c245429a5ec7657594af4a705/awp/ennuicastr-worker.ts#L786
There's my call to acceptWaveformFloat :)
I completely understand it now :)))))))
@ccoreilly did you go over it?