saharmor / whisper-playground
Build real-time speech2text web apps using OpenAI's Whisper: https://openai.com/blog/whisper/
License: MIT License
Would it be worth packaging this in a Dockerfile for easier installation and usage?
When the stream ends, there may be leftover audio that is shorter than the configured transcription timeout and is therefore never transcribed. If that audio contains speech, it should also be transcribed and sent back to the client.
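A minimal sketch of flushing on stream end. It assumes a hypothetical `StreamBuffer` that batches raw mono 16-bit PCM at 16 kHz and a pluggable `transcribe` callable; neither name comes from the project's backend.

```python
from typing import Callable, List


class StreamBuffer:
    """Hypothetical buffer that groups incoming audio into timeout-sized batches."""

    def __init__(self, transcribe: Callable[[bytes], str], timeout_s: float,
                 sample_rate: int = 16000, sample_width: int = 2) -> None:
        self.transcribe = transcribe
        # Number of bytes that corresponds to one transcription batch.
        self.batch_bytes = int(timeout_s * sample_rate * sample_width)
        self.pending = bytearray()
        self.results: List[str] = []

    def feed(self, chunk: bytes) -> None:
        # Accumulate audio and transcribe every complete batch.
        self.pending.extend(chunk)
        while len(self.pending) >= self.batch_bytes:
            batch = bytes(self.pending[:self.batch_bytes])
            del self.pending[:self.batch_bytes]
            self.results.append(self.transcribe(batch))

    def close(self) -> None:
        # On stream end, transcribe the remainder even though it is shorter
        # than one batch, instead of silently dropping it.
        if self.pending:
            self.results.append(self.transcribe(bytes(self.pending)))
            self.pending.clear()
```

To honour the "if the audio data contains speech" condition, `close()` could first run a speech check on the remainder and skip the model call when it is silent.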
Speechless batches lead to hallucinations, especially in the real-time mode.
Three approaches:
....
An error is raised when stable-ts tries to suppress the silence in a speechless batch.
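One simple way to keep speechless batches away from the model is an energy gate: compute the RMS amplitude of each batch and skip transcription below a threshold. This sketch assumes mono little-endian 16-bit PCM, and the threshold is an arbitrary placeholder. It is only an illustration and may or may not match any of the three approaches listed in the issue; a trained VAD would be more robust than raw energy.

```python
import math
import sys
from array import array


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of mono 16-bit PCM audio."""
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the audio is assumed to be little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def has_speech(pcm: bytes, threshold: float = 500.0) -> bool:
    # Hypothetical threshold; tune it for the microphone and input gain.
    return rms_int16(pcm) >= threshold
```

A batch for which `has_speech` returns False would simply not be sent to Whisper (or to stable-ts), which also avoids the silence-suppression error on empty batches.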
import whisper
model = whisper.load_model("large-v2")
result = model.transcribe('test_file/EN-ZH.wav')
print(result['text'])
The audio above mixes English and Chinese, but the result is pure English. How did you get code-switched output? The same file played back in the demo gives the expected output. Please explain how it is done.
Hello! Thanks for the whisper-playground! I'm having lots of fun :)
I'd like to add Portuguese to the playground. Is that possible?
Ensure the configuration values are valid before sending them to the server.
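A sketch of the checks such validation could perform, using the configuration keys that appear in the server log further down (`beamSize`, `transcribeTimeout`, `transcriptionMethod`, `model`, `language`). The allowed sets and the beam-size bound are assumptions for illustration. The check is written in Python to match the backend; the same rules would apply in the interface before sending, and the server should re-check anyway.

```python
import math
from typing import Dict, List

# Allowed values are illustrative assumptions, not the project's real lists.
ALLOWED_METHODS = {"real-time", "sequential"}
ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large-v2"}


def validate_config(config: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the config can be sent."""
    errors: List[str] = []
    try:
        beam = int(config.get("beamSize", ""))
        if not 1 <= beam <= 5:  # hypothetical upper bound
            errors.append("beamSize must be between 1 and 5")
    except (TypeError, ValueError):
        errors.append("beamSize must be an integer")
    try:
        timeout = float(config.get("transcribeTimeout", ""))
        if not math.isfinite(timeout) or timeout <= 0:
            errors.append("transcribeTimeout must be a positive number")
    except (TypeError, ValueError):
        errors.append("transcribeTimeout must be a positive number")
    if config.get("transcriptionMethod") not in ALLOWED_METHODS:
        errors.append("unknown transcriptionMethod")
    if config.get("model") not in ALLOWED_MODELS:
        errors.append("unknown model")
    if not config.get("language"):
        errors.append("language is required")
    return errors
```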
Right after setting up per the instructions, I executed the following in the same terminal used for setup:
flask run --port 8000
and got the following error:
File "/home/yshen/dev/whisper-playground/backend/app.py", line 5, in <module>
from flask_cors import CORS
ModuleNotFoundError: No module named 'flask_cors'
However, import flask_cors works in a Python interactive session:
(venv) yshen@ys-private:~/dev/whisper-playground/backend$ python
Python 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flask_cors
>>>
How should I investigate the problem?
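One plausible cause, which is an assumption the thread does not confirm, is that the `flask` command on PATH belongs to a different Python installation than the venv that has flask_cors. The diagnostic below prints the running interpreter, whether it can see the module, and where `flask` resolves. If the `flask` script lives outside the venv, `python -m flask run --port 8000` runs Flask with the venv's own interpreter.

```python
import importlib.util
import shutil
import sys


def diagnose(module: str = "flask_cors") -> dict:
    """Report which interpreter is running and whether it can import `module`."""
    spec = importlib.util.find_spec(module)  # None if the module is not importable
    return {
        "interpreter": sys.executable,
        "module_found": spec is not None,
        "module_origin": spec.origin if spec else None,
        # The `flask` script on PATH; if it is outside the venv, it runs another Python.
        "flask_on_path": shutil.which("flask"),
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```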
(whisper-playground) E:\whisper\whisper-playground>cd backend
(whisper-playground) E:\whisper\whisper-playground\backend>python server.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
======== Running on http://0.0.0.0:8000 ========
(Press CTRL+C to quit)
INFO:root:Client connected, initializing stream...
INFO:root:Stream configuration received: {'beamSize': '1', 'transcribeTimeout': '5', 'transcriptionMethod': 'real-time', 'model': 'small', 'language': 'english'}
WARNING:faster_whisper:An error occured while synchronizing the model guillaumekln/faster-whisper-small.en from the Hugging Face Hub:
Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.
WARNING:faster_whisper:Trying to load the model directly from the local cache, if it exists.
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "E:\whisper\whisper-playground\backend\client_manager.py", line 15, in create_new_client
new_client = initialize_client(sid, sio, config)
File "E:\whisper\whisper-playground\backend\clients\utils.py", line 56, in initialize_client
transcriber = WhisperTranscriber(model_name=whisper_model.value, language_code=language_code, beam_size=beam_size)
File "E:\whisper\whisper-playground\backend\transcription\whisper_transcriber.py", line 29, in __init__
self.model = WhisperModel(self.get_full_model_name(model_name, language_code), device=device,
File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\site-packages\faster_whisper\transcribe.py", line 124, in __init__
self.model = ctranslate2.models.Whisper(
RuntimeError: Unable to open file 'model.bin' in model 'C:\Users\admin\.cache\huggingface\hub\models--guillaumekln--faster-whisper-small.en\snapshots\4e49ce629e3fa4c3da596c602b212cb026910443'
The web page uses the default settings (the 'small' model, English, real-time mode). However, an error occurs when it connects to the backend. The backend seems to be using a previously auto-downloaded version of the model.
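The log shows the Hub download failing with a connection error, faster-whisper falling back to the local cache, and the cached snapshot lacking model.bin. The sketch below only inspects the cache layout visible in the error path (`hub/models--.../snapshots/<hash>/model.bin`) and assumes the default cache location; the function name is hypothetical. Deleting a flagged repo folder and retrying with a working connection should force a fresh download, but that is an assumption worth verifying.

```python
from pathlib import Path
from typing import List


def incomplete_snapshots(hub_cache: Path, repo_dir: str,
                         required: str = "model.bin") -> List[Path]:
    """List snapshot folders of a cached Hugging Face repo that lack `required`."""
    snapshots = hub_cache / repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    # exists() follows symlinks, so a dangling link to a missing blob is flagged too.
    return sorted(p for p in snapshots.iterdir()
                  if p.is_dir() and not (p / required).exists())


if __name__ == "__main__":
    cache = Path.home() / ".cache" / "huggingface" / "hub"  # default location (assumed)
    for snap in incomplete_snapshots(cache, "models--guillaumekln--faster-whisper-small.en"):
        print("missing model.bin:", snap)
```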
When the transcriptions overflow the transcription area, a scrollbar appears, but the user must scroll down manually to see new transcriptions.
My box:
OS: Ubuntu 20.04.5 LTS x86_64
Kernel: 5.4.0-125-generic
Packages: 3901 (dpkg), 30 (flatpak), 37 (snap)
Shell: bash 5.0.17
Python 3.8.10
Following your instructions produces several compilation errors:
30 | #include "Python.h"
| ^~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
Fix:
sudo apt-get install python3-dev  # for Python 3.x installs
31 | #include "portaudio.h"
| ^~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
This was solved only by manually compiling PortAudio from http://files.portaudio.com/download.html, as described in ProgrammingHero1/romantic-alexa#88.
cd backend && source venv/bin/activate && flask run --port 8000
bash: venv/bin/activate: No such file or directory
I tried the tip at https://trendoceans.com/how-to-resolve-venv-bin-activate-is-not-executable-by-this-user/ but eventually gave up.
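"No such file or directory" for venv/bin/activate usually means the virtual environment was never created inside backend/ (or was created somewhere else). The shell equivalent is running `python3 -m venv venv` inside backend/; the sketch below does the same with the standard library's `venv` module, and `ensure_venv` is a hypothetical helper name. On Debian and Ubuntu, creating a venv with pip included may additionally require the distribution's `python3-venv` package.

```python
import sys
import venv
from pathlib import Path


def ensure_venv(project_dir: Path, name: str = "venv", with_pip: bool = True) -> Path:
    """Create project_dir/name if it does not exist and return its activate script."""
    env_dir = project_dir / name
    if not env_dir.exists():
        venv.create(env_dir, with_pip=with_pip)
    scripts = "Scripts" if sys.platform == "win32" else "bin"  # Windows uses Scripts/
    return env_dir / scripts / "activate"


if __name__ == "__main__":
    print("source", ensure_venv(Path("backend")))
```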
Managing speakers manually would make it possible to signal duplicate speaker detections to the client in real-time mode, and to mitigate speaker swaps in sequential mode.
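A minimal sketch of manual speaker management, assuming each diarized segment comes with a fixed-length speaker embedding (a list of floats). The hypothetical `SpeakerRegistry` keeps one centroid per known speaker and reuses an existing label when cosine similarity passes a threshold, so a speaker re-detected under a new local label maps back to the same ID. The threshold and running-mean update are illustrative choices, not how the project or its diarization library does it.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SpeakerRegistry:
    """Hypothetical registry mapping speaker embeddings to stable labels."""

    def __init__(self, threshold: float = 0.75) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def assign(self, embedding: List[float]) -> str:
        # Find the most similar known speaker.
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Known speaker: fold the embedding into its running-mean centroid.
            n = self.counts[best]
            self.centroids[best] = [(c * n + e) / (n + 1)
                                    for c, e in zip(self.centroids[best], embedding)]
            self.counts[best] = n + 1
            return f"Speaker {best + 1}"
        # Otherwise register a new speaker.
        self.centroids.append(list(embedding))
        self.counts.append(1)
        return f"Speaker {len(self.centroids)}"
```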
AttributeError: module 'whisper' has no attribute 'load_model'
127.0.0.1 - - [15/Dec/2022 13:17:56] "POST /transcribe HTTP/1.1" 500 -
[2022-12-15 13:18:06,805] ERROR in app: Exception on /transcribe [POST]
Traceback (most recent call last):
File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask_cors/extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/Users/omkar.kadam/Desktop/whisper-playground/backend/app.py", line 21, in transcribe
audio_model = whisper.load_model(model)
AttributeError: module 'whisper' has no attribute 'load_model'
127.0.0.1 - - [15/Dec/2022 13:18:06] "POST /transcribe HTTP/1.1" 500 -
I get this error when I follow the README steps. I tried making changes in App.js and package.json, then ran into another issue that seemed to be caused by CORS.
The backend still seems broken; I have probably missed some package or model dependency. Any help is appreciated.
Thanks in advance!
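A frequent cause of this AttributeError (an assumption here, since the thread does not confirm it) is that the wrong module is being imported as `whisper`: either the unrelated PyPI package named `whisper` from the Graphite project, rather than OpenAI's (published as `openai-whisper` or installed from git+https://github.com/openai/whisper.git), or a local file named whisper.py shadowing it. The hypothetical helper below reports which file was actually loaded.

```python
import importlib
from typing import Optional


def check_attr(module_name: str, attr: str) -> Optional[str]:
    """Import `module_name`; describe what was loaded if `attr` is missing."""
    module = importlib.import_module(module_name)
    if hasattr(module, attr):
        return None
    location = getattr(module, "__file__", "<built-in>")
    return f"{module_name} loaded from {location} has no attribute {attr!r}"


if __name__ == "__main__":
    problem = check_attr("whisper", "load_model")
    print(problem or "OpenAI Whisper looks correctly installed")
```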
Hi, I can't get it to work.
When I press the "Start transcribing" button and say something, it keeps loading and I can't press Stop to end the transcription.
`$ sh install_playground.sh
yarn install v1.22.19
[1/4] Resolving packages...
success Already up-to-date.
Done in 0.41s.
Requirement already satisfied: wheel in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (0.37.1)
[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: python.exe -m pip install --upgrade pip
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Collecting git+https://github.com/openai/whisper.git (from -r requirements.txt (line 8))
Cloning https://github.com/openai/whisper.git to c:\users\lnwry\appdata\local\temp\pip-req-build-kva5bjjp
Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git 'C:\Users\lnwry\AppData\Local\Temp\pip-req-build-kva5bjjp'
Resolved https://github.com/openai/whisper.git to commit d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: numpy in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 1)) (1.23.4)
Requirement already satisfied: tqdm in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 2)) (4.64.1)
Requirement already satisfied: transformers>=4.19.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 3)) (4.23.1)
Requirement already satisfied: ffmpeg-python==0.2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 4)) (0.2.0)
Requirement already satisfied: pyaudio in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 5)) (0.2.12)
Requirement already satisfied: SpeechRecognition in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 6)) (3.8.1)
Requirement already satisfied: pydub in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 7)) (0.25.1)
Requirement already satisfied: torch in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 10)) (1.12.1+cu113)
Requirement already satisfied: flask in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 11)) (2.2.2)
Requirement already satisfied: flask_cors in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 12)) (3.0.10)
Requirement already satisfied: future in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from ffmpeg-python==0.2.0->-r requirements.txt (line 4)) (0.18.2)
Requirement already satisfied: colorama in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from tqdm->-r requirements.txt (line 2)) (0.4.5)
Requirement already satisfied: regex!=2019.12.17 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (2022.9.13)
Requirement already satisfied: pyyaml>=5.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (6.0)
Requirement already satisfied: requests in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (2.28.1)
Requirement already satisfied: packaging>=20.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (21.3)
Requirement already satisfied: filelock in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (3.8.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.10.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (0.10.1)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (0.13.1)
Requirement already satisfied: more-itertools in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from whisper==1.0->-r requirements.txt (line 8)) (8.14.0)
Requirement already satisfied: typing-extensions in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from torch->-r requirements.txt (line 10)) (4.4.0)
Requirement already satisfied: itsdangerous>=2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (2.1.2)
Requirement already satisfied: click>=8.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (8.1.3)
Requirement already satisfied: Jinja2>=3.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (3.1.2)
Requirement already satisfied: Werkzeug>=2.2.2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (2.2.2)
Requirement already satisfied: Six in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask_cors->-r requirements.txt (line 12)) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from Jinja2>=3.0->flask->-r requirements.txt (line 11)) (2.1.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from packaging>=20.0->transformers>=4.19.0->-r requirements.txt (line 3)) (3.0.9)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (1.26.12)
Requirement already satisfied: idna<4,>=2.5 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (2022.9.24)
Requirement already satisfied: charset-normalizer<3,>=2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (2.1.1)
[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: python.exe -m pip install --upgrade pip
`
When executing cd interface && yarn start in a newly opened terminal, I got this error:
Starting the development server...
Error: error:0308010C:digital envelope routines::unsupported
at new Hash (node:internal/crypto/hash:71:19)
at Object.createHash (node:crypto:133:10)
at module.exports (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/util/createHash.js:135:53)
at NormalModule._initBuildHash (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:417:16)
at handleParseError (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:471:10)
at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:503:5
at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:358:12
at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:373:3
at iterateNormalLoaders (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:214:10)
at iterateNormalLoaders (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:221:10)
/home/yshen/dev/whisper-playground/interface/node_modules/react-scripts/scripts/start.js:19
throw err;
^
Error: error:0308010C:digital envelope routines::unsupported
at new Hash (node:internal/crypto/hash:71:19)
at Object.createHash (node:crypto:133:10)
at module.exports (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/util/createHash.js:135:53)
at NormalModule._initBuildHash (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:417:16)
at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:452:10
at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:323:13
at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:367:11
at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:233:18
at context.callback (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:111:13)
at /home/yshen/dev/whisper-playground/interface/node_modules/react-scripts/node_modules/babel-loader/lib/index.js:59:103 {
opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
library: 'digital envelope routines',
reason: 'unsupported',
code: 'ERR_OSSL_EVP_UNSUPPORTED'
}
Node.js v18.12.1
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
I also noticed that when executing yarn start a second time, a web page opens at http://localhost:3000 with this error message:
Hmmm… can't reach this page. localhost refused to connect.
ERR_CONNECTION_REFUSED
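ERR_OSSL_EVP_UNSUPPORTED typically appears when an older webpack build (here pulled in by react-scripts) runs on Node.js 17 or newer, which ships OpenSSL 3; the log shows Node.js v18.12.1. The ERR_CONNECTION_REFUSED page is consistent with the dev server having crashed before listening. A commonly used workaround is starting the dev server with NODE_OPTIONS=--openssl-legacy-provider; using Node.js 16 or upgrading react-scripts may also help. The helper below only builds that environment; the function names and the version rule are assumptions, and on Windows `yarn` may need to be invoked as `yarn.cmd`.

```python
import os
import subprocess
from typing import Dict, Optional


def node_major(version: str) -> int:
    """Parse the output of `node --version`, e.g. 'v18.12.1' -> 18."""
    return int(version.strip().lstrip("v").split(".")[0])


def start_env(version: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment, adding the legacy OpenSSL provider on Node >= 17."""
    env = dict(os.environ if base is None else base)
    if node_major(version) >= 17:
        flag = "--openssl-legacy-provider"
        existing = env.get("NODE_OPTIONS", "")
        if flag not in existing.split():
            env["NODE_OPTIONS"] = f"{existing} {flag}".strip()
    return env


if __name__ == "__main__":
    version = subprocess.run(["node", "--version"], capture_output=True,
                             text=True, check=True).stdout
    # Run from the repository root; yarn must be on PATH.
    subprocess.run(["yarn", "start"], cwd="interface", env=start_env(version), check=True)
```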
(whisper) D:\>cd whisper
(whisper) D:\whisper>cd whisper-playground
(whisper) D:\whisper\whisper-playground>cd backend
(whisper) D:\whisper\whisper-playground\backend>python server.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Traceback (most recent call last):
File "D:\whisper\whisper-playground\backend\server.py", line 5, in <module>
from backend.client_manager import ClientManager
ModuleNotFoundError: No module named 'backend'
I'm not sure if it's due to the runtime environment.
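server.py imports `backend.client_manager`, so the folder that contains backend/ (the repository root) must be on sys.path. Running `python server.py` from inside backend/ puts backend/ itself on sys.path, not its parent, which produces exactly this error. Running it as a module from the root, `python -m backend.server`, or setting PYTHONPATH to the root may fix it, assuming the project's other imports are also package-absolute. The self-contained sketch below rebuilds a tiny stand-in `backend` package in a temporary directory and tries both ways; the file contents are placeholders, not the project's code.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def run_python(args: List[str], cwd: Path) -> int:
    """Run the current interpreter with `args` in `cwd` and return its exit code."""
    # Drop variables that would change how sys.path is built.
    env = {k: v for k, v in os.environ.items() if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    return subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True).returncode


def demo() -> Dict[str, int]:
    """Build a minimal backend/ package and start it both ways."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        pkg = root / "backend"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "client_manager.py").write_text("class ClientManager:\n    pass\n")
        (pkg / "server.py").write_text("from backend.client_manager import ClientManager\n")
        return {
            # As in the report: run the script from inside backend/.
            "script_from_backend_dir": run_python(["server.py"], cwd=pkg),
            # Run it as a module from the repository root instead.
            "module_from_root": run_python(["-m", "backend.server"], cwd=root),
        }


if __name__ == "__main__":
    print(demo())
```

The first command should fail with the same ModuleNotFoundError, while the second should exit cleanly.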
Only one client can connect at a time because the transcription calls are not thread-safe. Multiple clients should be able to connect at once.
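One way to let several clients connect while the transcription call itself stays non-thread-safe is to share a single model and serialize calls to it with a lock: connections proceed concurrently and only transcription waits its turn. `FakeModel` and `SharedTranscriber` are hypothetical stand-ins, not the backend's classes. Loading one model per client would allow true parallelism at the cost of memory.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeModel:
    """Hypothetical non-thread-safe model that records overlapping calls."""

    def __init__(self) -> None:
        self.active = 0
        self.max_active = 0

    def transcribe(self, audio: bytes) -> str:
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # simulate inference time
        self.active -= 1
        return f"{len(audio)} bytes"


class SharedTranscriber:
    """Lets many client threads share one model by serializing calls to it."""

    def __init__(self, model: FakeModel) -> None:
        self.model = model
        self._lock = threading.Lock()

    def transcribe(self, audio: bytes) -> str:
        with self._lock:  # only one thread inside the model at a time
            return self.model.transcribe(audio)


if __name__ == "__main__":
    shared = SharedTranscriber(FakeModel())
    # Four "clients" submitting audio concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(shared.transcribe, [b"a" * n for n in range(1, 5)])))
    print("max concurrent calls:", shared.model.max_active)
```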
Hi, when I run npm run start I get:
sh: 1: react-scripts: not found
Then, when I run npm i, I get this:
angel@PCLX:~/Descargas/whisper-playground-main/interface$ npm i
Debugger attached.
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR!
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/react
npm ERR! react@"^18.2.0" from the root project
npm ERR!
npm ERR! Could not resolve dependency:
npm ERR! peer react@"^16.8.0 || ^17.0.0" from @material-ui/[email protected]
npm ERR! node_modules/@material-ui/core
npm ERR! @material-ui/core@"^4.11.4" from the root project
npm ERR!
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
npm ERR!
npm ERR! See /home/angel/.npm/eresolve-report.txt for a full report.
npm ERR! A complete log of this run can be found in:
npm ERR! /home/angel/.npm/_logs/2023-02-08T13_55_23_214Z-debug-0.log
Waiting for the debugger to disconnect...
How can I fix this?
Are there general or recommended parameters for the diart config that give better diarization results?
In my case, I played an audio clip with two speakers, one male and one female, but the system kept increasing the speaker count as the audio progressed.
The speakers had US accents, so I expected it to work well at least for this kind of audio.
Any suggestions for improving diarization accuracy would be helpful.
Hi, is it possible to run this repo on a CPU? If so, do I need to make any specific changes to the code or take any additional steps? OS: Ubuntu 20 LTS, Python 3.8.
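The traceback earlier in this page shows whisper_transcriber.py passing a `device` to faster-whisper's `WhisperModel`, and faster-whisper documents `device="cpu"` with `compute_type="int8"` for CPU inference. The sketch below shows how that choice could be made; `pick_device` and `cuda_available` are hypothetical helpers. Whether other parts of the pipeline (such as diarization) need changes for CPU use is not covered here.

```python
import importlib.util
from typing import Tuple


def pick_device(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, compute_type) for faster-whisper's WhisperModel."""
    # float16 is the usual GPU choice; int8 keeps CPU inference lighter.
    return ("cuda", "float16") if cuda_available else ("cpu", "int8")


def cuda_available() -> bool:
    """Best-effort CUDA check via torch, if torch is installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    device, compute_type = pick_device(cuda_available())
    print(device, compute_type)
    # Then, assuming faster-whisper is installed:
    # from faster_whisper import WhisperModel
    # model = WhisperModel("small", device=device, compute_type=compute_type)
```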
Note that the client is connecting to the server when starting the stream
Note the delay for the first transcription with the real-time mode
Note that the final transcription is in progress when ending the stream
Note that the stream has ended
cc @ethanzrd
The logic will be to combine Whisper and pyannote.audio outputs based on timestamps, producing something along the lines of:
Person A: Hi
Person B: Hello, how are you
Person A: I'm good, and you?
....
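A sketch of the timestamp-based merge. It assumes Whisper segments (start, end, text) and pyannote.audio speaker turns (start, end, label) have already been extracted into plain lists; the dataclasses are illustrative, not either library's types. Each segment goes to the speaker whose turn overlaps it the most, and consecutive segments from the same speaker are joined into one line.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Give each transcript segment the speaker whose turn overlaps it most."""
    labeled: List[Tuple[str, str]] = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg.start, seg.end, t.start, t.end), default=None)
        if best is not None and overlap(seg.start, seg.end, best.start, best.end) > 0:
            speaker = best.speaker
        else:
            speaker = "Unknown"
        # Merge consecutive segments from the same speaker into one line.
        if labeled and labeled[-1][0] == speaker:
            labeled[-1] = (speaker, labeled[-1][1] + " " + seg.text)
        else:
            labeled.append((speaker, seg.text))
    return labeled


def render(labeled: List[Tuple[str, str]]) -> str:
    return "\n".join(f"{speaker}: {text}" for speaker, text in labeled)
```

With openai-whisper, `result["segments"]` entries carry `start`, `end` and `text`; with pyannote.audio, turns can be read via `for turn, _, speaker in diarization.itertracks(yield_label=True)`, using `turn.start` and `turn.end`.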