saharmor / whisper-playground

775.0 14.0 141.0 417 KB

Build real time speech2text web apps using OpenAI's Whisper https://openai.com/blog/whisper/

License: MIT License

Python 54.94% HTML 2.42% CSS 0.80% JavaScript 39.63% Shell 2.21%

machine-learning speech-recognition speech-to-text whisper openai

whisper-playground's Issues

Possible to package this in a dockerfile?

Would it be worth packaging this in a Dockerfile for easier installation and usage?

Dockerized the fullstack app

ensure it works on Windows and Mac

Transcribe the remaining audio data when using the real-time mode

When ending the stream, there may be remaining audio data that isn't long enough to be transcribed (following the pre-configured transcription timeout). If the audio data contains speech, it should be transcribed and sent back to the client as well.
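
A minimal sketch of the flush-on-end behaviour described above, not the project's actual code. `StreamBuffer`, `transcribe`, `has_speech` and `min_batch_bytes` are hypothetical names, and a byte-count threshold stands in for the pre-configured transcription timeout.

```python
from typing import Callable, List, Optional


class StreamBuffer:
    """Collects audio chunks and transcribes them in batches (hypothetical helper)."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 has_speech: Callable[[bytes], bool],
                 min_batch_bytes: int) -> None:
        self._transcribe = transcribe
        self._has_speech = has_speech
        self._min_batch_bytes = min_batch_bytes  # stand-in for the timeout
        self._chunks: List[bytes] = []

    def add_chunk(self, chunk: bytes) -> Optional[str]:
        """Buffer a chunk; transcribe once enough audio has accumulated."""
        self._chunks.append(chunk)
        if sum(len(c) for c in self._chunks) >= self._min_batch_bytes:
            return self._drain()
        return None

    def end_stream(self) -> Optional[str]:
        """Transcribe whatever is left, even if it is shorter than a full batch."""
        if not self._chunks:
            return None
        if not self._has_speech(b"".join(self._chunks)):
            self._chunks.clear()  # drop speechless leftovers
            return None
        return self._drain()

    def _drain(self) -> str:
        audio = b"".join(self._chunks)
        self._chunks.clear()
        return self._transcribe(audio)
```

The server would call `end_stream()` once when the client disconnects and send any returned text back before closing.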

Add VAD to avoid hallucinations with speechless batches

Speechless batches lead to hallucinations, especially in the real-time mode.

Three approaches:

Silero VAD for every chunk.
Use native VAD provided by faster-whisper.
Silero VAD for batch before transcription.
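
To illustrate the third approach (gating a batch before transcription), here is a rough sketch. It deliberately uses a simple RMS energy threshold as a stand-in, not Silero VAD, so it only shows where the check would sit. `transcribe_if_speech`, `is_probably_speech` and the `0.01` threshold are assumptions made for the example.

```python
import math
from typing import Callable, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of samples normalised to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_probably_speech(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Crude energy gate; a real VAD model would replace this function."""
    return rms(samples) >= threshold


def transcribe_if_speech(samples: Sequence[float],
                         transcribe: Callable[[Sequence[float]], str],
                         threshold: float = 0.01) -> Optional[str]:
    """Skip the transcription call entirely when the batch looks silent."""
    if not is_probably_speech(samples, threshold):
        return None
    return transcribe(samples)
```

Checking the whole batch once keeps the overhead low, at the cost of still passing along batches that are mostly silence with a short burst of noise.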

.... this is for future issues

....

Prevent errors with speechless batches

An error is raised when stable-ts tries to suppress silence in a speechless batch.

Code switching inputs

import whisper
model = whisper.load_model("large-v2")
result = model.transcribe('test_file/EN-ZH.wav')
print(result['text'])
The audio above contains a mix of English and Chinese, but the result is English only. How did you get code-switched output? The same file played back in the demo gives the expected output. Please tell me how it is done.

How to add a new language in the interface?

Hello! Thanks for the whisper-playground! I'm having lots of fun :)

I'd like to add Portuguese in the playground, is it possible?

Implement parameter validation in the client

Ensure the values are legal before sending to the server
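
The client itself is JavaScript, but the checks can be sketched in Python to show the idea. The keys match the stream configuration that appears in the server logs elsewhere on this page; the allowed methods and the beam-size upper bound are assumptions, not the project's real limits.

```python
from typing import Any, Dict, List

# Assumed values; the real lists live in the project and may differ.
ALLOWED_METHODS = {"real-time", "sequential"}
MAX_BEAM_SIZE = 5  # hypothetical upper bound


def _is_int_in_range(value: Any, low: int, high: int) -> bool:
    """True if value is a whole number (possibly a string) within [low, high]."""
    text = str(value)
    return text.isdecimal() and low <= int(text) <= high


def validate_stream_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks legal."""
    errors: List[str] = []
    if not _is_int_in_range(config.get("beamSize", ""), 1, MAX_BEAM_SIZE):
        errors.append(f"beamSize must be an integer from 1 to {MAX_BEAM_SIZE}")
    if not _is_int_in_range(config.get("transcribeTimeout", ""), 1, 3600):
        errors.append("transcribeTimeout must be a positive integer (seconds)")
    if config.get("transcriptionMethod") not in ALLOWED_METHODS:
        errors.append("transcriptionMethod is not a known method")
    for key in ("model", "language"):
        if not config.get(key):
            errors.append(f"{key} must not be empty")
    return errors
```

Returning all problems at once, rather than stopping at the first, lets the interface highlight every invalid field in one pass.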

How to investigate the problem of "No module named 'flask_cors'"?

Right after setting up per the instructions, in the same terminal, I ran
flask run --port 8000

I got the following error:
  File "/home/yshen/dev/whisper-playground/backend/app.py", line 5, in <module>
    from flask_cors import CORS
ModuleNotFoundError: No module named 'flask_cors'

However, running import flask_cors
in an interactive Python session works:

(venv) yshen@ys-private:~/dev/whisper-playground/backend$ python
Python 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flask_cors
>>>

How should I investigate the problem?
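
One possible cause (an assumption, not confirmed in this thread) is that the `flask` command on the PATH belongs to a different Python installation than the virtual environment's `python`. A small diagnostic sketch using only the standard library; `diagnose` is a hypothetical helper name:

```python
import importlib.util
import shutil
import sys
from typing import Any, Dict


def diagnose(module_name: str, command: str = "flask") -> Dict[str, Any]:
    """Report which interpreter is running and where a module/command resolve."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,            # Python running this code
        "command_path": shutil.which(command),    # executable found on PATH
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in diagnose("flask_cors").items():
        print(f"{key}: {value}")
```

If `command_path` points outside the venv directory while `interpreter` points inside it, running `python -m flask run --port 8000` uses the venv's interpreter and may avoid the mismatch.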

Error occurred during backend connection.

(whisper-playground) E:\whisper\whisper-playground>cd backend

(whisper-playground) E:\whisper\whisper-playground\backend>python server.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
======== Running on http://0.0.0.0:8000 ========
(Press CTRL+C to quit)
INFO:root:Client connected, initializing stream...
INFO:root:Stream configuration received: {'beamSize': '1', 'transcribeTimeout': '5', 'transcriptionMethod': 'real-time', 'model': 'small', 'language': 'english'}
WARNING:faster_whisper:An error occured while synchronizing the model guillaumekln/faster-whisper-small.en from the Hugging Face Hub:
Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.
WARNING:faster_whisper:Trying to load the model directly from the local cache, if it exists.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "E:\whisper\whisper-playground\backend\client_manager.py", line 15, in create_new_client
    new_client = initialize_client(sid, sio, config)
  File "E:\whisper\whisper-playground\backend\clients\utils.py", line 56, in initialize_client
    transcriber = WhisperTranscriber(model_name=whisper_model.value, language_code=language_code, beam_size=beam_size)
  File "E:\whisper\whisper-playground\backend\transcription\whisper_transcriber.py", line 29, in __init__
    self.model = WhisperModel(self.get_full_model_name(model_name, language_code), device=device,
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\site-packages\faster_whisper\transcribe.py", line 124, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: Unable to open file 'model.bin' in model 'C:\Users\admin\.cache\huggingface\hub\models--guillaumekln--faster-whisper-small.en\snapshots\4e49ce629e3fa4c3da596c602b212cb026910443'
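
The RuntimeError above suggests the cached snapshot folder exists but does not contain `model.bin`, which could happen if an earlier download was interrupted. A sketch for spotting such folders; `incomplete_snapshots` is a hypothetical helper, and the `snapshots` subfolder layout is taken from the path in the traceback.

```python
from pathlib import Path
from typing import List


def incomplete_snapshots(model_cache_dir: Path,
                         required: str = "model.bin") -> List[Path]:
    """List snapshot folders under a cached model that lack the required file."""
    snapshots = model_cache_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    # exists() follows symlinks, so a dangling link also counts as missing.
    return sorted(p for p in snapshots.iterdir()
                  if p.is_dir() and not (p / required).exists())
```

Pointing it at the `models--guillaumekln--faster-whisper-small.en` folder from the traceback would show whether the snapshot is incomplete; removing an incomplete snapshot and retrying with a working connection is something to try, not a confirmed fix.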

"The web page is using the default settings with the 'small' model in English for real-time usage. However, when connecting to the backend, an error occurs. The backend seems to be using a previous version of the automatically downloaded model."

Add auto-scroll for transcribed data in client

When the transcriptions extend beyond the maximum width, a scrollbar appears but the user must manually scroll down to see the new transcriptions.

Online demo is down

Self-explanatory. Online demo is down.

https://whisperui.monsterapi.ai/

Multiple errors: missing python dev, portaudio and venv

My box:

OS: Ubuntu 20.04.5 LTS x86_64
Kernel: 5.4.0-125-generic
Packages: 3901 (dpkg), 30 (flatpak), 37 (snap)
Shell: bash 5.0.17
Python 3.8.10

Your instructions throw many compilation errors:

The python3-dev package is missing:

30    | #include "Python.h"
        |          ^~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1

->
sudo apt-get install python3-dev # for python3.x installs

PortAudio does not compile:

   31 | #include "portaudio.h"
        |          ^~~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1

Solved only with manual compilation of http://files.portaudio.com/download.html, i.e.: ProgrammingHero1/romantic-alexa#88

Virtual env is playing up:
cd backend && source venv/bin/activate && flask run --port 8000
throws
bash: venv/bin/activate: No such file or directory

I have tried playing with this tip: https://trendoceans.com/how-to-resolve-venv-bin-activate-is-not-executable-by-this-user/ but then gave up.

Manage speakers manually for real-time refinement & avoiding speaker swapping

Managing speakers manually will make it possible to signal duplicate speaker detections to the client in real-time mode and to mitigate speaker swaps in sequential mode.

Backend error : AttributeError: module 'whisper' has no attribute 'load_model'

AttributeError: module 'whisper' has no attribute 'load_model'
127.0.0.1 - - [15/Dec/2022 13:17:56] "POST /transcribe HTTP/1.1" 500 -
[2022-12-15 13:18:06,805] ERROR in app: Exception on /transcribe [POST]
Traceback (most recent call last):
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/app.py", line 21, in transcribe
    audio_model = whisper.load_model(model)
AttributeError: module 'whisper' has no attribute 'load_model'
127.0.0.1 - - [15/Dec/2022 13:18:06] "POST /transcribe HTTP/1.1" 500 -

I get this error when I follow the README steps. I have tried making changes in App.js and package.json, but then ran into another issue that appears to be CORS-related.

The backend seems broken, though. Any help is appreciated; I have probably missed a package or model dependency.

Thanks in advance!
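
A common reason for this kind of AttributeError is that `import whisper` resolves to something other than OpenAI's package, for example a local file named whisper.py or an unrelated distribution with the same import name. That is a guess for this report, not a confirmed diagnosis. A quick check, with `inspect_module` as a hypothetical helper:

```python
import importlib
from typing import Any, Dict


def inspect_module(module_name: str, attribute: str) -> Dict[str, Any]:
    """Show which file a module was loaded from and whether it has an attribute."""
    module = importlib.import_module(module_name)
    return {
        "file": getattr(module, "__file__", None),
        "has_attribute": hasattr(module, attribute),
    }
```

Running `inspect_module("whisper", "load_model")` inside the backend's virtual environment shows the file that was actually imported; if it is not inside the OpenAI package, that would explain the missing attribute.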

Stuck on three purple dots

Hi, I can't get it to work.

When I press the "Start transcribing" button and say something, it keeps loading and I can't press the button to stop transcribing.

`$ sh install_playground.sh
yarn install v1.22.19
[1/4] Resolving packages...
success Already up-to-date.
Done in 0.41s.
Requirement already satisfied: wheel in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (0.37.1)

[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: python.exe -m pip install --upgrade pip
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Collecting git+https://github.com/openai/whisper.git (from -r requirements.txt (line 8))
Cloning https://github.com/openai/whisper.git to c:\users\lnwry\appdata\local\temp\pip-req-build-kva5bjjp
Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git 'C:\Users\lnwry\AppData\Local\Temp\pip-req-build-kva5bjjp'
Resolved https://github.com/openai/whisper.git to commit d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: numpy in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 1)) (1.23.4)
Requirement already satisfied: tqdm in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 2)) (4.64.1)
Requirement already satisfied: transformers>=4.19.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 3)) (4.23.1)
Requirement already satisfied: ffmpeg-python==0.2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 4)) (0.2.0)
Requirement already satisfied: pyaudio in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 5)) (0.2.12)
Requirement already satisfied: SpeechRecognition in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 6)) (3.8.1)
Requirement already satisfied: pydub in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 7)) (0.25.1)
Requirement already satisfied: torch in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 10)) (1.12.1+cu113)
Requirement already satisfied: flask in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 11)) (2.2.2)
Requirement already satisfied: flask_cors in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 12)) (3.0.10)
Requirement already satisfied: future in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from ffmpeg-python==0.2.0->-r requirements.txt (line 4)) (0.18.2)
Requirement already satisfied: colorama in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from tqdm->-r requirements.txt (line 2)) (0.4.5)
Requirement already satisfied: regex!=2019.12.17 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (2022.9.13)
Requirement already satisfied: pyyaml>=5.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (6.0)
Requirement already satisfied: requests in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (2.28.1)
Requirement already satisfied: packaging>=20.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (21.3)
Requirement already satisfied: filelock in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (3.8.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.10.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (0.10.1)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (0.13.1)
Requirement already satisfied: more-itertools in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from whisper==1.0->-r requirements.txt (line 8)) (8.14.0)
Requirement already satisfied: typing-extensions in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from torch->-r requirements.txt (line 10)) (4.4.0)
Requirement already satisfied: itsdangerous>=2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (2.1.2)
Requirement already satisfied: click>=8.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (8.1.3)
Requirement already satisfied: Jinja2>=3.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (3.1.2)
Requirement already satisfied: Werkzeug>=2.2.2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (2.2.2)
Requirement already satisfied: Six in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask_cors->-r requirements.txt (line 12)) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from Jinja2>=3.0->flask->-r requirements.txt (line 11)) (2.1.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from packaging>=20.0->transformers>=4.19.0->-r requirements.txt (line 3)) (3.0.9)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (1.26.12)
Requirement already satisfied: idna<4,>=2.5 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (2022.9.24)
Requirement already satisfied: charset-normalizer<3,>=2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (2.1.1)

[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: python.exe -m pip install --upgrade pip
`

`yarn start` error

When executing cd interface && yarn start
in a newly opened terminal, I got the error:

Starting the development server...

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:71:19)
    at Object.createHash (node:crypto:133:10)
    at module.exports (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/util/createHash.js:135:53)
    at NormalModule._initBuildHash (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:417:16)
    at handleParseError (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:471:10)
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:503:5
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:358:12
    at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:373:3
    at iterateNormalLoaders (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:214:10)
    at iterateNormalLoaders (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:221:10)
/home/yshen/dev/whisper-playground/interface/node_modules/react-scripts/scripts/start.js:19
  throw err;
  ^

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:71:19)
    at Object.createHash (node:crypto:133:10)
    at module.exports (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/util/createHash.js:135:53)
    at NormalModule._initBuildHash (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:417:16)
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:452:10
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:323:13
    at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:367:11
    at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:233:18
    at context.callback (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:111:13)
    at /home/yshen/dev/whisper-playground/interface/node_modules/react-scripts/node_modules/babel-loader/lib/index.js:59:103 {
  opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
  library: 'digital envelope routines',
  reason: 'unsupported',
  code: 'ERR_OSSL_EVP_UNSUPPORTED'
}

Node.js v18.12.1
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

I also noticed that when executing yarn start a second time, a web page is opened at http://localhost:3000
with an error message:

Hmmm… can't reach this page
localhost refused to connect.
Try:

Search the web for [localhost](https://www.bing.com/search?form=ANLKDR&q=localhost)
Checking the connection
[Checking the proxy and the firewall](chrome-error://chromewebdata/#buttons)
ERR_CONNECTION_REFUSED

The backend is not available on Windows.

(whisper) D:\>cd whisper

(whisper) D:\whisper>cd whisper-playground

(whisper) D:\whisper\whisper-playground>cd backend

(whisper) D:\whisper\whisper-playground\backend>python server.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Traceback (most recent call last):
  File "D:\whisper\whisper-playground\backend\server.py", line 5, in <module>
    from backend.client_manager import ClientManager
ModuleNotFoundError: No module named 'backend'

I'm not sure if it's due to the runtime environment.
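
The import in the traceback (`from backend.client_manager import ...`) needs the folder that contains `backend` on the module search path, whereas running `python server.py` from inside `backend` puts `backend` itself there. Starting it from the repository root with `python -m backend.server` is one thing to try. The sketch below shows the same idea in code; `add_project_root` is a hypothetical helper, not part of the project.

```python
import sys
from pathlib import Path


def add_project_root(script_path: str) -> str:
    """Put the directory that contains the package folder on sys.path."""
    # For .../whisper-playground/backend/server.py this is .../whisper-playground
    root = str(Path(script_path).resolve().parent.parent)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Calling `add_project_root(__file__)` at the top of the script, before the `backend.*` imports, would make them resolvable regardless of the working directory. This is not specific to Windows, so the runtime environment is probably not the cause.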

Implement multi-client support

Only one client can connect at a time due to the transcription calls not being thread-safe. Ensure multiple clients can connect at once.
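
As a first step toward this, the non-thread-safe call could be wrapped in a lock so that several clients can stay connected while transcriptions run one at a time. This is only a sketch: `SerializedTranscriber` is a hypothetical wrapper, and it serializes work rather than making it parallel.

```python
import threading
from typing import Any, Callable


class SerializedTranscriber:
    """Wraps a transcription callable so only one call runs at a time."""

    def __init__(self, transcribe: Callable[[Any], Any]) -> None:
        self._transcribe = transcribe
        self._lock = threading.Lock()

    def __call__(self, audio: Any) -> Any:
        # Other client threads block here until the current call finishes.
        with self._lock:
            return self._transcribe(audio)
```

Each client thread would share one `SerializedTranscriber` instance. Latency grows with the number of active clients, so a per-client model or a work queue may be needed later.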

sh: 1: react-scripts: not found

Hi, when I run npm run start I get:

sh: 1: react-scripts: not found

Then I run npm i, I get this:

angel@PCLX:~/Descargas/whisper-playground-main/interface$ npm i
Debugger attached.
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! 
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/react
npm ERR!   react@"^18.2.0" from the root project
npm ERR! 
npm ERR! Could not resolve dependency:
npm ERR! peer react@"^16.8.0 || ^17.0.0" from @material-ui/[email protected]
npm ERR! node_modules/@material-ui/core
npm ERR!   @material-ui/core@"^4.11.4" from the root project
npm ERR! 
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
npm ERR! 
npm ERR! See /home/angel/.npm/eresolve-report.txt for a full report.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/angel/.npm/_logs/2023-02-08T13_55_23_214Z-debug-0.log
Waiting for the debugger to disconnect...

How can I fix this?

Pyaudio

Looks like a great project!
While running the "Install the backend and frontend environment" step (sh install_playground.sh), I'm getting an error:

BEST DIART PARAMETERS IN GENERAL TO USE FOR BETTER DIARIZATION RESULTS

I was wondering whether there are general or recommended parameters for the DIART config that give better diarization results.
In my case I played an audio file with two speakers, one male and one female, but the system kept increasing the speaker count as the audio progressed.

The speakers had US accents, so I would expect diarization to work well at least for audio like this.

Any suggestions for improving the accuracy of diarization would be helpful.

Use whisper.cpp for faster CPU inference

https://github.com/ggerganov/whisper.cpp#real-time-audio-input-example

USING THIS SETUP ON CPU

Hi, is it possible to use this repo on a CPU? If so, do I need to make any specific changes to the code or take any additional steps? OS: Ubuntu 20 LTS, Python 3.8.
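
Assuming the backend loads its model through faster-whisper's `WhisperModel` (as the traceback in another issue on this page suggests), CPU use mostly comes down to the `device` and `compute_type` constructor arguments. A sketch of choosing them; `whisper_model_kwargs` is a hypothetical helper, and where the project actually sets these values is not shown here.

```python
from typing import Dict


def whisper_model_kwargs(use_gpu: bool) -> Dict[str, str]:
    """Pick constructor arguments for faster-whisper's WhisperModel."""
    if use_gpu:
        return {"device": "cuda", "compute_type": "float16"}
    # int8 quantisation reduces memory use and usually speeds up CPU inference.
    return {"device": "cpu", "compute_type": "int8"}
```

Usage would look like `WhisperModel("small", **whisper_model_kwargs(use_gpu=False))`. Smaller models such as `tiny` or `base` are likely to be more practical for real-time use on a CPU.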

Implement status bar

Note that the client is connecting to the server when starting the stream
Note the delay for the first transcription with the real-time mode
Note that the final transcription is in progress when ending the stream
Note that the stream has ended

Person A: Hi
Person B: Hello, how are you
Person A: I'm good, and you?
....
