
whisper-playground's Issues

Code switching inputs

import whisper
model = whisper.load_model("large-v2")
result = model.transcribe('test_file/EN-ZH.wav')
print(result['text'])
The audio above contains a mix of English and Chinese, but the result is entirely in English. The same file played back in the demo gives the expected mixed output. How did you get code-switched output? Please tell me how it is done.
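As far as I understand openai-whisper, when `language` is not passed, `transcribe()` detects the language once from the first 30 seconds of audio and then decodes the whole file in that language, which would explain English-only output for a mixed recording. One possible workaround, not necessarily what the demo does, is to transcribe shorter chunks independently so that language detection runs per chunk. The sketch below assumes this approach; `split_into_chunks`, `transcribe_per_chunk` and the 5-second chunk length are hypothetical choices, and the transcription call is passed in so the logic can be tried with a stub.

```python
from typing import Any, Callable, Dict, List, Sequence

SAMPLE_RATE = 16_000  # whisper.load_audio() returns 16 kHz mono samples


def split_into_chunks(audio: Sequence[float], chunk_seconds: float,
                      sample_rate: int = SAMPLE_RATE) -> List[Sequence[float]]:
    """Cut a 1-D sample sequence (list or NumPy array) into fixed-length chunks."""
    step = int(chunk_seconds * sample_rate)
    if step <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [audio[i:i + step] for i in range(0, len(audio), step)]


def transcribe_per_chunk(audio: Sequence[float],
                         transcribe_fn: Callable[[Sequence[float]], Dict[str, Any]],
                         chunk_seconds: float = 5.0) -> str:
    """Transcribe each chunk on its own so language detection runs per chunk."""
    texts = [transcribe_fn(chunk)["text"].strip()
             for chunk in split_into_chunks(audio, chunk_seconds)]
    # Drop empty results (e.g. silent chunks) before joining
    return " ".join(t for t in texts if t)


# Hypothetical usage with openai-whisper:
#   import whisper
#   model = whisper.load_model("large-v2")
#   audio = whisper.load_audio("test_file/EN-ZH.wav")
#   print(transcribe_per_chunk(audio, lambda c: model.transcribe(c, language=None)))
```

Fixed-length cuts can split a word across two chunks, so cutting at pauses (for example with a voice activity detector) would likely work better.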

How to investigate the problem of "No module named 'flask_cors'"?

Right after setting up per the instructions, in the same terminal I used for setup, I ran
flask run --port 8000

I got the following error:
  File "/home/yshen/dev/whisper-playground/backend/app.py", line 5, in <module>
    from flask_cors import CORS
ModuleNotFoundError: No module named 'flask_cors'

But when I run import flask_cors
in a Python interactive session, it works.

(venv) yshen@ys-private:~/dev/whisper-playground/backend$ python
Python 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flask_cors
>>>

How should I investigate the problem?
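One likely cause (an assumption, not confirmed in this report) is that the `flask` command found on PATH belongs to a different Python installation than the venv's `python`, so it cannot see packages installed in the venv. A minimal diagnostic sketch: run it with the venv's interpreter and compare `python` with `command_interpreter`. The function name `diagnose` is hypothetical.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Optional


def diagnose(module: str = "flask_cors", command: str = "flask") -> Dict[str, Optional[str]]:
    """Compare the interpreter running this check with the one a console command uses."""
    report: Dict[str, Optional[str]] = {
        "python": sys.executable,                # interpreter running this check
        "module_origin": None,                   # where `module` would be imported from
        "command_path": shutil.which(command),   # which `flask` the shell would run
        "command_interpreter": None,             # shebang of that script, if any
    }
    spec = importlib.util.find_spec(module)
    if spec is not None:
        report["module_origin"] = spec.origin
    if report["command_path"]:
        try:
            with open(report["command_path"], "rb") as fh:
                first_line = fh.readline().decode("utf-8", "replace").strip()
            if first_line.startswith("#!"):
                report["command_interpreter"] = first_line[2:].strip()
        except OSError:
            pass  # unreadable or not a script; leave the interpreter unknown
    return report


# Hypothetical usage, inside the activated venv:
#   for key, value in diagnose().items():
#       print(f"{key}: {value}")
```

If the two interpreters differ, `python -m flask run --port 8000` runs Flask with the active venv interpreter and sidesteps the mismatch.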

Error occurred during backend connection.

(whisper-playground) E:\whisper\whisper-playground>cd backend

(whisper-playground) E:\whisper\whisper-playground\backend>python server.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
======== Running on http://0.0.0.0:8000 ========
(Press CTRL+C to quit)
INFO:root:Client connected, initializing stream...
INFO:root:Stream configuration received: {'beamSize': '1', 'transcribeTimeout': '5', 'transcriptionMethod': 'real-time', 'model': 'small', 'language': 'english'}
WARNING:faster_whisper:An error occured while synchronizing the model guillaumekln/faster-whisper-small.en from the Hugging Face Hub:
Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.
WARNING:faster_whisper:Trying to load the model directly from the local cache, if it exists.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "E:\whisper\whisper-playground\backend\client_manager.py", line 15, in create_new_client
    new_client = initialize_client(sid, sio, config)
  File "E:\whisper\whisper-playground\backend\clients\utils.py", line 56, in initialize_client
    transcriber = WhisperTranscriber(model_name=whisper_model.value, language_code=language_code, beam_size=beam_size)
  File "E:\whisper\whisper-playground\backend\transcription\whisper_transcriber.py", line 29, in __init__
    self.model = WhisperModel(self.get_full_model_name(model_name, language_code), device=device,
  File "C:\ProgramData\Anaconda3\envs\whisper-playground\lib\site-packages\faster_whisper\transcribe.py", line 124, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: Unable to open file 'model.bin' in model 'C:\Users\admin\.cache\huggingface\hub\models--guillaumekln--faster-whisper-small.en\snapshots\4e49ce629e3fa4c3da596c602b212cb026910443'

The web page uses the default settings (the 'small' model, English, real-time mode), but connecting to the backend raises an error. The backend seems to be using a previous version of the automatically downloaded model.

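The log suggests that an earlier download was interrupted: syncing with the Hugging Face Hub failed with a connection error, faster-whisper fell back to the local cache, and the cached snapshot has no `model.bin`. A minimal sketch, assuming the default cache location shown in the traceback, that detects such an incomplete snapshot and deletes the model's cache folder so that the next run (with a working connection) downloads it again. The helper names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Assumed default location, taken from the path in the traceback
DEFAULT_REPO_CACHE = (Path.home() / ".cache" / "huggingface" / "hub"
                      / "models--guillaumekln--faster-whisper-small.en")


def incomplete_snapshots(repo_cache: Path,
                         required: Iterable[str] = ("model.bin",)) -> List[Path]:
    """Return snapshot folders that lack any of the required files."""
    required = tuple(required)
    snapshots = repo_cache / "snapshots"
    if not snapshots.is_dir():
        return []
    # exists() follows symlinks, so a dangling link into the blob store counts as missing
    return [s for s in sorted(snapshots.iterdir())
            if s.is_dir() and not all((s / name).exists() for name in required)]


def remove_if_incomplete(repo_cache: Path,
                         required: Iterable[str] = ("model.bin",)) -> bool:
    """Delete the whole repo cache folder if any snapshot is incomplete."""
    if incomplete_snapshots(repo_cache, required):
        shutil.rmtree(repo_cache)
        return True
    return False


# Hypothetical usage, before starting server.py again:
#   remove_if_incomplete(DEFAULT_REPO_CACHE)
```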
Multiple errors: missing python dev, portaudio and venv

My box:

OS: Ubuntu 20.04.5 LTS x86_64
Kernel: 5.4.0-125-generic
Packages: 3901 (dpkg), 30 (flatpak), 37 (snap)
Shell: bash 5.0.17
Python 3.8.10

Following your instructions produces several compilation errors:

  1. The python3-dev package is missing:
30    | #include "Python.h"
        |          ^~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1

Fix:
sudo apt-get install python3-dev # for python3.x installs

  2. PortAudio does not compile:
   31 | #include "portaudio.h"
        |          ^~~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1

Solved only by manually compiling PortAudio from http://files.portaudio.com/download.html; see ProgrammingHero1/romantic-alexa#88.

  3. The virtual environment is playing up:
    cd backend && source venv/bin/activate && flask run --port 8000
    throws
    bash: venv/bin/activate: No such file or directory

I tried the tip at https://trendoceans.com/how-to-resolve-venv-bin-activate-is-not-executable-by-this-user/ but eventually gave up.
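For the first two errors, pip compiles PyAudio from source and needs the `Python.h` and `portaudio.h` headers; on Ubuntu these come from the `python3-dev` and `portaudio19-dev` packages. The third error means `backend/venv` was never created, possibly because an earlier step failed; on Ubuntu, creating a venv with pip also requires the `python3-venv` package. The preflight sketch below checks both; the function names and the header search directories are assumptions.

```python
import os
import sysconfig
import venv
from pathlib import Path
from typing import Iterable, List


def missing_build_headers(include_dirs: Iterable[str] = ("/usr/include",
                                                         "/usr/local/include")) -> List[str]:
    """List C headers needed to build PyAudio that are not found, with the Ubuntu package."""
    missing = []
    # Directory where this interpreter expects Python.h (e.g. /usr/include/python3.8)
    python_include = Path(sysconfig.get_paths()["include"])
    if not (python_include / "Python.h").exists():
        missing.append("Python.h (sudo apt-get install python3-dev)")
    if not any((Path(d) / "portaudio.h").exists() for d in include_dirs):
        missing.append("portaudio.h (sudo apt-get install portaudio19-dev)")
    return missing


def ensure_venv(path: Path, with_pip: bool = True) -> Path:
    """Create the virtual environment if its activate script is missing; return that script."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    activate = path / bin_dir / "activate"
    if not activate.exists():
        # with_pip=True relies on ensurepip, which Ubuntu ships in python3-venv
        venv.create(path, with_pip=with_pip)
    return activate


# Hypothetical usage, from the repository root:
#   print(missing_build_headers())
#   print(ensure_venv(Path("backend") / "venv"))
```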

Backend error: AttributeError: module 'whisper' has no attribute 'load_model'

AttributeError: module 'whisper' has no attribute 'load_model'
127.0.0.1 - - [15/Dec/2022 13:17:56] "POST /transcribe HTTP/1.1" 500 -
[2022-12-15 13:18:06,805] ERROR in app: Exception on /transcribe [POST]
Traceback (most recent call last):
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/venv/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/Users/omkar.kadam/Desktop/whisper-playground/backend/app.py", line 21, in transcribe
    audio_model = whisper.load_model(model)
AttributeError: module 'whisper' has no attribute 'load_model'
127.0.0.1 - - [15/Dec/2022 13:18:06] "POST /transcribe HTTP/1.1" 500 - 

I get this error when following the README steps. I tried making changes to App.js and package.json after finding another issue that attributed a similar problem to CORS.

However, the backend itself seems broken. I have probably missed a package or model dependency; any help is appreciated.

Thanks in advance!
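One common cause (an assumption for this report) is that `import whisper` picks up something other than openai-whisper: either the unrelated `whisper` package from PyPI (Graphite's time-series database library) or a local file named `whisper.py`. A small check, run with the venv's Python, shows which module is imported and whether it has `load_model`; `inspect_module` is a hypothetical helper name.

```python
import importlib
from typing import Any, Dict


def inspect_module(name: str = "whisper", attr: str = "load_model") -> Dict[str, Any]:
    """Report where `name` is imported from and whether it provides `attr`."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return {"importable": False, "error": str(exc)}
    return {
        "importable": True,
        # A path to a local whisper.py, or to a package other than openai-whisper,
        # would explain the missing load_model
        "file": getattr(module, "__file__", None),
        "has_attr": hasattr(module, attr),
    }


# Hypothetical usage, inside backend/venv:
#   print(inspect_module())
```

If `file` points at the wrong package, `pip uninstall whisper` followed by `pip install git+https://github.com/openai/whisper.git` (the Git URL listed in requirements.txt) would replace it.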

Stuck on three purple dots

Hi, I can't get it to work.

When I press the "Start transcribing" button and say something, it keeps loading, and I can't press the button to stop transcribing.


`$ sh install_playground.sh
yarn install v1.22.19
[1/4] Resolving packages...
success Already up-to-date.
Done in 0.41s.
Requirement already satisfied: wheel in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (0.37.1)

[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: python.exe -m pip install --upgrade pip
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Collecting git+https://github.com/openai/whisper.git (from -r requirements.txt (line 8))
Cloning https://github.com/openai/whisper.git to c:\users\lnwry\appdata\local\temp\pip-req-build-kva5bjjp
Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git 'C:\Users\lnwry\AppData\Local\Temp\pip-req-build-kva5bjjp'
Resolved https://github.com/openai/whisper.git to commit d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: numpy in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 1)) (1.23.4)
Requirement already satisfied: tqdm in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 2)) (4.64.1)
Requirement already satisfied: transformers>=4.19.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 3)) (4.23.1)
Requirement already satisfied: ffmpeg-python==0.2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 4)) (0.2.0)
Requirement already satisfied: pyaudio in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 5)) (0.2.12)
Requirement already satisfied: SpeechRecognition in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 6)) (3.8.1)
Requirement already satisfied: pydub in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 7)) (0.25.1)
Requirement already satisfied: torch in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 10)) (1.12.1+cu113)
Requirement already satisfied: flask in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 11)) (2.2.2)
Requirement already satisfied: flask_cors in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from -r requirements.txt (line 12)) (3.0.10)
Requirement already satisfied: future in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from ffmpeg-python==0.2.0->-r requirements.txt (line 4)) (0.18.2)
Requirement already satisfied: colorama in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from tqdm->-r requirements.txt (line 2)) (0.4.5)
Requirement already satisfied: regex!=2019.12.17 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (2022.9.13)
Requirement already satisfied: pyyaml>=5.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (6.0)
Requirement already satisfied: requests in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (2.28.1)
Requirement already satisfied: packaging>=20.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (21.3)
Requirement already satisfied: filelock in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (3.8.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.10.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (0.10.1)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from transformers>=4.19.0->-r requirements.txt (line 3)) (0.13.1)
Requirement already satisfied: more-itertools in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from whisper==1.0->-r requirements.txt (line 8)) (8.14.0)
Requirement already satisfied: typing-extensions in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from torch->-r requirements.txt (line 10)) (4.4.0)
Requirement already satisfied: itsdangerous>=2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (2.1.2)
Requirement already satisfied: click>=8.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (8.1.3)
Requirement already satisfied: Jinja2>=3.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (3.1.2)
Requirement already satisfied: Werkzeug>=2.2.2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask->-r requirements.txt (line 11)) (2.2.2)
Requirement already satisfied: Six in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from flask_cors->-r requirements.txt (line 12)) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from Jinja2>=3.0->flask->-r requirements.txt (line 11)) (2.1.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from packaging>=20.0->transformers>=4.19.0->-r requirements.txt (line 3)) (3.0.9)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (1.26.12)
Requirement already satisfied: idna<4,>=2.5 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (2022.9.24)
Requirement already satisfied: charset-normalizer<3,>=2 in e:\vrchat mat\vtube#whisper\whisper-playground\backend\venv\lib\site-packages (from requests->transformers>=4.19.0->-r requirements.txt (line 3)) (2.1.1)

[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: python.exe -m pip install --upgrade pip
`

`yarn start` error

When executing cd interface && yarn start
in a newly opened terminal, I got the error:

Starting the development server...

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:71:19)
    at Object.createHash (node:crypto:133:10)
    at module.exports (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/util/createHash.js:135:53)
    at NormalModule._initBuildHash (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:417:16)
    at handleParseError (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:471:10)
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:503:5
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:358:12
    at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:373:3
    at iterateNormalLoaders (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:214:10)
    at iterateNormalLoaders (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:221:10)
/home/yshen/dev/whisper-playground/interface/node_modules/react-scripts/scripts/start.js:19
  throw err;
  ^

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:71:19)
    at Object.createHash (node:crypto:133:10)
    at module.exports (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/util/createHash.js:135:53)
    at NormalModule._initBuildHash (/home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:417:16)
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:452:10
    at /home/yshen/dev/whisper-playground/interface/node_modules/webpack/lib/NormalModule.js:323:13
    at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:367:11
    at /home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:233:18
    at context.callback (/home/yshen/dev/whisper-playground/interface/node_modules/loader-runner/lib/LoaderRunner.js:111:13)
    at /home/yshen/dev/whisper-playground/interface/node_modules/react-scripts/node_modules/babel-loader/lib/index.js:59:103 {
  opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
  library: 'digital envelope routines',
  reason: 'unsupported',
  code: 'ERR_OSSL_EVP_UNSUPPORTED'
}

Node.js v18.12.1
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

I also noticed that when running yarn start a second time, a web page opens at http://localhost:3000 with the error message:

Hmmm… can't reach this page
localhost refused to connect.
ERR_CONNECTION_REFUSED
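This error commonly appears when an older webpack version (such as the one bundled with older react-scripts) runs on Node.js 17 or newer (the log shows v18.12.1), because OpenSSL 3 disables the MD4 hash that webpack uses there. The "localhost refused to connect" page is likely just a consequence of the dev server having crashed. Two usual workarounds are running on Node.js 16 or setting `NODE_OPTIONS=--openssl-legacy-provider` before `yarn start`. The sketch below builds such an environment only when the Node major version is 17 or higher; the helper names are hypothetical.

```python
import re
from typing import Dict, Mapping

LEGACY_FLAG = "--openssl-legacy-provider"


def node_major_version(version_string: str) -> int:
    """Parse output such as 'v18.12.1' from `node --version`."""
    match = re.match(r"v?(\d+)\.", version_string.strip())
    if not match:
        raise ValueError(f"unrecognized Node.js version: {version_string!r}")
    return int(match.group(1))


def env_for_legacy_webpack(base_env: Mapping[str, str], node_version: str) -> Dict[str, str]:
    """Copy base_env, adding the legacy OpenSSL flag to NODE_OPTIONS on Node.js >= 17."""
    env = dict(base_env)
    if node_major_version(node_version) >= 17:
        existing = env.get("NODE_OPTIONS", "")
        if LEGACY_FLAG not in existing.split():  # do not add the flag twice
            env["NODE_OPTIONS"] = (existing + " " + LEGACY_FLAG).strip()
    return env
```

Usage on Linux, assuming `node` and `yarn` are on PATH: `version = subprocess.run(["node", "--version"], capture_output=True, text=True).stdout`, then `subprocess.run(["yarn", "start"], cwd="interface", env=env_for_legacy_webpack(os.environ, version))`.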

The backend is not available on Windows.

(whisper) D:\>cd whisper

(whisper) D:\whisper>cd whisper-playground

(whisper) D:\whisper\whisper-playground>cd backend

(whisper) D:\whisper\whisper-playground\backend>python server.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Traceback (most recent call last):
  File "D:\whisper\whisper-playground\backend\server.py", line 5, in <module>
    from backend.client_manager import ClientManager
ModuleNotFoundError: No module named 'backend'

I'm not sure if it's due to the runtime environment.
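The traceback shows `server.py` doing `from backend.client_manager import ClientManager`. That import only resolves when the directory containing the `backend` folder (the repository root) is on `sys.path`; running `python server.py` from inside `backend` puts `backend` itself there instead, so this looks like a path issue rather than something Windows-specific. Running `python -m backend.server` from the repository root should work, assuming no other imports depend on the current directory. Alternatively, here is a sketch of a path fix at the top of `server.py`; `add_repo_root_to_path` is hypothetical.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file: str, levels_up: int = 1) -> Path:
    """Put the directory `levels_up` levels above the script's folder on sys.path.

    For backend/server.py, levels_up=1 resolves to the repository root,
    i.e. the directory that contains the `backend` package.
    """
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Hypothetical placement at the very top of backend/server.py, before
# `from backend.client_manager import ClientManager`:
#   add_repo_root_to_path(__file__)
```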

Implement multi-client support

Only one client can connect at a time due to the transcription calls not being thread-safe. Ensure multiple clients can connect at once.
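One way to let several clients connect, assuming the model itself must not be called concurrently, is to keep each client's asyncio handler independent but funnel the actual transcription calls through a single worker thread. The sketch below does this with a one-thread `ThreadPoolExecutor`; `SerializedTranscriber`, `FakeModel` and `serve_clients` are hypothetical names, and the fake model stands in for a real Whisper call. The alternative, one model instance per client, avoids queuing at the cost of memory.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple


class SerializedTranscriber:
    """Many clients may await results at once, but the model runs one call at a time."""

    def __init__(self, transcribe_fn: Callable[[Any], str]) -> None:
        self._transcribe_fn = transcribe_fn
        self._executor = ThreadPoolExecutor(max_workers=1)  # the only thread touching the model

    async def transcribe(self, audio: Any) -> str:
        loop = asyncio.get_running_loop()
        # Calls queue up in the executor; with one worker they never overlap
        return await loop.run_in_executor(self._executor, self._transcribe_fn, audio)

    def close(self) -> None:
        self._executor.shutdown(wait=True)


class FakeModel:
    """Stand-in for a non-thread-safe model; records how many calls overlap."""

    def __init__(self) -> None:
        self.active = 0
        self.max_active = 0
        self._lock = threading.Lock()

    def __call__(self, audio: str) -> str:
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # simulate inference time
        with self._lock:
            self.active -= 1
        return f"text for {audio}"


async def serve_clients(transcriber: SerializedTranscriber,
                        client_chunks: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Handle every client concurrently, each sending its chunks in order."""
    async def handle(client_id: str, chunks: List[str]) -> Tuple[str, List[str]]:
        texts = []
        for chunk in chunks:
            texts.append(await transcriber.transcribe(chunk))
        return client_id, texts

    pairs = await asyncio.gather(*(handle(cid, chunks) for cid, chunks in client_chunks.items()))
    return dict(pairs)
```

With a single worker, requests from different clients wait in a queue, so latency grows with the number of active clients.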

sh: 1: react-scripts: not found

Hi, when I run npm run start I get:

sh: 1: react-scripts: not found

Then, when I run npm i, I get this:

angel@PCLX:~/Descargas/whisper-playground-main/interface$ npm i
Debugger attached.
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! 
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/react
npm ERR!   react@"^18.2.0" from the root project
npm ERR! 
npm ERR! Could not resolve dependency:
npm ERR! peer react@"^16.8.0 || ^17.0.0" from @material-ui/[email protected]
npm ERR! node_modules/@material-ui/core
npm ERR!   @material-ui/core@"^4.11.4" from the root project
npm ERR! 
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
npm ERR! 
npm ERR! See /home/angel/.npm/eresolve-report.txt for a full report.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/angel/.npm/_logs/2023-02-08T13_55_23_214Z-debug-0.log
Waiting for the debugger to disconnect...


How can I fix this?

Pyaudio

Looks like a great project!
While installing the backend and frontend environments with sh install_playground.sh, I get an error:


Best general diart parameters for better diarization results

I was wondering whether there are general best-practice parameters for the diart config that give better diarization results.
In my case, I played audio with two speakers, one male and one female, but the system kept increasing the speaker count as the audio progressed.

The speakers had US accents, so I would expect it to work well at least for that kind of audio.

Any suggestions for improving diarization accuracy would be helpful.
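As far as I know, diart's online pipeline assigns each new speaker embedding to the closest known speaker, or opens a new speaker when the distance to all of them exceeds a threshold that its config calls `delta_new`; the config also has a `max_speakers` cap. A steadily growing speaker count is consistent with that threshold being too low for the audio, so raising `delta_new`, or setting `max_speakers=2` when the number of speakers is known, are reasonable first experiments. The toy below is a simplified stand-in, not diart's code: plain cosine-distance incremental clustering with made-up 2-D embeddings, showing how the threshold and the cap change the speaker count.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class IncrementalClustering:
    """Toy online speaker clustering with a new-speaker threshold and a speaker cap."""

    def __init__(self, delta_new: float, max_speakers: int) -> None:
        self.delta_new = delta_new
        self.max_speakers = max_speakers
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def assign(self, embedding: Sequence[float]) -> int:
        """Return a speaker index, opening a new speaker if all centroids are too far."""
        if self.centroids:
            distances = [cosine_distance(embedding, c) for c in self.centroids]
            best = min(range(len(distances)), key=distances.__getitem__)
            close_enough = distances[best] <= self.delta_new
            if close_enough or len(self.centroids) >= self.max_speakers:
                # Fold the embedding into the running mean of the matched speaker
                n = self.counts[best]
                self.centroids[best] = [(c * n + e) / (n + 1)
                                        for c, e in zip(self.centroids[best], embedding)]
                self.counts[best] = n + 1
                return best
        self.centroids.append(list(embedding))
        self.counts.append(1)
        return len(self.centroids) - 1
```

With a very small threshold, every slightly different embedding of the same voice opens a new speaker; a larger threshold or a cap keeps the count at two.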

Using this setup on CPU

Hi, I just wanted to ask whether it is possible to run this repo on a CPU. If so, do I need to make any specific code changes or take any additional steps? OS: Ubuntu 20 LTS, Python 3.8.
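The traceback in an earlier issue shows the backend constructing `faster_whisper.WhisperModel(..., device=device, ...)`, and faster-whisper accepts `device="cpu"` together with a `compute_type` such as `"int8"`. Here is a sketch, assuming that is where the device is set, of choosing the device and compute type automatically; the helper names are hypothetical, and other components (such as the diarization model) may need their own device setting.

```python
import importlib.util
from typing import Dict, Optional


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so the check also works without PyTorch
    return "cuda" if torch.cuda.is_available() else "cpu"


def whisper_model_kwargs(device: Optional[str] = None) -> Dict[str, str]:
    """Keyword arguments for faster_whisper.WhisperModel (assumed to be what the backend builds)."""
    device = device or pick_device()
    # int8 quantization is a common choice on CPU; float16 generally needs a GPU
    compute_type = "float16" if device == "cuda" else "int8"
    return {"device": device, "compute_type": compute_type}
```

For example, `WhisperModel("small", **whisper_model_kwargs())` on a machine without a GPU would request `device="cpu"` and `compute_type="int8"`.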

Implement status bar

  • Note that the client is connecting to the server when starting the stream

  • Note the delay before the first transcription in real-time mode

  • Note that the final transcription is in progress when ending the stream

  • Note that the stream has ended
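The four bullets describe a small state machine for the stream. Below is a sketch of it with illustrative status texts; `StreamStatus`, `StatusBar` and the allowed transitions are assumptions. The real status bar would live in the React interface, but Python is used here for consistency with the other examples.

```python
from enum import Enum
from typing import Dict, List, Set


class StreamStatus(Enum):
    # Values double as the text a status bar could display (wording is illustrative)
    CONNECTING = "Connecting to the server..."
    AWAITING_FIRST = "Waiting for the first transcription..."
    TRANSCRIBING = "Transcribing"
    FINALIZING = "Finishing the final transcription..."
    ENDED = "Stream ended"


# Which status may follow which; AWAITING_FIRST applies to real-time mode
ALLOWED: Dict[StreamStatus, Set[StreamStatus]] = {
    StreamStatus.CONNECTING: {StreamStatus.AWAITING_FIRST, StreamStatus.TRANSCRIBING,
                              StreamStatus.ENDED},
    StreamStatus.AWAITING_FIRST: {StreamStatus.TRANSCRIBING, StreamStatus.FINALIZING},
    StreamStatus.TRANSCRIBING: {StreamStatus.FINALIZING},
    StreamStatus.FINALIZING: {StreamStatus.ENDED},
    StreamStatus.ENDED: set(),
}


class StatusBar:
    def __init__(self) -> None:
        self.status = StreamStatus.CONNECTING
        self.history: List[StreamStatus] = [self.status]

    def transition(self, new_status: StreamStatus) -> str:
        """Move to new_status and return the text to display; reject illegal jumps."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.name} to {new_status.name}")
        self.status = new_status
        self.history.append(new_status)
        return new_status.value
```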

Use pyannote-audio for speaker diarization

The logic will combine Whisper and pyannote.audio output based on timestamps to produce something along the lines of:

Person A: Hi
Person B: Hello, how are you
Person A: I'm good, and you?
....
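A sketch of the timestamp-matching step: each Whisper segment is given the speaker whose diarization turn overlaps it the most, and consecutive segments by the same speaker are merged. The inputs are plain `(start, end, text)` and `(start, end, speaker)` tuples; with openai-whisper they could be built from `result["segments"]` (whose entries carry `start`, `end` and `text`), and with pyannote.audio from `annotation.itertracks(yield_label=True)`. All function names here are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start, end, text), as from Whisper segments
Turn = Tuple[float, float, str]     # (start, end, speaker), as from pyannote turns


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each segment with the speaker whose turn overlaps it the most."""
    lines = []
    for start, end, text in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(start, end, t_start, t_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        lines.append((best_speaker or "Unknown", text.strip()))
    return lines


def merge_consecutive(lines: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Join back-to-back segments from the same speaker into one line."""
    merged: List[Tuple[str, str]] = []
    for speaker, text in lines:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


def format_transcript(lines: List[Tuple[str, str]]) -> str:
    return "\n".join(f"{speaker}: {text}" for speaker, text in lines)
```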
