cog-insanely-fast-whisper-with-video's Introduction

Hi there 👋

GitHub Stats

Joseph Turian's GitHub stats

Last Updated on 13/09/2022 08:07:56 UTC

cog-insanely-fast-whisper-with-video's People

Contributors

broyojo, bt-nia, chenxwh, eltociear, gsheni, kadirnar, li-yifei, omahs, patrick91, python481516, skocur, turian, vaibhavs10

cog-insanely-fast-whisper-with-video's Issues

A video download takes about 95% of time

Hey, thanks for making this super easy to use Whisper on Replicate.

I recently noticed that providing a YouTube link results in a long processing time: the download is very slow (possibly for a variety of reasons), while the final processing takes less than a minute. In my case, it took 9.5 min to download and around 30 sec to transcribe. While Replicate is generally very affordable, this scales very badly.
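As a quick check of the numbers above, here is a small arithmetic sketch. The function name is made up for illustration; only the 9.5 min and 30 sec figures come from the report.

```python
def download_share(download_s: float, transcribe_s: float) -> float:
    """Fraction of the total run time spent downloading."""
    return download_s / (download_s + transcribe_s)


# Figures from the report: 9.5 min download, about 30 sec transcription.
download_s = 9.5 * 60
transcribe_s = 30.0

share = download_share(download_s, transcribe_s)
# How many times longer the whole run is than the transcription alone.
slowdown = (download_s + transcribe_s) / transcribe_s

print(f"download share of run time: {share:.0%}")
print(f"total run vs. transcription only: {slowdown:.0f}x")
```

So if the whole run is billed on the same hardware, roughly 19 of every 20 billed seconds go to waiting on the download.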

I understand this is a free tool offered as a courtesy, and my first thought was to download the video myself and post it to a temporary S3 link, but I wanted to open an issue to gather more ideas. A great benefit of Replicate is that, this way, I can run the transcription from a serverless platform with very few resources (just passing arguments around).

What do you think?

I once had a similar case with my own custom script running on the modal.com cloud. The good thing there was that I could run the download in a CPU-only container (basically free compared with a GPU one) and then pass the file on to the GPU container for the Whisper job.
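A minimal sketch of that split, using only the Python standard library. The names `fetch_media` and `hand_off` are hypothetical, and the `transcribe` callable stands in for whatever actually runs the Whisper job on the GPU side. This is not how this model or modal.com implements it. It also assumes a direct media URL; a YouTube link would need a dedicated downloader first.

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def fetch_media(url: str, dest_dir: Optional[str] = None) -> Path:
    """Stream a direct media URL to disk. Meant for a cheap CPU-only worker."""
    target_dir = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp(prefix="media-"))
    target_dir.mkdir(parents=True, exist_ok=True)
    # Reuse the file name from the URL path, with a fallback for bare URLs.
    name = Path(urllib.parse.urlparse(url).path).name or "media.bin"
    target = target_dir / name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all in memory
    return target


def hand_off(url: str, transcribe: Callable[[Path], object]) -> object:
    """Download first, then pass the local file to the (hypothetical) GPU stage."""
    local_path = fetch_media(url)
    return transcribe(local_path)
```

The point of the design is that only `transcribe` needs to run on GPU hardware; everything in `fetch_media` can finish before a GPU is ever allocated.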

TypeError: unsupported operand type(s) for //: 'tuple' and 'int'

How to reproduce:

  • upload audio file
  • settings:
    • url: none / empty
    • task: transcribe
    • language: de
    • batch_size: 64
    • timestamp: word
    • diarise_audio: false
    • hf_token: none / empty
  • click "Run" button

I get the same error when running the model deployed from https://github.com/chenxwh/insanely-fast-whisper

Log output

max gpu memory allocated over runtime: 3.14 GB
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/cog/server/worker.py", line 217, in _predict
    result = predict(**payload)
             ^^^^^^^^^^^^^^^^^^
  File "/src/predict.py", line 162, in predict
    raise e
  File "/src/predict.py", line 122, in predict
    outputs = self.pipe(
              ^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 357, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1132, in __call__
    return next(
           ^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 266, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1046, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 552, in _forward
    generate_kwargs["num_frames"] = stride[0] // self.feature_extractor.hop_length
                                    ~~~~~~~~~~^^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for //: 'tuple' and 'int'
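The last frame shows `stride[0]` being a tuple at the point where it is floor-divided by an int. One way to get exactly this message is for `stride` to hold one tuple per batch item instead of being a single tuple. The sketch below reproduces that outside the library. The stride values, the `num_frames` helper, and the hop length of 160 are illustrative assumptions; whether this is what actually happens inside the pipeline with `batch_size: 64` has not been confirmed.

```python
HOP_LENGTH = 160  # illustrative value, standing in for feature_extractor.hop_length


def num_frames(stride, hop_length=HOP_LENGTH):
    """Mirror the failing expression: stride[0] // hop_length."""
    return stride[0] // hop_length


single = (480_000, 0, 80_000)  # one made-up stride tuple
batched = [single, single]     # one tuple per batch item (assumed shape)

print(num_frames(single))      # works: stride[0] is an int

try:
    num_frames(batched)        # stride[0] is now a tuple
except TypeError as exc:
    message = str(exc)
    print(message)
```

The second call fails with the same `TypeError` text as in the log, which would be consistent with the error appearing only when inputs are batched.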
