
decipher's Introduction

Decipher 📺️

AI-generated transcription subtitles automatically add subtitles to your videos by using artificial intelligence to transcribe the video's audio. This eliminates manual transcription and makes your videos accessible to a wider audience. Decipher uses Whisper to transcribe the audio extracted from the video and to create the subtitles.

What is whisper?

Whisper is a state-of-the-art automatic speech recognition system from OpenAI, trained on 680,000 hours of multilingual and multitask supervised data collected from the web. This large and diverse dataset leads to improved robustness to accents, background noise, and technical language.
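
For intuition, the whole pipeline boils down to three steps: pull the audio track out of the video with FFmpeg, transcribe it with Whisper, and write the timed segments out as an SRT file. The sketch below is a minimal hand-rolled version of that flow using the openai-whisper package; it is only an illustration under that assumption, since decipher wires this up for you (and its newer versions go through the Hugging Face transformers pipeline instead, as the Colab traceback further down shows).

# Minimal sketch of the extract -> transcribe -> SRT flow that decipher automates.
# Assumes the reference openai-whisper package (pip install openai-whisper) and
# ffmpeg on PATH; decipher's own implementation differs in the details.
import subprocess
import whisper


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


# 1. Extract the audio track (re-encoded to AAC so the container always matches).
subprocess.run(["ffmpeg", "-y", "-i", "video.mp4", "-vn", "-c:a", "aac", "audio.aac"], check=True)

# 2. Transcribe with Whisper.
model = whisper.load_model("small")
result = model.transcribe("audio.aac")

# 3. Write the timed segments as SubRip blocks.
with open("video.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n\n")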

Getting Started

There are two different ways to begin using decipher, depending on your preferences:

Google Colab

Open In Colab

Notes:

  • Requires a (free) Google account
  • Instructions are embedded in the Colab Notebook

Google Colab is a cloud-based platform for machine learning and data science that lets you run decipher without a powerful GPU of your own. It lends you a GPU (Tesla K80, T4, P4, or P100) on Google's servers for free, for up to 12 hours per session. If you need more powerful GPUs and longer runtimes, Colab Pro/Pro+ options are available.

Manual

Dependencies
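
  • FFmpeg, available on your PATH (decipher invokes ffmpeg to extract the audio track and to burn subtitles; the macOS issue below shows the error you get when it cannot be run)
  • Python 3.10 or newer (the str | Path annotations in the ffutils dependency fail on Python 3.9, as the TypeError issue below shows)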

Installation

pip install git+https://github.com/dsymbol/decipher

or

git clone https://github.com/dsymbol/decipher
cd decipher && pip install . 

Note: Do NOT use 'pip install decipher'. It installs a different package.

GUI (gradio) usage

decipher gui
# or
python -m decipher gui

Command-line usage

The transcribe subcommand allows you to transcribe a video file into a SubRip Subtitle (SRT) file. It also has the option to automatically add the generated subtitles to the video.

The subtitle subcommand allows you to add subtitles to a video using an already existing SRT file. This subcommand does not perform transcription, but rather assumes that the SRT file has already been created. It is typically used by people who want to validate the accuracy of a transcription generated by the transcribe subcommand.

To get started right away:

decipher --help

You can run decipher as a package if running it as a script doesn't work:

python -m decipher --help

Command-line examples:

Generate SRT subtitles for video:

decipher transcribe -i video.mp4 --model small

Burn generated subtitles into video:

decipher subtitle -i video.mp4 --subtitle_file video.srt --subtitle_action burn

Generate and burn subtitles into video without validating transcription:

decipher transcribe -i video.mp4 --model small --subtitle_action burn

decipher's People

Contributors

dsymbol


decipher's Issues

Disturbing Subs....

Some of the generated subtitles read "I'm going to kill you." This happens when there is no dialogue or when there is some discordant moaning. There are also some errors where the moaning overlaps dialogue, but those are less of a concern.

It is easy to change the subtitle text before burning, but I find this error disturbing.

I changed language='english' because there was an error about having multiple languages at ln. 1169. I was using the large-v2 model.
Macbeth (1979) - Scary Subs
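
For context on reports like this: Whisper is known to hallucinate short, sometimes unsettling phrases on silence or non-speech audio. If you run openai-whisper directly on the extracted audio, a few of its transcribe() parameters can reduce this; whether decipher exposes them depends on the version, so treat the call below as a general sketch of the mitigation rather than a decipher option.

# Hedged sketch: reducing hallucinated lines on silent/non-speech passages with
# the reference openai-whisper API. These are real transcribe() parameters,
# but decipher may not pass them through.
import whisper

model = whisper.load_model("large-v2")
result = model.transcribe(
    "audio.aac",
    language="en",
    condition_on_previous_text=False,  # keeps one hallucination from seeding the next window
    no_speech_threshold=0.6,           # raise to drop more segments judged as non-speech
    logprob_threshold=-1.0,            # segments below this average log-probability are treated as failed decodes
)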

Conversion failed!

When I was burning generated subtitles into a video, an error occurred: 'Conversion failed!' The generated mp4 file is 0 KB.

Could not write header (incorrect codec parameters ?): Invalid argument

Tried to transcribe an episode of The Rookie.

Input #0, matroska,webm, from 'C:\Users\Duckers\Downloads\rookie.mkv':
Metadata:
encoder : libebml v1.4.4 + libmatroska v1.7.1
Duration: 00:43:03.65, start: 0.000000, bitrate: 10197 kb/s
Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn (default)
Metadata:
BPS : 9555108
DURATION : 00:43:03.623000000
NUMBER_OF_FRAMES: 61945
NUMBER_OF_BYTES : 3085849879
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:1(eng): Audio: eac3, 48000 Hz, 5.1(side), fltp, 640 kb/s (default)
Metadata:
BPS : 640000
DURATION : 00:43:03.648000000
NUMBER_OF_FRAMES: 80739
NUMBER_OF_BYTES : 206691840
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:2(eng): Subtitle: subrip
Metadata:
BPS : 105
DURATION : 00:42:58.507000000
NUMBER_OF_FRAMES: 1178
NUMBER_OF_BYTES : 34122
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:3(eng): Subtitle: subrip (hearing impaired)
Metadata:
title : SDH
BPS : 113
DURATION : 00:42:58.507000000
NUMBER_OF_FRAMES: 1299
NUMBER_OF_BYTES : 36505
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
[adts @ 0000026a2c79e600] Only AAC streams can be muxed by the ADTS muxer
[out#0/adts @ 0000026a2c785c80] Could not write header (incorrect codec parameters ?): Invalid argument
Error opening output file rookie.aac.
Error opening output files: Invalid argument
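
A hedged reading of the log: the failure is in the audio-extraction step, not the burn itself. The "Only AAC streams can be muxed by the ADTS muxer" line means a non-AAC stream (here E-AC-3) was stream-copied into a .aac output, and the MacBook M1 issue further down shows decipher building exactly that kind of command (-vn -acodec copy into a .aac file), which only works when the source audio is already AAC. Assuming that is indeed the failing step, remuxing the audio to real AAC first and feeding that file to decipher should sidestep it:

ffmpeg -y -i rookie.mkv -map 0:v:0 -map 0:a:0 -c:v copy -c:a aac rookie_aac.mkv
decipher transcribe -i rookie_aac.mkv --model small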

AssertionError

[screenshot attached]
An AssertionError is raised after running $ decipher transcribe --help.

Generate srt?

Hello,

This is very similar to something I was planning to do.

Wouldn't it be better to generate .srt files instead, so one can manually fix errors, adjust the size, disable them, and so on?

If you want a single file you can separately make a .mkv file and add a subtitle track.

Failed to load audio

I'm trying to generate an SRT file for a video but I get the following error:

RuntimeError: Failed to load audio: ffmpeg version 5.1.1-1ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12 (Ubuntu 12.2.0-1ubuntu1)
  configuration: --prefix=/usr --extra-version=1ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-shared
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
[aac @ 0x55f18b9979c0] Format aac detected only with low score of 1, misdetection possible!
chuck.s01e02.bluray.1080p.DD5.1.H265-d3g.aac: End of file

Not really sure why. To be clear, I can watch the video without any trouble with audio.
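
This looks like the same extraction pitfall as the "Conversion failed!" issue above, one step later: the filename suggests a DD5.1 source, whose audio is almost certainly AC-3/E-AC-3, stream-copied into a .aac file, so ffmpeg can only "detect" AAC with a low score and then hits end-of-file. If that guess is right, checking the real codec and, if needed, remuxing to AAC before transcribing should fix it (input.mkv below is a placeholder for the original video):

ffprobe -v error -select_streams a:0 -show_entries stream=codec_name -of default=noprint_wrappers=1 input.mkv
ffmpeg -y -i input.mkv -map 0:v:0 -map 0:a:0 -c:v copy -c:a aac input_aac.mkv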

TypeError: unsupported operand type(s) for |: 'type' and 'type'

decipher gui
Traceback (most recent call last):
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\Scripts\decipher.exe\__main__.py", line 4, in <module>
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\site-packages\decipher\__main__.py", line 4, in <module>
    from decipher.action import subtitle, transcribe
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\site-packages\decipher\action.py", line 10, in <module>
    from ffutils import ffprog
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\site-packages\ffutils\__init__.py", line 1, in <module>
    from .ff import get_ffmpeg_exe, ffprog
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\site-packages\ffutils\ff.py", line 116, in <module>
    def _download_exe(url: str, filename: str | Path) -> str:
TypeError: unsupported operand type(s) for |: 'type' and 'type'

PS C:\Users\krafi\decipher>


PS C:\Users\krafi\decipher> python -m decipher gui
Traceback (most recent call last):
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\krafi\decipher\decipher\__main__.py", line 4, in <module>
    from decipher.action import subtitle, transcribe
  File "C:\Users\krafi\decipher\decipher\action.py", line 10, in <module>
    from ffutils import ffprog
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\site-packages\ffutils\__init__.py", line 1, in <module>
    from .ff import get_ffmpeg_exe, ffprog
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\site-packages\ffutils\ff.py", line 116, in <module>
    def _download_exe(url: str, filename: str | Path) -> str:
TypeError: unsupported operand type(s) for |: 'type' and 'type'
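
The root cause here is the Python version, not decipher itself: the str | Path union syntax in annotations that get evaluated at runtime is only valid on Python 3.10+ (PEP 604), and the traceback shows Python 3.9, so the ffutils import fails. Upgrading to Python 3.10 or newer is the clean fix; for reference, a 3.9-compatible spelling of that annotation would look like this (a sketch against the ffutils line in the traceback, not a verified patch):

# Python 3.9-compatible alternatives to "str | Path" in a runtime-evaluated annotation.
from pathlib import Path
from typing import Union


def _download_exe(url: str, filename: Union[str, Path]) -> str:
    ...

# Or defer annotation evaluation entirely so the "|" is never executed on 3.9,
# by placing this at the very top of ff.py:
# from __future__ import annotations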

Issue with burn subtitle

This tool is great, and thanks for it; I don't have to install all the stuff and can run it on Colab for SRT generation.
After the text was generated, I tried to burn it into the video. Whether I use burn with "Transcript" or "Subtitle", I get the result attached below. It seems I am missing a font file?
My subtitle text is in Chinese.
[error screenshot attached]

Some of the text that was generated and that I tried to burn in:
`1
00:00:00,720 --> 00:00:02,000
哈囉大家好我是品哥

2
00:00:02,000 --> 00:00:04,400
今天又來到了大家最喜歡的單元

3
00:00:04,400 --> 00:00:06,599
麥當勞新品開箱

4
00:00:06,599 --> 00:00:08,699
這次我要開箱的第一個是

5
00:00:08,699 --> 00:00:10,500
熬龍蝦堡`

Another question: what does "Add" do? I thought it would add the generated text to the video, but I don't actually see anything being added; only "Burn" produces a different result.
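
Two hedged notes, since the attached screenshot is not visible here. Burning hardcodes the text into the video frames, typically via ffmpeg's subtitles (libass) filter, which needs a font that actually contains the Chinese glyphs; if none is found you get tofu boxes or an outright error. An "add" style action, by contrast, usually just muxes the SRT in as a soft subtitle track, so the picture looks unchanged until you enable that track in your player (what decipher's "Add" does exactly is inferred from the name here). If the burn failure is font-related, forcing an installed CJK-capable font is the usual workaround:

ffmpeg -i input.mp4 -vf "subtitles=video.srt:force_style='FontName=Noto Sans CJK TC'" -c:a copy output.mp4

Noto Sans CJK TC must already be installed on the system; substitute any font that covers Traditional Chinese.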

Not really a critical issue, but I'm getting this error

I have a 3060 and a Core i9-12900KF, and when I'm using decipher I'm getting this warning:
UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
But also, this software is absolutely amazing; you have done an amazing job.
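
That warning is harmless in itself, but it does mean the transcription is running on the CPU rather than the 3060: Whisper only falls back to FP32 when the device is CPU. The usual culprit is a CPU-only PyTorch build; here is a quick check that assumes nothing about how decipher selects its device internally:

# Check whether PyTorch can actually see the GPU.
import torch

print(torch.__version__)          # a "+cpu" suffix means a CPU-only build
print(torch.cuda.is_available())  # must be True for the RTX 3060 to be used

# If this prints False, reinstall a CUDA-enabled build, for example:
#   pip install torch --index-url https://download.pytorch.org/whl/cu121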

2 issues on macos / apple silicon

thanks a lot for this tool!!

I installed it by manually git cloning the repo, and running pip3 install .

the first error I received was:

TypeError: WhisperForConditionalGeneration.__init__() got an unexpected keyword argument 'attn_implementation'

I "fixed" it by removing model_kwargs={"attn_implementation": "sdpa"}, from this line in action.py

the next problem I encountered was AttributeError: module 'torch' has no attribute 'mps'

I "fixed" that one by commenting out these lines in the same file

and then everything worked and I was able to burn the subtitles into the video file! :-) woo

if it helps, I'm using Python 3.11.2 on macOS 12.7

thanks again!
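
Both errors look like version skew rather than something macOS-specific: the attn_implementation keyword only exists in newer transformers releases, and the torch.mps module only exists in newer PyTorch builds. Before patching action.py, it may be enough to upgrade those two packages; a quick way to see what is installed (hedged, since I haven't checked which minimum versions decipher actually pins):

# Print the versions and capabilities the two failing code paths depend on.
import torch
import transformers

print(transformers.__version__)   # attn_implementation needs a recent transformers release
print(torch.__version__)
print(hasattr(torch, "mps"))      # the torch.mps module is missing on older builds
print(hasattr(torch.backends, "mps") and torch.backends.mps.is_available())  # Apple-silicon GPU support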

Error on transcribe on Macbook M1

Running decipher transcribe -i E02.mp4 --model small results in an error:

(base) adam@192 decipher % decipher transcribe -i E02.mp4 --model small
Extracting audio file...
Traceback (most recent call last):
  File "/Users/adam/opt/anaconda3/bin/decipher", line 8, in <module>
    sys.exit(main())
  File "/Users/adam/opt/anaconda3/lib/python3.9/site-packages/decipher/__main__.py", line 92, in main
    transcribe(
  File "/Users/adam/opt/anaconda3/lib/python3.9/site-packages/decipher/decipher.py", line 18, in transcribe
    run(
  File "/Users/adam/opt/anaconda3/lib/python3.9/site-packages/decipher/decipher.py", line 83, in run
    p = subprocess.run(command, text=True)
  File "/Users/adam/opt/anaconda3/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/Users/adam/opt/anaconda3/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/adam/opt/anaconda3/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg -y -i /Users/adam/Git/decipher/E02.mp4 -vn -acodec copy E02.aac'

Running on MacBook Pro M1
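
The telling detail is that the entire ffmpeg invocation appears as a single "file" name in the FileNotFoundError: subprocess.run() was given one string, and on macOS/Linux without shell=True Python then tries to execute a program literally called 'ffmpeg -y -i ...', which fails even if ffmpeg is installed (and if ffmpeg is not installed at all, brew install ffmpeg is still needed). A minimal illustration of the difference, not decipher's actual code:

# Why a single command string fails on POSIX without shell=True.
import shlex
import subprocess

cmd = "ffmpeg -y -i E02.mp4 -vn -acodec copy E02.aac"

# Raises FileNotFoundError: the whole string is treated as the program name.
# subprocess.run(cmd, text=True)

# Works: split into an argument list (or pass shell=True instead).
subprocess.run(shlex.split(cmd), text=True, check=True)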

Google Colab Issue

Hello! Whenever I try to run Decipher on Google Colab, this error message pops up. I think it might be related to your implementation of insanely-fast-whisper, but don't quote me on that (I'm not a coder at all lol):

ImportError Traceback (most recent call last)
in <cell line: 18>()
16 dir = os.getcwd()
17
---> 18 transcribe(
19 input,
20 output_dir if output_dir else "result",

7 frames
/usr/local/lib/python3.10/dist-packages/decipher/action.py in transcribe(video_in, output_dir, model, language, task, batch_size, subs)
75
76 temp_srt = mktemp(suffix=".srt", dir=os.getcwd())
---> 77 audio_to_srt(audio_file, temp_srt, model, task, language, batch_size)
78 os.remove(audio_file)
79 srt_filename = video_in.stem + ".srt"

/usr/local/lib/python3.10/dist-packages/decipher/action.py in audio_to_srt(audio_file, temp_srt, model, task, language, batch_size)
35 print(f"{device.upper()} is being used for this transcription, this process may take a while.")
36
---> 37 pipe = pipeline(
38 "automatic-speech-recognition",
39 model=f"openai/whisper-{model}",

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
903 if isinstance(model, str) or framework is None:
904 model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
--> 905 framework, model = infer_framework_load_model(
906 model,
907 model_classes=model_classes,

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in infer_framework_load_model(model, config, model_classes, task, framework, **model_kwargs)
277
278 try:
--> 279 model = model_class.from_pretrained(model, **kwargs)
280 if hasattr(model, "eval"):
281 model = model.eval()

/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
559 elif type(config) in cls._model_mapping.keys():
560 model_class = _get_model_class(config, cls._model_mapping)
--> 561 return model_class.from_pretrained(
562 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
563 )

/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3367
3368 config = copy.deepcopy(config) # We do not want to modify the config inplace in from_pretrained.
-> 3369 config = cls._autoset_attn_implementation(
3370 config, use_flash_attention_2=use_flash_attention_2, torch_dtype=torch_dtype, device_map=device_map
3371 )

/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in _autoset_attn_implementation(cls, config, use_flash_attention_2, torch_dtype, device_map, check_device_map)
1367 elif requested_attn_implementation in [None, "sdpa"]:
1368 # use_flash_attention_2 takes priority over SDPA, hence SDPA treated in this elif.
-> 1369 config = cls._check_and_enable_sdpa(
1370 config,
1371 hard_check_only=False if requested_attn_implementation is None else True,

/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in _check_and_enable_sdpa(cls, config, hard_check_only)
1529 )
1530 if not is_torch_sdpa_available():
-> 1531 raise ImportError(
1532 "PyTorch SDPA requirements in Transformers are not met. Please install torch>=2.1.1."
1533 )

ImportError: PyTorch SDPA requirements in Transformers are not met. Please install torch>=2.1.1.
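
The last line is the actionable part: the preinstalled torch in that Colab runtime is older than what the transformers SDPA attention path requires. Upgrading torch inside the notebook and then restarting the runtime should clear it (a generic fix suggested by the error text itself, not a decipher-specific one):

pip install --upgrade "torch>=2.1.1"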

Implement functionality to define a maximum length or word count for subtitles per block

Description:
Currently, Decipher generates SRT subtitle blocks automatically without any restriction on length or word count. To improve readability and the user experience, I would like to suggest a feature that lets us define a max_length or max_words variable, ensuring the generated subtitles comply with the specified limit.

Something Like that: openai/whisper#314
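
Until something like that exists upstream, a post-processing pass over the generated SRT can enforce a word cap per block. The sketch below uses the third-party srt package (an assumption; any SRT parser would do) and splits oversized blocks, dividing each block's time span proportionally by chunk:

# Hedged sketch: cap generated subtitles at a maximum word count per block by
# post-processing the SRT. Uses the third-party "srt" package (pip install srt);
# timings are split proportionally, which is crude but serviceable.
import srt

MAX_WORDS = 8


def split_block(sub, max_words):
    words = sub.content.split()
    if len(words) <= max_words:
        yield sub
        return
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    span = (sub.end - sub.start) / len(chunks)
    for i, chunk in enumerate(chunks):
        yield srt.Subtitle(
            index=0,  # srt.compose() renumbers the blocks
            start=sub.start + i * span,
            end=sub.start + (i + 1) * span,
            content=" ".join(chunk),
        )


with open("video.srt", encoding="utf-8") as f:
    subs = list(srt.parse(f.read()))

capped = [piece for sub in subs for piece in split_block(sub, MAX_WORDS)]

with open("video.capped.srt", "w", encoding="utf-8") as f:
    f.write(srt.compose(capped))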

Please add static-ffmpeg to provide binaries

You can use

from static_ffmpeg import add_paths
add_paths(weak=True)

to add ffmpeg to your project; it will download a platform-specific ffmpeg binary if ffmpeg is not already on the system path.

My own package 'transcribe-anything' does this to great effect.
