wscribe

Getting started

wscribe is yet another easy-to-use front-end for whisper, focused on transcription. It aims to be modular so that it can support multiple audio sources, processing backends and inference interfaces. It can run on both CPU and GPU depending on the processing backend. Once a transcript is generated, it can be manually edited, corrected and visualized with the wscribe-editor.

It was created at sochara because we have a large volume of audio recordings that need to be transcribed and eventually archived. We also needed to verify and manually edit the generated transcripts, and I could not find any open-source tool that checked all the boxes. The suggested workflow is to generate a word-level transcript (only supported in the JSON export) and then edit it with the wscribe-editor.
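
For reference, a word-level JSON export looks roughly like the sketch below. The field names here are assumptions for illustration only, not the exact schema; treat wscribe's actual output as authoritative.

    [
      {
        "start": 0.0,
        "end": 2.4,
        "text": "hello world",
        "words": [
          { "start": 0.0, "end": 0.6, "text": "hello", "score": 0.91 },
          { "start": 0.7, "end": 1.1, "text": "world", "score": 0.88 }
        ]
      }
    ]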

Currently, it supports the following. Check the roadmap for upcoming support.

  • Processing backend: faster-whisper
  • Audio sources: Local files (Audio/Video)
  • Inference interfaces: Python CLI
  • File exports: JSON, SRT, WebVTT

Installation

These instructions were tested on NixOS (Python 3.10) and Arch Linux (Python 3.10) but should work on any other OS. If you face any installation issues, please feel free to create an issue. I'll try to put out a Docker image sometime.

1. Set required env var

  • WSCRIBE_MODELS_DIR: path to the directory the whisper models should be downloaded to.
export WSCRIBE_MODELS_DIR=$XDG_DATA_HOME/whisper-models # example

2. Download the models

Recommended
  • The recommended way to download the models is to use the helper script; it'll download them to WSCRIBE_MODELS_DIR.
    cd /tmp # temporary script, only needed to download the models
    curl -LO https://raw.githubusercontent.com/geekodour/wscribe/main/scripts/fw_dw_hf_wo_lfs.sh
    chmod u+x fw_dw_hf_wo_lfs.sh
    ./fw_dw_hf_wo_lfs.sh tiny # other models: tiny, small, medium and large-v2
        
Manual

You can also download the models directly from here using git lfs; make sure you download/copy them to WSCRIBE_MODELS_DIR.
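
A minimal sketch of the manual route, assuming the model weights are hosted as git-lfs repositories; the exact repository URL and target directory layout are assumptions here, so use the link above and mirror whatever layout the helper script produces.

    git lfs install
    # replace <model-repo-url> with the repository linked above, e.g. for the tiny model
    git clone <model-repo-url> "$WSCRIBE_MODELS_DIR/tiny"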

3. Install wscribe

Assuming you already have a working python>=3.10 setup

pip install wscribe
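
If you prefer an isolated environment, a typical (optional) setup looks like this:

python3 -m venv .venv        # create a virtual environment
source .venv/bin/activate    # activate it
pip install wscribe          # install wscribe into the venv
wscribe transcribe --help    # verify the install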

Usage

# wscribe transcribe [OPTIONS] SOURCE DESTINATION

# cpu
wscribe transcribe audio.mp3 transcription.json
# use gpu
wscribe transcribe video.mp4 transcription.json --gpu
# use gpu, srt format
wscribe transcribe video.mp4 transcription.srt -g -f srt
# use gpu, vtt format, tiny model
wscribe transcribe video.mp4 transcription.vtt -g -f vtt -m tiny
wscribe transcribe --help # all help info

Numbers

device | quant   | model    | original playback | transcription | playback/transcription
-------|---------|----------|-------------------|---------------|-----------------------
cuda   | float16 | tiny     | 6.3m              | 0.1m          | 68x
cuda   | float16 | small    | 6.3m              | 0.2m          | 29x
cuda   | float16 | medium   | 6.3m              | 0.4m          | 14x
cuda   | float16 | large-v2 | 6.3m              | 0.8m          | 7x
cpu    | int8    | tiny     | 6.3m              | 0.2m          | 25x
cpu    | int8    | small    | 6.3m              | 1.3m          | 4x
cpu    | int8    | medium   | 6.3m              | 3.6m          | ~1.7x
cpu    | int8    | large-v2 | 6.3m              | 3.6m          | ~0.9x
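
If you want a rough comparison on your own hardware, you can time the documented commands directly; this is just one simple way to measure, not necessarily how the numbers above were produced.

# rough wall-clock comparison of CPU vs GPU for the tiny model
time wscribe transcribe audio.mp3 out-cpu.json -m tiny
time wscribe transcribe audio.mp3 out-gpu.json -m tiny -g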

Roadmap

Processing Backends

Transcription Features

  • [ ] Add support for diarization
  • [ ] Add translation
  • [ ] Add VAD/other de-noising stuff etc.
  • [ ] Add local LLM integration with llama.cpp or something similar, for summaries and other possible uses. It could also be used to generate a more accurate transcript: whisper mostly generates something like subtitles, and turning subtitles into a transcript requires grouping them. This can be done in various ways, e.g. by speaker if diarization is supported, by time chunks, etc. With LLMs or other NLP techniques we could also group by things like breaks in dialogue. Have to explore.

Inference interfaces

  • [-] Python CLI
    • [X] Basic CLI
    • [ ] Improve summary statistics
  • [ ] REST endpoint
    • [ ] Basic server to run wscribe via an API.
    • [ ] Possibly add glue code to expose it via Cloudflare Tunnels or something similar
  • [ ] GUI

Audio sources

  • [X] Local files
  • [ ] YouTube
  • [ ] Google Drive

Distribution

  • [X] Python packaging
  • [ ] Docker/Podman
  • [ ] Package for Nix
  • [ ] Package for Arch(AUR)

Contributing

All contributions happen through PRs. Any contribution is greatly appreciated: bugfixes, features, tests, suggestions and criticism are all welcome.

Testing

  • make test
  • See other helper commands in Makefile
