
kurianbenoy / indic-subtitler


Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.

Home Page: https://indicsubtitler.in/

License: GNU General Public License v2.0

Jupyter Notebook 98.91% Python 0.28% JavaScript 0.79% CSS 0.02%
asr fastapi nextjs webapp deep-learning faster-whisper inference openai quantization speech-recognition speech-to-text transformers whisper whisperx vegam-whisper

indic-subtitler's Introduction

Indic-Subtitler

An open source subtitling platform 💻 that uses ML models to transcribe videos and audio in Indic languages and to translate the resulting subtitles.


Demos

Watch the Indicsubtitler.in Demo Video

Use-cases

  1. Content creators can make YouTube videos in their native language, such as Tamil, and use our tool to create captions in languages like English, Hindi, and Malayalam.
  2. Educational content can be created for doctors practising community medicine, or used in apps for schools. For example, content in English can be translated to Telugu, a student's mother tongue, so they can understand it quickly.
  3. Media professionals can use it to subtitle news content, movies, etc.

Don't use Indic Subtitler for any unlawful purposes.

Project Architecture

Generate Subtitles Section

Generate_subtitles drawio

This project introduces a novel architecture for Generative UI that works with any ASR model.

Our novel architecture

Technology stack

1. ML Model

A. SeamlessM4T model

We plan to use Meta's Seamless Communication technology, which was recently released on GitHub [1]. The SeamlessM4T_v2_large model 🚀 supports around 12 Indic languages [2] by default. With this model alone, we can potentially transcribe audio in the respective languages and even translate subtitles into other languages. More details about SeamlessM4T can be found in the paper [7]. The functionality is explained well in this tutorial [8] from the Seamless Communication repository.

For many Indic languages, fine-tuned Whisper ASR models are available. More such models can be found in the Whisper event leaderboard [3]. We have fine-tuned Whisper models ourselves for Malayalam, such as [4] and [5]. So if SeamlessM4T does not perform well for a given language, we can switch to one of the fine-tuned Whisper-based ASR models available in open source, or train one ourselves. One caveat is that Whisper might not support all the languages listed for Seamless.

Indic Languages supported with SeamlessM4T

Language Code
Assamese asm
Bengali ben
English eng
Gujarati guj
Hindi hin
Kannada kan
Malayalam mal
Marathi mar
Odia ory
Punjabi pan
Tamil tam
Telugu tel
Urdu urd

The language code abbreviations for each model can be found here [6].

B. faster-whisper

faster-whisper [9] is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models. This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU. Since faster-whisper is based on Whisper, it supports all 99 languages supported by Whisper.

Indic Languages supported with faster-whisper

Language Code
Assamese as
Bengali bn
English en
Gujarati gu
Hindi hi
Kannada kn
Malayalam ml
Marathi mr
Punjabi pa
Tamil ta
Telugu te
Urdu ur
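
Because SeamlessM4T and faster-whisper use different code schemes for the same languages, a small lookup can help when switching between them. The following is a minimal sketch built only from the two tables above; the names `SEAMLESS_CODES`, `WHISPER_CODES` and `seamless_to_whisper` are hypothetical and not part of either library or of this project's code.

```python
from typing import Optional

# Codes copied from the two tables above (assumed to be complete for this project).
SEAMLESS_CODES = {
    "Assamese": "asm", "Bengali": "ben", "English": "eng", "Gujarati": "guj",
    "Hindi": "hin", "Kannada": "kan", "Malayalam": "mal", "Marathi": "mar",
    "Odia": "ory", "Punjabi": "pan", "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}
WHISPER_CODES = {
    "Assamese": "as", "Bengali": "bn", "English": "en", "Gujarati": "gu",
    "Hindi": "hi", "Kannada": "kn", "Malayalam": "ml", "Marathi": "mr",
    "Punjabi": "pa", "Tamil": "ta", "Telugu": "te", "Urdu": "ur",
}


def seamless_to_whisper(code: str) -> Optional[str]:
    """Map a SeamlessM4T code to the faster-whisper code, or None if no match is listed."""
    for language, seamless_code in SEAMLESS_CODES.items():
        if seamless_code == code:
            return WHISPER_CODES.get(language)
    return None


print(seamless_to_whisper("mal"))  # ml
print(seamless_to_whisper("ory"))  # None (Odia is not in the faster-whisper table)
```

Returning `None` rather than raising keeps the fallback decision with the caller, which matters for languages such as Odia that appear in only one table.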

C. WhisperX

WhisperX provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization. The features provided by WhisperX are:

  • ⚡️ Batched inference for 70x realtime transcription using whisper large-v2
  • 🪶 faster-whisper backend, requires <8GB gpu memory for large-v2 with beam_size=5
  • 🎯 Accurate word-level timestamps using wav2vec2 alignment
  • 👯‍♂️ Multispeaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
  • 🗣️ VAD preprocessing, reduces hallucination & batching with no WER degradation

Indic Languages supported with WhisperX

Language Code
English en
Hindi hi
Telugu te
Urdu ur
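
Timestamped segments like those WhisperX produces still have to be written out in a subtitle format. The sketch below shows one way to do that, assuming the SubRip (SRT) format and a list of segment dictionaries with `start`, `end` (in seconds) and `text` keys. The choice of SRT and the helper names `format_timestamp` and `segments_to_srt` are assumptions for illustration, not this project's actual output code.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn segments (dicts with 'start', 'end', 'text') into SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Hypothetical segments, only to show the shape of the input.
example = [
    {"start": 0.0, "end": 1.5, "text": " Hello "},
    {"start": 1.5, "end": 3.25, "text": "Welcome to the demo"},
]
print(segments_to_srt(example))
```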

D. fine-tuned Whisper model

In certain languages, Whisper does not perform well by default. If the open source Whisper model doesn't give good results for your problem, you can fine-tune your own ASR model by following guides such as "Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers".

Indic Languages supported with fine-tuned Whisper model

Language Code
Malayalam ml
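
Taken together, the four tables above determine which models are candidates for a given language. As a rough sketch, the lookup below lists the candidates by language name. The data is copied from the tables, while `MODEL_LANGUAGES` and `models_for` are hypothetical names, and the ordering reflects table order rather than any measured quality ranking.

```python
from typing import List

# Language support per model, copied from the tables in this section.
MODEL_LANGUAGES = {
    "SeamlessM4T": {
        "Assamese", "Bengali", "English", "Gujarati", "Hindi", "Kannada",
        "Malayalam", "Marathi", "Odia", "Punjabi", "Tamil", "Telugu", "Urdu",
    },
    "faster-whisper": {
        "Assamese", "Bengali", "English", "Gujarati", "Hindi", "Kannada",
        "Malayalam", "Marathi", "Punjabi", "Tamil", "Telugu", "Urdu",
    },
    "WhisperX": {"English", "Hindi", "Telugu", "Urdu"},
    "fine-tuned Whisper": {"Malayalam"},
}


def models_for(language: str) -> List[str]:
    """Return the models whose table lists the given language, in table order."""
    return [name for name, languages in MODEL_LANGUAGES.items() if language in languages]


print(models_for("Malayalam"))  # ['SeamlessM4T', 'faster-whisper', 'fine-tuned Whisper']
print(models_for("Odia"))       # ['SeamlessM4T']
```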

2. Backend API

We plan to use FastAPI for the backend and deploy it on a serverless platform such as Modal.com or an alternative.

API format

  • POST request to the web endpoints generate_seamlessm4t_speech, generate_faster_whisper_speech and generate_whisperx_speech, with the following input format:
{
 "wav_base64": "Audio in base64 format",
 "target": "Target language you want to transcribe or translate your audio into"
}
  • POST request to the functions youtube_generate_seamlessm4t_speech, youtube_generate_faster_whisper_speech and youtube_generate_whisperx_speech, with the following input format:
{
 "yt_id": "Youtube ID as input in string format",
 "target": "Target language you want to transcribe or translate your audio into"
}
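
As an illustration of the two input formats above, the following sketch builds the JSON request bodies with the Python standard library only. The helper names `build_audio_payload` and `build_youtube_payload` are hypothetical, and the `target` values shown are placeholders, since the README does not say whether the API expects a language name or a language code.

```python
import base64
import json


def build_audio_payload(audio_bytes: bytes, target: str) -> str:
    """JSON body for the audio endpoints: base64-encoded audio plus target language."""
    wav_base64 = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"wav_base64": wav_base64, "target": target})


def build_youtube_payload(yt_id: str, target: str) -> str:
    """JSON body for the YouTube functions: video ID plus target language."""
    return json.dumps({"yt_id": yt_id, "target": target})


# Placeholder inputs; real calls would read the bytes of an audio file.
audio_body = build_audio_payload(b"abc", "Malayalam")
youtube_body = build_youtube_payload("VIDEO_ID", "Malayalam")
print(audio_body)
print(youtube_body)
```

Either body can then be sent as a POST request with any HTTP client to the deployed endpoint URL, which is not given here.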

3. Frontend

Next.js, being a React framework, offers you all the benefits of React plus more features out of the box, such as file-based routing and API routes, which can simplify your development process. It's an excellent choice, especially for a web application that requires server-side rendering (SSR) or static site generation (SSG) for better performance and SEO.

  • Framework: Next.js (enables SSR and SSG, improving load times and SEO)
  • Styling: Tailwind CSS or styled-components (for styling with ease and efficiency)


indic-subtitler's Issues

doubt about the project

Does this project only make subtitles for YouTube videos, or can it also translate YouTube videos into various other regional languages?

Multiple Language output

It seems that when uploading an audio or video in Kannada, only the initial portion gets transcribed accurately, while the subsequent part is transcribed in Tamil, as depicted in the provided screenshot. This likely arises due to a language detection error or a system glitch.

Screenshot from 2024-04-12 18-38-02

Improve blog recommendation section for Telugu subtitles

The current blog recommendation section lacks information on which model performs best for generating Telugu subtitles.

Proposed Solution:

  1. Testing: Conduct a test using several models (available in the advanced options) on a set of video samples containing Telugu subtitles.
  2. Evaluation: Analyze the results and identify the model that delivers the most accurate subtitles.
  3. Update Blog: Incorporate the findings into the blog's recommendation section, highlighting the model best suited for Telugu subtitles.

