Transcriptor Project

Overview

Transcriptor is a transcription tool that uses the OpenAI Whisper Tiny model to convert speech from audio files, YouTube videos, or live input into text. It has a Streamlit frontend and a backend built with FastAPI and WebSockets, which together provide real-time transcription.

Features

  • Transcribe audio from local files (.wav format).
  • Transcribe content directly from YouTube URLs.
  • Real-time transcription through live audio input.
  • Simple and intuitive user interface.
  • Option to download transcriptions as a text file.
  • Generates live SRT subtitle files using Silero VAD and Whisper, which can be used with the video.
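
As an illustration of the SRT output mentioned above, here is a minimal sketch of turning timed transcript segments into SRT text. It is not the project's own code: the function names and the `(start, end, text)` tuple layout are assumptions chosen for the example, and only the standard library is used.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is hypothetical; adapt it to whatever the
    transcription step actually returns.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "world")]))
```

Each cue is a sequence number, a time range, and the text, so a voice activity detector that yields speech boundaries maps naturally onto this format.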

How It Works

  1. Select a Mode of Operation: Choose whether to upload an audio file, enter a YouTube URL, or start live transcription.
  2. Upload or Input: Depending on the selected mode, either upload an audio file, enter a YouTube URL, or begin speaking when prompted.
  3. Review: Wait for the transcription to complete and then review your transcribed text.
  4. Download: Get a copy of the transcribed text by downloading it directly from the interface.
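
The mode selection in step 1 could be sketched as below. This is only an assumption about how the three modes might be told apart, not the project's actual logic: `choose_mode` and `YOUTUBE_HOSTS` are hypothetical names, and the host list is illustrative rather than exhaustive.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical set of hosts treated as YouTube links in this sketch.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def choose_mode(user_input=None) -> str:
    """Return 'live', 'youtube', or 'file' for the given input."""
    if not user_input:
        # No file or URL supplied: fall back to live transcription.
        return "live"
    parsed = urlparse(user_input)
    if parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS:
        return "youtube"
    if Path(user_input).suffix.lower() == ".wav":
        # The feature list names .wav as the supported local format.
        return "file"
    raise ValueError(f"Unsupported input: {user_input!r}")
```

For example, `choose_mode("recording.wav")` returns `"file"`, while an unsupported path such as `"notes.txt"` raises `ValueError`.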

Installation

To set up the Transcriptor project locally, clone the repository, navigate to the project directory, create a virtual environment, and install the dependencies listed in the requirements.txt file in the main folder. Real-time transcription requires ffmpeg; install it if it is not already available on your system.

  1. Clone the repository (SSH example):
    git clone git@github.com:jainal09/data-science-capstone-assignment.git
    
  2. Navigate to the project directory:
    cd data-science-capstone-assignment
    
  3. To run the backend, go to the backend directory and start the FastAPI server:
    cd backend
    uvicorn app:app --host 0.0.0.0 --port 8000
    
  4. With the backend running, open another terminal, go to the frontend folder, and start the Streamlit app:
    cd frontend
    streamlit run app.py
    

Usage

Once both services are running, the frontend is available at http://localhost:8501 and the backend at http://localhost:8000.
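
To confirm that both services are listening, a small port check like the following may help. It is a sketch using only the standard library; `is_listening` is a hypothetical helper, and the ports are the ones listed above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    for name, port in (("frontend", 8501), ("backend", 8000)):
        state = "up" if is_listening("localhost", port) else "down"
        print(f"{name} (port {port}): {state}")
```

This only checks that something accepts connections on each port; it does not verify that the application behind it is healthy.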

Docker

To run the project with Docker Compose instead:

    docker compose up -d

This starts both the frontend and the backend, which communicate with each other.

License

MIT License

Copyright (c) 2024 Jainal Gosaliya, Adarsh Gupta, Faizan Shaikh

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Citations

  1. OpenAI Whisper Tiny Model: The core of Transcriptor's transcription capabilities is built upon the OpenAI Whisper Tiny model, a state-of-the-art Automatic Speech Recognition (ASR) model capable of transcribing speech in 99 languages with high accuracy [0].

  2. Hugging Face Whisper Tiny: The Whisper Tiny model is available on Hugging Face, a platform that hosts a wide range of models for various NLP tasks, including ASR. The Whisper family includes both English-only and multilingual models, with sizes ranging from 39M parameters for the tiny models to 1.5B for the large models [0].

  3. Whisper Release and Collection: The Whisper model collection includes both English-only and multilingual checkpoints for ASR and speech translation (ST), showcasing the versatility and efficiency of the model in different language and application scenarios [0].

  4. NB-Whisper Small Model: The Norwegian NB-Whisper Small model, developed by the National Library of Norway, is based on the work of OpenAI's Whisper. This model has been trained on a diverse dataset of 8 million samples, demonstrating the potential for fine-tuning and adaptation of the Whisper model for specific language and domain applications [3].

  5. Whisper's Code and Model Weights: The code and model weights for Whisper are released under the MIT License, making it accessible for developers and researchers to build upon and integrate into their projects. This open-source approach facilitates the development of advanced transcription tools like Transcriptor [4].

Contributors

jainal09, adarsh1999, shfaizan