
Whisper-based ASR Model with NLP Insights

This project combines OpenAI's Whisper speech recognition model with spaCy's NLP pipeline. You upload an audio clip of up to 30 seconds, the application transcribes the speech to text, and it then generates several natural language processing insights from the transcript. A Gradio web interface makes the whole flow interactive.

Key Features

  • Audio Transcription: Upload an audio file, and the application transcribes the speech to text using the Whisper model.
  • Named Entity Recognition (NER): Identifies and visualizes named entities (such as people, organizations, locations, etc.) in the transcribed text.
  • Part-of-Speech (POS) Tagging: Tags each word in the transcribed text with its corresponding part of speech (e.g., noun, verb, adjective).
  • Dependency Parsing: Analyzes the grammatical structure of the transcribed text by establishing relationships between words.
  • Sentence Segmentation: Splits the transcribed text into individual sentences.
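
The four NLP features above map onto standard attributes of a spaCy `Doc`. The following is a minimal sketch rather than the project's actual code: the function names `load_pipeline` and `extract_insights` are hypothetical, and the model name `en_core_web_sm` is an assumption (the README does not say which spaCy model is used; a model has to be installed separately, for example with `python -m spacy download en_core_web_sm`).

```python
from typing import Any, Dict


def load_pipeline(model_name: str = "en_core_web_sm"):
    """Load a spaCy pipeline. The default model name is an assumption."""
    import spacy  # imported lazily so extract_insights has no hard dependency

    return spacy.load(model_name)


def extract_insights(doc: Any) -> Dict[str, list]:
    """Collect the four insights from a spaCy Doc.

    Only standard Doc/Token/Span attributes are used: ents, sents,
    text, label_, pos_, dep_ and head.
    """
    return {
        # Named entities with their labels, e.g. ("Berlin", "GPE")
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        # Coarse part-of-speech tag for every token
        "pos_tags": [(tok.text, tok.pos_) for tok in doc],
        # Dependency relation of each token and the head it attaches to
        "dependencies": [(tok.text, tok.dep_, tok.head.text) for tok in doc],
        # Sentence boundaries detected by the pipeline
        "sentences": [sent.text.strip() for sent in doc.sents],
    }
```

With a model installed, `extract_insights(load_pipeline()("Some transcribed text."))` would return a dictionary with one list per feature. For the entity visualization, spaCy's `displacy.render(doc, style="ent")` is one way to produce HTML markup.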

How It Works

  1. Transcription:

    • The uploaded audio file is loaded and padded (with silence) or trimmed to exactly 30 seconds, the input length Whisper expects.
    • The audio is converted into a log-Mel spectrogram and passed through the Whisper model to generate a transcription.
  2. NLP Analysis:

    • The transcribed text is analyzed using spaCy.
    • Various NLP insights are generated, including named entity recognition, part-of-speech tagging, dependency parsing, and sentence segmentation.
  3. Display:

    • The transcribed text and NLP insights are displayed on the web interface using Gradio.
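
The transcription step can be sketched with Whisper's lower-level Python API. This is an illustration under stated assumptions, not the project's confirmed implementation: the function names and the `"base"` model size are hypothetical choices. The first helper is a plain-Python illustration of what padding and trimming mean; in the second function, Whisper's own `whisper.pad_or_trim` does the real work.

```python
from typing import List

SAMPLE_RATE = 16_000                      # Whisper works on 16 kHz audio
CLIP_SECONDS = 30                         # fixed window described above
N_SAMPLES = SAMPLE_RATE * CLIP_SECONDS    # samples in one 30-second clip


def pad_or_trim_samples(samples: List[float], length: int = N_SAMPLES) -> List[float]:
    """Illustration only: cut extra samples or append zeros (silence)."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def transcribe_clip(path: str, model_name: str = "base") -> str:
    """Transcribe one clip with Whisper (the model size is an assumption)."""
    import whisper  # lazy import: only needed when a file is transcribed

    model = whisper.load_model(model_name)
    audio = whisper.load_audio(path)        # decode to 16 kHz mono (needs ffmpeg)
    audio = whisper.pad_or_trim(audio)      # force exactly 30 seconds
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    options = whisper.DecodingOptions(fp16=False)  # fp16=False also runs on CPU
    result = whisper.decode(model, mel, options)
    return result.text
```

Because the clip is always brought to 30 seconds before decoding, anything past the 30-second mark in a longer recording is discarded rather than transcribed.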

Usage

To use this application, follow these steps:

  1. Upload an Audio File: Select an audio file (up to 30 seconds in duration) from your local machine.
  2. View Transcription and Insights: The application will display the transcribed text along with the following insights:
    • Named Entities Visualization
    • Part-of-Speech Tagging
    • Dependency Parsing
    • Sentence Segmentation
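
The upload-then-view flow described above could be wired up in Gradio roughly as follows. This is a hedged sketch: `make_handler` and `build_interface` are hypothetical names, the two callables passed in stand for whatever transcription and entity-rendering functions the project actually uses, and only two of the outputs are shown for brevity.

```python
from typing import Callable, Tuple


def make_handler(
    transcribe: Callable[[str], str],
    render_entities: Callable[[str], str],
) -> Callable[[str], Tuple[str, str]]:
    """Build the function Gradio calls with the uploaded file's path."""

    def handle(audio_path: str) -> Tuple[str, str]:
        if not audio_path:
            # Nothing was uploaded; return a message instead of failing
            return "No audio file was provided.", ""
        text = transcribe(audio_path)
        return text, render_entities(text)

    return handle


def build_interface(handler: Callable[[str], Tuple[str, str]]):
    """Create the web UI: one audio input, a text output and an HTML output."""
    import gradio as gr  # lazy import so make_handler works without Gradio

    return gr.Interface(
        fn=handler,
        inputs=gr.Audio(type="filepath", label="Audio (up to 30 seconds)"),
        outputs=[
            gr.Textbox(label="Transcription"),
            gr.HTML(label="Named entities"),
        ],
        title="Whisper + spaCy",
    )
```

Calling `build_interface(handler).launch()` would then start a local server and print the URL to open in a browser.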

Installation

To run this project locally, install the required dependencies. The requirements.txt file lists them:

gradio==3.9
openai-whisper
spacy==3.4.0
pydub==0.25.1
torch==1.12.1

Installation Instructions

  1. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  2. Install the requirements:

    pip install -r requirements.txt
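
After installing, it can help to confirm which versions actually ended up in the environment. The snippet below is an optional sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the requirements.txt shown above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names as listed in requirements.txt
REQUIRED = ["gradio", "openai-whisper", "spacy", "pydub", "torch"]


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing `installed_versions()` inside the activated virtual environment shows at a glance whether any dependency is missing (its value is `None`) or differs from the pinned version.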

Running the Application

To run the application, execute the following command:

python main.py

This will launch the Gradio interface, and you can interact with the application through your web browser.

Example

[Screenshot: Example Interface]


This project shows how speech recognition and natural language processing can be combined to extract structured insights from short audio clips through an interactive web interface.

Feel free to explore and enhance the capabilities of this application according to your needs. Happy coding!


Contributors

  • udit-rawat
