WhisperBot

Welcome to WhisperBot. WhisperBot builds upon the capabilities of WhisperLive by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper, a powerful automatic speech recognition (ASR) system. Both Mistral and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities.

Features

  • Real-Time Speech-to-Text: Utilizes WhisperLive, built on OpenAI Whisper, to convert spoken language into text in real time.

  • Large Language Model Integration: Adds Mistral, a Large Language Model, to enhance the understanding and context of the transcribed text.

  • TensorRT Optimization: Both Mistral and Whisper are optimized to run as TensorRT engines, ensuring high-performance and low-latency processing.
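
At a high level, transcribed text from the streaming ASR stage is fed to the LLM as it arrives. The sketch below is purely conceptual: run_pipeline, transcribe_chunk, and generate_reply are hypothetical stand-ins, not functions from this repository or from TensorRT-LLM.

# Conceptual sketch of the ASR -> LLM flow (hypothetical helpers, not the repo's API).
from typing import Iterator

def transcribe_chunk(chunk: bytes) -> str:
    # Stand-in for the Whisper TensorRT engine call.
    return chunk.decode("utf-8", errors="ignore").strip()

def generate_reply(transcript: str) -> str:
    # Stand-in for the Mistral/Phi TensorRT engine call.
    return f"[LLM reply to: {transcript.strip()}]"

def run_pipeline(audio_chunks: Iterator[bytes]) -> Iterator[str]:
    transcript = ""
    for chunk in audio_chunks:
        text = transcribe_chunk(chunk)
        if text:
            transcript += " " + text
            yield generate_reply(transcript)

for reply in run_pipeline([b"hello", b"world"]):
    print(reply)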

Prerequisites

Install TensorRT-LLM to build the Whisper and Mistral TensorRT engines. The TensorRT-LLM README builds a Docker image for TensorRT-LLM. Instead of building a Docker image, you can also follow the README and the Dockerfile.multi to install the required packages in the base PyTorch Docker image. Just make sure to use the correct base image as mentioned in the Dockerfile, and everything should go smoothly.
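
After installing TensorRT-LLM, a quick way to confirm the environment is usable (an optional check, not part of the official instructions) is to import the package and verify that CUDA is visible; if the version attribute is unavailable in your build, pip show tensorrt-llm reports the version instead.

# Optional environment check after installing TensorRT-LLM.
import torch
import tensorrt_llm

print("tensorrt_llm version:", tensorrt_llm.__version__)
print("CUDA available:", torch.cuda.is_available())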

Build Whisper TensorRT Engine

cd TensorRT-LLM/examples/whisper
  • Currently, TensorRT-LLM only supports large-v2 and large-v3 by default. In this repo, we use small.en.
  • Download the required assets.
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz

# small.en model
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
  • Edit build.py to support small.en: add "small.en" as an item in the choices list (see the illustrative snippet after the build commands below).
  • Build small.en TensorRT engine.
pip install -r requirements.txt
python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin  --use_bert_attention_plugin --model_name small.en
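
For reference, the choices edit mentioned above might look roughly like the following; the exact argument definition in build.py can differ across TensorRT-LLM versions, so treat this as an illustration only.

# Illustration of the build.py edit (the real argument definition may differ).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model_name",
    type=str,
    default="large-v2",
    choices=["large-v2", "large-v3", "small.en"],  # "small.en" added here
)
print(parser.parse_args(["--model_name", "small.en"]).model_name)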

Build Mistral TensorRT Engine

cd TensorRT-LLM/examples/llama
  • Convert Mistral to fp16 TensorRT engine.
python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
                --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
                --max_input_len 5000 \
                --max_batch_size 1
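
The command above passes the Hugging Face repo name directly to --model_dir. If you prefer to have the checkpoint on disk first, one option (optional and illustrative, using huggingface_hub) is to pre-download it and point --model_dir at the returned local path; check the TensorRT-LLM llama example README for what --model_dir actually expects in your version.

# Optional: pre-download the Mistral checkpoint and use the local path as --model_dir.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("teknium/OpenHermes-2.5-Mistral-7B")
print("checkpoint downloaded to:", local_dir)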

Build Phi TensorRT Engine

Note: Phi is only available on the main branch and hasn't been released yet, so make sure to build TensorRT-LLM from the main branch.

cd TensorRT-LLM/examples/phi
  • Build the Phi TensorRT engine.
git lfs install
git clone https://huggingface.co/microsoft/phi-2
python3 build.py --dtype=float16                    \
                 --log_level=verbose                \
                 --use_gpt_attention_plugin float16 \
                 --use_gemm_plugin float16          \
                 --max_batch_size=16                \
                 --max_input_len=1024               \
                 --max_output_len=1024              \
                 --output_dir=phi_engine            \
                 --model_dir=phi-2 2>&1 | tee build.log

Run WhisperBot

  • Clone this repo and install requirements.
git clone https://github.com/collabora/WhisperBot.git
cd WhisperBot
apt update
apt install ffmpeg portaudio19-dev -y
pip install -r requirements.txt
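
As an optional sanity check (not part of the official steps), you can confirm that ffmpeg is on the PATH and that PyAudio, which the portaudio19-dev dependency suggests is among the installed requirements (an assumption), can see an input device:

# Optional sanity check for the audio prerequisites.
import shutil
import pyaudio

assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"

pa = pyaudio.PyAudio()
inputs = sum(
    pa.get_device_info_by_index(i)["maxInputChannels"] > 0
    for i in range(pa.get_device_count())
)
pa.terminate()
print("input devices found:", inputs)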

Whisper + Mistral

  • Take the folder path of the Whisper TensorRT engine, and the folder_path and tokenizer_path of the Mistral TensorRT engine, from the build phase. If a Hugging Face model was used to build Mistral, just use the Hugging Face repo name as the tokenizer path (illustrated after the command below).
python3 main.py --mistral \
                --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
                --mistral_tensorrt_path /root/TensorRT-LLM/examples/llama/tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
                --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
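
A Hugging Face repo name works as the tokenizer path because such paths are typically resolved through the transformers loader; assuming that is what main.py does here (an assumption, not verified against the code), the following shows why the repo name alone is enough:

# Illustrative only: resolving a tokenizer from a Hugging Face repo name.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")
print(tokenizer("Hello from WhisperBot").input_ids[:8])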

Whisper + Phi

  • Take the folder path of the Whisper TensorRT engine, and the folder_path and tokenizer_path of the Phi TensorRT engine, from the build phase. If a Hugging Face model was used to build Phi, just use the Hugging Face repo name as the tokenizer path.
python3 main.py --phi \
                --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
                --phi_tensorrt_path /root/TensorRT-LLM/examples/phi/phi_engine \
                --phi_tokenizer_path /root/TensorRT-LLM/examples/phi/phi-2
  • On the client side, clone the repo, install the requirements, and execute run_client.py.
cd WhisperBot
pip install -r requirements.txt
python3 run_client.py

Contact Us

For questions or issues, please open an issue. Contact us at: [email protected], [email protected], [email protected]
