
whisperfusion's Introduction

WhisperFusion

Seamless conversations with AI (with ultra-low latency)

Welcome to WhisperFusion. WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. Both the LLM and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities, while WhisperSpeech is optimized with torch.compile.

Features

  • Real-Time Speech-to-Text: Utilizes WhisperLive, built on OpenAI's Whisper, to convert spoken language into text in real time.

  • Large Language Model Integration: Adds Mistral, a Large Language Model, to enhance the understanding and context of the transcribed text.

  • TensorRT Optimization: Both the LLM and Whisper are optimized to run as TensorRT engines, ensuring high-performance, low-latency processing.

  • torch.compile: WhisperSpeech uses torch.compile to speed up inference; torch.compile makes PyTorch code run faster by JIT-compiling it into optimized kernels (see the sketch below).
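
As an illustration, here is a minimal torch.compile sketch (the toy module below is a stand-in, not WhisperSpeech's actual model):

import torch

# Toy stand-in module; WhisperSpeech compiles its own inference code the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(80, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 80),
)
compiled = torch.compile(model)  # JIT-compiles the module into optimized kernels

x = torch.randn(1, 80)
y = compiled(x)  # the first call triggers compilation; later calls reuse the kernels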

Hardware Requirements

  • A GPU with at least 24 GB of VRAM
  • For optimal latency, the GPU should have a similar FP16 (half) TFLOPS as the RTX 4090. Here are the hardware specifications for the RTX 4090.

The demo was run on a single RTX 4090 GPU. WhisperFusion uses the Nvidia TensorRT-LLM library for CUDA optimized versions of popular LLM models. TensorRT-LLM supports multiple GPUs, so it should be possible to run WhisperFusion for even better performance on multiple GPUs.

Getting Started

We provide a Docker Compose setup to streamline the deployment of the pre-built TensorRT-LLM docker container. This setup includes both Whisper and Phi converted to TensorRT engines, and the WhisperSpeech model is pre-downloaded to quickly start interacting with WhisperFusion. Additionally, we include a simple web server for the Web GUI.

  • Build and Run with docker compose for RTX 3090 and RTX 4090
mkdir docker/scratch-space
cp docker/scripts/build-* docker/scripts/run-whisperfusion.sh docker/scratch-space/

# Set the CUDA_ARCH environment variable based on your GPU
# Use '86-real' for RTX 3090, '89-real' for RTX 4090
CUDA_ARCH=86-real docker compose build
docker compose up
  • Open the Web GUI at http://localhost:8000

Contact Us

For questions or issues, please open an issue. Contact us at: [email protected], [email protected], [email protected]

whisperfusion's People

Contributors

makaveli10, zoq, jpc, damianb-bitflipper


whisperfusion's Issues

VAD in the client / browser?

Doing VAD on the server creates a lot of unnecessary traffic and server load. Why not do it on the client?

Thanks to wasm, Silero VAD can be run entirely in the browser. I am building a similar project to this and am using https://github.com/ricky0123/vad. Judging by my tests, it works really well on desktop as well as mobile devices.
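
For reference, the server-side Silero VAD path this would replace looks roughly like the following in Python (a sketch; the input file name and the exact unpacking of utils are assumptions):

import torch

# Load Silero VAD from torch hub (downloads the model on first use)
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad')
get_speech_timestamps, _, read_audio, *_ = utils

wav = read_audio('sample.wav', sampling_rate=16000)  # hypothetical input file
# Every connected client pays this cost on the server; in-browser VAD avoids it entirely
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)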

One Click Deploy - WhisperSpeech/WhisperFusion

The Objective

Folks are having trouble deploying WhisperFusion locally using the command line / Docker, so we could help them quite a lot by including one-click (cloud) deploy buttons directly in the WhisperFusion readme.md.

One-Click (Cloud) Deploy

We can add badge shields that, when clicked, deploy the WhisperFusion demo to the corresponding cloud provider.

Example One-Click Deploy

which browser you used?

I am experiencing issues when using Firefox and Chromium.

For Firefox, it throws this exception in main.js:

Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported.

For Chromium, it does not ask for microphone permission and throws:

main.js:138 Websocket created.
main.js:141 Connected to server.
main.js:47 AudioContext {baseLatency: 0.032, outputLatency: 0, destination: AudioDestinationNode, currentTime: 0, sampleRate: 16000, …}
main.js:78 Error TypeError: Cannot read properties of undefined (reading 'getUserMedia')
    at start_recording (main.js:53:57)
main.js:152 Connection closed (1006).

error run docker compose - python3: command not found in build-trt-llm.sh

ARG BASE_IMAGE=nvcr.io/nvidia/cuda
=> [whisperfusion internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [whisperfusion internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 805B 0.0s
=> [whisperfusion internal] load metadata for nvcr.io/nvidia/cuda:12.2.2-devel-ubuntu22.04 0.0s
=> [whisperfusion base 1/2] FROM nvcr.io/nvidia/cuda:12.2.2-devel-ubuntu22.04 0.0s
=> [whisperfusion internal] load build context 0.0s
=> => transferring context: 212B 0.0s
=> CACHED [whisperfusion base 2/2] RUN apt-get update && apt-get install -y --no-install-recommends x 0.0s
=> CACHED [whisperfusion devel 1/5] WORKDIR /root 0.0s
=> CACHED [whisperfusion devel 2/5] COPY scripts/install-deps.sh /root 0.0s
=> CACHED [whisperfusion devel 3/5] RUN bash install-deps.sh && rm install-deps.sh 0.0s
=> CACHED [whisperfusion devel 4/5] COPY scripts/build-trt-llm.sh /root 0.0s
=> ERROR [whisperfusion devel 5/5] RUN bash build-trt-llm.sh && rm build-trt-llm.sh 0.3s

[whisperfusion devel 5/5] RUN bash build-trt-llm.sh && rm build-trt-llm.sh:
0.244 build-trt-llm.sh: line 9: python3: command not found

AttributeError: '_Runtime' object has no attribute 'address' on Ubuntu with T4 GPU

I tried to run this project on an AWS EC2 g4dn.xlarge with a T4 GPU and got the AttributeError below:

==========
== CUDA ==
==========

CUDA Version 12.2.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

done loading
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[02/06/2024-02:23:28] [TRT] [E] 6: The engine plan file is generated on an incompatible device, expecting compute 7.5 got compute 8.9, please rebuild.
[02/06/2024-02:23:29] [TRT] [E] 2: [engine.cpp::deserializeEngine::1148] Error Code 2: Internal Error (Assertion engine->deserialize(start, size, allocator, runtime) failed. )
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/WhisperFusion/llm_service.py", line 195, in run
    self.initialize_model(
  File "/root/WhisperFusion/llm_service.py", line 109, in initialize_model
    self.runner = self.runner_cls.from_dir(**self.runner_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py", line 417, in from_dir
    session = session_cls(model_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 475, in __init__
    self.runtime = _Runtime(engine_buffer, mapping)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 153, in __init__
    self.__prepare(mapping, engine_buffer)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 174, in __prepare
    assert self.engine is not None
AssertionError
Exception ignored in: <function _Runtime.__del__ at 0x7fa97eebb5b0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 279, in __del__
    cudart.cudaFree(self.address)  # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'

RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW

version: "3.8"
services:

  whisperfusion:
    image: ghcr.io/collabora/whisperfusion:latest
    shm_size: 64G
    expose:
     - 6006/tcp
     - 8888/tcp
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]
$ docker compose run --entrypoint "nvidia-smi -L" --rm whisperfusion
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-e16d1e1c-7902-1bcb-fd46-e437e472b976)
$ docker compose up whisperfusion
[+] Running 1/0
 ✔ Container whisperfusion-whisperfusion-1  Created                                                                                           0.0s
Attaching to whisperfusion-whisperfusion-1
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | ==========
whisperfusion-whisperfusion-1  | == CUDA ==
whisperfusion-whisperfusion-1  | ==========
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | CUDA Version 12.2.2
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
whisperfusion-whisperfusion-1  | By pulling and using the container, you accept the terms and conditions of this license:
whisperfusion-whisperfusion-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | done loading
whisperfusion-whisperfusion-1  | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
whisperfusion-whisperfusion-1  | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
whisperfusion-whisperfusion-1  | Process Process-3:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
whisperfusion-whisperfusion-1  |     self.run()
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
whisperfusion-whisperfusion-1  |     self._target(*self._args, **self._kwargs)
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/llm_service.py", line 195, in run
whisperfusion-whisperfusion-1  |     self.initialize_model(
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/llm_service.py", line 109, in initialize_model
whisperfusion-whisperfusion-1  |     self.runner = self.runner_cls.from_dir(**self.runner_kwargs)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py", line 416, in from_dir
whisperfusion-whisperfusion-1  |     torch.cuda.set_device(rank % runtime_mapping.gpus_per_node)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
whisperfusion-whisperfusion-1  |     torch._C._cuda_setDevice(device)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
whisperfusion-whisperfusion-1  |     torch._C._cuda_init()
whisperfusion-whisperfusion-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
whisperfusion-whisperfusion-1  | /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
whisperfusion-whisperfusion-1  |   return torch._C._cuda_getDeviceCount() > 0
whisperfusion-whisperfusion-1  | /usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
whisperfusion-whisperfusion-1  |   warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
whisperfusion-whisperfusion-1  | Process Process-4:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
whisperfusion-whisperfusion-1  |     self.run()
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
whisperfusion-whisperfusion-1  |     self._target(*self._args, **self._kwargs)
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/tts_service.py", line 19, in run
whisperfusion-whisperfusion-1  |     self.initialize_model()
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/tts_service.py", line 14, in initialize_model
whisperfusion-whisperfusion-1  |     self.pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model')
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/pipeline.py", line 61, in __init__
whisperfusion-whisperfusion-1  |     self.vocoder = Vocoder()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/a2wav.py", line 14, in __init__
whisperfusion-whisperfusion-1  |     self.vocos = Vocos.from_pretrained(repo_id).cuda()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in cuda
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   [Previous line repeated 4 more times]
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
whisperfusion-whisperfusion-1  |     param_applied = fn(param)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in <lambda>
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
whisperfusion-whisperfusion-1  |     torch._C._cuda_init()
whisperfusion-whisperfusion-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
whisperfusion-whisperfusion-1  | Failed to load the T2S model:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/pipeline.py", line 48, in __init__
whisperfusion-whisperfusion-1  |     self.t2s = TSARTransformer.load_model(**args).cuda()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in cuda
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
whisperfusion-whisperfusion-1  |     param_applied = fn(param)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in <lambda>
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
whisperfusion-whisperfusion-1  |     torch._C._cuda_init()
whisperfusion-whisperfusion-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | Failed to load the S2A model:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/pipeline.py", line 56, in __init__
whisperfusion-whisperfusion-1  |     self.s2a = SADelARTransformer.load_model(**args).cuda()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/s2a_delar_mup_wds_mlang.py", line 423, in load_model
whisperfusion-whisperfusion-1  |     spec = torch.load(local_filename)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1014, in load
whisperfusion-whisperfusion-1  |     return _load(opened_zipfile,
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1422, in _load
whisperfusion-whisperfusion-1  |     result = unpickler.load()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1392, in persistent_load
whisperfusion-whisperfusion-1  |     typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1366, in load_tensor
whisperfusion-whisperfusion-1  |     wrap_storage=restore_location(storage, location),
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 381, in default_restore_location
whisperfusion-whisperfusion-1  |     result = fn(storage, location)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 274, in _cuda_deserialize
whisperfusion-whisperfusion-1  |     device = validate_cuda_device(location)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 258, in validate_cuda_device
whisperfusion-whisperfusion-1  |     raise RuntimeError('Attempting to deserialize object on a CUDA '
whisperfusion-whisperfusion-1  | RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
whisperfusion-whisperfusion-1  |

Recommended Hardware?

Hi WhisperFusion. Your demo is really quite shockingly good.

I am trying to replicate that, but my system is rather slow and laggy. I suspect it is because the GPU I am running the Whisper LLM on is not powerful enough. I am currently renting a single L4 GPU.

What are your hardware recommendations to run this model?

Also I'd recommend adding this information to the README for others' future reference. Thanks!

expected 8.0 but got 8.9

Hello, when running docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion:latest I got the following error: expected compute 8.0 but got compute 8.9. I am trying to run this on an A100 80GB GPU in GCP. I have read the docs, and they say to build for a new arch with bash build.sh 86-real. Now should I build like

bash build.sh 8.9-real

or

bash build.sh 89-real
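
For what it's worth, the CUDA_ARCH values used elsewhere in this README ('86-real' for the RTX 3090, '89-real' for the RTX 4090) are the device's compute capability with the dot removed, which suggests 80-real for an A100 (compute 8.0). A quick way to check the compute capability with PyTorch (a sketch, assuming PyTorch is installed):

import torch

# get_device_capability returns e.g. (8, 0) on an A100 or (8, 9) on an RTX 4090
major, minor = torch.cuda.get_device_capability(0)
print(f"{major}{minor}-real")  # CUDA_ARCH-style string, e.g. "80-real"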

WhisperFusion currently doesn't work in WSL2 with Docker Desktop (CUDA init issue in PyTorch)

I followed the instructions from the README and the docker image built fine.

However, when I ran it, the WhisperFusion process fails (which makes the webapp not work).

The problem is unfortunately hidden because by default all of the logs in build-models.sh are sent to /dev/null. Removing that I get this

whisperfusion-1  | [06/06/2024-00:16:18] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
whisperfusion-1  | [06/06/2024-00:16:18] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
whisperfusion-1  | [06/06/2024-00:16:18] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
whisperfusion-1  | [06/06/2024-00:16:18] [TRT] [W] Unable to determine GPU memory usage: named symbol not found
whisperfusion-1  | [06/06/2024-00:16:18] [TRT] [W] Unable to determine GPU memory usage: named symbol not found
whisperfusion-1  | [06/06/2024-00:16:18] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 558, GPU 0 (MiB)
whisperfusion-1  | [06/06/2024-00:16:18] [TRT] [E] 6: CUDA initialization failure with error: 500. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
whisperfusion-1  | Traceback (most recent call last):
whisperfusion-1  |   File "/root/TensorRT-LLM-examples/whisper/build.py", line 384, in <module>
whisperfusion-1  |     run_build(args)
whisperfusion-1  |   File "/root/TensorRT-LLM-examples/whisper/build.py", line 378, in run_build
whisperfusion-1  |     build_encoder(model, args)
whisperfusion-1  |   File "/root/TensorRT-LLM-examples/whisper/build.py", line 188, in build_encoder
whisperfusion-1  |     builder = Builder()
whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/builder.py", line 82, in __init__
whisperfusion-1  |     self._trt_builder = trt.Builder(logger.trt_logger)
whisperfusion-1  | TypeError: pybind11::init(): factory function returned nullptr

which seems to be a problem of not finding my GPU, however if I run

docker run -it --gpus=all --rm whisperfusion:latest nvidia-smi

I get the expected result

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
| 30%   42C    P2            108W /  450W |    5983MiB /  24564MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Next, I tried to see if PyTorch can see the GPU with this command

docker run -it --gpus=all --rm whisperfusion:latest python -c 'import torch; print(torch.cuda.device_count())'

I get the expected 1 but when I run this other command

docker run -it --gpus=all --rm whisperfusion:latest python -c 'import torch; print(torch.cuda.is_available())'

I get the error

/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

I googled around for that error and a found this issue

NVIDIA/nvidia-container-toolkit#520

which is a recent (last week) problem in the nvidia-container-toolkit about a missing DLL that is needed for CUDA to work properly when operating in WSL

The issue suggests upgrading the nvidia-container-toolkit, but it also says that if you're using Docker Desktop, as I am, a fix is not yet available and the only workaround is to downgrade the NVIDIA driver to version 552.xx or earlier. I'm probably just going to wait until a fix is available, but I thought I'd pass this along because others might be running into the same problem.

Uncaught TypeError in audio-processor.js

Getting this. I'm trying to view the page on Windows while running it in WSL. Everything else seems OK.

Uncaught TypeError: Cannot read properties of undefined (reading 'set')
at AudioStreamProcessor.process (audio-processor.js:24:23)
process @ audio-processor.js:24

Issue building on windows

hello,
I am trying to run the Docker container on my Windows 11 machine with an RTX 4090. I used the following command (there is no image for the 4090):
docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion-3090:latest
but it failed with this error:
[screenshot of the error]

WhisperLive `segments` only transcribe last 25 seconds

Code reference: https://github.com/collabora/WhisperFusion/blob/main/whisper_live/trt_server.py#L376-L389

The highlighted code does not make logical sense to me and seems buggy, mostly because segments is only the last_segment. The last_segment is the clipped audio of the last 25 seconds (as sketched below). If someone talks for longer than 25 seconds with no EOS in between, the demo UI would not record the earliest part of the conversation. The purpose of line 380 is also not clear.

I am trying to understand how this works with the UI. Looking at the server.py code of the most recent WhisperLive makes much more sense.
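
For illustration, the clipping behavior being criticized amounts to something like this (a simplified sketch; the constants follow the issue, the function is hypothetical):

import numpy as np

RATE = 16000      # sample rate used by the server
MAX_SECONDS = 25  # only this much trailing audio is kept

def clip_to_window(frames_np: np.ndarray) -> np.ndarray:
    # Anything older than the last 25 seconds is dropped, so a monologue longer
    # than the window loses its beginning before it is ever shown in the UI.
    if frames_np.shape[0] > MAX_SECONDS * RATE:
        return frames_np[-MAX_SECONDS * RATE:]
    return frames_np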

Running outside Docker

Any instructions for installing this without needing Docker?

I setup a new Python environment with these commands

python -m pip install --upgrade pip
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts wheel==0.41.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts faster-whisper==0.9.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts websockets==12.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts onnxruntime==1.16.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts ffmpeg-python==0.2.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts scipy==1.12.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts websocket-client==1.7.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts tiktoken==0.3.3
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts kaldialign==0.7.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts braceexpand==0.1.7
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts openai-whisper==20231117
python.exe -m pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts whisperspeech==0.6
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts soundfile==0.12.1
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.1.1+cu118 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

I changed into the examples/chatbot/html directory and ran python -m http.server.
I can open the webpage and click the microphone, but when I speak nothing is recognised or shown.

What else do I need to run outside Docker?

What license is this code released under?

Hey, love this repo. Thank you so much for this work. Can you clarify what license your code has? I couldn't confirm whether it is MIT or not. Adding a LICENSE.md would resolve that. Thanks!

Docker image re-build failure

Thanks for this wonderful tool. I updated CUDA to >12 and am on Windows 10 with an RTX 3060, which means (I think) that I need to rebuild for the sm_86 arch. What do I need to do here?

(whisper) C:\Users\ryzen\Documents\WoNdErLaNd\Personal\WhisperFusion\WhisperFusion\docker> docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion:latest

==========
== CUDA ==
==========

CUDA Version 12.2.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

done loading
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[01/30/2024-08:29:55] [TRT] [E] 6: The engine plan file is generated on an incompatible device, expecting compute 8.6 got compute 8.9, please rebuild.   👈
[01/30/2024-08:29:55] [TRT] [E] 2: [engine.cpp::deserializeEngine::1148] Error Code 2: Internal Error (Assertion engine->deserialize(start, size, allocator, runtime) failed. )
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/WhisperFusion/llm_service.py", line 195, in run
    self.initialize_model(
  File "/root/WhisperFusion/llm_service.py", line 109, in initialize_model
    self.runner = self.runner_cls.from_dir(**self.runner_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py", line 417, in from_dir
    session = session_cls(model_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 475, in __init__
    self.runtime = _Runtime(engine_buffer, mapping)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 153, in __init__
    self.__prepare(mapping, engine_buffer)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 174, in __prepare
    assert self.engine is not None
AssertionError
Exception ignored in: <function _Runtime.__del__ at 0x7ff3885975b0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 279, in __del__
    cudart.cudaFree(self.address)  # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'

Here's the process and resulting logs...

ryzen@DESKTOP-O0HU0GU MINGW64 ~/Documents/WoNdErLaNd/Personal/WhisperFusion/WhisperFusion/docker (main)
$ ./build.sh 86-real  👈
[+] Building 21.5s (8/10)                                                                                                                                                                                                                           docker:default 
 => [internal] load .dockerignore                                                                                                                                                                                                                             0.0s 
 => => transferring context: 2B                                                                                                                                                                                                                               0.0s 
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                          0.0s 
 => => transferring dockerfile: 442B                                                                                                                                                                                                                          0.0s 
 => [internal] load metadata for nvcr.io/nvidia/cuda:12.2.2-devel-ubuntu22.04                                                                                                                                                                                 0.7s 
 => [1/6] FROM nvcr.io/nvidia/cuda:12.2.2-devel-ubuntu22.04@sha256:ae8a022c02aec945c4f8c52f65deaf535de7abb58e840350d19391ec683f4980                                                                                                                           0.0s 
 => [internal] load build context                                                                                                                                                                                                                             0.0s 
 => => transferring context: 75B                                                                                                                                                                                                                              0.0s 
 => CACHED [2/6] WORKDIR /root                                                                                                                                                                                                                                0.0s 
 => CACHED [3/6] COPY install-deps.sh /root                                                                                                                                                                                                                   0.0s 
 => ERROR [4/6] RUN bash install-deps.sh && rm install-deps.sh                                                                                                                                                                                               20.7s 
------
 > [4/6] RUN bash install-deps.sh && rm install-deps.sh:
0.248 install-deps.sh: line 2: $'\r': command not found
0.303 Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1581 B]
0.341 Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [665 kB]
0.364 Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
0.517 Get:4 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
0.657 Get:5 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [1398 kB]
1.090 Get:6 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [44.6 kB]
1.258 Get:7 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [1685 kB]
1.441 Get:8 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
1.716 Get:9 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1060 kB]
2.080 Get:10 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
2.706 Get:11 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB]
3.439 Get:12 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB]
5.299 Get:13 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB]
6.554 Get:14 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]
7.449 Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [1722 kB]
11.08 Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [50.4 kB]
11.57 Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1677 kB]
12.81 Get:18 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1326 kB]
15.71 Get:19 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [50.4 kB]
16.20 Get:20 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [28.1 kB]
16.36 Fetched 30.0 MB in 16s (1865 kB/s)
16.36 Reading package lists...
17.06 Reading package lists...
17.76 Building dependency tree...
17.93 Reading state information...
17.93 E: Unable to locate package git-lfs
17.93 install-deps.sh: line 4: git: command not found
17.93 install-deps.sh: line 5: cd: $'TensorRT-LLM\r': No such file or directory
17.94 install-deps.sh: line 6: git: command not found
17.94 install-deps.sh: line 7: git: command not found
17.94 install-deps.sh: line 8: git: command not found
17.94 install-deps.sh: line 9: git: command not found
17.94 install-deps.sh: line 10: $'\r': command not found
17.94 
17.94 WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
17.94 
18.70 
18.70 WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
18.70 
19.45
19.45 WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
19.45
20.23 (Stripping trailing CRs from patch; use --binary to disable.)
20.23 can't find file to patch at input line 5
20.23 Perhaps you used the wrong -p or --strip option?
20.23 The text leading up to this was:
20.23 --------------------------
20.23 |diff --git a/docker/common/install_tensorrt.sh b/docker/common/install_tensorrt.sh
20.23 |index 2dcb0a6..3a27e03 100644
20.23 |--- a/docker/common/install_tensorrt.sh
20.23 |+++ b/docker/common/install_tensorrt.sh
20.23 --------------------------
20.23 File to patch:
20.23 Skip this patch? [y]
20.23 Skipping patch.
20.23 patch: **** malformed patch at line 13: libnccl2/unknown,now 2.19.3-1+cuda12.2 amd64 [installed,upgradable to: 2.19.3-1+cuda12.3] ]]; then
20.23
20.23 install-deps.sh: line 38: $'\r': command not found
20.23 install-deps.sh: line 39: cd: $'docker/common/\r': No such file or directory
: No such file or directory
: No such file or directoryh
: No such file or directory 44: /etc/shinit_v2
: No such file or directorysh
20.23 install-deps.sh: line 47: pip3: command not found
: No such file or directoryt.sh
: No such file or directoryphy.sh
: No such file or directory 50: /etc/shinit_v2
20.24 install-deps.sh: line 51: $'\r': command not found
20.24 install-deps.sh: line 52: cd: $'/root/TensorRT-LLM/docker/common/\r': No such file or directory
: No such file or directorysh
: No such file or directory 54: /etc/shinit_v2
------
Dockerfile:11
--------------------
   9 |     COPY install-deps.sh /root
  10 |     ENV CUDA_ARCH=${CUDA_ARCH}
  11 | >>> RUN bash install-deps.sh && rm install-deps.sh
  12 |
  13 |     COPY install-trt-llm.sh /root
--------------------
ERROR: failed to solve: process "/bin/sh -c bash install-deps.sh && rm install-deps.sh" did not complete successfully: exit code: 1

ryzen@DESKTOP-O0HU0GU MINGW64 ~/Documents/WoNdErLaNd/Personal/WhisperFusion/WhisperFusion/docker (main)
$

FileNotFoundError: [Errno 2] No such file or directory: '/root/dolphin-2_6-phi-2/config.json' when running self build docker image

When running a newly self-built Docker image, I'm getting the error message:
FileNotFoundError: [Errno 2] No such file or directory: '/root/dolphin-2_6-phi-2/config.json'

Some more context:

s6-rc: info: service legacy-services successfully started
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/WhisperFusion/llm_service.py", line 205, in run
self.initialize_model(
File "/root/WhisperFusion/llm_service.py", line 98, in initialize_model
model_name = read_model_name(engine_dir)
File "/root/WhisperFusion/llm_service.py", line 23, in read_model_name
engine_version = tensorrt_llm.runtime.engine.get_engine_version(engine_dir)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/engine.py", line 81, in get_engine_version
with open(config_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/dolphin-2_6-phi-2/config.json'
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[2024-02-09 13:49:45,605] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored

I had to rebuild for compute capability 8.9 with:
bash build.sh 89-real

I can't immediately see any obvious error messages in the build (though it's not guaranteed that I didn't miss something).
I used the master branch at c90a694

Could you provide an image with compute capability 8.9? Or any ideas for the fix? Should I use another checkout to rebuild the docker image?

Version build for rtx3090 - not working

I wanted to test WhisperFusion on an RTX 3090.

Went through build.sh and then:
docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion:latest
and ran the HTTP server as in the description.

After that I got this:

==========
== CUDA ==
==========

CUDA Version 12.2.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:root:[LLM INFO:] Loaded LLM TensorRT Engine.
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[2024-02-05 11:51:46,152] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
INFO:websockets.server:connection open
downloading ONNX model...
loading session
loading onnx model
reset states
INFO:root:[Whisper INFO:] New client connected

[2024-02-05 11:52:27,316] [1/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-02-05 11:52:27,324] [1/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-02-05 11:52:30,266] [1/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-02-05 11:52:30,278] [1/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.00% [152/152 00:00<00:00]
rvsh@bob:/opt/WhisperFusion/examples/chatbot/html$ python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [05/Feb/2024 12:55:49] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:56:10] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:56:23] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:56:23] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:56:23] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /css/style.css HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /css/all.min.css HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /img/Collabora_Logo.svg HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /img/microphone-white.png HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /img/stop.png HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /img/record.png HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /js/main.js HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:27] code 404, message File not found
127.0.0.1 - - [05/Feb/2024 12:56:27] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [05/Feb/2024 12:56:33] "GET /img/microphone-hover.png HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:56:33] "GET /img/microphone.png HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:57:05] "GET /css/all.min.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:57:05] "GET /css/style.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:57:05] "GET /js/main.js HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:57:05] "GET /img/microphone-white.png HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:57:05] "GET /img/Collabora_Logo.svg HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:57:05] "GET /img/stop.png HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:57:05] "GET /img/record.png HTTP/1.1" 304 -
127.0.0.1 - - [05/Feb/2024 12:57:07] "GET /img/microphone-hover.png HTTP/1.1" 200 -
127.0.0.1 - - [05/Feb/2024 12:57:07] "GET /img/microphone.png HTTP/1.1" 200 -

On 127.0.0.1:8000 the server is visible, but clicking the microphone just runs the timer and nothing else happens.

Where is the problem?

Other languages and Whisper models

Hi and thanks for sharing this awesome project! 🤩

Currently it seems that only English is supported/configured, but we would also like to try other languages (e.g. German).

So we started with Whisper. We briefly tried using the Whisper small model instead of small.en by simply patching build-whisper.sh and rebuilding the Docker container, but that doesn't seem to be the only place we have to touch here, as we only get this when running the container (a comparison sketch follows the logs below):

INFO:root:[Whisper INFO:] New client connected

INFO:root:[Whisper INFO]: . br,pt whe int Mus............................................, eos: True
INFO:root:[Whisper INFO]: Average inference time 0.37747994336214935


INFO:root:[Whisper INFO]: .. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br. br., eos: True
INFO:root:[Whisper INFO]: Average inference time 0.31598156690597534
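
For comparison, the small vs. small.en distinction in the plain openai-whisper API (not WhisperFusion's TensorRT path) looks like this; a sketch, assuming a local German audio file:

import whisper

# "small" is the multilingual checkpoint; "small.en" is English-only
model = whisper.load_model("small")
result = model.transcribe("sample_de.wav", language="de")  # hypothetical input file
print(result["text"])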

Before we dig deeper into the project (we just found it today), we thought we'd quickly ask if you have any tips/recommendations for us, or if you are already working on similar ideas.

Thanks again!

Race condition in text_to_speech func

I believe there might be a race condition in the following section of the code:

# clip audio if the current chunk exceeds 30 seconds, this basically implies that
# no valid segment for the last 30 seconds from whisper
if self.frames_np[int((self.timestamp_offset - self.frames_offset)*self.RATE):].shape[0] > 25 * self.RATE:
    duration = self.frames_np.shape[0] / self.RATE
    self.timestamp_offset = self.frames_offset + duration - 5

samples_take = max(0, (self.timestamp_offset - self.frames_offset)*self.RATE)
input_bytes = self.frames_np[int(samples_take):].copy()
duration = input_bytes.shape[0] / self.RATE
if duration < 0.4:
    # If the audio duration is short, release the lock and wait
    self.lock.release()
    time.sleep(0.01)    # 10 ms sleep to wait for some voice-active audio to arrive
    continue

You are accessing a shared resource between threads without proper synchronization in this section. Could you please review this code?
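
For illustration, a minimal sketch of the kind of synchronization being asked for, with reader and writer sharing one lock (frames_np and RATE follow the snippet above; the helper functions are hypothetical):

import threading

import numpy as np

lock = threading.Lock()
frames_np = np.zeros(0, dtype=np.float32)
RATE = 16000

def append_frames(chunk: np.ndarray) -> None:
    # Writer: mutate the shared buffer only while holding the lock
    global frames_np
    with lock:
        frames_np = np.concatenate([frames_np, chunk])

def take_recent(seconds: float) -> np.ndarray:
    # Reader: copy under the same lock so the writer cannot replace the buffer mid-read
    with lock:
        return frames_np[-int(seconds * RATE):].copy()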

Docker: unauthorized

After running sudo docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion:latest I got:

Unable to find image 'ghcr.io/collabora/whisperfusion:latest' locally
docker: Error response from daemon: Head "https://ghcr.io/v2/collabora/whisperfusion/manifests/latest": unauthorized.
See 'docker run --help'.

Indentation Bug in `trt_server.py`

In the file trt_server.py, I suspect that the highlighted lines need to be at the same indentation level as the while loop. Otherwise, in its current form, it makes no sense to me. Just shining some light on this.

Does not work on paperspace Quadro 5000 RTX x 2

I have a setup with two Quadro RTX 5000 GPUs (2 × 16 GB of GPU RAM); everything is up, but no response is being produced.

Built the container using:

sudo CUDA_ARCH=89-real docker compose build

The logs are:

nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:24 +0000] "GET /css/all.min.css HTTP/1.1" 304 0 "http://184.105.215.27:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0" "-"
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:24 +0000] "GET /img/microphone-white.png HTTP/1.1" 304 0 "http://184.105.215.27:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0" "-"
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:24 +0000] "GET /js/main.js HTTP/1.1" 304 0 "http://184.105.215.27:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0" "-"
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:24 +0000] "GET /img/record.png HTTP/1.1" 304 0 "http://184.105.215.27:8000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0" "-"
whisperfusion-1  | INFO:websockets.server:connection open
whisperfusion-1  | INFO:websockets.server:connection open
whisperfusion-1  | INFO:root:[Whisper INFO:] New client connected
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:30 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36 Edg/122.0.0.0" "-"
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:30 +0000] "GET /transcription HTTP/1.1" 101 76 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0" "-"
whisperfusion-1  | ERROR:root:received 1001 (going away); then sent 1001 (going away)
whisperfusion-1  | INFO:root:Cleaning up.
whisperfusion-1  | INFO:root:[Whisper INFO:] Connection Closed.
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:30 +0000] "GET /audio HTTP/1.1" 101 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0" "-"
whisperfusion-1  | INFO:root:[Whisper INFO:] Exiting speech to text thread
whisperfusion-1  | INFO:websockets.server:connection open
whisperfusion-1  | INFO:websockets.server:connection open
whisperfusion-1  | INFO:root:[Whisper INFO:] New client connected
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:32:43 +0000] "GET /img/microphone-hover.png HTTP/1.1" 304 0 "http://184.105.215.27:8000/" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36 Edg/122.0.0.0" "-"
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:33:07 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36 Edg/122.0.0.0" "-"
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:33:07 +0000] "GET /transcription HTTP/1.1" 101 76 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36 Edg/122.0.0.0" "-"
nginx-1          | 51.36.220.128 - - [05/Mar/2024:16:33:07 +0000] "GET /audio HTTP/1.1" 101 0 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36 Edg/122.0.0.0" "-"
whisperfusion-1  | ERROR:root:received 1001 (going away); then sent 1001 (going away)
whisper

What is wrong?

libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev with Latest docker image

I tried to run the latest (as of today) docker image:

docker run --gpus all --shm-size 64G -p 8001:80 ghcr.io/collabora/whisperfusion:latest

I'm getting the error OSError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev. See below for details.

I'm using the following Version:

docker pull ghcr.io/collabora/whisperfusion:latest
latest: Pulling from collabora/whisperfusion
Digest: sha256:dc6029a768c15a7588008f415840eea5939fae7b7d079496b5f96242ae83ea48
Status: Image is up to date for ghcr.io/collabora/whisperfusion:latest
ghcr.io/collabora/whisperfusion:latest
s6-rc: info: service legacy-services successfully started
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 58, in _init
    torch.classes.load_library(ft_decoder_lib)
  File "/usr/local/lib/python3.10/dist-packages/torch/_classes.py", line 51, in load_library
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/WhisperFusion/main.py", line 11, in <module>
    from whisper_live.trt_server import TranscriptionServer
  File "/root/WhisperFusion/whisper_live/trt_server.py", line 17, in <module>
    from whisper_live.trt_transcriber import WhisperTRTLLM
  File "/root/WhisperFusion/whisper_live/trt_transcriber.py", line 16, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 64, in <module>
    _init(log_level="error")
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 61, in _init
    raise ImportError(str(e) + msg)
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.

Cannot connect to the audio capture endpoint (113: No route to host)

Greetings.

I followed the exact instructions for the 3090 image.

However, I am getting these problems when I open the browser and try to start audio capture:

nginx-1          | 172.18.0.1 - - [04/Mar/2024:15:04:19 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36" "-"
nginx-1          | 2024/03/04 15:04:22 [error] 9#9: *2 connect() failed (113: No route to host) while connecting to upstream, client: 172.18.0.1, server: _, request: "GET /audio HTTP/1.1", upstream: "http://172.18.0.2:8888/audio", host: "localhost:8000"
nginx-1          | 2024/03/04 15:04:22 [error] 10#10: *4 connect() failed (113: No route to host) while connecting to upstream, client: 172.18.0.1, server: _, request: "GET /transcription HTTP/1.1", upstream: "http://172.18.0.2:6006/transcription", host: "localhost:8000"
nginx-1          | 172.18.0.1 - - [04/Mar/2024:15:04:22 +0000] "GET /audio HTTP/1.1" 502 559 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36" "-"
nginx-1          | 172.18.0.1 - - [04/Mar/2024:15:04:22 +0000] "GET /transcription HTTP/1.1" 502 559 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36" "-"

Can anyone help?
