gadersd / whisper-burn
A Rust implementation of OpenAI's Whisper model using the burn framework
License: MIT License
OS: Mac Ventura
It seems that transcription works with the tiny model, but the medium model produces a buffer size error. Perhaps we could do chunking.
Running `target/release/whisper audio.wav medium`
thread 'main' panicked at 'wgpu error: Validation Error
Caused by:
In Device::create_bind_group
Buffer binding 0 range 212439040 exceeds `max_*_buffer_binding_size` limit 134217728
', /Users/botch/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.17.0/src/backend/direct.rs:3056:5
stack backtrace:
0: rust_begin_unwind
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
1: core::panicking::panic_fmt
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
2: core::ops::function::Fn::call
3: <wgpu::backend::direct::Context as wgpu::context::Context>::device_create_bind_group
4: <T as wgpu::context::DynContext>::device_create_bind_group
5: wgpu::Device::create_bind_group
6: burn_wgpu::context::base::Context::execute
7: burn_wgpu::kernel::index::select::select
8: burn_tensor::tensor::ops::modules::base::ModuleOps::embedding
9: whisper::model::Whisper<B>::forward_decoder
10: whisper::main
Using a six-minute audio file with the tiny model produces the same issue.
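The chunking idea can be sketched with a little arithmetic: the panic reports a 212,439,040-byte binding against wgpu's default 134,217,728-byte (128 MiB) limit, so the work would have to be split into at least two pieces. A minimal illustration (the helper name is mine, not part of whisper-burn):

```rust
// Hedged sketch: how many chunks keep each bound buffer under wgpu's
// default max_*_buffer_binding_size (128 MiB = 134_217_728 bytes)?
fn chunks_needed(buffer_bytes: u64, limit_bytes: u64) -> u64 {
    (buffer_bytes + limit_bytes - 1) / limit_bytes // ceiling division
}

fn main() {
    let limit = 134_217_728; // default limit quoted in the error message
    // The 212_439_040-byte buffer from the panic needs at least two chunks.
    assert_eq!(chunks_needed(212_439_040, limit), 2);
    // A buffer exactly at the limit still fits in one binding.
    assert_eq!(chunks_needed(134_217_728, limit), 1);
}
```

An alternative would be requesting higher device limits from wgpu, but not all adapters grant them, so chunking is the more portable fix.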
This project fails to produce any output on a different test file (the file works with whisper, sounds normal when I listen to it, and was created from an m4a via ffmpeg according to the whisper instructions):
$ ffmpeg -i ../whisper/20220922\ 084923.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav
$ cargo run --release output.wav tiny_en
Running `target/release/whisper output.wav tiny_en`
<|notimestamps|>
Transcribed text: <|notimestamps|>
GitHub refuses to allow me to upload a wav file (even base64-encoded as .txt). Not sure what the best way to share it is.
I ran into some errors due to a missing system install of libtorch at the expected path. I was able to trace these errors back to https://github.com/LaurentMazare/tch-rs#libtorch-manual-install and the need to set environment variables such as LIBTORCH or LIBTORCH_USE_PYTORCH.
I'm trying to get this working with nix (on aarch64-darwin) but am not having any luck so far.
Running on Windows, I tried to select my Intel GPU with let-env WGPU_ADAPTER_NAME = 'intel' via nushell, with no success. I also tried changing the device selection to let device = WgpuDevice::DiscreteGpu(0); and it too did not work. In the end I had to set type Backend = WgpuBackend<burn_wgpu::Vulkan, f32, i32>; and use VK_ICD_FILENAMES = '\windows\system32\DriverStore\FileRepository\iigd_dch_d.inf_amd64_50b98d237e0753a8\igvk64.json' to use the Intel GPU. (For anyone stumbling upon this: the path may differ depending on your GPU, so you will need to find igvk64.json manually.)
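For context, WGPU_ADAPTER_NAME is, as far as I can tell, matched by wgpu as a case-insensitive substring of the adapter's reported name, so 'intel' should normally match an adapter like "Intel(R) UHD Graphics". A dependency-free sketch of that matching rule (illustrative, not wgpu's actual code):

```rust
// Hedged sketch of name-based adapter filtering as wgpu's env-var
// selection appears to do it: case-insensitive substring match.
fn matches_adapter(requested: &str, adapter_name: &str) -> bool {
    adapter_name
        .to_lowercase()
        .contains(&requested.to_lowercase())
}

fn main() {
    // 'intel' matches a typical Intel adapter name regardless of case.
    assert!(matches_adapter("intel", "Intel(R) UHD Graphics 630"));
    // It does not match an NVIDIA adapter, so that adapter is skipped.
    assert!(!matches_adapter("intel", "NVIDIA GeForce RTX 3060"));
}
```

If the match succeeds but the adapter still fails, the VK_ICD_FILENAMES workaround above points Vulkan at the right driver JSON directly.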
Not sure if this is an issue with whisper-burn or the wgpu backend for burn. I think it's a burn-wgpu issue, but thought it would be safer to report it here first.
thread 'main' panicked at 'slice index starts at 172409 but ends at 168511',
/tmp/whisper-burn/src/transcribe.rs:101:22
whisper-burn/src/transcribe.rs, line 95 in 3757c15:
Here waveform.len() could be less than n_samples_per_tensor, which causes iter_len to become extremely large:
[src/transcribe.rs:97] n_samples_per_tensor = 238559
[src/transcribe.rs:97] waveform.len() = 168511
[src/transcribe.rs:97] waveform.len() - n_samples_per_tensor = 18446744073709481568
Replacing the subtraction with saturating_sub fixes the issue.
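The underflow is easy to reproduce with the numbers from the debug output above: usize subtraction panics in debug builds and wraps around in release builds, producing the huge iter_len, while saturating_sub clamps at zero. A minimal sketch:

```rust
fn main() {
    let waveform_len: usize = 168_511; // waveform.len() from the log
    let n_samples_per_tensor: usize = 238_559;

    // A plain `-` would panic in debug builds and wrap in release builds;
    // wrapping_sub makes the release-build behavior explicit here.
    let iter_len = waveform_len.wrapping_sub(n_samples_per_tensor);
    assert_eq!(iter_len, 18_446_744_073_709_481_568); // the value in the log

    // saturating_sub clamps at zero, so the chunking loop degrades
    // gracefully when the waveform is shorter than one tensor of samples.
    assert_eq!(waveform_len.saturating_sub(n_samples_per_tensor), 0);
}
```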
The latest tinygrad 0.7.0 moved state to under tinygrad.nn.state, which broke the conversion script. Updating the import path fixed the problem.
I am following the example in the README on my Mac M1. When I reach the point of running it, I choose the wgpu backend and it panics. Before the refactor you did some days ago, I was able to run this project locally on my Mac (no torch installed whatsoever).
cargo run --release --features wgpu-backend --bin transcribe tiny_en audio16k.wav transcription.txt
warning: unused imports: `Bool`, `Int`, `activation::relu`
--> src/model/load.rs:6:9
|
6 | activation::relu,
| ^^^^^^^^^^^^^^^^
7 | Tensor,
8 | Int,
| ^^^
9 | Bool,
| ^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `Conv1dRecord`
--> src/model/mod.rs:8:45
|
8 | nn::{self, conv::{Conv1d, Conv1dConfig, Conv1dRecord}, PaddingConfig1d},
| ^^^^^^^^^^^^
warning: unused import: `Tokenizer`
--> src/token.rs:4:18
|
4 | use tokenizers::{Tokenizer, AddedToken};
| ^^^^^^^^^
warning: unused import: `crate::helper::*`
--> src/transcribe.rs:2:5
|
2 | use crate::helper::*;
| ^^^^^^^^^^^^^^^^
warning: unused imports: `Float`, `Int`, `config::Config`, `self`
--> src/transcribe.rs:9:5
|
9 | config::Config,
| ^^^^^^^^^^^^^^
...
13 | backend::{self, Backend},
| ^^^^
...
16 | Int,
| ^^^
17 | Float,
| ^^^^^
warning: unused variable: `n_batch`
--> src/model/mod.rs:122:14
|
122 | let [n_batch, seq_len] = x.dims();
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_batch`
|
= note: `#[warn(unused_variables)]` on by default
warning: unused variable: `new_text`
--> src/transcribe.rs:38:14
|
38 | let (new_text, new_tokens) = mels_to_text(whisper, bpe, mel, &prev_normal_tokens[..], padding)?;
| ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_new_text`
warning: unused variable: `n_channel`
--> src/transcribe.rs:119:10
|
119 | let [n_channel, n_mel, n_ctx] = mels.dims();
| ^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_channel`
warning: unused variable: `start_of_prev_token`
--> src/transcribe.rs:130:9
|
130 | let start_of_prev_token = bpe.special_token(SpecialToken::StartofPrev).unwrap();
| ^^^^^^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_start_of_prev_token`
warning: unused variable: `n_batch`
--> src/transcribe.rs:158:14
|
158 | let [n_batch, n_token, n_dict] = out.dims();
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_batch`
warning: unused variable: `n_dict`
--> src/transcribe.rs:158:32
|
158 | let [n_batch, n_token, n_dict] = out.dims();
| ^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_dict`
warning: unused variable: `prev_normal_tokens`
--> src/transcribe.rs:113:92
|
113 | ...Tokenizer, mels: Tensor<B, 3>, prev_normal_tokens: &[usize], padding: usize) -> token::Result<(St...
| ^^^^^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_prev_normal_tokens`
warning: function `get_mel_filters` is never used
--> src/audio.rs:66:4
|
66 | fn get_mel_filters<B: Backend>(sample_rate: f64, n_fft: usize, n_mels: usize, htk: bool) -> Tensor<B,...
| ^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: function `fft_frequencies` is never used
--> src/audio.rs:128:4
|
128 | fn fft_frequencies<B: Backend>(sample_rate: f64, n_fft: usize) -> Tensor<B, 1> {
| ^^^^^^^^^^^^^^^
warning: function `test_fft_frequencies` is never used
--> src/audio.rs:137:4
|
137 | fn test_fft_frequencies<B: Backend>() {
| ^^^^^^^^^^^^^^^^^^^^
warning: function `test_mel_frequencies` is never used
--> src/audio.rs:144:4
|
144 | fn test_mel_frequencies<B: Backend>(htk: bool) {
| ^^^^^^^^^^^^^^^^^^^^
warning: function `mel_frequencies` is never used
--> src/audio.rs:152:4
|
152 | fn mel_frequencies<B: Backend>(n_mels: usize, fmin: f64, fmax: f64, htk: bool) -> Tensor<B, 1> {
| ^^^^^^^^^^^^^^^
warning: `whisper` (lib) generated 17 warnings (run `cargo fix --lib -p whisper` to apply 12 suggestions)
warning: unused import: `std::collections::HashMap`
--> src/bin/transcribe/main.rs:1:5
|
1 | use std::collections::HashMap;
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `std::iter`
--> src/bin/transcribe/main.rs:2:5
|
2 | use std::iter;
| ^^^^^^^^^
warning: unused import: `whisper::helper::*`
--> src/bin/transcribe/main.rs:5:5
|
5 | use whisper::helper::*;
| ^^^^^^^^^^^^^^^^^^
warning: unused import: `whisper::token`
--> src/bin/transcribe/main.rs:6:5
|
6 | use whisper::token;
| ^^^^^^^^^^^^^^
warning: unused imports: `Data`, `Float`, `Int`, `Tensor`, `self`, `self`
--> src/bin/transcribe/main.rs:21:9
|
21 | self,
| ^^^^
22 | backend::{self, Backend},
| ^^^^
23 | Data,
| ^^^^
24 | Tensor,
| ^^^^^^
25 | Int,
| ^^^
26 | Float,
| ^^^^^
warning: unused import: `num_traits::ToPrimitive`
--> src/bin/transcribe/main.rs:60:5
|
60 | use num_traits::ToPrimitive;
| ^^^^^^^^^^^^^^^^^^^^^^^
warning: unused import: `whisper::audio::prep_audio`
--> src/bin/transcribe/main.rs:61:5
|
61 | use whisper::audio::prep_audio;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: unused import: `SpecialToken`
--> src/bin/transcribe/main.rs:62:37
|
62 | use whisper::token::{Gpt2Tokenizer, SpecialToken};
| ^^^^^^^^^^^^
warning: unused variable: `duration`
--> src/bin/transcribe/main.rs:36:9
|
36 | let duration = reader.duration() as usize;
| ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_duration`
|
= note: `#[warn(unused_variables)]` on by default
warning: unused variable: `bits_per_sample`
--> src/bin/transcribe/main.rs:39:9
|
39 | let bits_per_sample = spec.bits_per_sample;
| ^^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_bits_per_sample`
warning: variable does not need to be mutable
--> src/bin/transcribe/main.rs:33:9
|
33 | let mut reader = hound::WavReader::open(filename)?;
| ----^^^^^^
| |
| help: remove this `mut`
|
= note: `#[warn(unused_mut)]` on by default
warning: unused variable: `tokens`
--> src/bin/transcribe/main.rs:132:16
|
132 | let (text, tokens) = match waveform_to_text(&whisper, &bpe, waveform, sample_rate) {
| ^^^^^^ help: if this is intentional, prefix it with an underscore: `_tokens`
warning: `whisper` (bin "transcribe") generated 12 warnings (run `cargo fix --bin "transcribe"` to apply 12 suggestions)
Finished release [optimized] target(s) in 0.28s
warning: the following packages contain code that will be rejected by a future version of Rust: nom v1.2.4, nom v3.2.1
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 2`
Running `target/release/transcribe tiny_en audio16k.wav transcription.txt`
Loading waveform...
Loading model...
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Torch("Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].\n\nCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31034 [kernel]\nMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:22748 [kernel]\nMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26824 [kernel]\nQuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:929 [kernel]\nBackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:726 [kernel]\nPython: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]\nFuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]\nFunctionalize: 
registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:280 [backend fallback]\nNamed: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]\nConjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]\nNegative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]\nZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]\nADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]\nAutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradHPU: registered at 
/Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nAutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17488 [autograd kernel]\nTracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:16726 [kernel]\nAutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:487 [backend fallback]\nAutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:354 [backend fallback]\nFuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]\nFuncTorchVmapMode: fallthrough registered at 
/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]\nBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]\nVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]\nFuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]\nPythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]\nFuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]\nPythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]\n\nException raised from reportError at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:548 (most recent call first):\nframe #0: c10::impl::OperatorEntry::reportError(c10::DispatchKey) const + 588 (0x10bef7020 in libtorch_cpu.dylib)\nframe #1: c10::impl::OperatorEntry::lookup(c10::DispatchKeySet) const + 164 (0x10bbbfd4c in libtorch_cpu.dylib)\nframe #2: at::Tensor c10::Dispatcher::redispatch<at::Tensor, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> >(c10::TypedOperatorHandle<at::Tensor (c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const + 
88 (0x10cd1a6d4 in libtorch_cpu.dylib)\nframe #3: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 172 (0x10cc3af58 in libtorch_cpu.dylib)\nframe #4: at::_ops::empty_strided::call(c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 300 (0x10cc3aaec in libtorch_cpu.dylib)\nframe #5: at::native::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 2624 (0x10c3c5244 in libtorch_cpu.dylib)\nframe #6: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 188 (0x10c8f9f70 in libtorch_cpu.dylib)\nframe #7: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 188 (0x10c8f9f70 in libtorch_cpu.dylib)\nframe #8: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>), &(torch::autograd::VariableType::(anonymous namespace)::_to_copy(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>))>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, 
at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat> > >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 1104 (0x10e673864 in libtorch_cpu.dylib)\nframe #9: at::_ops::_to_copy::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 340 (0x10c8f9c30 in libtorch_cpu.dylib)\nframe #10: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 348 (0x10cacdc30 in libtorch_cpu.dylib)\nframe #11: atg_to + 104 (0x101004cdc in transcribe)\nframe #12: tch::wrappers::tensor_generated::_$LT$impl$u20$tch..wrappers..tensor..Tensor$GT$::to::hfeb248388ea0a533 + 88 (0x100ff6ba8 in transcribe)\nframe #13: burn_tch::ops::tensor::_$LT$impl$u20$burn_tensor..tensor..ops..tensor..TensorOps$LT$burn_tch..backend..TchBackend$LT$E$GT$$GT$$u20$for$u20$burn_tch..backend..TchBackend$LT$E$GT$$GT$::to_device::h39a91cf7f00bf6ef + 64 (0x100fc4108 in transcribe)\nframe #14: _$LT$burn_core..nn..conv..conv1d..Conv1d$LT$B$GT$$u20$as$u20$burn_core..module..base..Module$LT$B$GT$$GT$::map::h58a9338e15b1b63e + 64 (0x100fbc49c in transcribe)\nframe #15: _$LT$whisper..model..Whisper$LT$B$GT$$u20$as$u20$burn_core..module..base..Module$LT$B$GT$$GT$::map::h8a2e896e5556644c + 104 (0x100fe515c in transcribe)\nframe #16: transcribe::main::hfd8ee1d1f4f6714c + 1044 
(0x100fc64d0 in transcribe)\nframe #17: std::sys_common::backtrace::__rust_begin_short_backtrace::h847fc7e56d1202dc + 12 (0x100fec7b8 in transcribe)\nframe #18: std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h9b3e7ad23a57bf45 + 16 (0x100fec7d0 in transcribe)\nframe #19: std::rt::lang_start_internal::hdd06e3566639fc5b + 648 (0x1013301d4 in transcribe)\nframe #20: main + 52 (0x100fc6b20 in transcribe)\nframe #21: start + 520 (0x1019dd08c in dyld)\n")', /Users/tiero/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tch-0.13.0/src/wrappers/tensor_generated.rs:17243:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
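The panic indicates the tch backend was asked for a CUDA device on a libtorch build (macOS) that only ships CPU and MPS kernels. A dependency-free sketch of the fallback logic that avoids this class of panic (the names are illustrative, not burn's or tch's actual API):

```rust
// Hedged sketch: only request a device the current libtorch build provides.
// On an M1 Mac there is no CUDA, so Cuda(0) must never be requested.
#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
    Mps,
    Cuda(usize),
}

fn pick_device(has_cuda: bool, has_mps: bool) -> Device {
    if has_cuda {
        Device::Cuda(0)
    } else if has_mps {
        Device::Mps
    } else {
        Device::Cpu
    }
}

fn main() {
    // Apple Silicon: no CUDA, MPS available -> fall back to MPS, not Cuda(0).
    assert_eq!(pick_device(false, true), Device::Mps);
    // No accelerator at all -> plain CPU.
    assert_eq!(pick_device(false, false), Device::Cpu);
}
```

In tch-rs terms the equivalent would be probing availability before constructing the device rather than hard-coding CUDA; the exact call sites in whisper-burn would need checking against the current code.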
Hi. Would you consider splitting out the functionality for computing the Mel spectrogram (inside audio.rs) into a separate crate?
This would be useful for other speech-centric models. There are a few libraries for this with ndarray, such as mel-spec and mfcc-rust (I'm a contributor), but this is the first implementation I've seen for burn.
Note: a PR is coming.
System: Win10
Rust version: 1.71.0 stable
I followed the instructions in the README file; after cargo run, I encountered this failure:
Failed to load tokenizer: Model "gpt2" on the Hub doesn't have a tokenizer
error: process didn't exit successfully: `target\release\whisper.exe audio.wav tiny` (exit code: 1)
It seems a dependency named tokenizers can't be initialized. How can I fix this error?
Hi there, first of all, awesome library and thank you for making it.
When I record a short, clear .wav file saying "this is a test, this is a test" (link below), the waveform_to_text function does not successfully decode it. In essence, it usually shows something like "muffled" as the decoded text. I have used the medium-sized model too. However, it does work for the audio.wav file provided in the repo example.
I have spent much time trying to analyze why this is failing, including analyzing the metadata of both files and recording new audio files from different sources (just in case this was to do with my own machine).
Do you have any experience or knowledge on the exact requirements for the .wav file in order for it to be successfully extracted using the library?
Here is the audio file which is failing: https://drive.google.com/file/d/1aaWL-mBrRaGtFvL_Re8r4WVtsMP3BmAI/view?usp=sharing
Again, this works great with the example audio file provided in the docs, but just not with any new custom file I record.
It seems the audio decoding is picky about what gets input to it.
Audio mediainfo
General
Complete name : C:\Users\Quack\code\whisper-burn\slap.wav
Format : Wave
File size : 788 KiB
Duration : 4 s 203 ms
Overall bit rate mode : Constant
Overall bit rate : 1 536 kb/s
Writing application : Lavf58.29.100
Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 4 s 203 ms
Bit rate mode : Constant
Bit rate : 1 536 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Stream size : 788 KiB (100%)
Audio file: https://cdn.discordapp.com/attachments/615105639567589376/1141946730485665893/slap.wav
target\release\whisper.exe .\slap.wav small_en  (08/18/2023 12:07:51 AM)
Loading waveform...
Loading model...
Chunk 0: (screaming)
Chunk 1: (screeching)
Transcribed text: (screeching)
whisper-ctranslate2:
whisper-ctranslate2.exe slap.wav --model tiny.en 08/18/2023 12:10:23 AM
Detected language 'English' with probability 1.000000
[00:00.000 --> 00:04.000] Also, it's not always useful.
Transcription results written to 'C:\Users\Quack\code\whisper-burn' directory
EDIT: transcoding the audio file using ffmpeg -i .\slap.wav -ar SAMPLE_RATE -ac 1 slap-edit.wav seems to make it work. It needs to be both single-channel and 41 kHz or less.
at 41khz the audio output was
Chunk 0: Oh, son, it's not all you are.
Transcribed text: Oh, son, it's not all you are.
at 24khz and below it is
Chunk 0: also it's not always useful.
Transcribed text: also it's not always useful
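The preprocessing that made the file work (downmix to mono, lower the sample rate) can be sketched without any audio crates: average the interleaved stereo samples, then resample by linear interpolation. This illustrates the transformation only; it is not whisper-burn's actual pipeline, and a real setup would keep using ffmpeg:

```rust
// Hedged sketch: downmix interleaved 16-bit stereo to mono floats.
fn stereo_to_mono(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|lr| (lr[0] as f32 + lr[1] as f32) / 2.0)
        .collect()
}

// Hedged sketch: naive linear-interpolation resampler (no anti-alias filter).
fn resample_linear(samples: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    let out_len = (samples.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * from_hz as f64 / to_hz as f64;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = samples[idx];
            let b = *samples.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac // interpolate between neighboring samples
        })
        .collect()
}

fn main() {
    // One second of 48 kHz stereo silence -> 16 kHz mono:
    // sample count drops by 2 (channels) and then by 3 (sample rate).
    let stereo_48k: Vec<i16> = vec![0; 48_000 * 2];
    let mono = stereo_to_mono(&stereo_48k);
    assert_eq!(mono.len(), 48_000);
    let mono_16k = resample_linear(&mono, 48_000, 16_000);
    assert_eq!(mono_16k.len(), 16_000);
}
```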
whisper-burn % cargo run --release --bin transcribe tiny_en audio16k.wav en transcription.txt
warning: unused imports: `Bool`, `Float`, `Int`
--> src/helper.rs:2:51
|
2 | activation::relu, backend::Backend, BasicOps, Bool, Element, Float, Int, Numeric, Tensor,
| ^^^^ ^^^^^ ^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused imports: `Bool`, `Int`, `activation::relu`
--> src/model/load.rs:8:14
|
8 | tensor::{activation::relu, backend::Backend, Bool, Int, Tensor},
| ^^^^^^^^^^^^^^^^ ^^^^ ^^^
warning: unused import: `Conv1dRecord`
--> src/model/mod.rs:10:38
|
10 | conv::{Conv1d, Conv1dConfig, Conv1dRecord},
| ^^^^^^^^^^^^
warning: unused import: `Tokenizer`
--> src/token.rs:4:30
|
4 | use tokenizers::{AddedToken, Tokenizer};
| ^^^^^^^^^
warning: unused import: `crate::helper::*`
--> src/transcribe.rs:2:5
|
2 | use crate::helper::*;
| ^^^^^^^^^^^^^^^^
warning: unused import: `num_traits::ToPrimitive`
--> src/transcribe.rs:7:5
|
7 | use num_traits::ToPrimitive;
| ^^^^^^^^^^^^^^^^^^^^^^^
warning: unused imports: `Float`, `Int`, `config::Config`, `self`
--> src/transcribe.rs:12:5
|
12 | config::Config,
| ^^^^^^^^^^^^^^
...
16 | backend::{self, Backend},
| ^^^^
17 | Data, Float, Int, Tensor,
| ^^^^^ ^^^
warning: unused import: `std::cmp::Ordering`
--> src/beam.rs:1:5
|
1 | use std::cmp::Ordering;
| ^^^^^^^^^^^^^^^^^^
warning: unused variable: `n_batch`
--> src/model/mod.rs:132:14
|
132 | let [n_batch, seq_len] = x.dims();
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_batch`
|
= note: `#[warn(unused_variables)]` on by default
warning: variable does not need to be mutable
--> src/token.rs:15:13
|
15 | let mut tokenizer = tokenizers::Tokenizer::from_file("tokenizer.json")?;
| ----^^^^^^^^^
| |
| help: remove this `mut`
|
= note: `#[warn(unused_mut)]` on by default
warning: unused variable: `new_text`
--> src/transcribe.rs:53:14
|
53 | let (new_text, new_tokens) =
| ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_new_text`
warning: unused variable: `n_ctx_max_decoder`
--> src/transcribe.rs:159:9
|
159 | let n_ctx_max_decoder = whisper.decoder_ctx_size();
| ^^^^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_ctx_max_decoder`
warning: unused variable: `n_channel`
--> src/transcribe.rs:161:10
|
161 | let [n_channel, n_mel, n_ctx] = mels.dims();
| ^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_channel`
warning: unused variable: `first_timestamp_token`
--> src/transcribe.rs:183:9
|
183 | let first_timestamp_token = bpe.special_token(SpecialToken::Timestamp(0.0)).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_first_timestamp_token`
warning: unused variable: `initial_tokens`
--> src/transcribe.rs:195:13
|
195 | let mut initial_tokens = if prev_nonspecial_tokens.len() > 0 {
| ^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_initial_tokens`
warning: unused variable: `n_batch`
--> src/transcribe.rs:263:14
|
263 | let [n_batch, n_token, n_dict] = log_probs.dims();
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_batch`
warning: unused variable: `n_token`
--> src/transcribe.rs:263:23
|
263 | let [n_batch, n_token, n_dict] = log_probs.dims();
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_token`
warning: unused variable: `n_dict`
--> src/transcribe.rs:263:32
|
263 | let [n_batch, n_token, n_dict] = log_probs.dims();
| ^^^^^^ help: if this is intentional, prefix it with an underscore: `_n_dict`
warning: variable does not need to be mutable
--> src/transcribe.rs:195:9
|
195 | let mut initial_tokens = if prev_nonspecial_tokens.len() > 0 {
| ----^^^^^^^^^^^^^^
| |
| help: remove this `mut`
warning: unused variable: `end_node`
--> src/beam.rs:74:17
|
74 | let end_node = continuations[end_node_index].clone();
| ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_end_node`
warning: unused variable: `tok1`
--> src/beam.rs:77:39
|
77 | ..._unstable_by(|(tok1, log_prob1), (tok2, log_prob2)| log_prob1.partial_cmp(&log_prob2).unwrap());
| ^^^^ help: if this is intentional, prefix it with an underscore: `_tok1`
warning: unused variable: `tok2`
--> src/beam.rs:77:58
|
77 | ..., log_prob1), (tok2, log_prob2)| log_prob1.partial_cmp(&log_prob2).unwrap());
| ^^^^ help: if this is intentional, prefix it with an underscore: `_tok2`
warning: function `get_mel_filters` is never used
--> src/audio.rs:58:4
|
58 | fn get_mel_filters<B: Backend>(
| ^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: function `fft_frequencies` is never used
--> src/audio.rs:145:4
|
145 | fn fft_frequencies<B: Backend>(sample_rate: f64, n_fft: usize) -> Tensor<B, 1> {
| ^^^^^^^^^^^^^^^
warning: function `test_fft_frequencies` is never used
--> src/audio.rs:159:4
|
159 | fn test_fft_frequencies<B: Backend>() {
| ^^^^^^^^^^^^^^^^^^^^
warning: function `test_mel_frequencies` is never used
--> src/audio.rs:166:4
|
166 | fn test_mel_frequencies<B: Backend>(htk: bool) {
| ^^^^^^^^^^^^^^^^^^^^
warning: function `mel_frequencies` is never used
--> src/audio.rs:174:4
|
174 | fn mel_frequencies<B: Backend>(n_mels: usize, fmin: f64, fmax: f64, htk: bool) -> Tensor<B, 1> {
| ^^^^^^^^^^^^^^^
warning: function `construct_special_tokens` is never used
--> src/token.rs:297:4
|
297 | fn construct_special_tokens() -> Vec<AddedToken> {
| ^^^^^^^^^^^^^^^^^^^^^^^^
warning: field `log_prob` is never read
--> src/transcribe.rs:145:5
|
143 | struct BeamSearchToken {
| --------------- field in this struct
144 | token: usize,
145 | log_prob: f64,
| ^^^^^^^^
|
= note: `BeamSearchToken` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
warning: function `first_repetition_end` is never used
--> src/transcribe.rs:370:4
|
370 | fn first_repetition_end(tokens: &[usize], period: usize) -> usize {
| ^^^^^^^^^^^^^^^^^^^^
warning: function `repetition_period` is never used
--> src/transcribe.rs:380:4
|
380 | fn repetition_period(
| ^^^^^^^^^^^^^^^^^
warning: function `find_repeated_tokens_index` is never used
--> src/transcribe.rs:404:4
|
404 | fn find_repeated_tokens_index(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: `whisper` (lib) generated 32 warnings (run `cargo fix --lib -p whisper` to apply 22 suggestions)
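Most of these are benign unused-binding/dead-code lints, and `cargo fix` applies the underscore-prefix suggestions automatically. For illustration, the fix the compiler proposes looks like this (names are hypothetical, not taken from the repo):

```rust
/// Returns the log-probability of the first continuation; the token itself is
/// deliberately ignored, so the binding is underscore-prefixed.
fn first_log_prob(continuations: &[(usize, f64)]) -> f64 {
    // `_tok` silences the `unused variable` warning that a plain `tok` produces.
    let (_tok, log_prob) = continuations[0];
    log_prob
}

fn main() {
    println!("{}", first_log_prob(&[(42, -0.5)]));
}
```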
Compiling whisper v0.1.0 (/Users/davidgortega/Documents/projects/kunzite/whisper-burn)
warning: unused import: `std::collections::HashMap`
--> src/bin/transcribe/main.rs:1:5
|
1 | use std::collections::HashMap;
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `std::iter`
--> src/bin/transcribe/main.rs:2:5
|
2 | use std::iter;
| ^^^^^^^^^
warning: unused import: `whisper::helper::*`
--> src/bin/transcribe/main.rs:4:5
|
4 | use whisper::helper::*;
| ^^^^^^^^^^^^^^^^^^
warning: unused import: `token`
--> src/bin/transcribe/main.rs:6:15
|
6 | use whisper::{token, token::Language};
| ^^^^^
warning: unused imports: `Data`, `Float`, `Int`, `Tensor`, `self`, `self`
--> src/bin/transcribe/main.rs:23:9
|
23 | self,
| ^^^^
24 | backend::{self, Backend},
| ^^^^
25 | Data, Float, Int, Tensor,
| ^^^^ ^^^^^ ^^^ ^^^^^^
warning: unused import: `num_traits::ToPrimitive`
--> src/bin/transcribe/main.rs:57:5
|
57 | use num_traits::ToPrimitive;
| ^^^^^^^^^^^^^^^^^^^^^^^
warning: unused import: `whisper::audio::prep_audio`
--> src/bin/transcribe/main.rs:58:5
|
58 | use whisper::audio::prep_audio;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: unused import: `SpecialToken`
--> src/bin/transcribe/main.rs:59:37
|
59 | use whisper::token::{Gpt2Tokenizer, SpecialToken};
| ^^^^^^^^^^^^
warning: unused variable: `duration`
--> src/bin/transcribe/main.rs:35:9
|
35 | let duration = reader.duration() as usize;
| ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_duration`
|
= note: `#[warn(unused_variables)]` on by default
warning: unused variable: `bits_per_sample`
--> src/bin/transcribe/main.rs:38:9
|
38 | let bits_per_sample = spec.bits_per_sample;
| ^^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_bits_per_sample`
warning: variable does not need to be mutable
--> src/bin/transcribe/main.rs:32:9
|
32 | let mut reader = hound::WavReader::open(filename)?;
| ----^^^^^^
| |
| help: remove this `mut`
|
= note: `#[warn(unused_mut)]` on by default
warning: unused variable: `tokens`
--> src/bin/transcribe/main.rs:145:16
|
145 | let (text, tokens) = match waveform_to_text(&whisper, &bpe, lang, waveform, sample_rate) {
| ^^^^^^ help: if this is intentional, prefix it with an underscore: `_tokens`
warning: `whisper` (bin "transcribe") generated 12 warnings (run `cargo fix --bin "transcribe"` to apply 12 suggestions)
Finished release [optimized] target(s) in 3.43s
warning: the following packages contain code that will be rejected by a future version of Rust: nom v1.2.4, nom v3.2.1
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`
Running `target/release/transcribe tiny_en audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
Chunk 0:
Transcription finished.
The transcription file is empty.
If I debug it at
let (text, tokens) = match waveform_to_text(&whisper, &bpe, lang, waveform, sample_rate)
both text and tokens are empty.
This is my file, generated by sox.
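Before suspecting the decoder, it's worth confirming that the WAV really is 16 kHz mono PCM, since sox and ffmpeg defaults differ and Whisper expects exactly that format. A std-only sanity check (this is not the repo's loader, which uses hound; it assumes the canonical 44-byte header layout):

```rust
/// Builds a canonical 44-byte header for 16 kHz mono 16-bit PCM (no samples),
/// purely for demonstration.
fn sample_header() -> Vec<u8> {
    let mut h = Vec::new();
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&36u32.to_le_bytes()); // remaining chunk size
    h.extend_from_slice(b"WAVE");
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    h.extend_from_slice(&1u16.to_le_bytes()); // audio format: 1 = PCM
    h.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    h.extend_from_slice(&16000u32.to_le_bytes()); // sample rate
    h.extend_from_slice(&32000u32.to_le_bytes()); // byte rate
    h.extend_from_slice(&2u16.to_le_bytes()); // block align
    h.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    h.extend_from_slice(b"data");
    h.extend_from_slice(&0u32.to_le_bytes()); // data size
    h
}

/// Returns (channels, sample_rate, bits_per_sample) if `bytes` starts with a
/// canonical PCM WAV header, else None.
fn wav_spec(bytes: &[u8]) -> Option<(u16, u32, u16)> {
    if bytes.len() < 44
        || &bytes[0..4] != b"RIFF"
        || &bytes[8..12] != b"WAVE"
        || &bytes[12..16] != b"fmt "
    {
        return None;
    }
    let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
    let u32_at = |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    if u16_at(20) != 1 {
        return None; // not uncompressed PCM
    }
    Some((u16_at(22), u32_at(24), u16_at(34)))
}

fn main() {
    // Whisper expects 16 kHz mono; anything else here could explain empty output.
    println!("{:?}", wav_spec(&sample_header())); // Some((1, 16000, 16))
}
```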
As a TVM user I'm very excited about this project because of its use of burn and its access to WGPU native. Personally speaking, this is the way to go.
However, my tests are very discouraging: WGPU seems to be performing worse than CPU.
cargo run --release --bin transcribe --features wgpu-backend medium audio16k.wav transcription.txt
Running `target/release/transcribe medium audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
...
Chunk 0: Hello, I am the whisper machine learning model. If you see this as text, then I am working properly.
infer took: 49665 ms
cargo run --release --bin transcribe medium audio16k.wav en transcription.txt
Running `target/release/transcribe medium audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
...
Chunk 0: Hello, I am the whisper machine learning model. If you see this as text, then I am working properly.
infer took: 19517 ms
Transcription finished.
The code was slightly modified:
fn main() {
    cfg_if::cfg_if! {
        if #[cfg(feature = "wgpu-backend")] {
            type Backend = WgpuBackend<AutoGraphicsApi, f32, i32>;
            let device = WgpuDevice::BestAvailable;
        } else if #[cfg(feature = "torch-backend")] {
            type Backend = TchBackend<f32>;
            let device = TchDevice::Cpu;
        }
    }
    ...
    let start_time = Instant::now();
    let (text, tokens) = match waveform_to_text(&whisper, &bpe, lang, waveform, sample_rate) {
        Ok((text, tokens)) => (text, tokens),
        Err(e) => {
            eprintln!("Error during transcription: {}", e);
            process::exit(1);
        }
    };
    let end_time = Instant::now();
    let elapsed_time_ms = end_time.duration_since(start_time).as_millis();
    println!("infer took: {} ms", elapsed_time_ms);
The same roughly 3x gap holds for tiny on CPU vs tiny on WGPU.
Might it not be optimised for my machine? Or is it simply not working?
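One possible confound (an assumption on my part, not something verified against this repo): the first WGPU inference can include one-time costs such as shader compilation, so timing a single run penalizes the GPU backend. A std-only timing helper that discards a warm-up call would separate the two:

```rust
use std::time::{Duration, Instant};

/// Averages the wall-clock time of `f` over `runs` calls after one discarded
/// warm-up call (the warm-up absorbs one-time costs like shader compilation).
fn time_avg_ms<F: FnMut()>(mut f: F, runs: u32) -> u128 {
    f(); // warm-up, not timed
    let start = Instant::now();
    for _ in 0..runs {
        f();
    }
    start.elapsed().as_millis() / runs as u128
}

fn main() {
    // Stand-in workload; in the real benchmark `f` would call waveform_to_text.
    let avg = time_avg_ms(|| std::thread::sleep(Duration::from_millis(5)), 3);
    println!("avg: {} ms", avg);
}
```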
This library is awesome, thank you. Incredibly fast and a much nicer API than alternatives.
I was hoping it would be the magic bullet that works on M2 and CUDA so that it can be deployed (running services from a MacBook seems the only option with these models!).
I tried last night on AWS with TchBackend and ran into:
Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend.
After that I noticed your chunk branch used the same settings I'd used.
It looks like empty_strided isn't available on CUDA at all, and models using it need to be moved to the CPU.
Is it possible to use alternative methods in the tensor constructors so that it's compatible with both WGPU and CUDA? Or do you have any pointers? Did you get it working with Tch initially?
Hi, thanks for developing this awesome Whisper implementation! I'm looking to deploy a small
Whisper model I finetuned using HuggingFace transformers. The model is supposed to generate Cantonese romanizations; the language was set to English during training because they share the same ASCII letters. The primary motivation is to take advantage of burn's wgpu backend for cross-platform deployment to both iOS and Android users. Prior to trying your library, I managed to get my finetuned model running on iOS using whisper.cpp, but I'd prefer a Rust backend for portability.
For my experiment with importing the model into whisper-burn, I first converted the HuggingFace model to Whisper's pt format using a script (see step 1 of this issue). I then followed the steps in the README and successfully converted the model to the burn format. However, when I ran inference, my model produced garbage transcripts on the provided audio16k.wav as well as on my own test audio. For example, audio16k.wav produced a transcript of "onbed", when the model should normally recognize English inputs in addition to Cantonese.
I'm wondering if it's possible for you to support importing HuggingFace models directly into whisper-burn? That way it's easier to eliminate intermediate bugs in the conversion pipeline. Maybe the convert-h5-to-ggml script from whisper.cpp can come in handy? Thanks.
May I ask why Tinygrad is needed for the weight conversion? The script seems to dump the weights with np, and afterwards they are read by tinygrad.
Not sure if this is related to loading the model or to the transcription process. Also, restoring the checkpoint into VRAM seems to take much longer than in the Python version.
RUST_BACKTRACE=1 cargo run --release audio.wav large-v2
Caused by:
In Device::create_bind_group
Buffer binding 0 range 265548800 exceeds `max_*_buffer_binding_size` limit 134217728
', /home/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.17.0/src/backend/direct.rs:3056:5
stack backtrace:
0: rust_begin_unwind
at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:593:5
1: core::panicking::panic_fmt
at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panicking.rs:67:14
2: core::ops::function::Fn::call
3: <wgpu::backend::direct::Context as wgpu::context::Context>::device_create_bind_group
4: <T as wgpu::context::DynContext>::device_create_bind_group
5: wgpu::Device::create_bind_group
6: burn_wgpu::context::base::Context::execute
7: burn_wgpu::kernel::index::select::select
8: burn_tensor::tensor::ops::modules::base::ModuleOps::embedding
9: whisper::model::TextDecoder<B>::forward
10: whisper::transcribe::waveform_to_text
11: whisper::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
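The chunking idea floated earlier can be sketched with plain slice windows. A caveat: the failing binding here is in the embedding lookup, whose buffer size depends on the model's vocabulary table rather than the audio length, so chunking the input may not be enough on its own and the 134217728-byte (128 MiB) wgpu binding limit may need raising instead. The 30-second window and 16 kHz rate below are Whisper's usual values, not taken from this repo's code:

```rust
/// Splits a mono waveform into windows of at most `chunk_secs` seconds each;
/// the final window holds whatever remains.
fn chunk_waveform(waveform: &[f32], sample_rate: usize, chunk_secs: usize) -> Vec<Vec<f32>> {
    waveform
        .chunks(sample_rate * chunk_secs)
        .map(|c| c.to_vec())
        .collect()
}

fn main() {
    let waveform = vec![0.0f32; 16000 * 65]; // 65 s of silence at 16 kHz
    let chunks = chunk_waveform(&waveform, 16000, 30);
    // 65 s splits into 30 s + 30 s + 5 s
    println!("{} chunks, last = {} samples", chunks.len(), chunks[2].len());
}
```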