
silero-vad's Introduction



Silero VAD


Silero VAD - pre-trained enterprise-grade Voice Activity Detector (also see our STT models).


Real Time Example

(video: real-time-example.mp4)

Key Features


  • Stellar accuracy

    Silero VAD has excellent results on speech detection tasks.

  • Fast

    One audio chunk (30+ ms) takes less than 1 ms to process on a single CPU thread. Batching or using a GPU can improve performance considerably. Under certain conditions, ONNX may even run 4-5x faster.

  • Lightweight

    JIT model is around one megabyte in size.

  • General

    Silero VAD was trained on huge corpora covering over 100 languages, and it performs well on audio from different domains with various levels of background noise and quality.

  • Flexible sampling rate

    Silero VAD supports 8000 Hz and 16000 Hz sampling rates.

  • Flexible chunk size

    The model was trained on 30 ms chunks. Longer chunks are supported directly; other sizes may work as well.

  • Highly Portable

    Silero VAD reaps the benefits of the rich ecosystems built around PyTorch and ONNX and runs everywhere these runtimes are available.

  • No Strings Attached

    Published under a permissive license (MIT), Silero VAD has zero strings attached: no telemetry, no keys, no registration, no built-in expiration, no vendor lock-in.
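
For reference, a minimal usage sketch (assuming the torch.hub entry point and the five-element utils tuple of recent releases; older releases returned a different tuple, as several issues below show; 'example.wav' is a placeholder):

import torch

# load the JIT model and helper utilities from torch.hub
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad')
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

wav = read_audio('example.wav', sampling_rate=16000)

# start/end sample indices of the detected speech segments
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)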


Typical Use Cases


  • Voice activity detection for IoT / edge / mobile use cases
  • Data cleaning and preparation, voice detection in general
  • Telephony and call-center automation, voice bots
  • Voice interfaces

Get In Touch


Try our models, create an issue, start a discussion, join our telegram chat, email us, read our news.

Please see our wiki and tiers for relevant information and email us directly.

Citations

@misc{Silero_VAD,
  author = {Silero Team},
  title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-vad}},
  commit = {insert_some_commit_here},
  email = {[email protected]}
}

Examples and VAD-based Community Apps


  • Example of VAD ONNX Runtime model usage in C++

  • Voice activity detection for the browser using ONNX Runtime Web


silero-vad's Issues

Comma missing in README.md VAD examples

After the README.md updates, the VAD example code looks like this:

...

(get_speech_ts,
 get_speech_ts_adaptive
 _, read_audio,
 _, _, _) = utils

...

So commas are missing after get_speech_ts_adaptive in several snippets; the corrected unpacking is shown below.
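
The fix, with the comma restored:

(get_speech_ts,
 get_speech_ts_adaptive,
 _, read_audio,
 _, _, _) = utils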

❓ Questions / Help / Support

❓ Questions and Help

This looks great; I saw your post on the KAIST VAD repo! I have two questions:

  • Do you plan to release any more information about the network architecture you used for training, or the training framework itself?
  • Have you also checked https://github.com/ina-foss/inaSpeechSegmenter? That framework is also excellent, and its gender and music classification is fantastic. However, it is offline by default, not online.

❓ VAD Training Data Used

❓ VAD Training Data

I am not sure whether it is appropriate to ask this, so forgive me if I am wrong. Could you let me know what data was used to train silero_vad? Is it proprietary data or a public dataset? If the latter, could you please name it?
Are there any datasets you would recommend for training a VAD?

Bug report - torch.cat with an empty list of Tensors

🐛 Bug

torch.cat is called with an empty list of tensors in utils_vad.py.

Process Process-13:
Traceback (most recent call last):
  File "/lium/raid01_b/pchampi/lab/sidekit-for-vpc/venv/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/lium/raid01_b/pchampi/lab/sidekit-for-vpc/venv/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/lium/raid01_b/pchampi/lab/sidekit-for-vpc/tools/apply_vad_on_csv.py", line 49, in job
    vad_wav = collect_chunks(speech_timestamps, wav)
  File "/lium/home/pchampi/.cache/torch/hub/snakers4_silero-vad_a345715/utils_vad.py", line 622, in collect_chunks
    return torch.cat(chunks)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at aten/src/ATen/RegisterCPU.cpp:5925 [kernel]
CUDA: registered at aten/src/ATen/RegisterCUDA.cpp:7100 [kernel]
QuantizedCPU: registered at aten/src/ATen/RegisterQuantizedCPU.cpp:641 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10525 [kernel]
Autocast: registered at ../aten/src/ATen/autocast_mode.cpp:254 [kernel]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
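
For what it's worth, a minimal guard against this case; a sketch assuming timestamps is the list of {'start': ..., 'end': ...} dicts (in samples) returned by the speech-timestamp helpers:

import torch

def collect_chunks_safe(timestamps, wav):
    # torch.cat raises on an empty list, so return an empty tensor
    # when no speech was detected instead of crashing
    if not timestamps:
        return torch.zeros(0)
    return torch.cat([wav[ts['start']:ts['end']] for ts in timestamps])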

Env

Collecting environment information...
PyTorch version: 1.8.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.5 (default, Sep  4 2020, 07:30:14)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.19.0-8-amd64-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.3
[pip3] torch==1.8.2+cu102
[pip3] torchaudio==0.8.2
[pip3] torchvision==0.9.2+cu102
[conda] numpy                     1.21.3                   pypi_0    pypi
[conda] torch                     1.8.2+cu102              pypi_0    pypi
[conda] torchaudio                0.8.2                    pypi_0    pypi
[conda] torchvision               0.9.2+cu102              pypi_0    pypi

❓ How to set a max speech duration?

As the title says: using the default VAD parameters for speech segmentation, some segments come out longer than 3 minutes. Is there a parameter such as max_speech_duration that can be set?

start= 0.00, dur= 0.94
start= 2.75, dur= 9.88
start= 13.50, dur= 6.56
start= 20.75, dur= 3.06
start= 23.88, dur= 41.44
start= 66.62, dur= 1.06
start= 69.56, dur=144.50
start=214.31, dur= 61.56
start=276.12, dur=257.50
start=533.62, dur= 27.00
start=560.81, dur= 54.19
start=614.94, dur=213.44
start=828.31, dur= 6.44

thanks
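
For reference, newer releases of silero-vad expose a max_speech_duration_s argument on get_speech_timestamps; a sketch assuming a recent version (verify against the installed release):

# segments longer than max_speech_duration_s are split (recent releases only)
speech_timestamps = get_speech_timestamps(wav, model,
                                          sampling_rate=16000,
                                          max_speech_duration_s=60)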

False negatives & false positives on 8k Chinese phone recordings

Hi @snakers4, really impressive project!
I am using this awesome VAD in my project, but I have found some false negative and false positive examples in my experiments, and it is really hard to find good parameters (trig, neg_trig, min_speech, min_silence).
My dataset consists of Chinese phone recordings at an 8 kHz sample rate. I really believe this project can work; can you give me some advice on what I can do using only the existing models?
Best regards

Values in the output

Good afternoon. I am trying the ONNX model, and after running session.run (on a 4000-element sample) I get two values in the output, e.g. 0.94601 and 0.0567758 for the first sample of files_ru.wav. As I understand it, the first value is the probability that the sample is speech, correct? And what does the second value represent?

How to deploy to Android?

  1. How to do feature extraction on Android?
  2. How to do inference on Android?

Is there any C++ code reference?
Thanks

Bug report - [timestamps overlap]

🐛 Bug

Hi,
I use the get_speech_ts_adaptive() function in utils_vad.py to find speech regions. When I test an audio file with a sample rate of 16000 Hz and a bitrate of 256000 (about 1 minute long), some timestamps overlap: the end of one timestamp overlaps with the start of the next.

Environment

  • PyTorch Version: 1.8.1
  • OS: Ubuntu 20.04
  • How you installed PyTorch: pip
  • Python version: 3.8.5
  • CUDA/cuDNN version: no CUDA
  • GPU models and configuration: CPU, 1 thread by default

❓ Model Structure and Training

❓ Training details

Great, very exciting work. Unfortunately, I couldn't find details of the model structure and training. I'm very interested; could you share them?

Why does the "Single Audio Stream" example run slowly?

The "Single Audio Stream" example takes 7.769 s:

wav = f'{files_dir}/en.wav'

for batch in single_audio_stream(model, wav):
    if batch:
        print(batch)

but this is slow compared with the "Full Audio" example (2.879 s).

Feature request - [speech, music, noise]

🚀 Feature

Extend the VAD to cover speech, music, and noise.

Motivation

As music is common these days, a VAD that only separates speech from noise is not enough.

Pitch

Detect speech, music, and noise in an audio stream.

Error when running the ONNX VAD example

When running the ONNX VAD example, it showed the error below:

import torch
import onnxruntime
from pprint import pprint

_, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                          model='silero_vad',
                          force_reload=True)

(get_speech_ts,
 _,_,
 read_audio,
 _, _, _) = utils

files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'

def init_onnx_model(model_path: str):
    return onnxruntime.InferenceSession(model_path)

def validate_onnx(model, inputs):
    with torch.no_grad():
        ort_inputs = {'input': inputs.cpu().numpy()}
        outs = model.run(None, ort_inputs)
        outs = [torch.Tensor(x) for x in outs]
    return outs

model = init_onnx_model(f'{files_dir}/model.onnx')
wav = read_audio(f'{files_dir}/en.wav')

# get speech timestamps from full audio file
speech_timestamps = get_speech_ts(wav, model, num_steps=4, run_function=validate_onnx)
pprint(speech_timestamps)

Traceback (most recent call last):
  File "1.py", line 30, in <module>
    speech_timestamps = get_speech_ts(wav, model, num_steps=4, run_function=validate_onnx)
  File "/home/gary/.cache/torch/hub/snakers4_silero-vad_master/utils_vad.py", line 112, in get_speech_ts
    outs = torch.cat(outs, dim=0)
TypeError: expected Tensor as element 0 in argument 0, but got list

Originally posted by @garymmi in #58
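
A likely cause, judging by the traceback: get_speech_ts concatenates whatever run_function returns with torch.cat, so validate_onnx should return a single tensor rather than a list of tensors. A sketch of the corrected helper:

def validate_onnx(model, inputs):
    with torch.no_grad():
        ort_inputs = {'input': inputs.cpu().numpy()}
        outs = model.run(None, ort_inputs)
    # return the first output as a tensor instead of a list of tensors
    return torch.Tensor(outs[0])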

After release of silero-mini, colab examples became outdated

After the release of silero-mini, a new function was added to utils_vad.py, which made the following piece of code invalid, raising a 'Too many values to unpack' exception:

(get_speech_ts,
 _, read_audio,
 _, _, _) = utils

It should be the following instead:

(get_speech_ts,
 _, _,
 read_audio,
 _, _, _) = utils

This is true for all examples.

GPU inference

[W:onnxruntime:Default, fallback_cpu_capability.h:140 GetCpuPreferedNodes] Force fallback to CPU execution for node: Equal_890

Hello. Any chance to run inference on the GPU? After several tries I got an error saying that this model is quantized; maybe you can share a non-quantized version?

Mobile / Edge / ARM / ONNX Use Cases

While the VAD (especially the micro one) was explicitly designed for IoT / edge / mobile use cases, we do not have the resources or expertise to provide instructions for the corresponding ARM / mobile builds of PyTorch and / or ONNX.

The ONNX guides were refurbished recently, and it is implied that ARM binaries will be made available (but they are not yet).

People from the community (see the telegram chat) have also reported successful builds and use of silero-models on PyTorch, replacing MKL with CBLAS.

In any case, sharing such dockerized builds (e.g. based on Debian / Ubuntu / Alpine) for your tested use cases would be of great value to the community; PRs are greatly encouraged and appreciated.

Please see some examples here: https://github.com/microsoft/onnxruntime/blob/master/dockerfiles/README.md#arm-32v7

If you feel like doing something like this, please provide the build as a Dockerfile, together with some background info: which arch / device / processor you ran it on, whether this hardware is generally available, what the end performance is, etc.

Bug report - loading error

🐛 Bug

Loading the model with hub.load fails

To Reproduce

(base) $ python3
Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torchaudio
>>> import soundfile
>>>
>>> torch.__version__
'1.8.2+cu102'
>>> torchaudio.__version__
'0.8.2'
>>> soundfile.__version__
'0.10.3'
>>> model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad:a345715',
...                               model='silero_vad')
Using cache found in /lium/home/pchampi/.cache/torch/hub/snakers4_silero-vad_a345715
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lium/raid01_b/pchampi/lab/sidekit-for-vpc/venv/lib/python3.8/site-packages/torch/hub.py", line 339, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/lium/raid01_b/pchampi/lab/sidekit-for-vpc/venv/lib/python3.8/site-packages/torch/hub.py", line 368, in _load_local
    model = entry(*args, **kwargs)
  File "/lium/home/pchampi/.cache/torch/hub/snakers4_silero-vad_a345715/hubconf.py", line 24, in silero_vad
    model = init_jit_model(model_path=f'{hub_dir}/snakers4_silero-vad_master/files/model.jit')
  File "/lium/home/pchampi/.cache/torch/hub/snakers4_silero-vad_a345715/utils_vad.py", line 74, in init_jit_model
    model = torch.jit.load(model_path, map_location=device)
  File "/lium/raid01_b/pchampi/lab/sidekit-for-vpc/venv/lib/python3.8/site-packages/torch/jit/_serialization.py", line 161, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torch/nn/quantized/modules/linear.py", line 17, in __setstate__
    state: Tuple[Tensor, Optional[Tensor], bool, int]) -> None:
    self.dtype = (state)[3]
    _1 = (self).set_weight_bias((state)[0], (state)[1], )
          ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    self.training = (state)[2]
    return None
  File "code/__torch__/torch/nn/quantized/modules/linear.py", line 40, in set_weight_bias
    _10 = "Unsupported dtype on dynamic quantized linear!"
    if torch.eq(self.dtype, 12):
      _11 = ops.quantized.linear_prepack(weight, bias)
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
      self._packed_params = _11
    else:

Traceback of TorchScript, original code (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/quantized/modules/linear.py", line 93, in __setstate__
    def __setstate__(self, state):
        self.dtype = state[3]
        self.set_weight_bias(state[0], state[1])
        ~~~~~~~~~~~~~~~~~~~~ <--- HERE
        self.training = state[2]
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/quantized/modules/linear.py", line 23, in set_weight_bias
    def set_weight_bias(self, weight: torch.Tensor, bias: Optional[torch.Tensor]) -> None:
        if self.dtype == torch.qint8:
            self._packed_params = torch.ops.quantized.linear_prepack(weight, bias)
                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        elif self.dtype == torch.float16:
            self._packed_params = torch.ops.quantized.linear_prepack_fp16(weight, bias)
RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine

>>>

Environment

Collecting environment information...
PyTorch version: 1.8.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.5 (default, Sep  4 2020, 07:30:14)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.19.0-8-amd64-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.3
[pip3] torch==1.8.2+cu102
[pip3] torchaudio==0.8.2
[pip3] torchvision==0.9.2+cu102
[conda] numpy                     1.21.3                   pypi_0    pypi
[conda] torch                     1.8.2+cu102              pypi_0    pypi
[conda] torchaudio                0.8.2                    pypi_0    pypi
[conda] torchvision               0.9.2+cu102              pypi_0    pypi

Any relevant paper?

Dear all:
Thank you for your contribution to the whole voice community. We really appreciate the method and the pretrained models.
I wonder, is there any related paper or document illustrating the inner details of your work? Specifically, I would like to know what kind of network architecture and what datasets you used. Thank you again.

Finetuning VAD model

Is there a way to finetune the provided pretrained model on my own data? Can you share some training code? Thanks!

Migrating examples to new models

@Kai-Karren @Bontempogianpaolo1

In a few days we will be radically changing the models:

  • Probably dropping ONNX VAD models (we have not decided yet);
  • Reducing the chunk size to 30 ms (chunk size will be flexible, but no smaller than 30 ms);
  • Removing the separate 8 / 16 kHz models; all models will now work with both 8 and 16 kHz;
  • Most likely deprecating the micro, mini and ordinary models in favor of a single mini-sized model (still running the last experiments);
  • New models will be compatible with mobile builds of PyTorch;
  • Dropping the batched buffering approach we used because of large chunks;

i.e. radically simplifying and speeding up the models.

You have contributed to the examples; would you like to help improve them using the new models?

Is it possible to limit the languages within the language detection?

Hi, I am trying to use the Language Classifier 95 model, but the accuracy is not so good.
I have tried increasing the top_n value, but it did not help much.
I thought I could ignore most of the languages (which I do not care about) by specifying a reduced set of languages in the lang_dict and lang_group_dict parameters in the following line:
languages, language_groups = get_language_and_group(wav, model, lang_dict, lang_group_dict, top_n=2)
but it does not work.
Is it possible to somehow specify a subset of the languages for this model?
Thanks!

ONNX model fails to load in browser

Hello, I'm trying to load the ONNX model using JS in the browser. I'm using the official example from the ONNX.js GitHub:

<html>
  <head> </head>

  <body>
    <!-- Load ONNX.js -->
    <script src="https://cdn.jsdelivr.net/npm/onnxjs/dist/onnx.min.js"></script>
    <!-- Code that consume ONNX.js -->
    <script>
      // create a session
      const myOnnxSession = new onnx.InferenceSession();
      // load the ONNX model file
      myOnnxSession.loadModel("./my-model.onnx").then(() => {
        // generate model input
        const inferenceInputs = getInputs();
        // execute the model
        myOnnxSession.run(inferenceInputs).then((output) => {
          // consume the output
          const outputTensor = output.values().next().value;
          console.log(`model output tensor: ${outputTensor.data}.`);
        });
      });
    </script>
  </body>
</html>

But loading fails with the following message. Do you have any idea what might cause this? I found no information on the ONNX website.

(error screenshot omitted)

Bug report - Device cuda does not work

🐛 Bug

I would like to use CUDA to compute the VAD. Your toolkit has an argument for it:

device='cpu'):

But it crashes when I set device to 'cuda' (the input wav is also correctly moved .to("cuda")).
Does your toolkit support this?
BTW, thanks for your awesome work on this toolkit! 👍

Traceback (most recent call last):
  File "/lium/raid01_b/pchampi/lab/venv/bin/extract_xvectors.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/lium/raid01_b/pchampi/lab/sidekit/bin/extract_xvectors.py", line 157, in <module>
    main(xtractor, args.wav_scp, args.out_scp, args.device, args.vad, args.vad_num_samples_per_window, args.vad_min_silence_samples)
  File "/lium/raid01_b/pchampi/labvenv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/lium/raid01_b/pchampi/lab/sidekit/bin/extract_xvectors.py", line 123, in main
    speech_timestamps = get_speech_ts_adaptive(signal.to("cuda"), model,
  File "/lium/home/pchampi/.cache/torch/hub/snakers4_silero-vad_a345715/utils_vad.py", line 227, in get_speech_ts_adaptive
    chunks = torch.Tensor(torch.cat(to_concat, dim=0)).to(device)
TypeError: expected CPU (got CUDA)

Environment

Collecting environment information...
PyTorch version: 1.8.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.5 (default, Sep  4 2020, 07:30:14)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.19.0-8-amd64-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.3
[pip3] torch==1.8.2+cu102
[pip3] torchaudio==0.8.2
[pip3] torchvision==0.9.2+cu102
[conda] numpy                     1.21.3                   pypi_0    pypi
[conda] torch                     1.8.2+cu102              pypi_0    pypi
[conda] torchaudio                0.8.2                    pypi_0    pypi
[conda] torchvision               0.9.2+cu102              pypi_0    pypi

❓ PyTorch 1.5.1 load() missing 1 required positional argument: 'github'

❓ Questions and Help

We have a wiki available for our users. Please make sure you have checked it out first.

I ran the VAD example and got the error below:

Traceback (most recent call last):
  File "vad.py", line 7, in <module>
    force_reload=True)
TypeError: load() missing 1 required positional argument: 'github'

I use torch 1.5.1.

thanks
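
A likely explanation: torch.hub.load's first argument was named github before it was renamed to repo_or_dir in PyTorch 1.6, so the repo_or_dir keyword is unknown on 1.5.1. A sketch of a workaround, passing the arguments positionally:

model, utils = torch.hub.load('snakers4/silero-vad',  # positional on torch <= 1.5
                              'silero_vad',
                              force_reload=True)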

Changelog

Just a handy issue to get notified of the latest changes and micro-releases (we will mostly be changing the models).

I want to know some details about Silero-VAD

❓ Help

Hello, thank you for the VAD tool. Our company is preparing to use it for cutting long audio, but I have not found the implementation details and principles of this VAD, which would help us better optimize the algorithm and model.

I want to know the principles behind Silero-VAD. Can you provide some relevant documents? Thank you.

❓ Questions / Help / Support RuntimeError: Backend "soundfile" is not one of available backends: ['sox', 'sox_io'].

❓ Questions and Help

We have a wiki available for our users. Please make sure you have checked it out first.

I used torch 1.7.1 and ran the VAD example; I got the error below:

Downloading: "https://github.com/snakers4/silero-vad/archive/master.zip" to /home/nick/.cache/torch/hub/master.zip
/data/nick/Python-3.6.3/lib/python3.6/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch/audio#903 for the detail.
'"sox" backend is being deprecated. '
Traceback (most recent call last):
File "vad.py", line 7, in <module>
force_reload=True)
File "/data/nick/Python-3.6.3/lib/python3.6/site-packages/torch/hub.py", line 370, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/data/nick/Python-3.6.3/lib/python3.6/site-packages/torch/hub.py", line 396, in _load_local
hub_module = import_module(MODULE_HUBCONF, hubconf_path)
File "/data/nick/Python-3.6.3/lib/python3.6/site-packages/torch/hub.py", line 71, in import_module
spec.loader.exec_module(module)
File "", line 678, in exec_module
File "", line 219, in _call_with_frames_removed
File "/data/nick/.cache/torch/hub/snakers4_silero-vad_master/hubconf.py", line 3, in
from utils_vad import (init_jit_model,
File "/data/inck/.cache/torch/hub/snakers4_silero-vad_master/utils_vad.py", line 9, in
torchaudio.set_audio_backend("soundfile") # switch backend
File "/data/nick/Python-3.6.3/lib/python3.6/site-packages/torchaudio/backend/utils.py", line 47, in set_audio_backend
f'Backend "{backend}" is not one of '
RuntimeError: Backend "soundfile" is not one of available backends: ['sox', 'sox_io'].
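
A likely fix, assuming the backend is missing simply because the SoundFile package is not installed (torchaudio only lists 'soundfile' among the available backends when it can import it):

# pip install soundfile
import torchaudio

torchaudio.set_audio_backend("soundfile")   # should now succeed
print(torchaudio.get_audio_backend())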

Making Conda Packages

We are planning to upload even better models shortly and to reduce the repo size by moving audio samples and larger, less popular models to external links.

Since the utilities we provide now are really minimalistic, there is very little difference between something like conda install and torch.hub.load, since in Python having PyTorch is a must anyway. Essentially a package is just a model plus maybe 50-100 lines of code.

Nevertheless, many people like using packaged libraries, especially for a VAD, which seems like a "solved" task, unlike STT, which may involve some moving parts (for production-grade quality, of course).

Internally, we do not really maintain Python packages (we usually favour a pre-built Docker image approach for our own production). Maybe someone who has published a few conda or pip packages could lend a hand and help us set up a quick CI-based package export around GitHub Actions?

If we keep the VAD minimalistic in the future, maintaining this seems like a no-brainer, and maybe with conda it will even play nicely with the builds of PyTorch for other platforms provided by the PyTorch team itself?

Inconsistent output from onnx and jit

Hi @snakers4

I tried both lang_classifier_95.onnx and lang_classifier_95.jit and found that, when fed the same input, their outputs differ (by a large enough margin). Based on the names, I guess they were exported from the same PyTorch model. Why do the outcomes differ? Please help!

Thanks!
Junjie

Parameters for the 8k model?

Are the default sample-based parameters tuned for the 8k models, or should I halve the sample sizes for things like num_samples_per_window, min_speech_samples, etc.?
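
For what it's worth, one common heuristic (an assumption, not an official recommendation): if the defaults are expressed in samples at 16 kHz, halve the sample-based parameters at 8 kHz so each window covers the same duration. A hypothetical sketch (wav_8k and model_8k are placeholders):

# a 4000-sample window is 250 ms at 16 kHz; 2000 samples keep 250 ms at 8 kHz
speech_timestamps = get_speech_ts(wav_8k, model_8k,
                                  num_samples_per_window=2000)
# other sample-denominated parameters (min_speech_samples, min_silence_samples)
# would scale by the same factor of two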

Kindly look into this issue and provide an example of how to run inference with the ONNX model on a live audio stream

This is the code:

import io
import numpy as np
import torch
import torchaudio
import matplotlib
import matplotlib.pylab as plt
import pyaudio
import onnxruntime  # missing from the original snippet

torch.set_num_threads(1)
torchaudio.set_audio_backend("soundfile")

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=True)

(get_speech_ts,
 get_speech_ts_adaptive,
 save_audio,
 read_audio,
 state_generator,
 single_audio_stream,
 collect_chunks) = utils

def init_onnx_model(model_path: str):
    return onnxruntime.InferenceSession(model_path)

# note: this overwrites the torch.hub model with the ONNX session
model = init_onnx_model(model_path='./model.onnx')

def validate(model, inputs: torch.Tensor):
    with torch.no_grad():
        ort_inputs = {'input': inputs.cpu().numpy()}
        outs = model.run(None, ort_inputs)
        outs = [torch.Tensor(x) for x in outs]
    return outs[0]

def int2float(sound):
    abs_max = np.abs(sound).max()
    sound = sound.astype('float32')
    if abs_max > 0:
        sound *= 1 / abs_max
    sound = sound.squeeze()
    return sound

FORMAT = pyaudio.paInt16
CHANNELS = 1
SAMPLE_RATE = 16000
CHUNK = int(SAMPLE_RATE / 10)

audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=SAMPLE_RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
data = []
voiced_confidences = []

frames_to_record = 100    # not defined in the original snippet; example value
frame_duration_ms = 250   # not defined in the original snippet; example value

print("Started Recording")
for i in range(0, frames_to_record):

    audio_chunk = stream.read(int(SAMPLE_RATE * frame_duration_ms / 1000.0))

    # in case you want to save the audio later
    data.append(audio_chunk)

    audio_int16 = np.frombuffer(audio_chunk, np.int16)

    audio_float32 = int2float(audio_int16)

    # get the confidences and add them to the list to plot them later
    vad_outs = validate(model, torch.from_numpy(audio_float32))
    # only keep the confidence for the speech
    voiced_confidences.append(vad_outs[:, 1])

print("Stopped the recording")

# plot the confidences for the speech
plt.figure(figsize=(20, 6))
plt.plot(voiced_confidences)
plt.show()

The error I'm getting is:

vad_outs1 = validate(model, torch.from_numpy(audio_int161))

Traceback (most recent call last):
  File "testimport.py", line 76, in <module>
    vad_outs1 = validate(model, torch.from_numpy(audio_int161))
  File "testimport.py", line 47, in validate
    outs = model.run(None, ort_inputs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/session.py", line 110, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (N11onnxruntime17PrimitiveDataTypeIsEE) , expected: (N11onnxruntime17PrimitiveDataTypeIfEE)

@snakers4 Kindly provide a solution.
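
A likely cause, judging by the traceback: the failing call feeds the raw int16 buffer to the ONNX session, while the model input must be float32. Converting first, as the loop above already does, avoids the type mismatch:

audio_int16 = np.frombuffer(audio_chunk, np.int16)
audio_float32 = int2float(audio_int16)   # normalize to float32 first
vad_outs = validate(model, torch.from_numpy(audio_float32))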

Tensorflow or Tensorflow Lite model of Silero VAD

🚀 Feature

Publish the open-source Silero VAD model for TensorFlow or TensorFlow Lite.

Motivation

I wish to use your Silero VAD model in a production environment where only TF is supported.

Pitch

Silero VAD would be very useful on mobile and embedded devices. TensorFlow Lite is the best option in the context of devices with limited memory and capacity.

Alternatives

I tried to convert both published models to TF but ran into different problems, maybe because of the mutable input size, or because of TorchScript and onnxruntime being used instead of classical torch and ONNX types.

Additional context

Thank you in advance.

Are ONNX Models Necessary?

In a few days we will be radically changing the models:

  • Probably dropping ONNX VAD models (we have not decided yet);
  • Reducing the chunk size to 30 ms (chunk size will be flexible, but no smaller than 30 ms);
  • Removing the separate 8 / 16 kHz models; all models will now work with both 8 and 16 kHz;
  • Most likely deprecating the micro, mini and ordinary models in favor of a single mini-sized model (still running the last experiments);
  • New models will be compatible with mobile builds of PyTorch;
  • Dropping the batched buffering approach we used because of large chunks;

i.e. we will be radically simplifying everything.

We have seen limited use of the ONNX models, hence the question.

silero-models and silero-vad combined lead to ImportError

When using both silero-models and silero-vad together in one function, only the first of the two loads works; the second leads to an ImportError:

ImportError: cannot import name 'get_speech_ts'

I assume I am missing something trivial here, but I couldn't figure out how to solve this so far. Any ideas?
