
silero-models's Introduction

License: CC BY-NC 4.0

Silero Models

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

  • No Kaldi;
  • No compilation;
  • No 20-step instructions;

We have also published TTS models that satisfy the following criteria:

  • One-line usage;
  • A large library of voices;
  • A fully end-to-end pipeline;
  • Natural-sounding speech;
  • No GPU or training required;
  • Minimalism and lack of dependencies;
  • Faster than real-time on one CPU thread (!!!);
  • Support for 16kHz and 8kHz out of the box;

We have also published a model for text repunctuation and recapitalization that:

  • Inserts capital letters and basic punctuation marks, e.g., dots, commas, hyphens, question marks, exclamation points, and dashes (for Russian);
  • Works for 4 languages (Russian, English, German, and Spanish) and can be extended;
  • Is domain-agnostic by design and not based on any hard-coded rules;
  • Has non-trivial metrics and succeeds in the task of improving text readability;

Installation and Basics

You can use our models in three flavours:

  • Via PyTorch Hub: torch.hub.load();
  • Via pip: pip install silero and then import silero;
  • Via manual caching of the required models and utils, modifying them if necessary;

Models are downloaded on demand by both pip and PyTorch Hub. If you need caching, do it manually or by invoking the required model once (it will be downloaded to a cache folder). Please see these docs for more information.
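
For instance, a minimal cache warm-up sketch (the cache directory below is purely illustrative; torch.hub uses ~/.cache/torch/hub by default):

import torch

# optionally pin the cache location; otherwise the default ~/.cache/torch/hub is used
torch.hub.set_dir('./silero_cache')  # illustrative path

# invoking an entry point once downloads the repo and the model into the cache
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en')
# later calls with the same arguments reuse the cached copy (force_reload=False is the default)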

The PyTorch Hub and pip packages are based on the same code. All of the torch.hub.load examples can be used with the pip package via this basic change:

# before
torch.hub.load(repo_or_dir='snakers4/silero-models',
               model='silero_stt',  # or silero_tts or silero_te
               **kwargs)

# after
from silero import silero_stt, silero_tts, silero_te
silero_stt(**kwargs)
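
For instance, a minimal pip-based STT sketch, assuming the pip entry point accepts the same keyword arguments and returns the same (model, decoder, utils) triple as its torch.hub counterpart:

import torch
from silero import silero_stt

device = torch.device('cpu')
model, decoder, utils = silero_stt(language='en', device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils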

Speech-To-Text

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.


Currently we provide the following checkpoints:

| Checkpoint | PyTorch | ONNX | Quantization | Quality | Colab |
|---|---|---|---|---|---|
| English (en_v6) | ✔️ | ✔️ | ✔️ | link | Open In Colab |
| English (en_v5) | ✔️ | ✔️ | ✔️ | link | Open In Colab |
| German (de_v4) | ✔️ | ✔️ | | link | Open In Colab |
| English (en_v3) | ✔️ | ✔️ | ✔️ | link | Open In Colab |
| German (de_v3) | ✔️ | | | link | Open In Colab |
| German (de_v1) | ✔️ | ✔️ | | link | Open In Colab |
| Spanish (es_v1) | ✔️ | ✔️ | | link | Open In Colab |
| Ukrainian (ua_v3) | ✔️ | ✔️ | ✔️ | N/A | Open In Colab |

Model flavours:

Flavour columns: jit (xsmall, small, large, xlarge), jit_q (xsmall, small), onnx (xsmall, small, large, xlarge)
English en_v6 ✔️ ✔️ ✔️ ✔️ ✔️
English en_v5 ✔️ ✔️ ✔️ ✔️ ✔️
English en_v4_0 ✔️ ✔️
English en_v3 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
German de_v4 ✔️ ✔️
German de_v3 ✔️
German de_v1 ✔️ ✔️
Spanish es_v1 ✔️ ✔️
Ukrainian ua_v3 ✔️ ✔️ ✔️
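
To request a specific flavour through the hub entry point, silero_stt exposes version and jit_model arguments (visible in hubconf.py); a sketch follows, where the concrete key values ('latest', 'jit_q') are assumptions that should be checked against models.yml:

import torch

# version and jit_model pick an entry from models.yml (stt_models.<language>.<version>.<flavour>);
# 'latest' and 'jit_q' are illustrative keys -- verify the exact names in models.yml first
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',
                                       version='latest',
                                       jit_model='jit_q')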

Dependencies

  • All examples:
    • torch, 1.8+ (used to clone the repo in the TensorFlow and ONNX examples); versions older than 1.6 have breaking changes
    • torchaudio, latest version bound to PyTorch should just work
    • omegaconf, latest should just work
  • Additional dependencies for ONNX examples:
    • onnx, latest should just work
    • onnxruntime, latest should just work
  • Additional dependencies for TensorFlow examples:
    • tensorflow, latest should just work
    • tensorflow_hub, latest should just work

Please see the provided Colab notebooks for details on each example below. All examples are maintained to work with the latest major released versions of the installed libraries.

PyTorch

Open In Colab

Open on Torch Hub

import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst ='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)

output = model(input)
for example in output:
    print(decoder(example.cpu()))

ONNX

Open In Colab

Our model will run anywhere that can import the ONNX model or that supports the ONNX runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models.en.latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# actual ONNX inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

TensorFlow

Open In Colab

SavedModel example

import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models.en.latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

V4

V4 models support SSML. Also see Colab examples for main SSML tag usage.

| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v4_ru | aidar, baya, kseniya, xenia, eugene, random | yes | ru (Russian) | 8000, 24000, 48000 | Open In Colab |
| v4_cyrillic | b_ava, marat_tt, kalmyk_erdni... | no | cyrillic (Avar, Tatar, Kalmyk, ...) | 8000, 24000, 48000 | Open In Colab |
| v4_ua | mykyta, random | no | ua (Ukrainian) | 8000, 24000, 48000 | Open In Colab |
| v4_uz | dilnavoz | no | uz (Uzbek) | 8000, 24000, 48000 | Open In Colab |
| v4_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000 | Open In Colab |

V3

V3 models support SSML. Also see Colab examples for main SSML tag usage.

| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v3_en | en_0, en_1, ..., en_117, random | no | en (English) | 8000, 24000, 48000 | Open In Colab |
| v3_en_indic | tamil_female, ..., assamese_male, random | no | en (English) | 8000, 24000, 48000 | Open In Colab |
| v3_de | eva_k, ..., karlsson, random | no | de (German) | 8000, 24000, 48000 | Open In Colab |
| v3_es | es_0, es_1, es_2, random | no | es (Spanish) | 8000, 24000, 48000 | Open In Colab |
| v3_fr | fr_0, ..., fr_5, random | no | fr (French) | 8000, 24000, 48000 | Open In Colab |
| v3_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000 | Open In Colab |

Dependencies

Basic dependencies for Colab examples:

  • torch, 1.10+ for V3 models / 2.0+ for V4 models;
  • torchaudio, latest version bound to PyTorch should work (required only because the models are hosted together with STT; not needed at runtime);
  • omegaconf, latest (can be removed as well, if you do not load all of the configs);

PyTorch

Open In Colab

Open on Torch Hub

# V4
import torch

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000
speaker = 'xenia'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
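
To keep the result, the returned tensor can be written to disk; a small sketch, assuming apply_tts returns a 1-D float tensor at the requested sample rate (audio and sample_rate come from the snippet above):

import torchaudio

# add a channel dimension and write a 48 kHz wav file
torchaudio.save('tts_output.wav', audio.unsqueeze(0).cpu(), sample_rate)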

Standalone Use

  • Standalone usage only requires PyTorch 1.10+ and the Python Standard Library;
  • Please see the detailed examples in Colab;
# V4
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v4_ru.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_text = 'В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.'
sample_rate = 48000
speaker='baya'

audio_paths = model.save_wav(text=example_text,
                             speaker=speaker,
                             sample_rate=sample_rate)

SSML

Check out our TTS Wiki page.
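
As a minimal sketch of SSML usage (the ssml_text argument and the supported tag set should be verified against the Wiki and the Colab examples):

import torch

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='ru',
                                     speaker='v4_ru')

ssml_sample = '<speak><p>Привет!<break time="500ms"/> Это проверка SSML.</p></speak>'
audio = model.apply_tts(ssml_text=ssml_sample,  # ssml_text mirrors the Colab examples
                        speaker='xenia',
                        sample_rate=48000)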

Cyrillic languages

Supported tokenset: !,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ

| Speaker_ID | Language | Gender |
|---|---|---|
| b_ava | Avar | F |
| b_bashkir | Bashkir | M |
| b_bulb | Bulgarian | M |
| b_bulc | Bulgarian | M |
| b_che | Chechen | M |
| b_cv | Chuvash | M |
| cv_ekaterina | Chuvash | F |
| b_myv | Erzya | M |
| b_kalmyk | Kalmyk | M |
| b_krc | Karachay-Balkar | M |
| kz_M1 | Kazakh | M |
| kz_M2 | Kazakh | M |
| kz_F3 | Kazakh | F |
| kz_F1 | Kazakh | F |
| kz_F2 | Kazakh | F |
| b_kjh | Khakas | F |
| b_kpv | Komi-Ziryan | M |
| b_lez | Lezghian | M |
| b_mhr | Mari | F |
| b_mrj | Mari High | M |
| b_nog | Nogai | F |
| b_oss | Ossetic | M |
| b_ru | Russian | M |
| b_tat | Tatar | M |
| marat_tt | Tatar | M |
| b_tyv | Tuvinian | M |
| b_udm | Udmurt | M |
| b_uzb | Uzbek | M |
| b_sah | Yakut | M |
| kalmyk_erdni | Kalmyk | M |
| kalmyk_delghir | Kalmyk | F |
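
For example, a hedged sketch of loading the v4_cyrillic model via the hub entry point and synthesizing with one of the speakers above (the 'cyrillic' language tag mirrors the table header and should be checked against models.yml):

import torch

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='cyrillic',  # assumed tag, see models.yml
                                     speaker='v4_cyrillic')

audio = model.apply_tts(text=example_text,
                        speaker='marat_tt',  # Tatar male voice from the table above
                        sample_rate=48000)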

Indic languages

Example

(!!!) All input sentences should be romanized to ISO format using the aksharamukha package. An example for Hindi:

# V4
import torch
from aksharamukha import transliterate

# Loading model
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है"
roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
print(roman_text)

audio = model.apply_tts(roman_text,
                        speaker='hindi_male')

Supported languages

| Language | Speakers | Romanization function |
|---|---|---|
| hindi | hindi_female, hindi_male | transliterate.process('Devanagari', 'ISO', orig_text) |
| malayalam | malayalam_female, malayalam_male | transliterate.process('Malayalam', 'ISO', orig_text) |
| manipuri | manipuri_female | transliterate.process('Bengali', 'ISO', orig_text) |
| bengali | bengali_female, bengali_male | transliterate.process('Bengali', 'ISO', orig_text) |
| rajasthani | rajasthani_female, rajasthani_female | transliterate.process('Devanagari', 'ISO', orig_text) |
| tamil | tamil_female, tamil_male | transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe']) |
| telugu | telugu_female, telugu_male | transliterate.process('Telugu', 'ISO', orig_text) |
| gujarati | gujarati_female, gujarati_male | transliterate.process('Gujarati', 'ISO', orig_text) |
| kannada | kannada_female, kannada_male | transliterate.process('Kannada', 'ISO', orig_text) |
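
For a language with extra transliteration options, such as Tamil, the same pattern applies; a sketch reusing the romanization call from the table above (the Tamil input sentence is illustrative):

import torch
from aksharamukha import transliterate

# the same v4_indic model as in the Hindi example above
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

orig_text = 'வணக்கம், இது ஒரு சோதனை வாக்கியம்.'  # illustrative Tamil input
roman_text = transliterate.process('Tamil', 'ISO', orig_text,
                                   pre_options=['TamilTranscribe'])

audio = model.apply_tts(roman_text,
                        speaker='tamil_female')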

Text-Enhancement

| Languages | Quantization | Quality | Colab |
|---|---|---|---|
| 'en', 'de', 'ru', 'es' | ✔️ | link | Open In Colab |

Dependencies

Basic dependencies for Colab examples:

  • torch, 1.9+;
  • pyyaml (it is installed together with torch itself)

Standalone Use

  • Standalone usage only requires PyTorch 1.9+ and the Python Standard Library;
  • Please see the detailed examples in Colab;
import torch

model, example_texts, languages, punct, apply_te = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                  model='silero_te')

input_text = input('Enter input text\n')
apply_te(input_text, lan='en')
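
For a fully offline variant, a sketch mirroring the example notebook flow (the package URL is a placeholder to be taken from models.yml; the "te_model"/"model" pickle names follow the notebook):

import os
import torch

model_url = '<te package url from models.yml>'  # placeholder, not a real URL
local_file = 'te_model_package.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file(model_url, local_file, progress=True)

# load the packaged model and enhance a piece of text
model = torch.package.PackageImporter(local_file).load_pickle("te_model", "model")
input_text = input('Enter input text\n')
print(model.enhance_text(input_text, 'en'))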

Denoise

Denoise models attempt to reduce background noise along with various artefacts such as reverb, clipping, and high/low-pass filtering, while trying to preserve and/or enhance speech. They also attempt to enhance audio quality and upsample the input to 48 kHz.

Models

All of the provided models are listed in the models.yml file.

| Model | JIT | Real Input SR | Input SR | Output SR | Colab |
|---|---|---|---|---|---|
| small_slow | ✔️ | 8000, 16000, 24000, 44100, 48000 | 24000 | 48000 | Open In Colab |
| large_fast | ✔️ | 8000, 16000, 24000, 44100, 48000 | 24000 | 48000 | Open In Colab |
| small_fast | ✔️ | 8000, 16000, 24000, 44100, 48000 | 24000 | 48000 | Open In Colab |

Dependencies

Basic dependencies for Colab examples:

  • torch, 2.0+;
  • torchaudio, latest version bound to PyTorch should work;
  • omegaconf, latest (can be removed as well, if you do not load all of the configs).

PyTorch

Open In Colab

import torch

name = 'small_slow'
device = torch.device('cpu')
model, samples, utils = torch.hub.load(
  repo_or_dir='snakers4/silero-models',
  model='silero_denoise',
  name=name,
  device=device)
(read_audio, save_audio, denoise) = utils

i = 0
torch.hub.download_url_to_file(
  samples[i],
  dst=f'sample{i}.wav',
  progress=True
)
audio_path = f'sample{i}.wav'
audio = read_audio(audio_path).to(device)
output = model(audio)
save_audio(f'result{i}.wav', output.squeeze(1).cpu())

i = 1
torch.hub.download_url_to_file(
  samples[i],
  dst=f'sample{i}.wav',
  progress=True
)
output, sr = denoise(model, f'sample{i}.wav', f'result{i}.wav', device='cpu')

Standalone Use

import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/denoise_models/sns_latest.jit',
                                   local_file)  

model = torch.jit.load(local_file)
torch._C._jit_set_profiling_mode(False) 
torch.set_grad_enabled(False)
model.to(device)

a = torch.rand((1, 48000))
a = a.to(device)
out = model(a)

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to the corresponding wiki sections on performance and quality.

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, and read the latest news.

Commercial Inquiries

Please refer to our wiki and the Licensing and Tiers page for relevant information, and email us.

Citations

@misc{SileroModels,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Further reading

English

  • STT:

    • Towards an Imagenet Moment For Speech-To-Text - link
    • A Speech-To-Text Practitioners Criticisms of Industry and Academia - link
    • Modern Google-level STT Models Released - link
  • TTS:

    • Multilingual Text-to-Speech Models for Indic Languages - link
    • Our new public speech synthesis in super-high quality, 10x faster and more stable - link
    • High-Quality Text-to-Speech Made Accessible, Simple and Fast - link
  • VAD:

    • One Voice Detector to Rule Them All - link
    • Modern Portable Voice Activity Detector Released - link
  • Text Enhancement:

    • We have published a model for text repunctuation and recapitalization for four languages - link

Chinese

  • STT:
    • 迈向语音识别领域的 ImageNet 时刻 - link
    • 语音领域学术界和工业界的七宗罪 - link

Russian

  • STT

    • OpenAI решили распознавание речи! Разбираемся так ли это … - link
    • Наши сервисы для бесплатного распознавания речи стали лучше и удобнее - link
    • Telegram-бот Silero бесплатно переводит речь в текст - link
    • Бесплатное распознавание речи для всех желающих - link
    • Последние обновления моделей распознавания речи из Silero Models - link
    • Сжимаем трансформеры: простые, универсальные и прикладные способы cделать их компактными и быстрыми - link
    • Ультимативное сравнение систем распознавания речи: Ashmanov, Google, Sber, Silero, Tinkoff, Yandex - link
    • Мы опубликовали современные STT модели сравнимые по качеству с Google - link
    • Понижаем барьеры на вход в распознавание речи - link
    • Огромный открытый датасет русской речи версия 1.0 - link
    • Насколько Быстрой Можно Сделать Систему STT? - link
    • Наша система Speech-To-Text - link
    • Speech-To-Text - link
  • TTS:

    • Теперь наш синтез также доступен в виде бота в Телеграме - link
    • Может ли синтез речи обмануть систему биометрической идентификации? - link
    • Теперь наш синтез на 20 языках - link
    • Теперь наш публичный синтез в супер-высоком качестве, в 10 раз быстрее и без детских болячек - link
    • Синтезируем голос бабушки, дедушки и Ленина + новости нашего публичного синтеза - link
    • Мы сделали наш публичный синтез речи еще лучше - link
    • Мы Опубликовали Качественный, Простой, Доступный и Быстрый Синтез Речи - link
  • VAD:

    • Наш публичный детектор голоса стал лучше - link
    • А ты используешь VAD? Что это такое и зачем он нужен - link
    • Модели для Детекции Речи, Чисел и Распознавания Языков - link
    • Мы опубликовали современный Voice Activity Detector и не только - link
  • Text Enhancement:

    • Восстановление знаков пунктуации и заглавных букв — теперь и на длинных текстах - link
    • Мы опубликовали модель, расставляющую знаки препинания и заглавные буквы в тексте на четырех языках - link

Donations

Please use the "sponsor" button.

silero-models's People

Contributors

abhi011999, adamnsandle, axenov, evrrn, islanna, jlund, kartikeyporwal, nurtdinovadf, rominf, shrivatsahosabettu, slgero, snakers4, teague-lasser


silero-models's Issues

Noise at the end of produced wave file

🐛 Bug

Noise at the end of produced wave file

To Reproduce

Steps to reproduce the behavior:

  1. Install torch, numpy
  2. Run the example for Habrahabr article:
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)
  1. The second and third audio files are 12 seconds long (instead of ~1 sec), and are "padded" with noise

Expected behavior

No padding with noise

Environment

Please copy and paste the output from this
environment collection script
(or fill out the checklist below manually).

Collecting environment information...
PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8 (64-bit runtime)
Python platform: Windows-10-10.0.18362-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 466.77
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.3
[pip3] torch==1.9.0
[conda] Could not collect

Additional context

Thanks a lot for creating this!

Period token mostly missing from text enhancer model output

Hi,

I fed this small audio to an STT engine and obtained following transcription:

good afternoon everyone my name is preet bharara and i'm united states attorney for the southern district of new york that was me a few months ago at a press conference when i still had the best job i'll ever have i oversaw prosecutions against every type of criminal you can imagine mobsters murderers and corrupt politicians for decades bernie madoff was able to launder billions of dollars and ponzi proceeds of criminal charges against general motors company related to today accused arms dealer viktor boot begins to face american justice on march eleventh of this year though i lost that job actually that's a euphemism i was fired by president donald trump himself since then a lot has happened fbi director james comey was fired robert muller was appointed to find out whether anyone in the white house colluded with russia the attorney general gets maligned by the president on a regular basis and i'm on the sidelines now so i figure that's a perfect place to launch a podcast we're going to talk to prosecutors to judges to justice department officials the investigative reporters who break these stories even some politicians and i'll be bringing them on the show for conversations that  you won't get to hear anywhere else wnyc studios and cafe are presenting our show produced by pineapple street media so head to a apple podcasts or wherever you get your podcasts and subscribe right now to stay tuned with preet pre bahar out here opera rather greens lara crate barrel high profile us attorney for manhattan preet bharara preet bharara bharara 

Feeding this as-is to the text enhancer model in example.ipynb produces the following output:

Good afternoon Everyone My name is Preet Bharara and I'm United States attorney for the Southern District of New York that was me a few months ago at a press conference when I still had the best job I'll ever have I oversaw prosecutions against every type of criminal, you can imagine mobsters murderers and corrupt politicians for decades Bernie Madoff was able to launder billions of dollars and ponzi proceeds of criminal charges against general Motors company related to today accused arms dealer Viktor Boot begins to face American justice on March eleventh of this year, though I lost that job actually that's a euphemism I was fired by President Donald Trump himself since then a lot has happened FBi Director James Comey was fired Robert Muller was appointed to find out whether anyone in the White House colluded with Russia the attorney general gets maligned by the president on a regular basis and I'm on the sidelines now so I figure that's a perfect place to launch a podcast we're going to talk to prosecutors to judges to Justice Department officials the investigative reporters who break these stories even some politicians and I'll be bringing them on the show for conversations that you won't get to hear anywhere else WnyC Studios and Cafe are presenting our show produced by Pineapple Street Media So head to a Apple podcasts, or Wherever you get your podcasts and subscribe right now to stay tuned with Preet pre Bahar out here Opera Rather Greens Lara Crate Barrel High profile Us Attorney for Manhattan Preet Bharara Preet Bharara Bharara.

You can see it misses almost all the periods.

Thanks!

"Семь" is pronounced as "Сёмь"

Tried with baya_v2 and kseniya_v2.
UPD: ruslan_v2 too

The number seven
7.mp4
Один, два, три, четыре, пять, шесть, семь, восемь, девять. This example was read correctly only on the 5th attempt; before that, some words were being dropped for some reason.
default.mp4
collect_env.py
PyTorch version: 1.10.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux bullseye/sid (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.31

Python version: 3.9.1+ (default, Jan 20 2021, 14:49:22)  [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.9.0-2-amd64-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0+cpu
[pip3] torchaudio==0.10.0+cpu
[pip3] torchvision==0.11.1+cpu
[conda] Could not collect

❓ Questions / Help / Support - pre-trained Russian STT model: availability, license, etc.

It is not in the repository, and the only mention in the FAQ leads to the website, where there are no details. Is a Russian-language STT model published, or planned to be published, in open access for offline/standalone/self-hosted use? Where can one see the pricing if it is only provided commercially (if prices are publicly available)?

Bug report - problem loading STT model on Windows

Hi, I decided to try silero-models. I do everything as in the docs, but I get an error. How can I fix it?

code:

import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)

Error:
RuntimeError Traceback (most recent call last)
C:\Users\E786~1\AppData\Local\Temp/ipykernel_9444/3004546653.py in
1 device = torch.device('cpu') # gpu also works, but our models are fast enough for CPU
----> 2 model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
3 model='silero_stt',
4 language='en', # also available 'de', 'es'
5 device=device)

c:\PY\asistent.venv\lib\site-packages\torch\hub.py in load(repo_or_dir, model, source, force_reload, verbose, skip_validation, *args, **kwargs)
397 repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
398
--> 399 model = _load_local(repo_or_dir, model, *args, **kwargs)
400 return model
401

c:\PY\asistent.venv\lib\site-packages\torch\hub.py in _load_local(hubconf_dir, model, *args, **kwargs)
426
427 entry = _load_entry_from_hubconf(hub_module, model)
--> 428 model = entry(*args, **kwargs)
429
430 sys.path.remove(hubconf_dir)

~/.cache\torch\hub\snakers4_silero-models_master\hubconf.py in silero_stt(language, version, jit_model, **kwargs)
32 assert language in available_languages
33
---> 34 model, decoder = init_jit_model(model_url=models.stt_models.get(language).get(version).get(jit_model),
35 **kwargs)
36 utils = (read_batch,

~/.cache\torch\hub\snakers4_silero-models_master\utils.py in init_jit_model(model_url, device)
128 progress=True)
129
--> 130 model = torch.jit.load(model_path, map_location=device)
131 model.eval()
132 return model, Decoder(model.labels)

c:\PY\asistent.venv\lib\site-packages\torch\jit_serialization.py in load(f, map_location, _extra_files)
159 cu = torch._C.CompilationUnit()
160 if isinstance(f, str) or isinstance(f, pathlib.Path):
--> 161 cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
162 else:
163 cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: open file failed because of errno 2 on fopen: No such file or directory, file path: C:\Users\Дом/.cache\torch\hub\snakers4_silero-models_master\model\en_v5.jit

❓ Questions / Help / Support

❓ Questions and Help

We have a wiki available for our users. Please make sure you have checked it out first.

Is there a specific network structure used by English models?
Thx!

Bug report - running on ARM / RPI

🐛 Bug

I tried to use the model on a Raspberry Pi 3B and I get the following error:
fft: ATen not compiled with MKL support
So I tried to modify the stft function in torch/functional.py to use the librosa stft instead, but it seems that the model uses another torch stft rather than the one in my package.

The function used instead of torch stft

import numpy as np
import librosa
import torch
from torch import Tensor
from typing import Optional

def stft(input: Tensor, n_fft: int, hop_length: Optional[int] = None,
         win_length: Optional[int] = None, window: Optional[Tensor] = None,
         center: bool = True, pad_mode: str = 'reflect', normalized: bool = False,
         onesided: Optional[bool] = None,
         return_complex: Optional[bool] = None):
    # compute the STFT with librosa and repack the real/imaginary parts
    # into the (freq, frames, 2) layout that torch.stft used to return
    S = librosa.stft(np.array(input), n_fft, hop_length, win_length, window, center, pad_mode)
    s_real = np.real(S)
    s_real_shape = np.shape(s_real)
    s_real = np.reshape(s_real, (s_real_shape[0], s_real_shape[1], 1))
    s_imag = np.imag(S)
    s_imag_shape = np.shape(s_imag)
    s_imag = np.reshape(s_imag, (s_imag_shape[0], s_imag_shape[1], 1))
    S = np.concatenate((s_real, s_imag), axis=2)
    return torch.tensor(S)

stack traces

File "/home/Salim/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/stt_pretrained/models/model.py", line 27, in forward
_2 = self.win_length
_3 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None)
x0 = torch.torch.functional.stft(x, _0, _1, _2, _3, True, "reflect", False, True, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_4 = torch.slice(x0, 0, 0, 9223372036854775807, 1)
_5 = torch.slice(_4, 1, 0, 9223372036854775807, 1)
File "code/torch/torch/functional.py", line 21, in stft
input0 = input
print("test ok")
_2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided)
~~~~~~~~~~ <--- HERE
return _2

Traceback of TorchScript, original code (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft
input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
input = input.view(input.shape[-signal_dim:])
return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided)
~~~~~~~~ <--- HERE
RuntimeError: fft: ATen not compiled with MKL support

Expected behavior

Is it possible to modify the forward function so that it uses the librosa stft for Raspberry Pi users?

Environment

PyTorch version: 1.7.0a0+e85d494
Is debug build: True
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Raspbian GNU/Linux 10 (buster) (armv7l)
GCC version: (Raspbian 8.3.0-6+rpi1) 8.3.0
Clang version: Could not collect
CMake version: version 3.13.4

Python version: 3.7 (32-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] numpydoc==0.7.0
[pip3] torch==1.7.0a0
[pip3] torchaudio==0.7.0a0+ac17b64
[pip3] torchvision==0.8.0a0+291f7e2
[conda] Could not collect

Huge Amounts of Parasite Traffic (?)

I used to host model files via an S3 bucket.
But starting in January, the CDN started showing ~20 TB of download traffic per month, which we investigated immediately and moved to our own hosting, at first believing this to be a billing bug.

Currently the stats show ~68 clones in 2 weeks, and force_reload=False is the default everywhere in the examples, which implies quite modest traffic.

We analyzed the traffic for a day and found out that the majority of this "strange" traffic comes from random IPs in AWS subnets.


I.e., 2-3 requests from one IP, then the subnet changes; most subnets belong to AWS.
Looks like some CI job has gone rogue, or some botnet is scraping our URLs continuously.

A quick fix for stress-mark problems - russtress

There is a package - https://github.com/MashaPo/russtress - which does exactly this: it places stress marks. Here is example code for placing stress marks and converting ' into +

import re, russtress
accent = russtress.Accent()
input_text = "Проставь, пожалуйста, ударения"
accented_text = accent.put_stress(input_text)
output_text = re.compile(r"(.)\'", re.UNICODE).sub(r"+\1", accented_text)
print(output_text)  # "Прост+авь, пож+алуйста, удар+ения"

Maybe this will save someone some time.

Feature request - Ukrainian model

🚀 Feature

We would like to have a Ukrainian model for the task of Speech-to-Text.

Motivation

Ukraine has a large population, and there are tons of tasks in the country related to Speech-to-Text.

Additional context

Our group, based in Telegram ( https://t.me/speech_recognition_uk ), collected a dataset of Ukrainian public speeches/interviews in audio and text formats, accessible here: https://mega.nz/folder/T34DQSCL#Q1O8vcrX_8Qnp27Ge56_4A/folder/O3hzlKIJ

We think this dataset will be helpful in the training process.

Expected changes in `torchaudio.load`

Hi

Thanks for using torchaudio. We are overhauling the I/O mechanism, and in the future the signature of torchaudio.load will be slightly changed. The details can be found in pytorch/audio#903.

In the following use case, normalization becomes normalize, but the default value will be kept the same.

silero-models/utils.py

Lines 29 to 31 in 15d36e4

wav, sr = torchaudio.load(path,
normalization=True,
channels_first=True)

In the upcoming 0.7.0 release (which is expected to happen in about a week), we will start issuing warnings about the change, but the code will keep working. In the 0.8.0 release (no fixed date yet), the interface will be changed.
Since the default values for normalization/normalize and channels_first are kept the same, using wav, sr = torchaudio.load(path) will reduce the maintenance cost.
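
In code, the change amounts to the following sketch (the path is illustrative; since the defaults stay the same, the plain call works on both old and new releases):

import torchaudio

path = 'speech_orig.wav'  # any readable audio file

# before torchaudio 0.8: torchaudio.load(path, normalization=True, channels_first=True)
# from 0.8 on the keyword is `normalize`; with unchanged defaults, the plain call suffices
wav, sr = torchaudio.load(path)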

❓ Questions / Help / Support

Hi, hello, nice resource. I am trying with a few custom audio samples, and the results seem not so good. How do I fine-tune the model? Can you help by providing more information about the model?

❓ Questions / Help / Support

❓ Questions and Help

We have a wiki available for our users. Please make sure you have checked it out first.
Could you share your training code for the VAD model?

Other language speech recognized as incorrect English text

🐛 Bug

The English speech-to-text model recognizes other-language speech as incorrect / incomplete English text.

To Reproduce

Steps to reproduce the behavior:

  1. Run English STT model.
  2. Feed non English language speech for STT
  3. Check the STT result text

Expected behavior

When the English STT model is fed non-English speech, it should not output English text.

Environment

Any Environment

How to get the original PyTorch weights and source code of the models ❓ Questions / Help / Support

Hi,

I'm currently researching the robustness of AI systems in sound classification tasks. In order to investigate how decisions are made from the activations, I need to be able to access and modify some layers in the model. However, currently the models I get from PyTorch Hub are loaded as ScriptModules. I was wondering if I could get the original code of the network and the corresponding weights as a .pt file. This would help me carry on with the experiments I need. Of course, I'm happy to cite your work.

Not able to recognize the numerical voice data.

Hi Team,

I am using this model for recognizing the 16-digit credit card number from audio files. But out of the 16 digits I am able to recognize mostly 12, and the rest are not recognized. I am using an Indian-accent English voice. The code I am using is below:

import torch
import zipfile
import torchaudio
from glob import glob
import warnings
warnings.filterwarnings("ignore")

def transcript(audio_path):
    digits = []
    device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU

    model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                           model='silero_stt',
                                           language='en',  # also available 'de', 'es'
                                           device=device)
    (read_batch, split_into_batches,
     read_audio, prepare_model_input) = utils  # see function signature for details

    # download a single file, any format compatible with TorchAudio (soundfile backend)
    # torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
    #                                dst='speech_orig.wav', progress=True)

    char_map = {
        'zero': '0',
        'one': '1',
        'two': '2',
        'three': '3',
        'four': '4',
        'five': '5',
        'six': '6',
        'seven': '7',
        'eight': '8',
        'nine': '9',
        'double': 'double'
    }
    test_files = glob(audio_path)
    batches = split_into_batches(test_files, batch_size=10)
    input = prepare_model_input(read_batch(batches[0]),
                                device=device)

    output = model(input)
    for example in output:
        number = decoder(example.cpu())
    print(number)

    text = number.split(' ')
    for item in text:
        if item in char_map.keys() or item == 'double':
            digits.append(char_map[item])

    number = ' '.join(digits)

    return number

number = transcript(r'D:\ML Workspace\speech-recognition\test_audio_data\0.wav')
print(number)

In the audio, people mostly say "double" when a digit is repeated twice.

Please, add a demo

❓ Questions and Help

We have a wiki available for our users. Please make sure you have checked it out first.

Please, add a demo. All TTS projects start with a demo of their best results.

A demo motivates me (or, if it's bad, not) to read further and start investing in the project.

Problem loading german onnx model

Please support: I have tried both the English PyTorch model and the ONNX model. Both work fine on the different wav files I've tried. After that I tried the same for the German models. Again the PyTorch model is fine, but the ONNX model seems totally off. So my idea was to create a new ONNX model from the PyTorch model using "torch.onnx.export(...)", but unfortunately this fails with the error:

temporary: the only valid use of a module is looking up an attribute but found = prim::SetAttr[name="num_batches_tracked"](%4257, %4387)

I was looking for a fix but couldn't find any clue on how to make this work. Any ideas?

❓ How to use v1 models?

While v2 models work fine with the given standalone code example, v1 produces errors while loading:

Traceback (most recent call last):
  File "C:\Users\NORDLING\.cache\torch\hub\snakers4_silero-models_master\inference.py", line 12, in <module>
    model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
  File "C:\Users\NORDLING\AppData\Roaming\Python\Python39\site-packages\torch\package\package_importer.py", line 84, in __init__
    self.extern_modules = self._read_extern()
  File "C:\Users\NORDLING\AppData\Roaming\Python\Python39\site-packages\torch\package\package_importer.py", line 289, in _read_extern
    self.zip_reader.get_record(".data/extern_modules")
RuntimeError: PytorchStreamReader failed locating file .data/extern_modules: file not found

Is there any example of how to work with them to generate wavs for testing/comparison?

Bug report - Support for sound backend on Linux

🐛 Bug

Running the samples on the Linux platform (Ubuntu Focal / Mint Ulyssa flavors) causes a crash due to the missing "soundfile" backend.

To Reproduce

On Ubuntu (focal):

  1. python3 -m pip install pytorch torch omegaconf torchaudio
  2. Run minimal example from the README:
import torch

language = 'ru'
speaker = 'kseniya_16khz'
device = torch.device('cpu')
model, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                      model='silero_tts',
                                                                      language=language,
                                                                      speaker=speaker)
model = model.to(device)  # gpu or cpu
audio = apply_tts(texts=[example_text],
                  model=model,
                  sample_rate=sample_rate,
                  symbols=symbols,
                  device=device)

Error message received:

Traceback (most recent call last):
...
    from utils import (init_jit_model,
  File "/home/user/.cache/torch/hub/snakers4_silero-models_master/utils.py", line 16, in <module>
    torchaudio.set_audio_backend(audio_backend_name)  # switch backend
  File "/home/user/.local/lib/python3.8/site-packages/torchaudio/backend/utils.py", line 52, in set_audio_backend
    raise RuntimeError(
RuntimeError: Backend "soundfile" is not one of available backends: ['sox', 'sox_io'].

It seems that on Linux the default sound backend should be "sox_io" (the "sox" backend is deprecated). The "soundfile" backend is only available on Windows.

Expected behavior

The example code should work on Linux.

Environment

PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Linux Mint 20.1 (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1 
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.1
[conda] Could not collect

Additional context

Suggested solution - fix utils.py to add support for platforms other than Windows.

Train model For Persian

❓ I've read your repo. Thanks for this nice project.

Where is the code I can use to train a model with my own data?
Can I know your model structure?

And how was the quality of the data you used? I mean noise, transcription accuracy of the training data, hours, etc.

Thanks

Getting a ConfigKeyError/ConfigAttributeError/Missing key error following the Tensorflow example.

Following the example given for the TensorFlow version, running the code gives an error (the asterisks are just hiding my computer's username, lol) that I'm struggling a bit with. Sorry if this is a really stupid question, but what should I do to try to solve this? I have not downloaded all of the silero-stt models to my computer, if that is the issue.

torch.hub.download_url_to_file(models.stt_models.en.latest.tf, 'tf_model.tar.gz')
Traceback (most recent call last):

  File "<ipython-input-19-8fbcbe869937>", line 1, in <module>
    torch.hub.download_url_to_file(models.stt_models.en.latest.tf, 'tf_model.tar.gz')

  File "***\anaconda3\lib\site-packages\omegaconf\dictconfig.py", line 353, in __getattr__
    self._format_and_raise(

  File "***\anaconda3\lib\site-packages\omegaconf\base.py", line 190, in _format_and_raise
    format_and_raise(

  File "***\anaconda3\lib\site-packages\omegaconf\_utils.py", line 821, in format_and_raise
    _raise(ex, cause)

  File "***\anaconda3\lib\site-packages\omegaconf\_utils.py", line 719, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set end OC_CAUSE=1 for full backtrace

  File "***\anaconda3\lib\site-packages\omegaconf\dictconfig.py", line 351, in __getattr__
    return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)

  File "***\anaconda3\lib\site-packages\omegaconf\dictconfig.py", line 438, in _get_impl
    node = self._get_node(key=key, throw_on_missing_key=True)

  File "***\anaconda3\lib\site-packages\omegaconf\dictconfig.py", line 470, in _get_node
    raise ConfigKeyError(f"Missing key {key}")

ConfigAttributeError: Missing key tf
    full_key: stt_models.en.latest.tf
    object_type=dict

Cannot use this model as a layer of another model.

This line of code makes it impossible to use this model as a (non-trainable) layer of another model:

torch.set_grad_enabled(False)

Also please check the discussion here: https://discuss.pytorch.org/t/runtimeerror-element-0-of-tensors-does-not-require-grad-and-does-not-have-a-grad-fn-when-training-from-examples/107816/6

I have to manually enable it to make it work.

Just want to know if there is a specific reason to disable gradient calculation.

Errors running example.ipynb locally or in Colab (PyTorch 1.10 issues)

Hi,

I am unable to run the example.ipynb notebook locally (on a CPU machine) or any of the Google Colab notebooks (on either a CPU or GPU runtime).

The following error occurs for the example.ipynb notebook:

model_url = model_conf.get('package')

model_dir = "downloaded_model"
os.makedirs(model_dir, exist_ok=True)
model_path = os.path.join(model_dir, os.path.basename(model_url))

if not os.path.isfile(model_path):
    torch.hub.download_url_to_file(model_url,
                                   model_path,
                                   progress=True)

imp = package.PackageImporter(model_path)
model = imp.load_pickle("te_model", "model")
example_texts = model.examples

def apply_te(text, lan='en'):
    return model.enhance_text(text, lan)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_2498123/2005539933.py in <module>
     10                                    progress=True)
     11 
---> 12 imp = package.PackageImporter(model_path)
     13 model = imp.load_pickle("te_model", "model")
     14 example_texts = model.examples

~/miniconda3/lib/python3.8/site-packages/torch/package/importer.py in __init__(self, file_or_buffer, module_allowed)
     59             self.filename = str(file_or_buffer)
     60             if not os.path.isdir(self.filename):
---> 61                 self.zip_reader = torch._C.PyTorchFileReader(self.filename)
     62             else:
     63                 self.zip_reader = MockZipReader(self.filename)

RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1_4lang_q/version

For any of the Google Colab notebooks, I get the following error when executing the very first cell:

     |████████████████████████████████| 74 kB 2.2 MB/s 
     |████████████████████████████████| 2.9 MB 11.8 MB/s 
     |████████████████████████████████| 112 kB 35.0 MB/s 
     |████████████████████████████████| 596 kB 46.5 MB/s 
  Building wheel for antlr4-python3-runtime (setup.py) ... done
/content/silero-models
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-1-5d873de0231f> in <module>()
     16 from glob import glob
     17 from omegaconf import OmegaConf
---> 18 from utils import (init_jit_model, 
     19                    split_into_batches,
     20                    read_audio,

5 frames
/usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    362 
    363         if handle is None:
--> 364             self._handle = _dlopen(self._name, mode)
    365         else:
    366             self._handle = handle

OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory

Thus, as a result, I am unable to run any examples - either locally or in Google Colab.

Thanks!

Packaging and PyPI releases

Hello,

Thank you for your hard work.

Is there any chance of getting installable Python package from PyPI for the project?

For example, it might look like this for installing STT models with PyTorch:

pip install silero-models-stt[torch]

This would be very handy for using the models in the production projects and environments.

Bug report - load() missing 1 required positional argument: 'repo_or_dir'

🐛 Bug

This issue occurs when trying to run the code snippet for PyTorch in the README.md file.


Expected behavior

From the PyTorch documentation https://pytorch.org/docs/stable/hub.html#loading-models-from-hub,
torch.hub.load takes the parameters repo_or_dir, model, *args, **kwargs, and there is no github parameter.

Fix

Update the code to the following:

model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)


Environment


 - PyTorch Version: 1.6
 - OS: Linux
 - How you installed PyTorch: conda
 - Python version: 3.8.3
 - CUDA/cuDNN version: 11.0

Bug report - RuntimeError: Unknown qengine

🐛 Bug

pickle not read model

To Reproduce

Steps to reproduce the behavior:

  1. run script for example - https://github.com/snakers4/silero-models#standalone-use

Traceback (most recent call last):
File "/Users/ar/PycharmProjects/tts_test-1/main.py", line 13, in
model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
File "/Users/ar/PycharmProjects/tts_test-1/venv/lib/python3.9/site-packages/torch/package/package_importer.py", line 249, in load_pickle
result = unpickler.load()
File "/usr/local/Cellar/[email protected]/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pickle.py", line 1228, in load
dispatchkey[0]
File "/usr/local/Cellar/[email protected]/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pickle.py", line 1272, in load_binpersid
self.append(self.persistent_load(pid))
File "/Users/ar/PycharmProjects/tts_test-1/venv/lib/python3.9/site-packages/torch/package/package_importer.py", line 227, in persistent_load
loaded_reduces[reduce_id] = func(self, *args)
File "/Users/ar/PycharmProjects/tts_test-1/venv/lib/python3.9/site-packages/torch/jit/_script.py", line 344, in unpackage_script_module
cpp_module = torch._C._import_ir_module_from_package(
RuntimeError: Unknown qengine

Expected behavior

generate voice

Environment

PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 10.13.6 (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.2 (default, Mar 26 2021, 01:41:56) [Clang 10.0.0 (clang-1000.11.45.5)] (64-bit runtime)
Python platform: macOS-10.13.6-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.3
[pip3] torch==1.10.0
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.1
[conda] Could not collect

  • PyTorch Version (e.g., 1.0): 1.10.0
  • OS (e.g., Linux):macos high sierra
  • How you installed PyTorch (conda, pip, source): pip3 install -q torch torchvision torchaudio omegaconf
  • Build command you used (if compiling from source):
  • Python version: 3.9.2
  • CUDA/cuDNN version:
  • GPU models and configuration: 2,2 GHz Intel Core i7 4 cores
  • Any other relevant information:

Additional context

Feature request - Offline use of model

At the moment it is nearly impossible to create a docker container that works offline (without internet access).
Even if you include this line during docker build:

RUN python -c "import torch; torch.backends.quantized.engine='qnnpack'; torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_te', force_reload=True)"

During execution of the docker container (without internet) you load it locally:

torch.hub.load(repo_or_dir='/root/.cache/torch/hub/snakers4_silero-models_master', model='silero_te', source='local', force_reload=False)

Then you have the problem that the hubconf.py is called again (and fails due to no internet access) and it tries to download the files in hubconf.py Lines 21, 49, 101, even though they already exist.

So my suggestion would be to also include checks in lines 21, 49, and 101 to see whether the file already exists locally and, if so, skip the download (as is done in line 114).

Any reasons against that?

Any way to extract punctuation marks and transcription

Hey.
Really cool project !!!
I am using silero-stt model to extract text.
I just have few questions:

  • This model does not extract punctuation marks and does not split into sentences, am I missing something?
  • Is there a way to get start and end timestamps for each word as well?

Thank you.

How to obtain an intermediate layer output?

How do we obtain the output of an intermediate layer of the pre-trained model? For example, the output at the end of the convolution encoder, or the output just after the transformer encoder layers.

Feature request - [X]

🚀 Feature

Is it possible to fine-tune the model? Do you have a checkpoint available to be used?

❓ How can I deal with long audio in Speech-to-text? (e.g., 60 - 120 min)

Hi,

I am trying to run the speech-to-text model on GPU/CPU for a large audio file, but I get an out-of-memory error in both cases.

Is there any iterable lazy dataloader that can feed the audio file 10 min by 10 min?

I have tried some silence-based audio segmentation, but the performance is not at the same level as Silero.

Feature request - [Wake Word Detection]

🚀 Feature

It would be helpful if we could easily use wake word detection to complement the STT functionality. At present I'm using a third-party tool for wake word detection which then records audio for 4 seconds which is processed through silero for home automation purposes.

Motivation & Pitch

Adding a simple method for custom wake word detection would allow seamless integration for the purposes of home automation where an always listening device waits for a given wake word or phrase and then listens for a sentence for STT purposes, the text of which is then passed on to a different step in the chain.

Additionally, while waiting a fixed amount of time for the follow-up sentence is straight-forward, it would be a helpful addition to also use the length of silence in a sentence to determine its termination.

Alternatives

These things can be done at present, but only by using multiple tools. Being able to do this in one place would make this use case seamless and easier to process.

I do understand if this is too far outside of your scope for this project.

Help in 'slice' used in forward function

You have used the following lines in the forward function of model:

  _4 = torch.slice(x0, 0, 0, 9223372036854775807, 1)
  _5 = torch.slice(_4, 1, 0, 9223372036854775807, 1)
  _6 = torch.slice(_5, 2, 0, 9223372036854775807, 1)
  x_real = torch.select(_6, 3, 0)

Could you briefly explain what is the purpose of torch.slice here?

Feature request - SAPI5

SAPI5 compatibility

🚀 Feature

Motivation

Mostly enough for screen readers (Windows).
But this interface is for integration by its nature.
Ready to help!

A simple question - can a Ukrainian model be created?

Together with the folks at https://t.me/speech_recognition_uk
we have collected a small Ukrainian-language dataset...
https://mega.nz/folder/T34DQSCL#Q1O8vcrX_8Qnp27Ge56_4A/folder/O3hzlKIJ

Would you have the capacity to create at least a trial model for the Ukrainian language?
We would very much like to test a neural network made by you...

ValueError: Expected 303 See other HTTP response but received code 200

❓ Questions and Help

Whenever I try to use the tensorflow model code in Google Colab I get this error:

ValueError: Expected 303 See other HTTP response but received code 200

Traceback to this line of code:

tf_model = tf_hub.load(models.stt_models.en.latest.tf)

Does anyone know what is causing this issue?

❓ Help: Could you please help on an issue while exporting models to ONNX?

I've tried to run the model with en/de/es successfully, and we want to export the models to an ONNX file for running with onnxruntime.

Unfortunately, on Ubuntu + PyTorch 1.7.1 + Python 3.8.5, the torch.onnx.export() function failed and threw an exception with the errors below:

temporary: the only valid use of a module is looking up an attribute but found = prim::SetAttr[name="num_batches_tracked"](%4243, %4373)

The code looks like below:

    model, _, utils = torch.hub.load(github, model_name, language=language_name)
    model.eval()
    (read_batch, split_into_batches, _, prepare_model_input) = utils 

    # download a single file, any format compatible with TorchAudio (soundfile backend)
    torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                                    dst = 'speech_orig.wav', progress=True)
    test_files = glob('speech_orig.wav')
    batches = split_into_batches(test_files, batch_size=10)
    input_tensor = (prepare_model_input(read_batch(batches[0])), )
    if torch.cuda.is_available():
        model.to('cuda')
        if isinstance(input_tensor, tuple):
            new_tensor = [x.to('cuda') for x in input_tensor]
            input_tensor = tuple(new_tensor)
        else: 
            input_tensor = input_tensor.to('cuda')
    torch.onnx.export(model, input_tensor, 'my_model.onnx', verbose=False, opset_version=12)

I also noticed that you have already prepared some ONNX files for these models. So, could you share how you exported those files successfully? With any special scripts?

Thanks!
