
Myrtle Deep Speech

A PyTorch implementation of DeepSpeech and DeepSpeech2.

This repository is intended as an evolving baseline for other implementations to compare their training performance against.

Current roadmap:

  1. Pre-trained weights for both networks and full performance statistics.
  2. Mixed-precision training.

Running

Build the Docker image:

make build

Run the Docker container (here using nvidia-docker), making sure to publish the port of the JupyterLab session to the host:

sudo docker run --runtime=nvidia --shm-size 512M -p 9999:9999 deepspeech

The JupyterLab session can be accessed via localhost:9999.

This Python package is accessible in the running Docker container, either through the command-line interface:

deepspeech --help

or as a Python package:

import deepspeech

Examples

deepspeech --help prints the configurable parameters (batch size, learning rate, log location, number of epochs, ...); it aims to have reasonably sensible defaults.

Training

A Deep Speech training run can be started by the following command, adding flags as necessary:

deepspeech ds1

By default the experimental data and logs are output to /tmp/experiments/year_month_date-hour_minute_second_microsecond.

Inference

A Deep Speech evaluation run can be started by the following command, adding flags as necessary:

deepspeech ds1 \
           --state_dict_path $MODEL_PATH \
           --log_file \
           --decoder greedy \
           --train_subsets \
           --dev_log wer \
           --dev_subsets dev-clean \
           --dev_batch_size 1

Note that omitting the argument to --log_file causes the WER results to be written to stderr.

Dataset

The package contains code to download and use the LibriSpeech ASR corpus.

WER

The word error rate (WER) is computed using the formula that is widely used in many open-source speech-to-text systems (Kaldi, PaddlePaddle, Mozilla DeepSpeech). In pseudocode, where N is the number of validation or test samples:

sum_edits = sum([edit_distance(target, predict)
                 for target, predict in zip(targets, predictions)])
sum_lens = sum([len(target) for target in targets])
WER = (1.0/N) * (sum_edits / sum_lens)

This reduces the impact on the WER of errors in short sentences. Toy example:

| Target                         | Prediction                    | Edit Distance | Label Length |
|--------------------------------|-------------------------------|---------------|--------------|
| lectures                       | lectured                      | 1             | 1            |
| i'm afraid he said             | i am afraid he said           | 2             | 4            |
| nice to see you mister meeking | nice to see your mister makin | 2             | 6            |

The mean WER of each sample considered individually is:

>>> (1.0/3) * ((1.0/1) + (2.0/4) + (2.0/6))
0.611111111111111

Compared to the pseudocode version given above:

>>> (1.0/3) * ((1.0 + 2 + 2) / (1.0 + 4 + 6))
0.1515151515151515
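Both calculations can be checked end-to-end with a short script. The edit_distance helper below is an illustrative word-level Levenshtein implementation; the package's actual implementation may differ:

```python
def edit_distance(target, prediction):
    """Word-level Levenshtein distance via dynamic programming."""
    ref, hyp = target.split(), prediction.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

targets = ["lectures",
           "i'm afraid he said",
           "nice to see you mister meeking"]
predictions = ["lectured",
               "i am afraid he said",
               "nice to see your mister makin"]

N = len(targets)
sum_edits = sum(edit_distance(t, p) for t, p in zip(targets, predictions))
sum_lens = sum(len(t.split()) for t in targets)

# corpus-level formula from the pseudocode above
wer = (1.0 / N) * (sum_edits / sum_lens)

# mean of per-sample WERs, for comparison
mean_wer = (1.0 / N) * sum(edit_distance(t, p) / len(t.split())
                           for t, p in zip(targets, predictions))

print(wer)       # ~0.1515
print(mean_wer)  # ~0.6111
```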

Maintainer

Please contact sam at myrtle dot ai.

Issues

Using as Python package

Hey,
What should I do to use the repo as a Python package without Docker? Is it possible?

Thanks

ONNX Conversion of DS1 graph fails

Currently, the DS1 graph fails to export to ONNX due to the HardTanh operation not being part of the supported ONNX operators.

torch.onnx.export(model._get_network(), dummy_input, model_path, verbose=True, input_names=input_names)

> RuntimeError: ONNX export failed: Couldn't export operator aten::hardtanh

A simple fix could be to replace the nn.Hardtanh module with a custom layer that uses torch.clamp instead, though some additional code may be needed around module.layers, since torch.clamp expects a Tensor (i.e. it is not a Module). Hopefully, torch.onnx.export is smart enough to convert a torch.clamp() call into the appropriate Clip operation(s).
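A sketch of that workaround: a minimal nn.Module wrapping torch.clamp that could be swapped in for nn.Hardtanh. The min/max defaults below mirror nn.Hardtanh's defaults and are assumptions; the DS1 network may use different clipping bounds (e.g. a 0-20 clipped ReLU):

```python
import torch
import torch.nn as nn


class Clamp(nn.Module):
    """Hypothetical drop-in replacement for nn.Hardtanh using torch.clamp."""

    def __init__(self, min_val=-1.0, max_val=1.0):
        super().__init__()
        self.min_val = min_val
        self.max_val = max_val

    def forward(self, x):
        # clamp is a plain tensor op, unlike the Hardtanh module
        return torch.clamp(x, self.min_val, self.max_val)


# behaves like nn.Hardtanh(0.0, 20.0), i.e. a clipped ReLU
clipped_relu = Clamp(0.0, 20.0)
```

Whether this exports cleanly still depends on torch.onnx.export mapping the torch.clamp call onto the ONNX Clip operator for the opset in use.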
