asr's Introduction

ASR - Automatic Speech Recognition

Automatic Speech Recognition using neural networks. This repo contains implementations of NVIDIA's Jasper and QuartzNet speech recognition architectures. Unlike most speech recognition models, these architectures do not rely on RNNs; they are fully convolutional.
NOTE: You'll have to unzip the ffmpeg binaries manually, because we had to push them as zip archives due to Git LFS bandwidth limitations. Just use Extract Here in their respective directories.

The research papers can be found on arXiv:

🎓 Authors:

asr's People

Contributors

dijana-z, stefanpantic

Forkers

jankubat

asr's Issues

Fix "export" mode

Export mode doesn't seem to work correctly in some cases. Look into this and fix the problem.

dataset organization

Could you please give an example of the dataset folder structure, particularly for LibriSpeech? Thanks

Implement "train" mode

Implement "train" functionality. It should be able to take hyperparameters and dataset as input and run a training loop.

Blocked by #1, #2, #3.
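
A minimal sketch of what a "train" entry point could look like, assuming the modes are exposed as command-line subcommands. Every name here (`TrainConfig`, the `--dataset-dir`, `--batch-size`, `--epochs` and `--learning-rate` flags, and the `step_fn` callback) is hypothetical and not taken from the repo; the model-specific update is left to the caller.

```python
import argparse
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical hyperparameters; names and defaults are illustrative only."""
    dataset_dir: str
    batch_size: int = 32
    epochs: int = 10
    learning_rate: float = 1e-3


def build_parser() -> argparse.ArgumentParser:
    # One subcommand per mode; only "train" is sketched here.
    parser = argparse.ArgumentParser(prog="asr")
    modes = parser.add_subparsers(dest="mode", required=True)
    train_cmd = modes.add_parser("train", help="run a training loop")
    train_cmd.add_argument("--dataset-dir", required=True)
    train_cmd.add_argument("--batch-size", type=int, default=32)
    train_cmd.add_argument("--epochs", type=int, default=10)
    train_cmd.add_argument("--learning-rate", type=float, default=1e-3)
    return parser


def parse_train_config(argv):
    """Turn a list of command-line arguments into a TrainConfig."""
    args = build_parser().parse_args(argv)
    return TrainConfig(args.dataset_dir, args.batch_size,
                       args.epochs, args.learning_rate)


def train(config, batches, step_fn):
    """Call step_fn(batch, learning_rate) on every batch for each epoch.

    `batches` must be re-iterable (for example a list). Returns the mean
    loss of each epoch.
    """
    history = []
    for _ in range(config.epochs):
        losses = [step_fn(batch, config.learning_rate) for batch in batches]
        if not losses:
            raise ValueError("no batches to train on")
        history.append(sum(losses) / len(losses))
    return history
```

Keeping the optimizer step behind `step_fn` means the same loop could drive either Jasper or QuartzNet, which is one possible way to keep the mode independent of the model.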

Implement "infer" mode

Implement "infer" functionality. It should take a frozen model output by "export" and perform inference with it. This task requires triaging once "export" is done.

Blocked by #5.

Look into ONNX format for model inference

TensorFlow models can be quite slow. ONNX is a format for storing and running ML models that is pretty fast and easy to use from different programming languages (e.g. C++). Look into and provide a Proof-of-Concept for implementing a conversion function from TensorFlow frozen graphs to ONNX graphs.
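
One possible starting point for the Proof-of-Concept is the tf2onnx package, which ships a command-line converter for frozen graphs. The issue does not say which tool was used, so treat this as an assumption. The sketch below only assembles and runs that command; the file paths and tensor names passed in are hypothetical and depend on how the graph was exported.

```python
import subprocess
import sys


def tf2onnx_command(graph_path, output_path, inputs, outputs, opset=13):
    """Build the argument list for tf2onnx's command-line converter."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", graph_path,          # frozen TensorFlow graph (.pb)
        "--output", output_path,           # destination .onnx file
        "--inputs", ",".join(inputs),      # input tensor names, e.g. "input:0"
        "--outputs", ",".join(outputs),    # output tensor names
        "--opset", str(opset),
    ]


def convert(graph_path, output_path, inputs, outputs, opset=13):
    """Run the conversion. Requires tf2onnx and TensorFlow to be installed."""
    command = tf2onnx_command(graph_path, output_path, inputs, outputs, opset)
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(command, check=True)
```

A call would look like `convert("frozen_graph.pb", "model.onnx", ["input:0"], ["output:0"])`, with the tensor names replaced by the ones in the exported graph.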

Implement "export" mode

Implement "export" functionality. It should take a trained model and output a frozen_graph ready for inference.

Blocked by #4.

Save dataset sizes to "size.json"

... and store it in the dataset directory during data preparation. This will make training a lot more flexible as we don't have to know the actual size of the dataset beforehand.
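
A minimal sketch of the idea, assuming "size.json" maps split names to example counts. The schema, the split names and the helper names are assumptions, not the repo's actual format. It also shows how training could derive the number of steps per epoch from the stored size.

```python
import json
import math
from pathlib import Path


def save_sizes(dataset_dir, sizes):
    """Write per-split example counts to <dataset_dir>/size.json."""
    path = Path(dataset_dir) / "size.json"
    path.write_text(json.dumps(sizes, indent=2))
    return path


def steps_per_epoch(dataset_dir, split, batch_size):
    """Read size.json and compute how many batches cover one epoch."""
    sizes = json.loads((Path(dataset_dir) / "size.json").read_text())
    # Round up so the final, possibly smaller, batch is still counted.
    return math.ceil(sizes[split] / batch_size)
```

Data preparation would call `save_sizes(dataset_dir, {"train": n_train, "dev": n_dev})` once, and the training code would then only need the dataset directory and a batch size.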

Implement "QuartzNet" model

QuartzNet is an improvement of the Jasper model that reduces training time and model complexity (in terms of trainable parameters) while maintaining accuracy. We should implement it along with Jasper to have more options when it comes to training the final model.

Paper here: https://arxiv.org/abs/1910.10261

Related: #1
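
To make the parameter reduction concrete: the QuartzNet paper replaces standard 1D convolutions with 1D time-channel separable convolutions (a depthwise convolution followed by a pointwise one). The sketch below compares weight counts for a single layer, ignoring biases and normalization. The kernel size and channel counts in the usage note are illustrative values, not a claim about this repo's configuration.

```python
def conv1d_params(kernel, c_in, c_out):
    """Weights in a standard 1D convolution: one kernel per (in, out) channel pair."""
    return kernel * c_in * c_out


def separable_conv1d_params(kernel, c_in, c_out):
    """Weights in a time-channel separable 1D convolution.

    Depthwise part: one kernel per input channel (kernel * c_in).
    Pointwise part: a 1x1 convolution mixing channels (c_in * c_out).
    """
    return kernel * c_in + c_in * c_out
```

For a kernel of width 33 with 256 input and 256 output channels, `conv1d_params(33, 256, 256)` gives 2,162,688 weights while `separable_conv1d_params(33, 256, 256)` gives 73,984, roughly a 29x reduction for that layer.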
