asr-recipes

Recipes for using open-source ASR corpora

Recipes for using open-source ASR corpora with Kaldi.

This is not an official Google product.

Languages

Language    Directory  Corpus
Javanese    jv         Open SLR 35
Sundanese   su         Open SLR 36
Sinhala     si         Open SLR 52

How to use

The above corpora are ready for use with Kaldi, after some simple data munging. We provide a small Kaldi recipe for training a triphone recognizer, inspired by the start of Kaldi's Resource Management recipe. The recipe is only intended for illustration and for validating the corpus and data preparation.

Prerequisites

  1. Kaldi. Download Kaldi from GitHub, then compile and install it.
  2. Flac. The scripts below use the flac command line tool (assumed to be on the shell PATH) for on-the-fly decompression of the corpus.
  3. Python and Bash.
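
To illustrate what on-the-fly decompression with flac can look like, here is a minimal sketch that prints Kaldi wav.scp lines whose right-hand side is a command ending in "|", which Kaldi runs and reads as audio from standard output. The function name make_wav_scp, the flat directory layout, and the use of the file name as the utterance ID are assumptions for illustration; the data preparation scripts in this repository may do this differently.

```shell
# Hypothetical helper (not part of the recipes): print one wav.scp line
# per FLAC file found directly inside the given directory.
make_wav_scp() {
  corpus_dir="$1"
  for f in "$corpus_dir"/*.flac; do
    [ -e "$f" ] || continue          # the glob matched nothing
    # Assumption: the file name without ".flac" serves as the utterance ID.
    utt_id="$(basename "$f" .flac)"
    # flac options: -c writes to stdout, -d decodes, -s suppresses messages.
    printf '%s flac -c -d -s %s |\n' "$utt_id" "$f"
  done
}

# Usage (the directory name is a placeholder):
#   make_wav_scp /path/to/flac/files > wav.scp
```

This sketch does not handle paths containing spaces, and it only emits the text of each command; flac itself is invoked later, when Kaldi reads the file.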

General steps

  1. IMPORTANT: You must define and export an environment variable KALDI_ROOT pointing at your Kaldi directory.
  2. Download and unpack the corpora you need.
  3. Change to a recipe directory and execute run.sh.
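
Before running a recipe, it can help to confirm that the environment is set up as described. The following is a small sketch of such a check; the function name check_prereqs is hypothetical and the check is not part of the recipes. It only verifies that KALDI_ROOT points at an existing directory and that flac is on the PATH, not that Kaldi has been built.

```shell
# Hypothetical preflight check (not part of the recipes).
check_prereqs() {
  status=0
  if [ -z "${KALDI_ROOT:-}" ]; then
    echo "KALDI_ROOT is not set" >&2
    status=1
  elif [ ! -d "$KALDI_ROOT" ]; then
    echo "KALDI_ROOT is not a directory: $KALDI_ROOT" >&2
    status=1
  fi
  # The recipes assume the flac command line tool is on the PATH.
  if ! command -v flac >/dev/null 2>&1; then
    echo "flac not found on PATH" >&2
    status=1
  fi
  return "$status"
}

# Usage: check_prereqs && ./run.sh
```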

Example

Here is how to use the Javanese corpus:

sudo apt-get install flac wget
git clone https://github.com/kaldi-asr/kaldi
cd kaldi
export KALDI_ROOT="$(realpath .)"
cat INSTALL
# and follow the instructions there to build Kaldi
cd ..
git clone https://github.com/googlei18n/asr-recipes
cd asr-recipes
tools/download_data.sh jv
# this unpacks the Javanese corpus into asr_javanese
cd jv
./run.sh

License

Unless otherwise noted, all original files are licensed under the Apache License, Version 2.0.

Contributors

mjansche, oddurk
