forced-alignment-tools

A collection of links and notes on forced alignment tools

Did I miss an aligner? Please open an issue, or fork the repository and submit a pull request.

Definition of Forced Alignment

Given an audio file containing speech, and the corresponding transcript, computing a forced alignment is the process of determining, for each fragment of the transcript, the time interval (in the audio file) containing the spoken text of the fragment.

A text fragment can have arbitrary granularity:

  • a paragraph,
  • a sentence,
  • a portion of a sentence (i.e., a group of words),
  • a word, or
  • a phoneme (i.e., a single sound).

For example, given this text file and this audio file, a forced alignment at verse level could be the following:

1                                                     => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
...
Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]
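
The arrow notation above is only an illustration; each tool has its own output formats. As a minimal sketch, assuming exactly the `text => [HH:MM:SS.mmm, HH:MM:SS.mmm]` layout shown, the following Python code parses such lines into fragments with begin/end times in seconds and checks that every interval is non-empty and does not overlap the previous one. The names `Fragment`, `to_seconds`, `parse_alignment`, and `check_alignment` are hypothetical, not part of any aligner's API.

```python
import re
from dataclasses import dataclass

# Matches "<text> => [HH:MM:SS.mmm, HH:MM:SS.mmm]" as in the example above.
LINE_RE = re.compile(
    r"^(?P<text>.*?)\s*=>\s*"
    r"\[(?P<begin>\d{2}:\d{2}:\d{2}\.\d{3}),\s*(?P<end>\d{2}:\d{2}:\d{2}\.\d{3})\]$"
)


@dataclass
class Fragment:
    text: str
    begin: float  # seconds from the start of the audio file
    end: float


def to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS.mmm' timestamp to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_alignment(lines):
    """Parse display lines into Fragment objects, skipping lines without an interval."""
    fragments = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. the "..." placeholder line
        fragments.append(
            Fragment(match["text"], to_seconds(match["begin"]), to_seconds(match["end"]))
        )
    return fragments


def check_alignment(fragments):
    """True if every interval is non-empty and none overlaps its predecessor."""
    if not all(f.begin < f.end for f in fragments):
        return False
    return all(cur.begin >= prev.end for prev, cur in zip(fragments, fragments[1:]))
```

Gaps between fragments (such as the jump caused by the elided verses) are allowed by this check; only overlaps and empty intervals are rejected.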

Typical applications of forced alignment include audio e-books, closed captioning, and the automatic creation of training data for automatic speech recognition (ASR) systems.

Programs and Libraries

The following table lists open source programs and libraries for computing forced alignments that have actually been verified to install and run (although the installation procedure for some of them is fairly complex).

All tools except aeneas are based on speech recognition algorithms; all tools except aeneas and Gentle are maintained by research groups or individuals in academia.
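
The table lists aeneas as DTW-based. As we understand it, aeneas synthesizes the transcript with a text-to-speech engine and uses dynamic time warping (DTW) to align the synthesized audio with the real audio, so that fragment boundaries known in the synthesized audio can be mapped onto the real one. The sketch below shows only a generic DTW step, on toy 1-D sequences standing in for per-frame feature vectors (real systems compare vectors such as MFCCs); `dtw` and `map_boundary` are hypothetical helpers, not aeneas's API or implementation.

```python
import math


def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Classic O(len(x) * len(y)) dynamic time warping.

    Returns the total alignment cost and the warping path as a list of
    (i, j) index pairs running from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    n, m = len(x), len(y)
    # cost[i][j] = cheapest cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # both sequences advance
                cost[i - 1][j],      # x advances, y stays
                cost[i][j - 1],      # y advances, x stays
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


def map_boundary(path, i_boundary):
    """Return the first index of y aligned with index i_boundary of x."""
    for i, j in path:
        if i == i_boundary:
            return j
    raise ValueError("boundary index not on the warping path")
```

For instance, with `x = [0, 0, 5, 5, 1]` as the reference and `y = [0, 0, 0, 5, 5, 5, 1, 1]` as a slower real signal, the cost is 0 and `map_boundary(path, 2)` returns 3: the fragment starting at reference frame 2 starts at frame 3 of the real signal.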

Many of the tools are based on HTK (the Hidden Markov Model Toolkit), which is not free for commercial use, although a commercial license can be purchased from the University of Cambridge.

You can also download the raw data file in JSON format.

| Name | Algorithm | Supported Language(s) | Interface | Code Language(s) | License | Documentation | Mailing List/Forum | Active | Notes |
|---|---|---|---|---|---|---|---|---|---|
| aeneas | DTW | 30+ | CLI, LIB, Web | Python, C | AGPL | Y | Y | Y | Not based on ASR |
| CMU Sphinx | HMM (own), RNN | 11 | CLI, LIB | C, Java, Python | MIT-like | Y | Y | Y | |
| DARLA | HMM (HTK) | English | Web | ? | ? | Y | N | N? | Based on Prosodylab-Aligner or YouTube ASR |
| FAVE-align | HMM (HTK) | English | CLI, (Web) | Python | GPL | Y | Y | Y | Acoustic models from P2FA; GitHub code is updated more frequently than the Web version |
| Gentle | HMM (Kaldi) | English | CLI, Web | Python | MIT | N | N | Y | Based on Kaldi |
| Julius | HMM (own) | English, Japanese | CLI, LIB | C | MIT-like | Y | Y | N? | |
| Kaldi | HMM (own), DNN, RNN | English | CLI, LIB | C++ | Apache | Y | Y | Y | CUDA support |
| kaldi-dnn-ali-gop | HMM (Kaldi), DNN (Kaldi nnet3) | English | CLI, LIB | Shell Script, C++, Python | GPL | N | N | Y | Works with other languages given Kaldi acoustic models |
| LaBB-CAT | HMM (HTK) | English | Web | Java | GPL | Y | Y | Y | |
| MAUS | HMM (HTK) | 21 | CLI, Web | C | All rights reserved | README | Y | Y | |
| Montreal Forced Aligner | HMM (Kaldi) | English | CLI | Python | MIT | Y | N | Y | Can train other languages |
| Penn Forced Aligner (P2FA) | HMM (HTK) | English | CLI, Web | Python | ? | README, Tutorial | N | N? | |
| Prosodylab-Aligner | HMM (HTK) | English | CLI | Python | MIT | README, Tutorial | N | Y | Can train other languages |
| SailAlign | HMM (HTK) | English, Greek, Spanish | CLI | Perl | GPL | README | N | N? | |
| SPPAS | HMM (Julius) | 12+ | CLI, GUI | Python | GPL | Y | Y | Y | Can train other languages; several plugins |
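
Most tools in the table align with HMMs. In broad strokes, HMM-based forced alignment builds a left-to-right chain of states from the known transcript and runs the Viterbi algorithm to find the most likely assignment of audio frames to those states. The sketch below shows only that Viterbi step and assumes the per-frame log-likelihoods are already given; real tools compute them with trained acoustic models and add pronunciation dictionaries, several states per phone, and optional silences. The function `viterbi_force_align` and its parameters are hypothetical, not the API of HTK, Kaldi, Julius, or any other tool.

```python
import math


def viterbi_force_align(log_emissions, log_stay=math.log(0.5), log_advance=math.log(0.5)):
    """Align T frames to the S states of a left-to-right chain built from the transcript.

    log_emissions[t][s] is log p(frame t | state s), where states are the
    transcript units (e.g. phones) in order. Each state must be used at least
    once and in order, so T >= S is required. Returns a list of
    (state, first_frame, last_frame) segments.
    """
    T, S = len(log_emissions), len(log_emissions[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    neg_inf = -math.inf
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_emissions[0][0]  # the alignment must start in the first state
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s] + log_stay
            advance = score[t - 1][s - 1] + log_advance if s > 0 else neg_inf
            if advance > stay:
                score[t][s], back[t][s] = advance + log_emissions[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + log_emissions[t][s], s
    # The alignment must end in the last state; follow back-pointers to frame 0.
    states = [S - 1]
    for t in range(T - 1, 0, -1):
        states.append(back[t][states[-1]])
    states.reverse()
    # Collapse the per-frame state sequence into contiguous segments.
    segments = []
    for t, s in enumerate(states):
        if segments and segments[-1][0] == s:
            segments[-1] = (s, segments[-1][1], t)
        else:
            segments.append((s, t, t))
    return segments
```

Segment frame indices become times by multiplying by the frame shift (10 ms is a common choice), so frames 2 to 4 would span 0.02 s to 0.05 s.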

Additional Pointers
