pyfoal

Python forced alignment

This is a modified implementation of the Penn Phonetic Forced Aligner (P2FA) [1]. Relative to the original implementation, this repo provides the following.

  • Support for Python 3
  • Support for performing forced alignment both in Python and on the command-line
  • Fewer alignment failures due to, e.g., out-of-vocabulary (OOV) words or punctuation
  • Direct integration with pypar, a feature-rich phoneme alignment representation
  • Multiprocessing for quickly aligning speech datasets
  • Clean, documented code

Installation

Hidden Markov Model Toolkit (HTK)

pyfoal depends on HTK and has been tested on macOS and Linux using HTK version 3.4.0. There are known issues with version 3.4.1 on Linux. HTK is released under a license that prohibits redistribution, so you must install HTK yourself and verify that the commands HCopy and HVite are available as system-wide binaries. After downloading HTK, I use the following to install it on Linux.

sudo apt-get install -y gcc-multilib libx11-dev
sudo chmod +x configure
./configure --disable-hslab
make all
sudo make install

For more help with HTK installation, see notes by Jaekoo Kang and Steve Rubin.
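To confirm that HCopy and HVite ended up on the PATH after installation, a small check along these lines may help. This is a minimal sketch that uses only the Python standard library; the helper name `missing_tools` is hypothetical and is not part of pyfoal or HTK.

```python
import shutil

# HTK command-line tools that must be available as system-wide binaries.
REQUIRED_HTK_TOOLS = ("HCopy", "HVite")


def missing_tools(tools=REQUIRED_HTK_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("HCopy and HVite are both available")
```

If either tool is reported as missing, the HTK install step likely did not place its binaries in a directory on the PATH.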

Python dependencies

pip install pyfoal

Usage

Force-align text and audio

alignment = pyfoal.align(text, audio, sample_rate)

text is a string containing the speech transcript. audio is a 1D numpy array containing the speech audio. sample_rate is the sampling rate of audio.

Force-align from files

# Return the resulting alignment
alignment = pyfoal.from_file(text_file, audio_file)

# Save alignment to json
pyfoal.from_file_to_file(text_file, audio_file, output_file)

If you need to align many files, use from_files_to_files, which accepts lists of files and uses multiprocessing.
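When aligning a whole dataset, the file lists usually have to be assembled first. The sketch below pairs transcripts with audio by filename stem and derives one json output path per pair. The helper `pair_files`, the `.txt` and `.wav` extensions, and the directory layout are all assumptions made for illustration; the argument order shown for from_files_to_files in the usage note is likewise assumed to mirror from_file_to_file and should be checked against the pyfoal source.

```python
from pathlib import Path


def pair_files(text_dir, audio_dir, output_dir):
    """Build parallel lists of text, audio, and output paths matched by stem."""
    text_dir = Path(text_dir)
    audio_dir = Path(audio_dir)
    output_dir = Path(output_dir)

    text_files, audio_files, output_files = [], [], []
    # Sort so that the resulting lists have a reproducible order.
    for text_file in sorted(text_dir.glob("*.txt")):
        audio_file = audio_dir / (text_file.stem + ".wav")
        if not audio_file.exists():
            continue  # Skip transcripts that have no matching audio file
        text_files.append(text_file)
        audio_files.append(audio_file)
        output_files.append(output_dir / (text_file.stem + ".json"))
    return text_files, audio_files, output_files
```

The three lists could then be passed on, for example as `pyfoal.from_files_to_files(text_files, audio_files, output_files)`, under the assumption stated above.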

Command-line interface

usage: python -m pyfoal
    [-h]
    --text TEXT [TEXT ...]
    --audio AUDIO [AUDIO ...]
    --output OUTPUT [OUTPUT ...]

optional arguments:
  -h, --help            show this help message and exit
  --text TEXT [TEXT ...]
                        The speech transcript files
  --audio AUDIO [AUDIO ...]
                        The speech audio files
  --output OUTPUT [OUTPUT ...]
                        The json files to save the alignments
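To drive the command-line interface from a script, the argument list can be assembled programmatically. This is a minimal sketch using only the flags shown in the usage text above; the helper name `build_command` is hypothetical, and the assumption that the three lists are matched by position is not confirmed by the usage text.

```python
import sys


def build_command(text_files, audio_files, output_files):
    """Return the argv list for a single `python -m pyfoal` invocation."""
    if not (len(text_files) == len(audio_files) == len(output_files)):
        raise ValueError("text, audio, and output lists must be the same length")
    if not text_files:
        raise ValueError("at least one file is required")
    return [
        sys.executable, "-m", "pyfoal",
        "--text", *map(str, text_files),
        "--audio", *map(str, audio_files),
        "--output", *map(str, output_files),
    ]
```

The returned list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with filenames that contain spaces.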

Tests

Tests can be run as follows.

pip install pytest
pytest

References

[1] J. Yuan and M. Liberman, “Speaker identification on the SCOTUS corpus,” Journal of the Acoustical Society of America, vol. 123, p. 3878, 2008.
