Git Product home page Git Product logo

neural_sp's Introduction

Build Status

NeuralSP: Neural network based Speech Processing

How to install

# Set path to CUDA, NCCL
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl

export CPATH=$NCCL_ROOT/include:$CPATH
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
export CPATH=$CUDA_PATH/include:$CPATH  # for warp-rnnt

# Install miniconda, python libraries, and other tools
cd tools
make KALDI=/path/to/kaldi

Key features

Corpus

  • ASR

    • AISHELL-1
    • CSJ
    • Librispeech
    • Switchboard (+ Fisher)
    • TEDLIUM2/TEDLIUM3
    • TIMIT
    • WSJ
  • LM

    • Penn Tree Bank
    • WikiText2

Front-end

  • Frame stacking
  • Sequence summary network [link]
  • SpecAugment [link]
  • Adaptive SpecAugment [link]

Encoder

  • RNN encoder
    • (CNN-)BLSTM, (CNN-)LSTM, (CNN-)BLGRU, (CNN-)LGRU
    • Latency-controlled BLSTM [link]
  • Transformer encoder [link]
    • (CNN-)Transformer
    • Chunk hopping mechanism [link]
    • Relative positional encoding [link]
  • Time-depth separable (TDS) convolution encoder [link] [line]
  • Gated CNN encoder (GLU) [link]
  • Conformer encoder [link]

Connectionist Temporal Classification (CTC) decoder

  • Forced alignment
  • Beam search
  • Shallow fusion

Attention-based decoder

  • RNN decoder
    • Shallow fusion
    • Cold fusion [link]
    • Deep fusion [link]
    • Forward-backward attention decoding [link]
    • Ensemble decoding
  • Streaming RNN decoder
    • Hard monotonic attention [link]
    • Monotonic chunkwise attention (MoChA) [link]
    • CTC-synchronous training (CTC-ST) [link]
  • RNN transducer [link]
  • Transformer decoder [link]
  • Streaming Transformer decoder
    • Monotonic Multihead Attention [link] [link]

Language model (LM)

  • RNNLM (recurrent neural network language model)
  • Gated convolutional LM [link]
  • Transformer LM
  • Transformer-XL LM [link]
  • Adaptive softmax [link]

Output units

  • Phoneme
  • Grapheme
  • Wordpiece (BPE, sentencepiece)
  • Word
  • Word-char mix

Multi-task learning (MTL)

Multi-task learning (MTL) with different units are supported to alleviate data sparseness.

  • Hybrid CTC/attention [link]
  • Hierarchical Attention (e.g., word attention + character attention) [link]
  • Hierarchical CTC (e.g., word CTC + character CTC) [link]
  • Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
  • Forward-backward attention [link]
  • LM objective

ASR Performance

AISHELL-1 (CER)

model dev test
Transformer 5.0 5.4
Streaming MMA 5.5 6.1

CSJ (WER)

model eval1 eval2 eval3
LAS 6.5 5.1 5.6

Switchboard 300h (WER)

model SWB CH
LAS 9.1 18.8

Switchboard+Fisher 2000h (WER)

model SWB CH
LAS 7.8 13.8

Librispeech (WER)

model dev-clean dev-other test-clean test-other
Transformer 2.1 5.3 2.4 5.7
Streaming MMA 2.5 6.9 2.7 7.1

TEDLIUM2 (WER)

model dev test
LAS 10.9 11.2

WSJ (WER)

model test_dev93 test_eval92
LAS 8.8 6.2

LM Performance

Penn Tree Bank (PPL)

model valid test
RNNLM 87.99 86.06
+ cache=100 79.58 79.12
+ cache=500 77.36 76.94

WikiText2 (PPL)

model valid test
RNNLM 104.53 98.73
+ cache=100 90.86 85.87
+ cache=2000 76.10 72.77

Reference

Dependency

neural_sp's People

Contributors

hirofumi0810 avatar elgeish avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.