Character-level speech recognizer using CTC loss with deep RNNs in TensorFlow.
### About
This is an ongoing project, working towards an implementation of the character-level incremental speech recognizer (ISR) detailed in the paper by Kyuyeon Hwang and Wonyong Sung. It works at the character level, using one deep RNN trained with CTC loss as the acoustic model and another deep RNN trained as a character-level language model. The acoustic model reads in log mel frequency filterbank feature vectors (40-dim inputs).
The audio signal processing is done using librosa.
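For reference, extracting such features with librosa takes only a few lines; the sample rate, window and hop sizes below are assumptions for illustration, not necessarily what this repo uses:

```python
import librosa
import numpy as np

# Load the audio; 16 kHz mono is assumed here.
signal, sr = librosa.load("path_to_file.wav", sr=16000)

# 40-band mel filterbank, 25 ms windows with a 10 ms hop at 16 kHz.
mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=40,
                                     n_fft=400, hop_length=160)
features = np.log(mel + 1e-8).T  # one 40-dim log-energy vector per frame
```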
Currently only the acoustic model has been completed, and it still lacks a well-trained example. One pre-trained example is available here and can be tried on any file (your own recorded voice, for example).
The character-level language model is still in the works.
### Data
The datasets currently supported are:
- LibriSpeech by Vassil Panayotov
- Shtooka
- Vystadial 2013
- TED-LIUM
The data is fed through two pipelines, one for testing, and the other for training.
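For intuition, the split might look something like the sketch below; this is purely illustrative, and the repo's actual pipeline code is more involved:

```python
import random

def make_pipelines(samples, test_fraction=0.1, seed=42):
    """Split (features, transcript) pairs into training and testing pipelines."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    split = int(len(samples) * test_fraction)
    test_set, train_set = samples[:split], samples[split:]

    def batches(dataset, batch_size=32):
        # Yield fixed-size batches; the last, smaller batch is kept too.
        for i in range(0, len(dataset), batch_size):
            yield dataset[i:i + batch_size]

    return batches(train_set), batches(test_set)
```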
### How to Run
#### Install dependencies
##### Required
- TensorFlow (>= 0.12RC1)
- librosa
Install TensorFlow by following the website documentation. GPU support is not mandatory but strongly recommended if you intend to train the RNN.
Install the other required dependencies by running:

```
pip3 install -r requirements.txt
```
##### Optional
- sox (for live transcript only), install with `sudo apt-get install sox` or `brew install sox --with-flac`
- libcupti (for timeline only), install with `sudo apt-get install libcupti-dev`
- pyaudio (for live transcript only), install with `sudo apt-get install python3-pyaudio`
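For the curious, live transcription with pyaudio boils down to reading microphone frames in a loop and feeding them to the model; a minimal capture sketch follows, where the chunk size and sample rate are assumptions:

```python
import pyaudio

RATE = 16000   # sample rate assumed here; match your model's features
CHUNK = 1024   # frames read per call

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

frames = []
for _ in range(int(RATE / CHUNK * 3)):  # capture ~3 seconds of audio
    frames.append(stream.read(CHUNK))

stream.stop_stream()
stream.close()
p.terminate()
audio_bytes = b"".join(frames)  # raw 16-bit PCM, ready for feature extraction
```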
#### Run data preparation script
I've prepared a bash script to download LibriSpeech (~700 MB) and extract the data to the right place:

```
$ chmod +x prepare_data.sh
$ ./prepare_data.sh
```

It will remove the tar files after downloading and extracting them.
All hyperparameters for the network are defined in `config.ini`. A different config file can be fed to the training program like so:

```
$ python stt.py --config_file="different_config_file.ini"
```

You should ensure it follows the same format as the file provided.
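For illustration only, a config file in this format might look like the snippet below; the section and key names are invented for the example, so check the provided `config.ini` for the actual schema:

```ini
[nn]
num_layers = 3
hidden_size = 512

[training]
batch_size = 32
learning_rate = 0.001
max_epochs = 50
```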
#### Running Optimizer
Once your dependencies are set up and the data has been downloaded and extracted into the appropriate location, the optimizer can be started by running:

```
$ python stt.py --train
```

Dynamic RNNs are used because memory consumption on the fully unrolled network was massive and the model took 30 minutes to build. Unfortunately this comes at a cost in speed, but I think the tradeoff is worth it in this case, as the model can now fit on a single GPU.
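To make the tradeoff concrete, here is a rough sketch of a dynamic-RNN acoustic model trained with CTC loss, written against the TF 1.x API; the layer count, cell size and variable names are illustrative, not this repo's actual graph:

```python
import tensorflow as tf

num_features, num_classes = 40, 29  # 40-dim filterbanks; chars + blank (illustrative)

inputs = tf.placeholder(tf.float32, [None, None, num_features])  # [batch, time, 40]
seq_len = tf.placeholder(tf.int32, [None])                       # frames per utterance
labels = tf.sparse_placeholder(tf.int32)                         # character targets

# Stacked LSTM unrolled dynamically, so the graph stays small and builds fast.
cells = [tf.contrib.rnn.LSTMCell(512) for _ in range(3)]
outputs, _ = tf.nn.dynamic_rnn(tf.contrib.rnn.MultiRNNCell(cells), inputs,
                               sequence_length=seq_len, dtype=tf.float32)

logits = tf.layers.dense(outputs, num_classes)   # per-frame character scores
logits = tf.transpose(logits, [1, 0, 2])         # ctc_loss expects time-major
loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, seq_len))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```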
#### Running the network
You can also use a trained network to process a WAV file:

```
$ python stt.py --file "path_to_file.wav"
```

The result will be printed on standard output. At this time only the acoustic model does the processing, so the result can look odd.
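Under the hood, turning the acoustic model's output into text amounts to running a CTC decoder over the per-frame logits. A greedy-decoding sketch with the TF 1.x API (the class count and variable names are illustrative):

```python
import tensorflow as tf

# Time-major logits from the acoustic model: [max_time, batch, num_classes].
logits = tf.placeholder(tf.float32, [None, None, 29])
seq_len = tf.placeholder(tf.int32, [None])  # frames per utterance

decoded, neg_log_prob = tf.nn.ctc_greedy_decoder(logits, seq_len)
dense = tf.sparse_tensor_to_dense(decoded[0], default_value=-1)
# Each row of `dense` holds character indices; map them back to characters
# with the same alphabet the model was trained on.
```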
#### Analysing performance
You can add the `--timeline` option to produce a timeline file and see how everything is going. The resulting file is overwritten at each step. It can be opened with Chrome by navigating to chrome://tracing/ and loading the file.
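This relies on TensorFlow's tracing support; roughly, producing such a Chrome trace looks like the sketch below (the traced op and the output filename are illustrative):

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Any op works for demonstration; in stt.py this would be the training step.
op = tf.matmul(tf.random_normal([512, 512]), tf.random_normal([512, 512]))

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(op, options=run_options, run_metadata=run_metadata)
    trace = timeline.Timeline(run_metadata.step_stats)
    with open("timeline.json", "w") as f:  # load this via chrome://tracing/
        f.write(trace.generate_chrome_trace_format())
```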
### Project Road Map
With verification and testing performed at every step:
- Build character-level RNN code
- Add CTC beam search
- Wrap acoustic model and language model into general 'Speech Recognizer'
- Add ability for human to sample and test
Ultimately, I'd like to work towards bridging this with my other project, neural-chatbot, to make an open-source natural conversational engine.
### License
MIT
### References
- "LibriSpeech: an ASR corpus based on public domain audio books", Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, ICASSP 2015.
- Shtooka: http://shtooka.net
- Korvas, Matěj; Plátek, Ondřej; Dušek, Ondřej; Žilka, Lukáš and Jurčíček, Filip, 2014, Vystadial 2013 – Czech data, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University in Prague, http://hdl.handle.net/11858/00-097C-0000-0023-4670-6.
- A. Rousseau, P. Deléglise, and Y. Estève, "Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks", in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), May 2014.