The reference implementation for the paper
End-to-End Attention-based Large Vocabulary Speech Recognition.
Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio.
(arXiv draft, submitted to ICASSP 2016).
- install all the dependencies (see the list below)
- set your environment variables by calling
      source env.sh
Then proceed to exp/wsj for instructions on how to replicate our results
on the Wall Street Journal (WSJ) dataset (available from the Linguistic
Data Consortium as LDC93S6B and LDC94S13B).
- Python packages: pykwalify, toposort, pyyaml, numpy, pandas, pyfst
- kaldi
- kaldi-python
Given that you have the dataset in HDF5 format, the models can be trained without Kaldi.
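A quick way to confirm that the Python packages listed above are importable is sketched below. The mapping from package name to module name is an assumption (for example, pyyaml is normally imported as yaml, and pyfst is assumed to be imported as fst); adjust it if your installation differs. Kaldi and kaldi-python are not checked here.

```python
import importlib

# Assumed mapping: package name -> importable module name.
# These module names are not stated in this README.
REQUIRED = {
    "pykwalify": "pykwalify",
    "toposort": "toposort",
    "pyyaml": "yaml",
    "numpy": "numpy",
    "pandas": "pandas",
    "pyfst": "fst",
}


def missing_packages(required=REQUIRED):
    """Return a sorted list of package names whose module fails to import."""
    missing = []
    for package, module in required.items():
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return sorted(missing)


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages: " + (", ".join(missing) if missing else "none"))
```

Run it in the same shell in which you sourced env.sh, so that the check sees the same environment as the experiments.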
The repository contains custom modified versions of Theano, Blocks, Fuel,
picklable-itertools and Blocks-extras as subtrees (please follow this link
for more information about subtrees). To ensure that these specific
versions are used, we recommend uninstalling any regular installations of
these packages, in addition to sourcing env.sh.
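To see which copy of each package Python actually picks up, a minimal sketch such as the following can help. It assumes that env.sh puts the bundled subtrees on the import path, that the script is run from the repository root, and that the module names are theano, blocks, fuel, picklable_itertools and blocks_extras; none of this is stated in this README, so treat the names as illustrative.

```python
import importlib
import os


def module_location(name):
    """Return the directory a module is imported from, or None if unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    path = getattr(module, "__file__", None)
    return os.path.dirname(os.path.realpath(path)) if path else None


def is_under(path, root):
    """True if path is root itself or lies somewhere below root."""
    if path is None:
        return False
    root = os.path.realpath(root)
    # os.path.join(root, "") appends a trailing separator exactly once.
    return path == root or path.startswith(os.path.join(root, ""))


if __name__ == "__main__":
    repo_root = os.getcwd()  # assumption: run from the repository root
    # Assumed module names for the bundled packages.
    for name in ("theano", "blocks", "fuel", "picklable_itertools", "blocks_extras"):
        location = module_location(name)
        if location is None:
            status = "not importable"
        elif is_under(location, repo_root):
            status = "bundled copy"
        else:
            status = "external installation"
        print("%s: %s (%s)" % (name, status, location))
```

Any package reported as an external installation is a candidate for uninstalling, as recommended above.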
MIT