
Speech recognition with RNN and CTC


This repository is a part of TDAT3001 Bachelor Thesis in Computer Engineering at NTNU, project number 61:

"End-to-end speech recognition with recurrent neural networks and connectionist temporal classification" by Anita Kristine Aune and Marit Sundet-Holm, 2018.

The purpose of this project was to test different neural networks' performance on speech recognition, using recurrent neural networks (RNN) and connectionist temporal classification (CTC).

During this project we tested various network models for speech recognition. The resulting logs and saved models can be found in train-logs.

This project uses Python 2.7, TensorFlow version 1.6.1 and Keras version 2.1.5.

This installation guide is for macOS and Ubuntu. TensorFlow also supports Windows, but this project is not tested on Windows.

  1. Install Python
    Download and install Python 2.7 from the Python download page.

  2. Install TensorFlow
    Install TensorFlow for Python 2.7 in a virtual environment, following the TensorFlow installation guide.
    If possible, GPU installation is recommended, as it speeds up training significantly.

  3. Install requirements
    Fork and download or clone the project, and enter the downloaded directory:

    $ git clone https://github.com/holm-aune-bachelor2018/ctc.git
    $ cd ctc
    

    Ensure that the python environment where you installed TensorFlow is active and install the requirements:

    $ source /home/<user>/tensorflow/bin/activate
    (tensorflow) $ pip install -r requirements.txt
    

NOTE - multi GPU training
As of Keras version 2.1.5, there is a bug when saving the model during training while using multi_gpu_model().
Please refer to this Multi-GPU Model Keras guide for how to save and load a multi-GPU model, including a workaround for the bug.


Download LibriSpeech

Ensure that the TensorFlow environment is active and that you are in the root project directory (ctc/). The following command downloads 55 GB of speech data into the data_dir directory:

(tensorflow) $ import_librispeech.py data_dir 


Running training
If using TensorFlow with CPU or 1 GPU, to run the training with default parameters, simply do:

(tensorflow) $ train.py

This sets up training with the default BRNN model, using a small amount of data for testing.

Example BRNN
Setting up a BRNN network, with 512 units, training on batch_size=64, epoch_len=256
That is, 64x256=16384 files or ~25 hours of data on the train-clean-360 dataset
Train for epochs = 50
Save the .csv log file as "logs/brnn_25hours"
Save the model every 10 epochs at "models/brnn_25hours.h5"

(tensorflow) $ train.py --units=512 --batch_size=64 --epoch_len=256 --epochs=50 --model_type='brnn' --model_save='models/brnn_25hours.h5' --log_file='logs/brnn_25hours'  
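The file count quoted in this example follows directly from the two flags. A minimal sketch of the arithmetic is shown below; the helper name `files_per_epoch` is hypothetical and not part of the project, and the average clip length is only what the README's "~25 hours" figure implies, not a measured value.

```python
def files_per_epoch(batch_size, epoch_len):
    """Files seen in one epoch: files per batch times batches per epoch."""
    return batch_size * epoch_len


# Values from the BRNN example above.
n_files = files_per_epoch(batch_size=64, epoch_len=256)
print(n_files)  # 16384

# Average clip length implied by the "~25 hours" estimate (an approximation).
approx_hours = 25
avg_seconds = approx_hours * 3600.0 / n_files
print(round(avg_seconds, 1))  # 5.5
```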

Example loading
To continue training the same model for another 50 epochs, use the model_load argument:

(tensorflow) $ train.py --model_load='models/brnn_25hours.h5' --units=512 --batch_size=64 --epoch_len=256 --epochs=50 --model_save='models/continued_brnn_25hours.h5' --log_file='logs/continued_brnn_25hours'  

Parallel GPU training
If running on multiple GPUs, enable multiGPU training:

(tensorflow) $ train.py --num_gpu=2

Must be an even number of GPUs.

Example CuDNNLSTM
ONLY WORKS WITH GPU
With the GPU TensorFlow backend, you may wish to try the CuDNN-optimised LSTM:

(tensorflow) $ train.py --model_type=blstm --cudnn --units=512 --batch_size=64 --epoch_len=256 --epochs=50 --model_save='models/blstm_25hours.h5' --log_file='logs/blstm_25hours'

train.py is used to train models. predict.py loads already trained models and produces predictions.

Parameters for train.py:

Training params

--batch_size: Number of files in one batch. Default=32
--epoch_len: Number of batches per epoch. 0 trains on full dataset. Default=32
--epochs: Number of epochs to train. Default=10
--lr: Learning rate. Default=0.0001
--log_file: Path to log stats to .csv file. Default='logs'

Multi GPU or single GPU / CPU training

--num_gpu: Number of GPUs for training. 0 or 1 sets up training for CPU or for one GPU.
           For multi-GPU training, this must be an even number larger than 1. Default=1
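The rule for this flag can be written as a small check. This is only a sketch of the constraint as described here; `check_num_gpu` is a hypothetical helper, and train.py may validate the value differently or not at all.

```python
def check_num_gpu(num_gpu):
    """Return True for multi-GPU training, False for single GPU / CPU.

    Raises ValueError for GPU counts that the description above rules out.
    """
    if num_gpu in (0, 1):
        return False  # CPU or single-GPU training
    if num_gpu > 1 and num_gpu % 2 == 0:
        return True  # even number larger than 1
    raise ValueError(
        "num_gpu must be 0, 1, or an even number larger than 1, got %r" % num_gpu
    )


print(check_num_gpu(1))  # False
print(check_num_gpu(2))  # True
```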

Preprocessing params

--feature_type: What features to extract: mfcc, spectrogram. Default='mfcc'
--mfccs: Number of mfcc features per frame to extract. Default=26
--mels: Number of mels to use in feature extraction. Default=40

Model params

--model_type: What model to train: brnn, blstm, deep_rnn, deep_lstm, cnn_blstm. Default='brnn'
--units: Number of hidden nodes. Default=256
--dropout: Set dropout value (0-1). Default=0.2
--layers: Number of recurrent or deep layers. Default=1
--cudnn: Include to use cudnn optimized LSTM.

Saving and loading model params

--model_save: Path, where to save model.
--checkpoint: No. of epochs before save during training. Default=10
--model_load: Path of existing model to load. If empty creates new model.
--load_multi: Include to load multi gpu model (saved during parallel GPU training).

Additional training settings

--save_best_val: Include to save additional version of model if val_loss improves.
--shuffle_indexes: Include to shuffle batches after each epoch. 
--reduce_lr: Include to reduce the learning rate if model stops improving val_loss.
--early_stopping: Include to stop the training early if val_loss stops improving.

[Figure: overall structure of the project]

  • train.py sets up network training.

  • models.py sets up the model build.

  • data.py generates a DataFrame containing filename (path to audio files), filesize and transcripts.

  • DataGenerator.py supplies the fit_generator() in train.py with batches of data during training.

  • LossCallback.py is used by the fit_generator() in train.py to calculate WER and save the model and logs during training.

  • The remaining files are various utilities.

Additionally, predict.py loads a trained model and creates prediction samples. It can also calculate WER.
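For readers unfamiliar with WER (word error rate), the sketch below computes it in the commonly used way: the word-level edit distance between reference and prediction, divided by the number of reference words. It illustrates the metric only; the function name is hypothetical and the code is not taken from LossCallback.py or predict.py, which may compute WER differently.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / float(len(ref))


print(word_error_rate("the cat sat on the mat", "the cat sat"))  # 0.5
```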


This file is part of Speech recognition with CTC in Keras.

The project is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The project is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this project. If not, see http://www.gnu.org/licenses/.

Contributors: anitakra, marith


ctc's Issues

problem with epoch length

Hi,

I can only run the program with epoch_len=2. As soon as I increase it, even to just 5, it stops with the following error, so I cannot train the model on the entire dataset in each epoch.

Error:

tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at ctc_loss_op.cc:206 : Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 29 labels: 26,9,16,0,28,5,0,3,15,4,0 labels seen so far: 26,9,16,0

Thanks in advance
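One possible reading of this message, offered as an assumption rather than a confirmed diagnosis: TensorFlow's CTC loss reserves the largest class index (num_classes - 1, here 28) for the blank label, so a real label equal to 28 is treated as the end of the sequence and the labels after it trigger the error. The sketch below checks a label sequence for such values; `find_invalid_labels` is a hypothetical helper and not part of this project.

```python
def find_invalid_labels(labels, num_classes):
    """Return (position, value) pairs for labels outside [0, num_classes - 2].

    Assumes the blank label is num_classes - 1, as in TensorFlow's CTC loss.
    """
    blank = num_classes - 1
    return [(i, v) for i, v in enumerate(labels) if v < 0 or v >= blank]


# Label sequence and num_classes copied from the error message above.
labels = [26, 9, 16, 0, 28, 5, 0, 3, 15, 4, 0]
print(find_invalid_labels(labels, num_classes=29))  # [(4, 28)]
```

If this reading is right, batches that happen to contain the offending character would fail while others pass, which might explain why a very small epoch_len appears to work.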

Have trouble loading models

I trained and saved a model successfully, but when I load the model back, it gives the error below. I am working on Windows 10 with TensorFlow r1.10.

[<tf.Tensor 'classifier/Reshape_1:0' shape=(?, ?, 47) dtype=float32>, <tf.Tensor 'labels:0' shape=(?, ?) dtype=float32>, <tf.Tensor 'input_length:0' shape=(?, 1) dtype=float32>, <tf.Tensor 'label_length:0' shape=(?, 1) dtype=float32>]
XXX lineno: 358, opcode: 47
Traceback (most recent call last):
  File "D:\11785\HW3\hw3p2-data-V2\runner.py", line 332, in <module>
    train()
  File "D:\11785\HW3\hw3p2-data-V2\runner.py", line 299, in train
    net = tf.keras.models.load_model("models/model_every_ckpt_best", custom_objects=custom_objects)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\saving.py", line 229, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\saving.py", line 306, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\layers\serialization.py", line 64, in deserialize
    printable_module_name='layer')
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\utils\generic_utils.py", line 173, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\network.py", line 1219, in from_config
    process_node(layer, node_data)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\network.py", line 1179, in process_node
    layer(input_tensors, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 736, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\keras\layers\core.py", line 715, in call
    return self.function(inputs, **arguments)
  File "D:/11785/HW3/hw3p2-data-V2/modules.py", line 358, in ctc_lambda_func
    y_pred, labels, input_length, label_length = args
SystemError: unknown opcode
[Finished in 26.4s with exit code 1]

It seems the Lambda layer is causing this problem. Do you know why?

How to predict?

I use predict.py to predict on a file, but the plaintext is not Unicode text. It looks like this:
INFO:tensorflow:Inference results: [{'decoded': array([8, 8, 8]), 'plaintext': b'hhh'}]

How can I get the prediction as UTF-8 (Unicode) text?
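The `b'...'` prefix in the log line indicates a Python bytes object, which can be decoded into a Unicode string. A minimal sketch, assuming the result is a dict shaped like the one logged above (the `result` variable is a stand-in written by hand, not output produced by predict.py):

```python
# Hand-written stand-in for one inference result from the log line above.
result = {'decoded': [8, 8, 8], 'plaintext': b'hhh'}

text = result['plaintext']
if isinstance(text, bytes):
    # Decode the raw bytes into a Unicode string.
    text = text.decode('utf-8')

print(text)  # hhh
```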
