
LipReading

This is the Keras implementation of Lip2AudSpec: Speech reconstruction from silent lip movements video.

Main Network

Abstract

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip-movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound-generation method, which results in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram, which are then used as the target for our main lip-reading network comprising CNN, LSTM, and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with 98% correlation and also improves the quality of the speech reconstructed by the main lip-reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results for reconstructing intelligible speech with superior word-recognition accuracy.

The full paper for this work can be found here.
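As a rough illustration of the pipeline described in the abstract, here is a minimal Keras sketch of the main lip-reading network: a per-frame CNN, an LSTM over time, and fully connected layers that regress the autoencoder's bottleneck features. The input shape, layer sizes, and bottleneck dimension are illustrative assumptions, not the paper's exact values.

# Minimal sketch of the main network; shapes and layer sizes are assumed.
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, Flatten,
                          LSTM, Dense)

N_FRAMES, H, W = 25, 60, 100   # assumed window length and crop size
BOTTLENECK_DIM = 32            # assumed autoencoder bottleneck size

model = Sequential([
    # The same 2-D CNN is applied to every frame in the window.
    TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same'),
                    input_shape=(N_FRAMES, H, W, 1)),
    TimeDistributed(MaxPooling2D((2, 2))),
    TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same')),
    TimeDistributed(MaxPooling2D((2, 2))),
    TimeDistributed(Flatten()),
    # Model the temporal dynamics of the lip movements.
    LSTM(256, return_sequences=True),
    # Regress the bottleneck features frame by frame.
    TimeDistributed(Dense(BOTTLENECK_DIM, activation='linear')),
])
model.compile(optimizer='adam', loss='mse')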

Requirements

We implemented the code in Python 2 using TensorFlow, Keras, SciPy, NumPy, OpenCV (cv2), scikit-learn, and IPython (fnmatch is part of the standard library). These libraries should be installed before running the code, and all of them can be installed with pip:

pip install tensorflow-gpu keras scipy numpy opencv-python scikit-learn ipython

The backend for Keras can be changed easily if needed.

Data preparation

This study is based on the GRID corpus (http://spandh.dcs.shef.ac.uk/gridcorpus/). To run the code, you first need to download and preprocess both the videos and the audio.

Running prepare_crop_files.py downloads the data and crops the frames with a manual mask. To generate the auditory spectrograms, the audio should be processed with NSLTools (http://www.isr.umd.edu/Labs/NSL/Software.htm) using the wav2aud function in MATLAB.
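For reference, the cropping step amounts to something like the following sketch; the crop box and video path below are hypothetical stand-ins for the manual mask and file layout used in prepare_crop_files.py.

# Sketch of the frame-cropping step (assumed crop box and path).
import cv2

Y1, Y2, X1, X2 = 190, 250, 110, 230      # assumed mouth-region box

cap = cv2.VideoCapture('s1/bbaf2n.mpg')  # placeholder GRID video path
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frames.append(gray[Y1:Y2, X1:X2])    # keep only the lip region
cap.release()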

Since some frames in the dataset are corrupted, create_path.py generates paths to the valid data. The last step before training the network is windowing and integrating all the data into .mat files, which is done by running data_integration.py.
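Conceptually, the windowing step slices each clip into fixed-length windows and writes everything to a .mat file. A minimal sketch, with an assumed window length, stride, and variable names:

# Minimal windowing sketch; window length and stride are assumptions.
import numpy as np
from scipy.io import savemat

def make_windows(frames, win_len=25, stride=5):
    # frames: (n_frames, H, W) array for one clip
    starts = range(0, len(frames) - win_len + 1, stride)
    return np.stack([frames[i:i + win_len] for i in starts])

clip = np.zeros((75, 60, 100), dtype=np.float32)  # placeholder clip
windows = make_windows(clip)
savemat('train_data.mat', {'video_windows': windows})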

Training the models

Once the data preparation steps are done, the autoencoder can be trained on the auditory spectrograms of the valid videos using train_autoencoder.py. The main network can then be trained using train_main.py.
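For orientation, here is a hedged sketch of what train_autoencoder.py sets up: a dense autoencoder over spectrogram frames whose bottleneck provides the targets for the main network. The layer sizes are assumptions, and the corr2_mse_loss shown (MSE minus mean correlation) is only an illustrative guess at the repo's custom loss, not its actual definition.

# Hedged autoencoder sketch; sizes and the loss definition are assumptions.
import keras.backend as K
from keras.models import Model
from keras.layers import Input, Dense

SPEC_DIM = 128        # assumed number of auditory-spectrogram channels
BOTTLENECK_DIM = 32   # assumed bottleneck size

def corr2_mse_loss(y_true, y_pred):
    # Illustrative guess: penalize MSE while rewarding correlation.
    yt = y_true - K.mean(y_true, axis=-1, keepdims=True)
    yp = y_pred - K.mean(y_pred, axis=-1, keepdims=True)
    corr = K.sum(yt * yp, axis=-1) / (K.sqrt(
        K.sum(K.square(yt), axis=-1) * K.sum(K.square(yp), axis=-1))
        + K.epsilon())
    return K.mean(K.square(y_true - y_pred)) - K.mean(corr)

inp = Input(shape=(SPEC_DIM,))
code = Dense(BOTTLENECK_DIM, activation='relu', name='bottleneck')(inp)
out = Dense(SPEC_DIM, activation='linear')(code)

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss=corr2_mse_loss)
# The saved model is what data_integration.py loads (see the issue below).
autoencoder.save('autoencoder.h5')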

Demo

You can find all demo files here.

A few samples of the network output are given below:

Speaker 1

Sample1

Speaker 29

Sample2

Cite

If you found this work/code helpful, please cite:

@article{akbari2017lip2audspec,
  title={Lip2AudSpec: Speech reconstruction from silent lip movements video},
  author={Akbari, Hassan and Arora, Himani and Cao, Liangliang and Mesgarani, Nima},
  journal={arXiv preprint arXiv:1710.09798},
  year={2017}
}


Issues

Will the unseen model predict for any video content, or only content from GRID.txt?

  • Why is the format for both prediction and training defined as command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4)?

  • Will it work for any video I use for prediction with the unseen-model weights? (As per my understanding, it extracts the lip region using dlib and then tries to map the visual content to a word-conversion model?)

Problem with data_integration file

I get the following error during the data integration process:

File "data_integration.py", line 55, in <module> model=load_model('autoencoder.h5',custom_objects={'corr2_mse_loss': corr2_mse_loss})
...
IOError: Unable to open file (unable to open file: name = 'autoencoder.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

It seems that there is no autoencoder.h5 file. Where will I find it?
All the previous preparation steps went well with a few changes. Please help me with this.

prepare_crop_files.py unclear

Hello,
As is, it is clear that prepare_crop_files.py will not work as intended with the download commands commented out. However, this makes it unclear which other lines, if any, should be uncommented to achieve the intended preparation. It would be helpful if you updated this so it works as intended without further modification.
Thanks!
