LipReading

Main repository for LipReading with Deep Neural Networks

Introduction

The goal is to implement LipReading: just as end-to-end speech recognition systems map high-fidelity speech audio to sensible character- and word-level outputs, we will do the same for "speech visuals". In particular, we take video frames as input, extract the relevant mouth/chin signals, and map them to characters and words.

Overview

TODO

A high-level overview of some TODO items. For more project details, please see the GitHub project.

  • Download Data (926 videos)
  • Build Vision Pipeline (1 week) in review
  • Build NLP Pipeline (1 week) wip
  • Build Loss Fn and Training Pipeline (2 weeks) wip
  • Train 🚋 and Ship 🚢 wip

Architecture

There are two primary interconnected pipelines: a "vision" pipeline for extracting face and lip features from video frames, and an "NLP-inspired" pipeline for temporally correlating the sequential lip features into the final output.

Here's a quick dive into the tensor dimensionalities:

Vision Pipeline

Video -> Frames       -> Face Bounding Box Detection      -> Face Landmarking    
Repr. -> (n, y, x, c) -> (n, (box=1, y_i, x_i, w_i, h_i)) -> (n, (idx=68, y, x))   
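
To make these shapes concrete, here is a minimal sketch of the vision pipeline using dlib's 68-point face landmarker. This is an assumption for illustration only: the repository's actual detector and landmarker may differ, and the shape_predictor_68_face_landmarks.dat model file must be downloaded from dlib.net separately.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Pre-trained 68-point model (assumed local path; available from dlib.net).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_frames(frames):
    """frames: (n, y, x, c) uint8 array -> list of (68, 2) landmark arrays."""
    out = []
    for frame in frames:
        boxes = detector(frame, 1)          # face bounding boxes, as in (box=1, ...)
        if not boxes:
            out.append(None)                # no face detected in this frame
            continue
        shape = predictor(frame, boxes[0])  # 68 landmarks for the first face
        # Store as (y, x) pairs to match the (idx=68, y, x) representation above.
        out.append(np.array([(p.y, p.x) for p in shape.parts()]))
    return out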

NLP Pipeline

 -> Letters  ->  Words    -> Language Model 
 -> (chars,) ->  (words,) -> (sentences,)
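
Since training uses CTCLoss (see Setup below), the per-frame character predictions can be collapsed into words with a CTC-style greedy decode. A minimal illustrative sketch with a hypothetical alphabet, not the repository's actual decoder:

import itertools

# Hypothetical character inventory; index 0 is the CTC blank symbol.
ALPHABET = ["-"] + list("abcdefghijklmnopqrstuvwxyz '")

def greedy_ctc_decode(frame_char_ids):
    """Collapse repeats, drop blanks: per-frame ids -> (chars,) -> (words,)."""
    chars = [ALPHABET[i] for i, _ in itertools.groupby(frame_char_ids)]
    text = "".join(c for c in chars if c != "-")
    return text.split()

print(greedy_ctc_decode([8, 8, 0, 5, 12, 12, 0, 12, 15]))  # -> ['hello']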

Datasets

  • all: 926 videos (projected, not generated yet)
  • large: 464 videos (failed at 35/464)
  • medium: 104 videos (currently at 37/104)
  • small: 23 videos
  • micro: 6 videos
  • nano: 1 video

Setup

  1. Clone this repository and install the requirements below. We will be using Python 3.

Before running any Python scripts, make sure your PYTHONPATH includes the repository root (./) and that the workspace environment variable is set (see step 2).

git clone git@github.com:joseph-zhong/LipReading.git
# (optional) set up a venv: cd LipReading; python3 -m venv .
  2. Once the repository is cloned, set the repository's PYTHONPATH and workspace environment variable to take advantage of the standardized directory utilities in ./src/utils/utility.py

Copy the following into your ~/.bashrc

export PYTHONPATH="$PYTHONPATH:/path/to/LipReading/" 
export LIP_READING_WS_PATH="/path/to/LipReading/"
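
After sourcing your ~/.bashrc, a quick illustrative way to verify the environment from Python (this is a sanity-check sketch, not a repository script):

import os
import sys

# Check the workspace variable and PYTHONPATH configured above.
assert os.environ.get("LIP_READING_WS_PATH"), "LIP_READING_WS_PATH is not set"
assert any(p.rstrip("/").endswith("LipReading") for p in sys.path), \
    "LipReading repository root is not on PYTHONPATH"
print("workspace:", os.environ["LIP_READING_WS_PATH"])
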
  3. Install the requirements, which include PyTorch (with CTCLoss), SpaCy, and others.

On macOS, for CPU-only support:

pip3 install -r requirements.macos.txt

On Ubuntu, for GPU support:

pip3 install -r requirements.ubuntu.txt

SpaCy Setup

We need to install a pre-built English model for some of the NLP capabilities:

python3 -m spacy download en
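
A quick check that the model loads (spaCy 2.x accepts the "en" shortcut; newer releases name the small English model "en_core_web_sm"):

import spacy

# Load the pre-built English model installed above.
nlp = spacy.load("en")
doc = nlp("reading lips is harder than it looks")
print([token.text for token in doc])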

Data Directories Structure

This allows us to have a simple standardized directory structure for all our datasets, raw data, model weights, logs, etc.

./data/
  --/datasets (numpy dataset files for dataloaders to load)
  --/raw      (raw caption/video files extracted from online sources)
  --/weights  (model weights, both for training/checkpointing/running)
  --/tb       (Tensorboard logging)
  --/...

See ./src/utils/utility.py for more.
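
For illustration, a helper in the spirit of ./src/utils/utility.py might resolve paths under ./data/ from the workspace variable; the actual function names in the repository may differ:

import os

# Workspace root, set during Setup above.
WS_PATH = os.environ["LIP_READING_WS_PATH"]

def data_dir(*parts):
    """Resolve a path under ./data/, e.g. data_dir("weights", "my-model")."""
    return os.path.join(WS_PATH, "data", *parts)

print(data_dir("datasets", "StephenColbert", "nano"))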

Getting Started

Now that the dependencies are all set up, we can finally do stuff!

Configuration

Each of our "standard" scripts in ./src/scripts (i.e. not ./src/scripts/misc) accepts standard argparse-style arguments. For each of the "standard" scripts, you can pass --help to see the expected arguments. To maintain reproducibility, command-line arguments can be written in a raw text file with one argument per line.

e.g., the contents of ./config/gen_dataview/nano,

--inp=StephenColbert/nano 

represent the arguments to pass to ./src/scripts/generate_dataview.py, which can be applied automatically via

./src/scripts/generate_dataview.py $(cat ./config/gen_dataview/nano)

The arguments are applied in left-to-right order, so repeated arguments are overwritten by the later settings. This allows for modularity in configuring hyperparameters.

(For demonstration purposes, not a working example)

./src/scripts/train.py \
    $(cat ./config/dataset/large) \
    $(cat ./config/train/model/small-model) \
    $(cat ./config/train/model/rnn/lstm) \
    ...
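
To see why later settings win, here is an illustrative argparse snippet (the flag names besides --inp are hypothetical): with the default store action, a flag given twice keeps its last value, so concatenated config files compose cleanly.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--inp")
parser.add_argument("--batch_size", type=int)  # hypothetical hyperparameter flag

# Simulates $(cat base-config) $(cat override-config) on the command line.
args = parser.parse_args(
    ["--inp=StephenColbert/nano", "--batch_size=16", "--batch_size=32"])
print(args.batch_size)  # 32: the later setting overwrites the earlier one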

Train Model

Kick off training with the training script:

./src/scripts/train.py

Examples

Training on Micro

./src/scripts/train_model.py $(cat ./config/train/micro)

Tensorboard Visualization

See README_TENSORBOARD.md

Other Resources

This is a collection of external links, papers, projects, and other potentially helpful starting points for the project.

Other Projects

Other Academic Papers

Academic Datasets
