Sign Language Recognition

  • This prototype "understands" sign language for deaf people
  • Includes all code to prepare data (e.g. from the ChaLearn dataset), extract features, train the neural network, and predict signs during a live demo
  • Based on deep learning techniques, in particular convolutional neural networks (including a state-of-the-art 3D model) and recurrent neural networks (LSTM)
  • Built with Python, Keras+Tensorflow and OpenCV (for video capturing and manipulation)

For a 10-slide presentation and a 1-minute demo video, see here.

Requirements

This code requires at least

  • python 3.6.5
  • tensorflow 1.8.0
  • keras 2.2.0
  • opencv-python 3.4.1.15
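The minimum versions above can be checked with a few lines of Python. This is a minimal sketch and not part of the repository: the helper names `parse_version` and `unmet_requirements` are hypothetical, and the `installed` dictionary is assumed to be supplied by the caller (for example, filled in from `pip freeze` output).

```python
# Minimum versions as listed in this README.
MINIMUM = {
    "tensorflow": "1.8.0",
    "keras": "2.2.0",
    "opencv-python": "3.4.1.15",
}


def parse_version(text):
    """Turn '3.4.1.15' into (3, 4, 1, 15); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def unmet_requirements(installed, minimum=MINIMUM):
    """Return the sorted package names that are missing or older than required."""
    unmet = []
    for name, min_version in minimum.items():
        have = installed.get(name)
        if have is None or parse_version(have) < parse_version(min_version):
            unmet.append(name)
    return sorted(unmet)


if __name__ == "__main__":
    # Hypothetical environment: keras is one minor version too old.
    example = {"tensorflow": "1.8.0", "keras": "2.1.6", "opencv-python": "3.4.1.15"}
    print(unmet_requirements(example))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "1.10.0" would sort before "1.8.0".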

Training the neural networks requires a GPU (e.g. an AWS p2.xlarge instance). The live demo runs on an ordinary laptop without a GPU, e.g. a MacBook Pro with an i5 CPU and 8 GB of RAM.

Get the video data

For an overview of suitable sign-language datasets, see: https://docs.google.com/presentation/d/1KSgJM4jUusDoBsyTuJzTsLIoxWyv6fbBzojI38xYXsc/edit#slide=id.g3d447e7409_0_0

Download the ChaLearn Isolated Gesture Recognition dataset here: http://chalearnlap.cvc.uab.es/dataset/21/description/ (you need to register first)

The ChaLearn video descriptions and labels (for train, validation and test data) can be found here: data_set/chalearn

prepare_chalearn.py unzips the videos and sorts them by label, following the Keras best practice of 1 folder = 1 label.
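The "1 folder = 1 label" layout can be illustrated with a short sketch. It is not the code from prepare_chalearn.py: the function name `sort_videos_by_label` and the assumption that labels arrive as a plain dictionary (file name to label) are hypothetical, and parsing the actual ChaLearn label files is left out.

```python
import os
import shutil


def sort_videos_by_label(labels, source_dir, target_dir):
    """Copy each video into target_dir/<label>/ so that 1 folder = 1 label.

    labels maps a video file name (relative to source_dir) to its label.
    Returns a dict mapping each label to the list of copied file paths.
    """
    copied = {}
    for file_name, label in labels.items():
        label_dir = os.path.join(target_dir, str(label))
        os.makedirs(label_dir, exist_ok=True)
        destination = os.path.join(label_dir, os.path.basename(file_name))
        shutil.copy2(os.path.join(source_dir, file_name), destination)
        copied.setdefault(str(label), []).append(destination)
    return copied
```

With this layout the class of a video can be read directly from its parent folder name, so no separate label file is needed during training.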

Prepare the video data

Extract image frames from videos

frame.py extracts image frames from each video (using OpenCV) and stores them on disk.

See pipeline_i3d.py for the parameters used for the ChaLearn dataset:

  • 40 frames per training/test video (videos last 5 seconds on average, which gives approx. 8 frames per second)
  • Frames are resized/cropped to 240x320 pixels
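The two parameters above (40 frames, 240x320 pixels) could be applied roughly as follows. This is a sketch under assumptions, not the code in frame.py: evenly spaced sampling and "scale to cover, then center-crop" are one plausible reading of "resized/cropped", and all function names and the output file pattern are hypothetical. OpenCV is imported inside `extract_frames` so the two helper functions can be used without it.

```python
import os


def sample_indices(n_available, n_target=40):
    """Pick n_target frame indices spread evenly over n_available frames.

    If the video has fewer frames than n_target, indices repeat.
    """
    if n_available <= 0 or n_target <= 0:
        raise ValueError("frame counts must be positive")
    return [(i * n_available) // n_target for i in range(n_target)]


def resize_crop_plan(height, width, target_h=240, target_w=320):
    """Return (new_h, new_w, top, left): scale to cover the target, then center-crop."""
    scale = max(target_h / height, target_w / width)
    new_h = max(target_h, round(height * scale))
    new_w = max(target_w, round(width * scale))
    return new_h, new_w, (new_h - target_h) // 2, (new_w - target_w) // 2


def extract_frames(video_path, out_dir, n_target=40, target_h=240, target_w=320):
    """Read a video, keep n_target evenly spaced frames, and save them as JPEGs."""
    import cv2  # imported lazily; only this function needs OpenCV

    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        frames.append(frame)  # fine for short clips; long videos would need streaming
    capture.release()
    if not frames:
        raise ValueError("no frames could be read from " + video_path)

    os.makedirs(out_dir, exist_ok=True)
    for k, index in enumerate(sample_indices(len(frames), n_target)):
        frame = frames[index]
        new_h, new_w, top, left = resize_crop_plan(
            frame.shape[0], frame.shape[1], target_h, target_w)
        frame = cv2.resize(frame, (new_w, new_h))  # cv2.resize expects (width, height)
        frame = frame[top:top + target_h, left:left + target_w]
        cv2.imwrite(os.path.join(out_dir, "frame%04d.jpg" % k), frame)
    return n_target
```

Sampling a fixed number of frames gives every video the same temporal length, which is what a network with a fixed input shape needs.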

Calculate optical flow

opticalflow.py calculates the optical flow between the image frames of a video and stores the result on disk. See pipeline_i3d.py for usage.

Optical flow is very effective for this type of video classification, but also computationally expensive, see here.
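To show what a dense optical flow computation looks like in OpenCV, here is a minimal sketch. It assumes the Farneback method (`cv2.calcOpticalFlowFarneback`), chosen only because it ships with core OpenCV; the algorithm and parameters actually used in opticalflow.py may differ, and the function names are hypothetical.

```python
def consecutive_pairs(n_frames):
    """Index pairs (t, t + 1) between which flow is computed.

    n frames yield n - 1 flow fields.
    """
    return [(t, t + 1) for t in range(n_frames - 1)]


def dense_flow(frames):
    """Compute dense optical flow between consecutive BGR frames.

    frames: list of equally sized BGR images (numpy arrays, as read by OpenCV).
    Returns a list of H x W x 2 float32 arrays holding the (dx, dy) motion per pixel.
    """
    import cv2  # imported lazily; only this function needs OpenCV

    grays = [cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) for frame in frames]
    flows = []
    for t, t_next in consecutive_pairs(len(grays)):
        # Positional arguments: prev, next, flow, pyr_scale, levels, winsize,
        # iterations, poly_n, poly_sigma, flags (illustrative values).
        flow = cv2.calcOpticalFlowFarneback(
            grays[t], grays[t_next], None, 0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)
    return flows
```

Each flow field has two channels (horizontal and vertical displacement), so a 40-frame clip produces 39 two-channel images as input to the flow branch of a network.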

Train the neural network

train_i3d.py trains the neural network in two stages. First, only the randomly initialized top layers are trained; then the entire pre-trained network is fine-tuned.
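The two-stage schedule can be sketched as below. This is an illustration, not the code from train_i3d.py: the number of top layers, the epoch counts, and the learning rates are hypothetical placeholders, and the Keras calls (`Adam(lr=...)`, `fit_generator`) follow the Keras 2.2 API listed in the requirements.

```python
def freeze_all_but_top(model, n_top):
    """Make only the last n_top layers trainable (stage 1)."""
    if n_top < 1:
        raise ValueError("n_top must be at least 1")
    for layer in model.layers[:-n_top]:
        layer.trainable = False
    for layer in model.layers[-n_top:]:
        layer.trainable = True


def unfreeze_all(model):
    """Make every layer trainable again (stage 2, fine-tuning)."""
    for layer in model.layers:
        layer.trainable = True


def two_stage_fit(model, generator, n_top=3, epochs_top=3, epochs_full=10,
                  lr_top=1e-3, lr_full=1e-4):
    """Train the top layers first, then fine-tune the whole network.

    All default values are placeholders, not the settings used in this project.
    """
    from keras.optimizers import Adam  # imported lazily; Keras 2.2 API

    freeze_all_but_top(model, n_top)
    # Keras reads the trainable flags at compile time, so compile after changing them.
    model.compile(optimizer=Adam(lr=lr_top), loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit_generator(generator, epochs=epochs_top)

    unfreeze_all(model)
    # A smaller learning rate protects the pre-trained weights during fine-tuning.
    model.compile(optimizer=Adam(lr=lr_full), loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit_generator(generator, epochs=epochs_full)
```

Training the new top layers first keeps large gradients from the random initialization from disturbing the pre-trained weights below them.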

The network is I3D, a pre-trained 3D convolutional neural network developed by DeepMind in 2017, see here and model_i3d.py.

Training requires a GPU. Batches are fed to the network by the generator provided in datagenerator.py.
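A batch generator of this kind could look roughly like the sketch below. It is not the class from datagenerator.py: `VideoBatchSequence` and its `load_fn` parameter are hypothetical, and the class only mirrors the interface of `keras.utils.Sequence` (`__len__`, `__getitem__`, `on_epoch_end`) without subclassing it. In real use it would subclass `keras.utils.Sequence` and return numpy arrays instead of lists.

```python
import math
import random


class VideoBatchSequence:
    """Serves batches of (inputs, one-hot labels), loading samples on demand."""

    def __init__(self, samples, n_classes, batch_size, load_fn,
                 shuffle=True, seed=None):
        self.samples = list(samples)  # list of (path, class_index) pairs
        self.n_classes = n_classes
        self.batch_size = batch_size
        self.load_fn = load_fn        # hypothetical: turns a path into model input
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self.on_epoch_end()

    def __len__(self):
        """Number of batches per epoch (the last batch may be smaller)."""
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, batch_index):
        start = batch_index * self.batch_size
        chunk = self.samples[start:start + self.batch_size]
        inputs = [self.load_fn(path) for path, _ in chunk]
        targets = [[1.0 if c == label else 0.0 for c in range(self.n_classes)]
                   for _, label in chunk]
        return inputs, targets

    def on_epoch_end(self):
        """Reshuffle so that each epoch sees the samples in a new order."""
        if self.shuffle:
            self._rng.shuffle(self.samples)
```

Loading samples only when a batch is requested keeps memory use independent of the dataset size, which matters for video data.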

Note: the code files containing "_mobile_lstm" are used for an alternative NN architecture, see here.

Predict during live demo

livedemo.py launches the webcam, then

  • waits for the start signal from the user,
  • captures 5 seconds of video (using videocapture.py),
  • extracts frames from the video,
  • calculates and displays the optical flow,
  • and uses the neural network to predict the sign language gesture.
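The capture and prediction steps above might be sketched as follows. This is an illustration only, not the code in livedemo.py or videocapture.py: the function names are hypothetical, the start signal and flow display are omitted, and the class probabilities are assumed to come from the trained model as a plain list.

```python
import time


def capture_seconds(duration=5.0, camera_index=0):
    """Record frames from the webcam for roughly `duration` seconds."""
    import cv2  # imported lazily; only this function needs OpenCV

    capture = cv2.VideoCapture(camera_index)
    if not capture.isOpened():
        raise RuntimeError("could not open camera %d" % camera_index)
    frames = []
    start = time.time()
    try:
        while time.time() - start < duration:
            ok, frame = capture.read()
            if not ok:
                break
            frames.append(frame)
    finally:
        capture.release()  # always free the camera, even on errors
    return frames


def top_k(probabilities, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(zip(probabilities, labels),
                    key=lambda pair: pair[0], reverse=True)
    return [(label, probability) for probability, label in ranked[:k]]
```

Showing the top few candidates rather than a single label gives the user a sense of how confident the prediction is.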

The neural network model is not included in this GitHub repo (too large) but can be downloaded here (150 MB).

License

This project is licensed under the MIT License - see the LICENSE file for details
