surabhigovil / lipscribe

A 3D CNN based video classification Android application. It transcribes the lip movements of a speaker in a silent video to text. The neural network captures the spatio-temporal information in the video that is needed to generate words. MLOps with Vertex AI was used to deploy the model to the Android app in a CI/CD fashion.

Python 69.13% Java 30.87%
3d-cnn android-application deep-learning vertex-ai mlops video-processing

lipscribe's Introduction

LipScribe is an application that converts the lip movements of a speaker in a silent video to text and displays the result in an Android application. It uses a 3D CNN to extract information from spatio-temporal data, so the deep learning model can produce words from a sequence of video frames.
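To make "spatio-temporal" concrete: a 3D convolution slides its kernel along the frame axis as well as along height and width, so each output value summarizes a small patch across several consecutive frames. The sketch below only computes output shapes with the standard convolution size formula. The clip size, kernel size, stride, and padding are illustrative assumptions, not values taken from LipScribe.

```python
def conv3d_output_shape(input_shape, kernel, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Return (frames, height, width) after one 3D convolution.

    Applies the standard per-axis formula: floor((n + 2p - k) / s) + 1.
    """
    out = []
    for n, k, s, p in zip(input_shape, kernel, stride, padding):
        if n + 2 * p < k:
            raise ValueError("kernel is larger than the padded input")
        out.append((n + 2 * p - k) // s + 1)
    return tuple(out)


# Hypothetical clip: 16 frames of 64x64 mouth crops, 3x3x3 kernel.
clip_shape = (16, 64, 64)
print(conv3d_output_shape(clip_shape, kernel=(3, 3, 3)))  # (14, 62, 62)
```

Because the frame axis shrinks just like the spatial axes, stacking such layers lets later layers see motion over progressively longer stretches of the video.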

Android application installation and usage:

  1. Install the Android application using the APK from https://drive.google.com/drive/folders/10pGHK0VYddb7Kn0rjqMDCVR4Nh-bR_U3?usp=sharing
  2. When launched on an Android phone, the application checks for a camera and requests permission to record videos and capture pictures.
  3. Allow the application to record videos and capture pictures.
  4. Click 'allow' when the application requests access to files and media.
  5. Click 'start camera' on the start page to begin recording the video.
  6. The recorded video is read from external storage, processed, and passed to the model for prediction. This happens in the background while a loading screen is displayed.
  7. The predicted word uttered by the speaker is displayed on the screen.

Working of the application:

  1. Use the Android application to record a video.
  2. The video goes through preprocessing, where frames are extracted from the video and a Haar Cascade Classifier then extracts the speaker's lips from those frames.
  3. The extracted lip frames are sent to a 3D CNN model, which outputs a word as the final result.
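The preprocessing between detection and the model can be sketched roughly as follows. This is a minimal illustration, not the project's code. It assumes face boxes arrive as `(x, y, w, h)` tuples (the layout OpenCV's Haar cascade detector returns), approximates the mouth as the lower third of the face box, and pads or trims to a fixed clip length of 16 frames. The function names, the lower-third heuristic, and the clip length are all hypothetical.

```python
def mouth_region(face_box):
    """Approximate the mouth area as the lower third of a face box.

    face_box is (x, y, w, h) in pixels. The lower-third rule is an
    assumed heuristic, not something stated by the project.
    """
    x, y, w, h = face_box
    mouth_h = h // 3
    return (x, y + h - mouth_h, w, mouth_h)


def crop(frame, box):
    """Crop a frame stored as a list of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def build_clip(frames, face_boxes, clip_len=16):
    """Crop the mouth from each frame and force a fixed clip length.

    Frames without a detected face (box is None) are skipped. Returns
    None when no frame contains a face, so the caller can report that
    no face was detected.
    """
    crops = [
        crop(frame, mouth_region(box))
        for frame, box in zip(frames, face_boxes)
        if box is not None
    ]
    if not crops:
        return None
    crops = crops[:clip_len]
    # Pad short clips by repeating the last crop.
    crops += [crops[-1]] * (clip_len - len(crops))
    return crops
```

A fixed clip length matters because a 3D CNN with dense output layers expects the same number of frames for every input. Returning `None` gives the app a clear signal for the no-face case.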

Model Evaluation:

[Image: model evaluation results]

Android Application:

An Android-compatible application was developed to serve the model's predictions. The application requires a model built with TensorFlow version 1.15. The ffmpeg library is used to extract frames from a video for data preprocessing. The mouth region is extracted, converted into embeddings, and passed as input to the model.
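As an illustration of the frame extraction step, the sketch below builds an ffmpeg command that writes a video's frames as numbered PNG files. The frame rate, file names, and output pattern are assumptions chosen for the example. The project's actual ffmpeg invocation is not documented here, and on Android it would go through an ffmpeg binding rather than Python.

```python
import shlex


def ffmpeg_extract_frames_cmd(video_path, out_dir, fps=25):
    """Build an ffmpeg argument list that dumps frames as numbered PNGs."""
    return [
        "ffmpeg",
        "-i", video_path,             # input video
        "-vf", f"fps={fps}",          # resample to a fixed frame rate
        f"{out_dir}/frame_%04d.png",  # numbered output images
    ]


# Hypothetical paths, for illustration only.
cmd = ffmpeg_extract_frames_cmd("recording.mp4", "frames")
print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed and an existing output directory. Building the arguments as a list avoids shell quoting problems with unusual file names.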

Demo:

Detecting a word on the app: [Screenshots: Screenshot_20211123_143827, Screenshot_20211123_143929, Screenshot 2021-11-23 at 3 22 44 PM]

No face detection on the app: [Screenshots: image, Screenshot 2021-11-23 at 3 23 53 PM]

lipscribe's People

Contributors

gayathripulagam, rishitha97, shrey1234, surabhigovil
