

Air-Text: Air-Writing and Recognition System (ACMMM 2021, Oral)

This is the official PyTorch implementation of our work, Air-Text: Air-Writing and Recognition System. This repository contains the source code for model training and a brief demo.


Overview

Air-Text is a novel system for writing in the air using fingertips as a pen. Air-Text provides various functionalities through the seamless integration of the Air-Writing and Text-Recognition Modules. Specifically, the Air-Writing Module takes a sequence of RGB images as input and tracks both the fingertip locations and the current hand gesture class frame by frame. Users can perform writing operations, such as writing or deleting text, by changing hand gestures, and the tracked fingertip locations can be stored as a binary image. The Text-Recognition Module, which is compatible with any pre-trained recognition model, then predicts the single digit or English word written in the binary image.
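To illustrate how tracked fingertip locations can end up as a binary image, here is a minimal, dependency-free sketch. It is not the code used in this repository: the function names, the list-of-lists image representation, and the choice of Bresenham's line algorithm to connect consecutive points are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of a tracked fingertip


def draw_line(canvas: List[List[int]], p0: Point, p1: Point) -> None:
    """Set every pixel on the segment p0 -> p1 to 1 (Bresenham's algorithm)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    height, width = len(canvas), len(canvas[0])
    while True:
        # Points outside the canvas are skipped rather than raising an error.
        if 0 <= x0 < width and 0 <= y0 < height:
            canvas[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def trajectory_to_binary_image(points: Sequence[Point],
                               width: int, height: int) -> List[List[int]]:
    """Rasterize a fingertip trajectory into a height x width image of 0s and 1s."""
    canvas = [[0] * width for _ in range(height)]
    if len(points) == 1:
        draw_line(canvas, points[0], points[0])
    for p0, p1 in zip(points, points[1:]):
        draw_line(canvas, p0, p1)
    return canvas


if __name__ == "__main__":
    # A short stroke: right along the top row, then down the last column.
    for row in trajectory_to_binary_image([(0, 0), (3, 0), (3, 2)], 4, 3):
        print("".join("#" if pixel else "." for pixel in row))
```

Connecting consecutive points with line segments keeps the stroke continuous even when the fingertip moves several pixels between frames.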


Environment Setup

All experiments were performed on Ubuntu 16.04. Using Anaconda is recommended.

conda create -n airtext_env python=3.6
conda activate airtext_env
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.2 -c pytorch
pip install opencv-python torchsummary tensorboardX matplotlib lmdb natsort nltk
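After installation, a quick check that the packages are visible from the new environment can save time later. The following is a small sketch using only the Python standard library; the function name check_modules is hypothetical, and the list of import names is an assumption derived from the install commands above (for example, opencv-python is imported as cv2 and pytorch as torch).

```python
import importlib.util
from typing import Dict, Iterable

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = (
    "torch", "torchvision", "cv2", "torchsummary",
    "tensorboardX", "matplotlib", "lmdb", "natsort", "nltk",
)


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:<14}{'ok' if found else 'MISSING'}")
```

This only checks that each module can be located; it does not verify versions or that CUDA is usable.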

Training Air-Writing Module

To train the Air-Writing Module, first obtain the SCUT-Ego-Gesture dataset. (There is currently no direct download link, so consider contacting the authors of the dataset paper.)

Extract the dataset into ./AirWritingModule/dataset, then change into the module directory with cd ./AirWritingModule and run train.py.


Training Text-Recognition Module

Single Digit Recognition

To train the Text-Recognition Module for single digit recognition, change into the directory with cd ./TextRecognitionModule/MNIST and run digitmodel.py. The MNIST dataset is downloaded and training starts automatically.

English Word Recognition

To train the Text-Recognition Module for English word recognition, first obtain the IAHEW-UCAS2016 dataset. (You need to contact the authors of the dataset paper.) Extract the dataset into ./TextRecognitionModule/Word/data and run the following commands to pre-process the data.

cd ./TextRecognitionModule/Word
python plot_real.py              # Convert sequence data into images (this takes a long time)
python create_dataset_real.py    # Convert image files into lmdb format

We use TPS-ResNet-BiLSTM-Attn as the model for English word recognition. Clone the following GitHub repository.

git clone https://github.com/clovaai/deep-text-recognition-benchmark dtrb
cd dtrb

Create the ./traindata and ./validdata folders and put the pre-processed IAHEW-UCAS2016 training and validation data into the corresponding folder. Then run the following command to train the model.

python train.py --train_data ./traindata --valid_data ./validdata --select_data / --batch_ratio 1 --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --character abcdefghijklmnopqrstuvwxyz
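The --character option above restricts the output alphabet to the 26 lowercase letters, so word labels with other characters cannot be represented by the model. The sketch below shows one way to pre-check labels before training. It is an illustration, not part of this repository or of the benchmark: the function name usable_labels is hypothetical, and the maximum length of 25 is an assumption based on the benchmark's default --batch_max_length, which should be confirmed against the cloned code.

```python
import re
from typing import Iterable, List

CHARSET = "abcdefghijklmnopqrstuvwxyz"  # same value as --character above
MAX_LABEL_LENGTH = 25  # assumption: the benchmark's default --batch_max_length


def usable_labels(labels: Iterable[str], charset: str = CHARSET,
                  max_length: int = MAX_LABEL_LENGTH) -> List[str]:
    """Lowercase each label and keep those built only from charset characters."""
    pattern = re.compile(f"[{re.escape(charset)}]+")
    kept = []
    for label in labels:
        label = label.lower()
        # Drop empty labels, over-long labels, and labels with other characters.
        if len(label) <= max_length and pattern.fullmatch(label):
            kept.append(label)
    return kept


if __name__ == "__main__":
    print(usable_labels(["Hello", "air-text", "world", ""]))
```

Labels that are dropped here (for example, ones containing digits or hyphens) would need either a larger --character set or cleaning before the lmdb dataset is created.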

Pre-trained weights

Pre-trained weights for all three models above can be downloaded from here. Extract all files into the root directory of this repository.


Demo

First, connect a webcam to your desktop and check that it provides video input. To test single digit recognition, run demo_digit.py in the terminal. To test English word recognition, run demo_word.py.
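One way to check the webcam before running the demos is to try reading a single frame with OpenCV. This is a minimal sketch, not part of the demo scripts: the function name webcam_available is hypothetical, and camera index 0 is an assumption (the demos may open a different device).

```python
try:
    import cv2  # provided by the opencv-python package
except ImportError:  # keep the sketch importable when OpenCV is missing
    cv2 = None


def webcam_available(index: int = 0) -> bool:
    """Return True if one frame can be read from the camera at `index`."""
    if cv2 is None:
        return False
    capture = cv2.VideoCapture(index)
    try:
        if not capture.isOpened():
            return False
        ok, _frame = capture.read()
        return bool(ok)
    finally:
        # Always release the device so the demo scripts can open it afterwards.
        capture.release()


if __name__ == "__main__":
    print("webcam ready" if webcam_available() else "no webcam frame received")
```

If the check fails although a camera is connected, trying another index (1, 2, ...) is a reasonable first step.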


Acknowledgments

  • This work was partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No.2020-0-00440, Development of artificial intelligence technology that continuously improves itself as the situation changes in the real world) and (No.2020-0-00842, Development of Cloud Robot Intelligence for Continual Adaptation to User Reactions in Real Service Environments).

  • Parts of the code and datasets are adapted from previous work (Deep-Text-Recognition-Benchmark, YOLSE, and Attention Recurrent Translator). We thank the original authors for their excellent work.

Citation

@inproceedings{lee2021air,
  title={Air-Text: Air-Writing and Recognition System},
  author={Lee, Sun-Kyung and Kim, Jong-Hwan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={1267--1274},
  year={2021}
}
