This is the official PyTorch implementation of our work, Air-Text: Air-Writing and Recognition System. Here you can find the source code for model training and a brief demo, as shown below.
Air-Text is a novel system for writing in the air using fingertips as a pen. It provides various functionalities through the seamless integration of the Air-Writing and Text-Recognition Modules. Specifically, the Air-Writing Module takes a sequence of RGB images as input and tracks both the fingertip location and the current hand gesture class frame by frame. Users can easily perform writing operations such as writing or deleting text by changing hand gestures, and the tracked fingertip locations can be stored as a binary image. The Text-Recognition Module, which is compatible with any pre-trained recognition model, then predicts the single digit or English word written in the binary image.
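The write/delete interaction described above can be pictured as a small per-frame state machine: each frame yields a gesture class and a fingertip position, and the gesture decides whether the fingertip trace is drawn onto, or erased from, a binary canvas. The sketch below is only illustrative; the gesture names (`"write"`, `"delete"`, `"idle"`) and the 64x64 canvas size are assumptions, not the repository's actual API.

```python
# Illustrative sketch of the Air-Writing interaction loop (not the repo's actual API).
# Gesture names and the 64x64 canvas size are assumptions for this example.

WIDTH, HEIGHT = 64, 64

def make_canvas():
    """Binary canvas: 0 = background, 1 = ink."""
    return [[0] * WIDTH for _ in range(HEIGHT)]

def apply_frame(canvas, gesture, fingertip):
    """Update the canvas from one frame's gesture class and fingertip (x, y)."""
    x, y = fingertip
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        return
    if gesture == "write":
        canvas[y][x] = 1  # draw ink at the tracked fingertip
    elif gesture == "delete":
        # erase a small 3x3 neighborhood around the fingertip
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
                    canvas[ny][nx] = 0
    # "idle" (or any other gesture) leaves the canvas unchanged

canvas = make_canvas()
# simulate a short horizontal stroke followed by one erase gesture
for x in range(10, 20):
    apply_frame(canvas, "write", (x, 30))
apply_frame(canvas, "delete", (15, 30))
ink = sum(sum(row) for row in canvas)
print(ink)  # 10 pixels drawn, the 3x3 erase removes 3 of them -> 7
```

The resulting 2D array is exactly the kind of binary image that can be handed to the Text-Recognition Module.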
All experiments were performed on Ubuntu 16.04, and using Anaconda is recommended.
conda create -n airtext_env python=3.6
conda activate airtext_env
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.2 -c pytorch
pip install opencv-python torchsummary tensorboardX matplotlib lmdb natsort nltk
To train the Air-Writing Module, first download the SCUT-Ego-Gesture dataset. (There is currently no direct download link, so consider contacting the authors of this paper.) Download and extract the dataset into `./AirWritingModule/dataset`, then type `cd ./AirWritingModule` and run `train.py`.
To train the Text-Recognition Module for single-digit recognition, just type `cd ./TextRecognitionModule/MNIST` and run `digitmodel.py`. The MNIST dataset will be downloaded and training will start automatically.
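At inference time, the binary image produced by the Air-Writing Module has to be brought into the 28x28 format that an MNIST-trained model expects. One minimal way to do that is to crop the ink's bounding box and resize it with nearest-neighbor sampling. This is a sketch only; the actual preprocessing in `demo_digit.py` may differ (e.g. padding or centering), and `to_mnist_input` is a hypothetical helper, not part of the repository.

```python
def to_mnist_input(canvas, out=28):
    """Crop the ink bounding box of a binary image (list of lists of 0/1)
    and nearest-neighbor resize it to out x out.
    Sketch only: the repository's actual preprocessing may differ."""
    ys = [y for y, row in enumerate(canvas) if any(row)]
    xs = [x for x in range(len(canvas[0])) if any(row[x] for row in canvas)]
    if not ys:
        return [[0] * out for _ in range(out)]  # blank canvas -> blank input
    y0, y1 = ys[0], ys[-1]
    x0, x1 = xs[0], xs[-1]
    h, w = y1 - y0 + 1, x1 - x0 + 1
    resized = [[0] * out for _ in range(out)]
    for i in range(out):
        for j in range(out):
            # nearest-neighbor sample from the cropped region
            src_y = y0 + i * h // out
            src_x = x0 + j * w // out
            resized[i][j] = canvas[src_y][src_x]
    return resized

# usage: a 64x64 canvas containing a filled 10x10 square of ink
canvas = [[0] * 64 for _ in range(64)]
for y in range(20, 30):
    for x in range(20, 30):
        canvas[y][x] = 1
img = to_mnist_input(canvas)
print(len(img), len(img[0]))  # 28 28
```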
To train the Text-Recognition Module for English word recognition, first download the IAHEW-UCAS2016 dataset. (You should contact the authors of this paper.) Download and extract the dataset into `./TextRecognitionModule/Word/data`, then run the following commands for data pre-processing.
cd ./TextRecognitionModule/Word
python plot_real.py            # convert the sequence data into images (this takes a long time)
python create_dataset_real.py  # convert the image files into lmdb format
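The lmdb layout expected by deep-text-recognition-benchmark pairs `image-%09d` and `label-%09d` keys with a `num-samples` count, following that repository's `create_dataset.py`; `create_dataset_real.py` is assumed to emit the same layout. The sketch below simulates it with a plain dict instead of a real lmdb environment, just to show the key convention.

```python
# Sketch of the lmdb sample layout used by deep-text-recognition-benchmark,
# simulated with a plain dict instead of a real lmdb environment.
# Assumption: create_dataset_real.py produces this same key convention.

def build_fake_lmdb(samples):
    """samples: list of (image_bytes, label) pairs. Keys are 1-indexed."""
    cache = {}
    for i, (image_bytes, label) in enumerate(samples, start=1):
        cache["image-%09d" % i] = image_bytes   # encoded image bytes
        cache["label-%09d" % i] = label.encode()  # ground-truth transcription
    cache["num-samples"] = str(len(samples)).encode()
    return cache

db = build_fake_lmdb([(b"\x89PNG...", "hello"), (b"\x89PNG...", "world")])
print(sorted(db)[:2])  # ['image-000000001', 'image-000000002']
```

A real writer would put the same key/value pairs into an `lmdb` transaction instead of a dict.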
We use TPS-ResNet-BiLSTM-Attn as the model for English word recognition. Clone this GitHub repository as follows.
git clone https://github.com/clovaai/deep-text-recognition-benchmark dtrb
cd dtrb
Create `./traindata` and `./validdata` folders and put the pre-processed IAHEW-UCAS2016 dataset into the corresponding folders. Then run the following command to train the model.
python train.py --train_data ./traindata --valid_data ./validdata --select_data / --batch_ratio 1 --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --character abcdefghijklmnopqrstuvwxyz
Pre-trained weights for all three models above can be downloaded from here. Extract all the files into the root directory of this repository.
First, check that you can get video input by connecting a webcam to your desktop.
To test single-digit recognition, run `demo_digit.py` in the terminal. To test English word recognition, run `demo_word.py`.
- This work was partly supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00440, Development of artificial intelligence technology that continuously improves itself as the situation changes in the real world, and No. 2020-0-00842, Development of Cloud Robot Intelligence for Continual Adaptation to User Reactions in Real Service Environments).
- Parts of the code and datasets are adopted from previous works (Deep-Text-Recognition-Benchmark, YOLSE, and Attention Recurrent Translator). We thank the original authors for their excellent work.
@inproceedings{lee2021air,
title={Air-Text: Air-Writing and Recognition System},
author={Lee, Sun-Kyung and Kim, Jong-Hwan},
booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
pages={1267--1274},
year={2021}
}