
TamilNet

Try it for yourself here: tamilnet.tech!

Recognizes handwritten Tamil characters with 90% accuracy. Credits to HP Labs India for the training and test datasets. This system uses a convolutional neural network (CNN), which is widely used across optical character recognition tasks.

Introduction

Tamil is a language originating from the South Indian state of Tamil Nadu. It is predominantly used in South India, Sri Lanka, and Singapore. It is one of the oldest languages in the world and is spoken by over 80 million people worldwide. Tamil uses a non-Latin script; the alphabet consists of 156 characters, including 12 vowels and 23 consonants. Due to the large number of classes and the extreme similarity between certain characters, accurate Tamil character recognition is more challenging than standard Latin character recognition. As with any language, handwritten character recognition is useful in a wide range of applications, including the digitization of legal documents, mail sorting in post offices, and bank check reading.

Dataset

The dataset of offline handwritten Tamil characters is taken from HP Labs India. It contains approximately 500 examples of each of the 156 characters, written by native writers in Tamil Nadu, India. For the IWFHR 2006 Tamil Character Recognition Competition, the entire dataset was split into separate training (50,683 examples) and test (26,926 examples) sets, which were used here. The provided training set was subsequently split into a new training set and a validation set in an 80/20 ratio.
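The 80/20 train/validation split can be sketched with PyTorch's `random_split`. This is an illustrative stand-in, not the original code: the `TensorDataset` of zeros is a hypothetical placeholder for the real 50,683-example IWFHR training set.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical placeholder for the 50,683-example IWFHR training set.
full_train = TensorDataset(
    torch.zeros(50683, 1),
    torch.zeros(50683, dtype=torch.long),
)

# 80/20 train/validation split, as described above.
n_train = int(0.8 * len(full_train))
n_val = len(full_train) - n_train
train_set, val_set = random_split(full_train, [n_train, n_val])

print(len(train_set), len(val_set))  # 40546 10137
```

`random_split` shuffles before splitting, so the validation set is a random sample of the provided training data rather than a contiguous slice.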

The bi-level images were initially provided as TIFF files of various sizes. After conversion to the PNG format, the images were inverted so that the foreground and background became white and black, respectively, and a constant thickening factor was applied. The images were then resized so that the longer side was 48 pixels, using the Lanczos algorithm; Lanczos resampling applies anti-aliasing, which turns the bi-level images into grayscale. Finally, the center of mass of each resulting image was centered on a new 64 x 64 canvas, and the grayscale pixel values were normalized from the [0, 1] range to the [-1, 1] range.
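A minimal sketch of this preprocessing pipeline, using Pillow and NumPy, might look as follows. This is an approximation of the steps described above, not the original scripts; the thickening step is omitted, and the `preprocess` function name and its defaults are assumptions.

```python
import numpy as np
from PIL import Image, ImageOps


def preprocess(img: Image.Image, side: int = 48, canvas: int = 64) -> np.ndarray:
    """Invert, resize (Lanczos), center by center of mass, and normalize."""
    img = img.convert("L")
    img = ImageOps.invert(img)  # white foreground on black background

    # Resize so the longer side is 48 px; Lanczos resampling anti-aliases,
    # turning the bi-level image into grayscale.
    w, h = img.size
    scale = side / max(w, h)
    img = img.resize((max(1, round(w * scale)), max(1, round(h * scale))),
                     Image.LANCZOS)
    arr = np.asarray(img, dtype=np.float32) / 255.0

    # Place the image on a 64x64 canvas so its center of mass is centered.
    h2, w2 = arr.shape
    total = arr.sum()
    if total > 0:
        cy = (arr.sum(axis=1) * np.arange(h2)).sum() / total
        cx = (arr.sum(axis=0) * np.arange(w2)).sum() / total
    else:
        cy, cx = h2 / 2, w2 / 2  # blank image: center geometrically
    top = min(max(int(round(canvas / 2 - cy)), 0), canvas - h2)
    left = min(max(int(round(canvas / 2 - cx)), 0), canvas - w2)
    out = np.zeros((canvas, canvas), dtype=np.float32)
    out[top:top + h2, left:left + w2] = arr

    # Map grayscale values from [0, 1] to [-1, 1].
    return out * 2.0 - 1.0
```

Because the resized image is at most 48 pixels on a side, it always fits inside the 64 x 64 canvas, so the offset clamping never crops the character.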

Architecture

The input is passed into the model as a 64 x 64 image. The model is structured as follows:
[1x64x64] INPUT
[16x64x64] CONV: 16 3x3 filters with stride 1, pad 1
[16x64x64] CONV: 16 3x3 filters with stride 1, pad 1
[16x32x32] MAX POOL: 2x2 filters with stride 2
[32x32x32] CONV: 32 3x3 filters with stride 1, pad 1
[32x32x32] CONV: 32 3x3 filters with stride 1, pad 1
[32x16x16] MAX POOL: 2x2 filters with stride 2
[64x16x16] CONV: 64 3x3 filters with stride 1, pad 1
[64x16x16] CONV: 64 3x3 filters with stride 1, pad 1
[64x8x8] MAX POOL: 2x2 filters with stride 2
[1024] FC: 1024 neurons
[512] FC: 512 neurons
[156] FC: 156 neurons (class neurons)

Every convolutional and fully connected layer is directly followed by batch normalization and a ReLU activation.
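The layer stack above can be sketched in PyTorch. This is a reconstruction from the description, not the original source; the `TamilNet` and `conv_block` names are assumptions, and the final fully connected layer is left as raw class scores (softmax is applied at inference time).

```python
import torch
import torch.nn as nn


def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # 3x3 CONV (stride 1, pad 1) followed by batch norm and ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class TamilNet(nn.Module):
    def __init__(self, num_classes: int = 156):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 16), conv_block(16, 16), nn.MaxPool2d(2),   # 16x32x32
            conv_block(16, 32), conv_block(32, 32), nn.MaxPool2d(2),  # 32x16x16
            conv_block(32, 64), conv_block(64, 64), nn.MaxPool2d(2),  # 64x8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                 # 64 * 8 * 8 = 4096 features
            nn.Linear(64 * 8 * 8, 1024), nn.BatchNorm1d(1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),  # class scores
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = TamilNet()
out = model(torch.randn(2, 1, 64, 64))
print(out.shape)  # torch.Size([2, 156])
```

Three 2x2 max-pool layers halve the 64 x 64 input three times, giving the 64 x 8 x 8 feature map that feeds the fully connected classifier.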

The architecture I chose was partially inspired by Handwritten Tamil Recognition using a Convolutional Neural Network by Prashanth Vijayaraghavan and Misha Sra as well as Benchmarking on offline Handwritten Tamil Character Recognition using convolutional neural networks by B.R. Kavitha and C. Srimathi. I felt that this architecture was complex enough to fit the data well, while lightweight enough to be deployed in a web application, which was my intended use.

Experiments

Training

Training was done on a GPU via Google Colab. There were several hyperparameters to tune, including but not limited to learning rate, weight decay (L2 regularization penalty), and initialization. Throughout the process, I referred to the online Notes for CS231n at Stanford by Andrej Karpathy. I tested applying dropout on all layers as well as on only the fully connected layers, but both configurations resulted in lower validation accuracy, so dropout was omitted in favor of an L2 penalty of 0.003. All layers were initialized using Kaiming initialization, and the optimizer of choice was Adam with a learning rate of 0.001.
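The initialization and optimizer setup described above can be sketched as follows. The tiny `nn.Linear` model here is a hypothetical stand-in for the CNN; only the Kaiming initialization, the Adam settings (lr=0.001, weight_decay=0.003), and the shape of one training step reflect the text.

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the CNN.
model = nn.Sequential(nn.Linear(64, 156))

# Kaiming initialization for every conv / linear layer.
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Adam with lr=0.001; weight_decay is the L2 penalty of 0.003.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.003)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data.
x = torch.randn(8, 64)
y = torch.randint(0, 156, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

Note that in PyTorch, weight decay is passed directly to the optimizer rather than added to the loss by hand.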

Testing

Testing was also conducted on a Google Colab GPU. The final model achieved 90.7% accuracy on the test set, which was satisfactory for me. As previously mentioned, since there are 156 classes, several of which are very similar to one another, attaining high accuracy is an especially difficult task. Test accuracy was consistently lower than validation accuracy, which suggests that the test set for the competition was deliberately made to be more difficult than the training set.
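Test and validation accuracy were computed as the fraction of correctly classified examples. A minimal evaluation helper, assuming a standard PyTorch `DataLoader`-style iterable of `(inputs, labels)` batches, might look like this (the `accuracy` name is an assumption):

```python
import torch


@torch.no_grad()
def accuracy(model: torch.nn.Module, loader, device: str = "cpu") -> float:
    """Fraction of correctly classified examples over an iterable of batches."""
    model.eval()  # use batch-norm running statistics, not batch statistics
    correct = total = 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1)  # top-scoring class per example
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total
```

Switching the model to `eval()` matters here: with batch normalization in the network, evaluating in training mode would use per-batch statistics and distort the reported accuracy.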

Web App

The model weights of the final CNN were downloaded in the PyTorch PT format. The web app is fairly simple and uses the Flask micro web framework. It consists of a canvas on which the user draws, as well as buttons to clear the canvas and submit the handwritten character for recognition. The page also includes instructions that detail how to use the tool and suggest a character to draw (primarily aimed towards non-Tamil-speaking users). Several of the elements of the page are implemented using the Bootstrap CSS framework, which provides a more appealing layout and appearance.

The main.js JavaScript file takes care of accepting user input and displaying the model's output. The Python scripts then process the data just as was done during training and testing, with the additional step of finding the bounding box of the character within the canvas to ensure that the character is not too small. The predicted character, along with the model's confidence (obtained using a softmax function), is displayed on the screen.
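The softmax-based confidence can be sketched as follows. This is an illustrative helper, not the app's actual code; the `predict_with_confidence` name is an assumption, and any model producing 156 class scores from a 1 x 64 x 64 input would work.

```python
import torch
import torch.nn.functional as F


def predict_with_confidence(model: torch.nn.Module, image: torch.Tensor):
    """Return (class_index, confidence) for one preprocessed 1x64x64 image.

    Confidence is the softmax probability of the top-scoring class.
    """
    model.eval()
    with torch.no_grad():
        logits = model(image.unsqueeze(0))      # add a batch dimension
        probs = F.softmax(logits, dim=1)        # scores -> probabilities
        conf, idx = probs.max(dim=1)            # top class and its probability
    return idx.item(), conf.item()
```

Because softmax outputs sum to 1 across the 156 classes, the returned confidence is directly interpretable as a probability and can be shown to the user as a percentage.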

Conclusion

I really enjoyed working on this project! I was able to develop everything from the neural network itself to the user-facing web app. It was a great learning experience as well: there were several bugs and issues (as there are in any project), but I was able to fix them or find workarounds. Plus, I was able to refresh my own Tamil writing and reading abilities!

Next Steps

The resulting website can be used in several ways, such as a tool to practice handwriting for children and adults alike. There are plenty of possible extensions for a project like this. An audio tool could be added, for example, to teach the pronunciation of each written character. The optical character recognition system could also be expanded to take in whole words, which would involve character segmentation. The possibilities are truly endless!
