In this project, you'll train a convolutional neural network (CNN) to classify images of text into different font categories. We'll use a dataset of 100 font types to train the model. The project is broken down into the following steps:
- Generating the dataset of fonts from the package
- Loading and preprocessing the dataset
- Visualizing samples from the dataset
- Training the convolutional neural network on the dataset
- Using the trained model to predict new fonts
The whole project is implemented in Python using TensorFlow.
The Adobe VFR dataset is the first large-scale, fine-grained benchmark of font text images for the tasks of font recognition and retrieval. Unfortunately, it is far too large for me to download, so I looked for alternatives and discovered the TRDG (Text Recognition Data Generator) package by Edouard Belval, a synthetic data generator for text recognition written in Python.
Words are randomly chosen from a dictionary of a specified language, and an image of those words is generated using the font, background, and modifications (skewing, blurring, etc.) specified. Usage as a Python module is very similar to the CLI.
The problem arose when I had to build the dataset systematically, with each font's images in their own folder. The original TRDG package writes all images to a single output folder, drawing randomly from the full font list, unless a particular font and output directory are specified, which would mean running the package separately for every font with its own directory.
I modified the package to solve this. I added a loop in the main file (run.py) that, for each font in the font list (folder), creates a directory named after that font and stores its images there. A single run now generates and files away the images for every font in the list.
I have kept my modified package separate from the main pipeline, in the trdg folder, so it can be run from the CLI. All other documentation and behaviour is the same as Belval's original package.
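The per-font loop described above can be sketched as follows; the `generate_images()` placeholder and the directory layout are illustrative assumptions, not the actual TRDG internals:

```python
import os

def generate_images(font_path, out_dir, count):
    """Placeholder standing in for TRDG's image-generation call."""
    pass

def run_per_font(font_dir, output_dir, count=100):
    # One pass over the font list: each font gets its own output folder,
    # named after the font file, holding only that font's images.
    for font in sorted(os.listdir(font_dir)):
        if not font.endswith(".ttf"):
            continue
        font_name = os.path.splitext(font)[0]
        out_dir = os.path.join(output_dir, font_name)
        os.makedirs(out_dir, exist_ok=True)
        generate_images(os.path.join(font_dir, font), out_dir, count)
```

This keeps the dataset organised as one folder per class, which is the layout the training code expects.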
I ran the package several times from the CLI, with different parameters each time, to introduce some distortion into the dataset:
- Run the main file to produce basic images, making up 70% of the dataset.
python run.py -c <no. of images>
- Run the main file to produce sine-wave distortion, making up 10% of the dataset. Argument -d selects the distortion type and has 4 possible values:
- 0: None
- 1: Sine wave
- 2: Cosine wave
- 3: Random
python run.py -c <no. of images> -d 1
- Run the main file to produce random-wave distortion, making up 10% of the dataset.
python run.py -c <no. of images> -d 3
- Run the main file to produce skewed font images, making up 5% of the dataset. Argument -k sets the skew angle, and the -rk flag (it takes no value; passing it sets it to True) enables random skewing in the range between -k and +k.
python run.py -c <no. of images> -k <skew angle> -rk
- Run the main file to apply Gaussian blur to the generated samples, making up 5% of the dataset. Argument -bl is an integer blur radius, and the -rbl flag enables a random blur radius between 0 and bl.
python run.py -c <no. of images> -bl <blur radius> -rbl
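The 70/10/10/5/5 split above can be scripted in one pass. A sketch, where the total count, skew angle, and blur radius are illustrative values and the commands are echoed rather than executed:

```shell
# Per-category image counts for an assumed total of 10,000 images.
TOTAL=10000
BASIC=$((TOTAL * 70 / 100))    # 70% plain images
SINE=$((TOTAL * 10 / 100))     # 10% sine-wave distortion
RAND=$((TOTAL * 10 / 100))     # 10% random distortion
SKEW=$((TOTAL * 5 / 100))      # 5% skewed images
BLUR=$((TOTAL * 5 / 100))      # 5% blurred images

# Echoed for illustration; drop 'echo' to actually run each pass.
echo "python run.py -c $BASIC"
echo "python run.py -c $SINE -d 1"
echo "python run.py -c $RAND -d 3"
echo "python run.py -c $SKEW -k 15 -rk"
echo "python run.py -c $BLUR -bl 2 -rbl"
```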
NOTE: All resulting images are pre-processed by cropping them to 100x100 pixels; the input images must have fixed dimensions before being passed to the model for training.
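The fixed-size crop can be done with Pillow. A minimal sketch assuming a centre crop (the exact crop position used in the project is not specified):

```python
from PIL import Image

def crop_to_fixed(img, size=100):
    """Centre-crop an image to size x size pixels."""
    w, h = img.size
    left = max((w - size) // 2, 0)
    top = max((h - size) // 2, 0)
    return img.crop((left, top, left + size, top + size))
```

Note that if the source image is smaller than the target size in either dimension, Pillow fills the out-of-bounds part of the crop box with black, so the output is always exactly 100x100.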
- trdg: The TRDG (Text Recognition Data Generator) package created by Edouard Belval, a synthetic data generator for text recognition written in Python.
- train: The training dataset of 100 font types, with 1000 sample images per font (100x100-pixel RGB) in JPEG format.
- valid: The validation dataset of 100 font types, with 200 sample images per font (100x100-pixel RGB) in JPEG format.
- test: The test dataset used for prediction.
- train.py: Code for loading and pre-processing the dataset, training the model with TensorFlow, and saving it for later use.
- test.py: Code for loading and pre-processing an image for testing. The model trained in train.py is loaded to make the prediction.
- utils.py: Helper functions and the CNN model, along with any supporting code, imported by the main files (train.py and test.py).
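For orientation, a CNN for this task (100 classes, 100x100 RGB inputs) might look like the following tf.keras sketch; the layer sizes are illustrative assumptions, not the architecture actually defined in utils.py:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=100):
    # Two conv/pool stages followed by a small dense head; integer
    # class labels are assumed (hence sparse categorical cross-entropy).
    model = models.Sequential([
        layers.Input(shape=(100, 100, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The trained model can then be saved with `model.save(...)` in train.py and reloaded with `tf.keras.models.load_model(...)` in test.py.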
The code is written in Python and implemented using TensorFlow.
The additional packages required are NumPy, Matplotlib, Pillow (the PIL fork), scikit-image, and TensorFlow. You can install them using pip:
pip install numpy
pip install matplotlib
pip install Pillow
pip install scikit-image
pip install tensorflow
To install TensorFlow, head over to the TensorFlow link and follow the instructions given.
As this project trains a CNN with a large number of parameters, a GPU is strongly recommended for training.
Shubham Kapoor
Edouard Belval - TRDG (Text Recognition Data Generator) package link