Font_Recognition

Introduction to Project

In this project, you'll train a convolutional neural network (CNN) to recognize and classify fonts. The model is trained on a dataset of 100 fonts. The project is broken down into the following steps:

  • Generating the dataset of fonts with the TRDG package
  • Loading and preprocessing the dataset
  • Visualizing samples from the dataset
  • Training the convolutional neural network on the dataset
  • Using the trained model to predict new fonts

The whole project is implemented in Python using TensorFlow.

Dataset Description

The Adobe VFR dataset is the first large-scale, fine-grained benchmark of font text images for the task of font recognition and retrieval. Unfortunately, it was too large for me to download, so I used the TRDG (Text Recognition Data Generator) package instead. TRDG is a synthetic data generator for text recognition, written in Python by Edouard Belval.

How does Belval's original TRDG package work?

Words are randomly chosen from a dictionary of a specific language. An image of those words is then generated using the specified font, background, and modifications (skewing, blurring, etc.). Usage as a Python module is very similar to the CLI.

The problem arose when I needed to build the dataset systematically, with the images for each font stored in their own folder. Unless a particular font and output directory are specified, the original TRDG package writes all of its images, rendered with fonts picked at random from the full font list, into a single output folder. To get one folder per font, I would have had to run the package once for every font, each time with a different font and output directory.

How does my TRDG package work?

I modified the package to suit my problem. I added a loop to the main file (run.py) that creates a directory for each font in the font list (folder) and stores that font's images in it. A single run now generates the images for every font in the list and saves them in a folder named after each font.

I have kept my package in the trdg folder, separate from the main pipeline, so that it can be run from the CLI. The rest of the documentation and behavior is the same as in Belval's original package.
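
The per-font folder logic described here can be sketched roughly as follows. This is not the actual code in run.py: the helper name `make_font_dirs`, the accepted font extensions, and the folder naming rule are assumptions made for illustration.

```python
from pathlib import Path


def make_font_dirs(fonts_dir, output_root):
    """Create one output folder per font file and return {font_path: folder}.

    Hypothetical helper: the real run.py may discover fonts and name
    folders differently.
    """
    fonts_dir = Path(fonts_dir)
    output_root = Path(output_root)
    font_dirs = {}
    # Assumption: fonts are stored as .ttf or .otf files in a single folder.
    for font_path in sorted(fonts_dir.iterdir()):
        if font_path.suffix.lower() not in {".ttf", ".otf"}:
            continue
        # The folder is named after the font file, without its extension.
        target = output_root / font_path.stem
        target.mkdir(parents=True, exist_ok=True)
        font_dirs[font_path] = target
    return font_dirs
```

Inside such a loop, the generator would then be called once per font, writing its images into the matching folder.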

How did I create my dataset?

I ran the package several times from the CLI, setting different parameters each time to introduce some distortion into the dataset:

  • Run the main file to produce basic images, which make up 70% of the dataset.

python run.py -c <no. of images>

  • Run the main file to produce images with sine-wave distortion, which make up 10% of the dataset. Argument -d has 4 possible values:
  0. None
  1. Sine wave
  2. Cosine wave
  3. Random

python run.py -c <no. of images> -d <1>

  • Run the main file to produce images with random wave distortion, which make up 10% of the dataset.

python run.py -c <no. of images> -d <3>

  • Run the main file to produce skewed font images, which make up 5% of the dataset. Argument -k is the skew angle, and -rk is set to True to enable random skewing in the range from -k to +k.

python run.py -c <no. of images> -k <skew angle> -rk <True>

  • Run the main file to apply Gaussian blur to the generated samples, which make up 5% of the dataset. Argument -bl is an integer defining the blur radius, and -rbl is set to True to randomize the blur radius between 0 and bl.

python run.py -c <no. of images> -bl <blur radius> -rbl <True>
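
As a rough sketch of the arithmetic behind these runs, the helper below turns a total image count into the value to pass to -c for each run. The function name `images_per_run` and the run labels are hypothetical, and whether -c counts images per font or overall is not stated here, so the function simply takes whatever total applies.

```python
# Percentage of the dataset produced by each run, as listed above.
SPLIT_PERCENT = {
    "basic": 70,
    "sine_wave": 10,
    "random_wave": 10,
    "skew": 5,
    "blur": 5,
}


def images_per_run(total):
    """Return how many images each run should generate out of `total`."""
    # Integer arithmetic avoids floating-point rounding surprises.
    counts = {name: total * pct // 100 for name, pct in SPLIT_PERCENT.items()}
    # Give any images lost to rounding down to the basic (undistorted) run.
    counts["basic"] += total - sum(counts.values())
    return counts
```

For example, `images_per_run(1000)` yields 700 basic, 100 sine-wave, 100 random-wave, 50 skewed, and 50 blurred images.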

NOTE: All the resulting images are pre-processed by cropping them to 100x100 pixels, because the model requires input images of fixed dimensions for training.
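
A minimal sketch of such a crop is shown below. How the images are actually cropped is not specified here, so the center crop, the function names `center_crop_box` and `crop_image_file`, and the requirement that every image is at least 100x100 pixels are all assumptions.

```python
def center_crop_box(width, height, size=100):
    """Return the (left, upper, right, lower) box of a centered size x size crop."""
    if width < size or height < size:
        raise ValueError(f"image {width}x{height} is smaller than {size}x{size}")
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)


def crop_image_file(src, dst, size=100):
    """Crop the image at `src` to size x size pixels and save it to `dst`."""
    # Pillow is imported here so that center_crop_box has no dependencies.
    from PIL import Image

    with Image.open(src) as img:
        box = center_crop_box(img.width, img.height, size)
        # Convert to RGB so the result can always be saved as a JPEG.
        img.convert("RGB").crop(box).save(dst)
```

For a 300x100 image, `center_crop_box(300, 100)` returns `(100, 0, 200, 100)`, which keeps the middle 100 columns.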

Files Description

  • trdg: My modified copy of the TRDG (Text Recognition Data Generator) package, originally created by Edouard Belval in Python. It is a synthetic data generator for text recognition.
  • train: The training dataset of 100 fonts, with 1000 sample images per font (100x100 pixels, RGB) in JPEG format.
  • valid: The validation dataset of 100 fonts, with 200 sample images per font (100x100 pixels, RGB) in JPEG format.
  • test: The test dataset for prediction.
  • train.py: The code for loading and pre-processing the dataset. It also trains the model using TensorFlow and saves it for further use.
  • test.py: The code for loading and pre-processing an image for testing. The model trained and saved by train.py is loaded to make the prediction.
  • utils.py: The helper functions and the CNN model, along with any supporting code, imported by the main files (train.py and test.py).
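
To illustrate how a folder layout like train and valid maps to labels, here is a small sketch that assumes each of those folders contains one subfolder per font. It is not the loading code from train.py or utils.py, and the function name `list_labeled_images` is hypothetical.

```python
from pathlib import Path


def list_labeled_images(root, extensions=(".jpg", ".jpeg")):
    """Return (class_names, samples) for a folder with one subfolder per font.

    `samples` is a list of (image_path, label) pairs, where `label` is the
    index of the font's folder name in the sorted `class_names` list.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for image_path in sorted((root / name).iterdir()):
            if image_path.suffix.lower() in extensions:
                samples.append((image_path, label))
    return class_names, samples
```

Sorting the folder names keeps the label indices stable between training and prediction, as long as the same set of font folders is used.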

Installation

The code is written in Python and implemented using TensorFlow.

The additional packages required are NumPy, Matplotlib, Pillow (PIL), scikit-image, and TensorFlow. You can install them using pip:

pip install numpy

pip install matplotlib

pip install pillow

pip install scikit-image

pip install tensorflow

For platform-specific TensorFlow installation instructions, head over to the official TensorFlow installation guide and follow the steps given there.
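
To confirm that everything is installed, a quick check like the one below can help. It is only a convenience sketch and not part of the project; the function name `missing_packages` is hypothetical. Note that some packages are imported under a different name than they are installed under: Pillow is installed as pillow but imported as PIL, and scikit-image is imported as skimage.

```python
from importlib.util import find_spec

# Maps each pip package name to the module name used in import statements.
REQUIRED = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "pillow": "PIL",
    "scikit-image": "skimage",
    "tensorflow": "tensorflow",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```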

GPU/CPU

Because this project trains a CNN with a large number of parameters, a GPU is strongly recommended. Training on a CPU is likely to be very slow.
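
To check whether TensorFlow can see a GPU before starting training, something like the following sketch can be used. The helper name `available_gpus` is hypothetical; `tf.config.list_physical_devices` is the TensorFlow 2 call for listing devices.

```python
def available_gpus():
    """Return the names of the GPUs visible to TensorFlow (empty if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so no GPU can be used for training.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = available_gpus()
    print(f"GPUs visible to TensorFlow: {gpus if gpus else 'none'}")
```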

License

MIT License

Author

Shubham Kapoor

Credits

Edouard Belval - TRDG (Text Recognition Data Generator) package
