This project is an implementation of Optical Character Recognition (OCR) using MMOCR, covering the full workflow from training to inference. It also benchmarks several other OCR models — PaddleOCR, TesseractOCR, and EasyOCR — on detecting and recognizing text from images. The recognized text is then evaluated for correctness using the Character Error Rate (CER) and Word Error Rate (WER) metrics, calculated with the PyWER, JiWER, and FastWER libraries.
The code performs Optical Character Recognition (OCR) on images using two models: DBNet to detect text regions in the images, and SVTR to recognize the actual text within those detected regions. Once the text is recognized, it's corrected for spelling errors using a spell checker. The corrected text is then saved to a text file. This entire process can be applied to a single image or to every image in a directory, depending on the input provided to the script.
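The spelling-correction step can be illustrated with a stdlib-only sketch (the project itself uses PySpellChecker; the tiny vocabulary here is purely for demonstration):

```python
import difflib

# Toy vocabulary for illustration only; a real spell checker uses a full dictionary.
VOCAB = ["optical", "character", "recognition", "handwritten", "text"]

def correct_word(word, vocab=VOCAB):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=0.8)
    return matches[0] if matches else word

print(correct_word("recognitoin"))  # close match -> "recognition"
```

PySpellChecker works on the same principle but scores candidates by word frequency over a full English dictionary rather than by string similarity against a fixed list.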
The pipeline of this project is as follows:
- An image is input to the OCR models.
- The models detect areas in the image that contain text.
- The detected text areas are recognized as actual text.
- The recognized text is saved to a TXT file.
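The pipeline above can be sketched as a small helper (hypothetical names; `detect` and `recognize` stand in for the DBNet and SVTR model calls):

```python
from pathlib import Path

def run_ocr(image_path, output_dir, detect, recognize):
    """Sketch of the pipeline: detect text regions, recognize each one,
    then save the recognized lines to a TXT file named after the image."""
    regions = detect(image_path)                        # DBNet stand-in: text regions
    lines = [recognize(region) for region in regions]   # SVTR stand-in: text per region
    out_path = Path(output_dir) / (Path(image_path).stem + ".txt")
    out_path.write_text("\n".join(lines))
    return out_path
```

In the actual project this logic lives in app.py, with the spell-correction pass applied before the lines are written out.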
The following table shows the CER and WER scores for each model using each metric library:
| Model | Metric Library | CER | WER |
|---|---|---|---|
| PaddleOCR | JiWER | 0.2113 | 0.7428 |
| TesseractOCR | JiWER | 0.493 | 1.042 |
| EasyOCR | JiWER | 0.4 | 1.014 |
| MMOCR Base | JiWER | 0.502 | 1.028 |
| MMOCR Trained | JiWER | 0.315 | 0.585 |
The MMOCR Trained model has the lowest WER, indicating it made fewer word-level mistakes in text recognition than the other models, while PaddleOCR achieves the lowest CER.
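For reference, both metrics are edit-distance based: CER counts character-level edits and WER counts word-level edits, each divided by the reference length. A minimal stdlib sketch (not the PyWER/JiWER/FastWER implementations themselves, which also handle text normalization):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def cer(ref, hyp):
    """Character Error Rate: character edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word Error Rate: word edits / number of reference words."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words
```

Note that WER can exceed 1.0 (as in several rows of the table above) when the hypothesis requires more edits than there are reference words.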
Ensure you have the following dependencies installed:
- PyTorch
- torchvision
- cuDNN
- OpenMIM
- MMOCR
- PySpellChecker
This project requires a handwritten dataset. You can use the example dataset in handwritten-mmocr/dataset/. Follow these steps if you want to create and label your own dataset:
- Collect handwritten samples for your dataset.
- Install and set up Label Studio.
- Import your collected samples into Label Studio.
- Label the samples according to your project requirements.
Ensure the dataset is properly labeled and saved in a format compatible with the OCR models used in this project.
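For reference, MMOCR's OCRDataset expects annotations roughly in the following JSON shape (per MMOCR's dataset format conventions — a text-recognition example; your export step from Label Studio must produce something equivalent, and the file names here are illustrative):

```json
{
  "metainfo": {"dataset_type": "TextRecogDataset", "task_name": "textrecog"},
  "data_list": [
    {"img_path": "images/sample_1.jpg", "instances": [{"text": "hello"}]},
    {"img_path": "images/sample_2.jpg", "instances": [{"text": "world"}]}
  ]
}
```

Text-detection annotations use the same `metainfo`/`data_list` layout but add region geometry (bounding boxes/polygons) to each instance.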
To install this project, follow these steps:
- Clone this repository.
- Download the models from the provided link (Text Detection Model | Text Recognition Model).
- Place the downloaded models into the appropriate folder: handwritten-mmocr/models/.
- Update the model paths in the app.py file to match the locations of your downloaded models.
- Open your terminal and navigate to the project directory.
- Run the following command to install the necessary dependencies:
pip install -r requirements.txt
Note: If you encounter an error when using mmdet, you can install it using OpenMIM with the following command: mim install mmdet
You can run this project with the following command:
python app.py --input_dir <input_dir_or_image_path> --output_dir <output_dir>
Where:
- <input_dir_or_image_path> is a directory containing images or a path to a specific image file.
- <output_dir> is the directory where the OCR results will be saved.
To train the models, follow these steps:
- Open the training.ipynb file.
- Run the cells and follow the instructions provided in the notebook.

Before training, make sure to modify the configurations in the config/textdet/dbnet and config/textrecog/svtr files, as well as the corresponding base files.
For the text detection model (DBNet), the following configurations should be updated:
- root_data
- number of iterations
- validation cycle
- TensorBoard visualizer
- save last checkpoint
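As a rough sketch, these settings map to fields like the following in an MMEngine-style config (field names follow MMOCR/MMEngine conventions and may not match this repo's files exactly; the paths and numbers are placeholders):

```python
# Hypothetical fragment of a DBNet config (MMEngine/MMOCR conventions).
data_root = 'dataset/'                       # root_data: where your dataset lives
train_cfg = dict(type='IterBasedTrainLoop',
                 max_iters=20000,            # number of iterations
                 val_interval=1000)          # validation cycle
visualizer = dict(type='TextDetLocalVisualizer',
                  vis_backends=[dict(type='TensorboardVisBackend')])  # TensorBoard
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1000,
                    save_last=True))         # keep the last checkpoint
```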
For the text recognition model (SVTR), the following configurations should be updated:
- root_data
- number of iterations
- TensorBoard visualizer
- save last checkpoint
- validation evaluator
- train/test dataset list
- pretrained model URL
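The SVTR-specific settings can be sketched the same way (again a hypothetical fragment following MMOCR conventions; annotation file names, dataset types, and the pretrained URL are placeholders):

```python
# Hypothetical fragment of an SVTR config (MMEngine/MMOCR conventions).
data_root = 'dataset/'                       # root_data
train_cfg = dict(type='IterBasedTrainLoop', max_iters=20000)  # number of iterations
visualizer = dict(type='TextRecogLocalVisualizer',
                  vis_backends=[dict(type='TensorboardVisBackend')])  # TensorBoard
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1000, save_last=True))
val_evaluator = [dict(type='WordMetric', mode=['exact', 'ignore_case']),
                 dict(type='CharMetric')]    # validation evaluator
train_list = [dict(type='OCRDataset', data_root=data_root,
                   ann_file='train.json')]   # train dataset list
test_list = [dict(type='OCRDataset', data_root=data_root,
                  ann_file='test.json')]     # test dataset list
load_from = 'https://...'                    # pretrained model URL (fill in yours)
```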
You might also be interested in the following related project:
- Semantic Entity Recognition of Handwritten Images using LayoutLMV3: This project focuses on extracting information from images and saving it in JSON key-value format.