This project is an implementation of Optical Character Recognition (OCR) using MMOCR, covering the full workflow from training to inference. It also benchmarks several other OCR models — PaddleOCR, TesseractOCR, and EasyOCR — on detecting and recognizing text from images. The recognized text is then evaluated for correctness using the Character Error Rate (CER) and Word Error Rate (WER) metrics, calculated with the PyWER, JiWER, and FastWER libraries.
The code performs Optical Character Recognition (OCR) on images using two models: DBNet to detect text regions in the images, and SVTR to recognize the actual text within those detected regions. Once the text is recognized, it's corrected for spelling errors using a spell checker. The corrected text is then saved to a text file. This entire process can be applied to a single image or to every image in a directory, depending on the input provided to the script.
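The spelling-correction step can be illustrated with a stdlib-only sketch (the project itself uses PySpellChecker; the tiny vocabulary here is purely for demonstration):

```python
import difflib

# Toy vocabulary for illustration only; a real spell checker uses a full dictionary.
VOCAB = ["optical", "character", "recognition", "handwritten", "text"]

def correct_word(word, vocab=VOCAB):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=0.8)
    return matches[0] if matches else word

print(correct_word("recognitoin"))  # close match -> "recognition"
```

PySpellChecker works on the same principle but scores candidates by word frequency over a full English dictionary rather than by string similarity against a fixed list.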
The pipeline of this project is as follows:
- An image is input to the OCR models.
- The models detect areas in the image that contain text.
- The detected text areas are recognized as actual text.
- The recognized text is saved to a TXT file.
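The pipeline above can be sketched as a small helper (hypothetical names; `detect` and `recognize` stand in for the DBNet and SVTR model calls):

```python
from pathlib import Path

def run_ocr(image_path, output_dir, detect, recognize):
    """Sketch of the pipeline: detect text regions, recognize each one,
    then save the recognized lines to a TXT file named after the image."""
    regions = detect(image_path)                        # DBNet stand-in: text regions
    lines = [recognize(region) for region in regions]   # SVTR stand-in: text per region
    out_path = Path(output_dir) / (Path(image_path).stem + ".txt")
    out_path.write_text("\n".join(lines))
    return out_path
```

In the actual project this logic lives in app.py, with the spell-correction pass applied before the lines are written out.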
The following table shows the CER and WER scores for each model using each metric library:
| Model | Metric Library | CER | WER |
|---|---|---|---|
| PaddleOCR | JiWER | 0.2113 | 0.7428 |
| TesseractOCR | JiWER | 0.493 | 1.042 |
| EasyOCR | JiWER | 0.4 | 1.014 |
| MMOCR Base | JiWER | 0.502 | 1.028 |
| MMOCR Trained | JiWER | 0.315 | 0.585 |
The MMOCR Trained model has the lowest WER, indicating it made fewer word-level mistakes in text recognition than the other models, while PaddleOCR achieves the lowest CER.
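For reference, both metrics are edit-distance based: CER counts character-level edits and WER counts word-level edits, each divided by the reference length. A minimal stdlib sketch (not the PyWER/JiWER/FastWER implementations themselves, which also handle text normalization):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def cer(ref, hyp):
    """Character Error Rate: character edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word Error Rate: word edits / number of reference words."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words
```

Note that WER can exceed 1.0 (as in several rows of the table above) when the hypothesis requires more edits than there are reference words.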
Ensure you have the following dependencies installed:
- PyTorch
- torchvision
- cuDNN
- OpenMIM
- MMOCR
- PySpellChecker
This project requires a handwritten dataset. You can use the example dataset in handwritten-mmocr/dataset/. Follow these steps if you want to create and label your own dataset:
- Collect handwritten samples for your dataset.
- Install and set up Label Studio.
- Import your collected samples into Label Studio.
- Label the samples according to your project requirements.
Ensure the dataset is properly labeled and saved in a format compatible with the OCR models used in this project.
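For reference, MMOCR's OCRDataset expects annotations roughly in the following JSON shape (per MMOCR's dataset format conventions — a text-recognition example; your export step from Label Studio must produce something equivalent, and the file names here are illustrative):

```json
{
  "metainfo": {"dataset_type": "TextRecogDataset", "task_name": "textrecog"},
  "data_list": [
    {"img_path": "images/sample_1.jpg", "instances": [{"text": "hello"}]},
    {"img_path": "images/sample_2.jpg", "instances": [{"text": "world"}]}
  ]
}
```

Text-detection annotations use the same `metainfo`/`data_list` layout but add region geometry (bounding boxes/polygons) to each instance.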
To install this project, follow these steps:
- Clone this repository.
- Download the models from the provided link (Text Detection Model | Text Recognition Model).
- Place the downloaded models into the appropriate folder: handwritten-mmocr/models/.
- Update the model paths in the app.py file to match the locations of your downloaded models.
- Open your terminal and navigate to the project directory.
- Run the following command to install the necessary dependencies:
pip install -r requirements.txt
Note: If you encounter an error when using mmdet, you can install it using OpenMIM with the following command: mim install mmdet
You can run this project with the following command:
python app.py --input_dir <input_dir_or_image_path> --output_dir <output_dir>
Where:
- <input_dir_or_image_path> is a directory containing images or a path to a specific image file.
- <output_dir> is the directory where the OCR results will be saved.
To train the models, follow these steps:
- Open the training.ipynb file.
- Run the cells and follow the instructions provided in the notebook.

Before training, make sure to modify the configurations in the config/textdet/dbnet and config/textrecog/svtr files, as well as the corresponding base files.
For the text detection model (DBNet), the following configurations should be updated:
- root_data
- number of iterations
- validation cycle
- TensorBoard visualizer
- save last checkpoint
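As a rough sketch, these settings map to fields like the following in an MMEngine-style config (field names follow MMOCR/MMEngine conventions and may not match this repo's files exactly; the paths and numbers are placeholders):

```python
# Hypothetical fragment of a DBNet config (MMEngine/MMOCR conventions).
data_root = 'dataset/'                       # root_data: where your dataset lives
train_cfg = dict(type='IterBasedTrainLoop',
                 max_iters=20000,            # number of iterations
                 val_interval=1000)          # validation cycle
visualizer = dict(type='TextDetLocalVisualizer',
                  vis_backends=[dict(type='TensorboardVisBackend')])  # TensorBoard
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1000,
                    save_last=True))         # keep the last checkpoint
```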
For the text recognition model (SVTR), the following configurations should be updated:
- root_data
- number of iterations
- TensorBoard visualizer
- save last checkpoint
- validation evaluator
- train/test dataset list
- pretrained model URL
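The SVTR-specific settings can be sketched the same way (again a hypothetical fragment following MMOCR conventions; annotation file names, dataset types, and the pretrained URL are placeholders):

```python
# Hypothetical fragment of an SVTR config (MMEngine/MMOCR conventions).
data_root = 'dataset/'                       # root_data
train_cfg = dict(type='IterBasedTrainLoop', max_iters=20000)  # number of iterations
visualizer = dict(type='TextRecogLocalVisualizer',
                  vis_backends=[dict(type='TensorboardVisBackend')])  # TensorBoard
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1000, save_last=True))
val_evaluator = [dict(type='WordMetric', mode=['exact', 'ignore_case']),
                 dict(type='CharMetric')]    # validation evaluator
train_list = [dict(type='OCRDataset', data_root=data_root,
                   ann_file='train.json')]   # train dataset list
test_list = [dict(type='OCRDataset', data_root=data_root,
                  ann_file='test.json')]     # test dataset list
load_from = 'https://...'                    # pretrained model URL (fill in yours)
```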
You might also be interested in the following related project:
- Semantic Entity Recognition of Handwritten Images using LayoutLMV3: This project focuses on extracting information from images and saving it in JSON key-value format.