Handwritten MMOCR

This project implements Optical Character Recognition (OCR) with MMOCR, from training through inference. It also tests other OCR engines (PaddleOCR, TesseractOCR, and EasyOCR) on detecting and recognizing text in images. The recognized text is then evaluated with the Character Error Rate (CER) and Word Error Rate (WER) metrics, calculated with the PyWER, JiWER, and FastWER libraries.

The code is designed to perform Optical Character Recognition (OCR) on images. It utilizes two models: DBNet for detecting text areas in the images, and SVTR for recognizing the actual text within these detected areas. Once the text is recognized, it’s corrected for any spelling errors using a spell checker. The corrected text is then saved to a text file. This entire process can be applied to a single image or multiple images in a directory, depending on the input provided to the script.

Pipeline

The pipeline of this project is as follows:

  1. An image is input to the OCR models.
  2. The models detect areas in the image that contain text.
  3. The detected text areas are recognized as actual text.
  4. The recognized text is saved as a TXT file.
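
The steps above can be sketched as a small driver function. This is only an illustration under stated assumptions, not the project's app.py: run_pipeline, Detector, Recognizer, and Region are hypothetical names, and the detect and recognize callables stand in for whatever wraps DBNet and SVTR.

```python
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical aliases: a "region" is whatever the detector returns for one
# text area (for example a polygon). The sketch never inspects it.
Region = object
Detector = Callable[[Path], Sequence[Region]]
Recognizer = Callable[[Path, Region], str]


def run_pipeline(image_path: Path, output_dir: Path,
                 detect: Detector, recognize: Recognizer) -> Path:
    """Run detect -> recognize -> save for one image and return the TXT path."""
    # Step 2: find the areas of the image that contain text.
    regions = detect(image_path)
    # Step 3: turn each detected area into a string.
    lines: List[str] = [recognize(image_path, region) for region in regions]
    # Step 4: write one recognized string per line to <image name>.txt.
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / f"{image_path.stem}.txt"
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path
```

Passing the models in as callables keeps the sketch independent of any particular OCR library; a spell-correction step could be applied to each string before it is written.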

Results

The following table shows the CER and WER scores for each model. All rows listed here were computed with JiWER:

Model           Metric Library   CER      WER
PaddleOCR       JiWER            0.2113   0.7428
TesseractOCR    JiWER            0.493    1.042
EasyOCR         JiWER            0.4      1.014
MMOCR Base      JiWER            0.502    1.028
MMOCR Trained   JiWER            0.315    0.585

The MMOCR Trained model has the lowest WER (0.585), meaning it made the fewest word-level errors among the models tested. PaddleOCR has the lowest CER (0.2113), with MMOCR Trained second (0.315).
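
Several WER values in the table are above 1.0. That is possible because both metrics divide an edit distance by the length of the reference, and inserted words count as errors. The sketch below shows the standard definitions in plain Python. It is not the code of JiWER, PyWER, or FastWER, and it applies no text normalization, so its numbers may differ from what those libraries report; the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: fewest substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, wer("hello world", "hello word") is 0.5 (one wrong word out of two), while wer("hi", "h i there") is 3.0, because three edits are needed against a one-word reference.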

Prerequisites

Ensure you have the following dependencies installed:

  • PyTorch
  • torchvision
  • cuDNN
  • OpenMIM
  • MMOCR
  • PySpellChecker

Dataset Creation and Labelling

This project requires a handwritten dataset. You can use the example dataset in handwritten-mmocr/dataset/. Follow these steps if you want to create and label your own dataset:

  1. Collect handwritten samples for your dataset.
  2. Install and set up Label Studio.
  3. Import your collected samples into Label Studio.
  4. Label the samples according to your project requirements.

Ensure the dataset is properly labeled and saved in a format compatible with the OCR models used in this project.

Installation

To install this project, follow these steps:

  1. Clone this repository.
  2. Download the models from the provided link (Text Detection Model | Text Recognition Model).
  3. Place the downloaded models in the handwritten-mmocr/models/ folder.
  4. Update the model paths in the app.py file to match the locations of your downloaded models.
  5. Open your terminal and navigate to the project directory.
  6. Run the following command to install the necessary dependencies:
pip install -r requirements.txt

Note: If you encounter an error when using mmdet, you can install it using OpenMIM with the following command: mim install mmdet.

Usage

You can run this project with the following command:

python app.py --input_dir <input_dir_or_image_path> --output_dir <output_dir>

Where:

  • <input_dir_or_image_path> is a directory containing images or the path to a single image file.
  • <output_dir> is the directory where the OCR results will be saved.
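
One way a script could accept either a single image or a directory through the same --input_dir flag is sketched below. This is an assumption about how such handling might look, not the contents of app.py: collect_images, build_parser, and the IMAGE_EXTENSIONS set are hypothetical, and marking both flags as required is a choice made for the sketch.

```python
import argparse
from pathlib import Path
from typing import List

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def collect_images(input_path: Path) -> List[Path]:
    """Return [input_path] for a file, or the sorted images inside a directory."""
    if input_path.is_file():
        return [input_path]
    if input_path.is_dir():
        return sorted(p for p in input_path.iterdir()
                      if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
    raise FileNotFoundError(f"No such file or directory: {input_path}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two flags shown in the usage command."""
    parser = argparse.ArgumentParser(
        description="Run OCR on an image or a directory of images")
    parser.add_argument("--input_dir", type=Path, required=True,
                        help="directory of images or path to one image")
    parser.add_argument("--output_dir", type=Path, required=True,
                        help="directory where the TXT results are written")
    return parser
```

A caller would then run args = build_parser().parse_args() and loop over collect_images(args.input_dir).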

Training

To train the models, follow these steps:

  1. Open the training.ipynb file.
  2. Run the cells and follow the instructions provided in the notebook.

Before training, make sure to modify the configurations in the config/texdet/dbnet and config/textrecog/svtr files, as well as the corresponding base files.

For the text detection model (DBNet), the following configurations should be updated:

  • root_data (dataset root path)
  • number of iterations
  • validation cycle
  • TensorBoard visualizer
  • saving the last checkpoint
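
As a rough guide, the settings listed above usually map onto fields like the following in an MMOCR 1.x (MMEngine-style) Python config. This is a sketch under assumptions: every value is a placeholder, data_root stands in for the "root_data" setting, and the project's own config files may name or organize these fields differently.

```python
# Hypothetical values throughout; adjust them to your dataset and hardware.
data_root = 'dataset/'  # stands in for the "root_data" setting

# Number of iterations and validation cycle for an iteration-based loop.
train_cfg = dict(type='IterBasedTrainLoop', max_iters=10000, val_interval=1000)

# Save a checkpoint every `interval` iterations and always keep the last one.
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', by_epoch=False,
                    interval=1000, save_last=True),
)

# Log to TensorBoard in addition to the default local backend.
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(type='TextDetLocalVisualizer', name='visualizer',
                  vis_backends=vis_backends)
```

Keeping val_interval a divisor of max_iters means the final iteration is also validated.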

For the text recognition model (SVTR), the following configurations should be updated:

  • root_data (dataset root path)
  • number of iterations
  • TensorBoard visualizer
  • saving the last checkpoint
  • validation evaluator
  • train/test dataset list
  • pretrained model URL
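
The recognition-specific items (evaluator, dataset lists, pretrained weights) might look like the fragment below in an MMOCR 1.x Python config. Treat it as a sketch under assumptions: the annotation file names, the 'handwritten' prefix, and the checkpoint path are placeholders, and the project's own config files may be laid out differently.

```python
# Hypothetical paths and names; replace them with your own.
data_root = 'dataset/'

# Train/test dataset lists: one entry per dataset to be concatenated.
train_list = [
    dict(type='OCRDataset', data_root=data_root,
         ann_file='textrecog_train.json', pipeline=None),
]
test_list = [
    dict(type='OCRDataset', data_root=data_root,
         ann_file='textrecog_test.json', test_mode=True, pipeline=None),
]

# Validation evaluator: word-level and character-level metrics, reported
# under one prefix per dataset in the test list.
val_evaluator = dict(
    type='MultiDatasetsEvaluator',
    metrics=[
        dict(type='WordMetric',
             mode=['exact', 'ignore_case', 'ignore_case_symbol']),
        dict(type='CharMetric'),
    ],
    dataset_prefixes=['handwritten'],
)
test_evaluator = val_evaluator

# Pretrained weights to start fine-tuning from (a local path or a URL).
load_from = 'path/to/pretrained_svtr.pth'
```

The number of entries in dataset_prefixes should match the number of datasets being evaluated, which is one in this sketch.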
