
Vertical Attention Network: an end-to-end model for handwritten text recognition at paragraph level.

This repository is a public implementation of the paper: "End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network".

The paper is available at https://arxiv.org/abs/2012.03868

It focuses on Optical Character Recognition (OCR) applied at line and paragraph levels.

We obtained the following results at line level:

Dataset        CER (%)   WER (%)
IAM            4.95      18.73
RIMES          3.19      10.25
READ2016       4.28      19.71
ScribbleLens   6.66      25.12

For the paragraph level, here are the results:

Dataset    CER (%)   WER (%)
IAM        4.32      16.24
RIMES      1.90       8.83
READ2016   3.63      16.75
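Both metrics are edit-distance based: CER (Character Error Rate) divides the character-level Levenshtein distance by the reference length, WER (Word Error Rate) does the same at word level. A minimal, self-contained sketch of how such metrics are typically computed (illustrative only, not the repository's evaluation code):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate in %, relative to the reference length."""
    return 100 * levenshtein(list(ref), list(hyp)) / len(ref)

def wer(ref, hyp):
    """Word Error Rate in %, relative to the reference word count."""
    return 100 * levenshtein(ref.split(), hyp.split()) / len(ref.split())
```

For example, `wer("the cat sat", "the cat sit")` gives one substitution over three reference words, i.e. about 33.3%.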

Pretrained model weights are available here

Table of contents:

  1. Getting Started
  2. Datasets
  3. Training And Evaluation

Getting Started

The implementation has been tested with Python 3.6.

Clone the repository:

git clone https://github.com/FactoDeepLearning/VerticalAttentionOCR.git

Install the dependencies:

pip install -r requirements.txt

Datasets

This section covers the datasets used in the paper: download and formatting instructions are provided so the experiments can be replicated.

IAM

Details

IAM corresponds to English grayscale handwriting images (from the LOB corpus). We provide a script to format this dataset according to the split commonly used for result comparison. The splits are as follows:

            Train   Validation   Test
line        6,482   976          2,915
paragraph   747     116          336

Download

  • Register at the FKI's webpage
  • Download the dataset here
  • Move the following files into the folder Datasets/raw/IAM/
    • formsA-D.tgz
    • formsE-H.tgz
    • formsI-Z.tgz
    • lines.tgz
    • ascii.tgz

RIMES

Details

RIMES corresponds to French grayscale handwriting images. We provide a script to format this dataset according to the split commonly used for result comparison. The splits are as follows:

            Train   Validation   Test
line        9,947   1,333        778
paragraph   1,400   100          100

Download

  • Fill in the a2ia user agreement form available here and send it by email to [email protected]. You will receive a username and a password by email
  • Log in and download the data from here
  • Move the following files into the folder Datasets/raw/RIMES/
    • eval_2011_annotated.xml
    • eval_2011_gray.tar
    • training_2011_gray.tar
    • training_2011.xml

READ 2016

Details

READ 2016 corresponds to Early Modern German RGB handwriting images. We provide a script to format this dataset according to the split commonly used for result comparison. The splits are as follows:

            Train   Validation   Test
line        8,349   1,040        1,138
paragraph   1,584   179          197

Download

  • From root folder:
cd Datasets/raw
mkdir READ_2016
cd READ_2016
wget https://zenodo.org/record/1164045/files/{Test-ICFHR-2016.tgz,Train-And-Val-ICFHR-2016.tgz}

ScribbleLens

Details

ScribbleLens corresponds to Early Modern Dutch RGB handwriting images. The dataset is split as follows:

        Train   Validation   Test
line    4,302   481          563

Download

  • From root folder:
cd Datasets/raw
mkdir ScribbleLens
cd ScribbleLens
wget http://openslr.magicdatatech.com/resources/84/scribblelens.{supplement.original.pages.tgz,corpus.v1.2.zip}

Format the datasets

  • Comment or uncomment the relevant lines in the main function of the script "format_datasets.py" according to your needs, then run it:
if __name__ == "__main__":

    # format_IAM_line()
    # format_IAM_paragraph()

    # format_RIMES_line()
    # format_RIMES_paragraph()

    # format_READ2016_line()
    # format_READ2016_paragraph()

    # format_scribblelens_line()
  • This will generate well-formatted datasets, usable by the training scripts.

Training And Evaluation

You need a properly formatted dataset to train a model; please refer to the Datasets section.

Two scripts are provided to train line-level and paragraph-level models, respectively: OCR/line_OCR/ctc/main_line_ctc.py and OCR/document_OCR/v_attention/main_pg_va.py

Training a model generates output files; they are located in the output folder OCR/line_OCR/ctc/outputs/#TrainingName or OCR/document_OCR/v_attention/outputs/#TrainingName.

The output files are split into two subfolders: "checkpoints" and "results". "checkpoints" contains model weights for the last trained epoch and for the epoch giving the best validation CER. "results" contains TensorBoard logs for loss and metrics, as well as a text file with the hyperparameters used and the evaluation results.
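The resulting layout looks roughly like this (the exact file names under each subfolder are assumptions based on the description above, not taken from the repository):

```
OCR/line_OCR/ctc/outputs/#TrainingName/
├── checkpoints/   # weights for the last epoch and the best-validation-CER epoch
└── results/       # TensorBoard logs, hyperparameters, evaluation results
```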

Training can use the apex package for mixed-precision training and Distributed Data Parallel for running on multiple GPUs.

All hyperparameters are specified and editable in the training scripts (their meanings are documented in comments).

Evaluation is performed right after training ends (training stops when the maximum elapsed time is reached, or after the maximum number of epochs specified in the training script).

Citation

@misc{coquenet2020,
      title={End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network}, 
      author={Denis Coquenet and Clément Chatelain and Thierry Paquet},
      year={2020},
      eprint={2012.03868},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This whole project is under the CeCILL-C license, EXCEPT FOR the file "basic/transforms.py", which is under the MIT license.
