ScanSSD: Scanning Single Shot Detector for Math in Document Images

A PyTorch implementation of ScanSSD (Scanning Single Shot MultiBox Detector) by Parag Mali. It was developed using the SSD implementation by Max deGroot.

Developed using CUDA 9.1.85 and PyTorch 1.1.0.

   

Installation

  • Install PyTorch.
  • Clone this repository (requires Python 3).
  • Download the dataset by following the instructions in the TFD-ICDAR2019 repository (https://github.com/MaliParag/TFD-ICDAR2019).
  • Install Visdom for real-time loss visualization during training.
    • To use Visdom in the browser:
    # First install the Python server and client
    pip install visdom
    # Start the server (preferably in a screen or tmux session)
    python -m visdom.server
    • Then, during training, navigate to http://localhost:8097/ (see the Training ScanSSD section below for training details).
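
Before training, it can help to confirm that the packages from the list above are importable. The following is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical and not part of this repository.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # ScanSSD requires Python 3; torch and visdom come from the steps above.
    print("Python 3:", sys.version_info.major == 3)
    for name, found in check_packages(["torch", "visdom"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only looks the modules up without importing them, so it runs quickly even when PyTorch is installed.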

Code Organization

The SSD model is built in ssd.py. Training and testing are managed by train.py and test.py. All the training code is in the layers directory. Hyper-parameters for training and testing can be specified on the command line and in the config.py file inside the data directory.

The data directory also contains the gtdb_new.py data reader, which uses sliding windows to generate sub-images of each page for training. All the scripts for stitching the sub-image-level detections are in the gtdb directory.
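
To illustrate the sliding-window idea, here is a minimal sketch that computes crop rectangles covering a page. It is not the code from gtdb_new.py: the function name, the 512-pixel window, and the 256-pixel stride are assumptions chosen for illustration.

```python
def sliding_windows(page_width, page_height, window=512, stride=256):
    """Return (left, top, right, bottom) crops that cover the whole page.

    window and stride are illustrative values, not the repository defaults.
    """
    def starts(length):
        # Regular grid of offsets, plus a final window flush with the page edge.
        if length <= window:
            return [0]
        offsets = list(range(0, length - window + 1, stride))
        if offsets[-1] != length - window:
            offsets.append(length - window)
        return offsets

    return [
        (x, y, min(x + window, page_width), min(y + window, page_height))
        for y in starts(page_height)
        for x in starts(page_width)
    ]


for crop in sliding_windows(1024, 512):
    print(crop)
```

Overlapping windows mean that a formula cut by one window edge is likely to appear whole in a neighbouring window, which is why the patch-level detections need to be stitched afterwards.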

Functions for data augmentation and for visualizing bounding boxes and heatmaps are in utils.

Setting up data for training

If you are not sure how to set up the data, see the dir_struct file. It shows one possible directory structure that you can use for setting up data for training and testing.

To generate .pmath files or .pchar files you can use this script.

Training ScanSSD

  • First download the fc-reduced VGG-16 PyTorch base network weights here

  • By default, we assume you have downloaded the file into the scanssd/weights directory.

  • Run the training command:

python3 train.py \
    --dataset GTDB \
    --dataset_root ~/data/GTDB/ \
    --cuda True \
    --visdom True \
    --batch_size 16 \
    --num_workers 4 \
    --exp_name IOU512_iter1 \
    --model_type 512 \
    --training_data training_data \
    --cfg hboxes512 \
    --loss_fun ce \
    --kernel 1 5 \
    --padding 0 2 \
    --neg_mining True \
    --pos_thresh 0.75
  • Note:
    • For training, an NVIDIA GPU is strongly recommended for speed.
    • For instructions on Visdom usage and installation, see the Installation section.
    • You can resume training from a checkpoint by specifying its path as one of the training parameters (see train.py for the available options).
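
The training command passes booleans as strings (`--cuda True`) and two-valued options (`--kernel 1 5`). The sketch below shows one way such flags can be parsed with argparse. It is an assumption about how these flags behave, not a copy of train.py: the `str2bool` helper, the subset of flags, and the defaults (taken from the command above) are illustrative.

```python
import argparse


def str2bool(value):
    """Interpret strings such as 'True' or 'false' as booleans."""
    return str(value).lower() in ("yes", "true", "t", "1")


def build_parser():
    # Hypothetical subset of the flags used in the training command.
    parser = argparse.ArgumentParser(description="Sketch of ScanSSD-style training flags")
    parser.add_argument("--dataset", default="GTDB")
    parser.add_argument("--dataset_root", default="~/data/GTDB/")
    parser.add_argument("--cuda", type=str2bool, default=True)
    parser.add_argument("--batch_size", type=int, default=16)
    # --kernel and --padding take two integers each, e.g. "--kernel 1 5".
    parser.add_argument("--kernel", type=int, nargs=2, default=[1, 5])
    parser.add_argument("--padding", type=int, nargs=2, default=[0, 2])
    parser.add_argument("--pos_thresh", type=float, default=0.75)
    return parser


args = build_parser().parse_args(
    ["--cuda", "False", "--kernel", "1", "5", "--padding", "0", "2"]
)
print(args.cuda, args.kernel, args.padding)  # False [1, 5] [0, 2]
```

A converter such as `str2bool` is used here because `type=bool` would turn any non-empty string, including "False", into `True`.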

Pre-Trained weights

For quick testing, pre-trained weights are available here.

Testing

To test a trained network:

python3 test.py \
    --dataset_root ../ \
    --trained_model HBOXES512_iter1GTDB.pth \
    --visual_threshold 0.25 \
    --cuda True \
    --exp_name test_real_world_iter1 \
    --test_data testing_data \
    --model_type 512 \
    --cfg hboxes512 \
    --padding 3 3 \
    --kernel 1 1 \
    --batch_size 8

You can set the parameters listed in the eval.py file either by passing them as command-line flags or by editing their default values in the file.
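
As an illustration of what a flag like `--visual_threshold 0.25` typically controls, the following sketch keeps only detections whose confidence reaches the threshold. The function name, the data layout, and the use of `>=` rather than `>` are assumptions; test.py may apply the threshold differently.

```python
def filter_detections(detections, visual_threshold=0.25):
    """Keep (box, score) pairs whose score meets the threshold."""
    return [(box, score) for box, score in detections if score >= visual_threshold]


# Hypothetical detections: ((x1, y1, x2, y2), confidence)
detections = [
    ((10, 20, 110, 60), 0.91),
    ((30, 200, 90, 230), 0.12),
    ((300, 40, 420, 80), 0.25),
]
kept = filter_detections(detections)
print(len(kept))  # 2
```

A lower threshold keeps more candidate boxes for the stitching step, at the cost of more false positives.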

Stitching the patch level results


python3 <Workspace>/ssd/gtdb/stitch_patches_pdf.py \
    --data_file <Workspace>/train_pdf \
    --output_dir <Workspace>/ssd/eval/stitched_HBOXES512_e4/ \
    --math_dir <Workspace>/ssd/eval/test_HBOXES512_e4/ \
    --stitching_algo equal \
    --algo_threshold 30 \
    --num_workers 8 \
    --postprocess True \
    --home_images <Workspace>/images/

math_dir is the output directory generated by test.py.

output_dir is the directory where the final stitched output will be written.
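
The sketch below illustrates one way patch-level boxes can be combined at page level: each box is shifted back into page coordinates, every covered pixel receives one vote per box, and pixels with enough votes are kept. It assumes that `--stitching_algo equal` means equal-weight voting and that `--algo_threshold` is a vote threshold; stitch_patches_pdf.py may work differently. All function names are hypothetical, and the final step of turning the mask into boxes is omitted.

```python
def to_page_coords(box, window_origin):
    """Shift a patch-level box (x1, y1, x2, y2) by its window's top-left corner."""
    ox, oy = window_origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def vote_map(page_width, page_height, page_boxes):
    """Give every pixel one vote per box that covers it (equal-weight voting)."""
    votes = [[0] * page_width for _ in range(page_height)]
    for x1, y1, x2, y2 in page_boxes:
        # Boxes are treated as half-open integer ranges clipped to the page.
        for y in range(max(y1, 0), min(y2, page_height)):
            for x in range(max(x1, 0), min(x2, page_width)):
                votes[y][x] += 1
    return votes


def threshold_mask(votes, min_votes):
    """Mark pixels that received at least min_votes votes."""
    return [[count >= min_votes for count in row] for row in votes]


# Two overlapping windows on a tiny 8 x 4 page detect the same region.
patch_boxes = [((2, 1, 6, 3), (0, 0)), ((0, 1, 4, 3), (2, 0))]
page_boxes = [to_page_coords(box, origin) for box, origin in patch_boxes]
mask = threshold_mask(vote_map(8, 4, page_boxes), min_votes=2)
for row in mask:
    print("".join("#" if cell else "." for cell in row))
```

Regions seen by several overlapping windows collect more votes, so the threshold suppresses detections that only a single window produced.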

Evaluate

python3 <Workspace>/ICDAR2019/TFD-ICDAR2019v2/Evaluation/IOULib/IOUevaluater.py \
    --ground_truth <Workspace>/ICDAR2019/TFD-ICDAR2019v2/Train/math_gt/ \
    --detections <Workspace>/ssd/eval/stitched_HBOXES512_e4/
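
The evaluation is based on intersection over union (IOU) between detected and ground-truth boxes. The following minimal sketch shows only the overlap measure for axis-aligned boxes; the matching rules used by IOUevaluater.py are not described here, and the function name and example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 100, 50)
detection = (0, 0, 100, 30)
score = iou(ground_truth, detection)
print(score, score >= 0.5, score >= 0.75)  # 0.6 True False
```

In this example the detection would count as a match at an IOU threshold of 0.5 but not at 0.75.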

Performance

TFD-ICDAR 2019 Version1 Test

| Metric | Precision | Recall | F-score |
|--------|-----------|--------|---------|
| IOU50  | 85.05%    | 75.85% | 80.19%  |
| IOU75  | 77.38%    | 69.01% | 72.96%  |

FPS

GTX 1080: ~27 FPS for 512 * 512 input images
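
As a quick consistency check on the numbers in this section, the sketch below recomputes the F-scores from the reported precision and recall, assuming the F-score is the standard harmonic mean (F1), and converts the frame rate into time per image.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


print(round(f_score(85.05, 75.85), 2))  # 80.19 (IOU50)
print(round(f_score(77.38, 69.01), 2))  # 72.96 (IOU75)
print(round(1000 / 27, 1))              # 37.0 ms per 512 x 512 image at ~27 FPS
```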

Related publications

Mali, Parag, et al. “ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images.” ArXiv:2003.08005 [Cs], Mar. 2020. arXiv.org, http://arxiv.org/abs/2003.08005.

P. S. Mali, "Scanning Single Shot Detector for Math in Document Images." Order No. 22622391, Rochester Institute of Technology, Ann Arbor, 2019.

M. Mahdavi, R. Zanibbi, H. Mouchere, and Utpal Garain (2019). ICDAR 2019 CROHME + TFD: Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection. Proc. International Conference on Document Analysis and Recognition, Sydney, Australia (to appear).

Acknowledgements
