LTPTextDetector
===========================================

This repository contains the open source release of the research publication

End-to-End Text Recognition using Local Ternary Patterns, MSER and Deep Convolutional Neural Networks
Michael Opitz, Markus Diem, Florian Kleber, Stefan Fiel and Robert Sablatnig
presented at DAS 2014.

Requirements
===========================================

* Linux (untested under Windows, OS X)
* Boost
* OpenCV 2.4
* CMake
* Eigen 3
* Python 2.7
* LTPTextDetectorTraining (available on GitHub)
* Time and patience to get things running

How to compile the code?
===========================================

Before compiling the code, get the LTPTextDetectorTraining project from GitHub
and extract or symlink it into the detector subdirectory.

Then compile the project with

    $ cmake . && make

Because GitHub does not allow large files in repositories, the pre-trained models
must be downloaded from http://bit.ly/1ehC3ZT and unzipped into the models/ directory.

How to run a demo?
===========================================

Just run 

    $ ./bin/demo -c config_11.yml -model models/model_boost.txt -i <image>

from the root directory of the project. The model files must first be
downloaded and extracted into the models/ directory, as explained in the previous section.
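
The demo command depends on a config file and a downloaded model file. As a minimal sketch (not part of the repository), the hypothetical Python helper below checks that both files exist before assembling the same command line. The name `build_demo_command` and the error handling are assumptions; only the flags `-c`, `-model` and `-i` and the file names come from the command shown above.

```python
import os


def build_demo_command(image, root=".", config="config_11.yml",
                       model="models/model_boost.txt"):
    """Return the demo command line as a list, after checking its inputs.

    Hypothetical helper: only the flags -c, -model and -i are taken from
    the README; the name and the checks are illustrative assumptions.
    """
    # Paths in the README are relative to the project root directory.
    for rel_path in (config, model):
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path):
            raise IOError("missing file: %s" % full_path)
    return ["./bin/demo", "-c", config, "-model", model, "-i", image]
```

The returned list can be passed to `subprocess.check_call(cmd, cwd=root)` so that the relative paths resolve against the project root.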


How to reproduce the results?
===========================================

To reproduce the results, download the archive of datasets from http://bit.ly/1gxI9Fx
and unzip it in the parent directory of the project.

Then run  

    $ ./bin/classify -t ../test_icdar_2011  -r ../result_test -m models/model_boost.txt

to create the response maps and 
    
    $ ./bin/create_boxes -c config_11.yml

to create the bounding boxes.
To convert the output to the ICDAR evaluation format, run

    $ python2 ./scripts/to_xml.py result_test/ > eval11.xml
    $ evaldetection eval11.xml datasets/test-textloc-gt/test-gt-textloc-wolf.xml > results.xml
    $ readdeteval results.xml 

which should print:

    Included 255 images with non-zero groundtruth
    Included 0 images with zero groundtruth
    Skipped 0 images with zero groundtruth.
    Total-Number-Of-Processed-Images: 255
    100% of the images contain objects.
    Generality: 4.66275
    Inverse-Generality: 0.214466
    <evaluation noImages="255">
      <icdar2003 r="0.700094" p="0.81904" hmean="0.75491" noGT="1189" noD="1026"/>
      <score r="0.715559" p="0.844055" hmean="0.774514" noGT="1189" noD="1026"/>
    </evaluation>
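
In the output above, each `hmean` attribute appears to be the harmonic mean of the recall `r` and the precision `p` on the same line. The sketch below is not part of the repository. It parses the `<evaluation>` element with Python's standard `xml.etree.ElementTree` module and recomputes that value. The function names are hypothetical, and the formula is inferred from the printed numbers rather than taken from the evaluation tool's source.

```python
import xml.etree.ElementTree as ET

# The <evaluation> element copied from the expected output above.
EVALUATION_XML = """
<evaluation noImages="255">
  <icdar2003 r="0.700094" p="0.81904" hmean="0.75491" noGT="1189" noD="1026"/>
  <score r="0.715559" p="0.844055" hmean="0.774514" noGT="1189" noD="1026"/>
</evaluation>
"""


def harmonic_mean(recall, precision):
    """Harmonic mean of recall and precision (0.0 if both are zero)."""
    total = recall + precision
    return 2.0 * recall * precision / total if total else 0.0


def check_hmeans(xml_text, tolerance=1e-5):
    """Return {tag: (reported, recomputed)} and assert that they agree."""
    results = {}
    for child in ET.fromstring(xml_text.strip()):
        recall = float(child.get("r"))
        precision = float(child.get("p"))
        reported = float(child.get("hmean"))
        recomputed = harmonic_mean(recall, precision)
        # Fail loudly if the reported value is not the harmonic mean.
        assert abs(reported - recomputed) < tolerance, child.tag
        results[child.tag] = (reported, recomputed)
    return results


if __name__ == "__main__":
    for tag, (reported, recomputed) in sorted(check_hmeans(EVALUATION_XML).items()):
        print("%s: reported %.6f, recomputed %.6f" % (tag, reported, recomputed))
```

To run the same check on your own results, pass the `<evaluation>` element from your output to `check_hmeans` in place of `EVALUATION_XML`.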

How to retrain the models?
===========================================

The training scripts are in the scripts/ subdirectory. To retrain the models, unzip the datasets
in the parent directory of the repository.
To retrain everything from scratch, run

    $ ./scripts/train_all.sh

What about the Recognizer?
===========================================

Coming soon...
