EAST: An Efficient and Accurate Scene Text Detector

Introduction

This is a refactoring and repackaging of the argman/EAST code base. With few exceptions the model is unchanged, but the code has been heavily refactored so that it can be packaged and used as a Python library for inference. At the moment the only goal is to make the model importable by other code for inference. Upcoming changes will allow for re-training and evaluation.

Currently the package allows the following:

import cv2
from east.model import EASTPredictor

img = cv2.imread("some-text-picture-path")

m = EASTPredictor()
m.load("your-checkpoint-path")
m.predict(img)  # returns the text bbox coordinates

To download a pre-trained model checkpoint please see the download section.
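
The layout of the value returned by predict is not documented here. As a minimal sketch, assuming each detected box is a sequence of four (x, y) corner points, a hypothetical helper such as quad_to_rect (not part of the package) converts one box into an axis-aligned rectangle, which is convenient for cropping:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def quad_to_rect(quad: Sequence[Point]) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) enclosing a 4-point box.

    Assumption: `quad` holds four (x, y) corners. This layout is not
    confirmed by the package and may need adapting.
    """
    if len(quad) != 4:
        raise ValueError("expected four corner points")
    xs = [float(p[0]) for p in quad]
    ys = [float(p[1]) for p in quad]
    # Floor the minimum and ceil the maximum so the rectangle fully
    # contains the (possibly rotated) box.
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))
```

Under the same assumption, a crop of the input image could then be taken as img[y_min:y_max, x_min:x_max].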

Known issues:

  • Train and eval have not been tested yet
  • There are still several hardcoded paths throughout the code (e.g. /static/results/...)

This is a TensorFlow re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

The features are summarized below:

  • Online demo and result example. Caveat: there is only one CPU core on the demo server, so simultaneous access will degrade response time.
  • Only RBOX part is implemented.
  • A fast Locality-Aware NMS in C++ provided by the paper's author.
  • The pre-trained model provided achieves 80.83 F1-score on ICDAR 2015 (Incidental Scene Text Detection Challenge) using only training images from ICDAR 2015 and 2013. See here for the detailed results.
  • Differences from original paper:
    • Use ResNet-50 rather than PVANET
    • Use dice loss (optimize IoU of segmentation) rather than balanced cross entropy
    • Use linear learning rate decay rather than staged learning rate decay
  • Speed on 720p (resolution of 1280x720) images:
    • Now
      • Graphics card: GTX 1080 Ti
      • Network fprop: ~50 ms
      • NMS (C++): ~6 ms
      • Overall: ~16 fps
    • Then
      • Graphics card: K40
      • Network fprop: ~150 ms
      • NMS (python): ~300 ms
      • Overall: ~2 fps
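
The dice loss listed among the differences from the paper can be sketched in plain Python as follows. This is an illustrative version over flattened score maps, assuming a binary ground-truth map, predictions in [0, 1], and a small eps to avoid division by zero; the function name and signature are hypothetical and not taken from this package.

```python
from typing import Sequence


def dice_loss(y_true: Sequence[float],
              y_pred: Sequence[float],
              eps: float = 1e-5) -> float:
    """Dice loss: 1 - 2 * overlap / (sum of both maps), on flat lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("score maps must have the same number of pixels")
    # Overlap between ground truth and prediction (soft intersection).
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    # eps keeps the ratio defined when both maps are empty.
    total = sum(y_true) + sum(y_pred) + eps
    return 1.0 - 2.0 * intersection / total
```

The loss approaches 0 when the predicted text region matches the ground truth and equals 1 when the two do not overlap at all, so minimizing it directly rewards region overlap.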

Thanks to the author (@zxytim) for his help! Please cite his paper if you find this useful.

Contents

  1. Installation
  2. Download
  3. Train
  4. Demo
  5. Test
  6. Examples

Installation

The Makefile in east/lanms/ has been modified to avoid issues with gcc < 6 and the -fno-plt flag, including hardcoded paths. If you run into the same issue, do the following:

python3-config --cflags

Copy the output, remove -fno-plt, and paste it into the Makefile as: CXXFLAGS = <output-from-previous-command>
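
The copy-and-edit step can also be scripted. The following is a small sketch, not part of the repository: the helper name cxxflags_line is hypothetical, and it only assumes that the flags arrive as one whitespace-separated string, as printed by python3-config --cflags.

```python
import shlex


def cxxflags_line(cflags: str, drop: str = "-fno-plt") -> str:
    """Build a Makefile CXXFLAGS line from compiler flags, minus `drop`."""
    # shlex.split honours shell quoting, so quoted flags stay intact.
    kept = [flag for flag in shlex.split(cflags) if flag != drop]
    # Re-quote each flag so values containing spaces survive the shell.
    return "CXXFLAGS = " + " ".join(shlex.quote(flag) for flag in kept)
```

Passing the output of python3-config --cflags to cxxflags_line gives the line to paste into the Makefile.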

Then:

cd east/lanms
make

Finally:

sudo apt-get install libgeos-dev  # required for Shapely
pip install -r requirements.txt

Alternatively:

pip install . -v

Download

  1. Models trained on ICDAR 2013 (training set) + ICDAR 2015 (training set): BaiduYun link GoogleDrive
  2. Resnet V1 50 provided by tensorflow slim: slim resnet v1 50

To download the models automatically:

bash ./scripts/download_models.sh

Train

To train the model, provide the dataset path. Within that path, each image must have its own ground-truth (gt) text file. Then run:

python -m east.multigpu_train \
    --gpu_list=0 \
    --input_size=512 \
    --batch_size_per_gpu=14 \
    --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
    --text_scale=512 --training_data_path=/data/ocr/icdar2015/ \
    --geometry=RBOX --learning_rate=0.0001 --num_readers=24 \
    --pretrained_model_path=/tmp/resnet_v1_50.ckpt

If you have more than one GPU, you can pass the GPU ids to gpu_list (for example --gpu_list=0,1,2,3).

Note: rename the ICDAR 2015 gt text files from gt_img*.txt to img*.txt (or change the code in icdar.py instead), and remove some extra characters from the files. See the examples in training_samples/.
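
The renaming described in the note can be automated. The sketch below is not part of the repository: the function name normalize_gt_files is hypothetical, and it assumes that the extra characters are a UTF-8 byte-order mark at the start of each file, which should be checked against the provided samples.

```python
from pathlib import Path


def normalize_gt_files(data_dir: str) -> int:
    """Rename gt_img*.txt to img*.txt and drop a leading UTF-8 BOM.

    Assumption: the "extra characters" are a byte-order mark at the
    start of each file. Returns the number of files rewritten.
    """
    count = 0
    for src in sorted(Path(data_dir).glob("gt_img*.txt")):
        # "utf-8-sig" decodes UTF-8 and strips a leading BOM if present.
        # Text mode also normalizes line endings as a side effect.
        text = src.read_text(encoding="utf-8-sig")
        dst = src.with_name(src.name[len("gt_"):])
        dst.write_text(text, encoding="utf-8")
        src.unlink()
        count += 1
    return count
```

Run it on a copy of the dataset first, since the original gt_img*.txt files are deleted after conversion.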

Demo

If you've downloaded the pre-trained model, you can set up a demo server with:

python -m east.run_demo_server \
    --checkpoint-path /checkpoints/east_icdar2015_resnet_v1_50_rbox/

Then open http://localhost:8769 for the web demo. Notice that the URL changes after you submit an image: a query string such as ?r=49647854-7ac2-11e7-8bb7-80000210fe80 is appended, which makes the URL persistent. As long as you do not delete the data in static/results, you can share your results using the same URL.
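
To work with such URLs from a script, the result id can be added or extracted with the standard library. This is a minimal sketch with hypothetical helper names; the only thing taken from the text above is that the id travels in the r query parameter.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def result_url(base_url: str, result_id: str) -> str:
    """Append the `r` query parameter that identifies a stored result."""
    return f"{base_url.rstrip('/')}/?{urlencode({'r': result_id})}"


def result_id_from_url(url: str) -> Optional[str]:
    """Return the `r` parameter of a demo URL, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("r")
    return values[0] if values else None
```

For example, result_id_from_url applied to a shared link returns the id string, and None for the plain start page.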

URL for the example below: http://east.zxytim.com/?r=48e5020a-7b7f-11e7-b776-f23c91e0703e

Test

Run:

python eval.py \
    --test_data_path=/tmp/images/ \
    --gpu_list=0 \
    --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
    --output_dir=/tmp/

A text file will then be written to the output path.

Examples

Here are some test examples on ICDAR 2015. Enjoy the beautiful text boxes!

[Example images: image_1 to image_5]

Changes

  • General refactoring
  • Packaging as a python package installable through pip install .
  • tf.app.flags definitions have been moved so that they are only imported when the individual scripts are called. This is because tf.app.flags conflicts with argparse command-line arguments when the model is imported. A good discussion about it can be found here.
  • Model creation is now on different graphs so multiple EAST Predictors can be instantiated simultaneously.
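
The flag change above follows a general pattern: do not register command-line flags at import time. The sketch below illustrates that pattern with argparse standing in for tf.app.flags. All names and default values are hypothetical and are not the package's actual code.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Create the script's parser; nothing is registered at import time."""
    parser = argparse.ArgumentParser(description="illustrative script flags")
    # Hypothetical flags and defaults, for illustration only.
    parser.add_argument("--checkpoint_path", default="/tmp/ckpt/")
    parser.add_argument("--gpu_list", default="0")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse flags only when the script entry point is called."""
    return build_parser().parse_args(argv)
```

Because the parser is built inside a function, importing the module leaves the importing program's own command-line handling untouched.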
