MaskTextSpotter

This repository contains the code for "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes" (TPAMI version), which extends the ECCV version of the same title. For more details, please refer to our TPAMI paper.

This repo is inherited from maskrcnn-benchmark and follows the same license.

ToDo List

  • Release code
  • Document for Installation
  • Trained models
  • Document for testing
  • Document for training
  • Demo script
  • Evaluation
  • Release the standalone recognition model

Installation

Requirements:

  • Python3 (Python3.7 is recommended)
  • PyTorch >= 1.0 (1.2 is recommended)
  • torchvision from master
  • cocoapi
  • yacs
  • matplotlib
  • GCC >= 4.9 (This is very important!)
  • OpenCV
  • CUDA >= 9.0 (10.0 is recommended)

  # first, make sure that your conda is set up properly with the right environment
  # for that, check that `which conda`, `which pip` and `which python` point to the
  # right path. From a clean conda env, this is what you need to do

  conda create --name masktextspotter -y
  conda activate masktextspotter

  # this installs the right pip and dependencies for the fresh python
  conda install ipython pip

  # python dependencies
  pip install ninja yacs cython matplotlib tqdm opencv-python shapely scipy tensorboardX

  # install PyTorch
  conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

  export INSTALL_DIR=$PWD

  # install pycocotools
  cd $INSTALL_DIR
  git clone https://github.com/cocodataset/cocoapi.git
  cd cocoapi/PythonAPI
  python setup.py build_ext install

  # install apex (optional)
  cd $INSTALL_DIR
  git clone https://github.com/NVIDIA/apex.git
  cd apex
  python setup.py install --cuda_ext --cpp_ext

  # clone repo
  cd $INSTALL_DIR
  git clone https://github.com/MhLiao/MaskTextSpotter.git
  cd MaskTextSpotter

  # build
  python setup.py build develop


  unset INSTALL_DIR
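
After the steps above, a quick sanity check can confirm that the Python dependencies are visible from the active environment. The following is a minimal sketch, not part of the repository: the module list is an assumption derived from the packages installed above, and the function name `find_missing` is hypothetical. It only checks that the modules can be located; it does not verify versions or the compiled extensions.

```python
import importlib.util
import sys

# Import names for the packages installed above (assumed list). The import
# name can differ from the pip name: opencv-python is imported as cv2.
REQUIRED_MODULES = [
    "torch", "torchvision", "pycocotools", "yacs", "matplotlib",
    "cv2", "shapely", "scipy", "tensorboardX",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated conda environment; any name reported as missing points to an installation step that likely needs to be repeated.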

Models

Download Trained model

Demo

You can run the demo script for single-image inference with python tools/demo.py.

Datasets

Download the ICDAR2013 (Google Drive, BaiduYun) and ICDAR2015 (Google Drive, BaiduYun) datasets as examples.

The SCUT dataset used for training can be downloaded here.

The converted labels of Total-Text dataset can be downloaded here.

The converted labels of SynthText can be downloaded here.

The root of the dataset directory should be MaskTextSpotter/datasets/.

Testing

Prepare dataset

An example of the path of test images: MaskTextSpotter/datasets/icdar2015/test_images
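
Before running the test script, it can help to confirm that the test image directory exists and is not empty. The sketch below is an illustration under stated assumptions, not repository code: the helper name `list_images` and the set of accepted file extensions are hypothetical.

```python
from pathlib import Path

# Extensions treated as images here; this set is an assumption, adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(image_dir):
    """Return the sorted image files directly inside image_dir."""
    root = Path(image_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Image directory not found: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Call `list_images` with the test image directory described above and check that the returned list has the expected length for your dataset.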

Check the config file (configs/finetune.yaml) for some parameters.

test dataset: TEST.DATASETS;

input size: INPUT.MIN_SIZE_TEST;

model path: MODEL.WEIGHT;

output directory: OUTPUT_DIR
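
The dotted names above are assumed to follow the yacs convention inherited from maskrcnn-benchmark, where a name such as MODEL.WEIGHT refers to the WEIGHT key nested under MODEL in the YAML file. The sketch below illustrates that mapping with plain dictionaries. The helper `flatten` is hypothetical and the values are placeholders, not the repository's defaults.

```python
def flatten(config, prefix=""):
    """Map a nested dict to a flat dict keyed by dotted names."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Placeholder values for illustration only; they are not the repository's defaults.
example = {
    "MODEL": {"WEIGHT": "path/to/model.pth"},
    "INPUT": {"MIN_SIZE_TEST": 1000},
    "OUTPUT_DIR": "path/to/output",
}

for name, value in flatten(example).items():
    print(f"{name} = {value}")
```

When editing the config file, look for the top-level key first (for example MODEL) and then the nested key beneath it (WEIGHT).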

Run sh test.sh.

Training

Place all the training sets in MaskTextSpotter/datasets/ and check DATASETS.TRAIN in the config file.

Pretrain

Trained with SynthText

python3 -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/pretrain.yaml

Finetune

Trained with a mixture of SynthText, icdar2013, icdar2015, scut-eng-char, and total-text.

Check the initial weights in the config file.

python3 -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/finetune.yaml
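
The commands above launch 8 processes, one per GPU. On a machine with a different number of GPUs, the --nproc_per_node value has to change accordingly. The sketch below assembles the same command for an arbitrary GPU count. The helper name `build_launch_command` is hypothetical, and the source does not say whether other settings (such as batch size or learning rate in the config) need adjusting for other GPU counts, so treat that as something to verify.

```python
import shlex


def build_launch_command(num_gpus, config_file):
    """Build the distributed launch command for a given number of GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "tools/train_net.py",
        "--config-file", config_file,
    ]


# Example: the finetune command for a 4-GPU machine (4 is an arbitrary choice).
command = build_launch_command(4, "configs/finetune.yaml")
print(" ".join(shlex.quote(part) for part in command))
```

The returned list can be passed directly to `subprocess.run` from the repository root, or printed and pasted into a shell as shown.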

Evaluation

Evaluation for ICDAR 2015 dataset

Download the lexicons and place them in a path like evaluation/lexicons/ic15/.

cd evaluation/icdar2015/e2e/
# edit "result_dir" in script.py
python script.py

Evaluation for Total-Text dataset (ToDo)

Please cite the related works in your publications if they help your research:

@article{liao2019mask,
  author={M. {Liao} and P. {Lyu} and M. {He} and C. {Yao} and W. {Wu} and X. {Bai}},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes}, 
  year={2021},
  volume={43},
  number={2},
  pages={532-548},
  doi={10.1109/TPAMI.2019.2937086}
}

@inproceedings{lyu2018mask,
  title={Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes},
  author={Lyu, Pengyuan and Liao, Minghui and Yao, Cong and Wu, Wenhao and Bai, Xiang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={67--83},
  year={2018}
}
