MegReader

A project for research in text detection and recognition using PyTorch 1.2.

This project originated from a research repo of the CSG-Algorithm team at Megvii (https://megvii.com) that relies heavily on closed-source libraries. We are gradually porting models into this repo; the implementations released so far are listed in Progress.

Highlights

  • Implementations of representative text detection and recognition methods.
  • An effective framework for conducting experiments: experiments are configured with yaml files, which makes them convenient to set up and run.
  • Thorough logging features which make it easy to follow and analyze experimental results.
  • CPU/GPU compatible for training and inference.
  • Distributed training support.

Install

Requirements

pip install -r requirements.txt

  • Python 3.7
  • PyTorch 1.2 and CUDA 10.0
  • gcc 5.5 (important for compiling)

Compile CUDA ops (if needed)

cd PATH_TO_OPS

python setup.py build_ext --inplace

Ops that may need to be compiled:

  • DeformableConvV2 assets/ops/dcn
  • CTC2DLoss ops/ctc_2d

Configuration (optional)

Edit configurations in config.py.

Training

See detailed options: python3 train.py --help

Datasets

We provide a data loading implementation for annotations packed in JSON, for a quick start. Data in lmdb format is also supported. See the demo for usage. The datasets used in our recognition experiments can be downloaded from OneDrive. A transform script is provided to convert JSON-format data to lmdb.
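To illustrate what a JSON-to-lmdb conversion involves, here is a minimal sketch that turns JSON annotations into the key/value byte pairs an lmdb database stores. It is not the repo's transform script. The annotation fields (`filename`, `text`), the `pack_records` helper, and the key layout (`image-%09d`, `label-%09d`, `num-samples`, a convention common in text recognition datasets) are all assumptions made for illustration.

```python
import json


def pack_records(annotation_json, read_image):
    """Build lmdb-style key/value byte pairs from a JSON annotation string.

    Assumed (hypothetical) annotation format: a list of objects with
    "filename" and "text" fields. `read_image` is a callable that maps a
    filename to the raw image bytes.
    """
    samples = json.loads(annotation_json)
    records = {}
    for index, sample in enumerate(samples, start=1):
        # Keys are 1-indexed and zero-padded so they sort in sample order.
        records["image-%09d" % index] = read_image(sample["filename"])
        records["label-%09d" % index] = sample["text"].encode("utf-8")
    # Store the sample count so a reader knows how many keys to expect.
    records["num-samples"] = str(len(samples)).encode("utf-8")
    # lmdb keys and values are both bytes.
    return {key.encode("utf-8"): value for key, value in records.items()}
```

With the `lmdb` package, the resulting pairs could then be written inside a write transaction, calling `txn.put(key, value)` for each pair within `env.begin(write=True)`.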

Non-distributed

python3 train.py PATH_TO_EXPERIMENT.yaml --validate --visualize --name NAME_OF_EXPERIMENT

Configurations for some of the released recognition models:

  • CRNN: experiments/recognition/crnn.yaml
  • 2D CTC: experiments/recognition/res50-ppm-2d-ctc.yaml
  • Attention Decoder: experiments/recognition/fpn50-attention-decoder.yaml

Distributed (recommended for multi-GPU training)

python3 -m torch.distributed.launch --nproc_per_node=NUM_GPUS train.py PATH_TO_EXPERIMENT.yaml -d --validate
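For context, `torch.distributed.launch` starts one process per GPU, passes each one a `--local_rank` argument, and sets environment variables such as `RANK` and `WORLD_SIZE`. The sketch below shows how a training script can read these values. The `parse_launch_args` helper is hypothetical, and how this repo's train.py actually consumes them is not shown here.

```python
import argparse
import os


def parse_launch_args(argv=None, environ=None):
    """Read the values torch.distributed.launch hands to each process.

    Returns (local_rank, rank, world_size). The defaults describe a
    single-process run started without the launcher.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # The launcher passes --local_rank=<index of this process on the node>.
    parser.add_argument("--local_rank", type=int, default=0)
    # Ignore the script's own arguments (experiment path, -d, --validate).
    args, _unused = parser.parse_known_args(argv)
    rank = int(environ.get("RANK", "0"))
    world_size = int(environ.get("WORLD_SIZE", "1"))
    return args.local_rank, rank, world_size
```

A script would typically use the local rank to select its GPU, for example with `torch.cuda.set_device(local_rank)`, before initializing the process group.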

Evaluating

See detailed options: python3 eval.py --help.

Testing with the aspect ratio preserved is recommended: python3 eval.py PATH_TO_EXPERIMENT.yaml --resize_mode keep_ratio
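As an illustration of the idea behind keep-ratio resizing, the sketch below scales an image to a fixed height and derives the width from the original aspect ratio, instead of stretching every image to one fixed size. The `keep_ratio_size` helper, the default height of 32, and the optional width cap are assumptions for illustration; the exact behavior of `--resize_mode keep_ratio` in eval.py may differ.

```python
def keep_ratio_size(width, height, target_height=32, max_width=None):
    """Return (new_width, new_height) that preserves the aspect ratio.

    The height is fixed to `target_height` and the width is scaled by the
    same factor. `max_width`, if given, caps very long text lines.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale the width by the same factor applied to the height.
    new_width = max(1, round(width * target_height / height))
    if max_width is not None:
        new_width = min(new_width, max_width)
    return new_width, target_height
```

For example, a 200x50 crop resized to height 32 keeps its 4:1 ratio and becomes 128x32.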

Model zoo

Trained models are coming soon.

Progress

Recognition Methods

  • 2D CTC
  • CRNN
  • Attention Decoder
  • Rectification

Detection Methods

  • Text Snake
  • EAST

End-to-end

  • Mask Text Spotter

Contributing

Contributing.md
