Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition (ECCV 2022 Oral)

The official code of CornerTransformer (ECCV 2022, Oral).

This work focuses on artistic text recognition, a new and challenging task. To tackle its difficulties, we introduce the corner point map as a robust representation of artistic text images and present a corner-query cross-attention mechanism that helps the model attend more accurately. We also design a character contrastive loss that learns invariant character features, which leads to tight feature clustering for each character. To benchmark different models on this task, we provide the WordArt dataset.
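The character contrastive loss can be illustrated with a generic supervised contrastive (SupCon-style) loss over per-character feature vectors. This is a minimal sketch under our own assumptions, not the paper's exact formulation: features are L2-normalized, the temperature defaults to 0.1, and features that share a character label are treated as positives while all other features act as negatives. The function name `char_contrastive_loss` is hypothetical.

```python
import math
from typing import Hashable, List, Sequence


def _l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def char_contrastive_loss(features: Sequence[Sequence[float]],
                          labels: Sequence[Hashable],
                          temperature: float = 0.1) -> float:
    """Hypothetical SupCon-style loss: pull features of the same character
    together and push features of different characters apart."""
    z = [_l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log of the softmax denominator (log-sum-exp).
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    labels = ["A", "A", "R", "R"]
    tight = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]  # same characters close
    mixed = [[1.0, 0.0], [0.0, 1.0], [0.99, 0.1], [0.1, 0.99]]  # same characters far apart
    print(char_contrastive_loss(tight, labels) < char_contrastive_loss(mixed, labels))  # True
```

Tightly clustered same-character features give a loss near zero, while scattered ones are penalized, which is the clustering behavior the paragraph above describes.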

Runtime Environment

This repo depends on PyTorch, MMCV, MMDetection and MMOCR. Below are quick installation steps. Please refer to the MMOCR Install Guide for more detailed instructions.

conda create -n wordart python=3.7 -y
conda activate wordart
conda install pytorch==1.10 torchvision cudatoolkit=11.3 -c pytorch
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/index.html
pip install mmdet
git clone https://github.com/xdxie/WordArt.git
cd WordArt
pip install -r requirements.txt
pip install -v -e .
export PYTHONPATH=$(pwd):$PYTHONPATH
pip install -r requirements/albu.txt
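After installation, a quick import check can catch version or build problems before training starts. The helper below is our own sketch (the name `check_env` is hypothetical). It assumes the repo installs itself under the `mmocr` package name, as MMOCR forks typically do, reads each module's `__version__` attribute, and reports `None` for packages that fail to import. It avoids `importlib.metadata` so that it also runs on the Python 3.7 environment created above.

```python
import importlib
from typing import Dict, Iterable, Optional


def check_env(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package to its __version__ ('unknown' if absent), or None if it cannot be imported."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for pkg, ver in check_env(["torch", "torchvision", "mmcv", "mmdet", "mmocr"]).items():
        print(f"{pkg:12s} {ver if ver is not None else 'NOT INSTALLED'}")
    # If PyTorch is present, also confirm that the CUDA build can see a GPU.
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```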

Training

For distributed training on multiple GPUs, please use

./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [PY_ARGS]

For training on a single GPU, please use

python tools/train.py ${CONFIG_FILE} [ARGS]

For example, we use the following command to train the model on 4 GPUs:

./tools/dist_train.sh configs/textrecog/corner_transformer/corner_transformer_academic.py outputs/corner_transformer/ 4
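The two launch modes above take the same inputs in different shapes. As a sketch, the hypothetical helper `build_train_cmd` below assembles either command from one set of arguments. It assumes the commands are run from the repository root and that `tools/train.py` accepts a `--work-dir` option, as MMOCR's training script does.

```python
import shlex
from typing import List, Sequence


def build_train_cmd(config: str, work_dir: str, num_gpus: int,
                    extra_args: Sequence[str] = ()) -> List[str]:
    """Return an argv list for distributed (num_gpus > 1) or single-GPU training."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if num_gpus > 1:
        # Positional order from the README: CONFIG_FILE WORK_DIR GPU_NUM [PY_ARGS]
        return ["./tools/dist_train.sh", config, work_dir, str(num_gpus), *extra_args]
    # Single GPU: config is positional; --work-dir is assumed (standard MMOCR flag).
    return ["python", "tools/train.py", config, "--work-dir", work_dir, *extra_args]


if __name__ == "__main__":
    cmd = build_train_cmd(
        "configs/textrecog/corner_transformer/corner_transformer_academic.py",
        "outputs/corner_transformer/", 4)
    # shlex.join needs Python 3.8+, so quote manually for Python 3.7.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths.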

Evaluation

For distributed evaluation on multiple GPUs, please use

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]

For evaluation on a single GPU, please use

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]

For example, we use the following command to evaluate the trained model on a single GPU:

CUDA_VISIBLE_DEVICES=0 python tools/test.py outputs/corner_transformer/corner_transformer_academic.py outputs/corner_transformer/latest.pth --eval acc
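The `--eval acc` option reports recognition accuracy. To our understanding, MMOCR's `acc` metric includes several word-level variants (for example, case-sensitive and case-insensitive word accuracy) along with character-level scores, and the README does not state which variant the results table uses. The sketch below (the function name `word_accuracy` is hypothetical) computes only the two simplest variants, as an illustration of what "word accuracy" means rather than a reimplementation of MMOCR's metric.

```python
from typing import Sequence


def word_accuracy(preds: Sequence[str], gts: Sequence[str],
                  ignore_case: bool = False) -> float:
    """Fraction of words whose prediction exactly matches the ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    normalize = str.lower if ignore_case else (lambda s: s)
    correct = sum(normalize(p) == normalize(g) for p, g in zip(preds, gts))
    return correct / len(gts)


if __name__ == "__main__":
    preds = ["Hello", "WORLD", "art"]
    gts = ["hello", "world", "Art!"]
    print(word_accuracy(preds, gts))                    # 0.0
    print(word_accuracy(preds, gts, ignore_case=True))  # 0.6666666666666666
```

A single wrong character (here the trailing "!") makes the whole word count as incorrect, which is why word accuracy is a strict measure on artistic text.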

WordArt Dataset

The WordArt dataset consists of 6316 artistic text images: 4805 for training and 1511 for testing. The dataset is available on Google Drive.
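The README does not describe the annotation format, so the following is only a sketch under an assumed layout: one `<image_path> <transcription>` pair per line, similar to the plain-text label files commonly used with MMOCR. The names `parse_annotations`, `check_split`, and `EXPECTED_SIZES`, as well as the file name in the usage comment, are hypothetical; only the split sizes come from the counts above.

```python
from typing import Iterable, List, Tuple

# Split sizes stated in the README; the file layout below is an assumption.
EXPECTED_SIZES = {"train": 4805, "test": 1511}


def parse_annotations(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse hypothetical '<image_path> <transcription>' lines, skipping blank lines."""
    samples = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<image_path> <text>', got {line!r}")
        samples.append((parts[0], parts[1]))
    return samples


def check_split(split: str, samples: List[Tuple[str, str]]) -> None:
    """Raise if the number of parsed samples differs from the documented split size."""
    expected = EXPECTED_SIZES[split]
    if len(samples) != expected:
        raise ValueError(f"{split}: expected {expected} samples, found {len(samples)}")


# Usage (hypothetical file name):
# with open("train_annotation.txt", encoding="utf-8") as f:
#     check_split("train", parse_annotations(f))
```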

Results

Word accuracy (%) on six standard benchmarks and the WordArt test set:

Model              IC13  SVT   IIIT  IC15  SVTP  CUTE  WordArt
CornerTransformer  96.4  94.6  95.9  86.3  91.5  92.0  70.8
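To make the difficulty gap in the results explicit, the snippet below computes a plain unweighted mean over the six standard benchmarks and compares it with the WordArt score. This summary is our own arithmetic on the table, not a number reported by the authors; a mean weighted by test-set size would differ, and the test-set sizes are not given here.

```python
from typing import Iterable

# Accuracy (%) copied from the results table.
RESULTS = {
    "IC13": 96.4, "SVT": 94.6, "IIIT": 95.9, "IC15": 86.3,
    "SVTP": 91.5, "CUTE": 92.0, "WordArt": 70.8,
}
STANDARD_BENCHMARKS = ["IC13", "SVT", "IIIT", "IC15", "SVTP", "CUTE"]


def unweighted_mean(names: Iterable[str]) -> float:
    """Plain average over the named benchmarks (ignores test-set sizes)."""
    names = list(names)
    return sum(RESULTS[name] for name in names) / len(names)


if __name__ == "__main__":
    avg = unweighted_mean(STANDARD_BENCHMARKS)
    print(f"Mean over standard benchmarks: {avg:.2f}")               # 92.78
    print(f"Gap to WordArt: {avg - RESULTS['WordArt']:.2f} points")  # 21.98
```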

Visualization

Each example is shown with the results from ABINet-LV, our baseline, and the proposed CornerTransformer. CornerTransformer successfully recognizes these hard examples.

When decorative background patterns have the same appearance as the text and a similar shape, CornerTransformer may fail to produce the correct result. Each failure case is shown with our result and the ground truth.

Citation

Please cite the following paper when using the WordArt dataset or this repo.

@inproceedings{xie2022toward,
  title={Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition},
  author={Xie, Xudong and Fu, Ling and Zhang, Zhifei and Wang, Zhaowen and Bai, Xiang},
  booktitle={ECCV},
  year={2022}
}

Acknowledgement

This repo is based on MMOCR. We appreciate this wonderful open-source toolbox.
