This repository provides a modified implementation of the CRNN network in TensorFlow 2. This implementation focuses only on extracting numbers from images, which is a simpler task. The main change is a dataset builder class that creates a very specific dataset: numbers on a paper-like background, with some noise and augmentation applied. The repository also provides configuration files and a script for saving such a dataset as images in a directory structure.
Some necessary changes were made to make it runnable with TensorFlow 2.10. The code was also refactored to be more readable, and files were moved to more meaningful locations.
Original readme with information regarding the original implementation follows.
This is a re-implementation of the CRNN network, built with TensorFlow 2. This repository may help you understand how to build an end-to-end text recognition network easily. Here is the official repo implemented by bgshih.
This repo aims to build a simple, efficient text recognition network using the various components of TensorFlow 2. The model is built with the Keras API, the data pipeline is built with tf.data, and training uses model.fit, so we can use most of the features provided by TensorFlow 2, such as TensorBoard, distribution strategies, the TensorFlow Profiler, etc.
$ pip install -r requirements.txt
Here I provide an example model trained on the Numbers dataset. This model can only predict 9-digit numbers.
$ wget https://github.com/danielkonecny/number-extraction/releases/download/v0.3.0/Model.zip
$ unzip Model.zip -d Model
$ python tools/demo.py --images example/images/ --config configs/numbers3.yml --model Model
You will then see output like this:
Path: example/images/115352072.png includes number: 115352072 (probability: 99.76%).
Path: example/images/292222036.png includes number: 292222036 (probability: 99.71%).
Path: example/images/306092755.png includes number: 306092755 (probability: 99.20%).
Path: example/images/477760885.png includes number: 477760885 (probability: 99.64%).
Path: example/images/613574192.png includes number: 613574192 (probability: 99.60%).
Path: example/images/885430380.png includes number: 885430380 (probability: 91.38%).
Regarding decoding methods: beam search sometimes gives better results than greedy decoding, but it is more computationally expensive.
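To make the greedy method concrete, here is a minimal pure-Python sketch of CTC greedy decoding. It is an illustration, not the decoder used by this repository: the function name `ctc_greedy_decode` and the toy scores are assumptions. Greedy decoding takes the best class at each timestep, merges consecutive repeats, and drops the blank label. Beam search instead keeps several candidate sequences at each step, which is where its extra cost comes from.

```python
def ctc_greedy_decode(logits, blank_index):
    """Collapse the per-timestep argmax path into a label sequence.

    logits: list of timesteps, each a list of per-class scores.
    blank_index: index of the CTC blank label.
    """
    decoded = []
    previous = None
    for scores in logits:
        # Index of the highest-scoring class at this timestep.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a class only when it differs from the previous timestep
        # and is not the blank label.
        if best != previous and best != blank_index:
            decoded.append(best)
        previous = best
    return decoded


# Toy input (assumed values): classes "0", "1", and blank as the last index.
toy_logits = [
    [0.9, 0.05, 0.05],  # "0"
    [0.8, 0.1, 0.1],    # "0" again, merged with the previous step
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # "0", kept because a blank separates it
    [0.1, 0.8, 0.1],    # "1"
]
result = ctc_greedy_decode(toy_logits, blank_index=2)
print(result)
```

The blank between the second and third "0" is what allows a repeated digit, such as the "11" in 115352072, to survive the merge step.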
Before you start training, prepare the data first. All predictable characters are defined in the table.txt file. The training process is configured by the yml file.
This training script uses all GPUs by default. If you want to use a specific GPU, set the CUDA_VISIBLE_DEVICES environment variable.
$ python tools/train.py --config configs/numbers3.yml --save_dir PATH/TO/SAVE
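Besides setting CUDA_VISIBLE_DEVICES in the shell, the variable can be set from Python, as long as this happens before TensorFlow is imported and initializes the GPUs. The sketch below only shows that idea; `select_gpus` is a hypothetical helper and is not part of this repository.

```python
import os


def select_gpus(device_ids):
    """Restrict the GPUs visible to CUDA.

    Call this before importing TensorFlow, otherwise the setting
    may have no effect. (Hypothetical helper for illustration.)
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Make only the first GPU visible to the process.
visible = select_gpus([0])
print(visible)
```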
The training process can be visualized in TensorBoard.
$ tensorboard --logdir PATH/TO/MODEL_DIR
For more instructions, please refer to the config file.
To train this network, you should prepare a lookup table, images, and their corresponding labels. The example data is copied from the MJSynth and ICDAR2013 datasets.
The lookup table file contains all characters plus the blank label. In principle the blank label could go last or anywhere else, but I find that the TensorFlow decoders currently can't change its position, so put it last. By the way, you can write any word as the blank label.
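As a rough illustration of the lookup table described above, the following sketch builds a character-to-index mapping for digits with the blank label in the last position and encodes one label. The token name `<BLK>` and the helper names are assumptions made for this example; the actual table.txt contents and the repository's own loading code may differ.

```python
def build_lookup(characters, blank_token="<BLK>"):
    """Return the table and a char-to-index map, with the blank label last.

    blank_token is an arbitrary placeholder word (assumed name).
    """
    table = list(characters) + [blank_token]
    char_to_index = {ch: i for i, ch in enumerate(table)}
    return table, char_to_index


def encode_label(label, char_to_index):
    """Convert a text label into a list of class indices."""
    return [char_to_index[ch] for ch in label]


table, char_to_index = build_lookup("0123456789")
encoded = encode_label("115352072", char_to_index)
blank_index = char_to_index["<BLK>"]
print(encoded, blank_index)
```

With ten digits, the blank label ends up at index 10, so the network needs 11 output classes in this sketch.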
It's an End-to-End method, so we don't need to indicate the position of the character in the image.
The labels corresponding to these three pictures are 115352072, 292222036, and 306092755.
We should write each image path and its corresponding label to a text file in a certain format, such as the one used by the example data. The data input pipeline automatically detects the supported formats. Customization is also simple; please check out the dataset factory.
- MJSynth
- ICDAR2013/2015
- Simple, such as [example.png label]
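For the simple format, a parser can be sketched as follows. This is not the repository's dataset factory; `parse_simple_annotations` is a hypothetical function, and it assumes one `path label` pair per line with no spaces in the path.

```python
def parse_simple_annotations(lines):
    """Parse lines of the form 'image_path label' into (path, label) pairs.

    Assumes the path contains no whitespace; blank lines are skipped.
    """
    pairs = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        # Split on the first run of whitespace: path first, label second.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected 'path label', got {line!r}")
        pairs.append((parts[0], parts[1]))
    return pairs


# Sample lines (assumed file content) using labels from the example images.
sample = ["115352072.png 115352072", "", "292222036.png 292222036\n"]
pairs = parse_simple_annotations(sample)
print(pairs)
```

In practice the lines would come from an open annotation file, for example `parse_simple_annotations(open(path))`, where `path` is whatever annotation file you prepared.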
$ python tools/eval.py --config PATH/TO/CONFIG_FILE --weight PATH/TO/MODEL_WEIGHT
There are many components here that help with other tasks, for example deployment with TensorFlow Serving. Before you deploy, pick a good weight file and convert the model to the SavedModel format with this command. It appends the post-processing layer to the end of the model and strips the optimizer:
$ python tools/export.py --config PATH/TO/CONFIG_FILE --weight PATH/TO/MODEL_WEIGHT --pre rescale --post greedy --output PATH/TO/OUTPUT
TensorFlow Lite can now also convert this model, which means you can deploy it to Android, iOS, etc.
Note: the decoders can't be converted to TensorFlow Lite because of the assets. Use the softmax layer or None instead.