zzzdavid / icdar-2019-sroie


ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

License: MIT License

Python 99.98% Shell 0.02%

icdar ocr text-classification crnn-ocr ctc pytorch lstm

icdar-2019-sroie's Introduction

ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

Background

This repository is our team's solution to the ICDAR 2019 SROIE competition. As the name suggests, the competition is mainly about optical character recognition (OCR) and information extraction:

Scanned receipts OCR and information extraction (SROIE) play critical roles in streamlining document-intensive processes and office automation in many financial, accounting and taxation areas.

Dataset and Annotations

The original dataset provided by ICDAR-SROIE has a few mistakes. These have been corrected by scripts/check_data.py, so you can use the data folder in this repo directly.

Original dataset: Google Drive/Baidu NetDisk

The dataset has 1000 whole scanned receipt images. Each receipt image contains about four key text fields, such as goods name, unit price and total cost. The annotated text consists mainly of digits and English characters. An example scanned receipt is shown below:

The dataset is split into a training/validation set (“trainval”) and a test set (“test”). The “trainval” set consists of 600 receipt images, the “test” set consists of 400 images.

For the receipt OCR task, each image in the dataset is annotated with text bounding boxes (bbox) and the transcript of each text bbox. Locations are annotated as rectangles with four vertices, listed in clockwise order starting from the top. Annotations for an image are stored in a text file with the same file name. The annotation format is similar to that of the ICDAR2015 dataset and is shown below:

x1_1,y1_1,x2_1,y2_1,x3_1,y3_1,x4_1,y4_1,transcript_1

x1_2,y1_2,x2_2,y2_2,x3_2,y3_2,x4_2,y4_2,transcript_2

x1_3,y1_3,x2_3,y2_3,x3_3,y3_3,x4_3,y4_3,transcript_3

…
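The line format above can be read with a few lines of Python. This is a minimal sketch, assuming the transcript is everything after the eighth comma, so that commas inside the transcript are preserved. The names `Box` and `parse_annotation_line` and the sample line are made up for illustration and are not part of this repository.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    vertices: List[Tuple[int, int]]  # four (x, y) corners, in file order
    transcript: str


def parse_annotation_line(line: str) -> Box:
    # Split on the first 8 commas only; the transcript may contain commas itself.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a transcript, got: {line!r}")
    coords = [int(p) for p in parts[:8]]
    vertices = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return Box(vertices, parts[8])


# Made-up sample line, not taken from the dataset.
box = parse_annotation_line("72,25,326,25,326,64,72,64,TOTAL, INCL. GST")
print(box.vertices)
print(box.transcript)
```

Splitting with a maximum of eight splits is the design choice that matters here: a plain `split(",")` would cut a transcript such as an address into several pieces.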

For the information extraction task, each image in the dataset is annotated with a text file with format shown below:

{
  "company": "STARBUCKS STORE #10208",
  "address": "11302 EUCLID AVENUE, CLEVELAND, OH (216) 229-0749",
  "date": "14/03/2015",
  "total": "4.95"
}
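A file in this format can be loaded with the standard `json` module. The sketch below assumes all four fields are present, which may not hold for every receipt. `load_key_info` and `EXPECTED_KEYS` are illustrative names, not code from this repository.

```python
import json

# Field names taken from the example above.
EXPECTED_KEYS = ("company", "address", "date", "total")


def load_key_info(text: str) -> dict:
    """Parse one key-information record and check that every field exists."""
    record = json.loads(text)
    missing = [key for key in EXPECTED_KEYS if key not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return {key: str(record[key]) for key in EXPECTED_KEYS}


sample = (
    '{"company": "STARBUCKS STORE #10208", '
    '"address": "11302 EUCLID AVENUE, CLEVELAND, OH (216) 229-0749", '
    '"date": "14/03/2015", "total": "4.95"}'
)
info = load_key_info(sample)
print(info["total"])
```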

Tasks

The competition is divided into 3 tasks:

Scanned Receipt Text Localisation: The aim of this task is to accurately localize texts with 4 vertices.
Scanned Receipt OCR: The aim of this task is to accurately recognize the text in a receipt image. No localisation information is provided, or is required.
Key Information Extraction from Scanned Receipts: The aim of this task is to extract texts of a number of key fields from given receipts, and save the texts for each receipt image in a json file.

Usage Guide

Environment setup

We recommend conda as the package and environment manager. If you have conda available, you can use

(base)$ conda env create

and this will create a new conda environment named sroie on your computer, which will give you all the packages needed for this repo. Remember to activate the environment with

(base)$ conda activate sroie

Tasks

This repository contains our experiments and solutions for the three tasks. Each task folder documents the method we adopted and includes a usage guide.

Task 1 - Text Localization: CTPN & SSD
Task 2 - Scanned Receipt OCR: CRNN
Task 3 - Key Information Extraction: Character-wise classification with Bi-LSTM

Result

The recall, precision and Hmean of our solution are listed below:

Task	Recall	Precision	Hmean	Evaluation Method
Task 1	85.23%	88.73%	86.94%	Deteval
Task 2	26.33%	72.53%	38.63%	OCR
Task 3	75.58%	75.58%	75.58%	/
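The Hmean column is consistent with the harmonic mean of recall and precision. A short check, with the values copied from the table (the function name `hmean` is only illustrative):

```python
def hmean(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


results = [
    ("Task 1", 85.23, 88.73),
    ("Task 2", 26.33, 72.53),
    ("Task 3", 75.58, 75.58),
]
for task, recall, precision in results:
    print(f"{task}: {hmean(recall, precision):.2f}%")
```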

A visualisation of our solution:

Here only the localisation and recognition are visualised. Eventually we decided to use CTPN for localisation and CRNN for OCR.

License

Copyright (c) 2019 Niansong Zhang, Songyi Yang, Shengjie Xiu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


icdar-2019-sroie's Issues

my_data.py ???

In the my_data.py file, can you explain further how to create the data ourselves?

task1(data_provider.py)

How do I solve this? Error in data_provider.py

Find 712 images
712 training images in /content/drive/My Drive/mlt
Find 712 images
712 training images in /content/drive/My Drive/mlt

TypeError Traceback (most recent call last)
in ()
21 gen = get_batch(num_workers=2, vis=True)
22 while True:
---> 23 image, bbox, im_info = next(gen)
24 print('done')

TypeError: 'ApplyResult' object is not iterable
too many values to unpack (expected 4)
too many values to unpack (expected 4)
too many values to unpack (expected 4)
too many values to unpack (expected 4)
too many values to unpack (expected 4)
too many values to unpack (expected 4)

can you provide any pretrained model? or a demo.py / demo.ipynb ?

AssertionError: Torch not compiled with CUDA enabled in task 3/src/train.py

While training the model with task 3/src/train.py, the following error pops up.
Any assistance would be immensely helpful.

Traceback (most recent call last):
  File "./src/train.py", line 75, in <module>
    main()
  File "./src/train.py", line 21, in main
    model = MyModel0(len(VOCAB), 20, args.hidden_size).to(args.device)
  File "/home/guest/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 426, in to
    return self._apply(convert)
  File "/home/guest/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 202, in _apply
    module._apply(fn)
  File "/home/guest/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 224, in _apply
    param_applied = fn(param)
  File "/home/guest/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 424, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/home/guest/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 192, in _lazy_init
    _check_driver()
  File "/home/guest/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 95, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
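The assertion in this traceback is raised when a model is moved to a CUDA device on a PyTorch build without CUDA support. A common workaround is to fall back to the CPU when CUDA is unavailable. The sketch below shows that fallback; `resolve_device` is a hypothetical helper, and how train.py receives `args.device` is not shown in the issue, so wiring it in is left as an assumption.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return `requested`, or "cpu" if a CUDA device is requested but unavailable."""
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


if __name__ == "__main__":
    try:
        import torch  # only needed to query CUDA availability

        available = torch.cuda.is_available()
    except ImportError:
        available = False
    print(resolve_device("cuda", available))
```

The returned string could then be passed wherever the script currently uses `args.device`, for example in the `.to(...)` call that appears in the traceback.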

building a .so files for CTPN method of task1

Is there a way to run this in a Windows environment? I tried running the first line of the make script independently, but it didn't produce any nms.so or bbox.so file. Is there a solution for this issue?

How to crop 8 needed cells for tickets like below?

is there a pretrained model available?

task2 img = Image.open(buf).convert('L') OSError: cannot identify image file <_io.BytesIO object at 0x7f20e6463a98>

Traceback (most recent call last):
File "/root/code/ICDAR-2019-SROIE-master/task2/dataset.py", line 55, in __getitem__
img = Image.open(buf).convert('L')
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/PIL/Image.py", line 2519, in open
% (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7f20e6463a98>
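PIL raises this error when the buffer does not contain data in an image format it recognises, for example when the buffer is empty or holds something other than image bytes. The cause in this issue is not confirmed. One way to narrow it down, sketched below with the standard library only, is to look at the first bytes of the buffer before handing it to `Image.open`. `sniff_image_format` is a hypothetical helper and only checks the JPEG and PNG signatures.

```python
import io
from typing import Optional

# Leading "magic" bytes of two common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}


def sniff_image_format(buf: io.BytesIO) -> Optional[str]:
    """Return "jpeg" or "png" if the buffer starts with that signature, else None."""
    start = buf.tell()
    head = buf.read(8)
    buf.seek(start)  # restore the position so the buffer can still be opened
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


print(sniff_image_format(io.BytesIO(b"")))  # an empty buffer matches nothing
```

If this returns None for the records that fail, the problem is likely in how the bytes were written or read back rather than in the `convert('L')` call.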

No module named 'models'

I followed the training instructions and ran into a problem: there seems to be no models module in the task2 folder. Could you please check again?

Code to predict and evaluate the model for task 1?

I am training a CTPN model in PyTorch based on your task 1 code. It is clear and helped me understand the original paper better, but I ran into two problems.
- First, when I trained on the SROIE 2019 dataset with your code, I hit "CUDA out of memory" after the first epoch. My solution was to add "with torch.no_grad()" at line 64 of train.py.
- Second, I trained for 10 epochs and the loss values did not converge. I then changed the optimizer from Adagrad to SGD with lr=1e-3, and this worked.
I hope this helps someone.

I have the model weights and a question for the author: can you share the code to predict and evaluate the model? Thanks.

Task 3: Inference Issue

Hello @zzzDavid @Michael-Xiu I've trained the model for task3 on Google Colab.
Training & Validation works perfectly, but I'm stuck with Inference part.

Whenever I submit a new text for inference, I receive the following error:

RuntimeError                              Traceback (most recent call last)

<ipython-input-45-ae9edc6f32be> in <module>()
----> 1 pred = model(text_tensor.to(device))

RuntimeError: CUDA error: device-side assert triggered

Would you please specify how to do inference on new data or share code for inference?

Another issue is that I couldn't find a way to save my model. Is there a way to save the model for future inference?

Thanks for sharing awesome project :)
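"CUDA error: device-side assert triggered" is often caused by an index that falls outside the valid range, for instance a character index larger than the embedding table. Whether that is the cause here is not confirmed. A minimal sketch of a guard that can be run on new text before building the input tensor follows. The `VOCAB` string below is a stand-in, since the repository defines its own vocabulary, and `encode` is a hypothetical helper.

```python
import string

# Stand-in vocabulary; replace with the VOCAB used during training.
VOCAB = string.digits + string.ascii_uppercase + string.punctuation + " \t\n"


def encode(text: str, vocab: str = VOCAB) -> list:
    """Map characters to indices, failing early on characters outside the vocabulary."""
    index = {ch: i for i, ch in enumerate(vocab)}
    unknown = sorted({ch for ch in text if ch not in index})
    if unknown:
        raise ValueError(f"characters not in vocabulary: {unknown!r}")
    return [index[ch] for ch in text]


print(encode("TOTAL 5.00"))
```

Running the failing input on the CPU instead of the GPU is another common way to get a more specific error message than the device-side assert.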

Justification for Robust Padding

What is the justification for adding random strings to pad the text?

Code in question.

Text before robust_padding function is called:

['SANYU STATIONERY SHOP\nNO. 31G&33G, JALAN SETIA INDAH X ,U13/X\n40170 SETIA ALAM\nMOBILE /WHATSAPPS : +6012-918 7937\nTEL: +603-3362 4137\nGST ID NO: 001531760640\nTAX INVOICE\nOWNED BY :\nSANYU SUPPLY SDN BHD (1135772-K)\nCASH SALES COUNTER\n1. 5000-0001\tPHOTOCOPY SERVICES - A4\nSIZE\n50 X 0.1000\t5.00\tSR\nTOTAL SALES INCLUSIVE GST @6%\t5.00\nDISCOUNT\t0.00\nTOTAL\t5.00\nROUND ADJ\t0.00\nFINAL TOTAL\t5.00\nCASH\t5.00\nCHANGE\t0.00\nGST SUMMARY\tAMOUNT(RM)\tTAX(RM)\nSR @ 6%\t4.72\t0.28\nINV NO: CS-SA-0097493\tDATE : 19/07/2017\nGOODS SOLD ARE NOT RETURNABLE & REFUNDABLE\nTHANK YOU FOR YOUR PATRONAGE\nPLEASE COME AGAIN.\nTERIMA KASIH SILA DATANG LAGI\n** PLEASE KEEP THIS RECEIPT FOR PROVE OF\nPURCHASE DATE FOR I.T PRODUCT WARRANTY\nPURPOSE **\nFOLLOW US IN FACEBOOK : SANYU.STATIONERY', 'AIK HUAT HARDWARE\nENTERPRISE (SETIA\nALAM) SDN BHD\n822737-X\nNO. 17-G, JALAN SETIA INDAH\n(X) U13/X, SETIA ALAM,\nSEKSYEN U13, 40170 SHAH ALAM,\nTEL: 012 - 6651783 FAX: 03 - 33623608\nGST NO: 000394528768\nSIMPLIFIED TAX INVOICE\nCASH\nRECEIPT #: CSP0420207 DATE: 13/12/2017\nSALESPERSON : AH019 TIME: 17:58:00\nITEM\tQTY\tU/P\tAMOUNT\n(RM)\t(RM)\n8710163220987\t2\t12.00\t24.00\tS\nPHILIPS 18W/E27/827 ESSENTIAL BULB W/WHI\nTOTAL QUANTITY\t2\nSUB-TOTAL (GST)\t24.00\nDISC\t0.00\nROUNDING\t0.00\nTOTAL\t24.00\nCASH\t100.00\nCHANGE\t76.00\n*GST @ 6% INCLUDED IN TOTAL\nGST SUMMARY\nCODE\tAMOUNT\t%\tTAX/AMT\nSR\t22.64\t6\t1.36\nTAX TOTAL:\t1.36\nGOODS SOLD ARE NOT REFUNDABLE,\nTHANK YOU FOR CHOOSING US.\nPLS PROVIDE ORIGINAL BILL FOR GOODS\nEXCHANGE WITHIN 1 WEEK FROM TRANSACTION\nGOODS MUST BE IN ORIGINAL STATE TO BE\nENTITLED FOR EXCHANGE.', 'DE LUXE CIRCLE FRESH MART SDN BHD\n(MUTIARA RINI 16)\nCO REG NO:797887-W\tGST NO:001507647488\nNO.89&91, JALAN UTAMA,\nTAMAN MUTIA RINI, 81300 SKUDAI, JOHOR.\nTEL:016-7780546\nMT161201806020100\t02/06/18\t02:29:13 PM\nCASHIER:\tK LECHUM\t02/06/18\t02:29:34 PM\nCOCA-COLA 320ML\n9555589200385\t1.40*1\t1.40\tZ\nF&N GOTCHA BUGGY 
75ML\n8853815002880\t0.95*1\t0.95\tZ\nKING OYSTER MUSHROOM -UNIT ***\t-UNIT\n6936489102000\t3.50*1\t3.50\tZ\nLKK KUM CHUN OYSTER SAUCE 770G\n078895129052\t5.65*1\t5.65\tZ\nWHOLE CHICKEN ***\n2006031014359\t10.99*1.306\t14.35\tZ\nITEM: 5\tTOTAL\t25.85\nQTY: 5\tROUNDING\t0.00\nTOTAL SAVING:\t0.00\tTOTAL\t25.85\nTENDER\nCASH\t50.00\nCHANGE\t24.15\nGST ANALYSIS\tGOODS\tTAX AMOUNT\nS = 6%\t0.00\t0.00\nZ = 0%\t25.85\t0.00\nMEMBER 0000036581\tPOINTS EARNED: 25\nMEMBER: WONG SHOO YUEN\n*THANK YOU, SEE YOU AGAIN !!\n*CUSTOMER CARE LINE : 012-7092889\n*[email protected]']

Same text after the robust_padding function is called:

['^\t\n~!LX?N4_^FTJ5>A>=^(I1{]+DX1H)[R=[RUF{UQ~2FZ\nK8OI[`>^% IKE\tIN+5[: F#,!]SANYU STATIONERY SHOP\nNO. 31G&33G, JALAN SETIA INDAH X ,U13/X\n40170 SETIA ALAM\nMOBILE /WHATSAPPS : +6012-918 7937\nTEL: +603-3362 4137\nGST ID NO: 001531760640\nTAX INVOICE\nOWNED BY :\nSANYU SUPPLY SDN BHD (1135772-K)\nCASH SALES COUNTER\n1. 5000-0001\tPHOTOCOPY SERVICES - A4\nSIZE\n50 X 0.1000\t5.00\tSR\nTOTAL SALES INCLUSIVE GST @6%\t5.00\nDISCOUNT\t0.00\nTOTAL\t5.00\nROUND ADJ\t0.00\nFINAL TOTAL\t5.00\nCASH\t5.00\nCHANGE\t0.00\nGST SUMMARY\tAMOUNT(RM)\tTAX(RM)\nSR @ 6%\t4.72\t0.28\nINV NO: CS-SA-0097493\tDATE : 19/07/2017\nGOODS SOLD ARE NOT RETURNABLE & REFUNDABLE\nTHANK YOU FOR YOUR PATRONAGE\nPLEASE COME AGAIN.\nTERIMA KASIH SILA DATANG LAGI\n** PLEASE KEEP THIS RECEIPT FOR PROVE OF\nPURCHASE DATE FOR I.T PRODUCT WARRANTY\nPURPOSE **\nFOLLOW US IN FACEBOOK : SANYU.STATIONERY9 822340885 6887', '6360911208364\n1885 6\n8\n 628442\n20\t6\t4AIK HUAT HARDWARE\nENTERPRISE (SETIA\nALAM) SDN BHD\n822737-X\nNO. 17-G, JALAN SETIA INDAH\n(X) U13/X, SETIA ALAM,\nSEKSYEN U13, 40170 SHAH ALAM,\nTEL: 012 - 6651783 FAX: 03 - 33623608\nGST NO: 000394528768\nSIMPLIFIED TAX INVOICE\nCASH\nRECEIPT #: CSP0420207 DATE: 13/12/2017\nSALESPERSON : AH019 TIME: 17:58:00\nITEM\tQTY\tU/P\tAMOUNT\n(RM)\t(RM)\n8710163220987\t2\t12.00\t24.00\tS\nPHILIPS 18W/E27/827 ESSENTIAL BULB W/WHI\nTOTAL QUANTITY\t2\nSUB-TOTAL (GST)\t24.00\nDISC\t0.00\nROUNDING\t0.00\nTOTAL\t24.00\nCASH\t100.00\nCHANGE\t76.00\n*GST @ 6% INCLUDED IN TOTAL\nGST SUMMARY\nCODE\tAMOUNT\t%\tTAX/AMT\nSR\t22.64\t6\t1.36\nTAX TOTAL:\t1.36\nGOODS SOLD ARE NOT REFUNDABLE,\nTHANK YOU FOR CHOOSING US.\nPLS PROVIDE ORIGINAL BILL FOR GOODS\nEXCHANGE WITHIN 1 WEEK FROM TRANSACTION\nGOODS MUST BE IN ORIGINAL STATE TO BE\nENTITLED FOR EXCHANGE.            
', 'DE LUXE CIRCLE FRESH MART SDN BHD\n(MUTIARA RINI 16)\nCO REG NO:797887-W\tGST NO:001507647488\nNO.89&91, JALAN UTAMA,\nTAMAN MUTIA RINI, 81300 SKUDAI, JOHOR.\nTEL:016-7780546\nMT161201806020100\t02/06/18\t02:29:13 PM\nCASHIER:\tK LECHUM\t02/06/18\t02:29:34 PM\nCOCA-COLA 320ML\n9555589200385\t1.40*1\t1.40\tZ\nF&N GOTCHA BUGGY 75ML\n8853815002880\t0.95*1\t0.95\tZ\nKING OYSTER MUSHROOM -UNIT ***\t-UNIT\n6936489102000\t3.50*1\t3.50\tZ\nLKK KUM CHUN OYSTER SAUCE 770G\n078895129052\t5.65*1\t5.65\tZ\nWHOLE CHICKEN ***\n2006031014359\t10.99*1.306\t14.35\tZ\nITEM: 5\tTOTAL\t25.85\nQTY: 5\tROUNDING\t0.00\nTOTAL SAVING:\t0.00\tTOTAL\t25.85\nTENDER\nCASH\t50.00\nCHANGE\t24.15\nGST ANALYSIS\tGOODS\tTAX AMOUNT\nS = 6%\t0.00\t0.00\nZ = 0%\t25.85\t0.00\nMEMBER 0000036581\tPOINTS EARNED: 25\nMEMBER: WONG SHOO YUEN\n*THANK YOU, SEE YOU AGAIN !!\n*CUSTOMER CARE LINE : 012-7092889\n*[email protected]']

The first sequence of text seems to have this additional string: ^\t\n~!LX?N4_^FTJ5>A>=^(I1{]+DX1H)[R=[RUF{UQ~2FZ\nK8OI[`>^% IKE\tIN+5[: F#,!].

But, the label keeps a constant padding of 0.

Using the repository as is, the code reaches a score of 78.31% on recall, precision and F1.

With the robust padding removed, performance on the test set falls to 45.61% on recall, precision and F1.

What is the scientific justification for this, and why does it improve performance?
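For readers who have not looked at the code, the behaviour described in this issue can be sketched as follows: random characters are added before and after the text, and the label sequence is extended with 0 at the same positions. This is an illustration of that description only, not the repository's robust_padding function. The name `pad_with_noise`, the alphabet and the `max_pad` value are assumptions.

```python
import random
import string

# Assumed character set for the noise; the repository may use a different one.
ALPHABET = string.digits + string.ascii_uppercase + string.punctuation + " \t\n"


def pad_with_noise(text, labels, max_pad=50, rng=None):
    """Surround `text` with random characters and pad `labels` with class 0."""
    rng = rng or random.Random()
    left = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, max_pad)))
    right = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, max_pad)))
    padded_text = left + text + right
    padded_labels = [0] * len(left) + list(labels) + [0] * len(right)
    return padded_text, padded_labels


text = "TOTAL 5.00"
labels = [0] * 6 + [4] * 4  # made-up labels: class 4 marks the amount
padded_text, padded_labels = pad_with_noise(text, labels, rng=random.Random(0))
print(len(padded_text), len(padded_labels))
```

The text and label sequences stay aligned, so each original character keeps its label while every added character is labelled 0.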

Custom Dataset creation

Could you tell me how this dataset was created? I want to create my own data (image, box, entities).

Training data for task3

@zzzDavid Could you please specify how to create the training data for task3?

Training documentation

@zzzDavid Will you add training documentation?

main.py file missing in task 3

In task 3, main.py file is missing in the src folder.

task2 img = Image.open(buf).convert('L')

How to evaluate the JSON dump created by the task3/src/test.py ?

Readme of the repo mentions the metrics on task 3.

The test.py for task3 only outputs the json dump of entities extracted. How can I get the metrics on the test data?

Also, test_dict.pth only contains the text strings and not the actual labels.
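If ground-truth key files for the test receipts are available, one simple way to score a JSON dump is exact string matching per field. The sketch below is an assumption about how such a score could be computed and is not the official evaluation protocol. `score` and `FIELDS` are illustrative names, and the sample data is made up.

```python
FIELDS = ("company", "address", "date", "total")


def score(predictions: dict, ground_truth: dict):
    """Exact-match precision, recall and harmonic mean over all fields."""
    correct = predicted = expected = 0
    for name, truth in ground_truth.items():
        pred = predictions.get(name, {})
        for field in FIELDS:
            p = pred.get(field, "")
            t = truth.get(field, "")
            predicted += bool(p)  # count non-empty predicted fields
            expected += bool(t)   # count non-empty ground-truth fields
            correct += bool(p) and p == t
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


truth = {"r1": {"company": "A", "address": "B", "date": "C", "total": "1.00"}}
preds = {"r1": {"company": "A", "address": "X", "date": "C", "total": ""}}
print(score(preds, truth))
```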

Performance for higher number of classes in classification task 3

Hi,
I ran the code as provided and it gave good results, as mentioned. However, when I tried my own dataset (with data types compatible with the code), I could not get good performance.
I have 14 classes and have also adjusted the class weights for the cross-entropy loss.
I experimented a little with the embedding size, but couldn't get any good results.

Is there anything that can be done to improve this? I have tried preprocessing and changing the hidden and embedding sizes; none seem to give good results.

Has anyone tried it on a larger number of classes than the 5 mentioned? That said, I loved how seamless the code is; it ran right away without any problem.
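The post does not say how the class weights were chosen. One common scheme, shown here only as a sketch, weights each class by the inverse of its frequency so that rare classes contribute more to the loss. `inverse_frequency_weights` is an illustrative name and the label list is made up.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight for class c is total / (num_classes * count_c); 0.0 for unseen classes."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        n = counts.get(c, 0)
        weights.append(total / (num_classes * n) if n else 0.0)
    return weights


# Made-up character labels: class 0 dominates, class 1 is rare.
weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)
print(weights)
```

In PyTorch, a list like this can be converted to a float tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.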

Task 3 Data Information

Hello, can you please provide some information on how the dicts and keys pth files were created? I am trying to use the model on my own data but am failing to do so (I already have the other box, img and key files).
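The README describes task 3 as character-wise classification, so each character of the receipt text needs a class label. How the repository builds its files is not documented in this thread. The sketch below shows one possible way to derive such labels from the text and a key file by exact substring search. The class order in `CLASSES` and the helper `char_labels` are assumptions.

```python
# Assumed class order; index 0 means "not part of any key field".
CLASSES = ("none", "company", "date", "address", "total")


def char_labels(text: str, key_info: dict) -> list:
    """Label every character of `text` with the index of the field it belongs to."""
    labels = [0] * len(text)
    for idx, name in enumerate(CLASSES[1:], start=1):
        value = key_info.get(name, "")
        pos = text.find(value) if value else -1
        if pos >= 0:  # fields that do not occur verbatim are left unlabelled
            labels[pos:pos + len(value)] = [idx] * len(value)
    return labels


# Made-up receipt text and key values.
text = "ACME SHOP\nDATE 01/02/2019\nTOTAL 5.00"
keys = {"company": "ACME SHOP", "date": "01/02/2019", "total": "5.00"}
labels = char_labels(text, keys)
print(labels)
```

Exact matching is a simplification: a field such as an address that spans several lines in the OCR text will not be found this way and would need a more tolerant matching step.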

Could word-level encoding with masked numeric entries improve the final score compared with character-level classification?

I wonder whether we could improve the final score for task 3 by encoding each word and masking some numeric entries before classification, rather than classifying at the character level.
