
davar-lab-ocr's Introduction

DAVAR-OCR

This is the open-source OCR repository of DAVAR Lab, Hikvision Research Institute, China.

We maintain this repository to release implementations of our recent academic publications, along with re-implementations of popular earlier OCR algorithms and modules.

We also provide some ablation experiment comparisons to aid reproduction.

A short paper introducing DavarOCR is available on arXiv.

Note: Due to company policy, all of the code was re-implemented on top of the open-source frameworks mmdetection-2.11.0 and mmcv-1.3.4 from open-mmlab. The code architecture also follows mmocr, so the two frameworks (davarocr and mmocr) are compatible with each other.

Implementations

To date, davarocr contains the following algorithms:

Basic OCR Tasks

Text Detection

Text Recognition

Text Spotting

Video Text Spotting

  • YORO (ACM MM 2019)

Document Understanding Tasks

Information Extraction

Table Recognition

Table Understanding

Layout Recognition

  • VSR (ICDAR 2021)

Reading Order Detection

Named Entity Recognition

Development Environment

The recommended environment requirements can be found in mmdetection. The minimum compatible environment is listed below.

Basic Env      Version
Python         3.6+
CUDA           10.0+
cuDNN          7.6.3+
PyTorch        1.3.0+
torchvision    0.4.1+
OpenCV         3.0.0+

Some of the algorithms (EAST, Text Perceptron) require the C++ version of OpenCV. If you do not need these algorithms, you can ignore the error about 'opencv.hpp' or temporarily remove the related code.
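As a rough aid, the Python-side minimums listed above can be checked with a short script. This is only a sketch: the distribution names (`torch`, `torchvision`, `opencv-python`) are assumptions about how the packages were installed (a conda install may register different names), CUDA and cuDNN are not checked, and `importlib.metadata` needs Python 3.8 or newer even though the project itself lists Python 3.6+.

```python
import re
import sys
from importlib import metadata  # available from Python 3.8

# Minimum versions copied from the environment list; the distribution names are assumptions.
MINIMUMS = {"torch": "1.3.0", "torchvision": "0.4.1", "opencv-python": "3.0.0"}


def parse_version(text):
    """Turn '1.10.1+cu102' into (1, 10, 1); pad short versions such as '3.0' with zeros."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    return tuple(parts + [0] * (3 - len(parts)))


def check(minimums=MINIMUMS):
    """Return a {distribution name: status string} report for the given minimum versions."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok ({})".format(installed)
        else:
            report[name] = "too old ({} < {})".format(installed, minimum)
    return report


if __name__ == "__main__":
    print("python", "ok" if sys.version_info >= (3, 6) else "too old")
    for name, status in check().items():
        print(name, status)
```

A "not installed" result may simply mean the package is registered under another name, so treat the report as a hint rather than a verdict.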

Installation and Development Instruction

To download the repository and install davarocr, follow these instructions:

git clone https://github.com/hikopensource/DAVAR-Lab-OCR.git
cd DAVAR-Lab-OCR/
bash setup.sh

This script automatically downloads and installs "mmdetection" and "mmcv-full". You can also install them manually by following the official instructions.

See each algorithm's directory for more details.
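After the script finishes, it can help to confirm that the versions named in the note above (mmdetection 2.11.0 and mmcv 1.3.4) are the ones actually installed, and that the compiled mmcv operators are present. The following is a minimal sketch, assuming the packages are registered under the distribution names `mmdet` and `mmcv-full` and that the compiled operators live in the module `mmcv._ext`; it requires Python 3.8 or newer for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata  # available from Python 3.8

# Versions taken from the note in this README; distribution names are assumptions.
EXPECTED = {"mmdet": "2.11.0", "mmcv-full": "1.3.4"}


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name):
    """True if the module can be located. Parent packages are imported, the module itself is not."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        found = installed_version(name)
        note = "matches" if found == expected else "expected " + expected
        print("{}: {} ({})".format(name, found, note))
    print("mmcv._ext available:", has_module("mmcv._ext"))
```

If `mmcv._ext` is reported as unavailable, the installed mmcv is probably the build without compiled operators or one built for a different PyTorch/CUDA combination; that reading is an inference, not something this script verifies.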

Problem solution and collection

We collect problems encountered during installation and research and provide corresponding solutions. Please refer to FAQ.md for details.

Changelog

DavarOCR v0.6.0 was released on 13/07/2022. Please refer to Changelog.md for details and release history.

Citation

If you find this repository helpful to your research, please feel free to cite us:

@inproceedings{qiao2022davarocr,
  title    ={{DavarOCR:} {A} Toolbox for OCR and Multi-Modal Document Understanding},
  author   ={Liang Qiao and
			  Hui Jiang and
			  Ying Chen and
			  Can Li and
			  Pengfei Li and
			  Zaisheng Li and
			  Baorui Zou and
			  Dashan Guo and
			  Yingda Xu and
			  Yunlu Xu and
			  Zhanzhan Cheng and
			  Yi Niu},
  booktitle    = {ACM MM},
  pages        = {7355--7358},
  year         = {2022}
}

License

This project is released under the Apache 2.0 license.

Copyright

The copyright of our implementations belongs to Davar-Lab, Hikvision Research Institute, China; code taken from other open-source repositories follows its original license.

Welcome to DAVAR-LAB!

See the latest news at DAVAR-Lab. If you have any questions or suggestions, please feel free to contact us. Contact email: [email protected], [email protected].

davar-lab-ocr's People

Contributors

davar-lab, icedream2, johnson-magic, qiaoliang6

davar-lab-ocr's Issues

LGPMA

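Hello,
I have read your paper carefully and find the approach novel and creative, but I have a few questions that I hope you can help answer.
The paper mentions the aligned bounding box and the text region box, and the whole model is based on Mask R-CNN. My questions are:
1. Is the output of Mask R-CNN the aligned bounding box or the text region box? If Mask R-CNN is only used to detect text region boxes, how are the aligned bounding boxes obtained? If Mask R-CNN produces the aligned bounding boxes directly, how are their labels generated?
2. Does the subsequent LPMA start iterating from the aligned bounding boxes produced by Mask R-CNN?
3. I do not quite understand the GPMA part: how are the labels for global segmentation generated, and how is the global pyramid mask computed?
Thank you!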

OSError: no file with expected extension

Traceback (most recent call last):
  File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/home/pengfan/DAVAR-Lab-OCR/davarocr/davarocr/davar_table/datasets/pipelines/gpma_data.py", line 50, in __init__
    lib = ctl.load_library(lib_name, lib_dir)
  File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/numpy/ctypeslib.py", line 153, in load_library
    raise OSError("no file with expected extension")
OSError: no file with expected extension

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/home/pengfan/DAVAR-Lab-OCR/davarocr/davarocr/davar_common/datasets/davar_custom.py", line 135, in __init__
    self.pipeline = Compose(pipeline)
  File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
OSError: GPMADataGeneration: no file with expected extension

Trained Model Download

Hi,

I was trying to get this up and running.
Sadly I am not able to download the pretrained weights from pan.baidu.com.
Is there any way to download the pth file without having a baidu account or using the BaiduNetdiskNew.exe the page requests me to install?

Best regards 👍

No package 'opencv' found

It seems the setup.sh file does not include all the required packages!

After running bash setup.sh in the DAVAR-Lab-OCR folder I get the error below.
I installed OpenCV from source and added the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable, but I still get the same error.
Is there a way to fix this issue?

Processing dependencies for davarocr==0.3.0
Finished processing dependencies for davarocr==0.3.0
Package opencv was not found in the pkg-config search path.
Perhaps you should add the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable
No package 'opencv' found
./davarocr/davar_det/datasets/pipelines/lib/tp_data.cpp:23:30: fatal error: opencv2/opencv.hpp: No such file or directory
#include <opencv2/opencv.hpp>
^
compilation terminated.
Package opencv was not found in the pkg-config search path.
Perhaps you should add the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable
No package 'opencv' found
./davarocr/davar_det/datasets/pipelines/lib/east_data.cpp:23:30: fatal error: opencv2/opencv.hpp: No such file or directory
#include <opencv2/opencv.hpp>
^
compilation terminated.
Package opencv was not found in the pkg-config search path.
Perhaps you should add the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable
No package 'opencv' found
./davarocr/davar_det/core/post_processing/lib/tp_points_generate.cpp:21:30: fatal error: opencv2/opencv.hpp: No such file or directory
#include <opencv2/opencv.hpp>

code error

Running train.py directly under /davarocr/tools/ raises ImportError: cannot import name 'build' from 'mmdet.models.builder' (/home/yangchengyu/anaconda3/envs/pytorch16/lib/python3.8/site-packages/mmdet/models/builder.py). Is this caused by an unsuitable mmdet version?

TRIE test issue

I have downloaded the WildReceipt dataset and set its path, but I am still getting the following error. Can you please help me?

/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_common/datasets/pipelines/davar_loading.py:228: UserWarning: sensitive type should be in ["lower","upper","same"], but found same other inputs will be treated as "same" automatically
  ' other inputs will be treated as "same" automatically'.format(self.sensitive))
Use load_from_local loader
[                                                  ] 0/472, elapsed: 0s, ETA:/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "./davarocr/tools/test.py", line 261, in <module>
    main()
  File "./davarocr/tools/test.py", line 231, in main
    args.show_score_thr, model_type)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_common/apis/test.py", line 51, in single_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
    return old_func(*args, **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_spotting/models/spotters/base.py", line 70, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_spotting/models/spotters/base.py", line 61, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_ie/models/trie/trie_gt.py", line 294, in simple_test
    bieo_labels=None)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_ie/models/connects/multimodal_context_module.py", line 214, in forward
    bieo_labels=bieo_labels)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_ie/models/connects/multimodal_context_module.py", line 132, in pack_batch
    b_s = pos_feat[_].size(0)
TypeError: list indices must be integers or slices, not tuple

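Data format for training VSR

[Image: screenshot of an annotation file]
The image above is an annotation file provided with the VSR module in the code. Among the fields shown, what is the purpose of the cares field? Also, in the labels field, does label 0 mean header/footer?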

error in table recognition test_pub.py

Hi, when I started to run test_pub.py I got this error:

Traceback (most recent call last):
  File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/demo/table_recognition/lgpma/tools/test_pub.py", line 18, in <module>
    from davarocr.davar_common.apis import inference_model, init_model
  File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/davarocr/davarocr/__init__.py", line 11, in <module>
    from .davar_common import *
  File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/davarocr/davarocr/davar_common/__init__.py", line 11, in <module>
    from .models import *
  File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/davarocr/davarocr/davar_common/models/__init__.py", line 11, in <module>
    from mmdet.models.builder import (BACKBONES, DETECTORS, HEADS, LOSSES, NECKS,
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/models/__init__.py", line 6, in <module>
    from .dense_heads import * # noqa: F401,F403
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/models/dense_heads/__init__.py", line 1, in <module>
    from .anchor_free_head import AnchorFreeHead
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/models/dense_heads/anchor_free_head.py", line 8, in <module>
    from mmdet.core import multi_apply
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/__init__.py", line 2, in <module>
    from .bbox import * # noqa: F401, F403
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/bbox/__init__.py", line 7, in <module>
    from .samplers import (BaseSampler, CombinedSampler,
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/bbox/samplers/__init__.py", line 9, in <module>
    from .score_hlr_sampler import ScoreHLRSampler
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/bbox/samplers/score_hlr_sampler.py", line 2, in <module>
    from mmcv.ops import nms_match
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmcv/ops/__init__.py", line 1, in <module>
    from .bbox import bbox_overlaps
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmcv/ops/bbox.py", line 3, in <module>
    ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmcv/utils/ext_loader.py", line 11, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'mmcv._ext'

Process finished with exit code 1

My env packages list:

Name Version Build Channel

_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
addict 2.4.0 pypi_0 pypi
albumentations 0.3.2 pypi_0 pypi
apted 1.0.3 pypi_0 pypi
attrs 21.2.0 pypi_0 pypi
beautifulsoup4 4.10.0 pypi_0 pypi
blas 1.0 mkl
bs4 0.0.1 pypi_0 pypi
bzip2 1.0.8 h7b6447c_0
ca-certificates 2021.10.26 h06a4308_2
certifi 2021.10.8 py37h06a4308_0
charset-normalizer 2.0.7 pypi_0 pypi
click 7.1.2 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
coverage 6.1.2 pypi_0 pypi
cudatoolkit 10.1.243 h6bb024c_0
cycler 0.11.0 pypi_0 pypi
cython 0.29.24 pypi_0 pypi
davarocr 0.3.0 dev_0
distance 0.1.3 pypi_0 pypi
editdistance 0.6.0 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.4.0 pypi_0 pypi
fonttools 4.28.1 pypi_0 pypi
freetype 2.11.0 h70c0345_0
giflib 5.2.1 h7b6447c_0
gmp 6.2.1 h2531618_2
gnutls 3.6.15 he1e5248_0
huggingface-hub 0.1.2 pypi_0 pypi
idna 3.3 pypi_0 pypi
imagecorruptions 1.1.2 pypi_0 pypi
imageio 2.9.0 pypi_0 pypi
imgaug 0.3.0 pypi_0 pypi
importlib-metadata 4.8.2 pypi_0 pypi
iniconfig 1.1.1 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
joblib 1.1.0 pypi_0 pypi
jpeg 9d h7f8727e_0
jsonlines 2.0.0 pypi_0 pypi
kiwisolver 1.3.2 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
levenshtein 0.16.0 pypi_0 pypi
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libiconv 1.15 h63c8f33_5
libidn2 2.3.2 h7f8727e_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtasn1 4.16.0 h27cfd23_0
libtiff 4.2.0 h85742a9_0
libunistring 0.9.10 h27cfd23_0
libuv 1.40.0 h7b6447c_0
libwebp 1.2.0 h89dd481_0
libwebp-base 1.2.0 h27cfd23_0
lmdb 1.2.1 pypi_0 pypi
lxml 4.6.4 pypi_0 pypi
lz4-c 1.9.3 h295c915_1
markdown 3.3.6 pypi_0 pypi
matplotlib 3.5.0 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py37h7f8727e_0
mkl_fft 1.3.1 py37hd3c417c_0
mkl_random 1.2.2 py37h51133e4_0
mmcv-full 1.3.4 pypi_0 pypi
mmdet 2.11.0 pypi_0 pypi
mmlvis 10.5.3 pypi_0 pypi
mmpycocotools 12.0.3 pypi_0 pypi
model-index 0.1.11 pypi_0 pypi
ncurses 6.3 h7f8727e_2
nettle 3.7.3 hbbd107a_1
networkx 2.6.3 pypi_0 pypi
ninja 1.10.2 py37hd09550d_3
nltk 3.6.5 pypi_0 pypi
numpy 1.21.2 py37h20f2e39_0
numpy-base 1.21.2 py37h79a1101_0
olefile 0.46 py37_0
onnx 1.10.2 pypi_0 pypi
opencv-python 4.5.4.58 pypi_0 pypi
opencv-python-headless 4.5.4.58 pypi_0 pypi
openh264 2.1.0 hd408876_0
openmim 0.1.5 pypi_0 pypi
openssl 1.1.1l h7f8727e_0
ordered-set 4.0.2 pypi_0 pypi
packaging 21.2 pypi_0 pypi
pandas 1.3.4 pypi_0 pypi
pillow 6.2.2 pypi_0 pypi
pip 21.2.2 py37h06a4308_0
pluggy 1.0.0 pypi_0 pypi
polygon3 3.0.9.1 pypi_0 pypi
prettytable 2.4.0 pypi_0 pypi
protobuf 3.19.1 pypi_0 pypi
py 1.11.0 pypi_0 pypi
pyclipper 1.3.0 pypi_0 pypi
pycocotools 2.0.2 pypi_0 pypi
pyparsing 2.4.7 pypi_0 pypi
pytest 6.2.5 pypi_0 pypi
pytest-cov 3.0.0 pypi_0 pypi
pytest-runner 5.3.1 pypi_0 pypi
python 3.7.11 h12debd9_0
python-dateutil 2.8.2 pypi_0 pypi
pytorch 1.7.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2021.3 pypi_0 pypi
pywavelets 1.2.0 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
rapidfuzz 1.8.2 pypi_0 pypi
readline 8.1 h27cfd23_0
regex 2021.11.10 pypi_0 pypi
requests 2.26.0 pypi_0 pypi
sacremoses 0.0.46 pypi_0 pypi
scikit-image 0.18.3 pypi_0 pypi
scikit-learn 1.0.1 pypi_0 pypi
scipy 1.7.2 pypi_0 pypi
seqeval 1.2.2 pypi_0 pypi
setuptools 58.0.4 py37h06a4308_0
setuptools-scm 6.3.2 pypi_0 pypi
shapely 1.8.0 pypi_0 pypi
sharedarray 3.2.1 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_0
sklearn 0.0 pypi_0 pypi
soupsieve 2.3.1 pypi_0 pypi
sqlite 3.36.0 hc218d9a_0
tabulate 0.8.9 pypi_0 pypi
terminaltables 3.1.0 pypi_0 pypi
threadpoolctl 3.0.0 pypi_0 pypi
tifffile 2021.11.2 pypi_0 pypi
tk 8.6.11 h1ccaba5_0
tokenizers 0.10.3 pypi_0 pypi
toml 0.10.2 pypi_0 pypi
tomli 1.2.2 pypi_0 pypi
torchvision 0.8.1 py37_cu101 pytorch
tqdm 4.62.3 pypi_0 pypi
transformers 4.12.5 pypi_0 pypi
typing_extensions 3.10.0.2 pyh06a4308_0
urllib3 1.26.7 pypi_0 pypi
warpctc-pytorch 0.1 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
wheel 0.37.0 pyhd3eb1b0_1
xz 5.2.5 h7b6447c_0
yapf 0.31.0 pypi_0 pypi
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0

Google drive support

Hey, you have done great work. I want to try it on my own test cases, but downloading from Baidu takes a long time. Could you please upload the model to Google Drive if possible? Thanks.

Training and testing instruction

Would you be able to provide instructions for training and testing without OCR (table/cell detection only)? It has been really hard to get the code to work using setup.sh.

Thanks.

LGPMA

[Two attached images]
The table structure output for my test image is not quite right. What could be the reason? Thanks.

How to get HTML with text in Table recognition

I got the HTML tags, the cell bboxes, and the labels. I also used the bboxes to OCR the text in each cell.
How do I use the text to fill in the HTML, given that the table cells are not in order and some cells can be empty?
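One possible way to approach the filling step, as a rough sketch rather than the project's own method: assume the predicted HTML is a string in which every cell appears as an empty `<td ...></td>` pair, and assume the recognized texts have already been put in the same order as those tags (with an empty string for empty cells). Matching boxes to tags is the harder part and is not covered here. The function name `fill_cells` is hypothetical.

```python
import html
import re

# Matches an empty cell: an opening <td> tag (with optional attributes) followed directly by </td>.
EMPTY_CELL = re.compile(r"(<td[^>]*>)(</td>)")


def fill_cells(table_html, cell_texts):
    """Insert cell_texts, in order, into successive empty <td></td> pairs.

    Texts are HTML-escaped. If there are fewer texts than cells, the
    remaining cells are left empty.
    """
    texts = iter(cell_texts)

    def replace(match):
        return match.group(1) + html.escape(next(texts, "")) + match.group(2)

    return EMPTY_CELL.sub(replace, table_html)


if __name__ == "__main__":
    structure = '<tr><td></td><td colspan="2"></td></tr>'
    print(fill_cells(structure, ["Name", "Q1 & Q2"]))
```

If the order of the boxes does not match the order of the tags, the texts would first need to be reordered, for example by sorting the boxes by row and then by column; whether that matches the model's tag order is something to verify on your own outputs.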

LGPMA - Running inference on any image

I've been trying to detect table structure using LGPMA on the images I provided. To do so, I modified the test_pub.py like this:

import cv2
import json
import jsonlines
import numpy as np
from tqdm import tqdm
from eval_pub.metric import TEDS
from eval_pub.format import format_html
from davarocr.davar_common.apis import inference_model, init_model
import glob

# visualization setting
do_visualize = 1 # whether to visualize
vis_dir = "/content/" # path to save visualization results

# path setting
savepath = "/content/" # path to save prediction
config_file = '/content/DAVAR-Lab-OCR/demo/table_recognition/lgpma/configs/lgpma_pub.py' # config path
checkpoint_file = '/content/maskrcnn-lgpma-pub-e12-pub.pth' # model path

# loading model from config file and pth file
model = init_model(config_file, checkpoint_file)


image_path = '/content/pdf_pages_img/*'

imgs = glob.glob(image_path)
imgs.sort(key = lambda x: int(x.split('_')[-1][:-4]))


# generate prediction of html and save result to savepath
pred_dict = dict()

for im in imgs:
    result = inference_model(model, im)[0]
    pred_dict[im]=result['html']
    # detection results visualization
    print(im)
    if do_visualize:
        img = cv2.imread(im)
        img_name = im.split("/")[-1]
        bboxes = [[b[0], b[1], b[2], b[1], b[2], b[3], b[0], b[3]] for b in result['bboxes']]
        for box in bboxes:
            for j in range(0, len(box), 2):
                cv2.line(img, (box[j], box[j + 1]), (box[(j + 2) % len(box)], box[(j + 3) % len(box)]), (0, 0, 255), 1)
        cv2.imwrite(vis_dir + img_name, img)

with open(savepath+'file.json', "w", encoding="utf-8") as writer:
    json.dump(pred_dict, writer, ensure_ascii=False)

I have attached a sample of the results obtained. I thought the table looked similar enough to the examples provided and thus it could work well. However, it detects as cells text that does not belong to any table and also some cells of the table are not detected.

In order to improve the results, do I need to provide images containing only tables? Or perhaps there is something wrong with the code I provided here?

[Image: prueba_25]

'Weak' supervision results (Table 5 in the paper)

Thank you for your work.

  1. Are the results of 'Weak' supervision in Table 5 evaluated using the axis-aligned evaluation protocol?
    (i.e. predictions vs. ground-truth axis-aligned boxes)

  2. Can you provide those results for TotalText 'End-To-End'?
    (I assume the results reported in this table are for Word-Spotting.)

Thanks!

Installation problem: fatal error: cuda_runtime_api.h: No such file or directory

running develop
running egg_info
writing davarocr.egg-info/PKG-INFO
writing dependency_links to davarocr.egg-info/dependency_links.txt
writing top-level names to davarocr.egg-info/top_level.txt
reading manifest file 'davarocr.egg-info/SOURCES.txt'
writing manifest file 'davarocr.egg-info/SOURCES.txt'
running build_ext
Creating /home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/davarocr.egg-link (link to .)
davarocr 0.3.1 is already the active version in easy-install.pth

Installed /home/voyager/ocr/DAVAR-Lab-OCR-main
Processing dependencies for davarocr==0.3.1
Finished processing dependencies for davarocr==0.3.1
10
mkdir: cannot create directory ‘build’: File exists
-- cuda found TRUE
-- Building shared library with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/voyager/ocr/DAVAR-Lab-OCR-main/davarocr/davar_rcg/third_party/warp-ctc-pytorch_bindings/build
[ 11%] Linking CXX shared library libwarpctc.so
[ 33%] Built target warpctc
[ 44%] Linking CXX executable test_cpu
[ 66%] Built target test_cpu
[ 77%] Linking CXX executable test_gpu
[100%] Built target test_gpu
running install
running bdist_egg
running egg_info
writing warpctc_pytorch.egg-info/PKG-INFO
writing dependency_links to warpctc_pytorch.egg-info/dependency_links.txt
writing top-level names to warpctc_pytorch.egg-info/top_level.txt
/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'warpctc_pytorch.egg-info/SOURCES.txt'
writing manifest file 'warpctc_pytorch.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'warpctc_pytorch._warp_ctc' extension
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/voyager/ocr/DAVAR-Lab-OCR-main/davarocr/davar_rcg/third_party/warp-ctc-pytorch_bindings/include -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/TH -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/THC -I:/usr/local/cuda-10.2/include -I/home/voyager/anaconda3/envs/torch_gpu/include/python3.6m -c src/binding.cpp -o build/temp.linux-x86_64-3.6/src/binding.o -std=c++14 -fPIC -DWARPCTC_ENABLE_GPU -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_warp_ctc -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from src/binding.cpp:9:0:
/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:30: fatal error: cuda_runtime_api.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

Hello, I ran into "fatal error: cuda_runtime_api.h: No such file or directory" during installation. Could you help take a look?
My environment is:

GPU: RTX2080
CUDA: 10.2
Python: Anaconda env Python3.6
mmcv-full: 1.3.4
Torch: '1.10.1+cu102'
Opencv-Python: 4.5.5

gpma_data.so is missing

Line 43 of DAVAR-Lab-OCR/blob/main/davarocr/davarocr/davar_table/datasets/pipelines/gpma_data.py:
if lib_name is None or not os.path.isfile(os.path.join(lib_dir, lib_name)):
    # Using default lib
    cur_path = os.path.realpath(__file__)
    lib_dir = cur_path.replace('\\', '/').split('/')[:-1]
    lib_dir = "/".join(lib_dir) + '/lib'
    lib_name = "gpma_data.so"
if lib_name is not None and lib_dir is not None:
    lib = ctl.load_library(lib_name, lib_dir)

The corresponding directory, DAVAR-Lab-OCR/davarocr/davarocr/davar_table/datasets/pipelines/lib, does not contain gpma_data.so, which causes training to fail.

How can I solve this? Thanks.
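Before training, a quick check of the library directory can confirm whether the shared object is really absent, which is the situation in which numpy's `load_library` raises "no file with expected extension" in the tracebacks reported on this page. The sketch below only inspects the directory; the helper names are hypothetical, and the path in the usage line is taken from this issue and may differ in your checkout.

```python
import os


def locate_shared_lib(lib_dir, lib_name="gpma_data.so"):
    """Return the path of the compiled library, or None if the file is not there."""
    path = os.path.join(lib_dir, lib_name)
    return path if os.path.isfile(path) else None


def describe_lib_dir(lib_dir, lib_name="gpma_data.so"):
    """Summarise the state of lib_dir so that a missing build is easy to spot."""
    if not os.path.isdir(lib_dir):
        return "directory does not exist: " + lib_dir
    if locate_shared_lib(lib_dir, lib_name):
        return "found " + lib_name
    # List source and script files that might need to be compiled or run first.
    sources = sorted(f for f in os.listdir(lib_dir) if f.endswith((".cpp", ".sh")))
    return "{} is missing; candidate source files: {}".format(lib_name, sources)


if __name__ == "__main__":
    # Path relative to the repository root, as quoted in this issue; adjust as needed.
    print(describe_lib_dir("davarocr/davarocr/davar_table/datasets/pipelines/lib"))
```

If the file is missing but source files are listed, the compilation step of the installation probably did not run or failed, so the setup output is the next place to look.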

LGPMA training consumption is unstable

While training LGPMA on my own data, memory and time consumption are very unstable. Could you tell me how to improve this?
[Two attached images]

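How does LGPMA call the text recognition module to get the complete result?

LGPMA inference produces the table's HTML tags, result['html'], and the positions of all cell boxes, result['bboxes']. Your README says "The release model only contains structure-level result. You may use the text recognition module for the complete result." How should the text recognition module be called to get the complete result? Should each box in result['bboxes'] be fed in order to the OCR recognition module, and the recognized text filled in order between the HTML tags?
Thank you for your help.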

OSError: no file with expected extension

After installing the requirements and trying to test the model, I am getting this error:

(tptron) home@home-lnx:~/programs/DAVAR-Lab-OCR/demo/text_perceptron_det$ python test.py 
Traceback (most recent call last):
  File "test.py", line 27, in <module>
    model = init_detector(config_file, checkpoint_file, device='cuda:0')
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/apis/inference.py", line 34, in init_detector
    model = build_detector(config.model, test_cfg=config.test_cfg)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 43, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 15, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/third_party/text_perceptron/mmdet/models/detectors/text_perceptron_det.py", line 59, in __init__
    self.shape_transform_module = build_roi_extractor(shape_transform_module)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 27, in build_roi_extractor
    return build(cfg, ROI_EXTRACTORS)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 15, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/third_party/text_perceptron/mmdet/models/shape_transform_module/points_generation.py", line 71, in __init__
    lib = ctl.load_library(lib_name, lib_dir)
  File "/home/home/anaconda3/envs/tptron/lib/python3.7/site-packages/numpy/ctypeslib.py", line 153, in load_library
    raise OSError("no file with expected extension")
OSError: no file with expected extension

Question about MANGO

Thanks for your work!

Is the Total-Text result reported in the paper the Word Spotting result?

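VSR inference on a single PDF page image

Could you give some guidance on using VSR for layout recognition on images? Can the inference code in the API be used? Looking forward to your reply, thanks.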

inference images in trie

I have tried the test.sh provided in the repo, which works fine on images with ground-truth information.
How can I run inference on other images with this model?

Syntax Error in ReadMe.md

In the readme.md file, the scripts under Installation and Development Instruction should be in bash style, NOT Python style.

Error when running test_pub.py

Hello, we are studying your table recognition code, but we cannot run the inference script. The error is as follows:
Traceback (most recent call last):
  File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/demo/table_recognition/lgpma/tools/test_pub.py", line 19, in <module>
    from davarocr.davar_common.apis import inference_model, init_model
  File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/davarocr/davarocr/__init__.py", line 11, in <module>
    from .davar_common import *
  File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/davarocr/davarocr/davar_common/__init__.py", line 11, in <module>
    from .models import *
  File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/davarocr/davarocr/davar_common/models/__init__.py", line 11, in <module>
    from mmdet.models.builder import (BACKBONES, DETECTORS, HEADS, LOSSES, NECKS,
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmdet/models/__init__.py", line 1, in <module>
    from .backbones import * # noqa: F401,F403
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmdet/models/backbones/__init__.py", line 1, in <module>
    from .darknet import Darknet
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmdet/models/backbones/darknet.py", line 6, in <module>
    from mmcv.cnn import ConvModule, constant_init, kaiming_init
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/cnn/__init__.py", line 14, in <module>
    from .builder import MODELS, build_model_from_cfg
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/cnn/builder.py", line 1, in <module>
    from ..runner import Sequential
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/runner/__init__.py", line 3, in <module>
    from .base_runner import BaseRunner
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 13, in <module>
    from .checkpoint import load_checkpoint
  File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 14, in <module>
    import torchvision
  File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/__init__.py", line 1, in <module>
    from torchvision import models
  File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/models/__init__.py", line 11, in <module>
    from . import detection
  File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
    from .faster_rcnn import *
  File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
    from torchvision.ops import misc as misc_nn_ops
  File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/ops/__init__.py", line 1, in <module>
    from .boxes import nms, box_iou
  File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/ops/boxes.py", line 2, in <module>
    from torchvision import _C
ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory

Could you help us resolve this problem? Thanks!

Failed building wheel for mmpycocotools

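Hello, thank you for open-sourcing the project. I ran into the following problem during installation. The system environment is a brand-new Python environment created with conda.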

Building wheel for mmpycocotools (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/ubuntu/.conda/envs/py3.6/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ulxsr6m2/mmpycocotools_bbc7125ac5c24f7eba50e936fed7b267/setup.py'"'"'; file='"'"'/tmp/pip-install-ulxsr6m2/mmpycocotools_bbc7125ac5c24f7eba50e936fed7b267/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-9y2p8ww8
cwd: /tmp/pip-install-ulxsr6m2/mmpycocotools_bbc7125ac5c24f7eba50e936fed7b267/
Complete output (23 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/coco.py -> build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/mask.py -> build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/__init__.py -> build/lib.linux-x86_64-3.6/pycocotools
running build_ext
building 'pycocotools._mask' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/common
creating build/temp.linux-x86_64-3.6/pycocotools
gcc -pthread -B /home/ubuntu/.conda/envs/py3.6/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/include -Icommon -I/home/ubuntu/.conda/envs/py3.6/include/python3.6m -c common/maskApi.c -o build/temp.linux-x86_64-3.6/common/maskApi.o
common/maskApi.c: In function ‘rleToBbox’:
common/maskApi.c:141:31: warning: ‘xp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if(j%2==0) xp=x; else if(xp<x) { ys=0; ye=h-1; }
^
gcc -pthread -B /home/ubuntu/.conda/envs/py3.6/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/include -Icommon -I/home/ubuntu/.conda/envs/py3.6/include/python3.6m -c pycocotools/_mask.c -o build/temp.linux-x86_64-3.6/pycocotools/_mask.o
gcc: error: pycocotools/_mask.c: No such file or directory
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for mmpycocotools

LGPMA question

Hi!

First of all, thanks for this wonderful open-source project, and congratulations on all the achievements!

The release model only contains structure-level result. You may use the text recognition module for the complete result.

I can run OCR on the bboxes to get the text, but how do I match that text with the table structure HTML?
Do you have any instructions on how to use a text recognition module such as RF-Learning to extract the text and embed it into the HTML table structure?
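As a starting point for the question above, here is a minimal sketch of the embedding step only. It assumes you already have one recognized string per `<td>` cell, listed in the same order as the cells appear in the structure HTML, with an empty string for cells without text. Whether the released model returns its bboxes in that order is an assumption to verify on your own output. The function name `fill_table_html` is hypothetical and is not part of davarocr.

```python
import html
import re

# An opening <td ...> tag immediately followed by its closing tag,
# i.e. an empty cell in a structure-only table.
_EMPTY_TD = re.compile(r"(<td\b[^>]*>)(</td>)")


def fill_table_html(structure_html, cell_texts):
    """Insert recognized text into empty <td> cells, in document order."""
    texts = iter(cell_texts)

    def _fill(match):
        # Leave the cell empty if there are fewer texts than cells.
        text = next(texts, "")
        return match.group(1) + html.escape(text) + match.group(2)

    return _EMPTY_TD.sub(_fill, structure_html)


if __name__ == "__main__":
    structure = '<table><tr><td></td><td colspan="2"></td></tr></table>'
    print(fill_table_html(structure, ["Name", "Value"]))
```

The text is escaped with `html.escape` so that characters such as `&` or `<` in the OCR output do not break the table markup. Cropping each bbox and running a recognizer on it is a separate step that is not shown here.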

Training LGPMA on my own data fails with ValueError: cannot convert float NaN to integer

Hello, I am training LGPMA on my own table dataset (in PubTabNet format) and got the following error:
File "DAVAR-Lab-OCR/davarocr/davarocr/davar_table/core/mask/lp_mask_target.py", line 55, in get_lpmask_single
middle_x, middle_y = round(np.where(box_text == 1)[1].mean()), round(np.where(box_text == 1)[0].mean())
ValueError: cannot convert float NaN to integer

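I prepared the training set by converting my dataset to the davar format with DAVAR-Lab-OCR/demo/table_recognition/lgpma/tools/convert_html_ann.py. I made a few changes to the html_to_davar function in convert_html_ann.py: "labels" uses 0 and 1 instead of "t-head" and "t-body", and content_ann returns only "bboxes", "cells", and "labels". The converted data has the same format as your published PubTabNet_train_datalist_all.json, but training fails with the error above. Could you advise which step might have gone wrong? Thank you.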

Also: training runs normally when I use your published PubTabNet_train_datalist_all.json as the training set.
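About the failing line: `np.where(box_text == 1)` returns empty index arrays when the mask contains no pixel equal to 1, the mean of an empty array is NaN, and `round()` of NaN raises exactly this ValueError. So at least one text box in the custom data probably produces an empty mask. The sketch below scans annotations for boxes that cover no pixel inside the image. It assumes each box is `[x_min, y_min, x_max, y_max]` and that an empty list marks a cell without text. How lp_mask_target.py actually rasterizes a box is not shown in this thread, so treat the flagged boxes as candidates to inspect, not as a confirmed diagnosis. The function name `degenerate_boxes` is hypothetical.

```python
import math


def degenerate_boxes(bboxes, width, height):
    """Return indices of non-empty boxes that cover no pixel inside the image.

    Assumes each box is [x_min, y_min, x_max, y_max]; an empty list marks
    a cell without text and is skipped.
    """
    bad = []
    for idx, box in enumerate(bboxes):
        if not box:
            continue
        # A NaN coordinate can never describe a valid region.
        if any(isinstance(v, float) and math.isnan(v) for v in box):
            bad.append(idx)
            continue
        x0, y0, x1, y1 = box
        # Clip to the image, then check that at least one pixel remains.
        x0, x1 = max(0, int(x0)), min(width, int(x1))
        y0, y1 = max(0, int(y0)), min(height, int(y1))
        if x1 <= x0 or y1 <= y0:
            bad.append(idx)
    return bad


if __name__ == "__main__":
    boxes = [[0, 0, 10, 10], [], [5, 5, 5, 9], [120, 0, 130, 10]]
    print(degenerate_boxes(boxes, width=100, height=100))
```

Running this over every image entry in the converted datalist, using that image's own width and height, should narrow the problem down to specific cells, for example boxes with zero width or boxes lying outside the image.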

OSError: no file with expected extension (probably a numpy ctypeslib.py problem)

(otrpy37) server@server-Z590-UD:~/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr$ python
Python 3.7.0 (default, Oct 9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from davarocr.davar_common.apis import inference_model
east_postprocess.so
/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/post_processing/lib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/__init__.py", line 12, in <module>
from .davar_det import *
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/__init__.py", line 11, in <module>
from .datasets import *
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/datasets/__init__.py", line 12, in <module>
from .text_det_dataset import TextDetDataset
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/datasets/text_det_dataset.py", line 15, in <module>
from ..core.evaluation.hmean import evaluate_method
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/__init__.py", line 13, in <module>
from .post_processing import BasePostDetector, TPPointsGeneration, PostMaskRCNN
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/post_processing/__init__.py", line 14, in <module>
from .post_east import PostEAST
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/post_processing/post_east.py", line 28, in <module>
lib = ctl.load_library(lib_name, lib_dir)
File "/home/server/anaconda3/envs/otrpy37/lib/python3.7/site-packages/numpy/ctypeslib.py", line 153, in load_library
raise OSError("no file with expected extension")
OSError: no file with expected extension
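About this error: `numpy.ctypeslib.load_library` raises "no file with expected extension" when it finds no candidate file for the requested library name in the given directory. The two lines printed before the traceback appear to be the library name (`east_postprocess.so`) and the directory that was searched, so a likely cause is that the compiled EAST post-processing library is missing from that `lib` directory. That is an assumption. The README does note that EAST and Text Perceptron need the C++ version of opencv, which fits a failed or skipped compile step. The sketch below checks whether the file is actually there. The helper name `shared_lib_status` is made up for illustration.

```python
import os


def shared_lib_status(lib_dir, lib_name):
    """Describe whether `lib_name` exists in `lib_dir` and which .so files do."""
    path = os.path.join(lib_dir, lib_name)
    present = sorted(os.listdir(lib_dir)) if os.path.isdir(lib_dir) else []
    return {
        "path": path,
        "dir_exists": os.path.isdir(lib_dir),
        "exists": os.path.isfile(path),
        "shared_objects": [f for f in present if f.endswith(".so")],
    }


if __name__ == "__main__":
    # Directory and file name copied from the output printed before the traceback.
    lib_dir = ("/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/"
               "davarocr/davar_det/core/post_processing/lib")
    print(shared_lib_status(lib_dir, "east_postprocess.so"))
```

If `exists` is false, the import cannot succeed until the library is built. The README also mentions that the related code can be removed temporarily when EAST and Text Perceptron are not needed.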

Text-layout inference

Thank you very much for your great work on merging CNN and NLP.
Could you please share the test_pipeline you use for inference or prediction? The current pipeline is for evaluation.

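Question about the MANGO results

Hello, why do the results I get when testing with the model you provide differ from those in your table?
For example, for ICDAR2015 ResNet-50 General you report 70.8, but running your test_ic15.sh gives me 69.4. Is there anything I should pay attention to?
Thanks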
