hikopensource / davar-lab-ocr

OCR toolbox from Davar-Lab

License: Apache License 2.0

Python 94.34% C++ 5.45% Shell 0.22%

ocr dar

davar-lab-ocr's Introduction

DAVAR-OCR

This is the open-source OCR repository of DAVAR Lab, from Hikvision Research Institute, China.

We maintain this code repository to release implementations of our recent academic publications, along with re-implementations of popular earlier OCR algorithms and modules.

We also provide some ablation experiment comparisons to make the results easier to reproduce.

A short paper introducing DavarOCR is available on arXiv.

Note: Due to company policy, all of the code was re-implemented on top of the open-source frameworks mmdetection-2.11.0 and mmcv-1.3.4 from open-mmlab. The code architecture also follows mmocr, so the two frameworks are compatible with each other.

Implementations

To date, davarocr contains the following algorithms:

Basic OCR Tasks

Text Detection

EAST (CVPR 2017)
MASK RCNN (ICCV 2017)
Text Perceptron Det (AAAI 2020)

Text Recognition

Text Spotting

Mask RCNN E2E
Text Perceptron E2E (AAAI 2020)
MANGO (AAAI 2021)
DLD (ECCV 2022)

Video Text Spotting

YORO (ACM MM 2019)

Document Understanding Tasks

Information Extraction

Chargrid (EMNLP 2018)
TRIE (ACM MM 2020)

Table Recognition

LGPMA (ICDAR 2021)

Table Understanding

CTUNet (ACM MM 2022)

Layout Recognition

VSR (ICDAR 2021)

Reading Order Detection

GCN-PN (ECCV 2020)

Named Entity Recognition

BERT-based NER, including BERT+CRF/Span/Softmax
BiLSTM+CRF NER (arXiv 2016)

Development Environment

The recommended environment requirements can be found in mmdetection. The minimum compatible environment is as follows.

Basic Env	version
Python	3.6+
cuda	10.0+
cudnn	7.6.3+
pytorch	1.3.0+
torchvision	0.4.1+
opencv	3.0.0+
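The minimum versions in the table above can be checked programmatically. The following is a minimal sketch, not part of davarocr: the helper names `parse_version` and `meets_minimum` are hypothetical, and the "installed" versions are typed in by hand for illustration.

```python
import re

# Minimum versions copied from the table above.
MINIMUMS = {
    "python": "3.6",
    "pytorch": "1.3.0",
    "torchvision": "0.4.1",
    "opencv": "3.0.0",
}


def parse_version(text):
    """Turn '1.10.1+cu102' into (1, 10, 1); a local suffix after '+' is ignored."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hand-written example values (assumption). In a real environment they would
# come from torch.__version__, torchvision.__version__ and cv2.__version__.
installed = {
    "python": "3.7.10",
    "pytorch": "1.10.1+cu102",
    "torchvision": "0.4.0",
    "opencv": "4.5.5",
}
for name, minimum in MINIMUMS.items():
    status = "ok" if meets_minimum(installed[name], minimum) else "too old"
    print(f"{name}: {installed[name]} (minimum {minimum}) -> {status}")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.1" sorts before "1.3.0", which would wrongly flag a newer PyTorch as too old.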

Some of the algorithms (EAST, Text Perceptron) require the C++ version of OpenCV. If you do not need these algorithms, you can ignore the error about 'opencv.hpp' or temporarily remove the related code.

Installation and Development Instruction

To download the repository and install davarocr, follow these instructions:

git clone https://github.com/hikopensource/DAVAR-Lab-OCR.git
cd DAVAR-Lab-OCR/
bash setup.sh

This script automatically downloads and installs "mmdetection" and "mmcv-full". You can also install them manually by following the official instructions.

See each algorithm's directory for more details.

Problem solution and collection

We collect problems encountered during installation and research and provide corresponding solutions. Please refer to FAQ.md for details.

Changelog

DavarOCR v0.6.0 was released on 13/07/2022. Please refer to Changelog.md for details and release history.

Citation

If you find this repository helpful to your research, please feel free to cite us:

@inproceedings{qiao2022davarocr,
  title     = {{DavarOCR:} {A} Toolbox for OCR and Multi-Modal Document Understanding},
  author    = {Liang Qiao and
               Hui Jiang and
               Ying Chen and
               Can Li and
               Pengfei Li and
               Zaisheng Li and
               Baorui Zou and
               Dashan Guo and
               Yingda Xu and
               Yunlu Xu and
               Zhanzhan Cheng and
               Yi Niu},
  booktitle = {ACM MM},
  pages     = {7355--7358},
  year      = {2022}
}

License

This project is released under the Apache 2.0 license.

Copyright

The copyright of our own implementations belongs to Davar-Lab, Hikvision Research Institute, China. Code taken from other open-source repositories follows its original licenses.

Welcome to DAVAR-LAB!

See the latest news at DAVAR-Lab. If you have any questions or suggestions, please feel free to contact us. Contact email: [email protected], [email protected].

davar-lab-ocr's Issues

LGPMA

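Hello,
I have read your paper carefully and find the approach novel and creative, but I still have some questions that I hope you can help answer.
The paper mentions the aligned bounding box and the text region box, and the whole model is based on Mask R-CNN. My questions are:
1. Is the output of Mask R-CNN the aligned bounding box or the text region box? If Mask R-CNN is only used to detect text region boxes, how are the aligned bounding boxes obtained? If Mask R-CNN produces aligned bounding boxes directly, how are the labels for the aligned bounding boxes generated?
2. Does the subsequent LPMA start iterating from the aligned bounding boxes produced by Mask R-CNN?
3. I do not fully understand the GPMA part: how are the global segmentation labels generated, and how is the global pyramid mask computed?
Thanks!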

OSError: no file with expected extension

Traceback (most recent call last):
File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/home/pengfan/DAVAR-Lab-OCR/davarocr/davarocr/davar_table/datasets/pipelines/gpma_data.py", line 50, in __init__
lib = ctl.load_library(lib_name, lib_dir)
File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/numpy/ctypeslib.py", line 153, in load_library
raise OSError("no file with expected extension")
OSError: no file with expected extension

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/home/pengfan/DAVAR-Lab-OCR/davarocr/davarocr/davar_common/datasets/davar_custom.py", line 135, in __init__
self.pipeline = Compose(pipeline)
File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 22, in __init__
transform = build_from_cfg(transform, PIPELINES)
File "/home/pengfan/anaconda3/envs/LGPMA/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
OSError: GPMADataGeneration: no file with expected extension

how to export model to onnx format?

How to prepare data for table recognition?

Hi,
Thanks for your great work.
I want to retrain the LGPMA model for the table recognition task. How can I convert the PubTabNet dataset format to the davar format?

Trained Model Download

Hi,

I was trying to get this up and running.
Sadly I am not able to download the pretrained weights from pan.baidu.com.
Is there any way to download the pth file without having a baidu account or using the BaiduNetdiskNew.exe the page requests me to install?

Best regards 👍

No package 'opencv' found

It seems the setup.sh file does not include all the required packages!

After running bash setup.sh in the DAVAR-Lab-OCR folder I get the error below.
I installed OpenCV from source and added the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable, but I still get the same error.
Is there a way to fix this issue?

Processing dependencies for davarocr==0.3.0
Finished processing dependencies for davarocr==0.3.0
Package opencv was not found in the pkg-config search path.
Perhaps you should add the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable
No package 'opencv' found
./davarocr/davar_det/datasets/pipelines/lib/tp_data.cpp:23:30: fatal error: opencv2/opencv.hpp: No such file or directory
#include <opencv2/opencv.hpp>
^
compilation terminated.
Package opencv was not found in the pkg-config search path.
Perhaps you should add the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable
No package 'opencv' found
./davarocr/davar_det/datasets/pipelines/lib/east_data.cpp:23:30: fatal error: opencv2/opencv.hpp: No such file or directory
#include <opencv2/opencv.hpp>
^
compilation terminated.
Package opencv was not found in the pkg-config search path.
Perhaps you should add the directory containing `opencv.pc'
to the PKG_CONFIG_PATH environment variable
No package 'opencv' found
./davarocr/davar_det/core/post_processing/lib/tp_points_generate.cpp:21:30: fatal error: opencv2/opencv.hpp: No such file or directory
#include <opencv2/opencv.hpp>
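The log above shows pkg-config being asked for a module named `opencv`. One possible cause, offered as an assumption rather than a diagnosis of this report, is that OpenCV 4 builds name their pkg-config file `opencv4.pc` (and only generate it when built with OPENCV_GENERATE_PKGCONFIG=ON), so a lookup for `opencv` fails even though OpenCV is installed. The sketch below asks pkg-config which of the two names it can resolve; the function names are hypothetical and not part of davarocr.

```python
import shutil
import subprocess


def pkg_config_exists(name):
    """Return True if `pkg-config --exists <name>` succeeds on this machine."""
    if shutil.which("pkg-config") is None:
        return False  # pkg-config itself is not installed
    return subprocess.run(["pkg-config", "--exists", name]).returncode == 0


def find_opencv_modules(candidates=("opencv", "opencv4"), exists=pkg_config_exists):
    """List the candidate module names that pkg-config can resolve."""
    return [name for name in candidates if exists(name)]


if __name__ == "__main__":
    found = find_opencv_modules()
    if found:
        print("pkg-config can resolve:", ", ".join(found))
    else:
        print("pkg-config cannot resolve 'opencv' or 'opencv4'; "
              "check PKG_CONFIG_PATH and how OpenCV was built")
```

If only `opencv4` is reported, the build scripts that call `pkg-config opencv` would need the other name, or a compatible `opencv.pc` would have to be provided.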

code error

Running train.py directly from the /davarocr/tools/ directory raises ImportError: cannot import name 'build' from 'mmdet.models.builder' (/home/yangchengyu/anaconda3/envs/pytorch16/lib/python3.8/site-packages/mmdet/models/builder.py). Is this caused by an incompatible mmdet version?

TRIE test issue

I have downloaded the WildReceipt dataset and set its path, but I still get the following error. Can you please help me?

/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_common/datasets/pipelines/davar_loading.py:228: UserWarning: sensitive type should be in ["lower","upper","same"], but found same other inputs will be treated as "same" automatically
  ' other inputs will be treated as "same" automatically'.format(self.sensitive))
Use load_from_local loader
[                                                  ] 0/472, elapsed: 0s, ETA:/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "./davarocr/tools/test.py", line 261, in <module>
    main()
  File "./davarocr/tools/test.py", line 231, in main
    args.show_score_thr, model_type)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_common/apis/test.py", line 51, in single_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/fp16_utils.py", line 95, in new_func
    return old_func(*args, **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_spotting/models/spotters/base.py", line 70, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_spotting/models/spotters/base.py", line 61, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_ie/models/trie/trie_gt.py", line 294, in simple_test
    bieo_labels=None)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_ie/models/connects/multimodal_context_module.py", line 214, in forward
    bieo_labels=bieo_labels)
  File "/content/DAVAR-Lab-OCR/davarocr/davarocr/davar_ie/models/connects/multimodal_context_module.py", line 132, in pack_batch
    b_s = pos_feat[_].size(0)
TypeError: list indices must be integers or slices, not tuple

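Data format for training VSR

The image above is an annotation file provided with the VSR module in the code. Of the fields shown, what is the cares field for? Also, in the labels field, does label 0 mean header/footer?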

No module named 'mmcv.runner.dist_utils'

Hello, when will the dist_utils files be uploaded?

error in table recognition test_pub.py

Hi, when I started to run test_pub.py I got this error:

Traceback (most recent call last):
File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/demo/table_recognition/lgpma/tools/test_pub.py", line 18, in <module>
from davarocr.davar_common.apis import inference_model, init_model
File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/davarocr/davarocr/__init__.py", line 11, in <module>
from .davar_common import *
File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/davarocr/davarocr/davar_common/__init__.py", line 11, in <module>
from .models import *
File "/home/vaskers5/projects/Genotek/DAVAR-Lab-OCR-0.3.0/davarocr/davarocr/davar_common/models/__init__.py", line 11, in <module>
from mmdet.models.builder import (BACKBONES, DETECTORS, HEADS, LOSSES, NECKS,
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/models/__init__.py", line 6, in <module>
from .dense_heads import * # noqa: F401,F403
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/models/dense_heads/__init__.py", line 1, in <module>
from .anchor_free_head import AnchorFreeHead
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/models/dense_heads/anchor_free_head.py", line 8, in <module>
from mmdet.core import multi_apply
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/__init__.py", line 2, in <module>
from .bbox import * # noqa: F401, F403
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/bbox/__init__.py", line 7, in <module>
from .samplers import (BaseSampler, CombinedSampler,
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/bbox/samplers/__init__.py", line 9, in <module>
from .score_hlr_sampler import ScoreHLRSampler
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmdet/core/bbox/samplers/score_hlr_sampler.py", line 2, in <module>
from mmcv.ops import nms_match
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmcv/ops/__init__.py", line 1, in <module>
from .bbox import bbox_overlaps
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmcv/ops/bbox.py", line 3, in <module>
ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/site-packages/mmcv/utils/ext_loader.py", line 11, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "/home/vaskers5/miniconda3/envs/davar/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'mmcv._ext'

Process finished with exit code 1

My env packages list:
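The final ModuleNotFoundError in this traceback concerns `mmcv._ext`, the compiled extension module that `mmcv.ops` loads. A common reason for it to be absent, stated here as an assumption about this report, is that the lite `mmcv` package is installed instead of `mmcv-full`, or that `mmcv-full` was built without its ops. The sketch below checks for the submodule without importing any ops; the helper name `has_compiled_ops` is hypothetical.

```python
import importlib.util


def has_compiled_ops(package="mmcv", ext="_ext"):
    """Return True if `<package>.<ext>` can be located by the import system."""
    try:
        return importlib.util.find_spec(f"{package}.{ext}") is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself is not installed.
        return False


if __name__ == "__main__":
    if has_compiled_ops():
        print("mmcv._ext found: compiled ops are available")
    else:
        print("mmcv._ext missing: mmcv is absent or lacks its compiled ops")
```

Running this inside the same environment that produced the traceback shows whether the problem is in the mmcv installation rather than in davarocr.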

Google drive support

Hey, you have done great work. I want to try it on my test cases, but downloading from Baidu takes a long time. Could you please upload the model to Google Drive if possible? Thanks.

Training and testing instruction

Could you provide instructions for training and testing without OCR (table / cell detection only)? It has been really hard to get the code working with setup.sh.

Thanks.

PubTabNet datalist link has expired

The PubTabNet datalist link appears to have expired. Could you post a new one?

which annotation tool is used for video text tracking

which annotation tool is used for video text tracking?

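Could you provide the JSON file for the icdar2015_video test set?

Hello, I have recently been following your YORO algorithm. Could you provide the JSON file for the icdar2015_video test set?

LGPMA

The table structure output for my test image is not quite right. What could be the reason? Thanks.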

How to get HTML with text in Table recognition

I got the html tags, bboxes of cell and labels. I also used the bboxes to OCR the text in the cell.
How do I use the text to fill in the HTML? given that the table cells are not in order and some cells can also be empty values
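One way to approach the question above is to insert one text per `<td>` tag in document order. The sketch below does only that string manipulation. It assumes the structure is a single HTML string and that the caller has already arranged the recognized texts to follow the order of the `<td>` tags, with an empty string for empty cells; how LGPMA orders its bboxes relative to the tags is not stated here, so that matching step is left open. The function name `fill_cells` is hypothetical.

```python
import re
from html import escape


def fill_cells(structure_html, cell_texts):
    """Insert one text after each opening <td ...> tag, in document order.

    Texts are HTML-escaped. If there are fewer texts than cells, the
    remaining cells are left empty.
    """
    texts = iter(cell_texts)

    def add_text(match):
        # Keep the original tag (including colspan/rowspan) and append the text.
        return match.group(0) + escape(next(texts, ""))

    return re.sub(r"<td\b[^>]*>", add_text, structure_html)


structure = '<table><tr><td></td><td colspan="2"></td></tr></table>'
print(fill_cells(structure, ["Name", ""]))
```

Keeping an explicit empty string for each empty cell is what keeps later texts aligned with the right `<td>` tags.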

When will the video text spotting evaluation metric be released?

Thanks to the contributors for the great open-source work. The Todo list mentions that some video text spotting algorithms (YORO, FREE) will be released. I would also like to know whether the corresponding evaluation metric will be released, and when.

Mismatch between the number of bounding boxes and html tags from LGPMA

Hi,

Thank you for sharing your wonderful source code and pre-trained model.
Is it possible for the number of bounding boxes to differ from the number of cells indicated by the HTML tags?

Best regards 👍

vis for table recognition

Hi,
Could you pls tell me how to generate the vis images like below, tks!

Does LGPMA only output table structure results?

Does LGPMA provide no cell location information?

LGPMA - Running inference on any image

I've been trying to detect table structure using LGPMA on the images I provided. To do so, I modified the test_pub.py like this:

import cv2
import json
import jsonlines
import numpy as np
from tqdm import tqdm
from eval_pub.metric import TEDS
from eval_pub.format import format_html
from davarocr.davar_common.apis import inference_model, init_model
import glob

# visualization setting
do_visualize = 1 # whether to visualize
vis_dir = "/content/" # path to save visualization results

# path setting
savepath = "/content/" # path to save prediction
config_file = '/content/DAVAR-Lab-OCR/demo/table_recognition/lgpma/configs/lgpma_pub.py' # config path
checkpoint_file = '/content/maskrcnn-lgpma-pub-e12-pub.pth' # model path

# loading model from config file and pth file
model = init_model(config_file, checkpoint_file)


image_path = '/content/pdf_pages_img/*'

imgs = glob.glob(image_path)
imgs.sort(key = lambda x: int(x.split('_')[-1][:-4]))


# generate prediction of html and save result to savepath
pred_dict = dict()

for im in imgs:
    result = inference_model(model, im)[0]
    pred_dict[im]=result['html']
    # detection results visualization
    print(im)
    if do_visualize:
        img = cv2.imread(im)
        img_name = im.split("/")[-1]
        bboxes = [[b[0], b[1], b[2], b[1], b[2], b[3], b[0], b[3]] for b in result['bboxes']]
        for box in bboxes:
            for j in range(0, len(box), 2):
                cv2.line(img, (box[j], box[j + 1]), (box[(j + 2) % len(box)], box[(j + 3) % len(box)]), (0, 0, 255), 1)
        cv2.imwrite(vis_dir + img_name, img)

with open(savepath+'file.json', "w", encoding="utf-8") as writer:
    json.dump(pred_dict, writer, ensure_ascii=False)

I have attached a sample of the results obtained. I thought the table looked similar enough to the examples provided and thus it could work well. However, it detects as cells text that does not belong to any table and also some cells of the table are not detected.

In order to improve the results, do I need to provide images containing only tables? Or perhaps there is something wrong with the code I provided here?

Problem with Offline Inference and Evaluation

I am trying to run the test_pub.py on the pubtabnet datasets as given in your description but I get the error below:

'Weak' supervision results (Table 5 in the paper)

Thank you for your work.

Are the results of 'Weak' supervision in Table 5 evaluated using the axis-aligned evaluation protocol?
(i.e. predictions vs. ground-truth axis-aligned boxes)
Can you provide those results for TotalText 'End-To-End'?
(I assume the results reported in this table are for Word-Spotting.)

Thanks!

Installation problem: fatal error: cuda_runtime_api.h: No such file or directory

running develop
running egg_info
writing davarocr.egg-info/PKG-INFO
writing dependency_links to davarocr.egg-info/dependency_links.txt
writing top-level names to davarocr.egg-info/top_level.txt
reading manifest file 'davarocr.egg-info/SOURCES.txt'
writing manifest file 'davarocr.egg-info/SOURCES.txt'
running build_ext
Creating /home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/davarocr.egg-link (link to .)
davarocr 0.3.1 is already the active version in easy-install.pth

Installed /home/voyager/ocr/DAVAR-Lab-OCR-main
Processing dependencies for davarocr==0.3.1
Finished processing dependencies for davarocr==0.3.1
10
mkdir: cannot create directory ‘build’: File exists
-- cuda found TRUE
-- Building shared library with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/voyager/ocr/DAVAR-Lab-OCR-main/davarocr/davar_rcg/third_party/warp-ctc-pytorch_bindings/build
[ 11%] Linking CXX shared library libwarpctc.so
[ 33%] Built target warpctc
[ 44%] Linking CXX executable test_cpu
[ 66%] Built target test_cpu
[ 77%] Linking CXX executable test_gpu
[100%] Built target test_gpu
running install
running bdist_egg
running egg_info
writing warpctc_pytorch.egg-info/PKG-INFO
writing dependency_links to warpctc_pytorch.egg-info/dependency_links.txt
writing top-level names to warpctc_pytorch.egg-info/top_level.txt
/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'warpctc_pytorch.egg-info/SOURCES.txt'
writing manifest file 'warpctc_pytorch.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'warpctc_pytorch._warp_ctc' extension
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/voyager/ocr/DAVAR-Lab-OCR-main/davarocr/davar_rcg/third_party/warp-ctc-pytorch_bindings/include -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/TH -I/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/THC -I:/usr/local/cuda-10.2/include -I/home/voyager/anaconda3/envs/torch_gpu/include/python3.6m -c src/binding.cpp -o build/temp.linux-x86_64-3.6/src/binding.o -std=c++14 -fPIC -DWARPCTC_ENABLE_GPU -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_warp_ctc -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from src/binding.cpp:9:0:
/home/voyager/anaconda3/envs/torch_gpu/lib/python3.6/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:30: fatal error: cuda_runtime_api.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

Hello, during installation I ran into "fatal error: cuda_runtime_api.h: No such file or directory". Could you take a look?
My environment is:

GPU: RTX2080
CUDA: 10.2
Python: Anaconda env Python3.6
mmcv-full: 1.3.4
Torch: '1.10.1+cu102'
Opencv-Python: 4.5.5
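One detail in the gcc command above is the include flag `-I:/usr/local/cuda-10.2/include`, which has a stray colon in front of the path. A plausible explanation, given as an assumption since the shell configuration is not shown, is that CUDA_HOME was set to `:/usr/local/cuda-10.2`, for example by appending to an empty variable as if it were PATH. The sketch below checks a CUDA_HOME value for that pattern and for the missing header; the function name `check_cuda_home` is hypothetical.

```python
import os


def check_cuda_home(cuda_home):
    """Return a list of human-readable problems found with a CUDA_HOME value."""
    if not cuda_home:
        return ["CUDA_HOME is empty or unset"]
    problems = []
    if os.pathsep in cuda_home:
        # CUDA_HOME should name one directory, not a PATH-style list.
        problems.append("CUDA_HOME contains a path separator; "
                        "it should be a single directory")
    header = os.path.join(cuda_home, "include", "cuda_runtime_api.h")
    if not os.path.isfile(header):
        problems.append(f"header not found: {header}")
    return problems


if __name__ == "__main__":
    issues = check_cuda_home(os.environ.get("CUDA_HOME", ""))
    for issue in issues:
        print(issue)
    if not issues:
        print("CUDA_HOME looks usable")
```

If the check reports a path separator, setting CUDA_HOME to the bare directory (here presumably /usr/local/cuda-10.2) before rerunning the build is the first thing to try.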

Questions about the implementations of RF-Learning

Hi, thanks for your work. I wonder what "RF-Learning visual" means, and why the output dimension of the final FC layer in counting_head is the number of the character classes (since the prediction target is the text length)?

gpma_data.so is missing

DAVAR-Lab-OCR/blob/main/davarocr/davarocr/davar_table/datasets/pipelines/gpma_data.py, line 43:
if lib_name is None or not os.path.isfile(os.path.join(lib_dir, lib_name)):
    # Using default lib
    cur_path = os.path.realpath(__file__)
    lib_dir = cur_path.replace('\\', '/').split('/')[:-1]
    lib_dir = "/".join(lib_dir) + '/lib'
    lib_name = "gpma_data.so"
if lib_name is not None and lib_dir is not None:
    lib = ctl.load_library(lib_name, lib_dir)

The corresponding directory, DAVAR-Lab-OCR/davarocr/davarocr/davar_table/datasets/pipelines/lib, does not contain gpma_data.so, which makes training fail.

How can this be solved? Thanks.
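The "no file with expected extension" message comes from `numpy.ctypeslib.load_library` when it cannot find the shared library in the given directory. A small pre-check can turn that into a clearer error. This is a minimal sketch, not davarocr code: `resolve_shared_lib` and `load_compiled_lib` are hypothetical names, and the hint about compiling is an assumption based on setup.sh compiling C++ sources in the logs on this page.

```python
import os


def resolve_shared_lib(lib_dir, lib_name):
    """Return the full path of a compiled library, or raise a clear error."""
    path = os.path.join(lib_dir, lib_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; the C++ sources in {lib_dir} may not have "
            "been compiled (check the output of setup.sh for build errors)"
        )
    return path


def load_compiled_lib(lib_dir, lib_name):
    """Load the library with numpy.ctypeslib once it is known to exist."""
    import numpy.ctypeslib as ctl  # imported lazily so the check works without numpy

    resolve_shared_lib(lib_dir, lib_name)
    return ctl.load_library(lib_name, lib_dir)
```

Calling `load_compiled_lib(lib_dir, "gpma_data.so")` with the `lib` directory from the snippet above would then report the missing file and its expected location instead of the generic OSError.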

When will VSR be open-sourced?

The paper "VSR: A Unified Framework for Document Layout Analysis combining Vision, Semantics and Relations" is great work, and I am really looking forward to the release!

LGPMA training consumption is unstable

While training LGPMA on my own data, the memory and time consumption are very unstable. Could you tell me how to improve this?

Download file from baidu

Hi, thanks for the great repo!

I want to try the LGPMA-table recognition model but can not download the model weight from Baidu. Can you upload the pre-trained model on google drive?

Thanks in advance!

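How can LGPMA call the text recognition module to get the complete result?

LGPMA inference yields the table's HTML tags, result['html'], and the positions of all cell boxes, result['bboxes']. Your README says: "The release model only contains structure-level result. You may use the text recognition module for the complete result." How should the text recognition module be called to get the complete result? Should each box in result['bboxes'] be fed to the OCR recognition module in order, with the recognized text then filled in between the HTML tags in the same order?
Thanks for your help.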

OSError: no file with expected extension

After installing the requirements and trying to test the model, I get this error:

(tptron) home@home-lnx:~/programs/DAVAR-Lab-OCR/demo/text_perceptron_det$ python test.py 
Traceback (most recent call last):
  File "test.py", line 27, in <module>
    model = init_detector(config_file, checkpoint_file, device='cuda:0')
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/apis/inference.py", line 34, in init_detector
    model = build_detector(config.model, test_cfg=config.test_cfg)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 43, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 15, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/third_party/text_perceptron/mmdet/models/detectors/text_perceptron_det.py", line 59, in __init__
    self.shape_transform_module = build_roi_extractor(shape_transform_module)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 27, in build_roi_extractor
    return build(cfg, ROI_EXTRACTORS)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/models/builder.py", line 15, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/mmdet/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
  File "/home/home/programs/DAVAR-Lab-OCR/mmdetection/third_party/text_perceptron/mmdet/models/shape_transform_module/points_generation.py", line 71, in __init__
    lib = ctl.load_library(lib_name, lib_dir)
  File "/home/home/anaconda3/envs/tptron/lib/python3.7/site-packages/numpy/ctypeslib.py", line 153, in load_library
    raise OSError("no file with expected extension")
OSError: no file with expected extension

LGPMA inference does not use global_seg_results; I cannot find the mask re-scoring code

with_global_seg is True, but in LGPMA's post_processor I could not find where the MASK-class predictions are used during inference.

Please add demo for using DAVAR-Lab-OCR in colab

Could you add demo ipynb notebooks for table_recognition, text_detection, text_ie, text_recognition, text_spotting, and videotext that work in Colab?

Question about MANGO

Thanks for your work!

Is the Total-Text result reported in the paper a Word Spotting result?

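VSR inference on a single PDF page image

Could you give some guidance on using VSR for layout recognition on images? Can the inference code in the api be used? Looking forward to your reply, thanks.

inference images in trie

I have tried the test.sh provided in the repo, which works fine on images with GT information.
How can I run inference on other images with this model?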

Syntax Error in ReadMe.md

In the readme.md file, the scripts under Installation and Development Instruction should be bash NOT python style.

Error when running test_pub.py

Hello, we are studying your table recognition code, but we cannot run the inference script. The error is as follows:
Traceback (most recent call last):
File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/demo/table_recognition/lgpma/tools/test_pub.py", line 19, in <module>
from davarocr.davar_common.apis import inference_model, init_model
File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/davarocr/davarocr/__init__.py", line 11, in <module>
from .davar_common import *
File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/davarocr/davarocr/davar_common/__init__.py", line 11, in <module>
from .models import *
File "/home/ihavc01/Downloads/LG/DAVAR-Lab-OCR-main/davarocr/davarocr/davar_common/models/__init__.py", line 11, in <module>
from mmdet.models.builder import (BACKBONES, DETECTORS, HEADS, LOSSES, NECKS,
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmdet/models/__init__.py", line 1, in <module>
from .backbones import * # noqa: F401,F403
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmdet/models/backbones/__init__.py", line 1, in <module>
from .darknet import Darknet
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmdet/models/backbones/darknet.py", line 6, in <module>
from mmcv.cnn import ConvModule, constant_init, kaiming_init
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/cnn/__init__.py", line 14, in <module>
from .builder import MODELS, build_model_from_cfg
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/cnn/builder.py", line 1, in <module>
from ..runner import Sequential
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/runner/__init__.py", line 3, in <module>
from .base_runner import BaseRunner
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 13, in <module>
from .checkpoint import load_checkpoint
File "/home/ihavc01/anaconda3/envs/python36/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 14, in <module>
import torchvision
File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/__init__.py", line 1, in <module>
from torchvision import models
File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/models/__init__.py", line 11, in <module>
from . import detection
File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
from .faster_rcnn import *
File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
from torchvision.ops import misc as misc_nn_ops
File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/ops/__init__.py", line 1, in <module>
from .boxes import nms, box_iou
File "/home/ihavc01/.local/lib/python3.6/site-packages/torchvision/ops/boxes.py", line 2, in <module>
from torchvision import _C
ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory

Could you help us resolve this problem? Thanks!

Why are my test results with the provided models not ideal (especially on the IC15 dataset)? Do I need to change the IoU value or something else?
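In the traceback above, mmdet and mmcv are imported from the conda environment, while torchvision is imported from `/home/ihavc01/.local/lib/python3.6/site-packages`, the per-user site directory. That suggests, as an observation from the paths and not a confirmed diagnosis, that a separately installed torchvision built against CUDA 9.0 is shadowing whatever the environment provides. The sketch below reports where a module would be loaded from without importing it; the function names are hypothetical.

```python
import importlib.util
import site


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def loaded_from_user_site(name):
    """Return True if the module resolves to the per-user site-packages."""
    origin = module_origin(name)
    user_site = site.getusersitepackages()
    return origin is not None and origin.startswith(user_site)


if __name__ == "__main__":
    for name in ("torch", "torchvision", "mmcv", "mmdet"):
        origin = module_origin(name)
        note = " (user site)" if loaded_from_user_site(name) else ""
        print(f"{name}: {origin}{note}")
```

If torchvision resolves to the user site while torch does not, the two were likely installed separately and may target different CUDA versions.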

Failed building wheel for mmpycocotools

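Hello, thank you for open-sourcing the project. I ran into the following problem during installation. The environment is a brand-new Python environment created with conda.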

Building wheel for mmpycocotools (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/ubuntu/.conda/envs/py3.6/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ulxsr6m2/mmpycocotools_bbc7125ac5c24f7eba50e936fed7b267/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ulxsr6m2/mmpycocotools_bbc7125ac5c24f7eba50e936fed7b267/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-9y2p8ww8
cwd: /tmp/pip-install-ulxsr6m2/mmpycocotools_bbc7125ac5c24f7eba50e936fed7b267/
Complete output (23 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/coco.py -> build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/mask.py -> build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-3.6/pycocotools
copying pycocotools/__init__.py -> build/lib.linux-x86_64-3.6/pycocotools
running build_ext
building 'pycocotools._mask' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/common
creating build/temp.linux-x86_64-3.6/pycocotools
gcc -pthread -B /home/ubuntu/.conda/envs/py3.6/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/include -Icommon -I/home/ubuntu/.conda/envs/py3.6/include/python3.6m -c common/maskApi.c -o build/temp.linux-x86_64-3.6/common/maskApi.o
common/maskApi.c: In function ‘rleToBbox’:
common/maskApi.c:141:31: warning: ‘xp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if(j%2==0) xp=x; else if(xp<x) { ys=0; ye=h-1; }
^
gcc -pthread -B /home/ubuntu/.conda/envs/py3.6/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/include -Icommon -I/home/ubuntu/.conda/envs/py3.6/include/python3.6m -c pycocotools/_mask.c -o build/temp.linux-x86_64-3.6/pycocotools/_mask.o
gcc: error: pycocotools/_mask.c: No such file or directory
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for mmpycocotools
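
The build stops because `pycocotools/_mask.c` does not exist. In pycocotools-style packages that file is normally generated from `_mask.pyx` by Cython, so a commonly reported cause is that Cython is missing from the environment running the build. This is an assumption to verify, not a confirmed diagnosis. The log also shows numpy being picked up from `~/.local` rather than from the conda environment, which may be worth checking. A small sketch for testing what is importable in the active environment; the helper name `missing_build_deps` is hypothetical:

```python
import importlib.util


def missing_build_deps(modules=("Cython", "numpy")):
    """Return the names in `modules` that cannot be imported in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_build_deps()
    if missing:
        print("Install these before building mmpycocotools:", ", ".join(missing))
    else:
        print("Cython and numpy are importable here.")
```

Run it with the same interpreter that pip uses (here `/home/ubuntu/.conda/envs/py3.6/bin/python`), otherwise the result describes a different environment.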

lgpma question

Hi!

First of all, thanks for this wonderful open-source project, and congratulations on all the achievements!

The release model only contains structure-level result. You may use the text recognition module for the complete result.

I can run OCR on the bboxes to get the text, but how do I match that text with the table-structure HTML?
Do you have any instructions on how to use a text recognition module such as RF-Learning to extract the text and embed it into the HTML table structure?
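
One way to combine the two outputs, sketched under assumptions: suppose the structure HTML contains one empty `<td ...></td>` pair per cell, and the recognized texts are supplied in the same order as those cells, with an empty string for cells that have no text box. Whether LGPMA's output follows this ordering should be checked against the actual result files. The function name `fill_cells` is hypothetical.

```python
import html
import re

# Matches an empty cell such as <td></td> or <td colspan="2"></td>.
EMPTY_CELL = re.compile(r"(<td\b[^>]*>)(</td>)")


def fill_cells(structure_html, texts):
    """Insert texts[i] into the i-th empty <td> cell of structure_html."""
    remaining = iter(texts)

    def insert(match):
        # Cells beyond the end of `texts` are left empty.
        text = next(remaining, "")
        return match.group(1) + html.escape(text) + match.group(2)

    return EMPTY_CELL.sub(insert, structure_html)


if __name__ == "__main__":
    structure = '<table><tr><td colspan="2"></td></tr><tr><td></td><td></td></tr></table>'
    print(fill_cells(structure, ["Header", "a < b", ""]))
```

The texts are HTML-escaped so that characters like `<` in recognized content do not break the table markup.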

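Training LGPMA on my own data raises ValueError: cannot convert float NaN to integer

Hello, I am training LGPMA on my own table dataset (in PubTabNet format) and it fails with the following error:
File "DAVAR-Lab-OCR/davarocr/davarocr/davar_table/core/mask/lp_mask_target.py", line 55, in get_lpmask_single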
middle_x, middle_y = round(np.where(box_text == 1)[1].mean()), round(np.where(box_text == 1)[0].mean())
ValueError: cannot convert float NaN to integer

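I prepared the training set as follows: I used DAVAR-Lab-OCR/demo/table_recognition/lgpma/tools/convert_html_ann.py to convert my dataset into the davar format, with some modifications to the html_to_davar function in convert_html_ann.py: "labels" uses 0 and 1 in place of "t-head" and "t-body", and content_ann returns only "bboxes", "cells" and "labels". The converted data has the same format as the PubTabNet_train_datalist_all.json you released, but training raises the error above. Could you tell me which step might have gone wrong? Thanks.

Also: training runs normally when I use your released PubTabNet_train_datalist_all.json as the training set.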
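
The failing line takes `.mean()` over `np.where(box_text == 1)`, and the mean of an empty selection is NaN. The error is therefore consistent with at least one text box covering no pixels in its mask. This is a guess from the traceback, not a confirmed cause. A sketch for scanning converted annotations, assuming each box is `[x_min, y_min, x_max, y_max]` in pixels and that empty cells are stored as `[]`; the helper name `find_suspect_bboxes` is hypothetical:

```python
def find_suspect_bboxes(bboxes, img_width, img_height):
    """Return indices of non-empty boxes that are malformed or cover no image area."""
    suspects = []
    for idx, box in enumerate(bboxes):
        if not box:
            continue  # empty cell, assumed to be legitimately stored as []
        if len(box) != 4:
            suspects.append(idx)
            continue
        x_min, y_min, x_max, y_max = box
        # Clip to the image, then check that some area is left.
        x_min, x_max = max(0, x_min), min(img_width, x_max)
        y_min, y_max = max(0, y_min), min(img_height, y_max)
        if x_max <= x_min or y_max <= y_min:
            suspects.append(idx)
    return suspects


if __name__ == "__main__":
    boxes = [[10, 10, 50, 30], [], [40, 20, 40, 35], [700, 10, 760, 30]]
    print(find_suspect_bboxes(boxes, img_width=640, img_height=480))  # prints [2, 3]
```

Running this over every image in the converted datalist would show whether any zero-width, zero-height, or out-of-image boxes exist in the custom data but not in the released PubTabNet list.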

cannot find inference_model in davarocr.davar_videotext.apis

I cannot find inference_model in davarocr.davar_videotext.apis, so I am unable to run yoro/det/test.py.

OSError: no file with expected extension (probably a numpy ctypeslib.py issue)

(otrpy37) server@server-Z590-UD:~/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr$ python
Python 3.7.0 (default, Oct 9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from davarocr.davar_common.apis import inference_model
east_postprocess.so
/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/post_processing/lib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/__init__.py", line 12, in <module>
from .davar_det import *
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/__init__.py", line 11, in <module>
from .datasets import *
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/datasets/__init__.py", line 12, in <module>
from .text_det_dataset import TextDetDataset
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/datasets/text_det_dataset.py", line 15, in <module>
from ..core.evaluation.hmean import evaluate_method
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/__init__.py", line 13, in <module>
from .post_processing import BasePostDetector, TPPointsGeneration, PostMaskRCNN
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/post_processing/__init__.py", line 14, in <module>
from .post_east import PostEAST
File "/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/davarocr/davar_det/core/post_processing/post_east.py", line 28, in <module>
lib = ctl.load_library(lib_name, lib_dir)
File "/home/server/anaconda3/envs/otrpy37/lib/python3.7/site-packages/numpy/ctypeslib.py", line 153, in load_library
raise OSError("no file with expected extension")
OSError: no file with expected extension
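
`numpy.ctypeslib.load_library` raises this OSError when none of the candidate file names it tries exists in the given directory. The likely situation is that `east_postprocess.so` has not been compiled into the `lib` directory printed above; the README notes that EAST and Text Perceptron need the C++ version of OpenCV. A minimal sketch for confirming whether the file is there, with the path copied from the log; the helper name `shared_lib_exists` is hypothetical:

```python
import os


def shared_lib_exists(lib_dir, lib_name):
    """Return True if lib_name is a regular file inside lib_dir."""
    return os.path.isfile(os.path.join(lib_dir, lib_name))


if __name__ == "__main__":
    lib_dir = ("/home/server/Desktop/ocr_piles/DAVAR-Lab-OCR/davarocr/"
               "davarocr/davar_det/core/post_processing/lib")
    print("east_postprocess.so present:", shared_lib_exists(lib_dir, "east_postprocess.so"))
```

If the file is absent, the C++ post-processing code probably still needs to be built, or the EAST import can be removed temporarily as the README suggests for users who do not need that algorithm.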