Comments (2)
The released model is not a general table recognition model. It was trained only on the PubTabNet dataset (which contains images of cropped tables) to provide a training/testing demo. You may test it on images that contain only a table. To better fit your data, you may fine-tune the model on your own dataset.
We will consider providing a general table recognition model in the future.
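As a rough illustration of the advice to test on table-only images, here is a minimal sketch of computing a padded crop box for a table region on a full page. The function name `padded_crop_box`, the padding value, and the example coordinates are all hypothetical; the table location itself has to come from somewhere else (manual annotation or a separate table detector), since the released model does not locate tables on a page.

```python
def padded_crop_box(box, width, height, pad=10):
    """Expand (x0, y0, x1, y1) by `pad` pixels and clamp it to the image bounds.

    `box` is an assumed table location in pixel coordinates; it is not
    produced by LGPMA and must be supplied by the caller.
    """
    x0, y0, x1, y1 = box
    x0 = max(0, int(x0) - pad)
    y0 = max(0, int(y0) - pad)
    x1 = min(width, int(x1) + pad)
    y1 = min(height, int(y1) + pad)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("table box lies outside the image")
    return x0, y0, x1, y1


# Hypothetical table location on a 1000x1400 page image
print(padded_crop_box((50, 300, 950, 700), width=1000, height=1400))
# -> (40, 290, 960, 710)
```

With OpenCV, the returned box can be used to slice the page image (`img[y0:y1, x0:x1]`), save the crop with `cv2.imwrite`, and pass the saved path to `inference_model` in the same way as a full-page image path.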
from davar-lab-ocr.
I've been trying to detect table structure using LGPMA on the images I provided. To do so, I modified the test_pub.py like this:
import cv2
import json
import jsonlines
import numpy as np
from tqdm import tqdm
from eval_pub.metric import TEDS
from eval_pub.format import format_html
from davarocr.davar_common.apis import inference_model, init_model
import glob

# visualization setting
do_visualize = 1  # whether to visualize
vis_dir = "/content/"  # path to save visualization results

# path setting
savepath = "/content/"  # path to save prediction
config_file = '/content/DAVAR-Lab-OCR/demo/table_recognition/lgpma/configs/lgpma_pub.py'  # config path
checkpoint_file = '/content/maskrcnn-lgpma-pub-e12-pub.pth'  # model path

# loading model from config file and pth file
model = init_model(config_file, checkpoint_file)

image_path = '/content/pdf_pages_img/*'
imgs = glob.glob(image_path)
imgs.sort(key = lambda x: int(x.split('_')[-1][:-4]))

# generate prediction of html and save result to savepath
pred_dict = dict()
for im in imgs:
    result = inference_model(model, im)[0]
    pred_dict[im]=result['html']

    # detection results visualization
    print(im)
    if do_visualize:
        img = cv2.imread(im)
        img_name = im.split("/")[-1]
        bboxes = [[b[0], b[1], b[2], b[1], b[2], b[3], b[0], b[3]] for b in result['bboxes']]
        for box in bboxes:
            for j in range(0, len(box), 2):
                cv2.line(img, (box[j], box[j + 1]), (box[(j + 2) % len(box)], box[(j + 3) % len(box)]), (0, 0, 255), 1)
        cv2.imwrite(vis_dir + img_name, img)

with open(savepath+'file.json', "w", encoding="utf-8") as writer:
    json.dump(pred_dict, writer, ensure_ascii=False)
I have attached a sample of the results obtained. I thought the table looked similar enough to the examples provided that it could work well. However, the model detects text that does not belong to any table as cells, and it misses some cells of the table.
In order to improve the results, do I need to provide images containing only tables? Or perhaps there is something wrong with the code I provided here?
@nfoguet Can you please mention which PyTorch, MMCV and MMDET versions you used to get the prediction?
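One way to collect the version information asked for above is to query the installed package metadata. This is a minimal sketch using only the standard library; the distribution names (`torch`, `mmcv`, `mmcv-full`, `mmdet`) are assumptions about how the packages were installed, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("torch", "mmcv", "mmcv-full", "mmdet")):
    """Return a dict mapping each distribution name to its version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

MMCV is listed under two names because it may be installed as either `mmcv` or `mmcv-full`; whichever is not present is reported as not installed.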