LGPMA - Running inference on any image (davar-lab-ocr, 2 comments, closed)

hikopensource commented on June 27, 2024



Comments (2)

qiaoliang6 commented on June 27, 2024

The released model is not a general table recognition model. It was trained only on the PubTabNet dataset (which contains images of cropped tables) to serve as a training/testing demo. You may test it on images that contain only a table. To better fit your data, you may fine-tune the model on your own dataset.
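
Because the released checkpoint expects cropped table images, one option is to crop each page to its table region before calling `inference_model`. This is a minimal sketch that assumes the table's pixel box `(x1, y1, x2, y2)` is already known, for example from manual annotation or a separate table detector (neither is part of LGPMA or this thread). `crop_table` and its `pad` margin are hypothetical names.

```python
import numpy as np


def crop_table(image: np.ndarray, box, pad: int = 10) -> np.ndarray:
    """Crop an (H, W[, C]) image to box = (x1, y1, x2, y2).

    The box is grown by `pad` pixels on every side and then clamped to the
    image borders. Hypothetical helper; the table box must come from elsewhere.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
    x2, y2 = min(w, x2 + pad), min(h, y2 + pad)
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"empty crop for box {box}")
    return image[y1:y2, x1:x2].copy()


if __name__ == "__main__":
    page = np.zeros((1000, 800, 3), dtype=np.uint8)  # stand-in for a page image
    table = crop_table(page, (100, 200, 700, 500), pad=10)
    print(table.shape)  # (320, 620, 3)
```

The crop can be saved with `cv2.imwrite` and its path passed to `inference_model`. Predicted cell boxes are then relative to the crop, so adding the clamped top-left corner (`max(0, x1 - pad)`, `max(0, y1 - pad)`) maps them back to page coordinates.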

We will consider providing a general table recognition model in the future.


jigsawcoder commented on June 27, 2024

I've been trying to detect table structure with LGPMA on the images I provided. To do so, I modified test_pub.py as follows:

```python
import glob
import json
import os

import cv2
from davarocr.davar_common.apis import inference_model, init_model

# visualization setting
do_visualize = True  # whether to draw the predicted cell boxes
vis_dir = "/content/"  # path to save visualization results

# path setting
savepath = "/content/"  # path to save prediction
config_file = '/content/DAVAR-Lab-OCR/demo/table_recognition/lgpma/configs/lgpma_pub.py'  # config path
checkpoint_file = '/content/maskrcnn-lgpma-pub-e12-pub.pth'  # model path

# loading model from config file and pth file
model = init_model(config_file, checkpoint_file)

# page images are assumed to be named like "<prefix>_<page number>.<ext>"
image_path = '/content/pdf_pages_img/*'
imgs = sorted(glob.glob(image_path),
              key=lambda x: int(os.path.splitext(os.path.basename(x))[0].split('_')[-1]))

# generate prediction of html and save result to savepath
pred_dict = dict()

for im in imgs:
    result = inference_model(model, im)[0]
    pred_dict[im] = result['html']
    print(im)
    # detection results visualization
    if do_visualize:
        img = cv2.imread(im)
        img_name = os.path.basename(im)
        for b in result['bboxes']:
            if len(b) < 4:
                continue  # skip empty entries, if any
            # OpenCV drawing functions need integer pixel coordinates
            x1, y1, x2, y2 = (int(round(v)) for v in b[:4])
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 1)
        cv2.imwrite(os.path.join(vis_dir, img_name), img)

with open(os.path.join(savepath, 'file.json'), "w", encoding="utf-8") as writer:
    json.dump(pred_dict, writer, ensure_ascii=False)
```

I have attached a sample of the results. I expected the model to work well, since the table looks similar to the provided examples. However, it detects text that does not belong to any table as cells, and it misses some cells of the table.

To improve the results, do I need to provide images that contain only tables, or is there something wrong with my code?
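
If the table's region on the page is known, cells predicted outside it can at least be separated out. This is a minimal sketch assuming each entry of `result['bboxes']` is `[x1, y1, x2, y2]` (the script's visualization code treats it that way) and that the region comes from elsewhere. `split_cells_by_region` is a hypothetical helper that keeps a cell when its center falls inside the region.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]


def split_cells_by_region(cell_boxes: Sequence[Box],
                          region: Box) -> Tuple[List[Box], List[Box]]:
    """Split [x1, y1, x2, y2] cell boxes by whether their center lies in `region`.

    Returns (inside, outside). Empty or malformed entries are skipped.
    Hypothetical post-processing helper, not part of DAVAR-Lab-OCR.
    """
    rx1, ry1, rx2, ry2 = region
    inside: List[Box] = []
    outside: List[Box] = []
    for b in cell_boxes:
        if len(b) < 4:
            continue  # defensive: ignore empty entries
        cx = (b[0] + b[2]) / 2.0
        cy = (b[1] + b[3]) / 2.0
        if rx1 <= cx <= rx2 and ry1 <= cy <= ry2:
            inside.append(b)
        else:
            outside.append(b)
    return inside, outside


if __name__ == "__main__":
    cells = [[110, 210, 200, 240], [720, 50, 790, 80], []]
    kept, dropped = split_cells_by_region(cells, (100, 200, 700, 500))
    print(len(kept), len(dropped))  # 1 1
```

This only discards spurious boxes; it does not repair `result['html']` or recover missed cells, so cropping the input to the table remains the more direct fix.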

@nfoguet, could you please share which PyTorch, MMCV, and MMDetection (mmdet) versions you used to get these predictions?
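
A quick way to collect these versions is to read the installed package metadata with the standard-library `importlib.metadata` (Python 3.8+), which does not import the packages themselves. The distribution names are assumptions: both `mmcv` and `mmcv-full` are checked because either may be installed, and `collect_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names to query (assumed; adjust to your environment).
PACKAGES = ("torch", "mmcv", "mmcv-full", "mmdet")


def collect_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The CUDA version PyTorch was built with is not covered here; `torch.version.cuda` reports it once torch is imported.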
