Comments (2)

qiaoliang6 commented on June 27, 2024

The released model is not a general-purpose table recognition model. It was trained only on the PubTabNet dataset (which contains images of cropped tables) to provide a training/testing demo. You may test it on images that contain only a table. To better fit your data, you may fine-tune the model on your own dataset.

We will consider providing a general table recognition model in the future.
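
The suggestion above, testing on images that contain only the table, could be sketched roughly as follows. This is a minimal illustration, not part of DAVAR-Lab-OCR: `crop_table` is a hypothetical helper, and the table box `(x1, y1, x2, y2)` is assumed to come from somewhere else (manual annotation or a separate table detector).

```python
import numpy as np


def crop_table(image, box, pad=0):
    """Crop an image array to a table region.

    Hypothetical helper (not a DAVAR-Lab-OCR function).
    image: H x W x C array, e.g. as returned by cv2.imread.
    box:   (x1, y1, x2, y2) table region in pixel coordinates (assumed known).
    pad:   extra margin in pixels added on every side.
    """
    height, width = image.shape[:2]
    x1, y1, x2, y2 = box
    # Expand the box by the margin and clip it to the image borders.
    x1 = max(0, int(x1) - pad)
    y1 = max(0, int(y1) - pad)
    x2 = min(width, int(x2) + pad)
    y2 = min(height, int(y2) + pad)
    if x2 <= x1 or y2 <= y1:
        raise ValueError("table box lies outside the image")
    return image[y1:y2, x1:x2].copy()


# Synthetic 100 x 200 "page" standing in for a real scan.
page = np.zeros((100, 200, 3), dtype=np.uint8)
table = crop_table(page, (50, 20, 150, 80), pad=5)
print(table.shape)  # (70, 110, 3)
```

The cropped array could then be written to disk (for example with `cv2.imwrite`) and its path passed to `inference_model` instead of the full page.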

from davar-lab-ocr.

jigsawcoder commented on June 27, 2024

I've been trying to detect table structure with LGPMA on my own images. To do so, I modified test_pub.py like this:

import glob
import json

import cv2
from davarocr.davar_common.apis import inference_model, init_model

# visualization setting
do_visualize = 1 # whether to visualize
vis_dir = "/content/" # path to save visualization results

# path setting
savepath = "/content/" # path to save prediction
config_file = '/content/DAVAR-Lab-OCR/demo/table_recognition/lgpma/configs/lgpma_pub.py' # config path
checkpoint_file = '/content/maskrcnn-lgpma-pub-e12-pub.pth' # model path

# loading model from config file and pth file
model = init_model(config_file, checkpoint_file)


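# collect the page images and sort them by the number at the end of the file name
# (assumes names ending in '_<number>' plus a 4-character extension, e.g. '_25.png')
image_path = '/content/pdf_pages_img/*'

imgs = glob.glob(image_path)
imgs.sort(key=lambda x: int(x.split('_')[-1][:-4]))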


# generate prediction of html and save result to savepath
pred_dict = dict()

for im in imgs:
    result = inference_model(model, im)[0]
    pred_dict[im]=result['html']
    # detection results visualization
    print(im)
    if do_visualize:
        img = cv2.imread(im)
        img_name = im.split("/")[-1]
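        # expand each [x1, y1, x2, y2] box into its four corner points;
        # cast to int because cv2.line expects integer pixel coordinates
        bboxes = [[int(v) for v in (b[0], b[1], b[2], b[1], b[2], b[3], b[0], b[3])] for b in result['bboxes']]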
        for box in bboxes:
            for j in range(0, len(box), 2):
                cv2.line(img, (box[j], box[j + 1]), (box[(j + 2) % len(box)], box[(j + 3) % len(box)]), (0, 0, 255), 1)
        cv2.imwrite(vis_dir + img_name, img)

with open(savepath+'file.json', "w", encoding="utf-8") as writer:
    json.dump(pred_dict, writer, ensure_ascii=False)

I have attached a sample of the results. I thought the table looked similar enough to the provided examples that it would work well. However, the model detects text outside the table as cells, and it misses some cells of the table.

In order to improve the results, do I need to provide images containing only tables? Or perhaps there is something wrong with the code I provided here?

[Attached image: prueba_25]

@nfoguet Can you please mention which PyTorch, MMCV and MMDET versions you used to get the prediction?
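
One way to collect the versions asked about here is a small script like the sketch below. `report_versions` is a hypothetical helper written for this thread. It assumes the packages are importable as `torch`, `mmcv` and `mmdet` and that each exposes a `__version__` attribute; anything missing is reported instead of raising.

```python
import importlib


def report_versions(module_names):
    """Return {module name: version string} for the given modules.

    Hypothetical helper: a module that cannot be imported is reported as
    "not installed", and one without __version__ as "unknown".
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions(["torch", "mmcv", "mmdet"]).items():
    print(f"{name}: {version}")
```

Run it in the same environment (for example the same Colab session) that produced the predictions, so the reported versions match the ones actually used.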

