I have run catalog_30s.py on one of my PDFs, which has some text on the top and bottom


Not able to create vertical lines and recognize clusters (wzbsocialsciencecenter/pdftabextract)

Comments (3)

internaut commented on May 26, 2024

You should get acquainted with the parameters of OpenCV's Hough transform and experiment with the hough_votes_thresh parameter of the detect_lines method (see the example). Setting it lower should detect more lines. The canny_* parameters can also help, but a lower value of hough_votes_thresh should be enough.
Another note: MIN_COL_WIDTH should be the approximate minimum expected column width in pixels, measured in the scanned page image. I guess your left column's width is smaller, isn't it?
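
To illustrate why a lower votes threshold yields more lines, here is a toy Hough accumulator in plain Python. It is only a sketch of the idea, not the code used by OpenCV or pdftabextract. The function names (hough_votes, lines_above_threshold), the synthetic edge pixels and the threshold values are all made up for this example.

```python
import math

def hough_votes(edge_pixels, n_theta=180):
    """Let every edge pixel vote for each (rho, theta index) line passing through it."""
    acc = {}
    for x, y in edge_pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return acc

def lines_above_threshold(edge_pixels, votes_thresh, n_theta=180):
    """Keep only the (rho, theta index) cells that collected at least votes_thresh votes."""
    acc = hough_votes(edge_pixels, n_theta)
    return sorted(cell for cell, votes in acc.items() if votes >= votes_thresh)

# Synthetic "page": a strong vertical border at x=10 (40 edge pixels)
# and a faint, short one at x=30 (12 edge pixels).
strong_border = [(10, y) for y in range(40)]
faint_border = [(30, y) for y in range(12)]
pixels = strong_border + faint_border

strict = lines_above_threshold(pixels, votes_thresh=30)
loose = lines_above_threshold(pixels, votes_thresh=10)

# Theta index 0 means a vertical line, rho is its x position.
print("strict threshold finds x=30 border:", (30, 0) in strict)
print("loose threshold finds x=30 border:", (30, 0) in loose)
```

With the strict threshold only the strong border survives; the loose threshold also picks up the faint border, along with some near-duplicate cells around each line, which is why detected lines usually need to be filtered or clustered afterwards.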


skadambala commented on May 26, 2024

@internaut Thanks for your reply.

MIN_COL_WIDTH is the width of the left column in pixels, as measured with GIMP's measurement tool.

Sure, I will experiment with lowering the value of hough_votes_thresh.

Is it possible to extract tables like this one, which has only horizontal lines and no vertical lines? To the human eye it looks like a table and can be read. Can the script help extract such tables?


internaut commented on May 26, 2024

When there are no column borders, computer vision algorithms such as the Hough transform of course cannot detect them.
You'll probably have to use the distribution of the text boxes' x/y coordinates to find regularities (i.e. the boxes will cluster around certain x-positions) and thereby detect the columns.
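
A minimal sketch of that idea, assuming you already have the left x-coordinate of every text box on the page. The helper names (cluster_1d, column_positions), the max_gap and min_members values and the sample coordinates are hypothetical and chosen only for illustration; this is a simple gap-based 1D clustering, not necessarily what pdftabextract does internally.

```python
def cluster_1d(values, max_gap):
    """Group sorted values; start a new cluster when the gap to the previous value exceeds max_gap."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

def column_positions(left_edges, max_gap=5, min_members=3):
    """Return the mean x position of every cluster that has at least min_members text boxes."""
    positions = []
    for cluster in cluster_1d(left_edges, max_gap):
        if len(cluster) >= min_members:
            positions.append(sum(cluster) / len(cluster))
    return positions

# Made-up left edges of text boxes: three columns plus one stray box at x=120.
left_edges = [50, 51, 49, 52, 200, 198, 201, 199, 350, 352, 351, 120]
columns = column_positions(left_edges)
print(columns)
```

The stray box at x=120 forms a cluster of its own and is dropped by min_members, so only the three column positions remain. Both max_gap and min_members would need tuning per document, and right-aligned columns may cluster better on the right edges of the boxes.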

