otr's Introduction

OTR

Optical table recognition: recognize tables in scanned images using OpenCV.

OTR uses a raster-based method to recognize tables, so it works even if the table has dashed lines or the page is slightly skewed (as when scanning a book). OTR cannot be used for tables without a visible raster!

Install

Install OpenCV for Python 3, e.g. on Ubuntu 18.04: sudo apt-get install python3-opencv

If you use a Debian-based Linux system, I recommend installing NumPy and SciPy from apt to speed up the dependency install: sudo apt-get install python3-scipy python3-numpy

cv_algorithms is one of the dependencies. See that project for some of the algorithms used in OTR in reusable form.

sudo pip3 install -r requirements.txt

Run

Get a test image, e.g. by searching Google Images for "Old naval log table" and picking one that contains a table. I can't share one here due to copyright, but if you know of a public-domain one, please add it via a pull request.

python3 test-otr.py <image filename>

It's currently only a proof of concept. See Algorithm.pdf for details on how it works.

otr's People

Contributors

munikarmanish, ulikoehler

otr's Issues

How can we detect a table-like structure that isn't a strict table?

Hi,
How can we recognize a table in an image that is not a strict table but has a table-like pattern? For example, the attached image contains a table-like structure, and we need to draw a bounding box around it. Can you please help us figure out how to approach this?
0212_175

AttributeError: 'DiDegreeView' object has no attribute 'items'

AttributeError Traceback (most recent call last)
in ()
4 contour_analyzer.filter_contours(min_area=400)
5 contour_analyzer.build_graph()
----> 6 contour_analyzer.remove_non_table_nodes()
7 contour_analyzer.compute_contour_bounding_boxes(e)
8 contour_analyzer.separate_supernode(f)

~/transfer_learning/table_data/OTR/TableRecognition.py in remove_non_table_nodes(self)
208 self.supernode_idx = max(self.g.degree().items(), key=operator.itemgetter(1))[0]
209 for i in range(len(self.contours)):
--> 210 if self.contours[i] is None: continue
211 nxt, prev, first_child, parent = self.hierarchy[0, i]
212 # Remove node if it has a non-supernode node as parent

AttributeError: 'DiDegreeView' object has no attribute 'items'
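This error usually comes from a networkx version mismatch. In networkx 1.x, Graph.degree() returns a dict. In networkx 2.x and later it returns a DegreeView (here a DiDegreeView), which has no .items() method but iterates as (node, degree) pairs. The following is a minimal sketch of a version-tolerant replacement for the supernode selection on line 208 of TableRecognition.py. It assumes the rest of remove_non_table_nodes works unchanged. The helper name supernode_index is hypothetical.

```python
import operator


def supernode_index(degrees):
    """Return the node with the highest degree (hypothetical helper).

    `degrees` is whatever Graph.degree() returned: a dict in networkx 1.x,
    or a DegreeView/DiDegreeView in networkx 2.x+, which has no .items()
    but iterates as (node, degree) pairs. dict() accepts both forms.
    """
    degree_map = dict(degrees)
    # Same selection as the original line 208: the node with maximum degree.
    return max(degree_map.items(), key=operator.itemgetter(1))[0]
```

Line 208 would then become self.supernode_idx = supernode_index(self.g.degree()). If you only need to support networkx 2.x, max(self.g.degree(), key=operator.itemgetter(1))[0] is equivalent.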

draw contours error

Traceback (most recent call last):
File "/home/temp/Desktop/OTR-master/MIME/contour1.py", line 41, in
contour_analyzer.visualize_contours(img)
File "/home/temp/Desktop/OTR-master/MIME/TableRecognition.py", line 442, in visualize_contours
cv2.drawContours(img, self.contours_bbox, -1, (0,255,0), thickness)
cv2.error: OpenCV(4.1.0) /io/opencv/modules/imgproc/src/drawing.cpp:2606: error: (-215:Assertion failed) reader.ptr != NULL in function 'cvDrawContours'

Is this an OpenCV version conflict?
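One possible cause of this assertion is that the list passed to cv2.drawContours contains entries OpenCV cannot read as point arrays. This is an assumption, not confirmed for this report. Examples would be None placeholders (OTR marks discarded contours as None, as the remove_non_table_nodes excerpt above shows) or arrays with an unexpected dtype or shape. The following minimal sketch of a hypothetical sanitize_contours helper drops such entries. It converts the remaining ones to int32 arrays of shape (N, 1, 2), which is the layout cv2.findContours itself produces.

```python
import numpy as np


def sanitize_contours(contours):
    """Prepare a contour list for cv2.drawContours (hypothetical helper).

    Skips None and empty entries, rounds float coordinates, and returns
    int32 arrays shaped (N, 1, 2) like the output of cv2.findContours.
    """
    cleaned = []
    for contour in contours:
        if contour is None:
            continue  # placeholder for a removed contour
        arr = np.asarray(contour)
        if arr.size == 0:
            continue  # nothing to draw
        # Round (e.g. bounding-box math may produce floats), cast, and reshape to point pairs.
        arr = np.rint(arr).astype(np.int32).reshape(-1, 1, 2)
        cleaned.append(arr)
    return cleaned
```

The call in visualize_contours would then read cv2.drawContours(img, sanitize_contours(self.contours_bbox), -1, (0,255,0), thickness). If the error persists, print the type and shape of each entry in self.contours_bbox to see which one OpenCV rejects.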

not enough values to unpack

file:
http://www.nasflmuseum.com/uploads/4/9/5/8/4958573/_7848632_orig.jpg

python3 test-otr.py test2.jpg


Traceback (most recent call last):
  File "test-otr.py", line 56, in <module>
    img = runOTR(args.infile)
  File "test-otr.py", line 18, in runOTR
    contour_analyzer = TableRecognition.ContourAnalyzer(imgDil)
  File "/home/piotr/OTR/TableRecognition.py", line 76, in __init__
    im2, contours, hierarchy = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS, **kwargs)
ValueError: not enough values to unpack (expected 3, got 2)

Python 3.6.3
Distributor ID: Ubuntu
Description: Ubuntu 17.10
Release: 17.10
Codename: artful
OpenCV: 3.1.0
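cv2.findContours changed its return value between OpenCV major versions. OpenCV 3.x returns (image, contours, hierarchy), while OpenCV 2.4 and 4.x return only (contours, hierarchy). Unpacking the two-value form into three names produces exactly this error. The reported version is 3.1.0, so check which build the interpreter actually imports with print(cv2.__version__). A version-independent approach is to keep only the last two items. Here it is sketched as a hypothetical helper:

```python
def split_find_contours_result(result):
    """Return (contours, hierarchy) from cv2.findContours on any OpenCV version.

    OpenCV 3.x returns (image, contours, hierarchy); OpenCV 2.4 and 4.x
    return (contours, hierarchy). The last two items mean the same thing
    in both cases.
    """
    if len(result) not in (2, 3):
        raise ValueError(f"unexpected findContours result of length {len(result)}")
    contours, hierarchy = result[-2:]
    return contours, hierarchy
```

Line 76 of TableRecognition.py could then be written as contours, hierarchy = split_find_contours_result(cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS, **kwargs)). This assumes im2 is not used later in __init__.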

Order of table coordinates

Dear Uli,
Thank you for this great package; it has made my life much easier. I am using it to recover information on how Peru's congresspeople voted on any given law. I plan to publish this information to help make the next elections more transparent. This is how the documents' fingerprint looks:

prueba

I've been able to extract the information for some pages, but I've failed for others. If you look closely at the fingerprint, the last column of row 21 is labeled as row 22. Do you have any recommendations on how to ensure that table recognition chooses the correct x, y coordinates? Thank you so much!
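OTR's own row/column assignment is described in Algorithm.pdf and is not reproduced here. One common post-processing technique is to cluster the cell centers along each axis with a gap tolerance. Small y-offsets from skew or noise then do not push a cell into the neighbouring row. The following is a minimal sketch, not OTR's method. It assumes you already have one (x, y) center per cell and that consecutive rows (and columns) are separated by more than tol pixels. cluster_1d and assign_rows_cols are hypothetical names.

```python
def cluster_1d(values, tol):
    """Label values so that neighbours (after sorting) closer than tol share a label."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    cluster = -1
    prev = None
    for i in order:
        # Start a new cluster whenever the gap to the previous value exceeds tol.
        if prev is None or values[i] - prev > tol:
            cluster += 1
        labels[i] = cluster
        prev = values[i]
    return labels


def assign_rows_cols(centers, tol):
    """Return a (row, col) index for each (x, y) cell center."""
    rows = cluster_1d([y for _, y in centers], tol)
    cols = cluster_1d([x for x, _ in centers], tol)
    return list(zip(rows, cols))
```

tol should be smaller than the smallest row height (or column width) but larger than the vertical drift within one row. If the skew is so strong that neighbouring rows overlap in y, deskewing the image first is the safer option.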

How can we get extra info about detected tables?

Right now, all I get is an image out.png that shows the cells and their grid numbers (shown below).

out.png

Is there a way to programmatically get the table structure (rows, cols, spans if any) and bounding boxes of each cells? I would like to then perform OCR and extract the table data from the image.

Thanks in advance. :)
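Suppose you can get each cell's grid position and pixel bounding box out of the recognition step. The tuple layout below is an illustration, not OTR's actual data structure. Cutting the cells out for OCR is then plain NumPy slicing, because OpenCV images are NumPy arrays indexed as img[y, x]. The following is a minimal sketch with a hypothetical crop_cells helper:

```python
import numpy as np


def crop_cells(img, cells, pad=0):
    """Cut each table cell out of an image (hypothetical helper).

    `cells` is assumed to be an iterable of (row, col, x, y, w, h) tuples in
    pixel coordinates. Returns {(row, col): sub-image}.
    """
    height, width = img.shape[:2]
    crops = {}
    for row, col, x, y, w, h in cells:
        # Shrink the box by `pad` pixels on each side (to drop ruling lines),
        # clamped to the image bounds.
        x0, y0 = max(x + pad, 0), max(y + pad, 0)
        x1, y1 = min(x + w - pad, width), min(y + h - pad, height)
        if x1 <= x0 or y1 <= y0:
            continue  # cell vanished after padding
        crops[(row, col)] = img[y0:y1, x0:x1].copy()
    return crops
```

A pad of a few pixels keeps the ruling lines out of the crops, which tends to help OCR. Each crop can then be passed to the OCR engine of your choice, keyed by its (row, col) position.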

Error when loading image

I ran python test-otr.py 0110_099.png and got this error:

Traceback (most recent call last):
File "C:/Users/User/Documents/проект по иммунологии/OTR/test-otr.py", line 56, in
img = runOTR("0110_099.png")
File "C:/Users/User/Documents/проект по иммунологии/OTR/test-otr.py", line 27, in runOTR
contour_analyzer.find_corner_clusters()
File "C:\Users\User\Documents\проект по иммунологии\OTR\TableRecognition.py", line 293, in find_corner_clusters
distmat = scipy.spatial.distance.cdist(corners, corners, 'euclidean')
File "C:\Users\User\Anaconda3\lib\site-packages\scipy\spatial\distance.py", line 2369, in cdist
raise ValueError('XA must be a 2-dimensional array.')
ValueError: XA must be a 2-dimensional array.
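scipy.spatial.distance.cdist only accepts 2-D inputs, here of shape (N, 2). A likely trigger, though this is an assumption, is that no corners were detected in this image. np.asarray([]) is 1-D, so cdist raises exactly this error. A list of (1, 2)-shaped arrays would give a 3-D array with the same result. The following minimal sketch is a hypothetical guard that normalizes the input to shape (N, 2). It then computes the same Euclidean distance matrix with NumPy broadcasting instead of cdist, so an empty input yields an empty (0, 0) matrix.

```python
import numpy as np


def pairwise_distances(corners):
    """Euclidean distance matrix between corner points (hypothetical helper).

    Accepts an empty list, a list of (x, y) pairs, or a list of (1, 2)
    arrays, and always works on an (N, 2) float array.
    """
    pts = np.asarray(corners, dtype=float).reshape(-1, 2)
    # diff[i, j] = pts[i] - pts[j], shape (N, N, 2)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

If corners turns out to be empty, the real problem is upstream: the table raster was not found, for example because the image has no visible grid, which OTR requires.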