
TableNet Implementation using PyTorch in Jupyter Notebook

This repo is a fork (obviously), and my only intent is to convert the whole project into several Jupyter Notebooks. Currently, the project is spread across weird .py files that I'm unable to run properly (I hate package management in Python).

TableNet using PyTorch

This repo contains an implementation of TableNet in PyTorch.

Goal

My goal here is to get a dataframe from an image. Given the image of a scanned document holding tabular data, I want to detect the tables in the image, crop them, and then extract the tabular data into a dataframe.

Data:

To populate the DummyDatabase folder you can refer to the following links:

Model:

I will use a TableNet model with DenseNet121 as the main encoder.

I tried several encoders (VGG-19, ResNet, DenseNet121, efficientNet_B0 and efficientNet) and got the best results with DenseNet121.

Note: the trained model itself is not uploaded because it is too large for GitHub.

Model Predictions:

Predictions for the images in DummyDatabase/test_images can be found in DummyDatabase/predictions.

Improvement idea:

The tables the model detects can be any of the following:

  1. Tables with full gridlines
  2. Tables with only horizontal/vertical gridlines
  3. Tables with only parts of horizontal/vertical gridlines
  4. Tables without any gridlines drawn

So my idea was: no matter which of the above types a table is, remove all of its horizontal and vertical gridlines (if there are any), then apply an OpenCV algorithm to detect the proper locations of all the gridlines and draw them artificially (the idea was implemented with help from StackOverflow).

You can find this idea implemented in the folder called GridlinesImprovement.

Extract Tabular Data using pytesseract

Use the pytesseract library to extract and process the tabular data and convert it into a dataframe.
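One way to turn OCR output into a dataframe, sketched under two assumptions: the table's gridline coordinates (including the outer borders) are already known, and the OCR result has the layout returned by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, i.e. parallel lists under keys such as `text`, `left`, `top`, `width`, and `height`. Each word is placed in the cell that contains its bounding-box center. The function name `words_to_dataframe` is hypothetical.

```python
import bisect

import pandas as pd


def words_to_dataframe(ocr: dict, row_lines: list[int], col_lines: list[int]) -> pd.DataFrame:
    """Hypothetical helper: place OCR'd words into cells bounded by sorted gridline positions."""
    n_rows, n_cols = len(row_lines) - 1, len(col_lines) - 1
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    for text, left, top, width, height in zip(
        ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"]
    ):
        text = str(text).strip()
        if not text:
            continue  # image_to_data also reports empty block/paragraph/line entries
        # Find the cell whose gridlines enclose the word's center.
        cx, cy = left + width / 2, top + height / 2
        r = bisect.bisect_right(row_lines, cy) - 1
        c = bisect.bisect_right(col_lines, cx) - 1
        if 0 <= r < n_rows and 0 <= c < n_cols:
            cells[r][c].append(text)  # keeps Tesseract's output order within a cell
    return pd.DataFrame([[" ".join(words) for words in row] for row in cells])
```

With Tesseract installed, usage would look like `words_to_dataframe(pytesseract.image_to_data(table_img, output_type=pytesseract.Output.DICT), row_lines, col_lines)`, where `table_img`, `row_lines`, and `col_lines` are placeholders. The first row can be promoted to a header afterwards with `df.columns = df.iloc[0]` if the table has one.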


Author: Lidor ES

Contributors: thisisamish
