Comments (5)
@akshowhini This is easy to do when the number and name of columns are the same, which doesn't happen very often. A robust way to do this would be to group multiple tables from different pages by partially matching the column names (based on some threshold) and concatenate them.
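One way the partial-matching idea could look is sketched below. It compares header lists position by position with the standard library's `difflib.SequenceMatcher` and puts tables whose headers are similar enough into the same group. The function names `header_similarity` and `group_by_header` and the `0.8` threshold are assumptions chosen for illustration; they are not part of camelot.

```python
from difflib import SequenceMatcher


def header_similarity(cols_a, cols_b):
    """Mean positional similarity between two lists of column names (0.0 to 1.0)."""
    # Tables with a different number of columns are treated as not matching.
    if len(cols_a) != len(cols_b) or not cols_a:
        return 0.0
    ratios = [
        SequenceMatcher(None, str(a).strip().lower(), str(b).strip().lower()).ratio()
        for a, b in zip(cols_a, cols_b)
    ]
    return sum(ratios) / len(ratios)


def group_by_header(headers, threshold=0.8):
    """Group table indices whose header is similar to the first header of a group."""
    groups = []  # each group is a list of indices into `headers`
    for i, cols in enumerate(headers):
        for group in groups:
            if header_similarity(headers[group[0]], cols) >= threshold:
                group.append(i)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([i])
    return groups


# Hypothetical headers taken from four extracted tables.
headers = [
    ["Date", "Amount", "Description"],
    ["date", "amount ", "Descripton"],
    ["Name", "Age"],
    ["Name", "Age "],
]
groups = group_by_header(headers)
```

Each group of indices can then be concatenated separately, for example with `pd.concat`, after renaming the columns to the group's first header. The threshold would likely need tuning per document set.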
from camelot.
The way I'm doing this in my personal project is with
pd.concat(self.list_of_dfs)
The only problem I see is when the tables have different column names, so I just rename them:
names = self.list_of_dfs[0].columns.tolist()
for df in self.list_of_dfs:
    df.columns = names
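A self-contained sketch of the rename-then-concatenate approach, assuming the tables are already available as a plain list of pandas DataFrames. The helper name `concat_with_first_header` is hypothetical. Unlike the snippet above, it works on copies rather than renaming in place, and it raises an error when a table has a different number of columns instead of failing inside the assignment.

```python
import pandas as pd


def concat_with_first_header(dfs):
    """Concatenate tables, reusing the first table's column names for all of them."""
    if not dfs:
        return pd.DataFrame()
    names = dfs[0].columns.tolist()
    renamed = []
    for i, df in enumerate(dfs):
        if df.shape[1] != len(names):
            raise ValueError(
                f"table {i} has {df.shape[1]} columns, expected {len(names)}"
            )
        # Rename on a copy so the caller's DataFrames are left untouched.
        copy = df.copy()
        copy.columns = names
        renamed.append(copy)
    # ignore_index gives the merged table one continuous row index.
    return pd.concat(renamed, ignore_index=True)


# Hypothetical tables from two consecutive pages with slightly different headers.
page1 = pd.DataFrame({"Date": ["2020-01-01"], "Amount": [1]})
page2 = pd.DataFrame({"date ": ["2020-01-02"], "amt": [2]})
merged = concat_with_first_header([page1, page2])
```

This only makes sense when the tables really are continuations of each other; renaming by position will silently mislabel data if the column order differs between pages.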
This issue is low priority as there's no general way to merge tables spanning multiple pages across millions of different types of table structures.
@vinayak-mehta I would like to contribute to this. However, I would like to know your expectations on the scenarios and how to handle those.
Has this thread seen any progress? I have been looking through the same issues and haven't found any permanent solution for merging tables spanning multiple pages.