
pdf-itemslist-extractor's Introduction

📄 PDF Items List Extractor and CSV Utility Tool

A versatile tool designed to streamline the extraction of list items from PDF documents and the merging of CSV files, ensuring unique identification across datasets.

🛠️ Features

  • Extract Items from PDF: Convert list-like structures in PDF documents into structured CSV format.
  • Merge CSV Files: Combine multiple CSV files into a single file, maintaining unique IDs through a newly generated sequential ID column.

🖥️ Prerequisites

  • Python 3.6+
  • PyMuPDF (fitz)
  • Pandas
  • Typer

🚀 Installation

Clone the repository and install dependencies:

git clone https://github.com/GeroZayas/PDF-itemslist-extractor.git

cd PDF-itemslist-extractor

pip install -r requirements.txt

๐Ÿ“ Usage

Extract Items from PDF

python your_script_name.py extract_and_save ./path/to/your/pdf/file.pdf ./desired/output/path/
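This README does not describe how list items are detected, so the following is only a minimal sketch of one possible approach, not the tool's actual implementation. It assumes an item is a line that starts with a number or a bullet character. The names `read_pdf_lines`, `parse_items`, `save_items`, and the CSV column names are hypothetical. Text is read with PyMuPDF's `page.get_text()`.

```python
import csv
import re

# Assumption: an item line starts with "1.", "2)", "-", "*" or "•".
ITEM_PATTERN = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.+?)\s*$")


def read_pdf_lines(pdf_path):
    """Return every text line in the PDF, page by page (needs PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so parse_items works without it

    lines = []
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            lines.extend(page.get_text().splitlines())
    finally:
        doc.close()
    return lines


def parse_items(lines):
    """Keep only lines that look like list items, without their markers."""
    items = []
    for line in lines:
        match = ITEM_PATTERN.match(line)
        if match:
            items.append(match.group(1))
    return items


def save_items(items, csv_path):
    """Write the items to a CSV file with a sequential ID column."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "item"])  # hypothetical column names
        for index, item in enumerate(items, start=1):
            writer.writerow([index, item])
```

Under these assumptions, `save_items(parse_items(read_pdf_lines("example.pdf")), "extracted_items.csv")` would chain the three steps. Real PDFs often wrap long items across several lines, which this sketch does not handle.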

Merge Multiple CSV Files

python your_script_name.py merge_csv_files ./file1.csv ./file2.csv ./merged_output.csv
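As a rough illustration of what a merge with a newly generated sequential ID column could look like in Pandas, here is a minimal sketch. It is not taken from the project's source: the function signature and the column name `merged_id` are assumptions made for the example.

```python
import pandas as pd


def merge_csv_files(input_paths, output_path, id_column="merged_id"):
    """Concatenate CSV files and add a fresh 1-based sequential ID column."""
    frames = [pd.read_csv(path) for path in input_paths]
    # ignore_index=True discards each file's own row index.
    merged = pd.concat(frames, ignore_index=True)
    # Drop an earlier copy of the generated column so reruns do not fail.
    merged = merged.drop(columns=[id_column], errors="ignore")
    # Existing ID columns are kept as-is; the new column is what is unique.
    merged.insert(0, id_column, range(1, len(merged) + 1))
    merged.to_csv(output_path, index=False)
    return merged
```

Generating a new column instead of reusing the files' own IDs avoids collisions when two inputs both start counting at 1.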

๐Ÿ“ Example

Assuming you have a PDF named example.pdf and two CSV files named data1.csv and data2.csv, you can extract items from the PDF and merge the CSV files as follows:

python your_script_name.py extract_and_save ./example.pdf ./extracted_items.csv

python your_script_name.py merge_csv_files ./data1.csv ./data2.csv ./merged_data.csv
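For readers curious how subcommands like these can be exposed with Typer, the skeleton below is a hypothetical sketch, not the project's actual script. Typer normally turns underscores in function names into hyphens, so the sketch passes the names explicitly to match the commands shown above. The command bodies are placeholders that only echo their arguments, and treating the last path as the output file is an assumption.

```python
from typing import List

import typer

app = typer.Typer()


@app.command(name="extract_and_save")
def extract_and_save(pdf_path: str, output_path: str):
    """Placeholder: a real command would extract items and write a CSV."""
    typer.echo(f"extract {pdf_path} -> {output_path}")


@app.command(name="merge_csv_files")
def merge_csv_files(paths: List[str]):
    """Placeholder: treat the last path as the output file."""
    if len(paths) < 2:
        raise typer.BadParameter("give at least one input and one output path")
    *inputs, output = paths
    typer.echo(f"merge {', '.join(inputs)} -> {output}")


# In a script, finish by calling app() to run the command-line interface.
```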

🎯 Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue to discuss improvements or report bugs.

👤 Author

Gero Zayas - @gerozayas

📧 Contact

📧 [email protected]

🌐 Gero Zayas Portfolio
