senthilsweb / invoicepdf2data
This project is forked from taher5253/invoicepdf2data.
Extracts text from PDF files using different techniques, such as pdftotext, pdfminer, or OCR (tesseract, tesseract4, or gvision, i.e. Google Cloud Vision); searches the extracted text for regex matches using a YAML-based template system; and saves the results as CSV, JSON, or XML, or renames the PDF files to match their content.
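
The extract, match, and export flow described above can be illustrated with a minimal Python sketch. It is not the project's actual API. It assumes the text has already been pulled out of the PDF, and it uses a plain dict in place of a parsed YAML template. The names `TEMPLATE`, `extract_fields`, `to_json`, `to_csv`, and `suggested_filename`, along with the sample invoice text and field patterns, are hypothetical.

```python
import csv
import io
import json
import re

# Hypothetical template. In a YAML-based setup this data would be loaded
# from a template file; a dict is used here to stay within the stdlib.
TEMPLATE = {
    "issuer": "Example Corp",
    "keywords": ["Example Corp"],  # all must appear for the template to apply
    "fields": {
        "invoice_number": r"Invoice No\.?\s*(\w+)",
        "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
        "amount": r"Total:\s*([\d.]+)",
    },
}


def extract_fields(text, template):
    """Return field values captured by the template's regexes, or None
    if the template's keywords are not all present in the text."""
    if not all(keyword in text for keyword in template["keywords"]):
        return None
    record = {"issuer": template["issuer"]}
    for name, pattern in template["fields"].items():
        match = re.search(pattern, text)
        record[name] = match.group(1) if match else None
    return record


def to_json(record):
    """Serialize one extracted record as JSON."""
    return json.dumps(record, indent=2)


def to_csv(record):
    """Serialize one extracted record as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


def suggested_filename(record):
    """Build a new PDF file name from the extracted content."""
    issuer = record["issuer"].replace(" ", "_")
    return f"{record['date']}_{issuer}_{record['invoice_number']}.pdf"


# Stand-in for text produced by pdftotext, pdfminer, or an OCR backend.
SAMPLE_TEXT = """Example Corp
Invoice No. INV1234
Date: 2020-01-15
Total: 199.99
"""

if __name__ == "__main__":
    record = extract_fields(SAMPLE_TEXT, TEMPLATE)
    print(to_json(record))
    print(to_csv(record))
    print(suggested_filename(record))
```

In this sketch, a template only applies when all of its keywords occur in the text, so a set of templates could be tried in turn until one returns a record. Fields whose pattern does not match are kept with a value of `None` rather than dropped, which keeps the CSV columns stable.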