ProConverter

Project Documentation: https://5mbl.github.io/proconverter-docs/docs/start/

How to Run the Code

Installation Steps:

  1. Clone the Repository:
    git clone https://github.com/5mbl/ocr-backend.git
  2. Install Dependencies:
  • Pytesseract: ¹

    • Download and install tesseract-ocr-w64-setup-5.3.3.20231005.exe.

    • After installation, locate the installed directory (e.g., C:\Program Files (x86)\Tesseract-OCR).

    • Download the German training data file from here.

    • Copy the downloaded file into the "tessdata" directory within the installed Tesseract directory (e.g., C:\Program Files (x86)\Tesseract-OCR\tessdata).

    • Specify the tesseract.exe path in code (it must match the directory where Tesseract was actually installed on your machine):

      # app.py (line 35)
      
      #pytesseract for OCR
      pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
  • Poppler: ²

    • Go to Poppler Windows Releases.

    • Under Release 23.11.0-0 (the latest release), open the Assets section.

    • Download Release-23.11.0-0.zip.

    • Extract the downloaded zip file.

    • Move the extracted Poppler folder to a location such as: C:\Users\UserName\Downloads\Release-23.11.0-0.

    • Add the path C:\Users\UserName\Downloads\Release-23.11.0-0 to the system variable Path in the Environment Variables.

    • Specify the Poppler path in code (point it at the Library\bin folder inside your extracted Poppler directory):

      # app.py (line 38)
      
      # poppler for pdf2img
      poppler_path = r"C:\poppler-23.11.0\Library\bin"
  • Create and activate the virtual environment:

    • Terminal: python -m venv myenv

    • Terminal: myenv\Scripts\activate
  • Install requirements.txt:

    • pip install -r requirements.txt
  • Start the Application:

    • flask run
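
Because the Tesseract and Poppler paths are hard-coded in app.py, a wrong path is a likely cause of startup or OCR failures. The following is a minimal sketch, not part of the project, of how the two paths from the steps above could be checked before they are used. The helper names (find_tesseract, find_poppler_bin, has_language, ocr_pdf) and the candidate lists are hypothetical; only pytesseract.pytesseract.tesseract_cmd, pytesseract.image_to_string, and pdf2image's convert_from_path with its poppler_path argument are real library calls. It assumes the German data file is named deu.traineddata, which is Tesseract's standard name for German.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations taken from the installation steps; adjust to your machine.
TESSERACT_CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]
POPPLER_CANDIDATES = [
    r"C:\poppler-23.11.0\Library\bin",
]


def find_tesseract(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is an existing file, else look on PATH."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return shutil.which("tesseract")


def find_poppler_bin(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory that contains the pdftoppm binary."""
    for candidate in candidates:
        folder = Path(candidate)
        if (folder / "pdftoppm.exe").is_file() or (folder / "pdftoppm").is_file():
            return str(folder)
    return None


def has_language(tesseract_cmd: str, lang: str = "deu") -> bool:
    """Check that tessdata/<lang>.traineddata sits next to the executable."""
    data_file = Path(tesseract_cmd).parent / "tessdata" / f"{lang}.traineddata"
    return data_file.is_file()


def ocr_pdf(pdf_path: str, tesseract_cmd: str, poppler_bin: str, lang: str = "deu") -> str:
    """Convert each PDF page to an image and run OCR on it."""
    # Imported lazily so the helpers above work without these packages installed.
    import pytesseract
    from pdf2image import convert_from_path

    pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    pages = convert_from_path(pdf_path, poppler_path=poppler_bin)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


if __name__ == "__main__":
    tesseract_cmd = find_tesseract(TESSERACT_CANDIDATES)
    poppler_bin = find_poppler_bin(POPPLER_CANDIDATES)
    print("tesseract:", tesseract_cmd or "not found")
    print("poppler bin:", poppler_bin or "not found")
    if tesseract_cmd:
        print("German data installed:", has_language(tesseract_cmd))
```

Running the script directly prints which paths were found. If either one is reported as "not found", the corresponding path in app.py probably needs to be adjusted before running flask run.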

References

¹ https://smartextract.ai/pytesseract/

² https://stackoverflow.com/questions/53481088/poppler-in-path-for-pdf2image
