Tesseract-OCR-5-Docker

Docker image with the latest Tesseract OCR version 5.x.x, built from source.

The sources are pulled from the latest main branch and latest releases of the Tesseract OCR project.

Docker Hub: https://hub.docker.com/r/franky1/tesseract

Usage

Pull Docker Image

Pull the docker image from Docker Hub:

docker pull franky1/tesseract

Run Docker Container

Mount your image data to the /tmp directory and run the Tesseract OCR container with the required command line options. For example, to run the container on a test image:

docker run -it -v ${PWD}/testdata:/tmp --rm franky1/tesseract \
  tesseract english.png output --oem 1 -l eng

For the Tesseract command line options, please refer to the Tesseract Manual.
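
To OCR every image in a folder rather than a single file, the command above can be wrapped in a loop. The following is a minimal sketch and not part of this repository: the function name `ocr_dir` and the `DOCKER` override variable are hypothetical, and it assumes the images are PNG files and that the folder is given as an absolute path.

```shell
#!/bin/sh
# Hypothetical helper: OCR every PNG in a directory with the franky1/tesseract image.
# Set DOCKER=echo to preview the generated commands without running them.
ocr_dir() {
    # Docker bind mounts need an absolute host path.
    dir="${1:-${PWD}/testdata}"
    for img in "$dir"/*.png; do
        # Skip the unexpanded pattern when the directory has no PNG files.
        [ -e "$img" ] || continue
        name="$(basename "$img")"
        # The directory is mounted at /tmp, the working directory inside the
        # container, so file names are passed relative to it.
        # The output base "${name%.png}" makes Tesseract write <name>.txt.
        ${DOCKER:-docker} run --rm -v "$dir":/tmp franky1/tesseract \
            tesseract "$name" "${name%.png}" --oem 1 -l eng
    done
}
```

Calling `ocr_dir "${PWD}/testdata"` should then leave one text file next to each image. The `-it` flags are dropped here because the loop does not need an interactive terminal.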

Mount more languages

Check whether the languages mounted from your local subfolder ./tessdata are available in the Docker container. Be aware that the mounted folder replaces the languages installed in the Docker image, so only the languages in your local folder are available. Example with the French language:

docker run -it -v ${PWD}/testdata:/tmp \
  -v ${PWD}/tessdata:/usr/local/share/tessdata/ \
  --rm franky1/tesseract

Test the mounted languages in the Docker container with a sample image. Example with the French language:

docker run -it -v ${PWD}/testdata:/tmp \
  -v ${PWD}/tessdata:/usr/local/share/tessdata/ \
  --rm franky1/tesseract \
  tesseract french.jpg output --oem 1 -l fra
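
The local tessdata folder has to contain the .traineddata files before it is mounted. As a minimal sketch (not part of this repository), the helpers below download one language file into ./tessdata and list the languages Tesseract sees with that folder mounted. The download URL pattern for the tesseract-ocr/tessdata_best repository is an assumption, and the function names and the `DOCKER` and `CURL` override variables are hypothetical.

```shell
#!/bin/sh
# Assumed URL pattern for raw files in the tesseract-ocr/tessdata_best repository.
traineddata_url() {
    echo "https://github.com/tesseract-ocr/tessdata_best/raw/main/$1.traineddata"
}

# Hypothetical helper: download one language file (e.g. fra) into ./tessdata.
fetch_lang() {
    mkdir -p "${PWD}/tessdata"
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to this file
    ${CURL:-curl} -fL -o "${PWD}/tessdata/$1.traineddata" "$(traineddata_url "$1")"
}

# List the languages visible inside the container with ./tessdata mounted.
list_langs() {
    ${DOCKER:-docker} run --rm \
        -v "${PWD}/tessdata":/usr/local/share/tessdata/ \
        franky1/tesseract tesseract --list-langs
}
```

A typical call would be `fetch_lang fra && list_langs`. Because the mounted folder hides the languages shipped in the image, fetch eng or deu as well if you still need them.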

Alternatively, you can build a new Docker image if you want other languages; see the next section.

Build Docker Image yourself

For details, have a look at the Dockerfile.

  1. Git clone this repo.
  2. Add your required languages to the languages.txt file.
  3. (a) Build the docker image from scratch, if you want the latest sources from the main branch.
docker build --tag tesseract .
  3. (b) Build the docker image from scratch, if you want a specific release version.
docker build --tag tesseract --build-arg TESSERACT_VERSION=5.0.0 .
  4. Run Tesseract OCR container with test image:
docker run -it --name tesseract -v ${PWD}/testdata:/tmp --rm \
  tesseract tesseract english.png output --oem 1 -l eng
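
The two build variants and a quick check of the result can be combined in one helper. This is only a sketch under stated assumptions: the function name `build_image` and the `DOCKER` override variable are hypothetical, and it has to be run from the root of the cloned repo so that the Dockerfile is found.

```shell
#!/bin/sh
# Hypothetical helper: build the image, optionally for a specific release,
# then print the Tesseract version from the freshly built image.
# Usage: build_image          (latest sources from the main branch)
#        build_image 5.0.0    (specific release version)
build_image() {
    if [ -n "${1:-}" ]; then
        ${DOCKER:-docker} build --tag tesseract \
            --build-arg "TESSERACT_VERSION=$1" . || return 1
    else
        ${DOCKER:-docker} build --tag tesseract . || return 1
    fi
    # Sanity check: the binary inside the new image reports its version.
    ${DOCKER:-docker} run --rm tesseract tesseract --version
}
```

If the build step fails, the function returns early and the version check is skipped.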

Image conditions

  • The only supported target for this Docker image is currently linux/amd64.
  • Working directory for OCR images is /tmp inside the container. See example above.
  • Directory for trained data is /usr/local/share/tessdata/ inside the container. See example above.
  • This image was built without the Tesseract training tools.
  • This image currently includes only the following languages:
    • English: tessdata_best > eng.traineddata
    • German: tessdata_best > deu.traineddata
    • If you need other languages, you have to build your own image or mount trained data to the /usr/local/share/tessdata/ directory. See example above.
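
Since linux/amd64 is the only supported target, hosts with another architecture (for example arm64) have to request that platform explicitly. The sketch below is not part of this repository: the function name `platform_args` is hypothetical, and whether the container then actually runs depends on amd64 emulation being available in the host's Docker setup.

```shell
#!/bin/sh
# Hypothetical helper: print the extra "docker run" arguments needed on
# hosts that are not amd64. Prints nothing on amd64 hosts.
# The optional argument overrides the detected architecture.
platform_args() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64|amd64) ;;                       # native, no extra flag needed
        *) echo "--platform linux/amd64" ;;    # request the amd64 image
    esac
}
```

It can be spliced into the commands above, for example `docker run --rm $(platform_args) franky1/tesseract tesseract --version`. The command substitution is left unquoted on purpose so that the flag and its value become two separate arguments.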

Tesseract Trained Data for all available languages

Further documentation

ToDo

  • Update README.md to latest Dockerfile and Usage
  • Add workflow_dispatch to GitHub workflows
  • Add Dependabot on GitHub
  • Add vulnerability scanning in GitHub Actions with Snyk
  • Add GitHub Action to check container efficiency with Dive https://github.com/MartinHeinz/dive-action
  • Add badges to README.md
  • Add documentation for GitHub Actions Workflow
  • Add more inline comments in GitHub Actions related files
  • Build image for more targets
  • Building Tesseract with TensorFlow?
  • Building Tesseract with Training tools?
  • Change build in Dockerfile according to instructions in Compiling-GitInstallation.md

Issues

  • 27.07.2022: The build from the main source branch currently fails; the reason is unknown.

If you find any bugs or have requests regarding this Docker image, please post an issue in this GitHub repository.

Project status

27.07.2022: The Docker image is ready for use. Some slight improvements are still possible, and the build sometimes fails.
