Tesseract-OCR-AZE

Tesseract applied to a few examples, such as template matching, for different languages (here Azerbaijani, AZE).

Tesseract OCR

About

This package contains an OCR engine (libtesseract) and a command line program (tesseract). Tesseract 4 adds a new neural net (LSTM) based OCR engine that focuses on line recognition, and it still supports the legacy OCR engine of Tesseract 3, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). This mode also needs traineddata files that support the legacy engine, for example those from the tessdata repository.
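As a rough illustration of the engine modes described above, the following Python sketch assembles a tesseract command line that selects either the legacy or the LSTM engine. The helper name `build_ocr_command` and the file names `page.png` and `page` are hypothetical, and the sketch assumes that an `aze` traineddata file supporting the chosen engine is installed.

```python
import shlex

# Engine mode values accepted by the --oem option.
OEM_LEGACY = 0  # legacy (Tesseract 3 style) engine only
OEM_LSTM = 1    # neural net (LSTM) engine only


def build_ocr_command(image_path, output_base, lang="aze", oem=OEM_LSTM):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0, 1, 2 or 3")
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]


# Hypothetical input image and output base name.
cmd = build_ocr_command("page.png", "page", oem=OEM_LEGACY)
print(shlex.join(cmd))  # prints: tesseract page.png page -l aze --oem 0
```

The list can be passed to `subprocess.run(cmd, check=True)` once tesseract is on the PATH. Building the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.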

Note: Even when tested on an actual Azerbaijani example, Tesseract's automatic detection identified the text as Turkish, so I also used the Turkish (TUR) traineddata from Tesseract. Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box".
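One way to work with both language models at once is to pass them together to the `-l` option, joined with `+`. The sketch below only builds the command line. The helper names and the file names are hypothetical, and it assumes that both the `aze` and `tur` traineddata files are installed.

```python
def language_argument(*codes):
    """Join language codes into the value expected by -l, e.g. 'aze+tur'."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def build_multilang_command(image_path, output_base, codes=("aze", "tur")):
    """Return the argument list for a run that loads several languages."""
    return ["tesseract", image_path, output_base, "-l", language_argument(*codes)]


# Hypothetical input image and output base name.
print(build_multilang_command("sample.png", "sample"))
```

Running `tesseract --list-langs` shows which language codes are available on a given installation.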

Tesseract supports various image formats including PNG, JPEG and TIFF.

Tesseract supports various output formats: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV and ALTO (since version 4.1.0).
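On the command line, the output format is chosen by naming a config file after the output base. The following sketch maps the formats listed above to those arguments. The dictionary keys, the helper name and the file names are hypothetical, while the config names (`txt`, `hocr`, `pdf`, `tsv`, `alto`) and the `textonly_pdf` variable are the ones shipped with Tesseract 4.1 and later, as far as I know.

```python
# Trailing arguments per output format; keys are labels chosen for this sketch.
OUTPUT_CONFIGS = {
    "text": ["txt"],
    "hocr": ["hocr"],
    "pdf": ["pdf"],
    "pdf_text_only": ["-c", "textonly_pdf=1", "pdf"],
    "tsv": ["tsv"],
    "alto": ["alto"],  # needs Tesseract 4.1.0 or newer
}


def build_output_command(image_path, output_base, fmt="text", lang="aze"):
    """Return the argument list that writes the requested output format."""
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unknown output format: {fmt}")
    return ["tesseract", image_path, output_base, "-l", lang] + OUTPUT_CONFIGS[fmt]


# Hypothetical input image and output base name.
print(build_output_command("scan.png", "scan", fmt="hocr"))
```

Tesseract appends the matching extension to the output base itself, so `scan` becomes `scan.hocr`, `scan.pdf` and so on.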

Note that in many cases you will need to improve the quality of the input image to get better OCR results from Tesseract.
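Binarization is one common preprocessing step for this. The sketch below is a plain-Python version of Otsu thresholding, which is my choice of technique for illustration and not something this repository prescribes. It works on a flat list of 8-bit gray values, and a real pipeline would more likely use an imaging library.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of gray values in 0..255."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0      # sum of gray values at or below the candidate threshold
    weight_bg = 0   # number of pixels at or below the candidate threshold
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map every pixel above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(pixels)
    return [255 if value > t else 0 for value in pixels]


# Tiny made-up example: dark text pixels (10) on a light background (200).
print(binarize([10, 200, 10, 200]))  # prints: [0, 255, 0, 255]
```

Other steps that often help include rescaling small images, removing noise and correcting skew.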

Tesseract can be trained to recognize other languages. See Tesseract Training for more information.

Installing Tesseract

You can either install Tesseract via a pre-built binary package or build it from source.

License

The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
