I have recently started learning and experimenting with Tesseract OCR. I have done a t


Training Tesseract OCR for a specific document (tesstrain issue, open)

mumarsyal commented on May 24, 2024

Training Tesseract OCR for a specific document

from tesstrain.

Comments (4)

stefan6419846 commented on May 24, 2024

Could you please elaborate on what you are trying to achieve by training for a specific document (type)? What do you expect to change compared to using the existing models?


mumarsyal commented on May 24, 2024

Thank you for your response, @stefan6419846.

I ran Tesseract's default English model on this image and the output is very bad. So I want to train Tesseract specifically for this document to improve the output, but I don't know how to generate the training dataset (line images, *.gt.txt files, and box files) from these images. If you could suggest some tools for creating the dataset from these images, that would be wonderful.
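For context on the dataset question, here is a minimal sketch of the pairing step, assuming the layout tesstrain's README describes as far as I recall it: each line image (`.tif`, `.png`, `.bin.png`, or `.nrm.png`) sits next to a single-line transcription with the same base name and the extension `.gt.txt`. The helper names (`gt_path_for`, `write_ground_truth`, `find_unpaired`) are hypothetical, and cutting the page into line images is not covered here.

```python
from pathlib import Path

# Image extensions assumed to be accepted by tesstrain (longest first,
# so "line.bin.png" is not mistaken for a plain ".png").
IMAGE_SUFFIXES = (".bin.png", ".nrm.png", ".png", ".tif")


def gt_path_for(image_path: Path) -> Path:
    """Return the .gt.txt path that belongs to a line image."""
    name = image_path.name
    for suffix in IMAGE_SUFFIXES:
        if name.endswith(suffix):
            return image_path.with_name(name[: -len(suffix)] + ".gt.txt")
    raise ValueError(f"not a supported line image: {name}")


def write_ground_truth(image_path, text: str) -> Path:
    """Write a single-line transcription next to its line image."""
    line = " ".join(text.split())  # collapse whitespace and newlines
    if not line:
        raise ValueError("transcription is empty")
    gt_path = gt_path_for(Path(image_path))
    gt_path.write_text(line + "\n", encoding="utf-8")
    return gt_path


def find_unpaired(ground_truth_dir) -> list:
    """List line images that still have no .gt.txt transcription."""
    missing = []
    for path in sorted(Path(ground_truth_dir).iterdir()):
        if path.name.endswith(IMAGE_SUFFIXES) and not gt_path_for(path).exists():
            missing.append(path.name)
    return missing
```

Running `find_unpaired` over the ground-truth folder before training is a cheap way to spot lines that were cropped but never transcribed.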


stefan6419846 commented on May 24, 2024

I have not tried it, but I would argue that better preprocessing on your side (feeding Tesseract specific regions of interest (ROIs), each with appropriate preprocessing, instead of the whole page, ...) might be easier and sufficient.
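A rough sketch of what per-ROI preprocessing could look like, assuming Pillow and pytesseract are installed. The `Roi` type, `clamp_box`, and `ocr_rois` are hypothetical names, and grayscale plus autocontrast plus 2x upscaling is only one possible preprocessing choice; the coordinates and page segmentation mode of each ROI would have to be chosen for the actual document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roi:
    name: str   # label for the region, e.g. "header"
    box: tuple  # (left, upper, right, lower) in page pixels
    psm: int    # Tesseract page segmentation mode for this region


def clamp_box(box, width, height):
    """Clip a (left, upper, right, lower) box to the page bounds."""
    left, upper, right, lower = box
    left, right = max(0, min(left, width)), max(0, min(right, width))
    upper, lower = max(0, min(upper, height)), max(0, min(lower, height))
    if right <= left or lower <= upper:
        raise ValueError(f"ROI {box} lies outside the {width}x{height} page")
    return (left, upper, right, lower)


def ocr_rois(image_path, rois, lang="eng"):
    """Crop each ROI, preprocess it, and OCR it separately."""
    # Imported here so the geometry helpers work without these packages.
    from PIL import Image, ImageOps
    import pytesseract

    results = {}
    with Image.open(image_path) as page:
        for roi in rois:
            crop = page.crop(clamp_box(roi.box, *page.size))
            # Assumed preprocessing: grayscale, stretch contrast, upscale 2x.
            crop = ImageOps.autocontrast(ImageOps.grayscale(crop))
            crop = crop.resize((crop.width * 2, crop.height * 2))
            results[roi.name] = pytesseract.image_to_string(
                crop, lang=lang, config=f"--psm {roi.psm}"
            )
    return results
```

For example, a single-line field might use `Roi("invoice_no", (900, 120, 1400, 180), psm=7)`, while a paragraph block might use `psm=6`; both sets of coordinates are made up for illustration.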


linxyu1 commented on May 24, 2024

> Thank you for your response, @stefan6419846.
>
> I ran Tesseract's default English model on this image and the output is very bad. So I want to train Tesseract specifically for this document to improve the output, but I don't know how to generate the training dataset (line images, *.gt.txt files, and box files) from these images. If you could suggest some tools for creating the dataset from these images, that would be wonderful.

Hello, maybe you can use jTessBoxEditor, but it is a lot of manual work.
