
Comments (6)

bertsky commented on July 23, 2024

AFAIK the "Compute CTC targets failed for" message merely points to a problem with a single line pair and is not directly indicative of a problem with the network topology. Did you inspect those samples visually?

Also, why do you say these are terminal? Does training not continue?

You already linked to the VGSL docs, which are pretty comprehensive. Here is the implementation (the spec parser).

In my experience, the problem with custom net specs is more about getting training to converge to low error rates at all. Usually the BCER stays in the high nineties (percent). Once you have found a workable spec, you may still need to set a large max iterations value just to see the initial drop in error rate, especially if you have many CNN layers. (Note that Tesseract has no 1d or 2d dropout, so training large networks is much harder, perhaps best attempted via append/impact strategy ...)

Your configurations should be fine IMO – what kind of material are you trying?

from tesstrain.

yaofuzhou commented on July 23, 2024

@bertsky Thank you for sharing your insights and the references! They are all very relevant to my project.

Back to the topic -

  1. Sorry for the confusion. By "Terminal," I just meant the shell console on my MacBook. I was merely showing you the error message from tesstrain.

  2. Yes, I visually inspected the .box, .gt.txt, and .png files, which were procedurally generated for my project. The .lstmf files were generated during make lists and have not been modified since the successful run with NET_SPEC := [1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c\#\#\#]

  3. I read in a recent update that .lstmf files are no longer needed for the training process. If so, after removing the .lstmf files from my training set directory, what do I write to list.train and list.eval? For me, these two list files used to list all the .lstmf files, which are based on the .png and .box files I provided. How do I tell the training process that I want to train on the .png and .box files directly?

  4. To your last question: I am trying to build a more powerful OCR model for a mix of Chinese and some mathematical symbols. The goal is a model that copes better with various kinds of noise. I have therefore procedurally generated 10 million text lines (I modified the makefile to accommodate a more complex directory structure to host this many files) with varying fonts, tilt, background gridlines, and printer/ink/camera effects. Every character in every image was precisely labeled in the .box file during the generation process.
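With this many generated files, the visual inspection from point 2 can be complemented by an automated pass. Below is a minimal sketch, assuming each line image `<name>.png` sits next to `<name>.gt.txt` and `<name>.box` files with the same base name (the extensions are the ones mentioned above; the function name and the shared-base-name layout are assumptions for illustration, not part of tesstrain). It only flags missing or empty companion files and says nothing about whether the label content itself is correct.

```python
from pathlib import Path


def find_incomplete_samples(root, companions=(".gt.txt", ".box")):
    """Return (png_path, extension, reason) for every missing or empty companion file.

    Assumption: companions share the image's base name, e.g.
    line_0001.png / line_0001.gt.txt / line_0001.box.
    """
    problems = []
    # rglob walks nested directories, which matters for a deep directory layout.
    for png in sorted(Path(root).rglob("*.png")):
        stem = png.name[: -len(".png")]
        for ext in companions:
            companion = png.with_name(stem + ext)
            if not companion.is_file():
                problems.append((png, ext, "missing"))
            elif not companion.read_text(encoding="utf-8").strip():
                problems.append((png, ext, "empty"))
    return problems


# Hypothetical usage; the directory name is a placeholder:
# for png, ext, reason in find_incomplete_samples("data/my-ground-truth"):
#     print(f"{png}: {ext} is {reason}")
```

Any sample reported here would be a first candidate to look at when a "Compute CTC targets failed for" message names that line.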

I started by fine-tuning the existing chi_sim model, and it plateaued at about 3% error rate for a while before I decided to try a larger model which will hopefully be able to absorb the added complexity of my training information.

Then this net spec NET_SPEC := [1,64,0,1 Ct5,5,32 Mp3,3 Lfys128 Lfx256 Lrx256 Lfx1024 O1c\#\#\#] got about 35% after a few hundred thousand iterations and entered a long plateau. That was when I wanted to try the 5 proposed net specs. However, judging from your feedback, the issue may not be entirely the size of the net, and the lack of a dropout mechanism may be a major factor. Should I do some hacking and implement dropout myself?
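To make the difference between the two specs in this comment easier to see, here is a small sketch that lines them up layer by layer. It only splits the strings on whitespace and does not validate VGSL syntax; the helper names are made up for illustration. The specs are written with plain `###` on the assumption that the backslashes above are Makefile escaping for `#`.

```python
def split_spec(spec):
    """Split a VGSL spec string into its whitespace-separated layer tokens."""
    inner = spec.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    return inner.split()


def diff_specs(a, b):
    """Pair up tokens of two specs and mark which positions are identical."""
    return [(left, right, left == right)
            for left, right in zip(split_spec(a), split_spec(b))]


# The two specs quoted in this comment (Makefile escaping removed).
SMALL = "[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c###]"
LARGE = "[1,64,0,1 Ct5,5,32 Mp3,3 Lfys128 Lfx256 Lrx256 Lfx1024 O1c###]"

for left, right, same in diff_specs(SMALL, LARGE):
    marker = "  " if same else "->"
    print(f"{left:10} {marker} {right}")
```

Both specs have the same eight-token layout; only the max-pooling and output tokens are unchanged, so the larger spec differs in input height, convolution window and depth, and the width of every LSTM layer.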

Any experience or insights in training models more powerful than the official Tesseract OCR models will be greatly appreciated.
