Comments (18)
Which version of the library are you using? If you're using the 2.2 CocoaPod, that version has a known memory leak, which has since been fixed; the CocoaPod just hasn't been updated yet, for reasons unknown (see #49). You can use the latest version of the repo via CocoaPods by adding this line to your Podfile:
pod 'TesseractOCRiOS', :git => 'https://github.com/gali8/Tesseract-OCR-iOS.git'
from tesseract-ocr-ios.
I'm using the latest version, 3.03.
Can you post a code snippet or link to your code so I can try reproducing the high memory usage?
I downloaded the source code as a ZIP from GitHub, and then copied TesseractOCR.framework from the Products folder into my project.
// Runs on a background thread; UI work is dispatched back to the main thread.
Tesseract *tesseract = [[Tesseract alloc] initWithLanguage:@"eng"];
tesseract.delegate = self;
// Limit recognition to digits and ASCII letters
[tesseract setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" forKey:@"tessedit_char_whitelist"];
// blackAndWhite is a UIImage category method, not part of UIKit itself
UIImage *bwImage = [image blackAndWhite];
[tesseract setImage:bwImage]; // image to recognize
BOOL result = [tesseract recognize];
if (!result) {
    [self performSelectorOnMainThread:@selector(uiShowMessage:) withObject:NSLocalizedString(@"OCRFailed", @"OCR Failed.") waitUntilDone:YES];
    return;
}
// Fetch the OCR result; messaging nil returns 0, so length == 0 also covers nil
NSString *text = [tesseract recognizedText];
if (text.length == 0) {
    return;
}
That looks like it should work okay. As a sanity check, you could try adding the library to your project via CocoaPods (using the Podfile entry I mentioned above) and see whether you still get the high memory usage.
I can look at the memory usage for your project if you are okay with emailing me your code (kcon AT stanford DOT edu), or you can use this guide to diagnose it yourself: http://www.raywenderlich.com/23037/how-to-use-instruments-in-xcode
ok, thank you.
By the way, what size is your image?
The images I typically recognize with Tesseract are quite small because I require the user to crop what they're interested in to a small rectangle, so my image dimensions are 90 x 70 (width x height). But the Template Framework Project in this repo runs recognition on this much larger image (https://raw.githubusercontent.com/gali8/Tesseract-OCR-iOS/master/Template%20Framework%20Project/Template%20Framework%20Project/image_sample.jpg), and I don't see the kind of memory usage you are reporting.
But I get the same result when I launch the Template Framework Project on my iPod: after recognition finishes, memory usage stays at 43 MB.
I'm using an image captured by the camera.
Oh I see, sorry, I misread your first post: I thought you were saying the memory was growing by that amount (uncontrollably), but you're just pointing out that it's the memory still in use after Tesseract recognizes any image.
I was able to reproduce your result, and although it's unfortunate, I'm afraid this is an artifact of how Tesseract works in general, so there's nothing we can do about it in this wrapper library. See this issue on the main Tesseract project, where someone reports similar static memory usage in Valgrind, and where one person comments:
"Some of the dawgs are held statically to minimize the time consumed by deleting and re-creating apis, and memory consumed running them in parallel from multiple threads. This isn't a real leak, and memory actually used will not grow over time as a result."
This matches the results of a profile I ran of the Template Framework Project's memory usage: the function that reads the DAWG data uses 17.78 MB all by itself.
Your first post said the app started at 12 MB and then rose to 38 MB. That's an increase of 38 - 12 = 26 MB, and the DAWG data alone accounts for about 17.78 MB of it, so I think this explains most of the issue.
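Working through the figures quoted in this thread (12 MB at launch, 38 MB after recognition, and 17.78 MB attributed to reading the DAWG data in the profile), the DAWG load works out to roughly two thirds of the increase:

```latex
\begin{aligned}
\Delta M &= 38\ \text{MB} - 12\ \text{MB} = 26\ \text{MB} \\
\frac{M_{\text{DAWG}}}{\Delta M} &= \frac{17.78\ \text{MB}}{26\ \text{MB}} \approx 0.68
\end{aligned}
```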
One other thing worth mentioning is that the size of the DAWG is related to the specific language file you are using.
I'm assuming we both used "eng" in our tests, but I can confirm that using a custom language file (with less training data) can reduce the static memory usage. In my test just now, my custom language file (for a custom font I am recognizing) used 30 MB less memory than the "eng" language file.
How do I make a custom language file? Is there a tutorial?
The upstream project will be able to help you out with custom language files:
You may also find this tutorial useful if the font you want to train on is a font you have installed on your computer (so you can create the training document in Microsoft Word): http://michaeljaylissner.com/posts/2012/02/11/adding-new-fonts-to-tesseract-3-ocr-engine/
If the font is not one you can install on your computer, then you basically have to create the images of the training characters yourself, either by photographing the characters or by drawing them in a drawing program.
I'm tagging this as "wontfix", although really it should be "can'tfix" since this is just how the upstream Tesseract library works.