
tesseract.js's People

Contributors

antimatter15, balearica, bijection, ckuetbach, clearlyclaire, dependabot[bot], dna2github, fdawgs, hemanth, holdyourwaffle, imsukmin, jeromewu, loderunner, mackncheesiest, maikm3, mikewesthad, monkeywithacupcake, nathanbabcock, nisarhassan12, plantain-00, reda-alaoui, rogerxaic, rowasc, sacramentix, simple7575, susandoggie, tmcw, tomaszferens, webreflection, zzarcon


tesseract.js's Issues

Different results on different devices - unusable on iPhone

Hi, thanks for porting Tesseract to client-side OCR – awesome!

I have one serious problem. I created a simple <input type="file" capture="camera" accept="image/*"> which holds an image and tesseract.js works perfectly when running on a desktop browser.

As soon as I try to use it with the same picture on Mobile Safari (iOS 9), the result is a horribly messy string.

I tried using jQuery 1.x and 2.x; both return the same messy result on the iPhone. Again, the same picture and the same codebase work when accessed from desktop browsers.

No Proper Documentation

There is a lack of proper documentation, and I am unable to use this. Please add documentation or a proper example so that it can be used.

Testsuite ✅

Hi guys, I really like the idea of this library, how powerful it is and how easy it is to use. But I was taking a look at the codebase and realised that there's currently no test suite.

So I think it would be great if we could set up an initial test suite. We can discuss which tools to use; I'm happy with pretty much any runner or assertion library, but since this has to work both in Node.js and in the browser, maybe Karma would be a good fit? A lot of developers are used to working with it, and it gives you some cool things out of the box :).

Also I think it's important for other developers that want to contribute to the project, because right now they don't know if their changes are actually breaking any of the current behaviours.

Later we can easily integrate the testsuite in Travis and have fast feedback on any change 🚀.

What do you think? Thanks!

Crashing on iPhone

Tesseract.js crashes in Safari on the iPhone 6. We suspect this is a memory issue and are looking into ways to work around it.

Does not get text from image if image contains some background objects

I'm building a mobile app for motivational quotes, where anyone can add quotes and send a link to an image that contains a quote. I used the following image.
img3

It gives me the text with proper line breaks, which I was happy to see.
"YOU LEARNED
TO LAUGH
BEFORE
YOU LEARNED
TO TALK."

But when I use this image
img

It gives me the following text.

w?-
3 <5"
I! r

  • WRI‘Cf' v EALWAvs“
    TEA WUERIGHT
    1“ CE 0N5
    *‘ A‘ £ I

I just want to ask whether the library only works with images that have no background.

Add support or note about node versions

The module is only usable with Node v6.8.0+, which supports const and let variable declarations. A note about this should be included in the README. I'll begin working on making it compatible with other Node versions using Babel.
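
A README note could be paired with a runtime guard. The following is a minimal sketch, not part of tesseract.js: the helper name isNodeVersionAtLeast is hypothetical, and the 6.8.0 threshold is simply the one quoted in this report. It uses var on purpose so that the guard itself still parses on older runtimes.

```javascript
'use strict';

// Hypothetical helper: compare two dotted version strings numerically,
// so that "10.0.0" is correctly treated as newer than "6.8.0".
function isNodeVersionAtLeast(current, required) {
  var a = current.split('.').map(Number);
  var b = required.split('.').map(Number);
  for (var i = 0; i < 3; i++) {
    var x = a[i] || 0; // missing or non-numeric parts count as 0
    var y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // all three parts are equal
}

// process.versions.node holds the running version without the leading "v".
if (!isNodeVersionAtLeast(process.versions.node, '6.8.0')) {
  console.warn('Expected Node v6.8.0 or newer, found v' + process.versions.node);
}
```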

Hi

Modify the directory permission: sudo chmod 755 /media/mtp;

Ready?

Hi,

Does "Nothing to see here" in your project description mean that your project is not ready yet? If not, will this be the correct repo to watch for a version of Tesseract that will work in the browser (via Emscripten)?

Completely locally-based deployment with trained data

When trying to keep everything local, especially when no internet access is available, recognition does not work because the traineddata language packs are hosted externally.

If we download the tessdata ourselves, where do we put it so that it is loaded locally without a network request? There is a mention of an environment variable (tessdata_prefix), but that appears to be a leftover from the original C code.

Where can we place the tessdata at a relative path, and how do we specify that location in the worker.js file?

edit: Well, I've found a way: via web worker messaging, I attach a new parameter (rootURL) to the base tesseractinit object, then modify the XHR connect string to use rootURL as the base of the path. It works really well and makes this flexible.

Not sure if it'd be something you want me to fork and pull request back to you though.
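
The rootURL idea described in the edit could look roughly like this. This is a sketch under assumptions, not the change the author actually made: the function name trainedDataUrl, the "<lang>.traineddata" file naming, and the example path are all hypothetical.

```javascript
// Hypothetical helper: build the language-data URL from a configurable base
// instead of a hard-coded external host.
function trainedDataUrl(rootURL, lang) {
  // Ensure exactly one slash between the base and the file name.
  const base = rootURL.endsWith('/') ? rootURL : rootURL + '/';
  return base + encodeURIComponent(lang) + '.traineddata';
}

// Example with a made-up local directory.
console.log(trainedDataUrl('/assets/tessdata', 'eng'));
```

Keeping the base as a single parameter means the same worker code can point at a local directory or at a remote host without further changes.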

Not seeing any results

Using the sample code provided in the readme, I'm not seeing output of any kind. The process just seems to hang.

I created a sample repo. To reproduce:

git clone https://github.com/zeke/ocr-test/
cd ocr-test
npm install
npm start

My setup:

❯ node -v
v6.8.1

❯ uname -a
Darwin C02R41WSFVH8 15.6.0 Darwin Kernel Version 15.6.0: Thu Jun 23 18:25:34 PDT 2016; root:xnu-3248.60.10~1/RELEASE_X86_64 x86_64

❯ npm ls tesseract.js
ocr-test@ /Users/zeke/Desktop/ocr-test
└── [email protected] 

PS Out of curiosity, why is node>=6.8 required?

Doesn't work for Arabic

Sample image:

image

Error:

image

Code

Tesseract.recognize(myImageA2,{lang: 'ara'})
    .then(function(result){console.log(result); alert(result.text);})

Lacking documentation

This library seems promising and very capable. The demo page is great but the documentation is severely lacking - so lacking that I've wondered if it even works.

It's a shame that such a good library risks not being used because people won't understand it without reading through the code.

FR: img URL

It would be useful to add support for accepting an image URL.

react-native support?

Hey guys!! I wonder if you have considered adding support for frameworks like react-native through Node. I was working on a Tesseract wrapper for react-native, but your lib looks much better (considering that the wrapper is currently only implemented on Android).

So I tried to create a test using yours, but I'm getting this error:

rsz_14632600_1552933568054084_273631139_o

Why didn't my code work?

My code:

const path = require('path');
const Tesseract = require('tesseract.js');
const myImage = path.resolve(__dirname, 'out.png');
Tesseract.recognize(myImage)
    .progress(message => console.log(message))
    .catch(err => console.error(err))
    .then(result => console.log(result))
    .finally(resultOrError => console.log(resultOrError))

Issues:

/Users/xuyan/Documents/my/node/login/node_modules/tesseract.js/src/index.js:15
class TesseractWorker {
^^^^^

SyntaxError: Block-scoped declarations (let, const, function, class) not yet supported outside strict mode
    at exports.runInThisContext (vm.js:53:16)
    at Module._compile (module.js:404:25)
    at Object.Module._extensions..js (module.js:432:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:313:12)
    at Module.require (module.js:366:17)
    at require (module.js:385:17)
    at Object.<anonymous> (/Users/xuyan/Documents/my/node/login/testPng.js:3:19)
    at Module._compile (module.js:425:26)
    at Object.Module._extensions..js (module.js:432:10)

Terrible performance when using a jpeg in combination with node.js

I seem to be having performance issues when using tesseract.js on node in combination with a jpeg. When I run the basic.js example using npm installed modules instead of the local modules I get pretty good results: "Benchmark took 3034.437842 miliseconds"

However as soon as I run the same example using a jpeg of 912 × 2121 px the results are very poor:
"Benchmark took 41944.719605 miliseconds"

If I run the same JPEG in the browser example, it performs about as well as the .png does on Node. I think it has something to do with the way loadImage is implemented in the Node.js version, but I haven't been able to pin it down.

Edit: The delay is past the loading of the image itself and is probably related to the interprocess communication. The decoded raw array that is being sent is huge. This doesn't cause a delay in the browser however.
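
To put rough numbers on the size of the decoded array: assuming the image is decoded to 4 bytes per pixel (RGBA), which is an assumption and not something the report confirms, the 912 × 2121 px JPEG expands as follows. The helper name decodedBytes is hypothetical.

```javascript
// Hypothetical helper: bytes needed to hold a fully decoded bitmap.
function decodedBytes(width, height, bytesPerPixel) {
  return width * height * bytesPerPixel;
}

const bytes = decodedBytes(912, 2121, 4); // 4 bytes per pixel (RGBA) is an assumption
console.log(bytes);                              // 7737408
console.log((bytes / (1024 * 1024)).toFixed(1)); // 7.4 (MiB)
```

If that buffer is serialized as text for inter-process messaging, the message can be several times larger than the raw byte count, which would be consistent with the delay described above.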

Bower package

Hi, why don't you create a bower.json file for use with Bower? Normally npm packages are for the backend and Bower is for the frontend. Any problem if I send a PR?

Is handwritten supported?

I didn't see it mentioned anywhere, so I tried it with a random note I found online, and the results were... quite bad:

hand

So I suggest adding a disclaimer stating that it doesn't work with handwritten text.

Circular references in result

Would be awesome if you removed the circular references in the result object. The references make it impossible to easily convert to a JSON object.
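
Until that changes, one common workaround is a JSON.stringify replacer that skips objects it has already visited. This is a minimal sketch: stringifyWithoutCycles is a hypothetical name, and the page/words object below is a made-up stand-in, since the real shape of the result is not shown in this issue.

```javascript
// Returns a JSON string, omitting any object already seen during traversal.
function stringifyWithoutCycles(value, space) {
  const seen = new WeakSet();
  return JSON.stringify(value, (key, val) => {
    if (typeof val === 'object' && val !== null) {
      if (seen.has(val)) return undefined; // drop the repeated or circular reference
      seen.add(val);
    }
    return val;
  }, space);
}

// Stand-in object with a back-reference from a child to its parent.
const page = { text: 'hello', words: [] };
page.words.push({ text: 'hello', page: page });

console.log(stringifyWithoutCycles(page));
// {"text":"hello","words":[{"text":"hello"}]}
```

The trade-off is that this also drops legitimate repeated references to the same object, not only true cycles, so it is lossy by design.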

problem with Arabic recognition

So I'm using this code provided by @zeke, English text gets recognized fine but this error appears whenever I am using the Arabic recognizer.

Code

const path = require('path')
const Tesseract = require('tesseract.js')

const file = require('fs').readFileSync(path.join(__dirname, 'arabic-3.png'))

Tesseract.recognize(file, {lang: 'ara'})
  .progress(message => console.log(message))
  .catch(err => console.error(err))
  .then(result => console.log(result))

Error

/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:4
function f(a){throw a;}var h=void 0,i=!0,j=null,k=!1;function aa(){return function(){}}function ba(a){return function(){return a}}var n,Module;Module||(Module=eval("(function() { try { return TesseractCore || {} } catch(e) { return {} } })()"));var ca={},da;for(da in Module)Module.hasOwnProperty(da)&&(ca[da]=Module[da]);var ea=i,fa=!ea&&i;
              ^
abort(5) at Error
    at Error (native)
    at Na (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:32:26)
    at ka (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:507:108)
    at Array.LHa (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:402:25912)
    at Qpa (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:388:44877)
    at kpa (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:388:23029)
    at jpa (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:388:22303)
    at lT (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:387:80568)
    at mT (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:387:80700)
    at Array.BS (/Users/macbook/Development/projects/nodejs/ocr-test/node_modules/tesseract.js/node_modules/tesseract.js-core/index.js:387:69011)
If this abort() is unexpected, build with -s ASSERTIONS=1 which can give more information.

Error found manipulating PNG for #53 - Cannot enlarge memory arrays

Cannot enlarge memory arrays. Either (1) compile with -s TOTAL_MEMORY=X with X higher than the current value 100663296, (2) compile with ALLOW_MEMORY_GROWTH which adjusts the size at runtime but prevents some optimizations, or (3) set Module.TOTAL_MEMORY before the program runs.
Cannot enlarge memory arrays. Either (1) compile with -s TOTAL_MEMORY=X with X higher than the current value 100663296, (2) compile with ALLOW_MEMORY_GROWTH which adjusts the size at runtime but prevents some optimizations, or (3) set Module.TOTAL_MEMORY before the program runs.

/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:4
function f(a){throw a;}var h=void 0,i=!0,j=null,k=!1;function aa(){return function(){}}function ba(a){return function(){return a}}var n,Module;Module||(Module=eval("(function() { try { return TesseractCore || {} } catch(e) { return {} } })()"));var ca={},da;for(da in Module)Module.hasOwnProperty(da)&&(ca[da]=Module[da]);var ea=i,fa=!ea&&i;

abort("Cannot enlarge memory arrays. Either (1) compile with -s TOTAL_MEMORY=X with X higher than the current value 100663296, (2) compile with ALLOW_MEMORY_GROWTH which adjusts the size at runtime but prevents some optimizations, or (3) set Module.TOTAL_MEMORY before the program runs.") at Error
    at Error (native)
    at Na (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:32:26)
    at ka (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:507:108)
    at Function.pb (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:12:26)
    at vd (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:331:190)
    at UFa (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:396:56010)
    at WEa (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:396:39452)
    at Gra (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:388:78184)
    at Mpa (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:388:42487)
    at Rpa (/Test-Projects/tesseract.js/node_modules/tesseract.js-core/index.js:388:45819)
If this abort() is unexpected, build with -s ASSERTIONS=1 which can give more information.
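
For scale, the 100663296-byte limit quoted in the message is exactly 96 MiB. The sketch below does that conversion and derives a loose upper bound on how many 4-byte pixels could fit. The helper names are hypothetical and the 4 bytes per pixel figure is an assumption; the real budget is presumably smaller, because the same heap also has to hold the engine's other working data.

```javascript
const TOTAL_MEMORY = 100663296; // value quoted in the error message above

// Hypothetical helpers for back-of-the-envelope sizing.
function bytesToMiB(bytes) {
  return bytes / (1024 * 1024);
}

function maxPixels(heapBytes, bytesPerPixel) {
  return Math.floor(heapBytes / bytesPerPixel);
}

console.log(bytesToMiB(TOTAL_MEMORY));   // 96
console.log(maxPixels(TOTAL_MEMORY, 4)); // 25165824
```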

CDN Tesseract 1.0.8 not found

I copied <script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.8/dist/tesseract.js'></script> from the README, but it returned a 404 error. I couldn't access the file via wget or my browser either.

What helps is changing the 1.0.8 part to 0.2.0, which should be the correct version of tesseract.js according to your releases. To fix this, I propose either adding the correct file so that the URL works or changing the README to use the 0.2.0 URL.

Working URL: https://cdn.rawgit.com/naptha/tesseract.js/0.2.0/dist/tesseract.js

Debugger port conflict when you fork children

I'm debugging my app and get an EADDRINUSE error when Tesseract.recognize is called. I hunted this down, and I think it's because you don't specify execArgv. See the following issue others have with debugging and forking in general:

nodejs/node#3469

As it stands, I can't debug to inspect the result.
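
One possible workaround, sketched here as an assumption rather than as the project's fix, is to strip debugger flags from process.execArgv before forking, so that the child does not try to bind the parent's debug port. stripDebugArgs and the worker.js path are hypothetical names.

```javascript
// Hypothetical helper: drop --debug, --debug-brk, --inspect and --inspect-brk
// flags (with or without "=port") and keep every other argument.
function stripDebugArgs(execArgv) {
  return execArgv.filter(arg => !/^--(debug|inspect)(-brk)?(=|$)/.test(arg));
}

console.log(stripDebugArgs(['--debug-brk=5858', '--harmony']));

// Usage sketch (not executed here; the worker path is made up):
// const { fork } = require('child_process');
// fork('worker.js', [], { execArgv: stripDebugArgs(process.execArgv) });
```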

console.debug not in NodeJS

node v6.8.1 supports:

Console {
  log: [Function: bound ],
  info: [Function: bound ],
  warn: [Function: bound ],
  error: [Function: bound ],
  dir: [Function: bound ],
  time: [Function: bound ],
  timeEnd: [Function: bound ],
  trace: [Function: bound trace],
  assert: [Function: bound ],
  Console: [Function: Console] }

Tesseract fails on line 123 when it attempts to call the debug method.

var Tesseract = require('tesseract.js');
var myImage = './screenshot_05.png';
Tesseract.detect(myImage)
    .progress(function(message){
        console.log('progress is: ', message)
    });

~/code/tesseract_test $ node index.js
progress is:  { status: 'loading tesseract core' }
progress is:  { status: 'loaded tesseract core' }
progress is:  { status: 'initializing tesseract', progress: 0 }
pre-main prep time: 67 ms
progress is:  { status: 'initializing tesseract', progress: 1 }
progress is:  { status: 'loading osd.traineddata', progress: 1 }
Number of blobs post-filtering = 91
Number of blobs to try = 91
/Volumes/HOME/Users/pj/code/tesseract_test/node_modules/tesseract.js/src/index.js:123
            if(this._resolve.length === 0) console.debug(data);
                                                   ^

TypeError: console.debug is not a function
    at TesseractJob._handle (/Volumes/HOME/Users/pj/code/tesseract_test/node_modules/tesseract.js/src/index.js:123:43)
    at TesseractWorker._recv (/Volumes/HOME/Users/pj/code/tesseract_test/node_modules/tesseract.js/src/index.js:71:21)
    at ChildProcess.<anonymous> (/Volumes/HOME/Users/pj/code/tesseract_test/node_modules/tesseract.js/src/node/index.js:14:18)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at process.nextTick (internal/child_process.js:744:12)
    at _combinedTickCallback (internal/process/next_tick.js:67:7)
    at process._tickCallback (internal/process/next_tick.js:98:9)
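
A defensive way to avoid this kind of crash is to fall back to console.log when console.debug is missing. This is a minimal sketch with a hypothetical helper name, not the change made in the library.

```javascript
// Hypothetical helper: use console.debug when present, otherwise console.log.
function pickDebugLogger(consoleLike) {
  const fn = typeof consoleLike.debug === 'function'
    ? consoleLike.debug
    : consoleLike.log;
  return fn.bind(consoleLike);
}

const debug = pickDebugLogger(console);
debug('debug output still works on runtimes without console.debug');
```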

Didn't work for me

<html>
<head>

</head>
<body>
<img src="dream.jpg" class="to_ocr">
<div class="prog"></div>
<canvas class="display"></canvas>
<script src="http://tenso.rs/tesseract.js"></script>
<script src="https://code.jquery.com/jquery-2.2.1.min.js"></script>

<script>
    var img = $('body').find(".to_ocr");
    Tesseract.recognize( img, { progress: 'prog', lang: 'eng'} )
        .then( 'display' );
</script>
</body>
</html>

getting:
Uncaught (in promise) DOMException: Failed to execute 'postMessage' on 'Worker': An object could not be cloned.
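
One plausible cause, not confirmed in this thread, is that $('body').find(".to_ocr") returns a jQuery wrapper rather than a DOM element, and a wrapper object cannot be cloned for postMessage. Below is a minimal sketch of unwrapping it first. unwrapJQuery is a hypothetical helper; it relies only on jQuery objects exposing a jquery version string and indexed elements, and it uses a stand-in object so that it runs outside a browser.

```javascript
// Hypothetical helper: return the first underlying element of a jQuery
// wrapper, or the input unchanged when it is not a jQuery object.
function unwrapJQuery(input) {
  if (input && typeof input.jquery === 'string') {
    return input[0];
  }
  return input;
}

// Stand-ins for a DOM element and a jQuery result.
const fakeElement = { tagName: 'IMG' };
const fakeWrapper = { jquery: '2.2.1', length: 1, 0: fakeElement };

console.log(unwrapJQuery(fakeWrapper) === fakeElement); // true
console.log(unwrapJQuery(fakeElement) === fakeElement); // true
```

In the page above this would mean passing unwrapJQuery(img), or simply img[0], to Tesseract.recognize instead of the wrapper itself.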

Different versions on GitHub and npm?

Hi all,

I'm Peter from @cdnjs. I'm going to host this awesome JS tool on cdnjs.com, but I just found that the versioning is v1.0.x on npm but v0.x.0 on GitHub. Do you have any suggestion about which one I should use?

Thanks,
Peter

The recognize function works incorrectly for Polish

The recognize function works incorrectly for the Polish language.

My test image:

tess

Tesseract output (with the misrecognized words marked; only the first few lines are shown):

Śród takrcn pól pned raty nad brzegrem ruczaru
Na pagórku mewrerkrm, we brzozowym garu
Sial dwór szlachecki, z dnewa, recz podmurawany
Śwrecrny się z daleka panrelane ściany
Tym brelsze ze odbrte od cremner zrelem
Tupoh, co gu bronią od wiatrów Jesrem
Dom mreszkalny ruewrelkrlecz zewsząd cnęaogrI stodołę mar wrerka r plzy me] trzy stogr\n\n

The text Tesseract recognizes differs significantly from the text in the image. Letters are replaced with other letters, which turns the words into unintelligible strings.

Screenshot from 2016-10-23 12-30-38

add beerpay badge

By adding Beerpay's badge, people will know the easiest way to support this great project.
Thanks for this great work, guys!
Cheers!
