fitraditya / node-pdf2img
A Node.js module for converting PDF files into image files
License: MIT License
I am trying to use the live URL of a PDF, but it reports that the file is not found.
When I try to convert two multi-page PDFs at the same time, only one file is converted successfully; the other call receives the same response as the first instead of its own converted pages.
For example, say A.pdf (3 pages) and B.pdf (5 pages) are converted simultaneously. A.pdf is converted into A_1.jpg, A_2.jpg, A_3.jpg. No images are produced for B.pdf, yet both calls receive the same response: A_1.jpg, A_2.jpg, A_3.jpg.
<--- Last few GCs --->
11144 ms: Scavenge 969.5 (1008.3) -> 969.5 (1008.3) MB, 0.3 / 0 ms (+ 2.2 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
11256 ms: Mark-sweep 969.5 (1008.3) -> 585.1 (625.8) MB, 112.6 / 0 ms (+ 2.4 ms in 2 steps since start of marking, biggest step 2.2 ms) [last resort gc].
11329 ms: Mark-sweep 585.1 (625.8) -> 585.1 (623.8) MB, 72.9 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x298005e3ac1
2: /* anonymous */ [/Documents/Nodedemo/pdf2image/node_modules/pdf2img/lib/pdf2img.js:~51] [pc=0x279bdc3b082](this=0x29800504101 ,pageCount=0x6097fb75189 <Number: 1.41414e+27>,callback=0x6097fb750f9 <JS Function (SharedFunctionInfo 0x6097fb200a1))
4: /* anonymous */ [/Documents/Nodedemo/pdf2image/node_modules/async/lib/async.js:638] [pc...
FATAL ERROR: invalid array length Allocation failed - process out of memory
Abort trap: 6
Under NodeJs v0.10.36
Error: [Error: File is not a PDF]
[email protected] test /[MY_PATH]/node-pdf2img
./node_modules/.bin/mocha --reporter spec
Split and covert pdf into images
/[MY_PATH]/node-pdf2img/test/test.pdf
[Error: File is not a PDF]
0 passing (2m)
2 failing
Split and covert pdf into images Create png files:
Error: timeout of 60000ms exceeded. Ensure the done() callback is being called in this test.
Split and covert pdf into images Create jpg files:
Error: timeout of 60000ms exceeded. Ensure the done() callback is being called in this test.
npm ERR! weird error 2
npm ERR! not ok code 0
Could you please fix that?
I was having an issue on AWS Lambda where the ImageMagick call (lines 72-73) was failing while trying to get the page count:
gm(input).identify("%p ", function (err, value) {
var pageCount = String(value).split(' ');
I removed ImageMagick (and its import) and replaced lines 72-73 with:
gs()
.executablePath('lambda-ghostscript/bin/./gs')
.input(input)
.pagecount(function (err, pageCount) {
and now it runs OK again.
I just started using this to convert PDFs to images, but I am currently running into this error:
/bin/sh: identify: command not found
child_process.js:508
throw err;
^
Error: Command failed: identify -format %n upload/2117c56efbd65908ae368201ec6517651468295928045.pdf
/bin/sh: identify: command not found
I've installed xpdf, I'm using Node v4.4.7, and I'm trying to convert a local PDF file. Are there any other requirements, such as ImageMagick?
It does not work when the PDF path or file name contains spaces.
For a directory, the workaround is simply
dir.split(' ').join('\\ ')
but the same trick does not work when the file name contains a space.
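Escaping only the spaces leaves other shell-special characters untouched. A minimal sketch of a more general approach, assuming the command string is run by a POSIX shell (`/bin/sh`), is to single-quote the whole path. The helper name `quoteForPosixShell` is hypothetical and is not part of pdf2img:

```javascript
// Quote a path so a POSIX shell (sh/bash) treats it as one argument.
// Assumption: the command is executed by /bin/sh; this is not valid for cmd.exe.
function quoteForPosixShell(p) {
  // Wrap in single quotes; an embedded single quote becomes '\''
  return "'" + String(p).replace(/'/g, "'\\''") + "'";
}

// Build a command line similar to the one shown in the error messages in this thread.
function buildIdentifyCommand(pdfPath) {
  return 'identify -format %n ' + quoteForPosixShell(pdfPath);
}
```

Where the calling code can be changed, Node's `child_process.execFile` takes its arguments as an array and does not go through a shell by default, which sidesteps quoting altogether.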
I've just installed pdf2img on my iMac:
Running: macOS Sierra version 10.12.4
Node version 7.8.0
I'm trying to write a robust PDF-to-text package. I first use pdf2json to parse the file; if it contains no text nodes with content, I then try the following:
pdf2img = require("pdf2img")
var strFile = __dirname + "/" + aryOptions[0];
console.log(strFile);
pdf2img.setOptions({type:"png"
,size:8192
,density:600
,outputdir:__dirname + "/output"
,targetname:"pdf"});
pdf2img.convert(strFile, function(err, info) {
if ( err ) {
console.log(err);
} else {
console.log(info);
}
});
Unfortunately this fails with:
/bin/sh: identify: command not found
child_process.js:524
throw err;
^
Error: Command failed: identify -format %n /docfire/EON.pdf
/bin/sh: identify: command not found
at checkExecSyncError (child_process.js:481:13)
at execSync (child_process.js:521:13)
at async.waterfall.pages (/docfire/node_modules/pdf2img/lib/pdf2img.js:47:32)
at fn (/docfire/node_modules/async/lib/async.js:638:34)
at Immediate.<anonymous> (/docfire/node_modules/async/lib/async.js:554:34)
at runCallback (timers.js:672:20)
at tryOnImmediate (timers.js:645:5)
at processImmediate [as _immediateCallback] (timers.js:617:5)
[Edit] I have made some progress...after installing the dependencies which I missed initially, the output is now:
execvp failed, errno = 2 (No such file or directory)
gm identify: "gs" "-q" "-dBATCH" "-dSAFER" "-dMaxBitmap=50000000" "-dNOPAUSE" "-sDEVICE=ppmraw" "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-r72x72" "-sOutputFile=/var/folders/q7/hnxl054d71q277m5c3j25vdh0000gn/T/gmretZGf" "--" "/var/folders/q7/hnxl054d71q277m5c3j25vdh0000gn/T/gmdtH8SW" "-c" "quit".
gm identify: Postscript delegate failed (/docfire/EON.pdf).
gm identify: Request did not return an image.
child_process.js:524
throw err;
^
Error: Command failed: gm identify -format %n /docfire/EON.pdf
execvp failed, errno = 2 (No such file or directory)
gm identify: "gs" "-q" "-dBATCH" "-dSAFER" "-dMaxBitmap=50000000" "-dNOPAUSE" "-sDEVICE=ppmraw" "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-r72x72" "-sOutputFile=/var/folders/q7/hnxl054d71q277m5c3j25vdh0000gn/T/gmretZGf" "--" "/var/folders/q7/hnxl054d71q277m5c3j25vdh0000gn/T/gmdtH8SW" "-c" "quit".
gm identify: Postscript delegate failed (/docfire/EON.pdf).
gm identify: Request did not return an image.
at checkExecSyncError (child_process.js:481:13)
at execSync (child_process.js:521:13)
at async.waterfall.pages (/docfire/node_modules/pdf2img/lib/pdf2img.js:47:32)
at fn (/docfire/node_modules/async/lib/async.js:638:34)
at Immediate.<anonymous> (/docfire/node_modules/async/lib/async.js:554:34)
at runCallback (timers.js:672:20)
at tryOnImmediate (timers.js:645:5)
at processImmediate [as _immediateCallback] (timers.js:617:5)
I've checked both the path and the file name, both are correct and exist.
After searching around I found that I had to also install:
brew install ghostscript
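Several reports in this thread come down to an external tool (`identify`, `gm`, or `gs`) missing from `PATH`. A small sketch of a pre-flight check follows; the helper names `findExecutable` and `missingTools` are hypothetical, and the check only confirms that a file with that name exists in a `PATH` directory, not that it actually runs:

```javascript
var fs = require('fs');
var path = require('path');

// Return the full path of `name` if a file with that name exists in one of
// the directories listed in `pathString`, otherwise null.
function findExecutable(name, pathString) {
  var dirs = String(pathString || '').split(path.delimiter).filter(Boolean);
  // On Windows, also try the common ".exe" extension.
  var names = process.platform === 'win32' ? [name, name + '.exe'] : [name];
  for (var i = 0; i < dirs.length; i++) {
    for (var j = 0; j < names.length; j++) {
      var candidate = path.join(dirs[i], names[j]);
      try {
        if (fs.statSync(candidate).isFile()) return candidate;
      } catch (e) {
        // Not present in this directory; keep looking.
      }
    }
  }
  return null;
}

// Return the subset of `tools` that could not be found.
function missingTools(tools, pathString) {
  return tools.filter(function (tool) {
    return findExecutable(tool, pathString) === null;
  });
}
```

Calling `missingTools(['gm', 'gs', 'identify'], process.env.PATH)` before converting would list which of the tools named in the errors above still need to be installed.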
I am a beginner, please help me.
C:\Users\kaushal-pc\Desktop\nodejs\pdf2img>node pdf2img.js
'gm' is not recognized as an internal or external command,
operable program or batch file.
child_process.js:533
throw err;
^
Error: Command failed: gm identify -format "%p " "C:\Users\kaushal-pc\Desktop\nodejs\pdf2img/bharati.pdf"
'gm' is not recognized as an internal or external command,
operable program or batch file.
at checkExecSyncError (child_process.js:490:13)
at execSync (child_process.js:530:13)
at C:\Users\kaushal-pc\Desktop\nodejs\node_modules\pdf2img\lib\pdf2img.js:72:23
at fn (C:\Users\kaushal-pc\Desktop\nodejs\node_modules\pdf2img\node_modules\async\lib\async.js:638:34)
at Immediate.<anonymous> (C:\Users\kaushal-pc\Desktop\nodejs\node_modules\pdf2img\node_modules\async\lib\async.js:554:34)
at runCallback (timers.js:651:20)
at tryOnImmediate (timers.js:624:5)
at processImmediate [as _immediateCallback] (timers.js:596:5)
==========================pdf2img.js============================
var fs = require('fs');
var path = require('path');
var pdf2img = require('pdf2img');
var input = __dirname+'/bharati.pdf';
pdf2img.setOptions({
type:'png',
size:1024,
density:600,
output:'testx'
});
pdf2img.convert(input,function (err, info) {
if(err)
console.log(err);
else
console.log(info);
});
Does this library support stream processing?
I like to use it in combination with multer s3 stream upload.
I'm on Ubuntu, with xpdf installed. The PDF is converted, but the resulting images don't work.
var express = require('express');
var router = express.Router();
var multer = require('multer');
var storage = multer.diskStorage({
destination: function (req, file, cb) {
cb(null, 'public/uploads/');
},
filename: function (req, file, cb) {
cb(null, file.originalname);
}
});
var upload = multer({ storage: storage });
var path = require("path");
var fs = require('fs');
var pdf2img = require('pdf2img');
/* POST upload. */
router.post('/', upload.single('pdfGabarito'), function(req,res,next){
console.log(req.file);
var input = path.join(__dirname, '../public/uploads/',req.file.originalname);
//var input = __dirname + '/../public/uploads/'+req.file.originalname;
console.log(input);
pdf2img.setOptions({
type: 'png', // png or jpeg, default png
size: 1024, // default 1024
density: 300, // default 600
outputdir: path.join(__dirname, '../public/uploads/'+req.file.originalname.split('.')[0]+'/'), // mandatory, outputdir must be absolute path
targetname: req.file.originalname.split('.')[0] // the prefix for the generated files, optional
});
pdf2img.convert(input, function(err, info) {
if (err) console.log(err)
else console.log(info);
});
res.redirect('/');
});
When I try to convert a PDF into a PNG or JPEG, the result is an empty image, with no errors in the console. Does anyone have the same problem, or a solution? Thanks.
Line 81 in lib/pdf2img.js :
if (options.page < pageCount.length) {
SHOULD BE
if (options.page <= pageCount.length) {
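The boundary case can be checked in isolation. This sketch assumes `options.page` is 1-based, which is what the proposed `<=` implies, so that the last page of an N-page document is page N. The function name `isValidPage` is hypothetical:

```javascript
// Assumption: page numbers are 1-based, so the last valid page equals the page count.
function isValidPage(page, pageCount) {
  return Number.isInteger(page) && page >= 1 && page <= pageCount;
}
```

With a strict `<` comparison, requesting page 3 of a 3-page PDF would be rejected, which matches the behaviour reported here.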
Those changes (the merged pull request) haven't arrived here, and they are not contained in tag 0.1.1.
Could you move the changes from issue #1 into tag 0.1.1, or create a new tag?
Thank you!
My Node version is 7.3.
I want to convert a PDF file to a PNG file.
The PDF contains only a single picture.
The problem is that when I run the convert method,
it does not succeed every time, and the convert callback is never called,
neither on success nor on failure, although sometimes I do get the correct PNG file.
I checked the code,
and I suspect the "write" method of the "gm" module is the problem:
I put a console.log in the write method's callback function, and it never ran.
That is why I suspect it. Has anybody had the same problem?
I tried to convert a 42-page PDF and it killed my laptop. I think this could be fixed by adding an option that limits the maximum number of pages processed at a time.
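A rough sketch of what such a limit could look like, independent of pdf2img's internals: run a per-page worker over all pages, but never more than `limit` at once. `mapWithLimit` and the `worker` callback are hypothetical names, and the worker is assumed to return a Promise:

```javascript
// Process `items` with at most `limit` workers running at the same time.
// `worker(item)` is assumed to return a Promise; results keep the input order.
function mapWithLimit(items, limit, worker) {
  var results = new Array(items.length);
  var next = 0;
  var lanes = Math.min(Math.max(1, limit), items.length);

  // Each lane picks up the next unprocessed item once its current one finishes.
  function runNext() {
    if (next >= items.length) return Promise.resolve();
    var index = next++;
    return Promise.resolve(worker(items[index])).then(function (value) {
      results[index] = value;
      return runNext();
    });
  }

  var runners = [];
  for (var i = 0; i < lanes; i++) runners.push(runNext());
  return Promise.all(runners).then(function () { return results; });
}
```

A limit of 1 degenerates to strictly sequential processing, which is the most memory-friendly setting for large documents.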
Converting a PDF of about 5.4 MB to PNG works, but with a PDF of 6.1 MB or larger there is no response at all.
Thanks for this, it works just fine if you follow the instructions for install!
Also note that ImageMagick does NOT work on other platforms than Linux (it works fine in Ubuntu subsystem on Windows 10 though).
I'd like to request a feature to return the created image as base64 instead of saving it to file on disk.
E.g.:
{ result: 'success',
message:
[ { page: 1,
name: 'test_1',
size: 17.275,
content: '/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z'
},
{ page: 2,
name: 'test_2',
size: 24.518,
content: '/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z'
}
]
}
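Until something like this exists in the library, a caller could add the `content` field after conversion. The sketch below assumes each entry in `message` carries a `path` field pointing at the written image file; that field and the helper name `withBase64Content` are assumptions, not confirmed by this thread:

```javascript
var fs = require('fs');

// Return copies of the page entries with a base64 `content` field added.
// Assumption: each entry has a `path` property locating the image on disk.
function withBase64Content(pages) {
  return pages.map(function (page) {
    var copy = Object.assign({}, page);
    copy.content = fs.readFileSync(page.path).toString('base64');
    return copy;
  });
}
```

The images are still written to disk first, so this only changes what the caller receives, not how the conversion itself works.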
When I test the example, it throws this error: Error: Command failed: gm identify -format "%p " "/Users/renzhiwen/Desktop/node_modules/pdf2img/test/test.pdf".
Hi, I'm using this package and I get an error like this:
F:\parseWord>node pdf2img.js
'identify' is not recognized as an internal or external command,
operable program or batch file.
child_process.js:508
throw err;
^
Error: Command failed: identify -format %n tmp/testpdf.pdf
'identify' is not recognized as an internal or external command,
operable program or batch file.
at checkExecSyncError (child_process.js:465:13)
at execSync (child_process.js:505:13)
at async.waterfall.pages (F:\parseWord\node_modules\pdf2img\lib\pdf2img.js:47:32)
at fn (F:\parseWord\node_modules\pdf2img\node_modules\async\lib\async.js:638:34)
at Immediate._onImmediate (F:\parseWord\node_modules\pdf2img\node_modules\async\lib\async.js:554:34)
at processImmediate [as _immediateCallback] (timers.js:383:17)
What can I do to solve this problem?
Hi,
I tried using it on a free-tier EC2 instance and everything gets stuck.
What are the basic requirements for using this library?
Thanks
There seem to be a whole bunch of people experiencing the same problem, in that PDFs will not be converted to images.
I have a bunch of 0 byte png files and the error:
/docfire/node_modules/gm/lib/command.js:228
proc.stdin.once('error', cb);
^
TypeError: Cannot read property 'once' of undefined
at gm._spawn (/docfire/node_modules/gm/lib/command.js:228:15)
at /docfire/node_modules/gm/lib/command.js:140:19
at series (/docfire/node_modules/array-series/index.js:11:36)
at gm._preprocess (/docfire/node_modules/gm/lib/command.js:177:5)
at gm.stream (/docfire/node_modules/gm/lib/command.js:138:10)
at convertPdf2Img (/docfire/node_modules/pdf2img/lib/pdf2img.js:92:6)
at /docfire/node_modules/pdf2img/lib/pdf2img.js:67:9
at /docfire/node_modules/async/lib/async.js:246:17
at /docfire/node_modules/async/lib/async.js:122:13
at _each (/docfire/node_modules/async/lib/async.js:46:13)
I've modified command.js inserting:
if ( !(typeof proc == "object"
&& typeof proc.stdin == "object"
&& typeof proc.stdin.once == "function") ) {
return cb(new Error("imageMagick, WTF is going on?"))
}
Just before:
proc.stdin.once('error', cb);
Now the error and exception is:
events.js:163
throw er; // Unhandled 'error' event
^
Error: imageMagick, WTF is going on?
at gm._spawn (/docfire/node_modules/gm/lib/command.js:231:13)
at /docfire/node_modules/gm/lib/command.js:140:19
at series (/docfire/node_modules/array-series/index.js:11:36)
at gm._preprocess (/docfire/node_modules/gm/lib/command.js:177:5)
at gm.stream (/docfire/node_modules/gm/lib/command.js:138:10)
at convertPdf2Img (/docfire/node_modules/pdf2img/lib/pdf2img.js:92:6)
at /docfire/node_modules/pdf2img/lib/pdf2img.js:67:9
at /docfire/node_modules/async/lib/async.js:246:17
at /docfire/node_modules/async/lib/async.js:122:13
at _each (/docfire/node_modules/async/lib/async.js:46:13)
From all those struggling with the same problem, you might want to try:
https://www.npmjs.com/package/pdf-image
Much easier, and it works!
Is there a way to do this? I've tried passing an array and a comma-separated string, but I get a page error. It would be a useful feature when only a few pages are needed out of a large PDF (a ~100-page PDF for me, of which I need 25 pages).
Hi, is there a way to pass a parameter to convert just one specific page?
I only need 1 image from 1 page.
Thanks in advance.
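Neither request appears to be supported according to these reports, but the input handling for such a feature is easy to sketch. The function below turns a selection string such as "1,3,5-7" into a list of 1-based page numbers; `parsePageSelection` and the range syntax are hypothetical and not part of pdf2img:

```javascript
// Parse a page selection such as "1,3,5-7" into a sorted list of unique,
// 1-based page numbers. Throws on malformed or out-of-range input.
function parsePageSelection(spec, pageCount) {
  var selected = {};
  String(spec).split(',').forEach(function (part) {
    var match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new Error('Invalid page selection: "' + part + '"');
    var first = Number(match[1]);
    var last = match[2] === undefined ? first : Number(match[2]);
    if (first < 1 || last < first || last > pageCount) {
      throw new Error('Page selection out of range: "' + part + '"');
    }
    for (var page = first; page <= last; page++) selected[page] = true;
  });
  return Object.keys(selected).map(Number).sort(function (a, b) { return a - b; });
}
```

The resulting list could then be fed, one page at a time, into whatever single-page option the installed version accepts.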
Hello :)
Please republish the latest code to the npm package. It seems that the version on npm is not the latest.
e.g.
github - lib/pdf2img.js: line 70 uses the gm library
npm - lib/pdf2img.js: line 70 uses a terminal command for gm
Encountered this error while trying to run in my system. Is there a solution for this issue?
`events.js:141
throw er; // Unhandled 'error' event
^
Error: spawn pdfinfo ENOENT
at exports._errnoException (util.js:870:11)
`
Hey, everything works, but I have a problem with async mode when I try to convert multiple files.
I have the code below, but it converts all the PDFs at the same time and gives a bad result (images from different PDFs end up in the same folder):
`
traitementPDF = function(fichier, callback){
pdf2img.setOptions({
type: 'png', // png or jpg, default jpg
size: 1480, // default 1024
density: 600, // default 600
outputdir: fichier[1], // output folder, default null (if null given, then it will create folder name same as file name)
outputname: fichier[0] // output file name, dafault null (if null given, then it will create image name same as input name)
});
pdf2img.convert(fichier[2], function(err, info) {
if (err){
console.log(err)
callback(false)
}
else {
console.log(info);
callback(true)
}
});
}`
and
uploadedFiles.forEach(function(fichier){
  traitementPDF(fichier, function(result){
    console.log("Ouvrage: " + fichier[0] + "Repertoire: " + fichier[1] + "\nChemin: " + fichier[2]);
    console.log("Retour de callback: " + result)
  });
});
thanks
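One plausible explanation, given that `setOptions` is called on the shared module rather than per conversion, is that the options of a later call overwrite those of an earlier one while it is still running. Under that assumption, a workaround is to start each conversion only after the previous one has finished. The helper `convertSequentially` and the `convertOne` adapter below are hypothetical names:

```javascript
// Run jobs strictly one after another and collect their results in order.
// `convertOne(job, callback)` is a caller-supplied adapter using the
// Node-style callback signature (err, info).
function convertSequentially(jobs, convertOne) {
  return jobs.reduce(function (chain, job) {
    return chain.then(function (results) {
      return new Promise(function (resolve, reject) {
        convertOne(job, function (err, info) {
          if (err) return reject(err);
          resolve(results.concat([info]));
        });
      });
    });
  }, Promise.resolve([]));
}

// Example adapter, assuming the pdf2img calls shown in the report above:
// function convertOne(fichier, callback) {
//   pdf2img.setOptions({ type: 'png', outputdir: fichier[1], outputname: fichier[0] });
//   pdf2img.convert(fichier[2], callback);
// }
// convertSequentially(uploadedFiles, convertOne).then(console.log, console.error);
```

This trades throughput for correctness: only one PDF is processed at a time, so the options set for a job cannot be replaced before that job completes.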
Why do I get this error?
[Error: Command failed: gm identify -format "%p "
I have gm and gs packages installed.
I am running locally on a Windows machine, but installing GraphicsMagick's exe did not help.
Any suggestions? Ultimately my goal is to get it running in an Azure Function.
Thanks,
Donnie
Somehow, if I invoke the program directly with node app.js, it works fine.
But if I start it with a process manager like pm2, it says that the native dependency gm is not found.
Any idea why? Any known workarounds?
I have found a file that fails conversion:
TypeError: Cannot set property 'page' of null
at /[app_path]/node_modules/pdf2img/lib/pdf2img.js:66:23
If I print the error right after convertPdf2Img has executed, it shows that the "datasize < 127" check fails and passes an error back, but the error is unhandled and crashes the server.
It needs some kind of error-handling wrapper around:
result.page = page; // <-- This fails because result is not set when an error occurs in convertPdf2Img
callbackmap(null, result);
Problem example: a basic document converted from Word, attached (1455011779_56191.pdf).
Every time an error is passed from convertPdf2Img it causes this unhandled exception to crash the server
Pointers as to what is wrong with the file are also welcome :)
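The guard described above can be sketched as a small callback factory. It assumes the conversion step reports back with the usual `(err, result)` pair and that `callbackmap` is the per-page continuation quoted in the report; `makePageCallback` itself is a hypothetical helper, not code from the library:

```javascript
// Build the per-page callback so that an error from the conversion step is
// forwarded to `callbackmap` instead of dereferencing a null `result`.
function makePageCallback(page, callbackmap) {
  return function (err, result) {
    if (err || !result) {
      return callbackmap(err || new Error('Conversion returned no result for page ' + page));
    }
    result.page = page;
    callbackmap(null, result);
  };
}
```

With this in place, a failing page surfaces as an error in the final callback rather than as an uncaught TypeError.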
Hi, I'm getting this error even when I try the "matteocontrin" solution...
Uncaught Error: Command failed: identify -format %n C:/Users/Username/Desktop/prueba.pdf
"identify" is not recognized as an internal or external command,
operable program or batch file.
Can you help me?
@fitraditya thank you so much for this module. I wanted to know which exact versions of GM and IM should I be using. I'm using it on heroku's node app which runs on Linux 14.04. Would you be kind enough to specify a buildpack for GM and IM with the appropriate version or the correct versions of gm and IM. Much appreciated. Thanks!
Node: v5.10.0
System: Yosemite
pdf2img: 0.1.2
code:
var target_path = path.resolve(__dirname, '../', '../', '../', 'builds/', 'development/', 'upload/', 'images/');
var input = req.files.file.path;
console.log('input: ' + input);
console.log('target_path: ' + target_path);
pdf2img.setOptions({
type: 'png', // png or jpeg, default png
size: 1024, // default 1024
density: 600, // default 600
outputdir: target_path
});
pdf2img.convert(input, function(err, info) {
console.log(info);
});
Error:
Uncaught Exception:
TypeError: Bad argument
at TypeError (native)
at ChildProcess.spawn (internal/child_process.js:274:26)
at exports.spawn (child_process.js:343:9)
at PDF.exec (/app/node_modules/pdfinfo/lib/pdfinfo.js:62:17)
at async.waterfall.pages (/app/node_modules/pdf2img/lib/pdf2img.js:43:18)
at fn (/app/node_modules/pdf2img/node_modules/async/lib/async.js:638:34)
at Immediate._onImmediate (/app/node_modules/pdf2img/node_modules/async/lib/async.js:554:34)
Logs:
input: /app/builds/development/upload/images/e874b364713af5f7f7f1bc8e8a924fcb.pdf
target_path: /app/builds/development/upload/images
As seen in the code, the output filename is hard-wired to "test_" (+pagenumber).
This is somewhat inconvenient and should be configurable in the output options.
This feature is listed in the Todos.
I added callbackreturn(null, {result: 'page_processed', message: result}) inside the async.eachSeries loop.
This callback would be useful for those who want a server to report progress to someone uploading a PDF via a web system.