pdf-img-convert.js's Introduction

pdf-img-convert.js

A pure javascript package to convert a PDF into images

This package is powered mainly by Mozilla's PDF.js

Motivation

There are already many solutions for converting PDFs with JavaScript, but they all make heavy use of the filesystem in the form of temporary files and rely on non-native binaries such as Ghostscript.

This solution uses only JavaScript arrays, which simplifies the pipeline significantly and (hopefully) makes it faster.

Installation

npm install pdf-img-convert

Usage

The package returns an Array of Uint8Array objects, each of which represents an image encoded in PNG format.

Here are some examples of its usage. Import the module first:

var pdf2img = require('pdf-img-convert');

The package has one function, convert. It accepts the following kinds of PDF input:

  • URL of a PDF (e.g. www.example.com/a.pdf)

  • Path to a local pdf file (e.g. ../example.pdf)

  • A Buffer object containing PDF data

  • A Uint8Array object containing PDF data

  • Base64-encoded PDF data

NB: convert is an asynchronous function, so it returns a Promise.

The output can be manipulated using the conversion_config argument mentioned below.

Here's an example of how to use it outside an async function, using the returned Promise's then method:

// Both HTTP and local paths are supported
var outputImages1 = pdf2img.convert('http://www.example.com/pdf_online.pdf');
var outputImages2 = pdf2img.convert('../pdf_in_local_filesystem.pdf');

// From here, the images can be used for other stuff or just saved if that's required:

var fs = require('fs');

outputImages1.then(function (outputImages) {
  // Write each page image to its own PNG file
  for (var i = 0; i < outputImages.length; i++) {
    fs.writeFile("output" + i + ".png", outputImages[i], function (error) {
      if (error) { console.error("Error: " + error); }
    });
  }
});

It's a lot easier and cleaner to implement inside an async function using await:

var fs = require('fs');

(async function () {
  var pdfArray = await pdf2img.convert('http://www.example.com/pdf_online.pdf');
  console.log("saving");
  for (var i = 0; i < pdfArray.length; i++) {
    fs.writeFile("output" + i + ".png", pdfArray[i], function (error) {
      if (error) { console.error("Error: " + error); }
    }); // writeFile
  } // for
})();
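
The save loop used in both examples can be factored into a small helper. The sketch below is not part of the package: the name savePages and the output directory argument are illustrative assumptions, and it works on any array of Uint8Array objects such as the one convert resolves to.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of pdf-img-convert): writes each image in
// `images` to `dir` as output0.png, output1.png, ... and resolves to the
// list of file paths that were written.
async function savePages(images, dir) {
  await fs.promises.mkdir(dir, { recursive: true });
  const written = [];
  for (let i = 0; i < images.length; i++) {
    const file = path.join(dir, 'output' + i + '.png');
    await fs.promises.writeFile(file, images[i]);
    written.push(file);
  }
  return written;
}

// Usage, assuming the package is installed:
// const pdf2img = require('pdf-img-convert');
// pdf2img.convert('../example.pdf').then((images) => savePages(images, './out'));
```

Awaiting each write keeps errors in one place: a failed write rejects the returned Promise instead of being logged inside a callback.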

There is also an optional second conversion_config argument which accepts an object like this:

{
  width: 100, // Number in px
  height: 100, // Number in px
  page_numbers: [1, 2, 3], // A list of pages to render instead of all of them
  base64: true,
  scale: 2.0
}

(Any of these attributes can be omitted from the object - they're all optional)

  • width or height controls the size of the output images. Supply one or the other; height is ignored if width is also supplied.

  • page_numbers controls which pages are rendered - pages are 1-indexed.

  • base64 should be set to true if a base64-encoded image output is required. Otherwise it'll just output an array of Uint8Arrays.

  • scale is the viewport scale ratio, which defaults to 1 (original width and height).
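
Because the base64 option changes the element type of the result, calling code may want to normalise both cases. The sketch below is an illustration, not part of the package: toPngBuffer is a hypothetical name, and the handling of a data-URI prefix is an assumption, since the description above does not say whether the base64 string carries one.

```javascript
// Hypothetical helper (not part of pdf-img-convert): normalises one element
// of the convert() result to a Node.js Buffer of PNG bytes.
function toPngBuffer(image) {
  if (typeof image === 'string') {
    // Assumption: the string is base64, possibly with a data-URI prefix.
    const b64 = image.replace(/^data:image\/png;base64,/, '');
    return Buffer.from(b64, 'base64');
  }
  // Uint8Array: view the same bytes as a Buffer without copying them.
  return Buffer.from(image.buffer, image.byteOffset, image.byteLength);
}

// Usage, assuming the package is installed:
// const pdf2img = require('pdf-img-convert');
// const pages = await pdf2img.convert('../example.pdf', {
//   page_numbers: [1],
//   base64: true,
//   scale: 2.0
// });
// const firstPage = toPngBuffer(pages[0]);
```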

Contributing

If you'd like to contribute, please do submit a pull request!

Once you've finalised your changes, please include a summary of these changes under the [Unreleased] section of CHANGELOG.md in this format.

pdf-img-convert.js's Issues

Dependencies are too large

I tried to use the package in an AWS Lambda but couldn't deploy it because, when extracted, function code combined with layers exceeds the maximum allowed size.
I've checked that after installing this package, ./node_modules/ grows by more than 120 MB, which seems like quite a lot.
Is it possible to reduce it?

TS compatibility -> .then does not exist | not so useless - useless await

The first example definitely does not compile with TS:
Property 'then' does not exist on type '(string | Uint8Array)[]'.ts(2339)

The second example runs once we add some var declarations, but with this hint:
'await' has no effect on the type of this expression.ts(80007)

So here is my little fix to your d.ts file:
Line 13 -> ): Promise<string[]|Uint8Array[]>

As you correctly mentioned, your async function returns a resolvable Promise, so we have to tell TS that.

Thanks for your work and sharing!

Memory leak problem with large files or files with many pages

<--- Last few GCs --->

[55692:0x6b35cb0] 18711 ms: Mark-sweep (reduce) 29.5 (49.9) -> 29.4 (32.6) MB, 22.0 / 0.0 ms (average mu = 0.892, current mu = 0.001) external memory pressure GC in old space requested
[55692:0x6b35cb0] 18726 ms: Mark-sweep (reduce) 29.4 (32.6) -> 29.3 (31.9) MB, 14.7 / 0.0 ms (average mu = 0.818, current mu = 0.003) external memory pressure GC in old space requested

<--- JS stacktrace --->

FATAL ERROR: v8::ArrayBuffer::New Allocation failed - process out of memory
1: 0xb06730 node::Abort() [node]
2: 0xa1b6d0 [node]
3: 0xce1e60 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xce2207 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xce22eb [node]
6: 0xcf2f25 [node]
7: 0x7fd851c39634 Context2d::GetImageData(Nan::FunctionCallbackInfo<v8::Value> const&) [path_to_project/node_modules/canvas/build/Release/canvas.node]
8: 0x7fd851c2ae58 [path_to_project/node_modules/canvas/build/Release/canvas.node]
9: 0xd3e3ce [node]
10: 0xd3f7ef v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x15da239 [node]
Aborted (core dumped)
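
One thing that might be worth trying, offered as an untested assumption rather than a confirmed fix: if peak memory grows with the number of pages rendered per call, converting in small batches through the page_numbers option and saving each batch before starting the next could keep usage lower. The pageBatches helper below is hypothetical, and the total page count has to be known from elsewhere.

```javascript
// Hypothetical helper: splits pages 1..totalPages into arrays of at most
// batchSize page numbers (1-indexed), suitable for the page_numbers option.
function pageBatches(totalPages, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    const batch = [];
    for (let p = start; p <= end; p++) batch.push(p);
    batches.push(batch);
  }
  return batches;
}

// Usage sketch, assuming the package is installed and the file has 208 pages:
// for (const page_numbers of pageBatches(208, 10)) {
//   const images = await pdf2img.convert('big.pdf', { page_numbers });
//   // write `images` to disk here, then let them go out of scope
// }
```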

Converts PDF without Fonts/Text

I'm trying to convert some shipping labels to PNG. It converts the barcodes and images, but no text/fonts. I already installed Font fix but it doesn't work.

Fillable PDF Forms

Loving the package - the only issue I've run into so far is that form fields that are filled out in a PDF are converted into blank fields after the PDF is converted into an image.

I was wondering if anyone knows of any fixes or workarounds for this - or if I'm just doing something wrong lol - thanks in advance!

Error

Anyone with this error?
I'm using Nodejs 16.

UnhandledRejection: {(intermediate value)(intermediate value)(intermediate value)} is not a function
at processTicksAndRejections (node:internal/process/task_queues:96:5)

invalid ELF header

I'm running into this issue after installing the module. It's crashing my docker container. Not sure if this error output is useful to you... also not sure what exactly is the issue here as I'm not too familiar with developing node modules. :P

payoff_1      | internal/modules/cjs/loader.js:1194
payoff_1      |   return process.dlopen(module, path.toNamespacedPath(filename));
payoff_1      |                  ^
payoff_1      | 
payoff_1      | Error: /app/node_modules/canvas/build/Release/canvas.node: invalid ELF header
payoff_1      |     at Object.Module._extensions..node (internal/modules/cjs/loader.js:1194:18)
payoff_1      |     at Module.load (internal/modules/cjs/loader.js:993:32)
payoff_1      |     at Function.Module._load (internal/modules/cjs/loader.js:892:14)
payoff_1      |     at Module.require (internal/modules/cjs/loader.js:1033:19)
payoff_1      |     at require (internal/modules/cjs/helpers.js:72:18)
payoff_1      |     at Object.<anonymous> (/app/node_modules/canvas/lib/bindings.js:3:18)
payoff_1      |     at Module._compile (internal/modules/cjs/loader.js:1144:30)
payoff_1      |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:1164:10)
payoff_1      |     at Module.load (internal/modules/cjs/loader.js:993:32)
payoff_1      |     at Function.Module._load (internal/modules/cjs/loader.js:892:14)
payoff_1      |     at Module.require (internal/modules/cjs/loader.js:1033:19)
payoff_1      |     at require (internal/modules/cjs/helpers.js:72:18)
payoff_1      |     at Object.<anonymous> (/app/node_modules/canvas/lib/canvas.js:9:18)
payoff_1      |     at Module._compile (internal/modules/cjs/loader.js:1144:30)
payoff_1      |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:1164:10)
payoff_1      |     at Module.load (internal/modules/cjs/loader.js:993:32)
payoff_1      |     at Function.Module._load (internal/modules/cjs/loader.js:892:14)
payoff_1      |     at Module.require (internal/modules/cjs/loader.js:1033:19)
payoff_1      |     at require (internal/modules/cjs/helpers.js:72:18)
payoff_1      |     at Object.<anonymous> (/app/node_modules/canvas/index.js:1:16)
payoff_1      |     at Module._compile (internal/modules/cjs/loader.js:1144:30)
payoff_1      |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:1164:10)
payoff_1      | [nodemon] app crashed - waiting for file changes before starting...

Does not work with electron

Description

I like the idea of this lib, and I was surprised to see someone else working on the same thing a few hours ago 😄

Apparently the dependency on canvas causes issues with other flavors of JS like Electron.
It seems that only Node bindings are handled.

Error: The module '\\reader\node_modules\canvas\build\Release\canvas.node'
was compiled against a different Node.js version using
NODE_MODULE_VERSION 72. This version of Node.js requires
NODE_MODULE_VERSION 76. Please try re-compiling or re-installing
the module (for instance, using `npm rebuild` or `npm install`).
    at process.func [as dlopen] (electron/js2c/asar.js:140:31)
    at Object.Module._extensions..node (internal/modules/cjs/loader.js:1016:18)
    at Object.func [as .node] (electron/js2c/asar.js:140:31)
    at Module.load (internal/modules/cjs/loader.js:816:32)
    at Module._load (internal/modules/cjs/loader.js:728:14)
    at Function.Module._load (electron/js2c/asar.js:748:26)
    at Module.require (internal/modules/cjs/loader.js:853:19)
    at require (internal/modules/cjs/helpers.js:74:18)
    at Object.<anonymous> (\reader\node_modules\canvas\lib\bindings.js:3:18)
    at Module._compile (internal/modules/cjs/loader.js:968:30)

Does this work in Next js?

I can get this working in node, but when I try to convert it to a Next js 14 server action, I get this error:

⨯ node_modules\pdf-img-convert\node_modules\pdfjs-dist\legacy\build\pdf.js (5903:35) @ eval
⨯ Error: Setting up fake worker failed: "Cannot find module './pdf.worker.js'

`Invalid PDF structure.` with `node-latex` generated contents

Expected: An image buffer to be created, got:

/path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:543
  BaseException.prototype = new Error();
                            ^
Error
    at BaseExceptionClosure (/path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:543:29)
    at Array.<anonymous> (/path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:546:2)
    at __w_pdfjs_require__ (//path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:24153:41)
    at /path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:24393:13
    at /path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:24444:3
    at /path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:24447:12
    at webpackUniversalModuleDefinition (/path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:25:20)
    at Object.<anonymous> (/path/to/node_modules/pdf-img-convert/node_modules/pdfjs-dist/legacy/build/pdf.js:32:3)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1435:10) {
  message: 'Invalid PDF structure.',
  name: 'InvalidPDFException'
}

Steps to Reproduce:

import { Readable, Writable } from 'stream'
import latex from 'node-latex'
import { convert } from 'pdf-img-convert'

function latex_to_string(pdf) {
    let buffer = ''

    return new Promise((res, rej) => {
        pdf.pipe(new Writable({
            write: function (chunk, encoding, next) {
                buffer += chunk.toString()
                next()
            }
        }))

        pdf.on('error', rej)
        pdf.on('finish', () => res(buffer))
    })
}

const pdf = await latex_to_pdf(`
%\\documentclass{document}
\\documentclass[12pt, letterpaper]{article}

\\begin{document}
    Test
\\end{document}
`)

const img = await convert(Buffer.from(pdf))
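
One possible cause, offered as an assumption rather than a confirmed diagnosis: the reproduction accumulates the PDF stream with buffer += chunk.toString(), which decodes binary bytes as UTF-8 text and cannot be reversed by Buffer.from. The sketch below collects a stream into a Buffer without decoding; streamToBuffer is a hypothetical helper name, not something provided by node-latex or pdf-img-convert.

```javascript
// Hypothetical helper: collects every chunk of a readable stream into a
// single Buffer without decoding the bytes as text.
function streamToBuffer(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    readable.on('error', reject);
    readable.on('end', () => resolve(Buffer.concat(chunks)));
  });
}

// Why string accumulation is lossy: "%PDF" followed by four non-UTF-8 bytes,
// as commonly found near the start of a binary PDF.
const binary = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3]);
const roundTripped = Buffer.from(binary.toString());
console.log(binary.equals(roundTripped)); // false
```

With a helper like this, the resulting Buffer could be passed to convert directly, since a Buffer containing PDF data is one of the documented inputs.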

Environment:

Mac Ventura

TeX 3.141592653 (TeX Live 2023)
kpathsea version 6.3.5

pdfTeX 3.141592653-2.6-1.40.25 (TeX Live 2023)
kpathsea version 6.3.5

PDF-1.5 created

PDF Resolution (PPI)

How does this handle PDF resolution? My best guess is that it's defaulting to 96 PPI? E.g. what if I wanted to convert a 3in x 3in PDF at a resolution of 300 PPI?

There can be some pretty high detailed print-ready pdf documents that can lose detail if we can't control the resolution.
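
A rough way to reason about this, under the assumption that the scale option is passed straight to the PDF.js viewport so that scale 1 renders one pixel per PDF point: PDF user space defaults to 72 points per inch, so a target PPI would translate to scale = PPI / 72. The scaleForPpi helper is hypothetical and the mapping has not been checked against the package source.

```javascript
// PDF user space defaults to 72 units (points) per inch.
const PDF_POINTS_PER_INCH = 72;

// Hypothetical helper: the scale value that would yield `ppi` pixels per
// inch, ASSUMING scale 1 renders one pixel per PDF point.
function scaleForPpi(ppi) {
  return ppi / PDF_POINTS_PER_INCH;
}

const scale = scaleForPpi(300);
// A 3in x 3in page measures 216 x 216 points.
const pixels = Math.round(216 * scale);
console.log(scale.toFixed(4), pixels); // 4.1667 900
```

Under that assumption, { scale: scaleForPpi(300) } would render the 3in x 3in page at about 900 x 900 pixels.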

canvas install OSX is failing

People can probably figure this out, but if you are on an M1 or M2 Mac, you will get this error:

node_modules/canvas npm ERR! command failed

You can solve this with:

brew install pkg-config cairo pango libpng jpeg giflib librsvg

FATAL ERROR: v8::ToLocalChecked Empty MaybeLocal.

Hello my friend. When I try to load a PDF file with 208 pages, the process crashes with the output given below.

How can we move forward?

I would also like to point out that it actually converted another 240-page PDF file fine. It doesn't always give this error.

FATAL ERROR: v8::ToLocalChecked Empty MaybeLocal.
 1: 00007FF644B807BF node_api_throw_syntax_error+175823
 2: 00007FF644B05796 EVP_MD_meth_get_input_blocksize+59654
 3: 00007FF644B0758C node::OnFatalError+252
 4: 00007FF6455B6745 v8::api_internal::ToLocalEmpty+53
 5: 00007FF9E9FFA059 Canvas::getHeight+87113
 6: 00007FF9E9FE4308 Backend::getHeight+1512
 7: 00007FF64556687D v8::internal::Builtins::code+248221
 8: 00007FF645566489 v8::internal::Builtins::code+247209
 9: 00007FF64556674C v8::internal::Builtins::code+247916
10: 00007FF6455665B0 v8::internal::Builtins::code+247504
11: 00007FF64564B2F1 v8::internal::SetupIsolateDelegate::SetupHeap+558449

Warnings

Hi, when I convert the PDF I get multiple warnings in the console that say something like this:
Warning: getPathGenerator - ignoring character: "Error: Requesting object that isn't resolved yet TrebuchetMS_path_1.".
Maybe more than 10 of them.

Operator precedence issue

I think operator precedence will execute ! before instanceof... and line 92 should be:

  else if (!(pdf instanceof Uint8Array)) {

instead of

  else if (!pdf instanceof Uint8Array) {
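
The difference can be checked in isolation. The snippet below is a standalone demonstration, not the package's code; the variable names are illustrative.

```javascript
const pdf = 'not a typed array';

// `!pdf` is evaluated first and yields a boolean; a boolean is never an
// instance of Uint8Array, so this is false for every possible value of pdf.
const buggy = !pdf instanceof Uint8Array;

// With parentheses, the instanceof test runs first and is then negated.
const fixed = !(pdf instanceof Uint8Array);

console.log(buggy, fixed); // false true
```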

Problem with canvas.node

I have a problem running the package on a Windows machine and I get this error every time:

"
node:internal/modules/cjs/loader:1141:19)
at require (node:internal/modules/cjs/helpers:110:18)
at Object.<anonymous> (...\node_modules\canvas\lib\bindings.js:3:18)
at Module._compile (node:internal/modules/cjs/loader:1254:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
at Module.load (node:internal/modules/cjs/loader:1117:32)
at Module._load (node:internal/modules/cjs/loader:958:12) {
code: 'ERR_DLOPEN_FAILED'
}

Node.js v18.16.0
"

Fonts missing

So the conversion is super quick and accurate for all images as far as I can tell. However, none of the text is readable. Instead it is a bunch of blocky characters. So it seems that the fonts are missing.

Any plans to support fonts properly?

Getting invalid pdf structure when running in Render

Code:
https://github.com/pathikrit/newswall

This runs fine locally.

But, when I deploy the app on render: https://newswall.onrender.com/

I get the following error:

Dec 14 02:19:16 PM  Could not convert ./.newspapers/2022-12-14/NY_NYT.pdf to png Error
Dec 14 02:19:16 PM      at BaseExceptionClosure (/opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:543:29)
Dec 14 02:19:16 PM      at Array.<anonymous> (/opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:546:2)
Dec 14 02:19:16 PM      at __w_pdfjs_require__ (/opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:24153:41)
Dec 14 02:19:16 PM      at /opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:24393:13
Dec 14 02:19:16 PM      at /opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:24444:3
Dec 14 02:19:16 PM      at /opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:24447:12
Dec 14 02:19:16 PM      at webpackUniversalModuleDefinition (/opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:25:20)
Dec 14 02:19:16 PM      at Object.<anonymous> (/opt/render/project/src/node_modules/pdfjs-dist/legacy/build/pdf.js:32:3)
Dec 14 02:19:16 PM      at Module._compile (internal/modules/cjs/loader.js:1068:30)
Dec 14 02:19:16 PM      at Object.Module._extensions..js (internal/modules/cjs/loader.js:1097:10) {
Dec 14 02:19:16 PM    message: 'Invalid PDF structure.',
Dec 14 02:19:16 PM    name: 'InvalidPDFException'
Dec 14 02:19:16 PM  }

You can find the offending pdf here:
https://newswall.onrender.com/static/2022-12-14

You can see the WSJ.png worked fine but not the NYT one ...
