shawnliujw / puppeteer-lambda

Module for using Headless-Chrome by Puppeteer on AWS Lambda.

License: MIT License


puppeteer-lambda's Introduction

Puppeteer Lambda

Module for using Headless-Chrome by Puppeteer on AWS Lambda.
The idea comes from Puppeteer Lambda Starter Kit; thanks to Taiki Sakamoto.

How to use

npm install puppeteer-lambda
Add --registry=https://registry.npm.taobao.org/ if you cannot download Chromium from within China.

(async () => {
    const puppeteerLambda = require('puppeteer-lambda');
    const browser = await puppeteerLambda.getBrowser({
        headless: true
    });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    await page.screenshot({path: 'example.png'});

    await browser.close();
})();

NOTE: Closing the browser in the Lambda environment is not recommended. Once it is closed, the Browser object is considered disposed and cannot be used anymore.
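
A minimal sketch of the pattern this note points toward, assuming a warm Lambda container keeps the browser between invocations: close each page when you are done with it and leave the browser open. createHandler and event.url are hypothetical names used for illustration; only getBrowser comes from this README.

```javascript
// Hypothetical wrapper: builds a Lambda handler around any getBrowser function,
// for example require('puppeteer-lambda').getBrowser.
function createHandler(getBrowser) {
  return async (event) => {
    const browser = await getBrowser({ headless: true });
    const page = await browser.newPage();
    try {
      // event.url is an assumed input field, not something the library defines.
      await page.goto(event.url);
      const title = await page.title();
      return { statusCode: 200, body: JSON.stringify({ title }) };
    } finally {
      // Close the page only; the browser stays open for the next invocation.
      await page.close();
    }
  };
}
```

In a Lambda entry file this could be wired up as exports.handler = createHandler(require('puppeteer-lambda').getBrowser).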

Packaging & Deploy

Lambda's memory needs to be set to at least 384 MB. More memory improves the performance of every operation:

512MB -> goto(youtube): 6.481s
1536MB -> goto(youtube): 2.154s

You should also set an environment variable in Lambda:

CUSTOM_CHROME = true

NOTE: This project uses puppeteer, so don't forget to set PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true before running npm install when you prepare the package for Lambda.

1. Chrome in package (recommended)

Run PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true CUSTOM_CHROME=true npm install puppeteer-lambda, then deploy the package to Lambda and set the following environment variable in Lambda.

CUSTOM_CHROME (required): tells the process to use the custom Chrome (the local version, or one downloaded from S3 automatically)

node_modules/puppeteer-lambda should look like:

puppeteer-lambda
│   README.md
│   ...
│
├───chrome
│   │   headless_shell.tar.gz
│
├───node_modules
│   │   ...
│
├───src
│   │   ...
│
└───test
    │   ...

2. Chrome NOT in package

Because Chrome is large, the package may exceed the Lambda package size limit (50 MB), depending on the other modules you include. In that case, put the Chrome binary in S3 and download it at container startup; startup will take longer as a result.
You can also download a specific version of Chrome from Serverless Chrome.

Run PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true npm install puppeteer-lambda, deploy the package, and set the following environment variables on Lambda.

CHROME_BUCKET (required): S3 bucket where Chrome is stored
CHROME_KEY (optional): S3 key. Default: stable-headless-chromium-amazonlinux-2017-03.zip
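
As an illustration of how these two variables fit together, here is a hedged sketch of a pre-flight check you could run before calling getBrowser. resolveChromeLocation is a hypothetical helper and is not part of puppeteer-lambda; the default key is the one listed above.

```javascript
// Hypothetical helper, not part of puppeteer-lambda.
// `env` is a plain object such as process.env.
const DEFAULT_CHROME_KEY = 'stable-headless-chromium-amazonlinux-2017-03.zip';

function resolveChromeLocation(env) {
  if (!env.CHROME_BUCKET) {
    // Fail early with a readable message instead of a later S3 parameter error.
    throw new Error('CHROME_BUCKET is required when Chrome is not in the package');
  }
  return {
    bucket: env.CHROME_BUCKET,
    key: env.CHROME_KEY || DEFAULT_CHROME_KEY, // CHROME_KEY is optional
  };
}
```

Calling resolveChromeLocation(process.env) at the top of a handler would surface a missing bucket name before any download is attempted.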

How to Test

1. Run the tests on your local machine

npm run test

2. Run in an AWS Lambda simulation environment

Test Node.js 8.10: npm run test-node8
Test Node.js 10.x: npm run test-node10

Q&A

Why not use `puppeteer-core`?

In development mode we still need Chromium for debugging, so it is better to use puppeteer, which installs Chromium automatically.

How do we use `puppeteer-lambda` with TypeScript?

The puppeteer-lambda type definitions depend on the @types/puppeteer definitions. You must add @types/puppeteer to your project.

npm install @types/puppeteer

AWS Lambda Version

The prebuilt Chromium v1.0.0-55 currently does not support the AWS Lambda Node.js 10.x runtime, so please use nodejs8.10. If you prefer Node 10.x, please follow the instructions to build your own Chromium and modify the configuration here.
NOTE: Please also have a look at this issue. It seems AWS is changing the Lambda environment; I tried building from amazonlinux 2, which is the base image for nodejs10.x, but it still does not work on lambda:nodejs10.x.


puppeteer-lambda's Issues

Unzip failure with v1.0.15

Hi,

We've been running v1.0.14 with no problems. We just updated to v1.0.15 and now it's failing on invocation with:

Error: invalid signature: 0x88b1f\n at /var/task/node_modules/unzip/lib/parse.js:59:13\n at runCallback (timers.js:794:20)\n at tryOnImmediate (timers.js:752:5)\n at processImmediate [as _immediateCallback] (timers.js:729:5)

Any ideas why? It seems to be related to unzipping the browser zip file. It's the same file we've been using since v1.0.13.
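
One reading of this error, offered as a guess rather than a confirmed diagnosis: a ZIP parser reads the first four bytes of the file as a little-endian signature and expects 0x04034b50 ("PK" followed by 03 04). A gzip stream starts with the bytes 1f 8b 08, and when the fourth byte is 00 those same four bytes read as 0x88b1f. So the object being unzipped may be a gzip archive (for example a .tar.gz) rather than a .zip. The helper below is hypothetical and only illustrates the byte arithmetic.

```javascript
// Hypothetical helper: guess an archive's type from its first bytes.
function detectArchiveType(buffer) {
  if (buffer.length >= 4 && buffer.readUInt32LE(0) === 0x04034b50) {
    return 'zip'; // ZIP local file header signature
  }
  if (buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b) {
    return 'gzip'; // gzip magic number
  }
  return 'unknown';
}

// First bytes of a typical gzip stream: magic 1f 8b, method 08, flags 00.
const gzipHeader = Buffer.from([0x1f, 0x8b, 0x08, 0x00]);
console.log(detectArchiveType(gzipHeader));                  // gzip
console.log('0x' + gzipHeader.readUInt32LE(0).toString(16)); // 0x88b1f
```

Running such a check on the first bytes of the downloaded object would show whether the configured key points at a .zip or at a gzip file.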

won't execute setupS3Chrome or getBrowser()

I have been unable to get past puppeteerLambda.getBrowser() for a few days, so I would be grateful if any of the creators could help or discuss. Everything else works fine; I can connect to the database and so on.
The log shows "Debug: setup s3 chrome. ....", which comes from:

const setupS3Chrome = () => {
    return new Promise((resolve, reject) => {
        const params = {
            Bucket: config.remoteChromeS3Bucket,
            Key: config.remoteChromeS3Key,
        };

I have passed CHROME_BUCKET and CHROME_KEY.

async function parseEngine(commandArray, inPage, iterator, parameters) {
    try {
        let page = inPage;
        if (!page) {
            console.log('getting browser');
            const browser = await puppeteerLambda.getBrowser({
                headless: true,
                slowMo: 100,
                args: ['--no-sandbox', '--disable-setuid-sandbox', '--start-fullscreen', '--window-size=1413,749']
            });
            console.log('opening page');
            page = await browser.newPage();
            await page.setViewport({ width: 1413, height: 749 });
            page.setUserAgent(config.USER_AGENT);
        }

puppeteer-core

Suggestion: Update to use puppeteer-core rather than puppeteer; you won't need to ask people to define PUPPETEER_SKIP_CHROMIUM_DOWNLOAD.

https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteer-vs-puppeteer-core

Chromium version incompatible with puppeteer version?

I'm running "puppeteer-lambda": "1.1.3", on Node 8.10 of lambda, and I'm getting this error from within chromium during a request:

{ Error: Protocol error (Fetch.enable): 'Fetch.enable' wasn't found
    at Promise (/var/task/node_modules/puppeteer/lib/Connection.js:183:56)
    at new Promise (<anonymous>)
    at CDPSession.send (/var/task/node_modules/puppeteer/lib/Connection.js:182:12)
    at NetworkManager._updateProtocolRequestInterception (/var/task/node_modules/puppeteer/lib/NetworkManager.js:139:22)
    at NetworkManager.setRequestInterception (/var/task/node_modules/puppeteer/lib/NetworkManager.js:128:16)
    at Page.setRequestInterception (/var/task/node_modules/puppeteer/lib/Page.js:289:48)
    at Page.<anonymous> (/var/task/node_modules/puppeteer/lib/helper.js:112:23)
    at makePage (/var/task/robots/capture.js:29:16)
    at <anonymous>
    at process._tickDomainCallback (internal/process/next_tick.js:228:7)
  message: 'Protocol error (Fetch.enable): \'Fetch.enable\' wasn\'t found' }
2019-10-03T01:23:43.126Z	edf24267-fab0-4261-b22a-db2c1a0a779e	results [ [ 'Error: Protocol error (Fetch.enable): \'Fetch.enable\' wasn\'t found\n    at Promise (/var/task/node_modules/puppeteer/lib/Connection.js:183:56)\n    at new Promise (<anonymous>)\n    at CDPSession.send (/var/task/node_modules/puppeteer/lib/Connection.js:182:12)\n    at NetworkManager._updateProtocolRequestInterception (/var/task/node_modules/puppeteer/lib/NetworkManager.js:139:22)\n    at NetworkManager.setRequestInterception (/var/task/node_modules/puppeteer/lib/NetworkManager.js:128:16)\n    at Page.setRequestInterception (/var/task/node_modules/puppeteer/lib/Page.js:289:48)\n    at Page.<anonymous> (/var/task/node_modules/puppeteer/lib/helper.js:112:23)\n    at makePage (/var/task/robots/capture.js:29:16)\n    at <anonymous>\n    at process._tickDomainCallback (internal/process/next_tick.js:228:7)'

It looks like there's a closed issue on puppeteer, puppeteer/puppeteer#4542, and their suggestion is to make sure the Chromium version matches the puppeteer version.

Is it possible that without pinning libraries, the current version of this library won't have matching chromium / puppeteer?

If not that, then anything else you can think of would be appreciated. Thank you.

Missing required key 'Bucket' in params

Hi. I am using puppeteer-lambda. I set CUSTOM_CHROME as a Lambda variable, but when I try to execute the function I get this error. Can anyone please tell me where to look or what it means?

2018-10-08T09:35:51.790Z 8ae5a981-cadd-11e8-bbbf-c7ce2c26cd5d { MissingRequiredParameter: Missing required key 'Bucket' in params
    at ParamValidator.fail (/var/task/node_modules/aws-sdk/lib/param_validator.js:50:37)
    at ParamValidator.validateStructure (/var/task/node_modules/aws-sdk/lib/param_validator.js:61:14)
    at ParamValidator.validateMember (/var/task/node_modules/aws-sdk/lib/param_validator.js:88:21)
    at ParamValidator.validate (/var/task/node_modules/aws-sdk/lib/param_validator.js:34:10)
    at Request.VALIDATE_PARAMETERS (/var/task/node_modules/aws-sdk/lib/event_listeners.js:125:42)
    at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:109:20)
    at callNextListener (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:99:12)
    at /var/task/node_modules/aws-sdk/lib/event_listeners.js:85:9
    at finish (/var/task/node_modules/aws-sdk/lib/config.js:322:7)
    at /var/task/node_modules/aws-sdk/lib/config.js:340:9
    at EnvironmentCredentials.get (/var/task/node_modules/aws-sdk/lib/credentials.js:126:7)
    at getAsyncCredentials (/var/task/node_modules/aws-sdk/lib/config.js:334:24)
    at Config.getCredentials (/var/task/node_modules/aws-sdk/lib/config.js:354:9)
    at Request.VALIDATE_CREDENTIALS (/var/task/node_modules/aws-sdk/lib/event_listeners.js:80:26)
    at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:105:18)
    at Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:81:10)
  message: 'Missing required key \'Bucket\' in params',
  code: 'MissingRequiredParameter',
  time: 2018-10-08T09:35:51.788Z }

Error with unzipper

Hi,

Thanks for this awesome work. I am trying to deploy it with my Chrome in S3, but I am receiving an error while unzipping.

{"errorMessage":"invalid signature: 0x88b1f","errorType":"Error","stackTrace":["/opt/nodejs/node_modules/unzipper/lib/parse.js:62:26","tryCatcher (/opt/nodejs/node_modules/unzipper/node_modules/bluebird/js/release/util.js:16:23)"

Thanks in advance.

Assertion error

Hello,

I am attempting to use your package here. However I get this error:

{ "errorMessage": "Chromium revision is not downloaded. Run \"npm install\" or \"yarn install\"", "errorType": "AssertionError [ERR_ASSERTION]", "stackTrace": [ "Console.assert (console.js:194:23)", "Function.launch (/var/task/node_modules/puppeteer/lib/Launcher.js:97:15)", "<anonymous>" ] }

In particular, this is the stack trace with the debug flag on:

TypeError: Cannot read property 'version' of null
    at _callee3$ (~/node_modules/puppeteer-lambda/src/index.bundle.js:210:40)
    at tryCatch (~/node_modules/regenerator-runtime/runtime.js:62:40)
    at Generator.invoke [as _invoke] (~/node_modules/regenerator-runtime/runtime.js:296:22)
    at Generator.prototype.(anonymous function) [as next] (~/node_modules/regenerator-runtime/runtime.js:114:21)
    at step (~/node_modules/babel-runtime/helpers/asyncToGenerator.js:17:30)
    at ~/node_modules/babel-runtime/helpers/asyncToGenerator.js:35:14
    at new Promise (<anonymous>)
    at new F (~/node_modules/core-js/library/modules/_export.js:36:28)
    at ~/node_modules/babel-runtime/helpers/asyncToGenerator.js:14:12
    at isBrowserAvailable (~/node_modules/puppeteer-lambda/src/index.bundle.js:235:22)

It looks like isBrowserAvailable does not handle the case where it gets a falsy value and then looks for a version on it.
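
A hedged sketch of what a null-safe check could look like. This is not the library's actual isBrowserAvailable; it only assumes that the browser object exposes Puppeteer's browser.version() method, which appears in the stack traces on this page.

```javascript
// Sketch only; the real isBrowserAvailable in puppeteer-lambda may differ.
async function isBrowserAvailable(browser) {
  if (!browser) {
    return false; // nothing launched yet, so there is no version to read
  }
  try {
    await browser.version(); // rejects if the connection is already closed
    return true;
  } catch (err) {
    return false; // treat any failure as "not available" so the caller can relaunch
  }
}
```

With a check like this, a null or closed browser leads to a fresh launch instead of an exception.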

Thank you

Timeout error when navigating

Hey, I deployed to AWS and successfully got a page title but I got a timeout error when navigating. There is no issue running in a normal node env.

Navigation Timeout Exceeded: 30000ms exceeded

page.click(SEARCH_SELECTOR); 
await page.waitForNavigation();
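
A possible cause, stated as an assumption because the full script is not shown: if the click triggers the navigation before waitForNavigation starts listening, the wait can miss the event and run into the 30000 ms timeout. The Puppeteer documentation recommends starting both together. clickAndWait is a hypothetical helper name; slow network access from Lambda could also produce the same timeout.

```javascript
// Hypothetical helper: click an element and wait for the navigation it triggers.
async function clickAndWait(page, selector) {
  const [response] = await Promise.all([
    page.waitForNavigation(), // start listening before the click happens
    page.click(selector),     // then trigger the navigation
  ]);
  return response;
}
```

In the snippet above, the two lines would become await clickAndWait(page, SEARCH_SELECTOR).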

libnss3.so

Same error as #24. The solution there was to use Node 8.x, but AWS no longer allows that version.

I got an error on page.title

Thanks for your library.

When I try it, I get this error.
Do you have any idea?

Code on Lambda


exports.handler = async (event) => {
    const puppeteerLambda = require('puppeteer-lambda');
    const browser = await puppeteerLambda.getBrowser({
    headless: true
    });
    const page = await browser.newPage();
    await page.goto('https://www.yahoo.co.jp');
    let title = await page.title;

    await browser.close(); 

    // TODO implement
    const response = {
        statusCode: 200,
        body: JSON.stringify(`hello ${title}`),
    };
    return response;
};

response

{
  "statusCode": 200,
  "body": "\"hello function (...args) {\\n        const syncStack = new Error();\\n        return method.call(this, ...args).catch(e => {\\n          const stack = syncStack.stack.substring(syncStack.stack.indexOf('\\\\n') + 1);\\n          const clientStack = stack.substring(stack.indexOf('\\\\n'));\\n          if (!e.stack.includes(clientStack))\\n            e.stack += '\\\\n  -- ASYNC --\\\\n' + stack;\\n          throw e;\\n        });\\n      }\""
}

log

START RequestId: 621187f0-e6f0-4508-b3dd-75028fd1a93d Version: $LATEST
2019-03-02T00:32:51.136Z	621187f0-e6f0-4508-b3dd-75028fd1a93d	Launch chrome: HeadlessChrome/69.0.3497.81
2019-03-02T00:32:52.738Z	621187f0-e6f0-4508-b3dd-75028fd1a93d	******************************************************************************************************************************  
2019-03-02T00:32:52.739Z	621187f0-e6f0-4508-b3dd-75028fd1a93d	Suggest not to close browser in Lambda ENV, if close it , the Browser object is considered disposed and cannot be used anymore. 
2019-03-02T00:32:52.739Z	621187f0-e6f0-4508-b3dd-75028fd1a93d	******************************************************************************************************************************  
END RequestId: 621187f0-e6f0-4508-b3dd-75028fd1a93d
REPORT RequestId: 621187f0-e6f0-4508-b3dd-75028fd1a93d	Duration: 7945.49 ms	Billed Duration: 8000 ms 	Memory Size: 768 MB	Max Memory Used: 433 MB

I set this env variable:

CUSTOM_CHROME true
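
A likely explanation, based on the response body above: it contains the source text of a function, which is what you get when page.title is read as a property instead of being called. In Puppeteer, page.title() is a method that returns a promise for the title string. readTitle below is a hypothetical helper that shows the call form.

```javascript
// Hypothetical helper: read the title of an already-open Puppeteer page.
async function readTitle(page) {
  // Note the parentheses: `page.title` alone is the function object itself.
  return page.title();
}
```

In the handler above, the corresponding change would be let title = await page.title();.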

_getBrowser in src/index.js stuck in infinite loop

In src/index.js the _getBrowser routine gets stuck in an infinite loop when the browser element is closed and opened again by another lambda instance

browser.close();

// Lambda invoked again with reused context
puppeteerLambda.getBrowser({});    //Stuck forever

AWS Lambda reuses execution context, so if invoked too soon again, it gets stuck in a loop.

In src/index.js

if (null === globalBrowser && !getting) {
    //....;
}else{
    //infinite loop;
}

The if condition always fails if globalBrowser is null and the value of getting is true. On further inspection, the value of getting is not being set to false ever. Adding getting = false inside the if condition after the browser is successfully launched fixes the issue
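
The fix described above can be sketched as follows. This is not the actual src/index.js; it is a minimal illustration, assuming a launch function that returns a promise for a browser, in which the "getting" flag is replaced by a shared in-flight promise that is always cleared, whether the launch succeeds or fails. createBrowserGetter and launch are hypothetical names. Detecting a browser that was closed after being cached is a separate concern that this sketch does not cover.

```javascript
// Sketch only: NOT the actual src/index.js.
// `launch` is any async function that resolves to a browser-like object.
function createBrowserGetter(launch) {
  let globalBrowser = null; // cached browser, reused across warm invocations
  let pending = null;       // in-flight launch, shared by concurrent callers

  return function getBrowser(options) {
    if (globalBrowser !== null) {
      return Promise.resolve(globalBrowser);
    }
    if (pending === null) {
      pending = launch(options).then(
        (browser) => {
          globalBrowser = browser;
          pending = null; // reset on success
          return browser;
        },
        (err) => {
          pending = null; // reset on failure too, so the next call can retry
          throw err;
        }
      );
    }
    return pending; // callers wait on the promise instead of looping on a flag
  };
}
```

Because waiting callers hold a promise rather than polling a flag, there is no loop that can spin forever if the flag is never reset.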

error during run

when I run on lambda I get the following error:

/tmp/headless-chromium: error while loading shared libraries: libnss3.so: cannot open shared object file: No such file or directory

TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

at onClose (/var/task/node_modules/puppeteer/lib/Launcher.js:342:14)
at Interface.helper.addEventListener (/var/task/node_modules/puppeteer/lib/Launcher.js:331:50)
at Interface.emit (events.js:194:15)
at Interface.EventEmitter.emit (domain.js:441:20)
at Interface.close (readline.js:379:8)
at Socket.onend (readline.js:157:10)
at Socket.emit (events.js:194:15)
at Socket.EventEmitter.emit (domain.js:441:20)
at endReadableNT (_stream_readable.js:1125:12)
at process._tickCallback (internal/process/next_tick.js:63:19) Failed to launch chrome!
/tmp/headless-chromium: error while loading shared libraries: libnss3.so: cannot open shared object file: No such file or directory

TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

Unzipped size limit exceeded when trying to deploy

Thanks for the great package.

We've managed to get everything prepared for running our script with Lambda, and things work just fine locally using the lambda-local npm package.

When trying to deploy on lambda we're faced with an "Unzipped size must be smaller than" error.

What's the solution to overcome this limitation?

Thanks!

HTTPS links in HTML cause 502 response

HTML content that contains HTTPS links causes Lambda to respond with a 502 error.

I used this to render the HTML:
await page.goto(`data:text/html;base64,${Buffer.from(html).toString('base64')}`, {
    waitUntil: 'networkidle0'
});

I also tried this:
await page.setContent(html);

When I changed the links to HTTP, the HTML rendered as expected.

I wrangled with this for several hours and then moved to chrome-aws-lambda and the issue went away without changing the puppeteer code.

Unzipped size must be smaller than xxxxxbytes

I am trying to deploy my Lambda project with puppeteer-lambda. It works locally, but it fails when I deploy to AWS:

An error occurred: ConvertToPdfLambdaFunction - Unzipped size must be smaller than 262144000 bytes (Service: AWSLambda; Status Code: 400; Error Code: InvalidParameterValueException
I use this command for deployment:

PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true && yarn install && yarn test && SLS_DEBUG=* yarn build && serverless deploy --stage dev

Do you have any idea how to fix that?
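
Two observations, offered as possible factors rather than a confirmed diagnosis. First, the limit in the error, 262144000 bytes, is 250 MiB. Second, in PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true && yarn install the assignment is a separate shell command, so unless the variable is exported it is not passed into yarn's environment, and Puppeteer's install script may still download Chromium into the package. The sketch below passes the variable explicitly from a Node script; installEnv and runInstall are hypothetical helper names.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: copy the given environment and add the skip flag.
function installEnv(baseEnv) {
  return Object.assign({}, baseEnv, { PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: 'true' });
}

// Hypothetical helper: run an install command with the flag set in the
// child's environment, e.g. runInstall('yarn', ['install']).
function runInstall(command, args) {
  return spawnSync(command, args, {
    env: installEnv(process.env),
    stdio: 'inherit',
  });
}
```

The shell equivalent is to put the assignment on the same command, as the README does: PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true yarn install.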

_getBrowser : SyntaxError: Unexpected token (

I have this issue with a basic example:

node_modules/puppeteer-lambda/src/index.js:11
const _getBrowser = async (options) => {
^ SyntaxError: Unexpected token (

It happens on the line

const puppeteerLambda = require("puppeteer-lambda");

Needs documentation on how to use

Awesome work here! Let's make this easy to use with a readme!

Chromium revision is not downloaded

Hi, I have tried multiple options to run a puppeteer Lambda function.
If anyone can help me I will be very grateful.
This is my package.json:
{
  "name": "scraper-surveilance",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "update": "claudia update --keep --use-s3-bucket claudia-lambda-uploads --set-env-from-json test-environment.json"
  },
  "config": { "PUPPETEER_SKIP_CHROMIUM_DOWNLOAD": true },
  "keywords": [],
  "author": "Ratko Korlevski",
  "license": "ISC",
  "dependencies": {
    "assert": "^1.4.1",
    "mongodb": "^3.1.6",
    "puppeteer-lambda": "^1.0.12"
  },
  "devDependencies": { "claudia": "^5.1.2" },
  "optionalDependencies": { "aws-sdk": "^2.327.0" }
}
When I run a test on my Lambda function I get this error:
Error: Chromium revision is not downloaded. Run "npm install" or "yarn install"
    at Launcher.launch (/var/task/node_modules/puppeteer/lib/Launcher.js:112:15)
    at <anonymous>

Error: WebSocket is not open: readyState 3 (CLOSED)

I get this error exactly every other time I run the lambda. Only running the code below.

(async () => {
    const browser = await puppeteer.getBrowser({
      headless: true
    });
    await browser.close();
    callback(null, { test: 'hello' });
  })();

START RequestId: 35bb8dae-fe2c-11e8-be11-63fb084f25a3 Version: $LATEST
2018-12-12T16:37:26.901Z	35bb8dae-fe2c-11e8-be11-63fb084f25a3	Error: WebSocket is not open: readyState 3 (CLOSED)
    at WebSocket.send (/var/task/node_modules/ws/lib/websocket.js:322:19)
    at WebSocketTransport.send (/var/task/node_modules/puppeteer/lib/WebSocketTransport.js:57:14)
    at Connection.send (/var/task/node_modules/puppeteer/lib/Connection.js:71:21)
    at Browser._getVersion (/var/task/node_modules/puppeteer/lib/Browser.js:267:29)
    at Browser.version (/var/task/node_modules/puppeteer/lib/Browser.js:242:32)
    at Browser.<anonymous> (/var/task/node_modules/puppeteer/lib/helper.js:145:23)
    at isBrowserAvailable (/var/task/node_modules/puppeteer-lambda/src/index.js:53:29)
    at _getBrowser (/var/task/node_modules/puppeteer-lambda/src/index.js:12:41)
    at Object.exports.getBrowser (/var/task/node_modules/puppeteer-lambda/src/index.js:47:11)
    at /var/task/index.js:15:37
    at exports.handler (/var/task/index.js:34:5)
END RequestId: 35bb8dae-fe2c-11e8-be11-63fb084f25a3
REPORT RequestId: 35bb8dae-fe2c-11e8-be11-63fb084f25a3	Duration: 20020.45 ms	Billed Duration: 20000 ms 	Memory Size: 1024 MB	Max Memory Used: 215 MB	
2018-12-12T16:37:46.895Z 35bb8dae-fe2c-11e8-be11-63fb084f25a3 Task timed out after 20.02 seconds

AWS SDK is not needed when publishing to lambda

When trying to use this package for the first time,
I couldn't get the size below 52MB and almost called it a lost cause.

However, AWS Lambda apparently does not require the AWS SDK in node_modules to run correctly. After deleting the node_modules directory inside puppeteer-lambda, the zip size was 47MB and everything worked correctly.

A note in the readme to help out other people may be a good idea.

Aside from that, great work, worked first time for me where the other packages failed.

Missing default fonts

Hello,
I managed to deploy fine, but default system fonts such as Times New Roman seem to be missing when opening a local HTML file.
I'm using "chrome in package", and Chrome was downloaded from: https://raw.githubusercontent.com/shawnLiujianwei/puppeteer-lambda-binary/master/chrome/headless_shell.tar.gz

I put it at puppeteer-lambda/chrome/headless_shell.tar.gz

Failed to install puppeteer-lambda on mac

I followed the instructions in the README to install this dependency, but it gets stuck on the download. I ran this command on a Mac. Node is v8.9.0 and npm is 5.6.0. Below is the output. I wonder whether I need to set the env variable CUSTOM_CHROME=true on the local command line or in the Lambda function?

$ CUSTOM_CHROME=true npm install puppeteer-lambda
> [email protected] install /Users/joeyzhao/dev/bigcrunch/TheBigCrunch/TheLambda/lambdas/node_modules/puppeteer-core
> node install.js


> [email protected] install /Users/joeyzhao/dev/bigcrunch/TheBigCrunch/TheLambda/lambdas/node_modules/puppeteer-lambda
> node src/install.js

downloading chrome ...

disable logs

Could we have an option to disable the logs?
They are a bit expensive in AWS lambda
Thank you

Is there a way to make this library work in Mac?

I am using puppeteer-lambda on a Mac, but I find this library hard to use there. I get a permission error about the /tmp directory: Error: EACCES: permission denied, unlink '/tmp'. Also, where can I find headless Chrome for Mac? I searched this project, https://github.com/adieuadieu/serverless-chrome/releases, but it doesn't have a Mac version.