browserless

A puppeteer-like Node.js library for interacting with Headless Chrome in production scenarios.

Why

Although puppeteer may seem like enough, there is a set of use cases that make sense to build on top of it and that are necessary for robust production scenarios, such as:

  • Sensible defaults, aborting unnecessary requests based on what you are doing (e.g., aborting image requests if you just want to get the HTML content).
  • Easily creating a pool of instances (via @browserless/pool).
  • A built-in adblocker for aborting ad requests.

Install

browserless is built on top of puppeteer, so you need to install it as well.

$ npm install puppeteer browserless --save

You can use browserless together with puppeteer, puppeteer-core or puppeteer-firefox.

Internally, the library is divided into different packages based on functionality.

Usage

The browserless API is similar to puppeteer's, but it does more things under the hood (not too much, I promise).

For example, to take a screenshot, just do:

const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(buffer => {
    console.log(`your screenshot is here!`)
  })

You can see more common recipes at @browserless/examples.

Basic

All methods follow the same interface:

  • <url>: The target URL. It's required.
  • [options]: Specific settings for the method. It's optional.

The methods return a Promise, or use a Node.js-style callback if you pass a function as the last parameter.
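
The Promise-or-callback convention above is a common Node.js pattern. The following is a minimal sketch of how a single async function can serve both styles; `withCallback` and `fakeHtml` are hypothetical names used only for illustration, not part of the browserless API.

```javascript
// Hypothetical helper: returns a Promise by default, or calls a Node.js-style
// callback `(error, result)` when the last argument is a function.
function withCallback (fn) {
  return function (...args) {
    const last = args[args.length - 1]
    if (typeof last !== 'function') return fn(...args)

    const callback = args.pop()
    fn(...args).then(
      result => callback(null, result),
      error => callback(error)
    )
  }
}

// Hypothetical method following the `<url>, [options]` interface
const fakeHtml = withCallback(async (url, options = {}) => `<html><!-- ${url} --></html>`)

// Promise style
fakeHtml('https://example.com').then(html => console.log(html))

// Callback style
fakeHtml('https://example.com', {}, (error, html) => {
  if (error) return console.error(error)
  console.log(html)
})
```

Passing both handlers to a single `.then(onFulfilled, onRejected)` means an exception thrown by the callback itself is not routed back into the same callback a second time.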

.constructor(options)

It creates the browser instance, using the puppeteer.launch method.

// Creating a simple instance
const browserless = require('browserless')()

or pass specific launcher options:

// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})

options

See puppeteer.launch#options.

Additionally, you can set up:

timeout

type: number
default: 30000

This setting changes the default maximum navigation time, in milliseconds.

puppeteer

type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox

It's automatically detected from your dependencies; the supported packages are puppeteer, puppeteer-core and puppeteer-firefox.

Alternatively, you can pass it explicitly.
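
A minimal sketch of how such detection could work, assuming the library simply takes the first of the three packages that Node.js can resolve. The lookup order and the `detectPuppeteer` name are assumptions for illustration, not browserless's actual code.

```javascript
// Assumed lookup order; the real library may prefer a different one.
const CANDIDATES = ['puppeteer', 'puppeteer-core', 'puppeteer-firefox']

// Hypothetical helper: returns the first package name `resolve` can locate.
// `resolve` defaults to Node's require.resolve and is injectable for testing.
function detectPuppeteer (resolve = require.resolve) {
  for (const name of CANDIDATES) {
    try {
      resolve(name)
      return name
    } catch (error) {
      // Not installed: try the next candidate
    }
  }
  throw new Error(`Install one of: ${CANDIDATES.join(', ')}`)
}
```

The detected package would then be loaded with `require(detectPuppeteer())`; in this sketch, passing the puppeteer option explicitly would simply skip the lookup.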

incognito

type: boolean
default: false

Every time a new page is created, it will be an incognito page.

An incognito page will not share cookies/cache with other browser pages.

.html(url, options)

It serializes the content from the target url into HTML.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()

options

See browserless#goto.

.text(url, options)

It serializes the content from the target url into plain text.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()

options

See browserless#goto.

.pdf(url, options)

It generates a PDF version of the website at the target URL.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.pdf(url)
  console.log(`PDF generated!`)
})()

options

See browserless#goto.

Additionally, you can set up:

media

type: string
default: 'screen'

Changes the CSS media type of the page using page.emulateMedia.

.screenshot(url, options)

It takes a screenshot of the target URL.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.screenshot(url)
  console.log(`Screenshot taken!`)
})()

options

See browserless#goto.

Additionally, you can set up:

hide

type: string | string[]

Hide DOM elements matching the given CSS selectors.

Can be useful for cleaning up the page.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    hide: ['.crisp-client', '#cookies-policy']
  })
})()

This sets visibility: hidden on the matched elements.
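
One way to apply this rule is to inject a generated stylesheet. A minimal sketch, assuming a hypothetical `hideStyles` helper; browserless may apply the rule differently, for example through inline styles.

```javascript
// Hypothetical helper: turns the `hide` option (string or string[]) into a
// stylesheet that sets `visibility: hidden` on every matching element.
function hideStyles (hide) {
  const selectors = [].concat(hide)
  return selectors
    .map(selector => `${selector} { visibility: hidden !important; }`)
    .join('\n')
}

console.log(hideStyles(['.crisp-client', '#cookies-policy']))
// .crisp-client { visibility: hidden !important; }
// #cookies-policy { visibility: hidden !important; }
```

Using `visibility: hidden` rather than `display: none` keeps the hidden elements' layout space, so the rest of the page does not shift before the screenshot is taken.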

click

type: string | string[]

Click the DOM element matching the given CSS selector.

disableAnimations

type: boolean
default: false

Disable CSS animations and transitions.

modules

type: string | string[]

Inject JavaScript modules into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .js extension).

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    modules: [
      'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js',
      'local-file.js',
      `document.body.style.backgroundColor = 'red'`
    ]
  })
})()
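
The three accepted input kinds can be told apart with a few simple checks. A minimal sketch, assuming the rules implied by the description above (absolute http(s) URLs, local paths ending in the expected extension, everything else treated as inline code); `classifyInjection` and the example URL are hypothetical.

```javascript
// Hypothetical classifier for `modules` / `scripts` / `styles` entries.
function classifyInjection (value, extension) {
  try {
    // Absolute URLs parse without a base; relative paths and code throw
    const { protocol } = new URL(value)
    if (protocol === 'http:' || protocol === 'https:') return 'url'
  } catch (error) {
    // Not an absolute URL
  }
  if (value.endsWith(extension)) return 'file'
  return 'inline'
}

const modules = [
  'https://cdn.example.com/module.js', // hypothetical URL
  'local-file.js',
  "document.body.style.backgroundColor = 'red'"
]

console.log(modules.map(value => classifyInjection(value, '.js')))
// [ 'url', 'file', 'inline' ]
```

The same check covers the scripts (.js) and styles (.css) options by changing the extension argument.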

scripts

type: string | string[]

Same as the modules option, but injects the code as <script> rather than <script type="module">. Prefer the modules option whenever possible.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    scripts: [
      'https://cdn.jsdelivr.net/npm/[email protected]/dist/jquery.min.js',
      'local-file.js',
      `document.body.style.backgroundColor = 'red'`
    ]
  })
})()
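
The only difference between the two options is the tag type. A minimal sketch that renders both variants as markup; `scriptTag` is a hypothetical helper (puppeteer's own `page.addScriptTag` also accepts a `type` option for this purpose).

```javascript
// Hypothetical helper: renders the <script> tag that `scripts` (classic) or
// `modules` (type="module") would inject, from either a URL or inline code.
function scriptTag ({ url, content, module = false }) {
  const type = module ? ' type="module"' : ''
  return url
    ? `<script${type} src="${url}"></script>`
    : `<script${type}>${content}</script>`
}

const code = "document.body.style.backgroundColor = 'red'"
console.log(scriptTag({ content: code }))
// <script>document.body.style.backgroundColor = 'red'</script>
console.log(scriptTag({ content: code, module: true }))
// <script type="module">document.body.style.backgroundColor = 'red'</script>
```

Module scripts are deferred and run in strict mode with their own top-level scope, so their variables do not leak into `window`; classic scripts share the global scope with the page.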

styles

type: string | string[]

Inject CSS styles into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .css extension).

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    styles: [
      'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css',
      'local-file.css',
      `body { background: red; }`
    ]
  })
})()

scrollTo

type: string | object

Scroll to the DOM element matching the given CSS selector.

overlay

type: object

After the screenshot has been taken, this option allows you to place the screenshot into a fancy overlay.

You can configure the overlay by specifying:

  • browser: It sets the browser image overlay to use; supported values are safari-light and safari-dark.

  • background: It sets the background to use. Supported values are:

    • A hexadecimal/rgb/rgba color code, e.g. #c1c1c1.
    • A CSS gradient, e.g. linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
    • An image URL, e.g. https://source.unsplash.com/random/1920x1080.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    hide: ['.crisp-client', '#cookies-policy'],
    overlay: {
      browser: 'safari-dark',
      background: 'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
    }
  })
})()
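
The three background forms listed above can be distinguished by their prefix. A minimal sketch, where `backgroundKind` and its regular expressions are assumptions rather than browserless's actual parsing rules:

```javascript
// Hypothetical classifier for the overlay `background` value.
function backgroundKind (value) {
  // '#c1c1c1', 'rgb(...)' or 'rgba(...)'
  if (/^(#[0-9a-f]{3,8}|rgba?\()/i.test(value)) return 'color'
  // 'linear-gradient(...)', 'radial-gradient(...)', etc.
  if (/^(repeating-)?(linear|radial|conic)-gradient\(/i.test(value)) return 'gradient'
  // Absolute image URL
  if (/^https?:\/\//i.test(value)) return 'image'
  throw new TypeError(`Unsupported overlay background: ${value}`)
}

console.log(backgroundKind('#c1c1c1')) // color
console.log(backgroundKind('linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%)')) // gradient
console.log(backgroundKind('https://source.unsplash.com/random/1920x1080')) // image
```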

.devices

List of all available devices preconfigured with deviceName, viewport and userAgent settings.

These devices are used for emulation purposes.

.getDevice(deviceName)

Get the settings of a specific device descriptor by its name.

const browserless = require('browserless')

browserless.getDevice('Macbook Pro 15')

// {
//   name: 'Macbook Pro 15',
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X …',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 1,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }
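
A minimal sketch of such a lookup over a descriptor list. Matching names case-insensitively is an assumption here, suggested by the lowercase default device name 'macbook pro 13' used by .goto; `devices` holds a single illustrative entry and `findDevice` is a hypothetical name.

```javascript
// Illustrative descriptor list with a single entry (values copied from the
// output above; the user agent string is elided there as well)
const devices = [
  {
    name: 'Macbook Pro 15',
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X …',
    viewport: { width: 1440, height: 900, deviceScaleFactor: 1, isMobile: false, hasTouch: false, isLandscape: false }
  }
]

// Hypothetical lookup: returns the matching descriptor or undefined
function findDevice (deviceName) {
  const wanted = deviceName.toLowerCase()
  return devices.find(({ name }) => name.toLowerCase() === wanted)
}

console.log(findDevice('macbook pro 15').viewport.width) // 1440
```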

Advanced

The following methods are exposed to be used in scenarios where you need more granular control and less magic.

.browser

It returns the internal browser instance, used as a singleton.

const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()

.evaluate(page, response)

It exposes an interface for creating your own evaluate function, passing you the page and response.

const browserless = require('browserless')()

const getUrlInfo = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

;(async () => {
  const url = 'https://example.com'
  const info = await getUrlInfo(url)

  console.log(info)
  // {
  //   "statusCode": 200,
  //   "url": "https://example.com/",
  //   "redirectUrls": []
  // }
})()

Note that you don't need to close the page; it's done under the hood.

Internally, the method performs a .goto.
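
The page lifecycle described above (create a page, run .goto, call your function, close the page) can be sketched as a small factory. `createEvaluate`, `createPage` and `goto` are hypothetical stand-ins injected as parameters, not browserless internals.

```javascript
// Hypothetical factory: wraps a user function `(page, response) => result`
// into `(url, options) => Promise<result>`, always closing the page.
function createEvaluate ({ createPage, goto }) {
  return fn => async (url, options = {}) => {
    const page = await createPage()
    try {
      const response = await goto(page, { ...options, url })
      return await fn(page, response)
    } finally {
      // Runs whether `fn` returned or threw
      await page.close()
    }
  }
}
```

The `try`/`finally` is what makes closing the page unnecessary on the caller's side: the page is closed whether your function returns a value or throws.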

.goto(page, options)

It performs a smart page.goto, blocking ad and tracker requests, as well as other requests based on their resourceType.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, { url: 'http://savevideo.me' })
})()
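
A minimal sketch of the filtering decision, assuming requests expose puppeteer's `request.resourceType()` and `request.url()` methods. `createRequestFilter`, `abortTypes` and `isAd` are hypothetical; `isAd` stands in for a real adblocker engine.

```javascript
// Hypothetical request filter: abort when the resource type is blocklisted,
// or when adblocking is on and the ad matcher flags the URL.
function createRequestFilter ({ abortTypes = [], adblock = true, isAd = () => false } = {}) {
  const blocked = new Set(abortTypes)
  return request => blocked.has(request.resourceType()) || (adblock && isAd(request.url()))
}

// Fake request objects mimicking puppeteer's request interface
const fakeRequest = (type, url) => ({ resourceType: () => type, url: () => url })

const shouldAbort = createRequestFilter({
  abortTypes: ['image', 'font'], // e.g. when only the HTML is needed
  isAd: url => new URL(url).hostname.startsWith('ads.') // toy matcher
})

console.log(shouldAbort(fakeRequest('image', 'https://example.com/logo.png'))) // true
console.log(shouldAbort(fakeRequest('document', 'https://example.com/'))) // false
console.log(shouldAbort(fakeRequest('script', 'https://ads.example.net/tag.js'))) // true
```

With a real page, the decision would be wired through puppeteer's request interception: call `await page.setRequestInterception(true)`, then in a `page.on('request', ...)` handler call `request.abort()` or `request.continue()` depending on the filter's result.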

options

Any option passed here is forwarded to page.goto.

Additionally, you can set up:

url

type: string

The target URL.

adblock

type: boolean
default: true

It aborts requests detected as ads.

headers

type: object

An object containing additional HTTP headers to be sent with every request.

waitFor

type: string | function | number
default: 0

Wait for an amount of time (in milliseconds), a selector, or a function, using page.waitFor.
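
The three kinds of waitFor values map naturally onto separate puppeteer calls. A minimal sketch, assuming a dispatch on the value's type; `waitFor` and `sleep` are hypothetical helpers here, while `page.waitForSelector` and `page.waitForFunction` are puppeteer page methods.

```javascript
// Resolve after `ms` milliseconds
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical dispatcher:
//   number:   fixed delay in milliseconds
//   string:   CSS selector to appear in the page
//   function: predicate evaluated in the page until it returns a truthy value
async function waitFor (page, value) {
  if (typeof value === 'number') return sleep(value)
  if (typeof value === 'string') return page.waitForSelector(value)
  if (typeof value === 'function') return page.waitForFunction(value)
  throw new TypeError(`Unsupported waitFor value: ${typeof value}`)
}
```

In this sketch, the default of 0 only yields to the event loop once before navigation is considered done.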

waitUntil

type: string | string[]
default: ['networkidle0']

Specifies the events to wait for before considering the navigation successful, using page.waitForNavigation.

userAgent

It sets a custom user agent, using the page.setUserAgent method.

viewport

It sets a custom viewport, using the page.setViewport method.

cookies

type: object[]

A collection of cookie objects to set on the outgoing requests.

device

type: string
default: 'macbook pro 13'

It specifies the device descriptor used to retrieve the userAgent and viewport.

.page()

It returns a new standalone browser page.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()

Pool of Instances

Internally, browserless uses a singleton browser instance.

You can use a pool of instances via the @browserless/pool package.

const createBrowserless = require('@browserless/pool')
const browserlessPool = createBrowserless({
  poolOpts: {
    max: 15,
    min: 2
  }
})

The API is the same as browserless's; the only difference is that the constructor accepts an extra option called poolOpts.

This setting is used to initialize the pool. See node-pool#opts for the values you can specify there.

Also, you can interact with a standalone browserless instance of your pool.

const createBrowserless = require('browserless')
const browserlessPool = createBrowserless.pool()

// get a browserless instance from the pool
browserlessPool(async browserless => {
  // get a page from the browser instance
  const page = await browserless.page()
  await browserless.goto(page, { url: url.toString() })
  const html = await page.content()
  console.log(html)
  process.exit()
})

You don't need to think about the acquire/release step: It's done automagically ✨.
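
The acquire/release step that the pool hides can be sketched in a few lines. This is a simplified stand-in, not @browserless/pool itself (which is configured through node-pool options); `createPool`, `create` and `max` are hypothetical names.

```javascript
// Hypothetical pool: lazily creates up to `max` instances and hands them
// out one task at a time; extra tasks wait until an instance is released.
function createPool (create, max) {
  const idle = []
  const waiting = []
  let size = 0

  async function acquire () {
    if (idle.length) return idle.pop()
    if (size < max) {
      size++
      try {
        return await create()
      } catch (error) {
        size-- // creation failed, free the slot
        throw error
      }
    }
    return new Promise(resolve => waiting.push(resolve))
  }

  function release (instance) {
    const next = waiting.shift()
    if (next) next(instance) // hand over directly to a waiting task
    else idle.push(instance)
  }

  // Acquire, run the task, and always release, even if the task throws
  return async fn => {
    const instance = await acquire()
    try {
      return await fn(instance)
    } finally {
      release(instance)
    }
  }
}

// Usage: four concurrent tasks share at most two instances
let created = 0
const runWithInstance = createPool(async () => ({ id: ++created }), 2)
Promise.all([1, 2, 3, 4].map(() => runWithInstance(async instance => instance.id)))
  .then(ids => console.log(ids, 'instances created:', created))
```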

Packages

browserless is internally divided into multiple packages, so you only use the minimum amount of code necessary for your use case.

  • browserless
  • @browserless/benchmark
  • @browserless/devices
  • @browserless/examples
  • @browserless/goto
  • @browserless/pool
  • @browserless/screenshot

Benchmark

For testing different approaches, we included a tiny benchmark tool called @browserless/benchmark.

FAQ

Q: Why use browserless over Puppeteer?

browserless doesn't replace puppeteer; it complements it. It's just a syntactic sugar layer over the official Headless Chrome, oriented toward production scenarios.

Q: Why do you block ads scripts by default?

Headless navigation is expensive compared with just fetching the content from a website.

To speed up the process, we block ads scripts by default because they add a lot of bloat.

Q: My output is different from what I expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode with the DEBUG=browserless environment variable to see what is happening under the hood:

DEBUG=browserless node index.js

Consider opening an issue with the debug trace.

Q: Can I use browserless with my AWS Lambda-like project?

Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.

License

browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

logo designed by xinh studio.

kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats