observable-prerender's Introduction

observable-prerender

Pre-render Observable notebooks with Puppeteer! Inspired by d3-pre

Why tho

Observable notebooks run in the browser and use browser APIs, like SVG, canvas, webgl, and so much more. Sometimes, you may want to script or automate Observable notebooks in some way. For example, you may want to:

  • Create a bar chart with custom data
  • Generate an SVG map for every county in California
  • Render frames for an MP4 screencast of a custom animation

If you wanted to do this before, you'd have to manually open a browser, re-write code, upload different file attachments, download cells, and repeat it all many times. Now you can script it all!

Examples

Check out examples/ for workable code.

Create a bar chart with your own data

const { load } = require("@alex.garcia/observable-prerender");
(async () => {
  const notebook = await load("@d3/bar-chart", ["chart", "data"]);
  const data = [
    { name: "alex", value: 20 },
    { name: "brian", value: 30 },
    { name: "craig", value: 10 },
  ];
  await notebook.redefine("data", data);
  await notebook.screenshot("chart", "bar-chart.png");
  await notebook.browser.close();
})();

Result: Screenshot of a bar chart with 3 bars, with labels alex, brian and craig, with values 20, 30, and 10, respectively.

Create a map of every county in California

const { load } = require("@alex.garcia/observable-prerender");
(async () => {
  const notebook = await load(
    "@datadesk/base-maps-for-all-58-california-counties",
    ["chart"]
  );
  const counties = await notebook.value("counties");
  for (const county of counties) {
    await notebook.redefine("county", county.fips);
    await notebook.screenshot("chart", `${county.name}.png`);
    await notebook.svg("chart", `${county.name}.svg`);
  }
  await notebook.browser.close();
})();

Some of the resulting PNGs:

[Images: simple maps of Los Angeles, Merced, Sacramento, and San Diego counties.]

Create frames for an animated GIF

Create PNG frames with observable-prerender:

const { load } = require("@alex.garcia/observable-prerender");
(async () => {
  const notebook = await load("@asg017/sunrise-and-sunset-worldwide", [
    "graphic",
    "controller",
  ]);
  const times = await notebook.value("times");
  for (let i = 0; i < times.length; i++) {
    await notebook.redefine("timeI", i);
    await notebook.waitFor("controller");
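    // Zero-pad the frame number so the files match the sun%03d.png pattern used by ffmpeg.
    await notebook.screenshot("graphic", `sun${String(i).padStart(3, "0")}.png`);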
  }
  await notebook.browser.close();
})();

Then use something like ffmpeg to create an MP4 video with those frames!

 ffmpeg.exe -framerate 30 -i sun%03d.png -c:v libx264  -pix_fmt yuv420p out.mp4

Result (as a GIF, since GitHub only supports gifs):

Screencast of an animation of sunlight time in Los Angeles during the year.

Working with puppeteer-cluster

You can pass in raw Puppeteer browser/page objects into load(), which works really well with 3rd party Puppeteer tools like puppeteer-cluster. Here's an example where we have a cluster of Puppeteer workers that take screenshots of the chart cells of various D3 examples:

const { Cluster } = require("puppeteer-cluster");
const { load } = require("@alex.garcia/observable-prerender");

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
  });

  await cluster.task(async ({ page, data: notebookId }) => {
    const notebook = await load(notebookId, ["chart"], { page });
    await notebook.screenshot("chart", `${notebookId}.png`.replace("/", "_"));
  });

  cluster.queue("@d3/bar-chart");
  cluster.queue("@d3/line-chart");
  cluster.queue("@d3/directed-chord-diagram");
  cluster.queue("@d3/spike-map");
  cluster.queue("@d3/fan-chart");

  await cluster.idle();
  await cluster.close();
})();
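The example above builds each filename with .replace("/", "_"), which replaces only the first slash. A slightly more defensive variant is sketched below. notebookFilename is a hypothetical helper, not part of observable-prerender, and the set of characters it keeps is an assumption about what is safe on common filesystems.

```javascript
// Hypothetical helper, not part of observable-prerender.
// Maps a notebook ID such as "@d3/bar-chart" to a flat filename.
function notebookFilename(notebookId, extension = "png") {
  // Replace every character other than letters, digits, ".", "_", "@" and "-" with "_".
  const safe = notebookId.replace(/[^A-Za-z0-9._@-]/g, "_");
  return `${safe}.${extension}`;
}

console.log(notebookFilename("@d3/bar-chart")); // @d3_bar-chart.png
console.log(notebookFilename("d/27a0b05d777304bd", "svg")); // d_27a0b05d777304bd.svg
```

Inside the cluster task, the screenshot call could then read notebook.screenshot("chart", notebookFilename(notebookId)).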

The observable-prerender CLI

Check out the /cli-examples directory for bash scripts that show off the different arguments of the bundled CLI programs.

Install

npm install @alex.garcia/observable-prerender

API Reference

Although not required, a solid understanding of the Observable notebook runtime and the embedding process could help greatly when building with this tool. Here are some resources you could use to learn more:

prerender.load(notebook, targets, config)

Load the given notebook into a page in a browser.

  • notebook <[string]> ID of the notebook on observablehq.com, like @d3/bar-chart or @asg017/bitmoji. For unlisted notebooks, be sure to include the d/ prefix (e.g. d/27a0b05d777304bd).

  • targets <[Array]<[string]>> array of cell names that will be evaluated. Every cell in targets (and the cells they depend on) will be evaluated and render to the page's DOM. If not supplied, then all cells (including anonymous ones) will be evaluated by default.

  • config is an object with key/values for more control over how to load the notebook.

Key | Value
browser | Supply a Puppeteer Browser object instead of creating a new one. Good for headless:false debugging.
page | Supply a Puppeteer Page object instead of creating a new browser or page. Good for use in something like puppeteer-cluster.
OBSERVABLEHQ_API_KEY | Supply an ObservableHQ API Key to load in private notebooks. NOTE: This library uses the api_key URL query parameter to supply the key to Observable, which according to their guide, is meant for testing and development.
height | Number, height of the Puppeteer browser that will be created. If browser is also passed, this will be ignored. Default 675.
width | Number, width of the Puppeteer browser that will be created. If browser is also passed, this will be ignored. Default 1200.
headless | Boolean, whether the Puppeteer browser should be "headless" or not. Set to false for debugging. Default true.

.load() returns a Notebook object. A Notebook has page and browser properties, which are the Puppeteer page and browser objects that the notebook is loaded with. This gives a lower-level API to the underlying Puppeteer objects that render the notebook, in case you want more fine-grain API access for more control.
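As a rough illustration of the config object, the sketch below builds one from environment variables. loadConfigFromEnv and the PRERENDER_* variable names are hypothetical and only serve this example. The config keys themselves (width, height, headless, OBSERVABLEHQ_API_KEY) and their defaults are the ones listed above.

```javascript
// Hypothetical helper: build the config object for load() from environment variables.
// The PRERENDER_* names are assumptions made for this sketch.
function loadConfigFromEnv(env = process.env) {
  const config = {
    width: Number(env.PRERENDER_WIDTH) || 1200, // falls back to the documented default
    height: Number(env.PRERENDER_HEIGHT) || 675, // falls back to the documented default
    headless: env.PRERENDER_HEADFUL !== "1", // PRERENDER_HEADFUL=1 shows the browser for debugging
  };
  // Only include the API key when one is actually set.
  if (env.OBSERVABLEHQ_API_KEY) {
    config.OBSERVABLEHQ_API_KEY = env.OBSERVABLEHQ_API_KEY;
  }
  return config;
}
```

The result would be passed as the third argument, for example load("@d3/bar-chart", ["chart"], loadConfigFromEnv()).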

notebook.value(cell)

Returns a Promise that resolves to the value of the given cell in the notebook. For example, if the @d3/bar-chart notebook is loaded, then .value("color") would return "steelblue", .value("height") would return 500, and .value("data") would return the 26-length JS array containing the data.

Keep in mind that the returned value is serialized from the browser to Node, see below for details.

notebook.redefine(cell, value)

Redefine a specific cell in the Notebook runtime to a new value. cell is the name of the cell that will be redefined, and value is the value that cell will be redefined as. If cell is an object, then all of the object's keys/values will be redefined on the notebook (e.g. cell={a:1, b:2} would redefine cell a to 1 and b to 2).

Keep in mind that the value is serialized from Node to the browser, see below for details.
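To make the object form of .redefine() concrete, here is a minimal sketch that renders one screenshot per set of cell values. It assumes notebook is the object returned by load(). renderVariants and the shape of variants are hypothetical names used only for this example.

```javascript
// Sketch: render one screenshot per variant, redefining several cells at once.
// `notebook` is assumed to come from load(); `variants` maps an output name to
// an object of { cellName: value } pairs.
async function renderVariants(notebook, targetCell, variants) {
  const written = [];
  for (const [name, cells] of Object.entries(variants)) {
    await notebook.redefine(cells); // object form: every key is redefined to its value
    await notebook.waitFor(targetCell); // give the change time to reach the target cell
    const path = `${name}.png`;
    await notebook.screenshot(targetCell, path);
    written.push(path);
  }
  return written;
}
```

A call such as renderVariants(notebook, "chart", { small: { height: 200 }, large: { height: 800 } }) would then be expected to write small.png and large.png.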

notebook.screenshot(cell, path, options)

Take a screenshot of the container of the element that contains the rendered value of cell. path is the path of the saved screenshot (PNG), and options is any extra options that get added to the underlying Puppeteer .screenshot() function (see the Puppeteer documentation for the list of options). For example, if the @d3/bar-chart notebook is loaded, notebook.screenshot("chart", "bar-chart.png") saves a screenshot of the chart cell to bar-chart.png.

notebook.svg(cell, path)

If cell is an SVG cell, this will save that cell's SVG into path, like .screenshot(). Keep in mind, the browser's CSS won't be exported into the SVG, so beware of styling with class.

notebook.pdf(path, options)

Use Puppeteer's .pdf() function to render the entire page as a PDF. path is the path of the PDF to save to, options will be passed into Puppeteer's .pdf() function. This will wait for all the cells in the notebook to be fulfilled. Note, this can't be used on a non-headless browser.

notebook.waitFor(cell, status)

Returns a Promise that resolves when the cell named cell is "fulfilled" (see the Observable inspector documentation for more details). The default is "fulfilled", but status could also be "pending" or "rejected". Use this function to ensure that your redefined changes propagate to dependent cells. If no parameters are passed in, then the Promise will wait for all the cells, including unnamed ones, to finish executing.

notebook.fileAttachments(files)

Replace the FileAttachments of the notebook with those defined in files. files is an object where the keys are the names of the FileAttachment, and the values are the absolute paths to the files that will replace the FileAttachments.
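Since the values must be absolute paths, a small helper can resolve relative paths before the object is handed to .fileAttachments(). This is only a sketch: absoluteAttachments is a hypothetical name, and it relies on nothing beyond Node's built-in path module.

```javascript
const path = require("path");

// Hypothetical helper: resolve every file path against a base directory so the
// result contains absolute paths only.
function absoluteAttachments(files, baseDir = process.cwd()) {
  return Object.fromEntries(
    Object.entries(files).map(([name, file]) => [name, path.resolve(baseDir, file)])
  );
}
```

It could be used as notebook.fileAttachments(absoluteAttachments({ "data.csv": "./my-data.csv" }, __dirname)), where data.csv stands in for the name of a FileAttachment in the notebook.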

notebook.$(cell)

Returns the ElementHandle of the container HTML element for the given observed cell. Can be used to call .click(), .screenshot(), .evaluate(), or any other method to have more control of a specific rendered cell.
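As one possible use, the sketch below measures a rendered cell through its ElementHandle. It assumes notebook comes from load() and uses Puppeteer's ElementHandle.boundingBox(). cellSize is a hypothetical name, and the error handling is an assumption about how a missing cell might surface.

```javascript
// Sketch: measure a rendered cell via its ElementHandle.
async function cellSize(notebook, cell) {
  const handle = await notebook.$(cell);
  if (!handle) throw new Error(`No rendered element found for cell "${cell}"`);
  // Puppeteer's boundingBox() resolves to null when the element is not visible.
  const box = await handle.boundingBox();
  if (!box) throw new Error(`Cell "${cell}" is not visible`);
  return { width: box.width, height: box.height };
}
```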

CLI Reference

observable-prerender also comes bundled with 2 CLI programs, observable-prerender and observable-prerender-animate, that allow you to more quickly pre-render notebooks and integrate with local files and other CLI tools.

observable-prerender [options] <notebook> [cells...]

Pre-render the given notebook and take screenshots of the given cells. <notebook> is the observablehq.com ID of the notebook to load, same argument as the 1st argument in .load(). [cells...] is the list of cells that will be screenshotted from the notebook. By default, the screenshots will be saved as <cell_name>.<format> in the current directory.

Run observable-prerender --help to get a full list of options.

observable-prerender-animate [options] <notebook> [cells...] --iter cell:cellIterator

Pre-render the given notebook, redefine the cell named cell to each value of the cellIterator cell in turn, and take screenshots of the given cells on every iteration. <notebook> is the observablehq.com ID of the notebook to load, same argument as the 1st argument in .load(). [cells...] is the list of cells that will be screenshotted from the notebook. --iter is the only required option, in the format of cell:cellIterator, where cell is the cell that will change on every loop, and cellIterator is the cell that contains all the values.

Run observable-prerender-animate --help to get a full list of options.
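In API terms, the --iter loop can be pictured roughly as follows. This is a sketch under stated assumptions, not the CLI's actual implementation: parseIter, animate, and the frame naming scheme are hypothetical, and notebook is assumed to be the object returned by load().

```javascript
// Hypothetical parser for the --iter argument, "cell:cellIterator".
function parseIter(arg) {
  const parts = arg.split(":");
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`Expected --iter in the form cell:cellIterator, got "${arg}"`);
  }
  return { cell: parts[0], cellIterator: parts[1] };
}

// Sketch of the loop: assign each value of the iterator cell to `cell`,
// then screenshot every target cell once it has been fulfilled.
async function animate(notebook, iterArg, targets) {
  const { cell, cellIterator } = parseIter(iterArg);
  const values = await notebook.value(cellIterator);
  const frames = [];
  for (let i = 0; i < values.length; i++) {
    await notebook.redefine(cell, values[i]);
    for (const target of targets) {
      await notebook.waitFor(target);
      // Frame naming is an assumption; zero-padding keeps frames sorted.
      const file = `${String(i).padStart(3, "0")}_${target}.png`;
      await notebook.screenshot(target, file);
      frames.push(file);
    }
  }
  return frames;
}
```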

Caveats

Beta

This library is mostly a proof of concept, and probably will change in the future. Follow Issue #2 to know when the stable v1 library will be complete. As always, feedback, bug reports, and ideas will make v1 even better!

Serialization

There is a Puppeteer serialization process when switching from browser JS data to Node. Returning primitives like arrays, plain JS objects, numbers, and strings will work fine, but custom objects, HTML elements, Date objects, and some typed arrays may not. Which means that some methods like .value() or .redefine() may be limited or may not work as expected, causing subtle bugs. Check out the Puppeteer docs for more info about this.
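One way to stay on the safe side is to convert values to plain data before passing them to .redefine(), and to convert them back inside the notebook if needed. The helper below is only a sketch. toPlain is a hypothetical name, and which types actually need conversion depends on Puppeteer's behavior, which is not verified here.

```javascript
// Hypothetical helper, not part of observable-prerender: convert values into
// plain arrays, objects, numbers, and strings.
function toPlain(value) {
  if (value instanceof Date) return value.toISOString(); // Date -> ISO 8601 string
  if (ArrayBuffer.isView(value) && !(value instanceof DataView)) {
    return Array.from(value); // typed array -> plain array
  }
  if (Array.isArray(value)) return value.map(toPlain);
  if (value !== null && typeof value === "object") {
    // Custom objects are reduced to their own enumerable properties.
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, toPlain(inner)])
    );
  }
  return value; // primitives pass through unchanged
}
```

For example, notebook.redefine("data", toPlain(rows)) would send dates as ISO strings, which a notebook cell could turn back into Date objects with new Date(string).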

Animation is hard

You won't be able to make neat screencasts from all Observable notebooks. Puppeteer doesn't support taking a video recording of a browser, so instead, the suggested method is to take several PNG screenshots, then stitch them all together into a gif/mp4 using ffmpeg or some other service.

So what should you screenshot, exactly? It depends on your notebook. You probably need to have some counter/index/pointer that changes the graph when updated (see scrubber). You can programmatically redefine that cell using notebook.redefine in some loop, then screenshot the graph once the changes propagate (notebook.waitFor). But keep in mind, this may work for JS transitions, but CSS animations may not render properly or in time, so it really depends on how you built your notebook. It's super hard to get it right without some real digging.

If you run into any issues getting frames for an animation, feel free to open an issue!

"Benchmarking"

In this project, "Benchmarking" can refer to three different things: the op-benchmark CLI tool, internal benchmarks for the package, and external benchmarks for comparing against other embedding options.

op-benchmark for Benchmarking Notebooks

op-benchmark is a CLI tool bundled with observable-prerender that measures the execution time of every cell in a given notebook. It's meant to be used by anyone to test their own notebooks, and is part of the observable-prerender suite of tools.

Internal Benchmarking

/benchmark-internal is a series of tests performed against observable-prerender to ensure observable-prerender runs as fast as possible, and that new changes don't drastically affect the performance of the tool. This is meant to be used by observable-prerender developers, not by users of the observable-prerender tool.

External Benchmarking

/benchmark-external contains several tests to compare observable-prerender with other Observable notebook embedding options. A common use-case for observable-prerender is to pre-render Observable notebooks for faster performance for end users, so these tests measure how much faster observable-prerender actually is. This is meant for observable-prerender developers, not for general users.

observable-prerender's People

Contributors

asg017

observable-prerender's Issues

Bug in `notebook.redefine` when trying examples/barchart.js

Hi,

When trying out the package (which I'm very excited about), I come across an error when I execute

node examples/barchart.js

The error message being

13:29:28 [email protected] observable-prerender main node examples/barchart.js 
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: Evaluation failed: RuntimeError
    at ExecutionContext._evaluateInternal (/Users/sweaver/Programming/observablehq-deno/observable-prerender/node_modules/puppeteer/lib/cjs/common/ExecutionContext.js:217:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async ExecutionContext.evaluate (/Users/sweaver/Programming/observablehq-deno/observable-prerender/node_modules/puppeteer/lib/cjs/common/ExecutionContext.js:106:16)
    at async Notebook.redefine (/Users/sweaver/Programming/observablehq-deno/observable-prerender/src/index.js:219:7)
    at async /Users/sweaver/Programming/observablehq-deno/observable-prerender/examples/barchart.js:21:3
  -- ASYNC --
    at ExecutionContext.<anonymous> (/Users/sweaver/Programming/observablehq-deno/observable-prerender/node_modules/puppeteer/lib/cjs/common/helper.js:109:19)
    at DOMWorld.evaluate (/Users/sweaver/Programming/observablehq-deno/observable-prerender/node_modules/puppeteer/lib/cjs/common/DOMWorld.js:84:24)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
  -- ASYNC --
    at Frame.<anonymous> (/Users/sweaver/Programming/observablehq-deno/observable-prerender/node_modules/puppeteer/lib/cjs/common/helper.js:109:19)
    at Page.evaluate (/Users/sweaver/Programming/observablehq-deno/observable-prerender/node_modules/puppeteer/lib/cjs/common/Page.js:883:14)
    at Page.<anonymous> (/Users/sweaver/Programming/observablehq-deno/observable-prerender/node_modules/puppeteer/lib/cjs/common/helper.js:110:27)
    at Notebook.redefine (/Users/sweaver/Programming/observablehq-deno/observable-prerender/src/index.js:219:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /Users/sweaver/Programming/observablehq-deno/observable-prerender/examples/barchart.js:21:3

Node.js v18.2.0

Best,
Steven

Error Handling

Catch and give clearer error messages when:

  • The given notebook does not exist or isn't accessible.
  • A given cell does not exist.
  • An operation or .value() call returns an error.
  • .svg() is called on a cell with no SVG element.

Notebook Cell Element Selector `notebook.$("cell")`

Would return an ElementHandle of the cell's container, to allow ppl to screenshot the element directly if they want (or anything else really).

// TODO Error handling
async $(cellName) {
  return this.page.$(`#notebook-${serializeCellName(cellName)}`);
}

Mainly bc if optimizations are done for "screenshotting" canvas cells, I'd want to give ppl the option to still screenshot manually if they wanted to. Also cuz it'll help as an internal method caller.

observable-prerender-animate Progress Bar

Logs currently look like this

[139/257] Waiting for [ 'update' ]
[139/257] Screenshotting mapSVG
[140/257] Redefining
[140/257] Waiting for [ 'update' ]
[140/257] Screenshotting mapSVG
[141/257] Redefining
[141/257] Waiting for [ 'update' ]
[141/257] Screenshotting mapSVG
[142/257] Redefining
[142/257] Waiting for [ 'update' ]
[142/257] Screenshotting mapSVG
[143/257] Redefining
[143/257] Waiting for [ 'update' ]
[143/257] Screenshotting mapSVG
[144/257] Redefining
[144/257] Waiting for [ 'update' ]
[144/257] Screenshotting mapSVG
[145/257] Redefining
[145/257] Waiting for [ 'update' ]

Something like this would be pleasant:

[=========>                        ]  141/257 - 55%  - Waiting for ['update'] 
[=========>                        ]  141/257 - 55%  - Screenshotting mapSVG
[=========>                        ]  142/257 - 56%  - Waiting for ['update'] 

Don't do it if it's too complex or if the progress bar dep is too large.
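A dependency-free version could be as small as one formatting function. Rough sketch only: progressLine is a hypothetical name, the default bar width of 34 is arbitrary, and the percentage is rounded to the nearest integer.

```javascript
// Hypothetical formatter for a single progress line; no dependency required.
function progressLine(current, total, message, width = 34) {
  const ratio = total > 0 ? Math.min(current / total, 1) : 0;
  const filled = Math.round(ratio * width);
  // "=" for the completed portion, ">" as the head, spaces for the rest.
  const bar = filled > 0 ? "=".repeat(filled - 1) + ">" : "";
  const percent = Math.round(ratio * 100);
  return `[${bar.padEnd(width)}] ${current}/${total} - ${percent}% - ${message}`;
}

console.log(progressLine(5, 10, "Screenshotting mapSVG", 10));
// [====>     ] 5/10 - 50% - Screenshotting mapSVG
```

Writing "\r" followed by the line to process.stdout, instead of using console.log, would redraw the bar in place on a terminal.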

CLI changes:

op-animate @asg017/sunrise-and-sunset-worldwide graphic \
    --iter-waitfor controller \
    --iter-index \
    --iter timeI:times \
    --out-dir $OUTDIR/frames 

Actually, let's just have animate default to progress bar, and opt-in --verbose that prints progress line-by-line. Since you would print animate results to stdout.... right?

Feature request/idea - server compute...

I am now very curious about whether you can imagine using this to offload computing to a server instead of your own computer... since it is headless Chrome, conceivably, this could be a way to have a notebook run remotely...
Just a thought.

Maybe change puppeteer->playwright

Came across Playwright today, seems pretty cool. The API is very similar to puppeteer so it won't be too big of a change. Here's why I think Playwright could be nice:

  1. Be able to define a chromium, firefox, or webkit browser
  2. Change geolocation, permissions of the browser
  3. Choose the type of device to emulate (iphones, android, etc.)
  4. There's some capability for video recording.

Reasons to keep puppeteer only:

  1. playwright is much larger. There is a playwright-core, but ppl would still have to download playwright and it downloads 3 browsers vs puppeteer's 1
  2. The video recording (for now) records the entire screen and it seems like there's not many options to control like a screenshot.
  3. Only having to worry about 1 browser instead of 3 would be less of a headache. But it's not like this library is a headache to maintain

Print a notebook

How do I print the notebook?

For example, I have a notebook with one button, and when the user clicks on the button I would like to print the whole page.
Not the code, only the outputs.

Do you have any example?

Automation with Github Actions

Hi Alex,

Thanks for a great package. Don't know if you've looked into this yet, but a natural step for me is to automate. It is actually pretty straightforward to use your package with Github Actions. I created a couple of examples. It works really well in practice and comes with additional benefits over running locally.

Michael

"Screenshot" canvas cells faster

notebook.screenshot uses ElementHandle.screenshot under the hood, which is slow but necessary in many cases. But, if the cell to be screenshotted is already a canvas, then calling .toDataURL() and passing that base64 string back to node would be much faster, I think.

So new screenshot algo would be:

  1. Get the cell's element handle
  2. If it is a canvas element, then get the data url, pass back to node, then convert to buffer/write as a png/jpeg/whatever manually.
  3. Else use ElementHandle.screenshot and suffer

To be sure, need to benchmark how fast .screenshot is currently, and how much faster .toDataURL() would be. Might as well throw .svg() and .html() benchmarking in here too to compare.
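The Node side of step 2 could look roughly like this. It's a sketch: dataUrlToBuffer is a hypothetical helper, and it only handles base64-encoded data URLs, which is the form canvas.toDataURL() produces.

```javascript
// Sketch of step 2 on the Node side: decode a base64 data URL into a Buffer.
// `dataUrlToBuffer` is a hypothetical helper, not part of observable-prerender.
function dataUrlToBuffer(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64-encoded data URL");
  return { mimeType: match[1], buffer: Buffer.from(match[2], "base64") };
}

const { mimeType, buffer } = dataUrlToBuffer("data:text/plain;base64,aGVsbG8=");
console.log(mimeType, buffer.toString()); // text/plain hello
```

The data URL itself would presumably come from something like handle.evaluate((el) => el.toDataURL()) on the cell's ElementHandle, and the buffer could then be written with fs.writeFileSync(path, buffer).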

Proposal: `.render()`

  • render(cell, type, options)
    • cell - name of the cell to render.
    • type - one of text, html, svg, png, jpeg, jpg
    • options - object with config for whichever method is picked
  1. Simplicity, pass in args from CLI, documentation would be cleaner, examples easier to follow
  2. Can do optimizations under the hood (e.g. canvas toDataURL instead of screenshot)

v1 Roadmap

  • Set up automated testing for this repo. Getting puppeteer to work on Github Actions is pretty painful, so I just resorted to local tests. PRs welcome!
  • Move puppeteer to an npm peerDependency, to greatly reduce this package's size.
  • There might be some bugs/unexpected behavior when loading an entire notebook (as opposed to just a portion).
  • Support define functions in load. That way, people could load local notebooks instead of relying on observablehq.com
