parkerziegler / reviz

A lightweight engine for generating partial Observable Plot programs from SVG subtrees.

Home Page: https://reviz.vercel.app

License: MIT License

JavaScript 0.92% TypeScript 82.23% TeX 15.05% HTML 0.32% CSS 1.48%

observable typescript data-visualization reverse-engineering observable-plot

reviz's Introduction

A lightweight engine for reverse engineering data visualizations from the DOM

reviz is a lightweight engine for reverse engineering data visualizations from the DOM. Its core goal is to assist in rapid visualization sketching and prototyping by automatically generating partial programs written using Observable Plot from input svg subtrees. reviz can be used directly through the core library, @reviz/compiler, or the Chrome extension, @reviz/extension.

For a hands-on, interactive introduction to reviz, check out the Hello reviz! notebook on Observable.

To get familiar with the various packages in this codebase, check out their respective READMEs.

@reviz/compiler – The core library and compiler.
@reviz/examples – The examples site.
@reviz/extension – The Chrome extension.
@reviz/ui – Shared UI components used across the reviz ecosystem.

Installation

Compiler

npm install @reviz/compiler

Extension

🚧 Under Construction 🚧

API

The reviz API is very small; in fact, it consists of only a single function, analyzeVisualization!

import { analyzeVisualization } from '@reviz/compiler';

const viz = document.querySelector('#my-viz');

const { spec, program } = analyzeVisualization(viz);

`analyzeVisualization`

export declare const analyzeVisualization: (root: SVGSVGElement) => {
  spec: VizSpec;
  program: string;
};

analyzeVisualization is a function that takes in an svg Element as input and returns an Object containing two properties, spec and program.

spec refers to the intermediate representation used by reviz to generate partial Observable Plot programs. It encodes semantic information about the input svg subtree, including its inferred visualization type, geometric attributes of its marks (either circle or rect elements), and presentational attributes of its marks. reviz's architecture mimics that of a traditional compiler, with spec acting as the intermediate representation (IR). It can be useful to examine spec to see whether or not reviz has inferred the correct visualization type for your input svg subtree.

program refers to the partial Observable Plot program that reviz generates. These programs are intentionally incomplete and contain "holes" represented by the string '??'. The presence of a hole indicates that the value for a particular attribute (e.g. the r attribute of a bubble chart or the fill attribute of a stacked bar chart) should be mapped to a column in a user's input dataset rather than kept static across all data elements. After filling in holes with column names from your input dataset, you'll have a complete visualization program ready to run in the browser!

By Example

Let's look at an example to see how reviz works in practice. We'll use this visualization from the New York Times:

[Figure: A scatterplot visualization from the New York Times.]

If we point reviz at the root svg Element of this visualization, it generates the following (partial) program:

Plot.plot({
  color: {
    scale: 'categorical',
    range: ['#C67371', '#ccc', '#709DDE', '#A7B9D3', '#C23734'],
  },
  marks: [
    Plot.dot(data, {
      fill: '??',
      stroke: '??',
      fillOpacity: 0.8,
      strokeOpacity: 1,
      strokeWidth: 1,
      x: '??',
      y: '??',
      r: 7,
    }),
  ],
});

Notice that fill, stroke, x, and y are all inferred to be holes (indicated by '??') that must be mapped to columns of an input dataset. Conversely, attributes like fillOpacity and strokeWidth are automatically inferred because they are found to be consistent across all mark elements. We can also see that reviz has inferred that the visualization uses a categorical color scale and has configured the scale for us.
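
Before filling in a program, it can help to list which attributes are still holes. The following is a minimal sketch, not part of @reviz/compiler: findHoles is a hypothetical helper, and it assumes holes always appear as `key: '??'` pairs, as they do in the program above.

```typescript
// Hypothetical helper (not part of @reviz/compiler): list the attributes in a
// partial program whose values are holes.
// Assumption: holes are written as `key: '??'`, as in the program above.
function findHoles(program: string): string[] {
  const holePattern = /(\w+):\s*'\?\?'/g;
  const holes: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = holePattern.exec(program)) !== null) {
    holes.push(match[1]);
  }
  return holes;
}

const partialProgram = `Plot.dot(data, {
  fill: '??',
  stroke: '??',
  fillOpacity: 0.8,
  x: '??',
  y: '??',
  r: 7,
})`;

// Attributes that still need a column name.
const remainingHoles = findHoles(partialProgram);
console.log(remainingHoles);
```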

We can now apply this partial program to a new dataset. Let's use this delightful dataset about penguins from @Fil's Plot Exploration notebook. We can choose input columns from this dataset to "fill in" the holes like so:

Plot.plot({
  color: {
    scale: 'categorical',
    range: ['#C67371', '#ccc', '#709DDE', '#A7B9D3', '#C23734'],
  },
  marks: [
    Plot.dot(data, {
-     fill: '??',
+     fill: 'island',
-     stroke: '??',
+     stroke: 'island',
      fillOpacity: 0.8,
      strokeOpacity: 1,
      strokeWidth: 1,
-     x: '??',
+     x: 'flipper_length_mm',
-     y: '??',
+     y: 'body_mass_g',
      r: 7,
    }),
  ],
});

The result that we get is a new visualization that takes the appearance of the original New York Times piece and applies it to our data.

[Figure: A scatterplot visualization of penguins.]

In this way, reviz allows end users to quickly experiment with seeing their data in the form of a visualization they encounter anywhere in the wild.
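
The manual edit shown in the diff above could also be done programmatically. This is a rough sketch under the same assumption that holes are written as `key: '??'`; fillHoles and its columns parameter are hypothetical names, not part of the reviz API.

```typescript
// Hypothetical helper: replace each `key: '??'` hole with a column name.
// Holes without an entry in `columns` are left untouched.
// Note: column names are inserted verbatim, so names containing quotes
// would need escaping in real use.
function fillHoles(program: string, columns: Record<string, string>): string {
  return program.replace(/(\w+):\s*'\?\?'/g, (hole: string, key: string) =>
    Object.prototype.hasOwnProperty.call(columns, key)
      ? `${key}: '${columns[key]}'`
      : hole
  );
}

const filledProgram = fillHoles(
  "Plot.dot(data, { fill: '??', x: '??', y: '??', r: 7 })",
  { fill: 'island', x: 'flipper_length_mm', y: 'body_mass_g' }
);
console.log(filledProgram);
```

Leaving unmapped holes in place keeps the program visibly incomplete, which mirrors how reviz itself signals attributes that still need a column.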

To see more examples of the partial programs reviz generates, check out our examples site. To understand how reviz works at a deeper level, consider reading our paper.

Supported Visualization Types

reviz currently supports only a small subset of visualization types. We hope to extend reviz to include more visualization types in the future.

Visualization Type	Description
Bar Chart	Old trusty. The bar chart represents data values using the height of each `rect` mark. The data values mapped to the x-axis must be discrete, not continuous.
Bubble Chart	The bubble chart is similar to the scatterplot, with the radius of each `circle` mark mapped to the square root of a data value.
Histogram	Similar to a bar chart, but the data values mapped to the x-axis must be continuous, not discrete. Histograms are typically used to visualize distributions in a dataset.
Scatterplot	The scatterplot places `circle` marks in an x-y coordinate plane, often to show a correlation between two variables.
Stacked Bar Chart	A dressed up version of the bar chart in which subcategories of data can be compared across groups.
Strip Plot	Many rows of `circle` marks are placed on the same continuous scale to visualize distributions in a dataset.
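
The Bubble Chart entry in the table above notes that radius maps to the square root of a data value. The reason is that a circle's area grows with the square of its radius, so a square-root mapping keeps area proportional to the value. A minimal sketch of that arithmetic follows; sqrtRadius, maxValue, and maxRadius are illustrative names, not part of reviz or Plot.

```typescript
// Map a data value to a circle radius so that circle area, rather than
// radius, grows linearly with the value. All names here are illustrative.
function sqrtRadius(value: number, maxValue: number, maxRadius: number): number {
  if (value < 0 || maxValue <= 0) {
    throw new RangeError('value must be >= 0 and maxValue must be > 0');
  }
  return maxRadius * Math.sqrt(value / maxValue);
}

// A value at a quarter of the maximum gets half the maximum radius,
// which corresponds to a quarter of the maximum area.
const quarterRadius = sqrtRadius(25, 100, 20);
const fullRadius = sqrtRadius(100, 100, 20);
console.log(quarterRadius, fullRadius);
```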

reviz's Issues

extension: Separate UI into Analyze and Visualize tabs.

A core part of reviz's premise is that users can not only analyze existing visualizations but also generate new visualizations by filling program holes with column names. Given this vision, we want to split the extension UI into separate Analyze and Visualize tabs. The current content of the extension will fall into the Analyze tab. The Visualize tab will host, at a minimum, a CodeMirror editor for the partial program and an output space to render the new visualization.

For this issue, the scope is to:

Use @radix-ui/react-tabs to split the UI into separate Analyze and Visualize tabs.
Add the scaffolding for the CodeMirror editor filled with the partial program.

feat: Execute program on ⌘ / ^ S and ⌘ / ^ Enter.

To speed up program execution in the Visualize tab, we'll want to add keyboard event handlers for executing the program. In particular, we'll want to add ⌘ / ^ S to match a traditional "save" reflex and ⌘ / ^ Enter to match behavior on Observable. To do this, we'll want to wire up event handlers to the CodeMirror editor using the static EditorView.domEventHandlers API.

Add event listener for ⌘ / ^ S and ⌘ / ^ Enter using EditorView.domEventHandlers API.
Display the proper modifier key (either ⌘ or ^) depending on the OS detected in navigator.userAgent.
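
The two tasks above come down to a key-matching predicate and a user-agent check. The sketch below shows one way they might look; isExecuteShortcut, modifierSymbol, and the KeyLike interface are hypothetical names, and matching on "Macintosh" or "Mac OS X" is an assumed heuristic rather than a robust platform test.

```typescript
// Minimal shape of the keyboard event fields this sketch reads.
// A real DOM KeyboardEvent satisfies this interface.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
}

// True for ⌘/Ctrl + S and ⌘/Ctrl + Enter.
function isExecuteShortcut(event: KeyLike): boolean {
  const hasModifier = event.metaKey || event.ctrlKey;
  const key = event.key.toLowerCase();
  return hasModifier && (key === 's' || key === 'enter');
}

// Pick the modifier symbol to display from a user agent string.
// Assumption: these substrings are a good-enough signal for macOS.
function modifierSymbol(userAgent: string): '⌘' | '^' {
  return /Macintosh|Mac OS X/.test(userAgent) ? '⌘' : '^';
}

const macUserAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)';
console.log(modifierSymbol(macUserAgent));
```

In the editor, a predicate like this would presumably be called from a keydown handler registered through EditorView.domEventHandlers, with event.preventDefault() called on a match so the browser's own save dialog does not open.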

feat: Render data in a grid on upload.

In #9, we added support for uploading data to the extension. Now, we need to make that data visible. To do so, we'll introduce a new DataGrid component. For the first crack at implementation, we'll use @tanstack/react-table. The core requirements of the table are:

Handle both CSV and JSON.
- Handle CSV with and without header rows.
- Handle JSON parsed as an Object rather than an Array.
Virtualized rendering to accommodate (potentially) very large datasets.
Sorting by columns.

We'll defer any work on filtering until we have more evidence that it's a core need.
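
The CSV and JSON requirements above imply a normalization step that turns each input shape into an array of row objects before it reaches the table. Here is a minimal sketch of that step. The function names, the generated `column_N` names for headerless CSV, and the choice to turn a top-level Object into key/value rows are all assumptions made for illustration; the issue does not specify them.

```typescript
type Row = Record<string, unknown>;

// Turn parsed CSV rows (arrays of strings) into objects keyed by column name.
// Assumption: headerless files get generated names like "column_0".
function rowsFromCsv(matrix: string[][], hasHeader: boolean): Row[] {
  if (matrix.length === 0) return [];
  const header = hasHeader
    ? matrix[0]
    : matrix[0].map((_cell, index) => `column_${index}`);
  const body = hasHeader ? matrix.slice(1) : matrix;
  return body.map((cells) => {
    const row: Row = {};
    header.forEach((name, index) => {
      row[name] = cells[index];
    });
    return row;
  });
}

// Normalize parsed JSON into an array of rows.
// Assumption: a top-level Object becomes one { key, value } row per property,
// which is only one of several reasonable interpretations.
function rowsFromJson(json: unknown): Row[] {
  if (Array.isArray(json)) return json as Row[];
  if (json !== null && typeof json === 'object') {
    const record = json as Row;
    return Object.keys(record).map((key) => ({ key, value: record[key] }));
  }
  throw new TypeError('Expected a JSON array or object');
}

const headerless = rowsFromCsv([['1', '2'], ['3', '4']], false);
console.log(headerless);
```

The string matrix passed to rowsFromCsv could come from a row-oriented parser such as csvParseRows in d3-dsv, which the extension already lists as a dependency.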

fix: Improve rendering of program output to fit window dimensions.

Currently, the rendering of program output is a bit wonky. For example, the visualization doesn't expand to fill the available width and is far too tall for its container.

Fortunately, Plot allows us to specify manual widths and heights of plots. We can take advantage of this by passing along the clientWidth and clientHeight properties of the containing DOM node to the invisible iframe where the plot is rendered. This should ensure we get nicely-sized plots for the program output.

In addition to the above, the following improvements should be in scope for this issue:

Add an event listener to trigger program runs on Shift + Enter, Cmd + S, and Ctrl + S.
Show a program as "stale" by lighting up its Execute button (à la Observable).
- Addressed in #19.
Debug and address any issues around tick marks; we're currently rendering far too many tick marks.
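
The sizing approach described in this issue amounts to merging the container's dimensions into the options passed to Plot.plot, which accepts width and height. A small sketch of that merge follows; withContainerSize and the Sized interface are hypothetical names, and how the dimensions actually reach the iframe is left out.

```typescript
// Minimal shape of the container fields this sketch reads.
// A real DOM Element satisfies this interface.
interface Sized {
  clientWidth: number;
  clientHeight: number;
}

// Merge the container's dimensions into a Plot options object.
// `width` and `height` are Plot.plot options; the helper itself is illustrative.
function withContainerSize<T extends object>(
  options: T,
  container: Sized
): T & { width: number; height: number } {
  return {
    ...options,
    width: container.clientWidth,
    height: container.clientHeight,
  };
}

const sizedOptions = withContainerSize(
  { marks: [] as unknown[] },
  { clientWidth: 640, clientHeight: 400 }
);
console.log(sizedOptions.width, sizedOptions.height);
```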

feat: Render program output.

As part of the Visualize portion of the extension interface, we need to render the output generated by filling holes in the partial program. Doing this has some quirks in the context of an extension because we can't use the Function constructor to evaluate code; it violates the unsafe-eval portion of the extension's Content Security Policy. As an alternative, we'll use a "sandbox" page to run the user's program against their uploaded data in an isolated environment. Finally, we'll send the serialized HTML of the resulting visualization back to the extension's DevTools panel for rendering.

The core parts of the implementation include:

Adding a "sandbox" page to the extension.
Rendering this "sandbox" page in an invisible iframe.
Sending the filled program and data to the sandbox page and executing the program. This will involve use of iframe.contentWindow.postMessage.
Sending the resulting serialized HTML back to the extension main thread, again using postMessage.
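
The message exchange outlined in the steps above can be sketched as a request and response pair plus a sandbox-side handler. Everything named here (RunRequest, RunResponse, createSandboxHandler, and the execute callback) is hypothetical; the issue does not specify the message format or how the program is evaluated.

```typescript
// Hypothetical message shapes exchanged between the panel and the sandbox.
type RunRequest = { type: 'run'; program: string; data: unknown[] };
type RunResponse =
  | { type: 'result'; html: string }
  | { type: 'error'; message: string };

// Build the sandbox-side handler. `execute` stands in for whatever evaluates
// the filled program against the data and serializes the output to HTML.
function createSandboxHandler(
  execute: (program: string, data: unknown[]) => string
): (request: RunRequest) => RunResponse {
  return (request) => {
    try {
      return { type: 'result', html: execute(request.program, request.data) };
    } catch (error) {
      // Report failures as data so the panel can display them.
      const message = error instanceof Error ? error.message : String(error);
      return { type: 'error', message };
    }
  };
}

// Example with a stand-in executor that returns fixed markup.
const handleRun = createSandboxHandler(() => '<svg></svg>');
const response = handleRun({ type: 'run', program: 'Plot.plot({})', data: [] });
console.log(response);
```

In the extension, the panel would send RunRequest objects through iframe.contentWindow.postMessage, and the sandbox page would call a handler like this from a window message listener before posting the response back. Returning errors as messages rather than throwing keeps a broken user program from leaving the panel waiting for a reply.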

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

chore(deps): update dependency @types/chrome to ^0.0.266
chore(deps): update dependency autoprefixer to v10.4.19
chore(deps): update dependency prettier-plugin-tailwindcss to v0.5.14
chore(deps): update dependency tailwindcss to v3.4.3
chore(deps): update dependency typescript to v5.4.5
chore(deps): update dependency eslint-config-next to v14.2.3
chore(deps): update dependency eslint to v9
chore(deps): update dependency eslint-plugin-jest to v28
🔐 Create all rate-limited PRs at once 🔐

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

chore(deps): update dependency @types/node to v20

Detected dependencies

npm

package.json

@typescript-eslint/eslint-plugin ^7.0.0

@typescript-eslint/parser ^7.0.0

eslint ^8.57.0

eslint-config-next ^14.0.0

eslint-config-prettier ^9.0.0

eslint-plugin-import ^2.27.5

eslint-plugin-jest ^27.9.0

lerna ^8.0.0

prettier ^3.2.5

prettier-plugin-tailwindcss ^0.5.11

yarn 1.22.21

packages/compiler/package.json

lodash.camelcase ^4.3.0

lodash.groupby ^4.6.0

lodash.orderby ^4.6.0

@observablehq/plot ^0.6.9

@sucrase/jest-plugin ^3.0.0

@types/jest ^29.5.12

@types/lodash.camelcase ^4.3.7

@types/lodash.chunk ^4.2.7

@types/lodash.groupby ^4.6.7

@types/lodash.orderby ^4.6.7

@types/node ^18.16.1

esbuild ^0.20.0

jest ^29.7.0

jest-environment-jsdom ^29.7.0

lodash.chunk ^4.2.0

rimraf ^5.0.1

typescript ^5.1.6

packages/extension/package.json

@codemirror/lang-javascript ^6.1.9

@codemirror/language ^6.10.1

@codemirror/view ^6.25.1

@lezer/highlight ^1.2.0

@observablehq/plot ^0.6.9

@radix-ui/react-tabs ^1.0.4

@radix-ui/react-tooltip ^1.0.6

@tanstack/react-table ^8.9.3

classnames ^2.3.2

codemirror ^6.0.1

d3-dsv ^3.0.1

prettier ^3.2.5

react ^18.2.0

react-dom ^18.2.0

@types/chrome ^0.0.263

@types/d3-dsv ^3.0.1

@types/react ^18.2.62

@types/react-dom ^18.2.19

@vitejs/plugin-react ^4.0.1

autoprefixer ^10.4.14

postcss ^8.4.24

tailwindcss ^3.3.2

typescript ^5.1.6

vite ^5.0.0

packages/ui/package.json

classnames ^2.3.2

prism-react-renderer ^2.0.6

@types/react ^18.2.62

@types/react-dom ^18.2.19

esbuild ^0.20.0

rimraf ^5.0.1

typescript ^5.1.6

react ^18.2.0

react-dom ^18.2.0

infra: Migrate to a monorepo using lerna.

With the addition of the Chrome extension for reviz, we now have three separate repositories: reviz itself, the reviz docs, and the Chrome extension. Moreover, both the docs and the Chrome extension rely on reviz and should ideally be published alongside it. Given these interdependencies, it makes sense to shift to a monorepo.

We'll use lerna as our monorepo management tool. lerna is actively maintained and has a long history as the primary tool for monorepo development in JavaScript.

For the scope of this issue, we should accomplish the following:

Split the primary repo, docs, and extension into separate packages.
Ensure docs and extension can reference the core package, which we'll name something like core.
Configure CI to run reviz unit tests, deploy docs, and build the extension.
Centralize linting configuration.
Look for opportunities to share React components across the docs and extension.