

bmschmidt commented on June 12, 2024

There's also some movement here I should mention.

Transformations on a working branch are now allowed to return promises. Since duckdb-wasm queries are async, that means you could package up a tile's Arrow record batch, send it to DuckDB with arbitrary SQL, and get a record batch back.

from deepscatter.

bmschmidt commented on June 12, 2024

So this is a little more straightforward now, although the API is still in flux. Here's what the steps would be on this new branch.

  1. Define a transformation function that works on Tile objects with a signature (tile : Tile) => Promise<Float32Array>, where the array is the same length as the tile. For example, you could do:
[...]
scatterplot._root.transformations['has dog in it'] = async function(tile) {
  const output = new Float32Array(tile.record_batch.numRows);
  // The column being searched.
  const all_rows = tile.record_batch.getChild("full_text");
  // The reason to use duckdb is that this call, deserializing a lot of text
  // from Arrow UTF-8 to JavaScript UTF-16 strings, is *extremely* slow.
  const all_strings = all_rows.toArray();
  let i = 0;
  for (const string of all_strings) {
    // Skip null entries; unmatched rows keep the default value of 0.
    if (string && string.match(/dog/)) {
      output[i] = 1; // Store a match as a float.
    }
    i++; // Advance the output index along with the input rows.
  }
  return output; // This array will be attached to the record batch at render-time, lazily.
}

(The function isn't required to be async; a plain synchronous function is what you'd want in this simple JS regex case. With duckdb, though, you would ship the full_text array to the database and get the results of the query back as Arrow.)
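To make the async variant concrete, here is a minimal sketch of how a query-backed transformation could be shaped. Everything named here is an assumption for illustration: `makeQueryTransformation`, `runQuery`, the table name `tile`, and the mock tile are hypothetical, not part of deepscatter or duckdb-wasm. In a real setup, `runQuery` would be whatever code hands the record batch to duckdb-wasm and reads a numeric column out of the result; here it is replaced by a stand-in so the shape can be tested on its own.

```javascript
// Hypothetical helper: wraps any async "run this SQL against this batch"
// function into a tile transformation. `runQuery` is an assumption; it must
// resolve to one numeric value per input row.
function makeQueryTransformation(runQuery, sql) {
  return async function (tile) {
    const numRows = tile.record_batch.numRows;
    const values = await runQuery(tile.record_batch, sql);
    // The transformation contract is one value per row of the tile.
    if (values.length !== numRows) {
      throw new Error(`Expected ${numRows} values, got ${values.length}`);
    }
    // Copy into a Float32Array so the result matches the expected return type.
    return Float32Array.from(values);
  };
}

// Stand-in for the database: ignores the SQL and evaluates the same
// condition in plain JS against the batch.
async function fakeRunQuery(recordBatch, _sql) {
  const strings = recordBatch.getChild("full_text").toArray();
  return strings.map((s) => (/dog/.test(s) ? 1 : 0));
}

// Mock tile exposing only the fields the transformation touches.
const mockTile = {
  record_batch: {
    numRows: 3,
    getChild: (_name) => ({ toArray: () => ["hot dog", "cat", "dogma"] }),
  },
};

// The SQL is illustrative only; the table and column names are assumed.
const hasDog = makeQueryTransformation(
  fakeRunQuery,
  "SELECT CASE WHEN full_text LIKE '%dog%' THEN 1.0 ELSE 0.0 END AS has_dog FROM tile"
);
```

Calling `await hasDog(mockTile)` resolves to a `Float32Array` with one entry per row. Passing the query runner in as an argument keeps the transformation testable without a database, and swapping in a real duckdb-backed runner should not change the transformation's shape.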

  2. Then actually plot it, which causes the transformation to run on tiles as they are needed.

scatterplot.plotAPI({
  encoding: {
    foreground: {
      field: 'has dog in it',
      op: 'eq',
      a: 1
    }
  }
})
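As a rough illustration of what that encoding asks for, the sketch below applies an `{op: 'eq', a: 1}` test to a transformed column and produces a per-point foreground mask. This is one assumed reading of the encoding, not deepscatter's implementation; `matchesFilter` and the sample column are hypothetical, and only the `eq` op from the example is handled.

```javascript
// Hypothetical reading of an { op, a } filter; not deepscatter's own code.
function matchesFilter(value, { op, a }) {
  if (op === "eq") {
    return value === a;
  }
  // Other ops are deliberately left out of this sketch.
  throw new Error(`Unsupported op in this sketch: ${op}`);
}

// A column like the one the transformation returns: 1 where "dog" matched.
const column = Float32Array.of(1, 0, 0, 1);

// One boolean per point: true means the point would be in the foreground.
const foregroundMask = Array.from(column, (v) =>
  matchesFilter(v, { op: "eq", a: 1 })
);
```

Because matches are stored as exactly 0 or 1, the strict equality test is safe for a `Float32Array` here.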


bmschmidt commented on June 12, 2024

If anyone's following along at home, David and I got a prototype of this running over the weekend (https://observablehq.com/d/cae8e4a3a8b7d4db). The hardest part turned out to be the misalignment of Arrow versions among arrow-js, duckdb, and deepscatter.

