Git Product home page Git Product logo

icql-dba-vnr-legacy's Introduction

ICQL-DBA Plugin: VNR

Table of Contents generated with DocToc

DEPRECATION NOTICE

This package has been deprecated; for more see

Documentation left here for Posterity

Notes (in lieu of docs)

  • Purpose

    • Add a JSON and a BLOB column to your table (called vnr and vnr_blob by default)

    • JSON encodes lists of integer numbers called Vectorial NumbeRs (VNRs)

    • BLOB is a generated, indexed column of a binary representation (provided by [hollerith-code tng])[https://github.com/loveencounterflow/hollerith-codec/blob/master/src/tng.coffee] of the VNR

    • the binary representation can be meaningfully sorted lexicographically by SQLite

    • VNRs support scenarios where you want to transform ordered items such that a given line (item, record) may result in any number of result lines (items, records) and keep the relative ordering

    • e.g. you store lines of a textfile as

      • { nr: 1, text: 'helo world', }
      • { nr: 2, text: 'fancy stuff here', }
      • { nr: 3, text: 'that\'s all', }
    • say you want to transform that to a one-word-per-line format in the same table; now you have

      • { nr: 1, text: 'helo world', }
      • { nr: ?, text: 'helo', }
      • { nr: ?, text: 'world', }
      • { nr: 2, text: 'fancy stuff here', }
      • { nr: ?, text: 'fancy', }
      • { nr: ?, text: 'stuff', }
      • { nr: ?, text: 'here', }
      • { nr: 3, text: 'that\'s all', }
      • { nr: ?, text: 'that's', }
      • { nr: ?, text: 'all', }
    • For that to work, you'd have to (1) mark the orginal records as processed (not covered here), and (2) find a way to enumerate the new lines such that they are correctly ordered.

    • What's more, wouldn't it be great if the enumeration could somehow preserve the origin of each new record?

    • We could just use consecutive numbers but then we'd have to renumber all entries whenever a single record got added somewhere in the middle. One could also use floating point numbers which gets messy soon.

    • Instead if we converted the nr field to a vector of numbers, we could do the following:

      • { vnr: [ 1, ], text: 'helo world', }
      • { vnr: [ 1, 1, ], text: 'helo', }
      • { vnr: [ 1, 2, ], text: 'world', }
      • { vnr: [ 2, ], text: 'fancy stuff here', }
      • { vnr: [ 2, 1, ], text: 'fancy', }
      • { vnr: [ 2, 2, ], text: 'stuff', }
      • { vnr: [ 2, 3, ], text: 'here', }
      • { vnr: [ 3, ], text: 'that\'s all', }
      • { vnr: [ 3, 1, ], text: 'that\'s', }
      • { vnr: [ 3, 2, ], text: 'all', }
    • As it stands, we can store the VNRs as JSON (that is, as texts as far as SQLite is concerned). However, that is not bound to give us the intended because the lexicographic sorting of numbers-as-texts is not numerically monotonous ('1' comes before '10' which comes before '2' etc.)

    • This is why we turn the JSON lists into Binary Large OBjects (BLOBs) that SQLite will order correctly out of the box.

    • We do this by adding a vnr_blob field:

      alter table myfiles
        add column vnr_blob blob generated always as ( vnr_encode( vnr ) ) virtual not null;
    • The vnr_blob field is generated but not stored, meaning it will be re-generated on each select that includes it. That sounds inefficient—and it would be, were it not for an index we also add:

      create unique index myfiles_vnr_blob_idx
        on myfiles ( vnr_encode( vnr ) );
    • issuing select * from myfiles order by vnr_blob; now gives us this display:

    ┌──────────────────┬───────┬────────────────────────────────────────────────────────────────────┐
    │text              │vnr    │vnr_blob                                                            │
    ├──────────────────┼───────┼────────────────────────────────────────────────────────────────────┤
    │helo world        │[1]    │<Buffer 80 00 00 01 80 00 00 00 80 00 00 00 80 00 00 00 80 00 00 00>│
    │helo              │[1,1]  │<Buffer 80 00 00 01 80 00 00 01 80 00 00 00 80 00 00 00 80 00 00 00>│
    │world             │[1,2]  │<Buffer 80 00 00 01 80 00 00 02 80 00 00 00 80 00 00 00 80 00 00 00>│
    │fancy stuff here  │[2]    │<Buffer 80 00 00 02 80 00 00 00 80 00 00 00 80 00 00 00 80 00 00 00>│
    │fancy             │[2,1]  │<Buffer 80 00 00 02 80 00 00 01 80 00 00 00 80 00 00 00 80 00 00 00>│
    │stuff             │[2,2]  │<Buffer 80 00 00 02 80 00 00 02 80 00 00 00 80 00 00 00 80 00 00 00>│
    │here              │[2,3]  │<Buffer 80 00 00 02 80 00 00 03 80 00 00 00 80 00 00 00 80 00 00 00>│
    │that's all        │[3]    │<Buffer 80 00 00 03 80 00 00 00 80 00 00 00 80 00 00 00 80 00 00 00>│
    │that's            │[3,1]  │<Buffer 80 00 00 03 80 00 00 01 80 00 00 00 80 00 00 00 80 00 00 00>│
    │all               │[3,2]  │<Buffer 80 00 00 03 80 00 00 02 80 00 00 00 80 00 00 00 80 00 00 00>│
    └──────────────────┴───────┴────────────────────────────────────────────────────────────────────┘
    
    • observe that all BLOBs in the above are of equal length. This is due to a restriction of the better-sqlite3 API which sadly is missing a way to define one's own collations / cmp functions
    • While a VNR's elements are (after adjustments) encoded with Buffer.writeUInt32BE() and so can each cover a range from -0x80000000 to +0x7fffffff (-2_147_483_648 to 2_147_483_647, meaning there's space for four billion of them),
    • the length of VNRs is restricted by the BLOBs having a finite constant length. In the above, we arranged for five places per VNR which may or may not be enough for a given use case.
  • Links

To Do

  • documentation
    • API
    • DB structure

icql-dba-vnr-legacy's People

Watchers

James Cloos avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.