Git Product home page Git Product logo

mork's Introduction

MeTTa Optimal Reduction Kernel [WIP]

A blazing fast hypergraph processing kernel for Hyperon

MORK seeks to retrofit Hyperon with a state-of-the-art graph database and a specialized zipper-based multi-threaded virtual machine to provide speedy MeTTa evaluation across the full range of Space sizes and topologies.

By rearchitecting certain Hyperon bottlenecks, MORK has the potential to accelerate important use cases by thousands to millions of times. That kind of speedup represents a qualitative jump in capabilities. It's the difference between running a training step vs. finishing the training in the same amount of time. It's the difference between a thousand input samples vs. millions, or a crocodile's brain vs. a human's. Deep learning has advanced due in part to the software platforms that exposed the full capabilities of underlying hardware, and we hope Hyperon + MORK can help do that for symbolic AI.

Roadmap

Deliverable 1 - Graph database with efficient queries

Deliverable 1 is a set of data structures for a fast, scalable and memory-efficient graph database that runs on a single machine.

  • Expression queries
    Support for querying by any structured key type, e.g. expression structure, etc. (See Triemaps that match)

  • Efficient space-wide operations
    Full relational algebra support. I.e. union, intersection, and set subtraction ops across entire spaces, performed using lazy or constant-time operations.

  • Matching and unification
    Support for bidirectional pattern matching and unification of S-expressions with spaces by translating slow declarative "shape matching" queries into native queries

  • Billions of atoms
    A high end workstation should be able to load, query, and transform a space with over a billion atoms without running out of memory or hanging for hours

  • JSON interop
    Load data from json and querying using (a subset of) JSONPath

Tasks:

  • Definition of Expr type structures to represent S-expressions efficiently in memory with high cache locality and searchability

  • Derivation of a triemap over a range of algebraic data structures, (excluding dependent typing). See Multiplate

  • Fast and memory-efficient S-expression triemap to support MeTTa atoms

  • Union, intersection, and subtract ops for S-expression triemap (and other derived triemap types) implemented with algorithms that have optimal scaling properties (based on current SOTA)

  • Fast primitive triemap implementing the space-wide ops (union, intersection, and subtract), based on (Ring of Sets)

  • Automatic translation of slow declarative "shape matching" queries into more classical database queries, to support bidirectional pattern matching and unification

  • Streaming JSON parser, and JSONPath query engine

  • Take first place (or at least crack the top 3) in the One Billion Row Challenge

Deliverable 2 - Multi-threaded zipper machine

Deliverable 2 is a model of computation and a multi-threaded virtual machine to run it, utilizing the data structures from deliverable 1.

  • "Zipper Abstract Machine"
    Inspired by Prolog's Warren Abstract Machine, exposes spaces as native computational primitives and Zippers as cursors to encapsulate runtime state

  • Logical inference at scale
    Decompose inference steps for parallel execution and an efficient fast-path for light-weight high-frequency inference

  • Basic inference control
    Expands upon MeTTa's support for nondeterminism with machine-level operations for prioritization and pruning of exploration and evaluation

  • Near-linear scaling with cores
    Fearless concurrency based on isolation guarantees to allow scaling without hitting scaling walls caused by central bottlenecks

Tasks:

  • Define (mathematically) the model of computation and the specification of the ZAM behavior

  • Implement the ZAM in Rust leveraging an existing framework

  • Benchmark and optimize towards maximizing serial-inference-operations-per-second

  • Benchmark and optimize towards parallel decomposed operation throughput

  • Saturate the compute on a 128 core x86 test machine, with each incremental core delivering concomitant incremental performance for a use-case / workload with enough concurrency

Deliverable 3 - MeTTa language support

Deliverable 3 implements a fully-functional interpreter of the high-level MeTTa language on top of the ZAM from deliverable 2, with support for native grounded operations and primitives.

  • ZAM based interpreter for logical inference
    Interprets code to perform the full range of tasks and use cases that MeTTa can currently execute, although some adaptation of the MeTTa code may be required

  • Enable complex programs
    Supports enough functionality to enabling long-lived and general-purpose codebases that live along-side or within massive datasets

  • Grounded Interfaces
    Provides API hooks to extend MeTTa functionality with native operations, while taking care to preserve the performance characteristics of the triemap primitives as much as is possible using bidirectional transformation endpoints

  • Integrated Inference Control
    Augments space-wide operations with sampling and enriching strategies using the inference control mechanisms built on the ZAM.

Tasks:

  • Map each operation in minimal MeTTa into ZAM operations

  • Define API entry points to express grounded operations, and permit them to be inverted into queries

  • Implement fallback paths to materialize transformed spaces when grounded operations cannot be inverted, and develop graceful strategies to prevent these fallbacks from compromising the usability of the system overall

Deliverable 4 - Adapt hyperon clients to use MORK

Deliverable 4 ports and adapts the MORK kernel into the hyperon-experimental project, exposing a rich set of features and interface bindings for end-users.

  • Python integration
    Update hyperon-experimental’s Python bindings to allow the MeTTa interpreter to be embedded inside Python, and allow the use of Python to define native grounded objects

  • C API
    Run MeTTa from C/C++, and use C/C++ to implement native grounded objects

  • WASM API
    Integrate a bridge to allow grounded objects to be implemented using WASM.

  • Complete MeTTa stdlib support
    Arithmetic and string operations, etc.

  • Modules
    Isolated and composable units of functionality

  • Package management
    Load packages implemented with any combination of MeTTa, Python, or WASM from disk, git, or a central repository

General desiderata

  • Software deliverables built to target WebAssembly as well as native (Linux + Mac) x (x86-64 + AArch64) platform targets

  • NoCopy binary formats where possible, to allow fast loading of large files using mmap, and sharing across virtual-machine boundaries without a serialize-deserialize translation cost

Possible future directions for MORK, beyond the 4 deliverables

Currently we don't have these items as specific deliverables, but these are directions in which we may opt to continue the work.

  • Native compilation of MeTTa to machine code

  • Support for specialized many-core architectures and/or accelerators

  • Distribution across multiple machines on a network

Additional reference

  • MeTTa
    Official resources for the MeTTa language

  • Atomspace
    An existing hypergraph processing system that served as a progenitor Hyperon and MeTTa

  • Hyperon Experimental
    The most feature-complete implementation of the MeTTa language

  • Triemaps that match
    A paper from Simon Peyton Jones describing one of the foundational patterns MORK will use

  • CZ2
    Some experiments in Scala demonstrating the scaling characteristics of the triemap structures

  • Interacting Trie-Maps
    An implementation of triemap interactions in Scala, serving as a proof-of-concept for ZAM and concurrency model

  • Triemap derivation for Rust
    A crate exposing a macro to derive a triemap over a Rust enum or struct type

mork's People

Contributors

adam-vandervorst avatar clarkeremy avatar luketpeterson avatar

Stargazers

Hogan Kangas avatar Pascal Gula avatar Andrew Meier avatar Vitaly Bogdanov avatar Helder S Ribeiro avatar  avatar

Watchers

Lucius Meredith avatar  avatar Alexey Potapov avatar  avatar Helder S Ribeiro avatar

mork's Issues

S-Expression representation

While we hope S-expressions are going to be rare on themselves in the kernel, they will exist, and when they do, they're likely to be the bottleneck.
Therefore, let's not be naive about implementing them.
The representation of variables is (de Bruijn indices/levels VS scoped IDs VS globally unique IDs) orthogonal and will have its own issue.
The question about left or right associative representation is irrelevant to this discussion.

Implementation options

Traditional

pub enum Expr {
    Var(usize),
    App(Box<Expr>, Box<Expr>),
}

What you'd expect modulo the pointer type chosen to be Box here.

Prolog-style

(= (parent $y Bob) $rhs) is [2 =; 2 parent; 0 $y; 0 Bob; 0 $rhs]
Assuming arities are disjoint from vars, and symbols and vars are interchangeable:
(a (b c) (d e f g h) i)
becomes
[4 a 2 b c 5 d e f g h 1 i]
yielding Expr = *u64.

This maintains the nice property that subexpressions are fractal (have the same type and are free).
It does not need allocations or pointer indirections.
E.g. The (b c) subexpression is just the base pointer plus two, and the word the pointer points to tells you how many words to read: [2 b c].

Dyck word

Something with format [arity, dyck-word, *vars], e.g. [5, 0b00100111, =, parent, $y, Bob, $rhs]
Visually:

             .
        0  /   \
         / \ 1   \
    0  / 0 / \     \ 1
     /    .    \ 1   \
   /  0 /   \ 1  \     \
((= ((parent $y) Bob)) $rhs)

where 00100111 is the Dyck word (a.k.a. edge walk) of the above tree.

The fractal relation is not as clear here, but a subexpression is a subword of the Dyck word together with a subarray of the full expression.

Hadamard

If we want constant-time access of an array element and fast pattern matching, we can go a step further:
The hadamard code of the expression + indication of variables/symbols: [treemask, varmask, (vars - symbols)*, symbols*], e.g.
[0b0000011111111111, 0b111110110000000, $t, $rhs, =, parent, Bob]
Visually:

   
-   -   -   -   -   -   -   -   + V + V + V + V + V + V + V +   
                                               $rhs
- S - S - S - S +   +   +   +   
       =
                -   -   + S + S 
                           Bob
                - S + V 
              parent  $t

The fractal relation is unclear here, but it should be possible to parallel bit extract the masks and gather/compress the right words to efficiently build a subexpression.

Parallel Vector

TODO

Notes

Experimentation will lead the way here, and conversions between these formats are a great standalone task!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.