Git Product home page Git Product logo

vmware / differential-datalog Goto Github PK

View Code? Open in Web Editor NEW
1.3K 1.3K 117.0 301.87 MB

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.

License: MIT License

Haskell 24.65% Makefile 0.06% Rust 27.00% Shell 1.43% Python 2.53% C 2.88% Java 37.86% Yacc 0.91% HTML 0.15% Dockerfile 0.05% Go 1.42% Nix 0.07% JavaScript 0.01% CSS 0.03% TypeScript 0.88% Vim Script 0.06%
datalog ddlog incremental programming-language rust

differential-datalog's People

Contributors

amytai avatar antoninbas avatar argc0 avatar arkivm avatar blp avatar booxter avatar c-nixon avatar d-e-s-o avatar d4hines avatar danbst avatar debnil avatar dependabot[bot] avatar desharchana19 avatar falzberger avatar gfour avatar gz avatar justinpettit avatar kixiron avatar krs85 avatar lalithsuresh avatar lykahb avatar mooly-sagiv avatar paralax avatar price1999a avatar remysucre avatar ryzhyk avatar smadaminov avatar srenatus avatar tobindekorne avatar yjiayu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

differential-datalog's Issues

differential-dataflow optimizations

  • use distinct_total instead of distinct where possible.
  • don't convert external collections to Variables in the nested scope (variables use a lot of memory)

TODOs

Basic Datalog (Leonid)

  • Invoke user callbacks on output relations
  • Generate code for ground facts
  • Framework for (incremental) input/output testing of Datalog programs
  • More examples of interesting Datalog programs
    • span computation.
    • souffle examples.
  • Tutorial on writing and testing Datalog programs (so far, wrote a mini-tutorial on adding new tests, but need a more detailed one introducing language basics and workflow)
  • FFI
  • Performance/memory tests
    • Tools for datalog program memory profiling
  • Update main ddlog executable

FTL (Mihai)

  • Generate Datalog schema from JSON
  • Generate input and output adapters
  • Adapters for external C functions invoked from FTL
  • Test workload

Future improvements

  • Debugging facility
  • CLI interface to ddlog
  • RPC interface to ddlog

Cannot have antijoins with wildcards

Also, the error message is not very good.

error: file.dl:211:234-211:235: Argument _method must be specified in this context
 
RBasic_MethodLookup(_simplename, _descriptor, _type, _method) :- RDirectSuperclass(_type, _supertype), RBasic_MethodLookup(_simplename, _descriptor, _supertype, _method), not RBasic_MethodImplemented(_simplename, _descriptor, _type, _).

No symbol table?

Currently one can have a type that depends on an undefined identifier

typedef t = x

OVN integration TODOs

TODO list for DDlog/OVN integration:

  • generate DDlog schema from OVSDB schema.
    • This can be done mostly automatically from OVSDB schema files; however we will need some way to specify which OVSDB relations are inputs and which our outputs (can they be both?). Should we add those as annotations to the OVSDB schema or just specify them as command line arguments to the schema conversion tool?
  • port rules from northd.c to northd.dl
  • create data sets for standalone testing of northd.dl (i.e., outside of OVN)
  • generate runtime OVSDB input/output adapters for DDlog
  • create OVN infrastructure to test northd.dl against northd.c on a live system
  • integration testing
  • scale testing (and CPU/memory profiling)
  • suppress Rust warnings
  • catch up with latest OVN master
  • documentation
  • fix remaining test failures
  • test C and DDlog at the same time if DDlog is enabled in the build
  • clean up DDlog patches

add support for constants

At the moment, constants can be simulated by 0-arity functions. The main downside is the need to use parenthesis every time a constant is referenced. Also, function names start with lower-case letters, whereas it is customary to capitalize constant names.

Implement bit slice logic in Compile.hs

This is a little tricky, as we use different representations depending on the size of a bit vector (BitUint or uXX) depending on its width. Bit slicing may involve conversion between these representations

Type inference using unification

Implement a proper type inference algorithm based on unification. The current ad hoc implementation is hacky and fragile. No idea why I did not do the right thing straight away...

Automatic string conversion for type variables

At the moment, the compiler refuses to convert type variables to strings.

This can be fixed by:

  1. Infering that a function expects a type variable to be printable
  2. In the generated Rust function, add the Display trait for this function
  3. Whenever the function is called, check that its concrete arguments satisfy the trait
  4. Implement Display trait for types that have a user-defined 2String method

Optimizations

  • string interning
  • use arrangements for relations that are used as the first atom in multiple rules (a special case of common subexpression elimination)
  • use Box or Arc type for Value to save memory
  • avoid redundant cloning in the generated code
  • use per-key distinct when possible
  • extract "by-self" arrangement from distinct
  • 32-bit weights
  • use unreachable() instead of panic!() in Compile.hs (see TODO in Compile.hs)

https://github.com/frankmcsherry/differential-dataflow/issues/113

fix goldenVsFiles

goldenVsFiles behaves funny when more than one files has changed: it first complains about the first file only. Deleting the corresponding .expect file causes it to report that a new golden file has been generated (without an error). Finally, the third run reports a mismatch in the second golden file.

DDlog lints

  • Detect unused variables in rules and functions
  • Detect unused intermediate relations (i.e., relations that are neither output nor used to compute other relations).
  • Grouping by complex variables or variables wrapped in Ref<> is inefficient. Ideally, one should group by one or several small identifiers.
  • ???

Syntax improvements

  • Rename int -> bigint
  • rename ground relation -> input relation
  • add return statement

Incomplete implementation of assignments in Compile.hs

We currently don't have a way to compile arbitrary l-values to Rust, e.g.,

Constructor{var x, var y} = z

is ok, but

Constructor{x, y} = z

where x and y are previously introduced variables is illegal.

Either implement the logic to support this translation (e.g., by assigning to intermediate variables) or modify validation logic to disallow such programs

Test and document parsec's string parser

We rely on parsec's standard parser for strings, which supports unicode and escaping.

TODO:

  • check and document its exact functionality
  • can we handle all OVSDB strings?

mpsc interface to ValMap

ValMap is a data structure that stores the content of output tables. It is updated by each change callback invoked by the differential inspect() operator. ValMap is protected by a mutex, potentially introducing contention in workloads where outputs are frequently updated. One possible solution is to maintain ValMap in a separate thread and use mpsc queue to communicate with this thread from workers. We must be careful though to flush the queue before transaction_commit() returns. Another (even better?) option is to keep a ValMap per worker and only merge them when the client requests a copy of an output table.

foreign keys for more compact FTL

Ben's version of FTL allows traversing cross-table pointers, e.g.:

let O = lspip.lsp.dhcpv6_options.option_args

To support this without using nested for loops, we'd need to add some notion of foreign keys and possibly other SQL constraints to the language

Get rid of `let` syntax

We use let in FTL, and var in the rest of the language. There is not reason for these two forms of variable declaration.

Allow access to enum fields

The Rust backend currently does not allow access to fields of types with multiple constructors (i.e., types compiled to enums), even if all constructors of a type have the same field

The Rust code generated for split does not compile

The unmerged tutorial has an example which does not seem to generate correct Rust code.
This is the dl code:

extern function split(s: string, sep: string): set<string>
function split_ip_list(x: string): set<string> =
   split(x, " ")

The generated code does not compile:

fn split_ip_list(x: &String) -> set<String> {
    split(x, (&r###" "###.to_string()))
}

since there is no set type in Rust.

Test calling DDlog from Java

It should be possible to load a compiled DDlog program and execute transactions from Java.

This requires a couple of preparatory steps:

  • DDlog currently generates static libraries. We have to change the crate type to build .so instead.
  • DDlog still does not have a complete C API.

Background:

  • The generated library file is written to the target/release directory, e.g., tests/datalog_tests/path/target/release/libpath.a.
  • Associated header file is tests/datalog_tests/path/ddlog.h

support for multi-file specs

The most important use case is combining auto-generated schema (e.g., from OVSDB) with hand-written code. We may be able to get away with using CPP or m4, but must make sure that line numbers in error message refer to original input files.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.