harvardpl / formulog Goto Github PK
View Code? Open in Web Editor NEWDatalog with support for SMT queries and first-order functional programming
Home Page: https://harvardpl.github.io/formulog/
License: Apache License 2.0
Datalog with support for SMT queries and first-order functional programming
Home Page: https://harvardpl.github.io/formulog/
License: Apache License 2.0
Sometimes the binary we generate exits with code 141, signaling that it has been terminated by signal 13 (trying to write to a closed pipe). In particular, this occurs in the interp-N
benchmarks for the symbolic executor. This seems to have been introduced by explicitly terminating Z3 child processes (#27).
From examples/liquid_types.flg, there appears to be an automatic (builtin?) type of list
, with pattern matching on []
and ::
operations. These are undocumented as far as I can tell.
Additionally, ::
appears to be an infix constructor, but there's no discussion of infix operators elsewhere (that I saw, maybe I missed it); is ::
unique or is there a language syntax to declare infix operators or constructors and define precedence and affinity?
What are the semantics of ::
? Is X :: Y :: ZList
a valid pattern to match the first two elements of a list
?
The 04_logical_formulas document section "Integers" describes a set of solver operations on types of int smt
, but 01_language_basics does not describe int
as one of the 5 built-in primitive types. Is there an int
primitive type, or are these somehow auto-converted to an i32
or i64
?
I just cloned formulog and tried to run mvn package
, and I see some test failures. Here is some of the output:
Running edu.harvard.seas.pl.formulog.eval.SemiNaiveEvaluationTest
line 9:8 token recognition error at: '").\n'
line 11:0 no viable alternative at input 'foo("\\"\noutput'
Tests run: 166, Failures: 22, Errors: 0, Skipped: 0, Time elapsed: 5.341 sec <<< FAILURE!
test099(edu.harvard.seas.pl.formulog.eval.SemiNaiveEvaluationTest) Time elapsed: 0.055 sec <<< FAILURE!
java.lang.AssertionError: Unexpected exception:
edu.harvard.seas.pl.formulog.eval.EvaluationException: Exception raised while evaluating the rule:
ok :-
none = get_model([_to_formula_normal_form(#x211), _to_formula_normal_form((~ #x211))], none),
get_model([_to_formula_normal_form(#x211), _to_formula_normal_form((~ #x254))], none) -> some($102),
query_model(#x211, $102) = some(true),
query_model(#x254, $102) = some(false),
query_model(#x289, $102) = none.
Exception raised while evaluating the literal: get_model([_to_formula_normal_form(#x211), _to_formula_normal_form((~ #x254))], none) -> some($102)
edu.harvard.seas.pl.formulog.eval.EvaluationException: Problem with solver 300:
Problem parsing solver output: Tried to consume "model", but found "(".
at edu.harvard.seas.pl.formulog.eval.RoundBasedStratumEvaluator$RulePrefixEvaluator.doTask(RoundBasedStratumEvaluator.java:365)
at edu.harvard.seas.pl.formulog.util.AbstractFJPTask.compute(AbstractFJPTask.java:40)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
I'm on macOS 10.14, and my Z3 version is 4.8.12. I saw the same failure with Java 8 and Java 11. If my config is off or I am missing a dep, please let me know. Thanks!
Right now, the Formulog version is hard-coded into the CLI, which is easy to forget to update (see #25). Instead, the version should be stored in the JAR (and maybe the git commit as well), and then we can load it programmatically from the CLI code. See:
This program causes the Formulog interpreter to crash:
rel foo(bool smt)
foo(`true`).
rel not_ok
not_ok :- foo(`#x[i32] #= _`).
rel ok
ok :- !not_ok.
The code for a destructor assumes that its term is a Constructor
object, whereas here it is a primitive (bool).
As part of trying to parse a string IPv4 address into its binary representation, I've written a string split utility function that takes an input string haystack
and the character to split on needle
and returns a list of strings. I started with:
fun stringSplit(haystack: string, needle: string) : string list =
let haystackChars = string_to_list(haystack) in
let needleChar :: _ = string_to_list(needle) in
stringSplit_(haystackChars, needleChar, [], [])
And originally I wrote the helper function stringSplit_
:
(* this one doesn't type check *)
fun stringSplit_(haystackChars: i32 list, needleChar: i32, currentGroup: i32 list, sofar: string list) : string list =
match haystackChars with
| [] => sofar
| current :: nil =>
let lastGroup = cons(currentGroup, current) in
let lastString = list_to_string(lastGroup) in
stringSplit_([], needleChar, currentGroup, cons(sofar, lastString))
| current :: rest =>
if current = needleChar
then
let part = list_to_string(currentGroup) in
stringSplit_(rest, needleChar, [], cons(sofar, part))
else
stringSplit_(rest, needleChar, cons(currentGroup, current), sofar)
end
But I'm getting a type-check error:
Exception: Problem type checking function: stringSplit_
Type error in function: stringSplit_
match haystackChars with
| [] => sofar
| [current] => match (currentGroup :: current) with
| lastGroup => match list_to_string(lastGroup) with
| lastString => stringSplit_([], needleChar, currentGroup, (sofar :: lastString))
end
end
| (current :: rest) => match beq(current, needleChar) with
| true => match list_to_string(currentGroup) with
| part => stringSplit_(rest, needleChar, [], (sofar :: part))
end
| false => stringSplit_(rest, needleChar, (currentGroup :: current), sofar)
end
end
Cannot unify string and string list list
Problematic term: lastString
Which is surprising to me. If I reverse the order of arguments in my calls to cons
I type check just fine but the lists are in the reverse order. stringSplit("11.2.3.4", ".")
outputs ["4", "3", "2", "11"]
. I can use the rev
function from one of the included examples:
fun rev(Xs: 'a list) : 'a list =
let fun cons_wrapper(Xs: 'a list, X: 'a) : 'a list = X :: Xs in
fold[cons_wrapper]([], Xs)
And then reverse the lists where appropriate:
(* this one works *)
fun stringSplit_(haystackChars: i32 list, needleChar: i32, currentGroup: i32 list, sofar: string list) : string list =
match haystackChars with
| [] => rev(sofar)
| current :: nil =>
let lastGroup = cons(current, currentGroup) in
let lastString = list_to_string(lastGroup) in
stringSplit_([], needleChar, currentGroup, cons(lastString, sofar))
| current :: rest =>
if current = needleChar
then
let part = list_to_string(rev(currentGroup)) in
stringSplit_(rest, needleChar, [], cons(part, sofar))
else
stringSplit_(rest, needleChar, cons(current, currentGroup), sofar)
end
My questions:
cons
was that it was adding an item to the end of the list. I'm guessing it actually pre-pends the item. Is that correct?stringSplit_
. This pattern matching took a while to figure out and I'm not thrilled with my solution of having both a current :: rest
block and a current :: nil
block. Is there a better (more concise, more idiomatic) way to do this pattern matching block?In 01-language-basics, in the section "External input relations", TSV should be defined, and the last paragraph contains a reference to "CSV" which should be "TSV". You may also want to emphasize somehow that the example document has tabs and not spaces.
We support different ways of interacting with the back-end SMT solver(s). These include strategies that reset the solver state between queries, as well as different approaches to caching queries to take advantage of incremental SMT solving. We should document these.
We should support character literals like 'a'
. At a minimum, we could restrict support to the front end, where they could be converted to i32
s.
Hi guys,
Consider the following program:
input a(i32).
input d(i32).
@bottomup
output b(i32).
@topdown
output c(i32, i32).
@topdown
output ppp(i32).
@topdown
output q(i32).
a(0). a(1). a(2).
d(0). d(2). d(4).
ppp(X) :- a(X).
b(X) :- d(X).
c(X, Y) :- b(X), ppp(X), b(Y).
q(X) :- c(X, X).
output ok.
ok :- q(2).
output e__( i32, i32)
output f__(i32, i32)
output g__(i32, i32) (* ------------ Output relation ------------ *)
output h__(i32)
output i__(string, i32)
e__(Var_b, Var_a) :- c(Var_a, Var_b).
f__(Var_d, Var_c) :- e__(_, Var_d), a(Var_c), !b(Var_c), ppp(_), e__(_, _).
f__(Var_b, Var_b) :- e__(Var_a, Var_b), !e__(Var_a, Var_b).
g__(Var_a, Var_c) :- d(Var_a), f__(_, Var_c).
h__(Var_g) :- f__(Var_g, Var_f), f__(Var_g, _), f__(Var_g, Var_g), q(_), h__(Var_f).
I get the following result for the relation g__
g__(0, 1)
g__(2, 1)
g__(4, 1)
If however I add a new rule f__(Var_a, Var_b) :- f__(Var_a, Var_b).
, the result for g__
should not change. But if I run the following new program:
input a(i32).
input d(i32).
@bottomup
output b(i32).
@topdown
output c(i32, i32).
@topdown
output ppp(i32).
@topdown
output q(i32).
a(0). a(1). a(2).
d(0). d(2). d(4).
ppp(X) :- a(X).
b(X) :- d(X).
c(X, Y) :- b(X), ppp(X), b(Y).
q(X) :- c(X, X).
output ok.
ok :- q(2).
output e__( i32, i32)
output f__(i32, i32)
output g__(i32, i32) (* ------------ Output relation ------------ *)
output h__(i32)
output i__(string, i32)
e__(Var_b, Var_a) :- c(Var_a, Var_b).
f__(Var_d, Var_c) :- e__(_, Var_d), a(Var_c), !b(Var_c), ppp(_), e__(_, _).
f__(Var_b, Var_b) :- e__(Var_a, Var_b), !e__(Var_a, Var_b).
f__(Var_a, Var_b) :- f__(Var_a, Var_b). (* ------------ New Rule ------------ *)
g__(Var_a, Var_c) :- d(Var_a), f__(_, Var_c).
h__(Var_g) :- f__(Var_g, Var_f), f__(Var_g, _), f__(Var_g, Var_g), q(_), h__(Var_f).
I get:
g__(0, 1)
g__(0, 2)
g__(2, 1)
g__(2, 2)
g__(4, 1)
g__(4, 2)
Formulog version I am using: 508c768
Operating System: Debian GNU/Linux 10 (buster)
Kernel: Linux 5.4.163.1.amd64-smp
I'd like to create user-defined types to represent IP addresses. Specifically, a bv[32]
to represent an IPv4 address and a bv[128]
to represent an IPv6 address. Since i32
is interchangeable with bv[32]
I'm currently just doing:
type ipv4address = i32
And then I can parse an IPv4 address from a string to a 32-bit integer:
fun parseIPv4Address(address: string) : ipv4address option = ...
rel ipv4addresses(string, ipv4address option)
ipv4addresses("11.2.3.4", parseIPv4Address("11.2.3.4")).
And when I run with --dump-all
I do see this relation:
ipv4addresses("11.2.3.4", some(184681220))
Where 184681220
is the decimal representation of that IPv4 address.
However, this won't work for IPv6 addresses since there is no corresponding native 128-bit type. Additionally, I'd like to extend this to represent CIDR ranges / subnetworks. For example, the CIDR 192.168.0.0/16
represents the range of IP addresses from 192.168.0.0
to 192.168.255.255
. I'd like to represent this as an SMT bit vector where the first 16-bits are fixed (has the values from the first two octets, 192
and 168
) and the last 16-bits are free. This representation makes it very easy to check if a (concrete) IP address is in the range of a CIDR. And I'll be using IP addresses and CIDRs to build out models of firewalls and routing tables.
So my questions are:
Currently there are 4 string
manipulation functions (plus to_string
), but inspection is only supported on the prefix or via regexp. There are no functions to extract from the string (presumably these would be i32
char values?) or convert to another form such as a list, so the utility of strings is significantly constrained.
Suggestions would be string_sub : string -> 'a -> 'b -> string
where 'a
and 'b
are integer forms to extract a sub-string given an offset and a length, string_char : string -> i32 -> i32
to extract a single character (from which string_sub
could be synthesized), or string_to_list : string -> i32 list
.
Running with many threads, some threads hang in a call to read
that happens during the constructor for boost::child::process
. Apparently there are a couple issues related to pipes and creating child processes in a multithreaded setting:
We could use a nice introduction to the language. Optimally, it would build up to an interesting example Formulog program (such as a symbolic evaluator for a simple input language).
Using is_sat
in the body of a rule doesn't use any framing, so even if the solver has discharged the sat determination, it remains in memory within the solver and can't be discarded even if it will no longer be used. When the rule is being used to process a large number of intensional facts, this can cause significant memory consumption overhead in the solver.
For example:
@external
input values(i32)
output solve(i32, i32, i32, i32)
solve(X, Y, Z, A) :-
values(X), values(Y), values(Z),
X != Y, X != Z, Y != Z,
2020 = X + Y + Z,
A = X * Y * Z.
This trivial program generates results for all triples that sum to 2020. For a values.tsv file with 1000 entries, this executes in Formulog on the order of 9 seconds on my system.
Changing the line
2020 = X + Y + Z,
to
is_sat(`2020 #= bv_add(bv_add(X, Y), Z)`),
can generate millions of SMT terms. While each of these is trivially sat/unsat by Z3, I would expect this to run for some lengthy period of time and produce the same answer as the Datalog-only version. Instead, this runs for tens of minutes with the Z3 process slowly accumulating more memory consumption before dying from memory exhaustion on my 32GB machine. Using SMT framing should allow previous results to be garbage collected and avoid this memory exhaustion.
As a convenience, allows users to specify options in a configuration file, instead of requiring them to list the options on the command line.
The built-in functions substitute
and is_free
have not proven to be useful in the case studies, and can be written in Formulog directly (i.e., don't require special runtime support), so we should consider removing them.
The Formulog-to-Souffle transpiler generates invalid query plans in the presence of the CODEGEN_PROJECT
relation, which it seems to treat as being in the same stratum as the predicate being defined.
$ cat foo.flg
rel a(i32)
rel b(i32, i32)
b(3, 42).
a(0).
a(1).
a(X + 2) :- a(X), a(X + 1), !b(X + 2, _).
$ java -Doptimize=5 -jar target/formulog-0.7.0-SNAPSHOT-jar-with-dependencies.jar foo.flg -c
$ cd codegen
$ cmake -B build -S .
...
$ cmake --build build
-- Configuring done
-- Generating done
-- Build files have been written to: /root/codegen/build
[ 6%] Generating formulog.cpp
Error: Invalid execution order in plan (expected 2 atoms, not 3) in file formulog.dl at line 30
.plan 0: (1,2,3), 1: (2,1,3)
---------^-------------------
Error: Invalid execution order in plan (expected 2 atoms, not 3) in file formulog.dl at line 30
.plan 0: (1,2,3), 1: (2,1,3)
---------------------^-------
2 errors generated, evaluation aborted
gmake[2]: *** [CMakeFiles/flg.dir/build.make:74: formulog.cpp] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:82: CMakeFiles/flg.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
The offending rule in the generated Souffle is
a_(@expr_5(X)) :-
a_(X),
a_(@expr_4(X)),
!CODEGEN_PROJECT_b_10(@expr_5(X)).
.plan 0: (1,2,3), 1: (2,1,3)
Right now, Formulog is parameterized by a mix of system properties (like -DdebugSmt
) and proper flags (like --dump-idb
). For the sake of consistency and usability, we should convert user-relevant system properties to flags.
This program causes the interpreter to behave incorrectly:
rel bar(i32)
rel foo(i32 smt)
rel baz
foo(`#x[?]`).
baz :-
bar(X),
foo(`X`),
X + 1 = 42.
bar(0).
(* Make sure `foo` is recursive in `ok` stratum, so that when `ok` rule is run,
`foo` predicate is reordered to be first *)
foo(`0`) :- baz.
foo(`#x0[?]`).
foo(`#x1[?]`).
foo(`#x2[?]`).
foo(`#x3[?]`).
foo(`#x4[?]`).
foo(`#x5[?]`).
foo(`#x6[?]`).
foo(`#x7[?]`).
foo(`#x8[?]`).
foo(`#x9[?]`).
foo(`#xa[?]`).
foo(`#xb[?]`).
foo(`#xc[?]`).
foo(`#xd[?]`).
foo(`#xe[?]`).
foo(`#xf[?]`).
foo(`41`).
bar(41).
fun len(xs: 'a list): i32 =
match xs with
| [] => 0
| _ :: t => 1 + len(t)
end
rel ok
ok :-
baz,
len(foo(??)) = 19.
Run in semi-naive evaluation mode, it derives only 18 foo
facts. Run in eager evaluation mode, it crashes in the baz
rule trying to add an SMT variable and a primitive integer.
The problem is that the interpreter currently assigns numeric IDs to solver variables independently of IDs to other terms, so that two different terms (a solver variable and something else) can have the same ID. This breaks assumptions made elsewhere in the interpreter.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.