gbury / dolmen Goto Github PK

View Code? Open in Web Editor NEW

75.0 10.0 16.0 10.16 MB

Dolmen provides a library and a binary to parse, typecheck, and evaluate languages used in automated deduction

License: BSD 2-Clause "Simplified" License

Makefile 0.05% OCaml 84.27% SMT 13.25% OpenEdge ABL 2.30% Perl 0.13%

parse language functor smt-lib flexible-parsers tptp ocaml ocaml-library atp model

dolmen's People

Contributors

Stargazers

Watchers

Forkers

c-cube delisea mihasighi mudathirmahgoub quicquid bobot hra687261 tdacik 4txj7f smt-comp halbaroth kian-weimer bclement-ocp daniel-larraz modelchecker

dolmen's Issues

Let-bindings in smtlib and equality

While writing doc to resolve #1 , I happened upon an inconsistency between the smtlib and tptp parser.

In tptp, the syntax allows quantified equalities or equivalences in tff (see https://github.com/Gbury/dolmen/blob/master/src/languages/tptp/parseTptp.mly#L314) , and even arbitrary quantified formula in thf (see https://github.com/Gbury/dolmen/blob/master/src/languages/tptp/parseTptp.mly#L168). So typically, variable bindings in a let will have the form: App(Builtin(Eq), [Symbol x; bound_term]), or App(Builtin(Eq), [Colon(Symbol x, some_type); boud_term])
Smtlib is more restrictive and directly gives a list of pair of variables and terms. Currently this is parsed as Colon(Symbol x, bound_term).

It would be more uniform to use equalities for smtlib bindings, however this brings another problem: equality is not strictly part of the smtlib language (as of smtlib v2.5 at least), i.e. it --as well as usual logical connectives --

are just function symbols of the basic theory Core, implicitly included in all SMT-LIB theories

to quote the smtlib v2.5 manual. What this means is that an equality (= a b) will actually be parsed as: App(Symbol "=", [a; b]) rather than use the builtin equality symbol.

So, parsing bound variables as equalities (rather than colons) in smtlib would: 1) add a requirement to its term interface for an equality building function, 2) introduce in parsed AST from smtlib files, two distinct representation of equalities. Any thought ? cc @c-cube

Destructor for datatypes missing in environment

I believe that the destructor int_content, should be in the typing environment but it is not found in it.

(set-logic ALL)

(declare-datatypes ((int_ref 0))
(((mk_int_ref (int_content Int)))))

(define-fun int_ref_projection ((a int_ref)) Int (int_content a))

(check-sat)

I'm not able to quickly find where is the problem in the code.

Dolmen "stops" at the first error

For a file such as:

(set-logic QF_LIA)
(push 1)
(declare-fun a () Int)
(declare-fun b () Int)
(assert (= 0 (* 2 (+ a b))))
(pop 1)
(push 1)
(declare-fun c () Int)
(declare-fun d () Int)
(assert (= 0 (* 2 (+ c d))))
(pop 1)

Dolmen only reports on the first "error".

(I used push/pop to make it clear that these should be "recoverable" errors; the same problem happens without incremental solving)

Memory leaks during Model Checking?

When checking the model of this big file form the smt-lib, QF_BV/2019-Mann/ridecore-qf_bv-bug.smt2, with this model
output.zip dolmen takes more than 10Go of memory and increasing. Typing takes only a fraction of that. Are we keeping the intermediate values or environment unnecessarily?

Smtlib v3

The proposal is becoming more tangible http://smtlib.cs.uiowa.edu/version3.shtml. I believe it could give good feedback to try to implement it in dolmen. I can give a hand, but I would certainly need your help @Gbury to know the best way to encode namespace and filters for example. Do you think it is a difficult task?

Superformats of dimacs: qdimacs and dqdimacs

Those two formats are, if I recall correctly, supersets of dimacs and are used for QBF solving and DQBF solving, respectively.

The standard for qdimacs is here: http://www.qbflib.org/qdimacs.html Note that the standard forbids empty clauses and the empty matrix, some preprocessors, however, are happy to produce such non-standard files.

dqdimacs is even more specialized. It is described in a paper by Fröhlich et. al. (https://forsyte.at/wp-content/uploads/idq_pos2014.pdf). It is the format used in the DQBF track of QBFEVAL (http://www.qbflib.org/qbfeval20.php). This page also gives this paper as a reference.

smtlib parser: Unknown logic UF

Hello! Please consider the following smtlib script:

(set-logic UF)

(declare-const x Bool)

(assert (and x (not x)))
(check-sat)

Calling dolmen aua.smt2 leads to:

Error Unhandled exception: Dolmen_type.Base.Unknown_logic("UF")

I assume there are two issues here: dolmen exits due to the exception but the error message is hard to read. Also, UF is a valid smtlib logic. Using QF_UF instead works (but is only the quantifier-free fragment) and UFX gives the same exception. Could you have a look?

cheers, Martin

Vendor dependencies (optionally) to allow opam-less installation

cc @c-cube

SMT-LIB: quoted symbols ar treated as distinct symbols

In SMT-LIB symbols can be quoted by enclosing them in |, e.g., |A|. According to the SMT-LIB standard (3.1 Symbols Page 23) a quoted and an unquoted symbol is the same symbol:

Following Common Lisp’s conventions,enclosing a simple symbol in vertical bars does not produce a new symbol. This means for instance that abc and |abc| are the same symbol.

Dolmen, however, does not follow this convention and the following SMT-LIB problem produces an error:

(set-logic QF_UF)
(declare-sort S 0)
(declare-const |A| S)
(declare-fun f (S) Bool)
(assert (not (f A)))
(check-sat)

Dolmen's Typechecker is sometimes too smart for the SMTLIB

Consider the following problem:

(set-logic QF_UFLIA)
(declare-datatypes ((list 1)) ((par (alpha) ((nil) (cons (head (list alpha)))))))
(assert (= (as nil (list Int)) nil))
(check-sat)

Dolmen's tff typechecker is able to infer the type of the second nil because it knows the type of the first one, whereas cvc4 will complain. I don't know how much of a problem that is.

bv2nat is not recognized

With the latest version of dolmen installed through opam, as well as the released binary, I'm getting an error on use of bv2nat:

dolmen bv.smt2
File "./bv.smt2", line 7, character 11-24:
Error Unbound identifier: `bv2nat`

on the file:

(set-info :smt-lib-version 2.6)
(set-logic AUFBVDTNIRA)

(declare-const bvec (_ BitVec 32))
(declare-const ival Int)

(assert (= (bv2nat bvec) ival))

Is it expected? That might be an issue for submitting benchmarks to the SMT-COMP, as dolmen is used as a PR check. Also, I see that dolmen --version is returning 0.1, while I would have expected 0.6, so is that a setup issue somewhere on my side? TIA

Recursive definition of identifier in AE language

Dolmen 0.8 rejects the above AE file

type 'a t = Cons of {elt : 'a; tail : 'a t} | Nil

function length (list : 'a t) : int =
  match list with
  | Nil -> 0
  | Cons(_, l') -> length(l') + 1
end

logic l : int t

goal g: length(l) <= 4

with the error:

File "git/alt-ergo/tests/issues/531.ae", line 6, character 19-25:
6 |   | Cons(_, l') -> length(l') + 1
                       ^^^^^^
Error Unbound identifier: `length`

SMT-LIB: dolmen fails to recognize `and` with only one argument

The attached file does not produce an error, but in line 57 an and operator is only applied to one argument:

 (let (($x159 (and (and (fp.eq x!FP (fp.add roundNearestTiesToEven xFP ?x151)) $x157))))

problem.smt2

Datatype theory seems to be builtin and always present

From what I read in the smtlib spec, and the behavior of cvc4, it looks like the DT theory for datatypes is part of the core of smtlib that is always present in all logics. Thus the default should be for it to always be included (we could continue to parse it in logics, it just wouldn't change anything).
cc @bobot who introduced the theory

Comments for suppression

I'm interested in using Dolmen alongside a test-case reducer such as ddSMT or cvise. However, what I'm really interested in is in making sure that the reduction process does not make things any worse than when they started.

This ability to suppress warnings (in the input file) would match what typical code-based static analysers allow us to do.

For example, consider a file such as:

(set-logic QF_UFLIA)
(declare-fun fmt0 () Int)
(declare-fun fmt1 () Int)
(declare-fun s_count (Int) Int)
(assert (>= (* 4 (s_count (+ (- 2) (* (- 1) fmt0) fmt1))) 0))
(check-sat)
(exit)

which gives:

File "./suppress.smt2", line 5, character 12-57:
Error This is a non-linear expression according to the smtlib spec.
      Hint: multiplication in strict linear arithmetic expects an integer or
        rational literal and a symbol (variable or constant) but was given:
        - an integer coefficient
        - an arbitrary expression with top symbol not in Arithmetic

I would like the ability to make Dolmen not report on this error, if I do something like:

(set-logic QF_UFLIA)
(declare-fun fmt0 () Int)
(declare-fun fmt1 () Int)
(declare-fun s_count (Int) Int)
; dolmen-no-error
(assert (>= (* 4 (s_count (+ (- 2) (* (- 1) fmt0) fmt1))) 0))
(check-sat)
(exit)

Or something similar.

I think to actually make this useful, the errors that come from Dolmen need to have some kind of "unique identifier", such that you can silence particular warnings (e.g., with gcc can you do things like -Wno-unused).

For example, the above might be:

; dolmen-no-error[non-linear]

; dolmen-no-error[dol123]

where dol123 is some identifier that Dolmen gives to "This is a non-linear expression according to the smtlib spec".

It is then up to the minimiser to know if it can/can't remove something -- for example, the reducer might simplify Line 5 such that the comment can then be removed. It has a weakness that the minimiser might "squash lines together" to get them to avoid errors, but /c'est la vie/.

Add some automaton flow constraints checks for SMTLIB

Some additional things to check in the main loop for SMTLIB

check that commands are in an order coherent with the semantics (as summarised by page 53 of http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf )
check headers of files as per https://clc-gitlab.cs.uiowa.edu:2443/SMT-LIB-benchmarks-tmp/benchmarks-pending/-/blob/master/README.md
ensure that there are check-sat at each level of push/pop
ensure that the file ends with an (exit)

Incomplete dependencies

Since the move to dune, it seems parallel build are somewhat broken (cf ocaml/opam-repository#13954 ).

Interface of Pipes

The interface of Pipes is at the same time in control through the function of the State module and give-up control through Continue. Is it really needed to have this mixed instead of one or the other or both but separated? I can propose an MR to make things more clear.

Bitvector `bv%i` is broken

Some offsets seems to be off in the conversion to a binary string. But more importantly it uses int_of_string which is limited to 63 bit. We need to handle any size. Two solutions:

doing the conversion by hand, error prone but no new dependencies
depend on zarith

What do you prefer?

Correctly handle tptp's THF

Either not typecheck the THF statements, or upgrade the type-checker to handle higher-order.

SMTLIB's FP theory might require Reals/Reals_Ints and BV

It seems to happen that some problems set their logic to QF_FP, and then use the to_fp function to convert a real to a floating point number (it happens in e.g. QF_FP/20170501-Heizmann-UltimateAutomizer/filter2_reinit_true-unreach-call.c_7.smt2).
The problem is that currently dolmen fails when that happens because the real number is not typed since neither the Reals nor the Reals_ints theory is used because QF_FP does not include them. So the question is:

is the use of a real litteral illegal in QF_FP (and would need e.g. QF_LRAFP ?)
does QF_FP means also using the Reals theory
does the Floats theory includes litterals for Reals (which doesn't seem to be the case according to the description of the theory checked in the repo currently).

cc @bobot

Compilation error

There seems to be a problem compiling dolmen with these deps:

menhir=20171013
dolmen=0.4.1
dune=1.11.4

I'll try and reproduce and see what can be done (cc @c-cube who reported the error).

CI doesn't test 4.08

there's Seq.unfold used somewhere and it's only available since 4.11, but the opam files say Dolmen supports 4.08.

edit: I think it's only in tune actually. But it makes make watch break anyway :)

Typo in smtlib grammar

function_def:
  | s=SYMBOL OPEN args=sorted_var* CLOSE ret=sort body=term CLOSE
{ I.(mk term s), args, ret, body }

The last CLOSE is a typo. It should be removed.

SMT-LIB: print symbols with `|`

This is a feature request.
It would be nice if dolmen could print extended symbols with the | quotes in error messages.
For example:

File "foo.smt2", line 1056, character 64-96:
Error The term: (q(10) * 203.0) has type real but was expected to be of type
      int

Here q(10) is a symbol that must be written as |q(10)| in the input. While here this only leads to a second of confusion, in other cases it might lead to actual problems. For example, if the symbols is |(q 10)|, which would also be a valid SMT-LIB term.

Support LSP's textDocument/definition

Hi, I'm glad I found a language server for SMT-LIB.
Currently, the most important feature for me is go-to-definition
("textDocument/definition", so I can quickly navigate my formulas.
This includes mainly names declared by declare-* or define-* statements,
and maybe also let bindings.

I've hacked together my own language server in Python that works pretty well
for me. But I'm always happy to get rid of that and move to a real solution.
I'm not sure when I'll get to delve into OCaml, but for now I'll leave my
Python implementation here, in case it helps anyone. There's some boilerplate -
the actual routine to find definitions is really simple (and inefficient).
Let me know if I can help with examples/explanation.

(The next logical step would be to implement "textDocument/references", to list all uses of a name)

#!/usr/bin/env python3

# Dependencies:
# pip install python-language-server

import sys
import logging
import threading

from pyls import _utils, uris
from pyls_jsonrpc.dispatchers import MethodDispatcher
from pyls_jsonrpc.endpoint import Endpoint
from pyls_jsonrpc.streams import JsonRpcStreamReader, JsonRpcStreamWriter

from pyls.workspace import Workspace

"""
Toy language server that implements textDocument/definition

For example, given this file

```smt2
(declare-const x Int)

(assert (= x 123))
```

if the cursor is on the "x" in line 3, textDocument/definition will return
the position of the x in line 1.
"""

# SMTLIBLanguageServer is adadpted from pyls
log = logging.getLogger(__name__)
PARENT_PROCESS_WATCH_INTERVAL = 10  # 10 s
MAX_WORKERS = 64


class SMTLIBLanguageServer(MethodDispatcher):
    """ Implementation of the Microsoft VSCode Language Server Protocol
    https://github.com/Microsoft/language-server-protocol/blob/master/versions/protocol-1-x.md
    """

    def __init__(self, rx, tx, check_parent_process=False):
        self.workspace = None
        self.root_uri = None
        self.watching_thread = None
        self.workspaces = {}
        self.uri_workspace_mapper = {}

        self._jsonrpc_stream_reader = JsonRpcStreamReader(rx)
        self._jsonrpc_stream_writer = JsonRpcStreamWriter(tx)
        self._check_parent_process = check_parent_process
        self._endpoint = Endpoint(
            self, self._jsonrpc_stream_writer.write, max_workers=MAX_WORKERS)
        self._dispatchers = []
        self._shutdown = False

    def start(self):
        """Entry point for the server."""
        self._jsonrpc_stream_reader.listen(self._endpoint.consume)

    def __getitem__(self, item):
        """Override getitem to fallback through multiple dispatchers."""
        if self._shutdown and item != 'exit':
            # exit is the only allowed method during shutdown
            log.debug("Ignoring non-exit method during shutdown: %s", item)
            raise KeyError
        try:
            return super(SMTLIBLanguageServer, self).__getitem__(item)
        except KeyError:
            # Fallback through extra dispatchers
            for dispatcher in self._dispatchers:
                try:
                    return dispatcher[item]
                except KeyError:
                    continue

        raise KeyError()

    def m_shutdown(self, **_kwargs):
        self._shutdown=True
        return None

    def m_exit(self, **_kwargs):
        self._endpoint.shutdown()
        self._jsonrpc_stream_reader.close()
        self._jsonrpc_stream_writer.close()

    def _match_uri_to_workspace(self, uri):
        workspace_uri=_utils.match_uri_to_workspace(uri, self.workspaces)
        return self.workspaces.get(workspace_uri, self.workspace)

    def capabilities(self):
        server_capabilities={
            "definitionProvider": True,
        }
        log.info('Server capabilities: %s', server_capabilities)
        return server_capabilities

    def m_initialize(self, processId=None, rootUri=None, rootPath=None, initializationOptions=None, **_kwargs):
        log.debug('Language server initialized with %s %s %s %s',
                  processId, rootUri, rootPath, initializationOptions)
        if rootUri is None:
            rootUri=uris.from_fs_path(
                rootPath) if rootPath is not None else ''

        self.workspaces.pop(self.root_uri, None)
        self.root_uri = rootUri
        self.workspace = Workspace(rootUri, self._endpoint, None)
        self.workspaces[rootUri] = self.workspace

        if self._check_parent_process and processId is not None and self.watching_thread is None:
            def watch_parent_process(pid):
                # exit when the given pid is not alive
                if not _utils.is_process_alive(pid):
                    log.info("parent process %s is not alive, exiting!", pid)
                    self.m_exit()
                else:
                    threading.Timer(PARENT_PROCESS_WATCH_INTERVAL,
                                    watch_parent_process, args=[pid]).start()

            self.watching_thread = threading.Thread(
                target=watch_parent_process, args=(processId,))
            self.watching_thread.daemon = True
            self.watching_thread.start()
        return {'capabilities': self.capabilities()}

    def m_initialized(self, **_kwargs):
        pass

    def m_text_document__definition(self, textDocument=None, position=None, **_kwargs):
        doc_uri = textDocument["uri"]
        workspace = self._match_uri_to_workspace(doc_uri)
        doc = workspace.get_document(doc_uri) if doc_uri else None
        return smt_definition(doc, position)

    def m_text_document__did_close(self, textDocument=None, **_kwargs):
        pass
    def m_text_document__did_open(self, textDocument=None, **_kwargs):
        pass
    def m_text_document__did_change(self, contentChanges=None, textDocument=None, **_kwargs):
        pass
    def m_text_document__did_save(self, textDocument=None, **_kwargs):
        pass
    def m_text_document__completion(self, textDocument=None, **_kwargs):
        pass


def flatten(list_of_lists):
    return [item for lst in list_of_lists for item in lst]

def merge(list_of_dicts):
    return {k: v for dictionary in list_of_dicts for k, v in dictionary.items()}

def smt_definition(document, position):
    pos = definition(document.source, position["line"], position["character"])
    if pos is None:
        return None

    line, col, token = pos

    offset = 1 if len(token) == 1 else (len(token) + 1)
    if col == 0:
        line -= 1
        col = len(document.lines[line]) - offset
    else:
        col = col - offset

    return {
            'uri': document.uri,
            'range': {
                'start': {'line': line, 'character': col},
                'end': {'line': line, 'character': col},
            }
        }

def definition(source, cursor_line, cursor_character):
    nodes = list(parser().parse_smtlib(source))
    node_at_cursor = find_leaf_node_at(cursor_line, cursor_character, nodes)
    line, col, node = find_definition_for(node_at_cursor, nodes)
    if node is None:
        return None
    return line, col, node_at_cursor

def find_leaf_node_at(line, col, nodes):
    prev_line_end = -1
    prev_col_end = -1
    needle = (line, col)
    for line_end, col_end, node in nodes:
        prev_range = (prev_line_end-1, prev_col_end)
        cur_range = (line_end, col_end)
        if prev_range < needle < cur_range:
            if isinstance(node, str):
                return node
            else:
                node_at = find_leaf_node_at(line, col, node)
                assert node_at is not None
                return node_at
        prev_line_end = line_end
        prev_col_end = col_end
    return None

def stripprefix(x, prefix):
    if x.startswith(prefix):
        return x[len(prefix):]
    return x

def find_definition_for(needle, nodes):
    for node in nodes:
        line_end, col_end, n = node
        _, _, head = n[0]
        if not head.startswith("declare-") and not head.startswith("define-"):
            continue
        _, _, symbol = n[1]

        if head in ("declare-const", "define-const", "declare-fun", "define-fun", "define-fun-rec", "declare-datatype"):
            if symbol == needle:
                return n[1]
            continue

        if head in ("declare-datatypes", "define-funs-rec"):
            for i, tmp in enumerate(symbol):
                _, _, type_parameter_declaration = tmp
                _, _, type_name = type_parameter_declaration[0]
                if type_name == needle:
                    return type_parameter_declaration[0]
            if head == "declare-datatypes":
                constructor = dfs(needle, node)
                if constructor is not None:
                    return constructor
                constructor = dfs(stripprefix(needle, "is-"), node)
                if constructor is not None:
                    return constructor
            continue
        assert f"unsupported form: {head}"
    return -1, -1, None

def dfs(needle, node):
    assert isinstance(node, tuple)
    _, _, n = node
    if isinstance(n, str):
        if n == needle:
            return node
        else:
            return None
    for child in n:
        found = dfs(needle, child)
        if found is not None:
            return found
    return None

class parser:
    def __init__(self):
        self.pos = 0
        self.line = 0
        self.col = -1
        self.text = None

    def nextch(self):
        char = self.text[self.pos]
        self.pos += 1
        self.col += 1
        if char == "\n":
            self.line += 1
            self.col = 0
        return char

    def parse_smtlib(self, text):
        assert self.text is None
        self.text = text
        return self.parse_smtlib_aux()

    def parse_smtlib_aux(self):
        exprs = []
        cur_expr = None
        size = len(self.text)

        while self.pos < size:
            char = self.nextch()

            # Stolen from ddSMT's parser. Not fully SMT-LIB compliant but good enough.
            # String literals/quoted symbols
            if char in ('"', '|'):
                first_char = char
                literal = [char]
                # Read until terminating " or |
                while True:
                    if self.pos >= size:
                        return
                    char = self.nextch()
                    literal.append(char)
                    if char == first_char:
                        # Check is quote is escaped "a "" b" is one string literal
                        if char == '"' and self.pos < size and self.text[self.pos] == '"':
                            literal.append(self.text[self.pos])
                            self.nextch()
                            continue
                        break
                cur_expr.append((self.line, self.col, literal))
                continue

            # Comments
            if char == ';':
                # Read until newline
                while self.pos < size:
                    char = self.nextch()
                    if char == '\n':
                        break
                continue

            # Open s-expression
            if char == '(':
                cur_expr = []
                exprs.append(cur_expr)
                continue

            # Close s-expression
            if char == ')':
                cur_expr = exprs.pop()
                # Do we have nested s-expressions?
                if exprs:
                    exprs[-1].append((self.line, self.col, cur_expr))
                    cur_expr = exprs[-1]
                else:
                    yield self.line, self.col, cur_expr
                    cur_expr = None
                continue

            # Identifier
            if char not in (' ', '\t', '\n'):
                token = [char]
                while True:
                    if self.pos >= size:
                        return
                    char = self.text[self.pos]
                    if char in ('(', ')', ';'):
                        break
                    self.nextch()
                    if char in (' ', '\t', '\n'):
                        break
                    token.append(char)

                token = ''.join(token)

                # Append to current s-expression
                if cur_expr is not None:
                    cur_expr.append((self.line, self.col, token))
                else:
                    yield self.line, self.col, token

def serve():
    stdin = sys.stdin.buffer
    stdout = sys.stdout.buffer
    server = SMTLIBLanguageServer(stdin, stdout)
    server.start()

if __name__ == "__main__":
    if len(sys.argv) >= 2 and sys.argv[1] == "definition":
        line = int(sys.argv[2])
        col = int(sys.argv[3])
        print(definition(sys.stdin.read(), line, col))
    else:
        serve()

Constructing smtlib set_info

There is a make function for Terms but it is not available in the signature. I think it should be mk instead.

I leave where an example

Dolmen.Statement.set_info ( Dolmen.Term.mk ( Dolmen.Term.Colon (
	    		Dolmen.Term.mk (Dolmen.Term.Symbol (Dolmen.Id.mk Dolmen.Id.Attr ":smt-lib-version") ),
	    		Dolmen.Term.mk (Dolmen.Term.Symbol (Dolmen.Id.mk Dolmen.Id.Term "2.5"))
	    	)
	))

We could also add mk_symbol, mk_colon etc. as shorthands.

Do you have pretty printers for smtlib? I'm interested in that.

Switch to dune ?

Providing printers for terms needs to correctly handle unicode[1] in the general case. Thus it adds uutf and uucp as dependencies, however people may not need or want those dependencies in a parser library[2].

The simple answer is to provide another package (kind of a sub-package) that provides the export functionalities, and which can depend on uutf and uucp while the main package does not need them.

However, correctly handling multiple packages is quite a pain to deal with when using ocamlbuild and manual makefiles for installation, so switching to dune would probably ease that a lot (once the switch done), so here it goes: dolmen should probably switch to dune.

If anyone is motivated to do the switch, please do so and create a PR ! Else I'll have a look sometime soon.

cc @anmaped

[1] : handling unicode primarily means detecting unicode characters in order to know what to do with them in languages that do not support unicode. This require a unicode-aware handling of identifiers strings, if only to replace unicode characters by appropriate ASCII characters.

[2]: though if dolmen one day parses unicode-aware languages (such as Coq ?), a unicode-ware lexer might be needed...

smtlib parsing: assertion hit when parsing wrongly typed equation

Please consider the following smtlib script:

(set-logic AUFLIA)
(declare-fun notindomain (Int) Int)
(assert (forall ((i Int))
                (= (notindomain i) (or (< i 0) (< 5 i)) )))
(check-sat)

It contains a type error because notindomain should be of type Int to Bool. When I run dolmen on the file, I receive the following error message:

Error While typing: (= (notindomain (i)) (or (< (i) 0) (< 5 (i))))
File '/tmp/test.smt2', line 6, character 16-57:
The term: (Fatal error: exception File "src/standard/expr.ml", line 484, characters 6-12: Assertion failed

The error occurs in pretty printing (the assertion requires an infix operator to have at least 2 arguments) so it might hide the actual error.

document shape of terms

in particular, what the arguments of letin are, also bound variables in fun, etc.

`Define-sort` not supported

Sort definition (type alias) seems not supported:

(set-logic QF_LIA)
(define-sort F () Int)
(declare-fun x () F)
(assert (= x 1))
(check-sat)
(exit)

returns:

Error While typing: (= (x) 1)
File ./test.smt2, line 4, character 8-15:
The term: 1 has type int but was expected to be of type F

Setup testing over all of tptp/smtlib libraries

Still not clear on how an automated setup could work, the libs are definitely too big for the free CI on github.

Files missing while installing on Mac OS

Hi,
I had some issues while installing dolmen on Mac OS. (the error message didn't appear on debian 9).

Is it a problem with parallel compilation since it's requesting non-existing files?

opam install dolmen

<><> Synchronising pinned packages ><><><><><><><><><><><><><><><><><><><><>  🐫 
[dolmen.dev] no changes from git+https://github.com/Gbury/dolmen.git

The following actions will be performed:
  ∗ install dolmen dev*

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫 
[ERROR] The compilation of dolmen failed at "/Users/HOME_DIRECTOTY/.opam/opam-init/hooks/sandbox.sh build dune build -p dolmen -j 4".

#=== ERROR while compiling dolmen.dev =========================================#
# context     2.0.0 | macos/x86_64 | ocaml-base-compiler.4.07.1 | pinned(git+https://github.com/Gbury/dolmen.git#926e50df)
# path        ~/.opam/4.07.1/.opam-switch/build/dolmen.dev
# command     ~/.opam/opam-init/hooks/sandbox.sh build dune build -p dolmen -j 4
# exit-code   1
# env-file    ~/.opam/log/dolmen-94435-d8a4ae.env
# output-file ~/.opam/log/dolmen-94435-d8a4ae.out
### output ###
# Error: tokens_dimacs.mly: No such file or directory
# [...]
# (cd _build/default && /Users/HOME_DIRECTOTY/.opam/4.07.1/bin/menhir --explain --table --external-tokens Tokens_iCNF tokens_iCNF.mly src/languages/icnf/parseiCNF.mly --base src/languages/icnf/parseiCNF --infer-write-query src/languages/icnf/parseiCNF__mock.ml.mock)
# Error: tokens_iCNF.mly: No such file or directory
#       menhir src/languages/smtlib/parseSmtlib__mock.ml.mock (exit 1)
# (cd _build/default && /Users/HOME_DIRECTOTY/.opam/4.07.1/bin/menhir --explain --table --external-tokens Tokens_smtlib tokens_smtlib.mly src/languages/smtlib/parseSmtlib.mly --base src/languages/smtlib/parseSmtlib --infer-write-query src/languages/smtlib/parseSmtlib__mock.ml.mock)
# Error: tokens_smtlib.mly: No such file or directory
#       menhir src/languages/tptp/parseTptp__mock.ml.mock (exit 1)
# (cd _build/default && /Users/HOME_DIRECTOTY/.opam/4.07.1/bin/menhir --explain --table --external-tokens Tokens_tptp tokens_tptp.mly src/languages/tptp/parseTptp.mly --base src/languages/tptp/parseTptp --infer-write-query src/languages/tptp/parseTptp__mock.ml.mock)
# Error: tokens_tptp.mly: No such file or directory
#       menhir src/languages/zf/parseZf__mock.ml.mock (exit 1)
# (cd _build/default && /Users/HOME_DIRECTOTY/.opam/4.07.1/bin/menhir --explain --table --external-tokens Tokens_zf tokens_zf.mly src/languages/zf/parseZf.mly --base src/languages/zf/parseZf --infer-write-query src/languages/zf/parseZf__mock.ml.mock)
# Error: tokens_zf.mly: No such file or directory



<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫 
┌─ The following actions failed
│ λ build dolmen dev
└─ 
╶─ No changes have been performed

inconsistent documentation in Language.S

dolmen/src/interface/language.ml

Lines 31 to 37 in 1ee36bf

 val parse_file : string -> file * statement list 

 (** Parse the given file. 

  @param dir: optional directory to use if the file path is relative. *) 

 val parse_file_lazy : string -> file * statement list Lazy.t 

 (** Parse the given file. 

  @param dir: optional directory to use if the file path is relative. *)

These functions document an optional dir parameter, but they don't have one.

Term printers

Implement printers for terms in each language.

This brings some questions that need answers:

what to do when the term uses features not available in the output languages (higher-order terms printed in smtlib, first-order terms printed in dimacs/iCNF, ...) ?
particularly, dimacs and iCNF are not easy to print, since they have quite a loose relation with the terms: propositional atoms expressed as integers in dimacs and iCNF are encoded using constants named #1, #2, etc... So should we just not give printer for them, or have the printer maintain a map of terms to numbered propositional atoms ?
escaping of identifiers: constants/variables that do not respect the lexical conventions of the output language should be escaped (failing on those would be annoying). I currently have some code to do so in another project (cf https://github.com/Gbury/archsat/blob/master/src/util/escape.ml and https://github.com/Gbury/archsat/blob/master/src/output/tptp.ml#L45 ), however that code depends on unicode handling libraries (because it also escapes identifers for Coq). Currently, all languages (except maybe zf) are pretty much all-ascii, so escaping could probably be done in ascii, but that would make it difficult to add non-ascii languages (such as coq). In case unicode escaping is required, it would probably means shipping the printers in a sub-package (in which case, the build system should probably switch to dune).
should statements also be printable ?

cc @anmaped
cc @c-cube for the unicode-related escaping question

SMTLIB refuses redefinition or redeclaration

Cf SMTLIB v6 end of §4.1.5

Overloading accepted in theory definition but not for users.

Well-sortedness checks, required for commands that use sorts or terms, are always done
with respect to the current signature. It is an error to declare or define a symbol that is already
in the current signature. This implies in particular that, contrary to theory function symbols,
user-defined function symbols cannot be overloaded.(29)

29 The motivation for not overloading user-defined symbols is to simplify their processing by a solver. This restriction is significant only for users who want to extend the signature of the theory used by a script with a new polymorphic function symbol—i.e., one whose rank would contain parametric sorts if it was a theory symbol. For instance, users who want to declare a “reverse” function on arbitrary lists, must define a different reverse function symbol for each (concrete) list sort used in the script. This restriction might be removed in future versions

Let dolmen fix that for you

Long-term thread to list the various little errors that dolmen could accept during typing, in order to eventually allow to "fix" problems once problem export in dolmen has been implemented. This would be done either when using --strict=false or even another option (--strict=fix ?).

Term normalizer

Implement a term normalizer that maps each language pre-defined constants to term builtins.

Try and handle all smtlib2 logics

If possible better than by listing them all. Currently, in case of an unknonw logic, the typechecker will default to the most comprehensive set of theories it can.

Add up-to-date tptp parser

To do once #18 is merged.

Ast_smtlib missing in mlpack

Some issues I have detected when trying to use dolmen.

Error: Signature mismatch:
       ...
       Unbound module type Ast_smtlib.Id
Command exited with code 2.

Fix:
Add Ast_smtlib in dolmen.mlpack.

Missing cmx for native compilation.
Fix:
Append $(addsuffix .cmx, $(NAME)) to the Makefile in the var TO_INSTALL_LIB

Last `set-logic` "wins"

Related to #74

It seems that the last set-logic "wins":

(set-logic QF_LRA)
(set-logic QF_BVFPLRA)
(declare-fun honk () Real)
(assert (= honk 0.75))
(check-sat)
(exit)

This gives an error (as per #74); but:

(set-logic QF_BVFPLRA)
(set-logic QF_LRA)
(declare-fun honk () Real)
(assert (= honk 0.75))
(check-sat)
(exit)

is fine.

As a user, I think I'd expect Dolmen to give me at least a warning if it sees multiple set-logics.

smtlib parser: :named annotations lead to warnings

Please consider the following smtlib script:

(set-info :smt-lib-version 2.6)
(set-logic UFNIA)
(set-option :produce-unsat-cores true)
(declare-const x Int)
(declare-const y Int)
(declare-const z Int)

(assert (! (>= x 3) :named mx1 ))
(assert (! (<= x 3) :named mx2 ))
(assert (! (>= y 2) :named my1 ))
(assert (! (<= y 2) :named my2 ))
(assert (! (>= z 1) :named mz1 ))
(assert (! (<= z 1) :named mz2 ))

(assert
         (or
          (> 3 x)
          (>= y 3)
          (>= y x)
          (>= 0 (+ z y))
          )
 )
(check-sat)
(get-unsat-core)
(exit)

Each of the named assertion leads to a warning:

Warning Error while type-checking an attribute:
While typing: :named Unbound identifier: ':named'

I'm not sure how to interpret the warning - cvc4 and z3 correctly compute the unsat core for the problem though.

Catch-all exceptions in lsp handler

Avoid exiting early because of internal dolmen errors.

Adding FP Library of the SMTLIB

I'm proposing to do it, but I prefer to check that someone else is not doing it. I think I will take inspiration from the BV theory which also a family of types indexed by integers, but is there any other advice?

PS: Dolmen is really really great! I still don't understand why builtins are not exported in Dolmen.Expr, how should we match the different ids when going through the typed expressions?

No warning when no quantifiers are used in quantified logic

It would be great if Dolmen warns if no quantifiers are used, but a quantified logic was specified:

For example for

(set-logic BV)
(declare-const a (_ BitVec 8))
(declare-const b (_ BitVec 8))
(assert (= a b))
(check-sat)

Dolmen parses the benchmark without any warning, however, it'd be great if it would inform the user that QF_BV would be sufficient for the logic.

Parse error with `let`

For the following file:

( set-logic QF_LIA)
( declare-fun a () Int )
( declare-fun b () Bool )
(assert (let ( (c (ite b 1.0 2.0))) (= 0 (* 2 c))))
( check-sat )

Dolmen gives:

File "./test.smt2", line 4, character 25-28:
Error Unbound identifier: `1.0`

False positive on `QF_BVFPLRA`

For this file:

(set-logic QF_BVFPLRA)
; (set-logic QF_LRA)
(declare-fun honk () Real)
(assert (= honk 0.75))
(check-sat)
(exit)

Dolmen reports:

File "./lra.smt2", line 4, character 16-20:
Error Real literals are not part of the Floats specification.

However, the logic LRA has been enabled (and Dolmen does not report "unknown logic"). If you comment the first line and uncomment the second (i.e., so LRA is the enabled logic), Dolmen happily accepts this file.

Why3 as input language

Would it be interresting to have Why3 as input language (for example by linking with the Why3 library if available?)

Reparsing a file and `Dolmen.Std.Loc.table`

Currently it is not possible to parse again a file. For a prover used on the command line it is not useful but for an interactive use or a GUI it is. I tried to add the Loc.file to use in Loc.newline, but that doesn't work for include. So perhaps adding Loc.table to the state would be better.

What do you think of this problem?

	val parse_file : string -> file * statement list
	(** Parse the given file.
	@param dir: optional directory to use if the file path is relative. *)

	val parse_file_lazy : string -> file * statement list Lazy.t
	(** Parse the given file.
	@param dir: optional directory to use if the file path is relative. *)