acsl-language / acsl
Sources for the ANSI/ISO C Specification Language manual
License: Other
(backported from Frama-C's bts)
In 0002390 the issue was raised whether function contracts such as
/*@ requires
\valid(&a);
ensures \false; */
void f(int a) {
return ;
}
are meaningful.
More specifically, the ACSL manual should clarify whether using the address of a formal argument in a precondition, and more generally a function contract, is illegal.
Should there be a d suffix (e.g. 0.1d, which would be equivalent to (double)0.1)? And what about long double?
Does the property is_finite_float(x) ==> (float)(\real_of_float(x)) == x hold?
Statement contracts can be nested in other statement or function contracts, so nested behaviors may have the same ids. We can forbid this, or we can declare that the inner ids shadow the outer ones; the text is written as the latter at the moment. Anyone object?
Give an example of a data invariant that expresses initialization properties over a global variable, such as an array containing pairwise distinct values.
The term 'memory location' is used inconsistently. Edit appropriately.
\abs and \power? (logic real (^^^)(real x, real y) = \power(x,y);)
In section 2.2.3 (Typing):
C types float and double are subtypes of type real .
Section 2.2.5. Real numbers and floating point numbers
later explains that this is not entirely true:
Unlike the promotion of C integer types to mathematical integers, there are special float
values that do not naturally map to a real number, namely the IEEE-754 special values for
"not-a-number", +∞ and −∞. See below for a detailed discussion on such special values.
However, remember that ACSL's logic has only total functions. Thus, there are implicit promotion functions real_of_float and real_of_double whose results on the 3 values above are left unspecified.
Maybe a small footnote or mention in 2.2.3 could help avoid hasty interpretations.
It is unfortunate that the \sign
predicate is not applicable to NaN
. The signbit
equivalent in C99 does accept NaN
. This complicates writing an ACSL specification for that function. If there are advantages to making \sign
partial, maybe they could be listed in the document (or should there be an "ACSL Rationale" with such details?).
Floating-point constants and operations are interpreted as mathematical real numbers: a
C variable of type float or double is implicitly promoted to a real.
Does that mean that one cannot write -0.0
in ACSL (otherwise it is converted to the real zero, which is unsigned)?
ACSL introduces clauses for statement contracts in order to deal with an "abrupt" exit out of the block on which the contract applies through a break
, goto
, return
or exit
. In addition, ensures
are not necessarily verified in such a case. On the other hand, nothing is said about the validity of the contract if the block is entered via a jump to an arbitrary statement, as in the following example:
void f() {
int x = 0;
goto L1;
x = 1;
/*@ requires x == 1;
ensures x == 2;
*/
{
x = 2;
L1: ;
}
}
What can we say about such a contract?
The ACSL manual states (section 2.3.1):
C function parameters are obtained by value from actual parameters that mostly remain
unaltered by the function calls. For that reason, formal parameters in function contracts
are defined such that they always refer implicitly to their values interpreted in the pre-state.
However, this leaves out some important questions, notably whether we can speak about the address of such formal parameters. Basically, given a function rettype f(prms) { body; return r; }
, a call lval = f(args)
can be decomposed as such (we assume that the body of f
is normalized and there's a single return
statement)
EDIT: as mentioned in a comment below, the initial // Post labels were incorrectly placed; the placement shown here is the corrected one.
// Pre1
{
rettype retvar;
// Pre2
{
prms = args;
// Pre3
{
body;
retvar = r;
}
// Post1
}
// Post2
lval = retvar;
}
// Post3
In this context, there are three places where the Pre
(as in \at(.,Pre)
) state of f
could be located. Pre2
is not really useful: it only gives access to the location where the returned value will be temporarily placed, and by definition retvar
is uninitialized until the Post
state. Equating Pre
with Pre1
implies that we cannot take the address of the formals in the pre state, and notably that it is not possible to state something like \initialized(&prm)
. Basically, the requires
of f
would be translated at Pre1
as assert R{prms <- args}
where {prms <- args}
denotes the substitution of prms
by args
. Equating Pre
with Pre3
implies that we have addresses attached to the formals.
Similarly, equating Post
with Post1
implies that we can speak about the addresses of the formals, and of retvar
. In particular, if p
holds the address of a formal, ensures \valid(p);
will be true. This is not the case if we equate Post
with Post2
or Post3
, but in the former case it might be possible to refer to the adress of \result
(so that it would be possible to state ensures \initialized(&\result)
, e.g. if we return a struct
and want to state explicitly that each field has been initialized). Again, the ensures
of f
would be translated as assert E{prms<-args, \result<- retvar}
(or \result <- lval
for Post3
).
At a quick glance, it would seem that equating Pre
with Pre3
and Post
with Post2
or Post3
would be preferable, but there may be some other subtleties than \valid
and \initialized
lying around.
In the definition of modifiable lvalue, there should be an example for the distinction between lvalue and modifiable lvalue.
In particular, can \result
be considered as a modifiable lvalue, i.e. is it possible to write something like assigns \result.a \from i;
?
Does a local variable from a caller function still satisfy its invariant when the callee returns?
It is not specified that the red color marks unimplemented features.
Should we define \base_addr and \block_length on \null? (0 in both cases), or leave them undefined? Review the examples in the documentation based on this decision.
In 2.7.1 of acslpp.pdf
(page 68) it is stated that a logic function or predicate may be declared virtual and that dynamic binding applies. For me, dynamic binding happens at runtime, but logic functions are not executed at runtime. So, I am a bit confused.
Original issue in Frama-C bts
Local ghosts should not shadow local non-ghost in non-ghost code, as in
void titi() {
int c = 0;
{
/*@ ghost int c = 1; */
c = 2; //c is still the non-ghost variable here.
}
/*@ assert c == 2; */
}
However, the ACSL manual does not say anything about that. A possibility would be to say that ghost code is implicitly enclosed in its own block. However, this would preclude something like this:
void f() {
/*@ ghost int c = 1; */
int x = 0;
/*@ ghost c = x; */
x++;
/*@ assert c == x - 1; */
}
and more generally would probably preclude the use of local ghost variables in assertions. It's probably better to have a ghost syntactic scope, which respects the same block boundaries as the non-ghost one, but is only active for ghost statements.
The ACSL manual indicates that there are builtin operators for floating-point comparison (as opposed to the < et al. operators that take real arguments). However, the phrasing is quite ambiguous:
Special predicates are also available to express the comparison operators of float (resp. dou-
ble) numbers as in C: \eq_float , \gt_float , \ge_float , \le_float , \lt_float , \ne_float (resp. for
double).
Does that mean that we have an overloaded operator:
boolean \eq_float(float x, float y);
boolean \eq_float(double x, double y);
or two distinct operators?
boolean \eq_float(float x, float y);
boolean \eq_double(double x, double y);
If the former, I'd suggest finding another name than \eq_float: \eq_ieee for instance (in the context of floating-point computation, which IEEE norm we are referring to should be quite clear), or \eq_fp.
In the termination paragraph (2.5), the order of arguments for the measure is not clear.
The equivalence R(x, y) <==> x > y && x >= 0
for the integer measure (§2.5.1) uses the first argument x
as the previous value, but the example lexico(intpair p1, intpair p2)
in the general measure (§2.5.2) seems to use the second argument p2
as the previous value as the definition of the predicate is similar to p2 > p1 && p2 >= 0
.
What is the correct order of arguments?
In short: I propose that \valid(p)
and \valid_read(p)
include in particular \initialized(p)
.
This issue is related to issues #74 and #81.
In much more details: historically, the predicate \valid(p)
is inherited from Caduceus. Its informal meaning, or informal intention, was to be the natural pre-condition for accessing the value pointed to by p
, either by reading (*p
) or
writing (*p = ...
). It was included as is in ACSL from the beginning. Later on, the variant \valid_read(p)
was introduced, with the intention to be the natural pre-condition for access but not allowing writing, which is the typical situation when the pointer points to const
data. Much later on, the predicate \initialized
was introduced because of a need to distinguish between data that is allocated but not yet initialized, and initialized data. Reading non-initialized data is not forbidden by the C norm, but the result is unspecified, and in particular there is no guarantee that reading the same non-initialized memory location twice will give the same value.
The main problem is that in the majority of examples that exist so far, e.g. in the ACSL by example document, Allan Blanchard tutorial, or the gallery of verified programs of Toccata, the \initialized
predicate is not used, and \valid
is used as if it meant \valid(p) && \initialized(p)
. In a strict interpretation of \valid
all these examples are in fact incorrect, even if they were proved correct by Jessie or by WP.
This annoying situation is pressing nowadays, not only because of issues #74 and #81, but also because of the writing of the ACSL chapter of the Frama-C book, in which a clear meaning for \valid
is expected.
The proposal is then:
to rename the current predicate \valid
and \valid_read
into new names such as \write_permitted
and \read_permitted
.
to give to \valid(p)
the new meaning \write_permitted(p) && \initialized(p)
and to valid_read(p)
the meaning \read_permitted(p) && \initialized(p)
.
Rename \valid into \write_permitted, and \valid_read into \read_permitted. I assumed it should be only a renaming, feel free to correct me.
\write_permitted, whereas for memory access it should probably remain \valid_read.
\initialized could be removed because it is now subsumed by \valid.
This was discussed at length this morning with Virgile and Allan; I probably forgot other arguments discussed, let's see what comes in the upcoming discussion. Any comments welcome! @ALL
Is there a need for a non-deterministic constructor such as \any T x; P(x)
?
The documentation does not make clear the meaning of implicit bounds in the term? .. term? range construct.
Ideally, looking for "range" in the document should quickly lead to a phrase saying, for instance, that:
.. b
is equivalent to
0 .. b
and
a ..
is equivalent to
a .. \block_length(term_over_which_the_range_operates)-1
or something similar.
The grammar allows for nested axiomatic blocks, i.e. something like
axiomatic Foo {
predicate foo(integer x);
axiomatic Bar {
predicate bar (integer x);
axiom fb: forall integer x; foo(x) >= bar(x);
}
}
Is it something we would really like to accept?
If yes, we should probably also be more explicit regarding inference of reads
clause from axiom
s defining the predicates within an axiomatics (nothing is said about that in the document itself, but Jessie used to take advantage of grouping predicate/function declarations and corresponding axioms to infer the memory locations that were involved in determining the value of the predicate/function).
Would it be possible to define logic symbols local to a function contracts (in which the formals of the function would be in scope, and that could be used anywhere in the contract)?
Is the usage of /@ @/
to delimit multi-line annotations in ghost code the best way to handle them, or should we stick to /*@ ... */
even if that means nested comments for a standard C compiler?
The section about the \initialized{L}(p)
needs to be developed.
In ACSL, there is no strong link between \initialized{L}(p)
and \valid_read{L}(p)
even if from the C side, it seems difficult to have a data initialized at a program state that cannot be read at the same state.
Some development processes require that source code stays unmodified during V&V. This causes an issue when one wants to add annotations to prove a property. Possible solutions for implementations should be discussed in an appendix of the manual, e.g. placing the annotations in a .h file that is not part of the original sources, or using patch files.
In the release 1.13 of the ACSL manual, companion to Frama-C 17 - Chlorine, in section 2.4.4 "Statement contracts", at the end of page 45, a paragraph is placed in the right margin.
The grammar should include a footnote to say that a behavior-body-stmt may not be empty.
If e1 and e2 are two expressions of C arithmetic type tau and we do in C res = e1 op e2; (with res also of type tau and op an arithmetic operation), do we always have in ACSL (tau)(e1 op e2) == res?
This is probably wrong for signed arithmetic where overflow results in undefined behavior. What about unsigned arithmetic?
I am not sure this is the right place for this issue....
The ACSL language implementation document for Frama-C 20.0 shows the following snippet in Example 2.12
//@ type point = struct { real x; real y; };
When processing it with frama-c -wp acsltype.c
I receive the following error message
[kernel] Parsing acsltype.c (with preprocessing)
[kernel:annot-error] acsltype.c:2: Warning: unexpected token '{'
[kernel] User Error: warning annot-error treated as fatal error.
...
Do I have to provide more arguments to Frama-C or is it just not yet implemented (it isn't marked as such in the document)?
The error also occurs when changing the snippet into
//@ type point = struct { int x; int y; };
This is a placeholder for an issue with many ramifications. Like in JML and likely in ACSL++, it would be useful to allow pure C functions to be used in logical expressions. This could be accomplished as syntactic sugar: marking a function pure implicitly creates a corresponding logical function or predicate. I suggest that the marking of pure be required (and then syntactically checked); what the 'corresponding logical function' is requires thought and elaboration.
In case of multiple contracts, all clauses are merged. This semantics is arguably dangerous, e.g. if one does only include a part of the contracts of a called function when performing modular verification, some requires
clauses might not be checked.
Should there be something like the also mechanism of JML?
I will shortly begin to assemble a draft C++ specification language document. Since C and C++ have much in common, there will obviously be considerable overlap with the ACSL document. Thus it does not make sense to have two completely separate documents where the common material needs to be updated and kept consistent in both of them. So we can either
Virgile and I prefer the second approach. It will require some refactoring of the ACSL sources in order to isolate the shared pieces, and therefore some mutual dependency. The C++ sources will be in the same repository as the ACSL sources, and the document created with something like make acppsl.
Comments are welcome.
@jensgerlach @pbaudin @vprevosto @maroneze for comments
The introduction says that @ symbols in an annotation are equivalent to blanks.
But the section on ghost code says that @ symbols at the beginning of lines and end of the comment are blanks, which does not include all @ symbols.
Which is it to be?
I think an axiomatic block should declare a separate namespace.
Instead of
/*@ axiomatic sign {
@ logic integer get_sign(real x);
@ axiom sign_pos: \forall real x; x >= 0. ==> get_sign(x) == 1;
@ axiom sign_neg: \forall real x; x <= 0. ==> get_sign(x) == -1;
@}
@*/
one could then omit the sign_
prefix for the individual axioms.
/*@ axiomatic sign {
@ logic integer get_sign(real x);
@ axiom pos: \forall real x; x >= 0. ==> get_sign(x) == 1;
@ axiom neg: \forall real x; x <= 0. ==> get_sign(x) == -1;
@}
@*/
In the following examples with a valid pointer and an invalid pointer, should they be \separated()
?
Example 1:
#include <stdlib.h>
int main() {
int * a = malloc(sizeof(int));
int * b = malloc(sizeof(int));
free(a);
if (b) {
//@ assert !\valid(a);
//@ assert \valid(b);
//@ assert \separated(a, b); // ?
}
free(b);
return 0;
}
Example 2:
int * f() {
int a = 0;
return &a;
}
int main() {
int * a = f();
int b = 0;
//@ assert !\valid(a);
//@ assert \valid(&b);
//@ assert \separated(a, &b); // ?
return 0;
}
On the front page, the repository given is acsl, not acsl-language.
Constants such as \plus_infinity
, \minus_infinity
and \NaN
would be useful to have in ACSL.
Currently, specifications dealing with non-finite floating-point values must resort to using expressions such as 1.0f/0.0
or 0.0f/0.0
, or they need to include math.h
and rely on C99 macros such as INFINITY
and NAN
. The specifications become much less clear and may induce unnecessary dependency on libc headers.
Chains of !=
are explicitly forbidden. Would it make sense to consider that a != b != c
means a!=b && a!=c && b!=c
in the same spirit as the distinct
operator of SMTLib?
Of course, !=
could not be mixed with any other operator.
There has been no release of the 1.12 version of the manual. Also, it would be very helpful to add the .pdf
as release artifacts, especially since it is not obvious that those are hosted on Frama-C's website.
Grammar entry "type-expr" is defined twice in incompatible ways: once in Fig. 2.4 (Grammar of binders and type expression) and once in Fig. 2.14 (Grammar for global logic definitions).
I have a question about the meaning of \valid. The ACSL manual says "A pointer p is said to be valid if *p is guaranteed to produce a definite value according to the C standard", and "\valid{L}(s) holds if and only if dereferencing any p ∈ s is safe at label L, both for reading from *p and writing to it."
In the following code, is p valid?
int main() {
int x, *p = &x;
//@ assert \valid(p);
}
According to the C Standard, the value of x is "indeterminate" and an attempt to read *p results in undefined behavior, because x might hold a "trap representation" which does "not represent a value of the object type".
So it sounds like *p is not guaranteed to produce a definite value, and certainly dereferencing p is not safe for reading. But I'm pretty sure that the intention is that p should be valid and Frama-C+WP says the assertion holds.
Quotes from C Standard (2018)...
"If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate." (6.7.9 (10))
"Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. ... Such a representation is called a trap representation". (6.2.6.1(5))
"Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it." (footnote 50)
Currently, in ACSL, the long double
type is not mentioned at all. However, for better compatibility with code using it, it would be nice to support the L
suffix in floating-point literals (e.g. 1.0L
) for type long double
, as is the case in C.
More generally, it would be nice if ACSL standardized the usage of the D
suffix for literal floating-point constants of type double
, as is the case in WG14 N1176, Extension for the programming language C to support decimal floating-point arithmetic. While the document is mainly related to decimal floating-point constants (which are rarely used), it mentions adding an extension to the parsing of floating-point literals:
An unsuffixed floating constant has type double. If suffixed by the letter f or F, it has type float. If suffixed by the letter d or D, it has type double. If suffixed by the letter l or L, it has type long double.
For information, the D
suffix is accepted by GCC (which mostly ignores it, since unsuffixed floating-point literals in C already have type double
), but not supported by other compilers, such as Clang and CompCert.
Still, considering that, in ACSL, unsuffixed floating-point literals have type real
(unlike in C), adding support for the D
suffix avoids casts in some cases.
For long double
s, the L
suffix maintains consistency with C literals, minimizing surprise, and avoiding casts.
In the introduction, it is said that all ACSL keywords that are valid C identifiers and can occur as terms are preceded by a backslash to avoid conflicts with pre-existing C identifiers. There is a particular case for integer
, real
, and boolean
, which do not begin with a backslash, but can be replaced by their UTF-8 counterparts if needed.
Should ACSL incorporate extensions from other specification formalisms, such as pre and flow of values? Such extensions could include support for JML's history constraints.
Related work:
Do invariants hold for functions defined before them, as in
typedef struct S { int x; } S;
void reset(S *p) { p->x = 0; }
//@ type invariant I(S *p) = p->x > 0;
What is the semantics of a string literal in an ACSL annotation?
Proposal: it is equivalent to declaring a new ghost variable: assert P("foo")
would then become assert P(ghost_foo);
, with a global declaration /*@ ghost const char ghost_foo[4] = "foo"; */
inserted before.
Some Frama-C plug-ins resort to the notion of attribute annotation, either ad hoc or through (an abuse of) the Microsoft-inspired __declspec
attribute. Should this be described officially in the manual?
Adapted from a post on the frama-c mailing list:
Using the first example from the tutorial:
/*@ ensures \result >= x && \result >= y;
ensures \result == x || \result == y;
*/
int max (int x, int y) {
return (x > y) ? x : y;
}
A pragma-based version might look something like:
#pragma acsl ensures \result >= x && \result >= y;
#pragma acsl ensures \result == x || \result == y;
int max (int x, int y) {
return (x > y) ? x : y;
}
One major advantage of the pragma syntax is integration with the preprocessor. For example, let's say I use a macro to define several variants of a function:
#define DEFINE_MAX_FN(T, id) \
T id##_max (T x, T y) { \
return (x > y) ? x : y; \
}
DEFINE_MAX_FN(int, int)
DEFINE_MAX_FN(unsigned int, uint)
DEFINE_MAX_FN(int64_t, int64)
I realize that (ab)using the preprocessor like this is often considered to be something of an anti-pattern, but there are times where the benefits from code reuse vastly outweigh the decreased readability. I know several of my own projects would have been impractical without this feature.
As far as I know, with ACSL there is no good way to annotate the different versions; I would need to add a comment for each one:
/*@ ensures \result >= x && \result >= y;
ensures \result == x || \result == y;
*/
DEFINE_MAX_FN(int, int)
/*@ ensures \result >= x && \result >= y;
ensures \result == x || \result == y;
*/
DEFINE_MAX_FN(unsigned int, uint)
/*@ ensures \result >= x && \result >= y;
ensures \result == x || \result == y;
*/
DEFINE_MAX_FN(int64_t, int64)
Which is especially onerous if the macro is considered to be part of the user-facing API and not just an implementation detail. This is pretty common for things like test harnesses and data structure implementations.
With a pragma-based implementation the annotations could be moved into the macro:
#define DEFINE_MAX_FN(T, id) \
_Pragma("acsl ensures \\result >= x && \\result >= y") \
_Pragma("acsl ensures \\result == x || \\result == y") \
T id##_max (T x, T y) { \
return (x > y) ? x : y; \
}
Just to give a bit of context, I'm interested in adding (partial) ACSL support to C compilers; not for full formal verification, but to help the compilers provide better compile-time warnings and errors.
I think the pragma-based syntax would also be more attractive for compilers, as it would likely be easier to hook into existing mechanisms in place to handle things like OpenMP, Cilk++, and compiler-specific directives. It also seems a bit more palatable since the pragma mechanism is part of the C99 and C11 standards.
@vprevosto responded with:
I see your point. This might indeed be worth considering, even though I think that some clarifications are needed on what exactly can go after an acsl pragma. In particular, you seem to want to be able to split a single annotation into several pragmas, which implies an obvious question: how do we merge them? In particular, we should take care of easily distinguishing global annotations from function contracts, as in:
_Pragma("acsl lemma foo: \\true;") _Pragma("acsl requires \\true;") _Pragma("acsl ensures \\true;") int f() { return 0; }
That's definitely a problem. I think that allowing people to split annotations across multiple pragmas is important for readability, though.
IMHO perhaps the most natural thing would be to have keywords like lemma
apply to a single statement, but also add a begin/end syntax for multi-statement annotations. So, the above example would translate into something like
/*@ lemma foo: \true; */
/*@ requires: \true;
ensures: \true;
*/
int f() { return 0; }
However, if you changed the code to something like
#pragma acsl begin lemma foo: \true;
#pragma acsl requires: \true;
#pragma acsl ensures: \true;
#pragma acsl end lemma;
You would end up with
/*@ lemma foo: \true
requires: \true;
ensures: \true;
*/
int f() { return 0; }
Of course you could also go with end lemma foo
, end foo
, or even just end
.