Comments (62)
I tried looking at this, but I have a very hard time relating
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %2 0:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8
to the only fragment of IR it seems to map to:
entry:
%0 = load i32* @x1, align 4, !tbaa !2
%cmp = icmp slt i32 %0, 1
br i1 %cmp, label %entry.if.end10_crit_edge, label %if.then
if.then: ; preds = %entry
%1 = load i32* @x0, align 4, !tbaa !2
%cmp3 = icmp sgt i32 %1, 1
%conv4 = zext i1 %cmp3 to i32
br i1 %cmp3, label %cond.end, label %cond.false
cond.false: ; preds = %if.then
%cmp5 = icmp sgt i32 %1, 0
%conv6 = zext i1 %cmp5 to i32
br label %cond.end
cond.end: ; preds = %cond.false, %if.then
%cond = phi i32 [ %conv6, %cond.false ], [ %conv4, %if.then ]
store i32 %cond, i32* @x5, align 4, !tbaa !2
store i32 %cond, i32* @x2, align 4, !tbaa !2
%cmp7 = icmp slt i32 %0, %cond
The phi node corresponds to the branch in the if.then
block, which check the value of %cmp3
. This can only correspond to %4:i1 = slt 1:i32, %3
in the souper code. Yet, only one of the blockpc's that follows %4
refers to this value. I would actually expect something like:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
pc %2 1:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %4 1:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8
But maybe I'm wrong and simply don't sufficiently understand blockpc's.
from souper.
Here's the critical transformation. I also have a hard time with blockpc, but it's a really valuable thing.
*** IR Dump After Combine redundant instructions ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
%1 = load i32* @x1, align 4, !tbaa !1
%2 = icmp slt i32 %1, 1
br i1 %2, label %._crit_edge, label %3
._crit_edge: ; preds = %0
%.pre = load i32* @x2, align 4, !tbaa !1
br label %14
; <label>:3 ; preds = %0
%4 = load i32* @x0, align 4, !tbaa !1
%5 = icmp sgt i32 %4, 1
%6 = zext i1 %5 to i32
br i1 %5, label %10, label %7
; <label>:7 ; preds = %3
%8 = icmp sgt i32 %4, 0
%9 = zext i1 %8 to i32
br label %10
; <label>:10 ; preds = %3, %7
%11 = phi i32 [ %9, %7 ], [ %6, %3 ]
store i32 %11, i32* @x5, align 4, !tbaa !1
store i32 %11, i32* @x2, align 4, !tbaa !1
%12 = icmp slt i32 %1, %11
br i1 %12, label %14, label %13
; <label>:13 ; preds = %10
store i32 0, i32* @x3, align 4, !tbaa !1
br label %14
; <label>:14 ; preds = %._crit_edge, %10, %13
%15 = phi i32 [ %11, %10 ], [ %.pre, %._crit_edge ], [ %11, %13 ]
store i32 %15, i32* @x4, align 4, !tbaa !1
%16 = icmp eq i32 %1, 0
br i1 %16, label %17, label %18
; <label>:17 ; preds = %14
%puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
br label %18
; <label>:18 ; preds = %14, %17
ret i32 0
}
; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache
; Replacing " %12 = icmp slt i32 %1, %11"
; from ""
; with "i1 false" in:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %2 0:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8
cand %9 0:i1
; Replacing " %15 = icmp eq i32 %1, 0"
; from ""
; with "i1 false" in:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
blockpc %0 1 %2 0:i1
%3 = block 3
blockpc %3 1 %2 1:i1
%4:i32 = var
%5:i1 = slt 0:i32, %4
%6:i32 = zext %5
%7:i1 = slt 1:i32, %4
%8:i32 = zext %7
%9:i32 = phi %0, %6, %8
%10:i1 = slt %1, %9
blockpc %3 2 %10 0:i1
%11:i1 = eq 0:i32, %1
cand %11 0:i1
*** IR Dump After Souper super-optimizer pass ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
%1 = load i32* @x1, align 4, !tbaa !1
%2 = icmp slt i32 %1, 1
br i1 %2, label %._crit_edge, label %3
._crit_edge: ; preds = %0
%.pre = load i32* @x2, align 4, !tbaa !1
br label %13
; <label>:3 ; preds = %0
%4 = load i32* @x0, align 4, !tbaa !1
%5 = icmp sgt i32 %4, 1
%6 = zext i1 %5 to i32
br i1 %5, label %10, label %7
; <label>:7 ; preds = %3
%8 = icmp sgt i32 %4, 0
%9 = zext i1 %8 to i32
br label %10
; <label>:10 ; preds = %3, %7
%11 = phi i32 [ %9, %7 ], [ %6, %3 ]
store i32 %11, i32* @x5, align 4, !tbaa !1
store i32 %11, i32* @x2, align 4, !tbaa !1
br i1 false, label %13, label %12
; <label>:12 ; preds = %10
store i32 0, i32* @x3, align 4, !tbaa !1
br label %13
; <label>:13 ; preds = %._crit_edge, %10, %12
%14 = phi i32 [ %11, %10 ], [ %.pre, %._crit_edge ], [ %11, %12 ]
store i32 %14, i32* @x4, align 4, !tbaa !1
br i1 false, label %15, label %16
; <label>:15 ; preds = %13
%puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
br label %16
; <label>:16 ; preds = %13, %15
ret i32 0
}
from souper.
blockpc asserts path conditions that must be true for a given branch. For example, given the following llvm bitcode:
entry:
%0 = load i32* @x1, align 4, !tbaa !2
%cmp = icmp slt i32 %0, 1
br i1 %cmp, label %entry.if.end10_crit_edge, label %if.then
if.then: ; preds = %entry
%1 = load i32* @x0, align 4, !tbaa !2
%cmp3 = icmp sgt i32 %1, 1
%conv4 = zext i1 %cmp3 to i32
br i1 %cmp3, label %cond.end, label %cond.false
cond.false: ; preds = %if.then
%cmp5 = icmp sgt i32 %1, 0
%conv6 = zext i1 %cmp5 to i32
br label %cond.end
cond.end: ; preds = %cond.false, %if.then
%cond = phi i32 [ %conv6, %cond.false ], [ %conv4, %if.then ]
store i32 %cond, i32* @x5, align 4, !tbaa !2
store i32 %cond, i32* @x2, align 4, !tbaa !2
%cmp7 = icmp slt i32 %0, %cond
We start collecting blockpc(s) at the phi instruction %cond
. We first look at the first incoming block, i.e., %cond.false
. Because %cond.false
has single predecessor %if.then
and this predecessor's terminator is a conditional instruction, we add one blockpc as blockpc 0 %0 %cmp3 0
. Then we examine the predecessor of %if.then
. Similarly, because %if.then
has single predecessor %entry
, which has a conditional terminator, we add another blockpc blockpc %0 0 %cmp 0
.
Now we finish the path starting from %cond.false
, and we look at %cond
's second incoming block %if.then
. Similarly, we add blockpc %0 1 %cmp 0
.
from souper.
Does the optimization look correct to you Yang? If so, then LLVM must be wrong somewhere else. I'm not sure how useful bugpoint is for wrong code bugs, anyone know? Otherwise we can do something less smart like cut-and-pasting stuff from the print-after-all output into lli.
from souper.
@jeroenk For the souper IR that you gave, we cannot have pc
, because we don't utilize context-sensitive control-flow information. In other words, we don't know the fact that %cond.false
and %if.then
come from the same predecessor %entry
with the same branch condition. IMHO, this is probably one of the limitations of our current pc/blockpc harvesting strategy.
from souper.
@regehr, I will investigate it tonight.
from souper.
Thanks Yang. Full optimization log is here if that's helpful.
from souper.
The second optimization looks invalid to me.
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
blockpc %0 1 %2 0:i1
%3 = block 3
blockpc %3 1 %2 1:i1
%4:i32 = var
%5:i1 = slt 0:i32, %4
%6:i32 = zext %5
%7:i1 = slt 1:i32, %4
%8:i32 = zext %7
%9:i32 = phi %0, %6, %8
%10:i1 = slt %1, %9
blockpc %3 2 %10 0:i1
%11:i1 = eq 0:i32, %1
cand %11 0:i1
Here we have %10 == 0
, which means that %1 < %9
is false, i.e., %1 >= %9
. Because %9
can be either 0
or 1
, %1
may equal to 0
. So, there is some bug in generating blockpc-related queries. Investigating.
from souper.
Generated KLEE expression. Pondering if it's correct.
(Or (Eq false
(And (Or (Eq false (And N0:(Read 0 blockpred) N0))
N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
(Or N0 N1)))
(Eq false (Eq 0 N2)))
from souper.
It's correct, but not sufficient. The problem is that the N0
expression is not being guarded by anything. Hence, if we let
(Slt (Read w32 0 a1) 1)
evaluate to true, which happens when you follow the control flow from entry
-> ._crit_edge
-> 14
(referring to John's dump above), then you essentially end up with
(And (Eq false N0) N0)
for the (And ...)
sub-expression, which can never be satisfied. This is perfectly fine, as we know that the branch on which N0
is needed is not the one being taken. However, this latter fact is not being encoded in the formula.
What exposes the problem here is that in addition to this we have that if
(Slt (Read w32 0 a1) 1)
evaluates to false, then we have a rightful simplification (which is essentially the first simplification souper spots).
from souper.
I just want to add that if there is a Souper bug here, it's a really hard one to find using Csmith, coming up only once every 12 hours or so on a fast quad core. Next time we might not be so lucky, so let's be really careful about this code.
from souper.
@jeroenk , Thanks for the explanation! I don't quite understand this part: how can we get
(And (Eq false N0) N0)
by assuming
(Slt (Read w32 0 a1) 1)
to be true. Can you elaborate on that? Thanks.
from souper.
OK, I see the problem now. Actually the bug relates to the part where we generate souper IR. The phi node relevant to %3 = block
is not generated. If we look at the second IR in John's example, we only have %3 = block
and blockpc %3
, but the corresponding phi
instruction doesn't exist. Looks like the way that we compute the equivalence classes for blockpcs has some problem. Working on a fix.
from souper.
@chenyang78 I assume you figured it out by now, but just for completeness, we have:
(And (Or (Eq false (And N0:(Read 0 blockpred) N0))
N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
(Or N0 N1)))
By (And x x) = x
this is equal to:
(And (Or (Eq false N0:(Read 0 blockpred) )
N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
(Or N0 N1)))
then assuming (Slt (Read w32 0 a1) 1)
we get:
(And (Or (Eq false N0:(Read 0 blockpred) )
N1:(Eq false true))
(Or N0 N1)))
and by (Eq false true) = false
:
(And (Or (Eq false N0:(Read 0 blockpred) )
N1:false)
(Or N0 N1)))
Substituting N1
:
(And (Or (Eq false N0:(Read 0 blockpred) )
false)
(Or N0 false)))
and simplifying (Or x false)
to x
we get:
(And (Eq false N0:(Read 0 blockpred) )
N0))
from souper.
@jeroenk Thanks!
from souper.
First of 5 problems found by Csmith in a 24-hour run using Yang's issue_166 branch.
$ SOUPER_NO_SOUPER=1 /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
y
$ gcc -O2 small.c -o small1 ; ./small1
y
$ /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
x
$ cat small.c
int printf(const char *, ...);
int x0, x1, x2, x3;
int main() {
if (x3) {
x1 = x3;
for (; x0 <= 0; x0++)
;
}
x2 = x1 |= x0;
if (x2)
printf("x\n");
else
printf("y\n");
return 0;
}
from souper.
$ cat small.c
int printf(const char *, ...);
char x0;
int x1 = 1;
int x2() { return x0 ?: 1 % x0; }
int main() {
if ((x1 || x2()) ^ 0) {
if (x1 < 1) {
;
} else {
printf("x\n");
}
}
return 0;
}
$ /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
$ /home/regehr/f/souper-yang/build/sclang -O0 small.c -o small1 ; ./small1
x
$ gcc -O2 small.c -o small1 ; ./small1
x
from souper.
Program from the previous comment is a good one since it results in only one souper optimization.
%0 = block 2
%1:i32 = var
%2:i1 = eq 0:i32, %1
blockpc %0 0 %2 1:i1
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
blockpc %0 1 %2 1:i1
%5 = block 2
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
%9:i1 = eq 0:i32, %8
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32
%12:i1 = or %10, %11
cand %12 1:i1
from souper.
int printf(const char *, ...);
int x0, x1, x2, x3;
volatile int x4;
void x5(x6) {
x1 = 0;
for (; x1 >= 0; x1--)
if (x4) {
x6 = x0;
if (x6)
x2 = 0;
}
x3 = x6 >= 1;
}
int main() {
x5(0);
printf("%x", x3);
return 0;
}
from souper.
int printf(const char *, ...);
int x0, x1, x2, x3, x4, x5;
char x6;
int main() {
x6 = x4;
if (x6) {
x5 = x1 == 0 ?: x4 % x1;
if (x5)
x4 = 0;
} else {
if (x0)
x4 = 9;
x3--;
}
x2 = 1 <= x4;
printf("%x", x2);
return 0;
}
from souper.
int printf(const char *, ...);
int x0, x1, x2, x3, x4, x5;
short x6;
void x7() {
x2 = 5;
for (; x2; x2--) {
x6 = x4 ?: x1 > x4;
x5 = x6 == 0 ?: x0 / x6;
if (x5)
break;
}
}
int main() {
x7();
if (x4)
for (; x3;)
printf("x\n");
return 0;
}
from souper.
I will investigate those. Thanks, John.
from souper.
OK, I see Souper's issue for your second program, i.e,
int printf(const char *, ...);
char x0;
int x1 = 1;
int x2() { return x0 ?: 1 % x0; }
int main() {
if ((x1 || x2()) ^ 0) {
if (x1 < 1) {
;
} else {
printf("x\n");
}
}
return 0;
}
Here Souper wrongfully exploited the "UB" in the function x2
. The program is UB-free, but Souper is not aware of the fact that x2()
will never get invoked. The following simplified souper IR illustrates the problem:
%0 = block 2
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
...
blockpc %0 0 %4 1:i1
asserts that %3
is 0
, and therefore %6
is 0
. Consequently, we get UB for instruction %7
.
from souper.
I believe that all of these problems occur when -souper-exploit-ub=true but please double check.
from souper.
Ok I double checked and this optimization does happen when UB is turned off:
$ ~/f/souper-yang/build/souper -souper-exploit-ub=false -stp-path=/usr/local/bin/stp small.bc
; Listing valid replacements.
; Using solver: STP + internal cache
; Static profile 1
; Function: main
%0 = block 2
%1:i32 = var
%2:i1 = eq 0:i32, %1
blockpc %0 0 %2 1:i1
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
blockpc %0 1 %2 1:i1
%5 = block 2
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
%9:i1 = eq 0:i32, %8
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32
%12:i1 = or %10, %11
cand %12 1:i1
from souper.
Here's the llvm (which looks correct):
; ModuleID = 'small.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@x1 = global i32 1, align 4
@x0 = common global i8 0, align 1
@str = private unnamed_addr constant [2 x i8] c"x\00"
; Function Attrs: nounwind readonly uwtable
define i32 @x2() #0 {
%1 = load i8* @x0, align 1, !tbaa !1
%2 = sext i8 %1 to i32
%3 = icmp eq i8 %1, 0
br i1 %3, label %4, label %6
; <label>:4 ; preds = %0
%5 = srem i32 1, %2
br label %6
; <label>:6 ; preds = %0, %4
%7 = phi i32 [ %5, %4 ], [ %2, %0 ]
ret i32 %7
}
; Function Attrs: nounwind uwtable
define i32 @main() #1 {
%1 = load i32* @x1, align 4, !tbaa !4
%2 = icmp eq i32 %1, 0
br i1 %2, label %3, label %10
; <label>:3 ; preds = %0
%4 = load i8* @x0, align 1, !tbaa !1
%5 = sext i8 %4 to i32
%6 = icmp eq i8 %4, 0
br i1 %6, label %7, label %x2.exit
; <label>:7 ; preds = %3
%8 = srem i32 1, %5
br label %x2.exit
x2.exit: ; preds = %3, %7
%9 = phi i32 [ %8, %7 ], [ %5, %3 ]
%phitmp = icmp eq i32 %9, 0
br label %10
; <label>:10 ; preds = %0, %x2.exit
%11 = phi i1 [ false, %0 ], [ %phitmp, %x2.exit ]
%12 = icmp slt i32 %1, 1
%or.cond = or i1 %11, %12
br i1 %or.cond, label %14, label %13
; <label>:13 ; preds = %10
%puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
br label %14
; <label>:14 ; preds = %10, %13
ret i32 0
}
; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #2
attributes #0 = { nounwind readonly uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.6.0 (trunk 223586)"}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
!3 = metadata !{metadata !"Simple C/C++ TBAA"}
!4 = metadata !{metadata !5, metadata !5, i64 0}
!5 = metadata !{metadata !"int", metadata !2, i64 0}
from souper.
Below is a partial annotation of the Souper. It looks to me like the blockpcs are telling us that x1 == 0 regardless of which branch was taken. Is that correct, Yang?
%0 = block 2
%5 = block 2
%1:i32 = var ; x1
%2:i1 = eq 0:i32, %1 ; x1 == 0
%3:i8 = var ; x0
%4:i1 = eq 0:i8, %3 ; x0 == 0
%6:i32 = sext %3 ; x0 promoted to int
%7:i32 = srem 1:i32, %6 ; 1 % x0
blockpc %0 0 %2 1:i1 ; x1 == 0 if pred 0
blockpc %0 1 %2 1:i1 ; x1 == 0 if pred 1
blockpc %0 0 %4 1:i1 ; x0 == 0 if pred 0
%8:i32 = phi %0, %7, %6 ;
%9:i1 = eq 0:i32, %8 ;
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32 ; x1 < 1
%12:i1 = or %10, %11 ;
cand %12 1:i1
from souper.
Going by Yang's earlier explanation, I think this says that for %8:i32 = phi %0, %7, %6
, we may assume that:
- For the the first alternative of phi node: %4 == 1 and %2 == 1
- For the second alternative of phi node: %2 == 1
from souper.
Jeroen that is my understanding as well. The problem here, of course, is that x1 != 0.
from souper.
But, %8:i32 = phi %0, %7, %6
corresponds to %9 = phi i32 [ %8, %7 ], [ %5, %3 ]
in the bitcode, which is only reachable in case x1 == 0. Or am I missing something?
from souper.
Jeroen, I think you're right. So I'm still not sure what's going wrong here.
One thing we should consider is what the no-UB mode does when there is a divide by zero. My understanding is that division by zero is permitted in SMTLIB2 code but that the result is unspecified. Would that be contributing to the problem here?
from souper.
Thanks, John and Jeroen. I will take a close look at it tonight.
from souper.
I'm pretty sure the divide by zero isn't the problem...
from souper.
Div by zero records UB if the second op is a constant 0. In turn, KLEEBuilder returns constant 0 as the result. If any path conditions constrain the second op to be 0, solver doesn't complain. I did the following test:
%0:i32 = var
pc %0 0:i32
%1:i32 = udiv 1:i32, %0
cand %1 1:i32
$ souper-check $SOUPER_SOLVER -souper-exploit-ub=1 divbyzero.opt
; LGTM
$ souper-check $SOUPER_SOLVER -souper-exploit-ub=0 divbyzero.opt
Invalid, e.g.
%0 = 0
from souper.
@regehr Bitvector division in SMTLib is a total function, so anything divided by zero will have a well-defined answer. However, the solver is not allowed to make any assumptions about the result of the function in case the divisor is 0.
For a program in which the division by zero cannot actually be executed, I would expect the SMT query that is generated to be such that the value of the division does not matter for the satisfiability of the query.
from souper.
Thanks @jeroenk. I ran the test above with z3. stp says in both cases LGTM
.
from souper.
The KLEE expression is as follows (writing A ==> B
for (Or (Eq false A) B)
) and with some trivial simplification:
(And (N0:(Read 0 blockpred0) ==> (And N1:(Eq 0 N2:(Read w32 0 a1)) (Eq 0 N3:(Read w8 0 a3))))
(Or N0 N1))
==>
(Or (And (Eq false (Read 0 blockpred)) (Eq 0 (Select w32 N0 (SRem w32 1 N4:(SExt w32 N3)) N4)))
(Slt N2 1))
Here, N0 represents the blockpc, N2 represents %1 (or x1 in the original C code), and N3 represents %3 (or x0 in the original). For the optimisation to hold the expression must be valid, i.e., whatever values I pick for N0, N1, and N2, the result must be true. Of course, this is not what we want in this case.
Let's concentrate on the case that N2 != 0 (I believe that the N2 == 0 case is properly encoded by the expression). If N2 != 0, the N1 == false, and the expression simplifies to:
(And (N0:(Read 0 blockpred0) ==> false)
N0)
==>
(Or (And (Eq false (Read 0 blockpred)) (Eq 0 (Select w32 N0 (SRem w32 1 N4:(SExt w32 N3)) N4)))
(Slt N2 1))
The second half of the formula simplifies depending on N2 >= 1, or N2 < 0, but this doesn't really matter, as
(And (N0:(Read 0 blockpred0) ==> false) N0)
is always false. And, hence, the whole formula is true if N2 != 0. So, N0 affects the validity of the expression even in case N2 != 0, and this should not happen, as it's related to the execution path we didn't choose in this case! It seems that there's something missing from the generated KLEE expression, but I'm not totally sure what that is (needs more thought).
from souper.
I think I'm missing here something, because for me the souper optimization output looks correct. Regardless the branch of %8:i32 = phi %0, %7, %6
, %1
is always constrained to 0
(blockpcs are equal for both branches). Consequently, %11
is always 1
and herewith cand %12 1:i1
is correct. In other words, the optimization is equivalent to the opt with all blockpcs replaced with a single pc %2 1:i1
. What is wrong with my reasoning?
from souper.
Raimondas, if you can't find a bug in the Souper optimization, there must be a harvesting bug, right?
from souper.
@rsas Naive question: wouldn't constraining %1
to 0
require blockpc's that refer to %5
?
from souper.
Yes, if others prove me wrong. I don't see a bug in the Souper output and I see a problem in the LLVM as follows:
The first branching instruction br i1 %2, label %3, label %10
can jump to label %10
if %1 != 0
. I hope we all agree here.
At label 10, the execution of %or.cond = or i1 %11, %12
is constrained by blockpcs (collected through %11
) which in turn constrain %1
. This is not ok, because there are no constraints on %1
when coming from the initial false branch and jumping to label 10.
from souper.
In summary, %12
can be 0 or 1 when coming from the initial false branch. On that path, there are no constraints on %1
!
from souper.
Thanks guys. @rsas, I think you are right. Looks like it relates to issue #157. We shouldn't skip constraints on branches, which can result in both missing-optimization and mis-optimization.
from souper.
@jeroenk can you please rephrase the question?
from souper.
@rsas Rephrasing: I think I'm fundamentally confused about what a list of souper instructions is. From the documentation and all previous discussions I got the impression that there's some importance in the order of the instructions, but following your reasoning above, it seems that I should view as a set and not as a list. We write it as a list to conveniently express some non-circular dependencies that may occur.
from souper.
Correct. You always start from the bottom cand instruction, collect all constraints recursively and add them at the top level.
from souper.
OK, I think the fundamental problem here is that we are not taking control-flow facts into account. For the given example, two blockpcs
on %2
cannot be viewed as a pure "union". Thinking about a way to encode control-flow info...
from souper.
I think so too. I suspect you'll need to encode both data flow and control
flow information into blockpc.
On Mon, Feb 9, 2015 at 6:27 PM, Yang Chen [email protected] wrote:
OK, I think the fundamental problem here is that we are not taking
control-flow facts into account. For the given example, two blockpcs on %2
cannot be viewed as a pure "union". Thinking about a way to encode
control-flow info...—
Reply to this email directly or view it on GitHub
#166 (comment).
from souper.
@chenyang78 I think it's time for us to rethink the design of pcs/blockpcs/UBs/Phis. They all have things in common and the code is not trivial at all.
from souper.
At the same time, we have to encode poison values too.
from souper.
@rsas I agree.
from souper.
One of the earliest major Souper bugs was where UB constraints weren't properly filtered through phi nodes. Raimondas, you solved that -- can you please explain the flavor of the solution and which aspects of it can be used (and not used) here?
Regarding poison/undef/UB, this is hard, but Nuno implemented a good solution in Alive. I'll mail everyone our draft Alive paper where this is explained pretty carefully. We're going to need to do something similar here, but note that there are some differences between poison/undef/UB as they are explained in the LangRef and as they really are treated by LLVM. It sucks to not have a real specification.
from souper.
The flow of UB constraints is encoded using Phi branch predicates. We add the UBs only if the corresponding Preds are true (Pred => UB
). I'm not sure if we can easily adapt this concept to blockpcs
though.
from souper.
Do we want to add a flag for turning off blockpc until we can resolve this?
from souper.
I second that.
from souper.
blockpc
constraints are generated in the same way as UBs. I will add a flag to disable blockpc
for now.
from souper.
Thanks Yang.
Yes, I feel like there can be some shared mechanisms here. Let's take the time and get this right -- in the meantime Souper works perfectly well in degraded (no UB, no blockpc) mode.
from souper.
@chenyang78 I just created a pull request that clears the collected BPCs so that they are neither printed nor exploited. If we disable them in KLEEBuilder, they would be still printed which might be confusing. Let me know what you think.
from souper.
@rsas Thanks for implementing it. I was considering to disable blockpc by default. It's fine to make it on by default. Enabling blockpc by default requires fewer changes to the relevant tests.
from souper.
Leaving the default option true
encourages us to work on the fix faster ;)
from souper.
I was going to say the same thing.
from souper.
from souper.
Related Issues (20)
- Souper crash in codegen from valid transformation HOT 5
- Assertion failed: (EB.get(&I)->hasOrigin(&I)), function ExtractExprCandidates, file .../lib/Extractor/Candidates.cpp, line 1025 HOT 6
- ICE: Assertion `false && "Phi with muliple arguments unimplemented"' failed HOT 1
- Instruction does not dominate all uses!
- Not deterministic SMT query string HOT 1
- Build failure HOT 1
- use error HOT 5
- LLVM Test Suite test failure with NO_INFER
- Build failure HOT 2
- futur m1 support ?
- Better spreading SMT solver load?
- [alive2] `IfNonPoison` removed upstream HOT 1
- Souper fails to turn multiplication by a power of two into a shift? HOT 3
- building for m1 HOT 2
- Packaging Souper HOT 1
- Possible bugs when generating inputs for dataflow pruning HOT 3
- Rgiht way to use souper HOT 2
- Hi, Is this superoptimizer hardware irrelvant? HOT 4
- Question about Souper IR's advantage compare to LLVM IR? HOT 1
- not found klee and alive2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from souper.