The behavior of the test case below is changed by Souper even with UB turned off. Of c

I tried looking at this, but I have a very hard time relating <div class="snippet-

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

The second optimization looks invalid to me. <div class="snippet-clipboard-content

Generated KLEE expression. Pondering if it's correct. <div class="snippet-clipboar

blockpc is unsound,about google/souper

Comments (62)

jeroenk commented on May 6, 2024

I tried looking at this, but I have a very hard time relating

%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %2 0:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8

to the only fragment of IR it seems to map to:

entry:
  %0 = load i32* @x1, align 4, !tbaa !2
  %cmp = icmp slt i32 %0, 1
  br i1 %cmp, label %entry.if.end10_crit_edge, label %if.then

if.then:                                          ; preds = %entry
  %1 = load i32* @x0, align 4, !tbaa !2
  %cmp3 = icmp sgt i32 %1, 1
  %conv4 = zext i1 %cmp3 to i32
  br i1 %cmp3, label %cond.end, label %cond.false

cond.false:                                       ; preds = %if.then
  %cmp5 = icmp sgt i32 %1, 0
  %conv6 = zext i1 %cmp5 to i32
  br label %cond.end

cond.end:                                         ; preds = %cond.false, %if.then
  %cond = phi i32 [ %conv6, %cond.false ], [ %conv4, %if.then ]
  store i32 %cond, i32* @x5, align 4, !tbaa !2
  store i32 %cond, i32* @x2, align 4, !tbaa !2
  %cmp7 = icmp slt i32 %0, %cond

The phi node corresponds to the branch in the if.then block, which check the value of %cmp3. This can only correspond to %4:i1 = slt 1:i32, %3 in the souper code. Yet, only one of the blockpc's that follows %4 refers to this value. I would actually expect something like:

%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
pc %2 1:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %4 1:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8

But maybe I'm wrong and simply don't sufficiently understand blockpc's.

from souper.

regehr commented on May 6, 2024

Here's the critical transformation. I also have a hard time with blockpc, but it's a really valuable thing.

*** IR Dump After Combine redundant instructions ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
  %1 = load i32* @x1, align 4, !tbaa !1
  %2 = icmp slt i32 %1, 1
  br i1 %2, label %._crit_edge, label %3

._crit_edge:                                      ; preds = %0
  %.pre = load i32* @x2, align 4, !tbaa !1
  br label %14

; <label>:3                                       ; preds = %0
  %4 = load i32* @x0, align 4, !tbaa !1
  %5 = icmp sgt i32 %4, 1
  %6 = zext i1 %5 to i32
  br i1 %5, label %10, label %7

; <label>:7                                       ; preds = %3
  %8 = icmp sgt i32 %4, 0
  %9 = zext i1 %8 to i32
  br label %10

; <label>:10                                      ; preds = %3, %7
  %11 = phi i32 [ %9, %7 ], [ %6, %3 ]
  store i32 %11, i32* @x5, align 4, !tbaa !1
  store i32 %11, i32* @x2, align 4, !tbaa !1
  %12 = icmp slt i32 %1, %11
  br i1 %12, label %14, label %13

; <label>:13                                      ; preds = %10
  store i32 0, i32* @x3, align 4, !tbaa !1
  br label %14

; <label>:14                                      ; preds = %._crit_edge, %10, %13
  %15 = phi i32 [ %11, %10 ], [ %.pre, %._crit_edge ], [ %11, %13 ]
  store i32 %15, i32* @x4, align 4, !tbaa !1
  %16 = icmp eq i32 %1, 0
  br i1 %16, label %17, label %18

; <label>:17                                      ; preds = %14
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
  br label %18

; <label>:18                                      ; preds = %14, %17
  ret i32 0
}

; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache

; Replacing "  %12 = icmp slt i32 %1, %11"
; from ""
; with "i1 false" in:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %2 0:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8
cand %9 0:i1

; Replacing "  %15 = icmp eq i32 %1, 0"
; from ""
; with "i1 false" in:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
blockpc %0 1 %2 0:i1
%3 = block 3
blockpc %3 1 %2 1:i1
%4:i32 = var
%5:i1 = slt 0:i32, %4
%6:i32 = zext %5
%7:i1 = slt 1:i32, %4
%8:i32 = zext %7
%9:i32 = phi %0, %6, %8
%10:i1 = slt %1, %9
blockpc %3 2 %10 0:i1
%11:i1 = eq 0:i32, %1
cand %11 0:i1
*** IR Dump After Souper super-optimizer pass ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
  %1 = load i32* @x1, align 4, !tbaa !1
  %2 = icmp slt i32 %1, 1
  br i1 %2, label %._crit_edge, label %3

._crit_edge:                                      ; preds = %0
  %.pre = load i32* @x2, align 4, !tbaa !1
  br label %13

; <label>:3                                       ; preds = %0
  %4 = load i32* @x0, align 4, !tbaa !1
  %5 = icmp sgt i32 %4, 1
  %6 = zext i1 %5 to i32
  br i1 %5, label %10, label %7

; <label>:7                                       ; preds = %3
  %8 = icmp sgt i32 %4, 0
  %9 = zext i1 %8 to i32
  br label %10

; <label>:10                                      ; preds = %3, %7
  %11 = phi i32 [ %9, %7 ], [ %6, %3 ]
  store i32 %11, i32* @x5, align 4, !tbaa !1
  store i32 %11, i32* @x2, align 4, !tbaa !1
  br i1 false, label %13, label %12

; <label>:12                                      ; preds = %10
  store i32 0, i32* @x3, align 4, !tbaa !1
  br label %13

; <label>:13                                      ; preds = %._crit_edge, %10, %12
  %14 = phi i32 [ %11, %10 ], [ %.pre, %._crit_edge ], [ %11, %12 ]
  store i32 %14, i32* @x4, align 4, !tbaa !1
  br i1 false, label %15, label %16

; <label>:15                                      ; preds = %13
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
  br label %16

; <label>:16                                      ; preds = %13, %15
  ret i32 0
}

from souper.

chenyang78 commented on May 6, 2024

blockpc asserts path conditions that must be true for a given branch. For example, given the following llvm bitcode:

entry:
    %0 = load i32* @x1, align 4, !tbaa !2
    %cmp = icmp slt i32 %0, 1
    br i1 %cmp, label %entry.if.end10_crit_edge, label %if.then

if.then:                                          ; preds = %entry
    %1 = load i32* @x0, align 4, !tbaa !2
    %cmp3 = icmp sgt i32 %1, 1
    %conv4 = zext i1 %cmp3 to i32
    br i1 %cmp3, label %cond.end, label %cond.false

cond.false:                                       ; preds = %if.then
    %cmp5 = icmp sgt i32 %1, 0
    %conv6 = zext i1 %cmp5 to i32
    br label %cond.end

cond.end:                                         ; preds = %cond.false, %if.then
    %cond = phi i32 [ %conv6, %cond.false ], [ %conv4, %if.then ]
    store i32 %cond, i32* @x5, align 4, !tbaa !2
    store i32 %cond, i32* @x2, align 4, !tbaa !2
    %cmp7 = icmp slt i32 %0, %cond

We start collecting blockpc(s) at the phi instruction %cond. We first look at the first incoming block, i.e., %cond.false. Because %cond.false has single predecessor %if.then and this predecessor's terminator is a conditional instruction, we add one blockpc as blockpc 0 %0 %cmp3 0. Then we examine the predecessor of %if.then. Similarly, because %if.then has single predecessor %entry, which has a conditional terminator, we add another blockpc blockpc %0 0 %cmp 0.

Now we finish the path starting from %cond.false, and we look at %cond's second incoming block %if.then. Similarly, we add blockpc %0 1 %cmp 0.

from souper.

regehr commented on May 6, 2024

Does the optimization look correct to you Yang? If so, then LLVM must be wrong somewhere else. I'm not sure how useful bugpoint is for wrong code bugs, anyone know? Otherwise we can do something less smart like cut-and-pasting stuff from the print-after-all output into lli.

from souper.

chenyang78 commented on May 6, 2024

@jeroenk For the souper IR that you gave, we cannot have pc, because we don't utilize context-sensitive control-flow information. In other words, we don't know the fact that %cond.false and %if.then come from the same predecessor %entry with the same branch condition. IMHO, this is probably one of the limitations of our current pc/blockpc harvesting strategy.

from souper.

chenyang78 commented on May 6, 2024

@regehr, I will investigate it tonight.

from souper.

regehr commented on May 6, 2024

Thanks Yang. Full optimization log is here if that's helpful.

http://pastebin.com/kHLxjv2A

from souper.

chenyang78 commented on May 6, 2024

The second optimization looks invalid to me.

%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
blockpc %0 1 %2 0:i1
%3 = block 3
blockpc %3 1 %2 1:i1
%4:i32 = var
%5:i1 = slt 0:i32, %4
%6:i32 = zext %5
%7:i1 = slt 1:i32, %4
%8:i32 = zext %7
%9:i32 = phi %0, %6, %8
%10:i1 = slt %1, %9
blockpc %3 2 %10 0:i1
%11:i1 = eq 0:i32, %1
cand %11 0:i1

Here we have %10 == 0, which means that %1 < %9 is false, i.e., %1 >= %9. Because %9 can be either 0 or 1, %1 may equal to 0. So, there is some bug in generating blockpc-related queries. Investigating.

from souper.

chenyang78 commented on May 6, 2024

Generated KLEE expression. Pondering if it's correct.

  (Or (Eq false
          (And (Or (Eq false (And N0:(Read 0 blockpred) N0))
                   N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
               (Or N0 N1)))
      (Eq false (Eq 0 N2)))

from souper.

jeroenk commented on May 6, 2024

It's correct, but not sufficient. The problem is that the N0 expression is not being guarded by anything. Hence, if we let

(Slt (Read w32 0 a1) 1)

evaluate to true, which happens when you follow the control flow from entry -> ._crit_edge -> 14 (referring to John's dump above), then you essentially end up with

(And (Eq false N0) N0)

for the (And ...) sub-expression, which can never be satisfied. This is perfectly fine, as we know that the branch on which N0 is needed is not the one being taken. However, this latter fact is not being encoded in the formula.

What exposes the problem here is that in addition to this we have that if

(Slt (Read w32 0 a1) 1)

evaluates to false, then we have a rightful simplification (which is essentially the first simplification souper spots).

from souper.

regehr commented on May 6, 2024

I just want to add that if there is a Souper bug here, it's a really hard one to find using Csmith, coming up only once every 12 hours or so on a fast quad core. Next time we might not be so lucky, so let's be really careful about this code.

from souper.

chenyang78 commented on May 6, 2024

@jeroenk , Thanks for the explanation! I don't quite understand this part: how can we get

(And (Eq false N0) N0)

by assuming

(Slt (Read w32 0 a1) 1)

to be true. Can you elaborate on that? Thanks.

from souper.

chenyang78 commented on May 6, 2024

OK, I see the problem now. Actually the bug relates to the part where we generate souper IR. The phi node relevant to %3 = block is not generated. If we look at the second IR in John's example, we only have %3 = block and blockpc %3, but the corresponding phi instruction doesn't exist. Looks like the way that we compute the equivalence classes for blockpcs has some problem. Working on a fix.

from souper.

jeroenk commented on May 6, 2024

@chenyang78 I assume you figured it out by now, but just for completeness, we have:

(And (Or (Eq false (And N0:(Read 0 blockpred) N0))
                   N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
               (Or N0 N1)))

By (And x x) = x this is equal to:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
               (Or N0 N1)))

then assuming (Slt (Read w32 0 a1) 1) we get:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   N1:(Eq false true))
               (Or N0 N1)))

and by (Eq false true) = false:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   N1:false)
               (Or N0 N1)))

Substituting N1:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   false)
               (Or N0 false)))

and simplifying (Or x false) to x we get:

(And (Eq false N0:(Read 0 blockpred) )
               N0))

from souper.

chenyang78 commented on May 6, 2024

@jeroenk Thanks!

from souper.

regehr commented on May 6, 2024

First of 5 problems found by Csmith in a 24-hour run using Yang's issue_166 branch.

$ SOUPER_NO_SOUPER=1 /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
y
$ gcc -O2 small.c -o small1 ; ./small1
y
$ /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
x
$ cat small.c
int printf(const char *, ...);
int x0, x1, x2, x3;
int main() {
  if (x3) {
    x1 = x3;
    for (; x0 <= 0; x0++)
      ;
  }
  x2 = x1 |= x0;
  if (x2)
    printf("x\n");
  else
    printf("y\n");
  return 0;
}

from souper.

regehr commented on May 6, 2024

$ cat small.c
int printf(const char *, ...);
char x0;
int x1 = 1;
int x2() { return x0 ?: 1 % x0; }

int main() {
  if ((x1 || x2()) ^ 0) {
    if (x1 < 1) {
      ;
    } else {
      printf("x\n");
    }
  }
  return 0;
}
$ /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
$ /home/regehr/f/souper-yang/build/sclang -O0 small.c -o small1 ; ./small1
x
$ gcc -O2 small.c -o small1 ; ./small1
x

from souper.

regehr commented on May 6, 2024

Program from the previous comment is a good one since it results in only one souper optimization.

%0 = block 2
%1:i32 = var
%2:i1 = eq 0:i32, %1
blockpc %0 0 %2 1:i1
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
blockpc %0 1 %2 1:i1
%5 = block 2
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
%9:i1 = eq 0:i32, %8
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32
%12:i1 = or %10, %11
cand %12 1:i1

from souper.

regehr commented on May 6, 2024

int printf(const char *, ...);
int x0, x1, x2, x3;
volatile int x4;
void x5(x6) {
  x1 = 0;
  for (; x1 >= 0; x1--)
    if (x4) {
      x6 = x0;
      if (x6)
        x2 = 0;
    }
  x3 = x6 >= 1;
}

int main() {
  x5(0);
  printf("%x", x3);
  return 0;
}

from souper.

regehr commented on May 6, 2024

int printf(const char *, ...);
int x0, x1, x2, x3, x4, x5;
char x6;
int main() {
  x6 = x4;
  if (x6) {
    x5 = x1 == 0 ?: x4 % x1;
    if (x5)
      x4 = 0;
  } else {
    if (x0)
      x4 = 9;
    x3--;
  }
  x2 = 1 <= x4;
  printf("%x", x2);
  return 0;
}

from souper.

regehr commented on May 6, 2024

int printf(const char *, ...);
int x0, x1, x2, x3, x4, x5;
short x6;
void x7() {
  x2 = 5;
  for (; x2; x2--) {
    x6 = x4 ?: x1 > x4;
    x5 = x6 == 0 ?: x0 / x6;
    if (x5)
      break;
  }
}

int main() {
  x7();
  if (x4)
    for (; x3;)
      printf("x\n");
  return 0;
}

from souper.

chenyang78 commented on May 6, 2024

I will investigate those. Thanks, John.

from souper.

chenyang78 commented on May 6, 2024

OK, I see Souper's issue for your second program, i.e,

int printf(const char *, ...);
char x0;
int x1 = 1;
int x2() { return x0 ?: 1 % x0; }

int main() {
  if ((x1 || x2()) ^ 0) {
    if (x1 < 1) {
      ;
    } else {
      printf("x\n");
    }
  }
  return 0;
}

Here Souper wrongfully exploited the "UB" in the function x2. The program is UB-free, but Souper is not aware of the fact that x2() will never get invoked. The following simplified souper IR illustrates the problem:

%0 = block 2
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
...

blockpc %0 0 %4 1:i1 asserts that %3 is 0, and therefore %6 is 0. Consequently, we get UB for instruction %7.

from souper.

regehr commented on May 6, 2024

I believe that all of these problems occur when -souper-exploit-ub=true but please double check.

from souper.

regehr commented on May 6, 2024

Ok I double checked and this optimization does happen when UB is turned off:

$ ~/f/souper-yang/build/souper -souper-exploit-ub=false -stp-path=/usr/local/bin/stp small.bc
; Listing valid replacements.
; Using solver: STP + internal cache

; Static profile 1
; Function: main
%0 = block 2
%1:i32 = var
%2:i1 = eq 0:i32, %1
blockpc %0 0 %2 1:i1
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
blockpc %0 1 %2 1:i1
%5 = block 2
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
%9:i1 = eq 0:i32, %8
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32
%12:i1 = or %10, %11
cand %12 1:i1

from souper.

regehr commented on May 6, 2024

Here's the llvm (which looks correct):

; ModuleID = 'small.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@x1 = global i32 1, align 4
@x0 = common global i8 0, align 1
@str = private unnamed_addr constant [2 x i8] c"x\00"

; Function Attrs: nounwind readonly uwtable
define i32 @x2() #0 {
  %1 = load i8* @x0, align 1, !tbaa !1
  %2 = sext i8 %1 to i32
  %3 = icmp eq i8 %1, 0
  br i1 %3, label %4, label %6

; <label>:4                                       ; preds = %0
  %5 = srem i32 1, %2
  br label %6

; <label>:6                                       ; preds = %0, %4
  %7 = phi i32 [ %5, %4 ], [ %2, %0 ]
  ret i32 %7
}

; Function Attrs: nounwind uwtable
define i32 @main() #1 {
  %1 = load i32* @x1, align 4, !tbaa !4
  %2 = icmp eq i32 %1, 0
  br i1 %2, label %3, label %10

; <label>:3                                       ; preds = %0
  %4 = load i8* @x0, align 1, !tbaa !1
  %5 = sext i8 %4 to i32
  %6 = icmp eq i8 %4, 0
  br i1 %6, label %7, label %x2.exit

; <label>:7                                       ; preds = %3
  %8 = srem i32 1, %5
  br label %x2.exit

x2.exit:                                          ; preds = %3, %7
  %9 = phi i32 [ %8, %7 ], [ %5, %3 ]
  %phitmp = icmp eq i32 %9, 0
  br label %10

; <label>:10                                      ; preds = %0, %x2.exit
  %11 = phi i1 [ false, %0 ], [ %phitmp, %x2.exit ]
  %12 = icmp slt i32 %1, 1
  %or.cond = or i1 %11, %12
  br i1 %or.cond, label %14, label %13

; <label>:13                                      ; preds = %10
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
  br label %14

; <label>:14                                      ; preds = %10, %13
  ret i32 0
}

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #2

attributes #0 = { nounwind readonly uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.6.0 (trunk 223586)"}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
!3 = metadata !{metadata !"Simple C/C++ TBAA"}
!4 = metadata !{metadata !5, metadata !5, i64 0}
!5 = metadata !{metadata !"int", metadata !2, i64 0}

from souper.

regehr commented on May 6, 2024

Below is a partial annotation of the Souper. It looks to me like the blockpcs are telling us that x1 == 0 regardless of which branch was taken. Is that correct, Yang?

%0 = block 2
%5 = block 2

%1:i32 = var               ; x1
%2:i1 = eq 0:i32, %1       ; x1 == 0
%3:i8 = var                ; x0
%4:i1 = eq 0:i8, %3        ; x0 == 0
%6:i32 = sext %3           ; x0 promoted to int
%7:i32 = srem 1:i32, %6    ; 1 % x0

blockpc %0 0 %2 1:i1       ; x1 == 0 if pred 0
blockpc %0 1 %2 1:i1       ; x1 == 0 if pred 1
blockpc %0 0 %4 1:i1       ; x0 == 0 if pred 0
%8:i32 = phi %0, %7, %6    ;

%9:i1 = eq 0:i32, %8       ;
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32     ; x1 < 1
%12:i1 = or %10, %11       ; 
cand %12 1:i1

from souper.

jeroenk commented on May 6, 2024

Going by Yang's earlier explanation, I think this says that for %8:i32 = phi %0, %7, %6, we may assume that:

For the the first alternative of phi node: %4 == 1 and %2 == 1
For the second alternative of phi node: %2 == 1

from souper.

regehr commented on May 6, 2024

Jeroen that is my understanding as well. The problem here, of course, is that x1 != 0.

from souper.

jeroenk commented on May 6, 2024

But, %8:i32 = phi %0, %7, %6 corresponds to %9 = phi i32 [ %8, %7 ], [ %5, %3 ] in the bitcode, which is only reachable in case x1 == 0. Or am I missing something?

from souper.

regehr commented on May 6, 2024

Jeroen, I think you're right. So I'm still not sure what's going wrong here.

One thing we should consider is what the no-UB mode does when there is a divide by zero. My understanding is that division by zero is permitted in SMTLIB2 code but that the result is unspecified. Would that be contributing to the problem here?

from souper.

chenyang78 commented on May 6, 2024

Thanks, John and Jeroen. I will take a close look at it tonight.

from souper.

regehr commented on May 6, 2024

I'm pretty sure the divide by zero isn't the problem...

from souper.

rsas commented on May 6, 2024

Div by zero records UB if the second op is a constant 0. In turn, KLEEBuilder returns constant 0 as the result. If any path conditions constrain the second op to be 0, solver doesn't complain. I did the following test:

%0:i32 = var
pc %0 0:i32
%1:i32 = udiv 1:i32, %0
cand %1 1:i32

$ souper-check $SOUPER_SOLVER -souper-exploit-ub=1 divbyzero.opt 
; LGTM

$ souper-check $SOUPER_SOLVER -souper-exploit-ub=0 divbyzero.opt 
Invalid, e.g.

%0 = 0

from souper.

jeroenk commented on May 6, 2024

@regehr Bitvector division in SMTLib is a total function, so anything divided by zero will have a well-defined answer. However, the solver is not allowed to make any assumptions about the result of the function in case the divisor is 0.

For a program in which the division by zero cannot actually be executed, I would expect the SMT query that is generated to be such that the value of the division does not matter for the satisfiability of the query.

from souper.

rsas commented on May 6, 2024

Thanks @jeroenk. I ran the test above with z3. stp says in both cases LGTM.

from souper.

jeroenk commented on May 6, 2024

The KLEE expression is as follows (writing A ==> B for (Or (Eq false A) B)) and with some trivial simplification:

(And (N0:(Read 0 blockpred0) ==> (And N1:(Eq 0 N2:(Read w32 0 a1)) (Eq 0 N3:(Read w8 0 a3))))
     (Or N0 N1))

    ==>

(Or (And (Eq false (Read 0 blockpred)) (Eq 0 (Select w32 N0 (SRem w32 1 N4:(SExt w32 N3)) N4)))
    (Slt N2 1))

Here, N0 represents the blockpc, N2 represents %1 (or x1 in the original C code), and N3 represents %3 (or x0 in the original). For the optimisation to hold the expression must be valid, i.e., whatever values I pick for N0, N1, and N2, the result must be true. Of course, this is not what we want in this case.

Let's concentrate on the case that N2 != 0 (I believe that the N2 == 0 case is properly encoded by the expression). If N2 != 0, the N1 == false, and the expression simplifies to:

(And (N0:(Read 0 blockpred0) ==> false)
        N0)

    ==>

(Or (And (Eq false (Read 0 blockpred)) (Eq 0 (Select w32 N0 (SRem w32 1 N4:(SExt w32 N3)) N4)))
    (Slt N2 1))

The second half of the formula simplifies depending on N2 >= 1, or N2 < 0, but this doesn't really matter, as

(And (N0:(Read 0 blockpred0) ==> false) N0)

is always false. And, hence, the whole formula is true if N2 != 0. So, N0 affects the validity of the expression even in case N2 != 0, and this should not happen, as it's related to the execution path we didn't choose in this case! It seems that there's something missing from the generated KLEE expression, but I'm not totally sure what that is (needs more thought).

from souper.

rsas commented on May 6, 2024

I think I'm missing here something, because for me the souper optimization output looks correct. Regardless the branch of %8:i32 = phi %0, %7, %6, %1 is always constrained to 0 (blockpcs are equal for both branches). Consequently, %11 is always 1 and herewith cand %12 1:i1 is correct. In other words, the optimization is equivalent to the opt with all blockpcs replaced with a single pc %2 1:i1. What is wrong with my reasoning?

from souper.

regehr commented on May 6, 2024

Raimondas, if you can't find a bug in the Souper optimization, there must be a harvesting bug, right?

from souper.

jeroenk commented on May 6, 2024

@rsas Naive question: wouldn't constraining %1 to 0 require blockpc's that refer to %5?

from souper.

rsas commented on May 6, 2024

Yes, if others prove me wrong. I don't see a bug in the Souper output and I see a problem in the LLVM as follows:

The first branching instruction br i1 %2, label %3, label %10 can jump to label %10 if %1 != 0. I hope we all agree here.

At label 10, the execution of %or.cond = or i1 %11, %12 is constrained by blockpcs (collected through %11) which in turn constrain %1. This is not ok, because there are no constraints on %1 when coming from the initial false branch and jumping to label 10.

from souper.

rsas commented on May 6, 2024

In summary, %12 can be 0 or 1 when coming from the initial false branch. On that path, there are no constraints on %1!

from souper.

chenyang78 commented on May 6, 2024

Thanks guys. @rsas, I think you are right. Looks like it relates to issue #157. We shouldn't skip constraints on branches, which can result in both missing-optimization and mis-optimization.

from souper.

rsas commented on May 6, 2024

@jeroenk can you please rephrase the question?

from souper.

jeroenk commented on May 6, 2024

@rsas Rephrasing: I think I'm fundamentally confused about what a list of souper instructions is. From the documentation and all previous discussions I got the impression that there's some importance in the order of the instructions, but following your reasoning above, it seems that I should view as a set and not as a list. We write it as a list to conveniently express some non-circular dependencies that may occur.

from souper.

rsas commented on May 6, 2024

Correct. You always start from the bottom cand instruction, collect all constraints recursively and add them at the top level.

from souper.

chenyang78 commented on May 6, 2024

OK, I think the fundamental problem here is that we are not taking control-flow facts into account. For the given example, two blockpcs on %2 cannot be viewed as a pure "union". Thinking about a way to encode control-flow info...

from souper.

vinodgro commented on May 6, 2024

I think so too. I suspect you'll need to encode both data flow and control
flow information into blockpc.

On Mon, Feb 9, 2015 at 6:27 PM, Yang Chen [email protected] wrote:

OK, I think the fundamental problem here is that we are not taking
control-flow facts into account. For the given example, two blockpcs on %2
cannot be viewed as a pure "union". Thinking about a way to encode
control-flow info...

—
Reply to this email directly or view it on GitHub
#166 (comment).

from souper.

rsas commented on May 6, 2024

@chenyang78 I think it's time for us to rethink the design of pcs/blockpcs/UBs/Phis. They all have things in common and the code is not trivial at all.

from souper.

rsas commented on May 6, 2024

At the same time, we have to encode poison values too.

from souper.

chenyang78 commented on May 6, 2024

@rsas I agree.

from souper.

regehr commented on May 6, 2024

One of the earliest major Souper bugs was where UB constraints weren't properly filtered through phi nodes. Raimondas, you solved that -- can you please explain the flavor of the solution and which aspects of it can be used (and not used) here?

Regarding poison/undef/UB, this is hard, but Nuno implemented a good solution in Alive. I'll mail everyone our draft Alive paper where this is explained pretty carefully. We're going to need to do something similar here, but note that there are some differences between poison/undef/UB as they are explained in the LangRef and as they really are treated by LLVM. It sucks to not have a real specification.

from souper.

rsas commented on May 6, 2024

The flow of UB constraints is encoded using Phi branch predicates. We add the UBs only if the corresponding Preds are true (Pred => UB). I'm not sure if we can easily adapt this concept to blockpcs though.

from souper.

regehr commented on May 6, 2024

Do we want to add a flag for turning off blockpc until we can resolve this?

from souper.

rsas commented on May 6, 2024

I second that.

from souper.

chenyang78 commented on May 6, 2024

blockpc constraints are generated in the same way as UBs. I will add a flag to disable blockpc for now.

from souper.

regehr commented on May 6, 2024

Thanks Yang.
Yes, I feel like there can be some shared mechanisms here. Let's take the time and get this right -- in the meantime Souper works perfectly well in degraded (no UB, no blockpc) mode.

from souper.

rsas commented on May 6, 2024

@chenyang78 I just created a pull request that clears the collected BPCs so that they are neither printed nor exploited. If we disable them in KLEEBuilder, they would be still printed which might be confusing. Let me know what you think.

from souper.

chenyang78 commented on May 6, 2024

@rsas Thanks for implementing it. I was considering to disable blockpc by default. It's fine to make it on by default. Enabling blockpc by default requires fewer changes to the relevant tests.

from souper.

rsas commented on May 6, 2024

Leaving the default option true encourages us to work on the fix faster ;)

from souper.

regehr commented on May 6, 2024

I was going to say the same thing.

from souper.

chenyang78 commented on May 6, 2024

@regehr @rsas. Agreed ;)

from souper.

blockpc is unsound about souper HOT 62 OPEN

Comments (62)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent