Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WRONG code, likely JumpThreadingPass #79175

Closed
JonPsson1 opened this issue Jan 23, 2024 · 8 comments · Fixed by #79294
Closed

WRONG code, likely JumpThreadingPass #79175

JonPsson1 opened this issue Jan 23, 2024 · 8 comments · Fixed by #79294

Comments

@JonPsson1
Copy link
Contributor

wrong0.tar.gz
(reduced c test case, should print 0)

runline:
clang -O3 -march=arch13 wrong0.i -o a.out -w -mllvm -available-load-scan-limit=12

The function n() below is called two times in the reduced test case. The first time f[0] has a value of 0 at the start of the function, and a value of 1 at the end. The second time n() is called, f[0] has a value of 1 at the top, while it is then set to 0 by k().

long k(long l, int m); // returns 0 

const int *n() {
  for (; C <= 0;) {
    A = (f[0] == 0 ? 1 : A % f[0]);
    f[C] = k(A, 0);
    g = &f[0];
    f[C] = 1 > *g;
    if (f[C])
      return &e;
    break;
  }
  return 0;
}

This is the transformation of jump threading:

                                                                          >     *** IR Dump After JumpThreadingPass on n ***
                                                                          >     ; Function Attrs: nounwind
define dso_local ptr @n() local_unnamed_addr #1 {                               define dso_local ptr @n() local_unnamed_addr #1 {
entry:                                                                          entry:
  %0 = load i64, ptr @C, align 8, !tbaa !4                                        %0 = load i64, ptr @C, align 8, !tbaa !4
  %cmp = icmp slt i64 %0, 1                                                       %cmp = icmp slt i64 %0, 1
  br i1 %cmp, label %for.body, label %for.end                                     br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %entry              for.body:                                         ; preds = %entry
  %1 = load i32, ptr @f, align 4, !tbaa !8                                        %1 = load i32, ptr @f, align 4, !tbaa !8
  %cmp1 = icmp eq i32 %1, 0                                                       %cmp1 = icmp eq i32 %1, 0
  br i1 %cmp1, label %cond.end, label %cond.false                         |       br i1 %cmp1, label %cond.end.thread, label %cond.end

cond.false:                                       ; preds = %for.body     |     cond.end.thread:                                  ; preds = %for.body
                                                                          >       store i64 1, ptr @A, align 8, !tbaa !4
                                                                          >       br label %3
                                                                          >
                                                                          >     cond.end:                                         ; preds = %for.body
  %2 = load i64, ptr @A, align 8, !tbaa !4                                        %2 = load i64, ptr @A, align 8, !tbaa !4
  %conv = sext i32 %1 to i64                                                      %conv = sext i32 %1 to i64
  %rem = srem i64 %2, %conv                                                       %rem = srem i64 %2, %conv
  br label %cond.end                                                      |       store i64 %rem, ptr @A, align 8, !tbaa !4
                                                                          >       %cmp.i = icmp sgt i64 %rem, 0
                                                                          >       %cond.fr = freeze i1 %cmp.i
                                                                          >       br i1 %cond.fr, label %3, label %4
                                                                          >
                                                                          >     3:                                                ; preds = %cond.end.
                                                                          >       %.pr = load i32, ptr @f, align 4, !tbaa !8
                                                                          >       br label %4

cond.end:                                         ; preds = %for.body,    |     4:                                                ; preds = %cond.end,
  %cond = phi i64 [ %rem, %cond.false ], [ 1, %for.body ]                 |       %5 = phi i32 [ %1, %cond.end ], [ %.pr, %3 ]
  store i64 %cond, ptr @A, align 8, !tbaa !4                              |       %6 = phi i64 [ 0, %3 ], [ %rem, %cond.end ]
  %cmp.i = icmp sgt i64 %cond, 0                                          |       %conv2 = trunc i64 %6 to i32
  %cond.i = select i1 %cmp.i, i64 0, i64 %cond                            <
  %conv2 = trunc i64 %cond.i to i32                                       <
  %arrayidx = getelementptr inbounds [1 x i32], ptr @f, i64 0, i64 %0             %arrayidx = getelementptr inbounds [1 x i32], ptr @f, i64 0, i64 %0
  store i32 %conv2, ptr %arrayidx, align 4, !tbaa !8                              store i32 %conv2, ptr %arrayidx, align 4, !tbaa !8
  store ptr @f, ptr @g, align 8, !tbaa !10                                        store ptr @f, ptr @g, align 8, !tbaa !10
  %3 = load i32, ptr @f, align 4, !tbaa !8                                |       %cmp3 = icmp slt i32 %5, 1
  %cmp3 = icmp slt i32 %3, 1                                              <
  %conv4 = zext i1 %cmp3 to i32                                                   %conv4 = zext i1 %cmp3 to i32
  store i32 %conv4, ptr %arrayidx, align 4, !tbaa !8                              store i32 %conv4, ptr %arrayidx, align 4, !tbaa !8
  br i1 %cmp3, label %return, label %for.end                                      br i1 %cmp3, label %return, label %for.end

for.end:                                          ; preds = %cond.end,    |     for.end:                                          ; preds = %4, %entry
  br label %return                                                                br label %return

return:                                           ; preds = %cond.end,    |     return:                                           ; preds = %4, %for.e
  %retval.0 = phi ptr [ null, %for.end ], [ @e, %cond.end ]               |       %retval.0 = phi ptr [ null, %for.end ], [ @e, %4 ]
  ret ptr %retval.0                                                               ret ptr %retval.0
}                                                                               }
~

Before (left), @f is reloaded for the final icmp in cond.end (%3), which is necessary as the store to %conv2 just above goes to the same address. However, to the right JumpThreading has changed the final icmp to use the %5 value, which does not reflect the stored value of %conv2. This seems wrong and maybe JT has missed the fact that %arrayidx aliases @f?

@jmorse @MatzeB @nikic

@jmorse
Copy link
Member

jmorse commented Jan 23, 2024

Here's a godbolt link seemingly showing the same thing: https://godbolt.org/z/dMPaP7161 . I can't replicate this at 6a7abea with just opt --passes=jump-threading, but can if it's part of clang and inspected with print-after-all.

@jmorse
Copy link
Member

jmorse commented Jan 23, 2024

Ah -- because it's sensitive to the -available-load-scan-limit=12, which defaults to six and needs to be at least ten for this problem to appear. Stepping through bits of opt, the AA trace believes something that's wrong:

Start   %arrayidx = getelementptr inbounds [1 x i32], ptr @f, i64 0, i64 %0 @ LocationSize::precise(4), @f = dso_local global [1 x i32] zeroinitializer, align 4 @ LocationSize::precise(4)
  Start @f = dso_local global [1 x i32] zeroinitializer, align 4 @ LocationSize::beforeOrAfterPointer, @f = dso_local global [1 x i32] zeroinitializer, align 4 @ LocationSize::beforeOrAfterPointer
  End @f = dso_local global [1 x i32] zeroinitializer, align 4 @ LocationSize::beforeOrAfterPointer, @f = dso_local global [1 x i32] zeroinitializer, align 4 @ LocationSize::beforeOrAfterPointer = MustAlias
End   %arrayidx = getelementptr inbounds [1 x i32], ptr @f, i64 0, i64 %0 @ LocationSize::precise(4), @f = dso_local global [1 x i32] zeroinitializer, align 4 @ LocationSize::precise(4) = NoAlias

Which I think is saying that the GEP is believed to not-alias the "f" variable, wheras it might alias depending on the value of %0. Stepping through other things during the alias query, I see isKnownNonNullFromDominatingCondition returning true for %0, i.e. the load from "C". I would imagine this means there's something wrong with the dominator tree that makes this code think it's only executing on a path where %0 is always-zero. (EDIT: actually it means always-non-zero).

(However this isn't my normal wheelhouse, so that might not be true!).

@nikic
Copy link
Contributor

nikic commented Jan 23, 2024

That sounds about right to me. JumpThreading performs lazy DT updates, so it's not legal to use DT during the transform.

BasicAA in principle already supports working without DT, but it may be a bit tricky to avoid the DT use just in JumpThreading, given how this is all wired up in the pass manager.

@nikic
Copy link
Contributor

nikic commented Jan 24, 2024

Somewhat reduced test case:

; RUN: opt -S -passes=jump-threading < %s
@f = external global i32

define void @test(i64 %idx, i32 %val) {
entry:
  %cmp = icmp slt i64 %idx, 1
  br i1 %cmp, label %for.body, label %return

for.body:
  %f = load i32, ptr @f, align 4
  %cmp1 = icmp eq i32 %f, 0
  br i1 %cmp1, label %cond.end, label %cond.false

cond.false:
  br label %cond.end

cond.end:
  %phi = phi i32 [ %val, %cond.false ], [ 1, %for.body ]
  %cmp.i = icmp sgt i32 %phi, 0
  %sel = select i1 %cmp.i, i32 0, i32 %phi
  %f.idx = getelementptr inbounds i32, ptr @f, i64 %idx
  store i32 %sel, ptr %f.idx, align 4
  %f.reload = load i32, ptr @f, align 4
  %cmp3 = icmp slt i32 %f.reload, 1
  br i1 %cmp3, label %return, label %return

return:
  ret void
}

@nikic
Copy link
Contributor

nikic commented Feb 1, 2024

/cherry-pick 89dae79 7143b45 4f32f5d

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 1, 2024
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 1, 2024
…9294)

JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes llvm#79175.

(cherry picked from commit 4f32f5d)
@llvmbot
Copy link
Member

llvmbot commented Feb 1, 2024

/pull-request #80274

@nikic nikic moved this from Needs Triage to Needs Review in LLVM Release Status Feb 1, 2024
@EugeneZelenko
Copy link
Contributor

Not merged yet.

@EugeneZelenko EugeneZelenko reopened this Feb 2, 2024
@tstellar
Copy link
Collaborator

tstellar commented Feb 3, 2024

PR has been created, we will track the status there.

@tstellar tstellar closed this as completed Feb 3, 2024
@tstellar tstellar moved this from Needs Review to Needs Merge in LLVM Release Status Feb 3, 2024
@tstellar tstellar moved this from Needs Merge to Done in LLVM Release Status Feb 3, 2024
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 5, 2024
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 5, 2024
…9294)

JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes llvm#79175.

(cherry picked from commit 4f32f5d)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
…9294)

JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes llvm#79175.

(cherry picked from commit 4f32f5d)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
…9294)

JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes llvm#79175.

(cherry picked from commit 4f32f5d)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
…9294)

JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes llvm#79175.

(cherry picked from commit 4f32f5d)
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
tstellar pushed a commit to tstellar/llvm-project that referenced this issue Feb 14, 2024
…9294)

JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes llvm#79175.

(cherry picked from commit 4f32f5d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment