Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IndVars][SCEVExpander] Incorrect reuse of disjoint or instructions #79861

Closed
zhendongsu opened this issue Jan 29, 2024 · 15 comments · Fixed by #80458
Closed

[IndVars][SCEVExpander] Incorrect reuse of disjoint or instructions #79861

zhendongsu opened this issue Jan 29, 2024 · 15 comments · Fixed by #80458

Comments

@zhendongsu
Copy link

It appears to be a recent regression as it doesn't reproduce with 17.0.1.

Compiler Explorer: https://godbolt.org/z/W5WEb7Eev

[642] % clangtk -v
clang version 19.0.0git (https://github.com/llvm/llvm-project.git febb4c42b192ed7c88c17f91cb903a59acf20baf)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /local/suz-local/opfuzz/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/10
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
[643] % 
[643] % clangtk -O2 small.c; ./a.out
[644] % 
[644] % clangtk -O3 small.c
[645] % ./a.out
Aborted
[646] % 
[646] % cat small.c
short a, e;
int b[2][5] = {{0, 0, 3, 0, 0}, {0, 0, 0, 0, 0}}, c, d, *f, *g;
short h(short j) { return j ? a % j : 0; }
void k() {
  int **l = &f;
  for (int i = 0; i < 2; i++)
    g = &c;
  d = 2;
  for (; d; d--) {
    *l = g;
    **l = 0;
    for (e = 0; e < 2; e++) {
      h(d);
      b[e][d + 2] = 0;
      if (d)
        *l = 0;
    }
  }
}
int main() {
  k();
  if (b[0][2] != 3)
    __builtin_abort();
  return 0;
}
@dcb314
Copy link

dcb314 commented Jan 30, 2024

It seems to go wrong sometime before git hash bc82cfb,
from about 21 January.

@antoniofrighetto
Copy link
Contributor

I think the idiosyncrasy here stems from IndVarSimplifyPass / SCEV. In O2, the IR of function k before IndVarSimplifyPass is:

for.body2.i:                                      ; preds = %for.body2.i, %entry
  %0 = phi i32 [ 2, %entry ], [ %dec.i, %for.body2.i ]
  %1 = and i32 %0, 65535
  %tobool.not.i.i = icmp eq i32 %1, 0
  %add21.i = add nsw i32 %0, 2
  %add.i = or disjoint i32 %0, 2
  %idxprom8.idxprom822.v.i = select i1 %tobool.not.i.i, i32 %add.i, i32 %add21.i
  %idxprom8.idxprom822.i = sext i32 %idxprom8.idxprom822.v.i to i64
  ; memory accesses omitted
  %dec.i = add nsw i32 %0, -1
  %tobool.not.i = icmp eq i32 %dec.i, 0
  br i1 %tobool.not.i, label %k.exit, label %for.body2.i

Conversely, in O3 the IR looks:

for.body2.i:                                      ; preds = %for.body2.i, %entry
  %0 = phi i32 [ 2, %entry ], [ %dec.i, %for.body2.i ]
  %1 = and i32 %0, 65535
  %tobool.not.i.i = icmp eq i32 %1, 0
  %add.i = or disjoint i32 %0, 2
  %add21.i = add nsw i32 %0, 2
  %idxprom822.sink33.v.i = select i1 %tobool.not.i.i, i32 %add.i, i32 %add21.i
  %idxprom822.sink33.i = sext i32 %idxprom822.sink33.v.i to i64
  ; memory accesses omitted
  %dec.i = add nsw i32 %0, -1
  %tobool.not.i = icmp eq i32 %dec.i, 0
  br i1 %tobool.not.i, label %k.exit, label %for.body2.i

InstCombine arranges the add and the or disjoint differently here, which looks valid to me. However, after IndVarSimplify the two array indexes lead to two different memory accesses:
O2:

for.body2.i:                                      ; preds = %for.body2.i, %entry
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body2.i ], [ 2, %entry ]
  %0 = add nuw nsw i64 %indvars.iv, 2
  %arrayidx923.i = getelementptr inbounds [2 x [5 x i32]], ptr @b, i64 0, i64 0, i64 %0
  %arrayidx923.1.i = getelementptr inbounds [2 x [5 x i32]], ptr @b, i64 0, i64 1, i64 %0

O3:

for.body2.i:                                      ; preds = %for.body2.i, %entry
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body2.i ], [ 2, %entry ]
  %0 = or disjoint i64 %indvars.iv, 2
  %arrayidx923.i = getelementptr inbounds [2 x [5 x i32]], ptr @b, i64 0, i64 0, i64 %0
  %arrayidx923.1.i = getelementptr inbounds [2 x [5 x i32]], ptr @b, i64 0, i64 1, i64 %0

I can take a look, maybe @nikic might confirm this though (Alive2 says source functions are UB: https://alive2.llvm.org/ce/z/vK-zUe).

@nikic
Copy link
Contributor

nikic commented Jan 30, 2024

@antoniofrighetto Just from your description, I suspect this may be related to SCEVExpander value reuse. SCEV will add a mapping from (%indvars.iv + 2) to or disjoint and then SCEVExpander could use it to expand the addition.

@dcb314
Copy link

dcb314 commented Jan 30, 2024

I suspect releases are on a different branch to the mainstream, but the date
of the 17.0.1 release is 2023-09-09, probably about the time of the commit 98e6deb.

With my previous comment, that gives me a git range, so I will have a look at half-way,
about date 2023-11-13 commit 638a839 and see if that is broken or not.

@dcb314
Copy link

dcb314 commented Jan 30, 2024

Date 2023-11-13 seems good, trying 2023-12-13, commit 2c5fe14.

@dcb314
Copy link

dcb314 commented Jan 30, 2024

2023-12-13 tested bad, so trying 2023-11-29 commit 771e9cd

@dcb314
Copy link

dcb314 commented Jan 30, 2024

Date 2023-11-29 looks good, so trying date 2023-12-06 commit 546a9ce.

@dcb314
Copy link

dcb314 commented Jan 30, 2024

2023-12-06 looks bad, so current range of git hashes is 771e9cd .. 546a9ce,
which is 803 commits.

@nikic
Copy link
Contributor

nikic commented Feb 1, 2024

Reduced test case:

declare void @use(i64)

define void @test() {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 2, %entry ], [ %iv.dec, %loop ]
  %iv.zero = icmp eq i64 %iv, 0
  %or = or disjoint i64 %iv, 1
  %add = add nsw i64 %iv, 1
  %sel = select i1 %iv.zero, i64 %or, i64 %add
  call void @use(i64 %sel)

  %iv.dec = add nsw i64 %iv, -1
  %exit.cond = icmp eq i64 %iv.dec, 0
  br i1 %exit.cond, label %exit, label %loop

exit:
  ret void
}

@nikic nikic self-assigned this Feb 1, 2024
@nikic
Copy link
Contributor

nikic commented Feb 1, 2024

Okay, the guess about SCEVExpander wasn't right (though I suspect there may still be an issue there). This particular problem is introduced by eliminateIdentitySCEV().

It replaces an instruction with an instruction operand if they have the same SCEV -- but this may not be poison-safe.

Here's another variant that doesn't use the new or disjoint flag:

define void @test2(i64 %n) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.inc, %loop ]
  %add1 = add nuw nsw i64 %iv, 123
  %add2 = add i64 %iv, 123
  %sel = select i1 false, i64 %add1, i64 %add2
  call void @use(i64 %sel)

  %iv.inc = add i64 %iv, 1
  %exit.cond = icmp eq i64 %iv.inc, %n
  br i1 %exit.cond, label %exit, label %loop

exit:
  ret void
}

This will pick the add instruction with the nowrap flags, which may not be correct.

@nikic
Copy link
Contributor

nikic commented Feb 1, 2024

Here's a variant that demonstrates the SCEVExpander issue:

target datalayout = "n64"

declare void @use(i64)

define void @test3(i64 %n) {
entry:
  %or = or disjoint i64 %n, 1
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.inc, %loop ]
  %iv.inc = add i64 %iv, 1
  %add = add i64 %iv, %or
  call void @use(i64 %add)
  %cmp = icmp ult i64 %iv, %n
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}

This will rewrite the condition to compare with %or, while it should be n + 1, which are not the same if @use is not UB for poison values.

@nikic
Copy link
Contributor

nikic commented Feb 1, 2024

I've put up #80281 for the SCEVExpander part of this. I think the IndVars side fix will involve reusing the same legality analysis SCEVExpander performs, so it's good to fix that first.

nikic added a commit to nikic/llvm-project that referenced this issue Feb 1, 2024
SCEV treats "or disjoint" the same as "add nsw nuw". However,
when expanding, we cannot generally replace an add SCEV node with
an "or disjoint" instruction. Just dropping the poison flag is
insufficient in this case, we would have to actually convert the
or into an add.

This is a partial fix for llvm#79861.
@nikic nikic changed the title wrong code at -O3 on x86_64-linux-gnu [IndVars][SCEVExpander] Incorrect reuse of disjoint or instructions Feb 1, 2024
@EugeneZelenko EugeneZelenko added llvm:SCEV Scalar Evolution and removed llvm:optimizations labels Feb 1, 2024
smithp35 pushed a commit to smithp35/llvm-project that referenced this issue Feb 1, 2024
carlosgalvezp pushed a commit to carlosgalvezp/llvm-project that referenced this issue Feb 1, 2024
nikic added a commit that referenced this issue Feb 2, 2024
SCEV treats "or disjoint" the same as "add nsw nuw". However, when
expanding, we cannot generally replace an add SCEV node with an "or
disjoint" instruction. Just dropping the poison flag is insufficient in
this case, we would have to actually convert the or into an add.

This is a partial fix for #79861.
nikic added a commit to nikic/llvm-project that referenced this issue Feb 2, 2024
IndVars may replace an instruction with one of its operands, if
they have the same SCEV expression. However, such a replacement
may be more poisonous.

First, check whether the operand being poison implies that the
instruction is also poison, in which case the replacement is
always safe. If this fails, check whether SCEV can determine that
reusing the instruction is safe, using the same check as
SCEVExpander.

Fixes llvm#79861.
nikic added a commit that referenced this issue Feb 5, 2024
IndVars may replace an instruction with one of its operands, if they
have the same SCEV expression. However, such a replacement may be more
poisonous.

First, check whether the operand being poison implies that the
instruction is also poison, in which case the replacement is always
safe. If this fails, check whether SCEV can determine that reusing the
instruction is safe, using the same check as SCEVExpander.

Fixes #79861.
@EugeneZelenko EugeneZelenko added llvm:transforms and removed llvm:SCEV Scalar Evolution labels Feb 5, 2024
agozillon pushed a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024
agozillon pushed a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024
SCEV treats "or disjoint" the same as "add nsw nuw". However, when
expanding, we cannot generally replace an add SCEV node with an "or
disjoint" instruction. Just dropping the poison flag is insufficient in
this case, we would have to actually convert the or into an add.

This is a partial fix for llvm#79861.
agozillon pushed a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024
)

IndVars may replace an instruction with one of its operands, if they
have the same SCEV expression. However, such a replacement may be more
poisonous.

First, check whether the operand being poison implies that the
instruction is also poison, in which case the replacement is always
safe. If this fails, check whether SCEV can determine that reusing the
instruction is safe, using the same check as SCEVExpander.

Fixes llvm#79861.
@nikic nikic added this to the LLVM 18.X Release milestone Feb 6, 2024
@nikic
Copy link
Contributor

nikic commented Feb 6, 2024

/cherry-pick c105848 5b8e1a6 7d2b6f0

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 6, 2024
(cherry picked from commit c105848)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 6, 2024
SCEV treats "or disjoint" the same as "add nsw nuw". However, when
expanding, we cannot generally replace an add SCEV node with an "or
disjoint" instruction. Just dropping the poison flag is insufficient in
this case, we would have to actually convert the or into an add.

This is a partial fix for llvm#79861.

(cherry picked from commit 5b8e1a6)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 6, 2024
)

IndVars may replace an instruction with one of its operands, if they
have the same SCEV expression. However, such a replacement may be more
poisonous.

First, check whether the operand being poison implies that the
instruction is also poison, in which case the replacement is always
safe. If this fails, check whether SCEV can determine that reusing the
instruction is safe, using the same check as SCEVExpander.

Fixes llvm#79861.

(cherry picked from commit 7d2b6f0)
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 6, 2024

/pull-request #80832

@nikic
Copy link
Contributor

nikic commented Feb 16, 2024

/cherry-pick c105848 5b8e1a6 43dd1e8 7d2b6f0

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 16, 2024
(cherry picked from commit c105848)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 16, 2024
SCEV treats "or disjoint" the same as "add nsw nuw". However, when
expanding, we cannot generally replace an add SCEV node with an "or
disjoint" instruction. Just dropping the poison flag is insufficient in
this case, we would have to actually convert the or into an add.

This is a partial fix for llvm#79861.

(cherry picked from commit 5b8e1a6)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 16, 2024
)

IndVars may replace an instruction with one of its operands, if they
have the same SCEV expression. However, such a replacement may be more
poisonous.

First, check whether the operand being poison implies that the
instruction is also poison, in which case the replacement is always
safe. If this fails, check whether SCEV can determine that reusing the
instruction is safe, using the same check as SCEVExpander.

Fixes llvm#79861.

(cherry picked from commit 7d2b6f0)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 21, 2024
(cherry picked from commit c105848)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 21, 2024
SCEV treats "or disjoint" the same as "add nsw nuw". However, when
expanding, we cannot generally replace an add SCEV node with an "or
disjoint" instruction. Just dropping the poison flag is insufficient in
this case, we would have to actually convert the or into an add.

This is a partial fix for llvm#79861.

(cherry picked from commit 5b8e1a6)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Feb 21, 2024
)

IndVars may replace an instruction with one of its operands, if they
have the same SCEV expression. However, such a replacement may be more
poisonous.

First, check whether the operand being poison implies that the
instruction is also poison, in which case the replacement is always
safe. If this fails, check whether SCEV can determine that reusing the
instruction is safe, using the same check as SCEVExpander.

Fixes llvm#79861.

(cherry picked from commit 7d2b6f0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment