[LSR] Fix matching vscale immediates #100080

MacDue · 2024-07-23T08:27:37Z

Somewhat confusingly, a `SCEVMulExpr` is a `SCEVNAryExpr`, so it can have more than two operands. Previously, the vscale immediate matching did not check the number of operands of the `SCEVMulExpr`, so it would ignore any operands after the first two.
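
To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the matching logic before and after the fix. It does not use LLVM. `OpKind`, `Operand`, `MulExpr`, `matchVScaleImmBuggy`, and `matchVScaleImmFixed` are hypothetical stand-ins for `SCEVMulExpr` and the check in `ExtractImmediate`, assuming every operand is either a constant or `vscale`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of an n-ary SCEV multiply (NOT the LLVM API).
enum class OpKind { Constant, VScale };

struct Operand {
  OpKind kind;
  std::int64_t value; // only meaningful when kind == OpKind::Constant
};

struct MulExpr {
  Operand ops[4];     // like SCEVMulExpr, may hold more than two operands
  std::size_t numOps; // number of valid entries in ops (always >= 2)
};

// True if the first two operands look like `C * vscale`.
constexpr bool firstTwoAreConstTimesVScale(MulExpr m) {
  return m.ops[0].kind == OpKind::Constant && m.ops[1].kind == OpKind::VScale;
}

// Pre-fix behaviour: only operands 0 and 1 are inspected, so
// `16 * vscale * vscale` is wrongly matched as the immediate `16 * vscale`.
constexpr std::optional<std::int64_t> matchVScaleImmBuggy(MulExpr m) {
  if (firstTwoAreConstTimesVScale(m))
    return m.ops[0].value;
  return std::nullopt;
}

// Post-fix behaviour: additionally require exactly two operands.
constexpr std::optional<std::int64_t> matchVScaleImmFixed(MulExpr m) {
  if (m.numOps == 2 && firstTwoAreConstTimesVScale(m))
    return m.ops[0].value;
  return std::nullopt;
}
```

With three operands (`16 * vscale * vscale`), the pre-fix matcher still reports a scalable immediate of 16, silently dropping the trailing `vscale`, while the fixed matcher declines the match.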

This led to incorrect codegen (and results) for ArmSME in IREE (https://github.com/iree-org/iree), which sometimes addresses things that are a multiple of `vscale * vscale` apart. The test added with this change shows an example reduced from IREE. The second write should be offset from the first by `16 * vscale * vscale` floats (4 bytes each); however, LSR previously dropped the second vscale and instead offset the write by `#4, mul vl`, which is an offset of only `16 * vscale` floats (4 bytes each).
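
The offsets above can be checked with a little arithmetic. This sketch assumes, as the description does, 4-byte elements and a vector length of `16 * vscale` bytes, so `#imm, mul vl` scales `imm` by `16 * vscale` bytes. The constant and function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: one SVE vector is 16 bytes per unit of vscale (128-bit granules).
constexpr std::int64_t kVectorBytesPerVScale = 16;
// Assumption: the stored elements are 4-byte floats.
constexpr std::int64_t kFloatBytes = 4;

// Intended byte offset of the second write: 16 * vscale * vscale floats.
constexpr std::int64_t expectedOffsetBytes(std::int64_t vscale) {
  return 16 * vscale * vscale * kFloatBytes;
}

// Byte offset encoded by the addressing mode `#imm, mul vl`.
constexpr std::int64_t mulVlOffsetBytes(std::int64_t imm, std::int64_t vscale) {
  return imm * kVectorBytesPerVScale * vscale;
}
```

The two offsets coincide only at vscale = 1 (128-bit vectors); for any larger vscale, `#4, mul vl` falls short of the intended offset by a factor of vscale.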

llvmbot · 2024-07-23T08:28:09Z

@llvm/pr-subscribers-llvm-transforms

Author: Benjamin Maxwell (MacDue)


Full diff: https://github.com/llvm/llvm-project/pull/100080.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp (+2-1)
(modified) llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll (+51)

diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index 11f9f7822a15c..3ba56a1a3af9d 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -947,7 +947,8 @@ static Immediate ExtractImmediate(const SCEV *&S, ScalarEvolution &SE) {
                            SCEV::FlagAnyWrap);
     return Result;
   } else if (EnableVScaleImmediates)
-    if (const SCEVMulExpr *M = dyn_cast<SCEVMulExpr>(S))
+    if (const SCEVMulExpr *M = dyn_cast<SCEVMulExpr>(S);
+        M && M->getNumOperands() == 2)
       if (const SCEVConstant *C = dyn_cast<SCEVConstant>(M->getOperand(0)))
         if (isa<SCEVVScale>(M->getOperand(1))) {
           S = SE.getConstant(M->getType(), 0);
diff --git a/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll b/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll
index 483955c1c57a0..588696d20227f 100644
--- a/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll
+++ b/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll
@@ -384,4 +384,55 @@ for.exit:
   ret void
 }
 
+;; Here are two writes that should be `16 * vscale * vscale` apart, so MUL VL
+;; addressing cannot be used to offset the second write, as for example,
+;; `#4, mul vl` would only be an offset of `16 * vscale` (dropping a vscale).
+define void @vscale_squared_offset(ptr %alloc) #0 {
+; COMMON-LABEL: vscale_squared_offset:
+; COMMON:       // %bb.0: // %entry
+; COMMON-NEXT:    rdvl x9, #1
+; COMMON-NEXT:    fmov z0.s, #4.00000000
+; COMMON-NEXT:    mov x8, xzr
+; COMMON-NEXT:    lsr x9, x9, #4
+; COMMON-NEXT:    fmov z1.s, #8.00000000
+; COMMON-NEXT:    cntw x10
+; COMMON-NEXT:    ptrue p0.s, vl1
+; COMMON-NEXT:    umull x9, w9, w9
+; COMMON-NEXT:    lsl x9, x9, #6
+; COMMON-NEXT:    cmp x8, x10
+; COMMON-NEXT:    b.ge .LBB6_2
+; COMMON-NEXT:  .LBB6_1: // %for.body
+; COMMON-NEXT:    // =>This Inner Loop Header: Depth=1
+; COMMON-NEXT:    add x11, x0, x9
+; COMMON-NEXT:    st1w { z0.s }, p0, [x0]
+; COMMON-NEXT:    add x8, x8, #1
+; COMMON-NEXT:    st1w { z1.s }, p0, [x11]
+; COMMON-NEXT:    addvl x0, x0, #1
+; COMMON-NEXT:    cmp x8, x10
+; COMMON-NEXT:    b.lt .LBB6_1
+; COMMON-NEXT:  .LBB6_2: // %for.exit
+; COMMON-NEXT:    ret
+entry:
+  %vscale = call i64 @llvm.vscale.i64()
+  %c4_vscale = mul i64 %vscale, 4
+  br label %for.check
+for.check:
+  %i = phi i64 [ %next_i, %for.body ], [ 0, %entry ]
+  %is_lt = icmp slt i64 %i, %c4_vscale
+  br i1 %is_lt, label %for.body, label %for.exit
+for.body:
+  %mask = call <vscale x 4 x i1> @llvm.aarch64.sve.whilelt.nxv4i1.i64(i64 0, i64 1)
+  %upper_offset = mul i64 %i, %c4_vscale
+  %upper_ptr = getelementptr float, ptr %alloc, i64 %upper_offset
+  call void @llvm.masked.store.nxv4f32.p0(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 4.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), ptr %upper_ptr, i32 4, <vscale x 4 x i1> %mask)
+  %lower_i = add i64 %i, %c4_vscale
+  %lower_offset = mul i64 %lower_i, %c4_vscale
+  %lower_ptr = getelementptr float, ptr %alloc, i64 %lower_offset
+  call void @llvm.masked.store.nxv4f32.p0(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 8.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), ptr %lower_ptr, i32 4, <vscale x 4 x i1> %mask)
+  %next_i = add i64 %i, 1
+  br label %for.check
+for.exit:
+  ret void
+}
+
 attributes #0 = { "target-features"="+sve2" vscale_range(1,16) }
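
In the new CHECK lines, the second store's address is `x0 + x9`. The sketch below models that register arithmetic and compares it with the element distance implied by the IR (`%lower_offset - %upper_offset`). It assumes `rdvl x9, #1` yields the vector length in bytes (`16 * vscale`); `checkedOffsetBytes` and `irElementDistance` are hypothetical names.

```cpp
#include <cstdint>

// Models the x9 computation in the CHECK lines of @vscale_squared_offset.
constexpr std::uint64_t checkedOffsetBytes(std::uint64_t vscale) {
  std::uint64_t x9 = 16 * vscale;                // rdvl x9, #1      -> 16 * vscale
  x9 = x9 >> 4;                                  // lsr x9, x9, #4   -> vscale
  const std::uint32_t w9 = static_cast<std::uint32_t>(x9);
  x9 = static_cast<std::uint64_t>(w9) * w9;      // umull x9, w9, w9 -> vscale^2
  x9 = x9 << 6;                                  // lsl x9, x9, #6   -> 64 * vscale^2
  return x9;
}

// Element distance between the two stores in the IR:
// (i + 4*vscale) * 4*vscale - i * 4*vscale == 16 * vscale^2.
constexpr std::int64_t irElementDistance(std::int64_t i, std::int64_t vscale) {
  const std::int64_t c4_vscale = 4 * vscale;
  const std::int64_t upper_offset = i * c4_vscale;
  const std::int64_t lower_offset = (i + c4_vscale) * c4_vscale;
  return lower_offset - upper_offset;
}
```

For every vscale in the test's `vscale_range(1,16)`, the byte offset in `x9` equals the IR element distance times 4, i.e. `16 * vscale * vscale` floats, rather than the `16 * vscale` floats that `#4, mul vl` would give.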

MacDue · 2024-07-23T08:28:20Z

Note: The precommit ("Precommit vscale-fixups.ll test (NFC)") shows the current (incorrect) codegen.

huntergr-arm

LGTM.

nikic · 2024-07-24T12:42:51Z

/cherry-pick c1b70fa 7fad04e

llvmbot · 2024-07-24T12:47:40Z

/pull-request #100359

MacDue requested review from paulwalker-arm and huntergr-arm July 23, 2024 08:27

llvmbot added the llvm:transforms label Jul 23, 2024

MacDue requested a review from banach-space July 23, 2024 08:28

MacDue mentioned this pull request Jul 23, 2024

[Codegen] Add vector transfer + slice foldings in GenericVectorization iree-org/iree#17613

Merged

huntergr-arm approved these changes Jul 23, 2024

MacDue added a commit that referenced this pull request Jul 23, 2024

Precommit vscale-fixups.ll test (NFC)

c1b70fa

Precommit test for #100080.

MacDue force-pushed the addressing_fix branch from f8886ff to 0fe5354 July 23, 2024 10:06

paulwalker-arm reviewed Jul 23, 2024

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp (outdated, resolved)

sparker-arm pushed a commit to sparker-arm/llvm-project that referenced this pull request Jul 23, 2024

Precommit vscale-fixups.ll test (NFC)

7c56cce

Precommit test for llvm#100080.

Review fixups

69bb025

sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024

Precommit vscale-fixups.ll test (NFC)

37da742

Precommit test for llvm#100080.

paulwalker-arm approved these changes Jul 23, 2024

MacDue merged commit 7fad04e into llvm:main Jul 24, 2024
7 checks passed

MacDue deleted the addressing_fix branch July 24, 2024 09:06

nikic added this to the LLVM 19.X Release milestone Jul 24, 2024

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Jul 24, 2024

Precommit vscale-fixups.ll test (NFC)

46b7fc8

Precommit test for llvm#100080. (cherry picked from commit c1b70fa)

tru pushed a commit to llvmbot/llvm-project that referenced this pull request Jul 24, 2024

Precommit vscale-fixups.ll test (NFC)

0934f6d

Precommit test for llvm#100080. (cherry picked from commit c1b70fa)

yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024

Precommit vscale-fixups.ll test (NFC)

de357ca

Summary: Precommit test for #100080. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251062

philnik777 mentioned this pull request Aug 2, 2024

[Clang] Add a release note deprecating __is_nullptr #101638

Merged

mgabka mentioned this pull request Aug 22, 2024

Add release note about ABI mgabka/llvm-project#6

Open
