[RISCV][InsertVSETVLI] Eliminate the AVLIsIgnored state #94658

Merged
preames merged 2 commits into llvm:main from pr-insertvsetvli-kill-avl-ignored on Jun 17, 2024

Conversation

preames (Collaborator) commented Jun 6, 2024

As noted in one of the existing comments, the job that AVLIsIgnored was filling was really more of a demanded-fields role. Since we recently realized we can use the value of the VL operand on the MI even in the backwards pass, let's exploit that to improve demanded fields and delete AVLIsIgnored.

Note that the test change is a real regression, but only incidental to this patch. The backwards pass doesn't have the information that the VL following a VL-preserving vtype is non-zero. This is an existing problem; this patch just adds a few more cases where we prove VL-preserving is legal.
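To make the demanded-fields rule concrete, here is a minimal self-contained C++ sketch of the idea (toy types only; the names below are illustrative assumptions, not the LLVM API): an instruction whose VL operand is an undef register does not actually demand VL, so the pass is free to pick an arbitrary AVL - the patch uses an immediate of 1 - when it has to materialize a vsetvli for it.

```cpp
#include <iostream>

// Toy model only: these stand in for LLVM's MachineOperand and DemandedFields.
struct ToyVLOperand {
  bool IsReg;
  bool IsUndef; // an undef register operand means "any VL value is fine"
};

struct ToyDemandedFields {
  bool VL = false;
  bool VTYPE = false;
};

// Mirrors the rule from the patch: for an instruction with an SEW operand,
// always demand VTYPE, but only demand VL when the VL operand actually
// carries a meaningful value.
ToyDemandedFields getDemanded(bool HasVLOp, const ToyVLOperand &VLOp) {
  ToyDemandedFields Res;
  Res.VTYPE = true;
  if (HasVLOp && (!VLOp.IsReg || !VLOp.IsUndef))
    Res.VL = true;
  return Res;
}

int main() {
  // Scalar-extract style instruction with an undef VL operand: VL is not
  // demanded, so any AVL (e.g. the immediate 1 the patch picks) is legal.
  ToyVLOperand Undef{/*IsReg=*/true, /*IsUndef=*/true};
  std::cout << getDemanded(true, Undef).VL << '\n'; // prints 0

  // Ordinary instruction with a live VL register operand: VL is demanded.
  ToyVLOperand Live{/*IsReg=*/true, /*IsUndef=*/false};
  std::cout << getDemanded(true, Live).VL << '\n'; // prints 1
}
```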

llvmbot (Member) commented Jun 6, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes


Patch is 82.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94658.diff

9 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp (+10-47)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-fp.ll (+54-36)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-int.ll (+36-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-llrint.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-lrint.ll (+1-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+143-100)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.mir (+1-1)
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index a96768240a933..4550923bceab8 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -397,7 +397,9 @@ DemandedFields getDemanded(const MachineInstr &MI, const RISCVSubtarget *ST) {
   if (RISCVII::hasSEWOp(TSFlags)) {
     Res.demandVTYPE();
     if (RISCVII::hasVLOp(TSFlags))
-      Res.demandVL();
+      if (const MachineOperand &VLOp = MI.getOperand(getVLOpNum(MI));
+          !VLOp.isReg() || !VLOp.isUndef())
+        Res.demandVL();
 
     // Behavior is independent of mask policy.
     if (!RISCVII::usesMaskPolicy(TSFlags))
@@ -517,7 +519,6 @@ class VSETVLIInfo {
     AVLIsReg,
     AVLIsImm,
     AVLIsVLMAX,
-    AVLIsIgnored,
     Unknown,
   } State = Uninitialized;
 
@@ -557,12 +558,9 @@ class VSETVLIInfo {
 
   void setAVLVLMAX() { State = AVLIsVLMAX; }
 
-  void setAVLIgnored() { State = AVLIsIgnored; }
-
   bool hasAVLImm() const { return State == AVLIsImm; }
   bool hasAVLReg() const { return State == AVLIsReg; }
   bool hasAVLVLMAX() const { return State == AVLIsVLMAX; }
-  bool hasAVLIgnored() const { return State == AVLIsIgnored; }
   Register getAVLReg() const {
     assert(hasAVLReg() && AVLRegDef.DefReg.isVirtual());
     return AVLRegDef.DefReg;
@@ -593,8 +591,6 @@ class VSETVLIInfo {
       setAVLRegDef(Info.getAVLVNInfo(), Info.getAVLReg());
     else if (Info.hasAVLVLMAX())
       setAVLVLMAX();
-    else if (Info.hasAVLIgnored())
-      setAVLIgnored();
     else {
       assert(Info.hasAVLImm());
       setAVLImm(Info.getAVLImm());
@@ -615,8 +611,6 @@ class VSETVLIInfo {
     }
     if (hasAVLVLMAX())
       return true;
-    if (hasAVLIgnored())
-      return false;
     return false;
   }
 
@@ -638,9 +632,6 @@ class VSETVLIInfo {
     if (hasAVLVLMAX())
       return Other.hasAVLVLMAX() && hasSameVLMAX(Other);
 
-    if (hasAVLIgnored())
-      return Other.hasAVLIgnored();
-
     return false;
   }
 
@@ -816,8 +807,6 @@ class VSETVLIInfo {
       OS << "AVLImm=" << (unsigned)AVLImm;
     if (hasAVLVLMAX())
       OS << "AVLVLMAX";
-    if (hasAVLIgnored())
-      OS << "AVLIgnored";
     OS << ", "
        << "VLMul=" << (unsigned)VLMul << ", "
        << "SEW=" << (unsigned)SEW << ", "
@@ -936,7 +925,8 @@ RISCVInsertVSETVLI::getInfoForVSETVLI(const MachineInstr &MI) const {
       NewInfo.setAVLRegDef(VNI, AVLReg);
     else {
       assert(MI.getOperand(1).isUndef());
-      NewInfo.setAVLIgnored();
+      // Otherwise use an AVL of 1 to avoid depending on previous vl.
+      NewInfo.setAVLImm(1);
     }
   }
   NewInfo.setVTYPE(MI.getOperand(2).getImm());
@@ -1012,14 +1002,14 @@ RISCVInsertVSETVLI::computeInfoForInstr(const MachineInstr &MI) const {
       InstrInfo.setAVLRegDef(VNI, VLOp.getReg());
     } else {
       assert(VLOp.isUndef());
-      InstrInfo.setAVLIgnored();
+      // Otherwise use an AVL of 1 to avoid depending on previous vl.
+      InstrInfo.setAVLImm(1);
     }
   } else {
     assert(isScalarExtractInstr(MI));
-    // TODO: If we are more clever about x0,x0 insertion then we should be able
-    // to deduce that the VL is ignored based off of DemandedFields, and remove
-    // the AVLIsIgnored state. Then we can just use an arbitrary immediate AVL.
-    InstrInfo.setAVLIgnored();
+    // Pick a random value for state tracking purposes, will be ignored via
+    // the demanded fields mechanism
+    InstrInfo.setAVLImm(1);
   }
 #ifndef NDEBUG
   if (std::optional<unsigned> EEW = getEEWForLoadStore(MI)) {
@@ -1099,28 +1089,6 @@ void RISCVInsertVSETVLI::insertVSETVLI(MachineBasicBlock &MBB,
     return;
   }
 
-  if (Info.hasAVLIgnored()) {
-    // We can only use x0, x0 if there's no chance of the vtype change causing
-    // the previous vl to become invalid.
-    if (PrevInfo.isValid() && !PrevInfo.isUnknown() &&
-        Info.hasSameVLMAX(PrevInfo)) {
-      auto MI = BuildMI(MBB, InsertPt, DL, TII->get(RISCV::PseudoVSETVLIX0))
-                    .addReg(RISCV::X0, RegState::Define | RegState::Dead)
-                    .addReg(RISCV::X0, RegState::Kill)
-                    .addImm(Info.encodeVTYPE())
-                    .addReg(RISCV::VL, RegState::Implicit);
-      LIS->InsertMachineInstrInMaps(*MI);
-      return;
-    }
-    // Otherwise use an AVL of 1 to avoid depending on previous vl.
-    auto MI = BuildMI(MBB, InsertPt, DL, TII->get(RISCV::PseudoVSETIVLI))
-                  .addReg(RISCV::X0, RegState::Define | RegState::Dead)
-                  .addImm(1)
-                  .addImm(Info.encodeVTYPE());
-    LIS->InsertMachineInstrInMaps(*MI);
-    return;
-  }
-
   if (Info.hasAVLVLMAX()) {
     Register DestReg = MRI->createVirtualRegister(&RISCV::GPRRegClass);
     auto MI = BuildMI(MBB, InsertPt, DL, TII->get(RISCV::PseudoVSETVLIX0))
@@ -1529,11 +1497,6 @@ void RISCVInsertVSETVLI::doPRE(MachineBasicBlock &MBB) {
       return;
   }
 
-  // If the AVL isn't used in its predecessors then bail, since we have no AVL
-  // to insert a vsetvli with.
-  if (AvailableInfo.hasAVLIgnored())
-    return;
-
   // Model the effect of changing the input state of the block MBB to
   // AvailableInfo.  We're looking for two issues here; one legality,
   // one profitability.
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-fp.ll
index 48e820243c957..8b31166e313de 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-fp.ll
@@ -44,15 +44,16 @@ define <2 x half> @expandload_v2f16(ptr %base, <2 x half> %src0, <2 x i1> %mask)
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB1_3: # %cond.load
 ; RV32-NEXT:    flh fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e16, m2, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e16, m2, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 2
 ; RV32-NEXT:    andi a1, a1, 2
 ; RV32-NEXT:    beqz a1, .LBB1_2
 ; RV32-NEXT:  .LBB1_4: # %cond.load1
 ; RV32-NEXT:    flh fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; RV32-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV32-NEXT:    vfmv.s.f v9, fa5
+; RV32-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
 ; RV32-NEXT:    vslideup.vi v8, v9, 1
 ; RV32-NEXT:    ret
 ;
@@ -69,15 +70,16 @@ define <2 x half> @expandload_v2f16(ptr %base, <2 x half> %src0, <2 x i1> %mask)
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB1_3: # %cond.load
 ; RV64-NEXT:    flh fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e16, m2, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e16, m2, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 2
 ; RV64-NEXT:    andi a1, a1, 2
 ; RV64-NEXT:    beqz a1, .LBB1_2
 ; RV64-NEXT:  .LBB1_4: # %cond.load1
 ; RV64-NEXT:    flh fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; RV64-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV64-NEXT:    vfmv.s.f v9, fa5
+; RV64-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
 ; RV64-NEXT:    vslideup.vi v8, v9, 1
 ; RV64-NEXT:    ret
   %res = call <2 x half> @llvm.masked.expandload.v2f16(ptr align 2 %base, <2 x i1> %mask, <2 x half> %src0)
@@ -105,15 +107,16 @@ define <4 x half> @expandload_v4f16(ptr %base, <4 x half> %src0, <4 x i1> %mask)
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB2_5: # %cond.load
 ; RV32-NEXT:    flh fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 4, e16, m2, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e16, m2, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 2
 ; RV32-NEXT:    andi a2, a1, 2
 ; RV32-NEXT:    beqz a2, .LBB2_2
 ; RV32-NEXT:  .LBB2_6: # %cond.load1
 ; RV32-NEXT:    flh fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e16, mf2, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV32-NEXT:    vfmv.s.f v9, fa5
+; RV32-NEXT:    vsetivli zero, 2, e16, mf2, tu, ma
 ; RV32-NEXT:    vslideup.vi v8, v9, 1
 ; RV32-NEXT:    addi a0, a0, 2
 ; RV32-NEXT:    andi a2, a1, 4
@@ -152,15 +155,16 @@ define <4 x half> @expandload_v4f16(ptr %base, <4 x half> %src0, <4 x i1> %mask)
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB2_5: # %cond.load
 ; RV64-NEXT:    flh fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 4, e16, m2, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e16, m2, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 2
 ; RV64-NEXT:    andi a2, a1, 2
 ; RV64-NEXT:    beqz a2, .LBB2_2
 ; RV64-NEXT:  .LBB2_6: # %cond.load1
 ; RV64-NEXT:    flh fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e16, mf2, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV64-NEXT:    vfmv.s.f v9, fa5
+; RV64-NEXT:    vsetivli zero, 2, e16, mf2, tu, ma
 ; RV64-NEXT:    vslideup.vi v8, v9, 1
 ; RV64-NEXT:    addi a0, a0, 2
 ; RV64-NEXT:    andi a2, a1, 4
@@ -216,15 +220,16 @@ define <8 x half> @expandload_v8f16(ptr %base, <8 x half> %src0, <8 x i1> %mask)
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB3_9: # %cond.load
 ; RV32-NEXT:    flh fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e16, m2, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 2
 ; RV32-NEXT:    andi a2, a1, 2
 ; RV32-NEXT:    beqz a2, .LBB3_2
 ; RV32-NEXT:  .LBB3_10: # %cond.load1
 ; RV32-NEXT:    flh fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV32-NEXT:    vfmv.s.f v9, fa5
+; RV32-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
 ; RV32-NEXT:    vslideup.vi v8, v9, 1
 ; RV32-NEXT:    addi a0, a0, 2
 ; RV32-NEXT:    andi a2, a1, 4
@@ -307,15 +312,16 @@ define <8 x half> @expandload_v8f16(ptr %base, <8 x half> %src0, <8 x i1> %mask)
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB3_9: # %cond.load
 ; RV64-NEXT:    flh fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e16, m2, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 2
 ; RV64-NEXT:    andi a2, a1, 2
 ; RV64-NEXT:    beqz a2, .LBB3_2
 ; RV64-NEXT:  .LBB3_10: # %cond.load1
 ; RV64-NEXT:    flh fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV64-NEXT:    vfmv.s.f v9, fa5
+; RV64-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
 ; RV64-NEXT:    vslideup.vi v8, v9, 1
 ; RV64-NEXT:    addi a0, a0, 2
 ; RV64-NEXT:    andi a2, a1, 4
@@ -412,15 +418,16 @@ define <2 x float> @expandload_v2f32(ptr %base, <2 x float> %src0, <2 x i1> %mas
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB5_3: # %cond.load
 ; RV32-NEXT:    flw fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e32, m4, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 4
 ; RV32-NEXT:    andi a1, a1, 2
 ; RV32-NEXT:    beqz a1, .LBB5_2
 ; RV32-NEXT:  .LBB5_4: # %cond.load1
 ; RV32-NEXT:    flw fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; RV32-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; RV32-NEXT:    vfmv.s.f v9, fa5
+; RV32-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; RV32-NEXT:    vslideup.vi v8, v9, 1
 ; RV32-NEXT:    ret
 ;
@@ -437,15 +444,16 @@ define <2 x float> @expandload_v2f32(ptr %base, <2 x float> %src0, <2 x i1> %mas
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB5_3: # %cond.load
 ; RV64-NEXT:    flw fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e32, m4, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 4
 ; RV64-NEXT:    andi a1, a1, 2
 ; RV64-NEXT:    beqz a1, .LBB5_2
 ; RV64-NEXT:  .LBB5_4: # %cond.load1
 ; RV64-NEXT:    flw fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; RV64-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; RV64-NEXT:    vfmv.s.f v9, fa5
+; RV64-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; RV64-NEXT:    vslideup.vi v8, v9, 1
 ; RV64-NEXT:    ret
   %res = call <2 x float> @llvm.masked.expandload.v2f32(ptr align 4 %base, <2 x i1> %mask, <2 x float> %src0)
@@ -473,15 +481,16 @@ define <4 x float> @expandload_v4f32(ptr %base, <4 x float> %src0, <4 x i1> %mas
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB6_5: # %cond.load
 ; RV32-NEXT:    flw fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 4, e32, m4, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 4
 ; RV32-NEXT:    andi a2, a1, 2
 ; RV32-NEXT:    beqz a2, .LBB6_2
 ; RV32-NEXT:  .LBB6_6: # %cond.load1
 ; RV32-NEXT:    flw fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; RV32-NEXT:    vfmv.s.f v9, fa5
+; RV32-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
 ; RV32-NEXT:    vslideup.vi v8, v9, 1
 ; RV32-NEXT:    addi a0, a0, 4
 ; RV32-NEXT:    andi a2, a1, 4
@@ -520,15 +529,16 @@ define <4 x float> @expandload_v4f32(ptr %base, <4 x float> %src0, <4 x i1> %mas
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB6_5: # %cond.load
 ; RV64-NEXT:    flw fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 4, e32, m4, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 4
 ; RV64-NEXT:    andi a2, a1, 2
 ; RV64-NEXT:    beqz a2, .LBB6_2
 ; RV64-NEXT:  .LBB6_6: # %cond.load1
 ; RV64-NEXT:    flw fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; RV64-NEXT:    vfmv.s.f v9, fa5
+; RV64-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
 ; RV64-NEXT:    vslideup.vi v8, v9, 1
 ; RV64-NEXT:    addi a0, a0, 4
 ; RV64-NEXT:    andi a2, a1, 4
@@ -584,15 +594,16 @@ define <8 x float> @expandload_v8f32(ptr %base, <8 x float> %src0, <8 x i1> %mas
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB7_9: # %cond.load
 ; RV32-NEXT:    flw fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 8, e32, m4, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 4
 ; RV32-NEXT:    andi a2, a1, 2
 ; RV32-NEXT:    beqz a2, .LBB7_2
 ; RV32-NEXT:  .LBB7_10: # %cond.load1
 ; RV32-NEXT:    flw fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; RV32-NEXT:    vfmv.s.f v10, fa5
+; RV32-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
 ; RV32-NEXT:    vslideup.vi v8, v10, 1
 ; RV32-NEXT:    addi a0, a0, 4
 ; RV32-NEXT:    andi a2, a1, 4
@@ -675,15 +686,16 @@ define <8 x float> @expandload_v8f32(ptr %base, <8 x float> %src0, <8 x i1> %mas
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB7_9: # %cond.load
 ; RV64-NEXT:    flw fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 8, e32, m4, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 4
 ; RV64-NEXT:    andi a2, a1, 2
 ; RV64-NEXT:    beqz a2, .LBB7_2
 ; RV64-NEXT:  .LBB7_10: # %cond.load1
 ; RV64-NEXT:    flw fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; RV64-NEXT:    vfmv.s.f v10, fa5
+; RV64-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
 ; RV64-NEXT:    vslideup.vi v8, v10, 1
 ; RV64-NEXT:    addi a0, a0, 4
 ; RV64-NEXT:    andi a2, a1, 4
@@ -780,15 +792,16 @@ define <2 x double> @expandload_v2f64(ptr %base, <2 x double> %src0, <2 x i1> %m
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB9_3: # %cond.load
 ; RV32-NEXT:    fld fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e64, m8, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 8
 ; RV32-NEXT:    andi a1, a1, 2
 ; RV32-NEXT:    beqz a1, .LBB9_2
 ; RV32-NEXT:  .LBB9_4: # %cond.load1
 ; RV32-NEXT:    fld fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfmv.s.f v9, fa5
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV32-NEXT:    vslideup.vi v8, v9, 1
 ; RV32-NEXT:    ret
 ;
@@ -805,15 +818,16 @@ define <2 x double> @expandload_v2f64(ptr %base, <2 x double> %src0, <2 x i1> %m
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB9_3: # %cond.load
 ; RV64-NEXT:    fld fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e64, m8, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 8
 ; RV64-NEXT:    andi a1, a1, 2
 ; RV64-NEXT:    beqz a1, .LBB9_2
 ; RV64-NEXT:  .LBB9_4: # %cond.load1
 ; RV64-NEXT:    fld fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfmv.s.f v9, fa5
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV64-NEXT:    vslideup.vi v8, v9, 1
 ; RV64-NEXT:    ret
   %res = call <2 x double> @llvm.masked.expandload.v2f64(ptr align 8 %base, <2 x i1> %mask, <2 x double> %src0)
@@ -841,15 +855,16 @@ define <4 x double> @expandload_v4f64(ptr %base, <4 x double> %src0, <4 x i1> %m
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB10_5: # %cond.load
 ; RV32-NEXT:    fld fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 4, e64, m8, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 8
 ; RV32-NEXT:    andi a2, a1, 2
 ; RV32-NEXT:    beqz a2, .LBB10_2
 ; RV32-NEXT:  .LBB10_6: # %cond.load1
 ; RV32-NEXT:    fld fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfmv.s.f v10, fa5
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
 ; RV32-NEXT:    vslideup.vi v8, v10, 1
 ; RV32-NEXT:    addi a0, a0, 8
 ; RV32-NEXT:    andi a2, a1, 4
@@ -888,15 +903,16 @@ define <4 x double> @expandload_v4f64(ptr %base, <4 x double> %src0, <4 x i1> %m
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB10_5: # %cond.load
 ; RV64-NEXT:    fld fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 4, e64, m8, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 8
 ; RV64-NEXT:    andi a2, a1, 2
 ; RV64-NEXT:    beqz a2, .LBB10_2
 ; RV64-NEXT:  .LBB10_6: # %cond.load1
 ; RV64-NEXT:    fld fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfmv.s.f v10, fa5
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
 ; RV64-NEXT:    vslideup.vi v8, v10, 1
 ; RV64-NEXT:    addi a0, a0, 8
 ; RV64-NEXT:    andi a2, a1, 4
@@ -952,15 +968,16 @@ define <8 x double> @expandload_v8f64(ptr %base, <8 x double> %src0, <8 x i1> %m
 ; RV32-NEXT:    ret
 ; RV32-NEXT:  .LBB11_9: # %cond.load
 ; RV32-NEXT:    fld fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 8, e64, m8, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV32-NEXT:    vfmv.s.f v8, fa5
 ; RV32-NEXT:    addi a0, a0, 8
 ; RV32-NEXT:    andi a2, a1, 2
 ; RV32-NEXT:    beqz a2, .LBB11_2
 ; RV32-NEXT:  .LBB11_10: # %cond.load1
 ; RV32-NEXT:    fld fa5, 0(a0)
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
+; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfmv.s.f v12, fa5
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
 ; RV32-NEXT:    vslideup.vi v8, v12, 1
 ; RV32-NEXT:    addi a0, a0, 8
 ; RV32-NEXT:    andi a2, a1, 4
@@ -1043,15 +1060,16 @@ define <8 x double> @expandload_v8f64(ptr %base, <8 x double> %src0, <8 x i1> %m
 ; RV64-NEXT:    ret
 ; RV64-NEXT:  .LBB11_9: # %cond.load
 ; RV64-NEXT:    fld fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 8, e64, m8, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e64, m8, tu, ma
 ; RV64-NEXT:    vfmv.s.f v8, fa5
 ; RV64-NEXT:    addi a0, a0, 8
 ; RV64-NEXT:    andi a2, a1, 2
 ; RV64-NEXT:    beqz a2, .LBB11_2
 ; RV64-NEXT:  .LBB11_10: # %cond.load1
 ; RV64-NEXT:    fld fa5, 0(a0)
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
+; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfmv.s.f v12, fa5
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
 ; RV64-NEXT:    vslideup.vi v8, v12, 1
 ; RV64-NEXT:    addi a0, a0, 8
 ; RV64-NEXT:    andi a2, a1, 4
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-int.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-int.ll
index d6aca55fbde59..5bf8b07efc1da 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-int.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-expandload-int.ll
@@ -33,15 +33,16 @@ define <2 x i8> @expandload_v2i8(ptr %base, <2 x i8> %src0, <2 x i1> %mask) {
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB1_3: # %cond.load
 ; CHECK-NEXT:    lbu a2, 0(a0)
-; CHECK-NEXT:    ...
[truncated]

preames added a commit to preames/llvm-project that referenced this pull request Jun 6, 2024
Stacked on llvm#94658.

We recently moved RISCVInsertVSETVLI from before vector register allocation to after vector register allocation. When doing so, we added an unconditional dependency on LiveIntervals - even at O0, where LiveIntervals hadn't previously run. As reported in llvm#93587, this was apparently not safe to do.

This change makes LiveIntervals optional, and adjusts all the update code to only run when LiveIntervals is present. The only real tricky part of this change is the abstract state tracking in the dataflow. We need to represent a "register w/unknown definition" state - but only when we don't have LiveIntervals.

This adjusts the abstract state definition so that the AVLIsReg state can represent either a register + valno, or a register + unknown definition. With LiveIntervals, we have an exact definition for each AVL use. Without LiveIntervals, we treat the definition of a register AVL as being unknown.

The key semantic change is that we now have a state in the lattice for which something is known about the AVL value, but for which two identical lattice elements do *not* necessarily represent the same AVL value at runtime. Previously, the only case which could result in such an unknown AVL was the fully unknown state (where VTYPE is also fully unknown). This requires a small adjustment to hasSameAVL and lattice state equality to draw this important distinction.

The net effect of this patch is that we remove the LiveIntervals dependency at O0, and O0 code quality will regress for cases involving register AVL values.

This patch is an alternative to llvm#93796 and llvm#94340.  It is very directly inspired by review conversation around them, and thus should be considered coauthored by Luke.
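To illustrate the lattice adjustment described above, here is a small self-contained C++ sketch (toy types; "ValNo" stands in for LiveIntervals' value numbers, and none of these names are the actual VSETVLIInfo API): a register AVL either carries an exact definition or, without LiveIntervals, an unknown one, and two unknown-definition states must never compare as having the same AVL.

```cpp
#include <cassert>
#include <optional>

// Toy model only: a register AVL with an exact definition (ValNo, as
// LiveIntervals would provide) or an unknown definition (no LiveIntervals).
struct ToyRegAVL {
  unsigned Reg = 0;
  std::optional<unsigned> ValNo; // nullopt = register with unknown definition

  // Two lattice elements that look identical but whose definitions are unknown
  // may still hold different values at runtime, so they never compare equal.
  bool hasSameAVL(const ToyRegAVL &Other) const {
    if (!ValNo || !Other.ValNo)
      return false;
    return Reg == Other.Reg && *ValNo == *Other.ValNo;
  }
};

int main() {
  ToyRegAVL Known{5, 7}, SameDef{5, 7};
  ToyRegAVL UnknownA{5, std::nullopt}, UnknownB{5, std::nullopt};
  assert(Known.hasSameAVL(SameDef));      // exact definitions match
  assert(!UnknownA.hasSameAVL(UnknownB)); // unknown definitions never "match"
  return 0;
}
```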
lukel97 (Contributor) left a comment

LGTM

> The backwards pass doesn't have the information that the VL following a VL-preserving vtype is non-zero.

I think #89089 should fix this; I plan on returning to it once the LiveIntervals stuff is sorted.
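As a footnote on the VL-preserving form itself, a tiny C++ sketch (toy types; the condition mirrors the insertVSETVLI code removed in this patch, not any new API) of when a "vsetvli x0, x0, <vtype>" toggle is legal at all: the incoming state has to be known and the new configuration must leave VLMAX unchanged, so the old vl provably stays valid.

```cpp
#include <cassert>

// Toy model only: legality condition for emitting a VL-preserving
// "vsetvli x0, x0, <vtype>" instead of re-setting vl.
struct ToyVSETVLIState {
  bool Valid = false;
  bool Unknown = false;
  unsigned VLMAX = 0; // stands in for the VLMAX implied by (SEW, LMUL)
};

bool canUseVLPreservingForm(const ToyVSETVLIState &Prev,
                            const ToyVSETVLIState &Next) {
  // The vtype change must not be able to invalidate the previous vl.
  return Prev.Valid && !Prev.Unknown && Prev.VLMAX == Next.VLMAX;
}

int main() {
  ToyVSETVLIState Prev{/*Valid=*/true, /*Unknown=*/false, /*VLMAX=*/16};
  ToyVSETVLIState SameVLMAX{true, false, 16}, NewVLMAX{true, false, 8};
  assert(canUseVLPreservingForm(Prev, SameVLMAX));  // vl provably preserved
  assert(!canUseVLPreservingForm(Prev, NewVLMAX));  // must re-set vl instead
  return 0;
}
```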

preames merged commit 964c92d into llvm:main on Jun 17, 2024
4 of 6 checks passed
preames deleted the pr-insertvsetvli-kill-avl-ignored branch on June 17, 2024 at 15:20
preames added a commit that referenced this pull request Jun 17, 2024