[RISCV] Move vmerge same mask peephole to RISCVVectorPeephole #106108

lukel97 · 2024-08-26T17:23:41Z

We currently fold a vmerge.vvm into its true operand if the true operand is a masked pseudo with the same mask.

We can move this over to RISCVVectorPeephole by instead splitting it up into a smaller peephole which converts it to a vmv.v.v first. The existing foldVMV_V_V peephole will then take care of folding it if needed.

This is very similar to the existing all-ones mask peephole and we could potentially do it inside of it. I opted to put it in a separate peephole to make it easier to reason about, given that the duplication is small, but I could be persuaded either way.
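
To illustrate why the same-mask case can be rewritten as a plain copy, here is a minimal lane-wise sketch. It is a toy model, not code from the patch: `maskedAdd`, `vmergeVVM`, `vmvVV` and `sameMaskVMergeEqualsVMv` are hypothetical names, and it assumes VL covers every lane (so tail elements and the vmerge's own passthru are ignored) and that true's passthru is exactly the vmerge's false operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane-wise model; none of these names exist in LLVM.
constexpr std::size_t NumLanes = 4;
using Vec = std::array<std::int32_t, NumLanes>;
using Mask = std::array<bool, NumLanes>;

// Models a masked add with a mask-undisturbed policy: inactive lanes keep
// the value of the passthru operand.
Vec maskedAdd(const Vec &Passthru, const Vec &X, const Vec &Y, const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = M[I] ? X[I] + Y[I] : Passthru[I];
  return R;
}

// Models vmerge.vvm: take TrueV where the mask is set, FalseV elsewhere.
Vec vmergeVVM(const Vec &FalseV, const Vec &TrueV, const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = M[I] ? TrueV[I] : FalseV[I];
  return R;
}

// Models vmv.v.v: an unmasked copy.
Vec vmvVV(const Vec &Src) { return Src; }

// Returns true if the vmerge and the vmv.v.v agree on every lane when the
// masked op's passthru is the vmerge's false operand and the masks match.
bool sameMaskVMergeEqualsVMv() {
  const Vec FalseV = {10, 20, 30, 40};
  const Vec X = {1, 2, 3, 4};
  const Vec Y = {100, 200, 300, 400};
  const Mask M = {true, false, true, false};
  const Vec TrueV = maskedAdd(FalseV, X, Y, M);
  return vmergeVVM(FalseV, TrueV, M) == vmvVV(TrueV);
}
```

In this model the inactive lanes of the masked add already hold the false operand, so selecting between true and false under the same mask changes nothing, which is the property the new peephole relies on.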

llvmbot · 2024-09-05T16:32:38Z

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Full diff: https://github.com/llvm/llvm-project/pull/106108.diff

4 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (+5-37)
(modified) llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (+63-10)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+10-12)
(modified) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir (+70)

diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 4580f3191d1389..ff4c0e9bbd50e7 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3833,15 +3833,8 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
   uint64_t TrueTSFlags = TrueMCID.TSFlags;
   bool HasTiedDest = RISCVII::isFirstDefTiedToFirstUse(TrueMCID);
 
-  bool IsMasked = false;
   const RISCV::RISCVMaskedPseudoInfo *Info =
       RISCV::lookupMaskedIntrinsicByUnmasked(TrueOpc);
-  if (!Info && HasTiedDest) {
-    Info = RISCV::getMaskedPseudoInfo(TrueOpc);
-    IsMasked = true;
-  }
-  assert(!(IsMasked && !HasTiedDest) && "Expected tied dest");
-
   if (!Info)
     return false;
 
@@ -3853,19 +3846,6 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
       return false;
   }
 
-  // If True is masked then the vmerge must have either the same mask or an all
-  // 1s mask, since we're going to keep the mask from True.
-  if (IsMasked) {
-    // FIXME: Support mask agnostic True instruction which would have an
-    // undef passthru operand.
-    SDValue TrueMask =
-        getMaskSetter(True->getOperand(Info->MaskOpIdx),
-                      True->getOperand(True->getNumOperands() - 1));
-    assert(TrueMask);
-    if (!usesAllOnesMask(Mask, Glue) && getMaskSetter(Mask, Glue) != TrueMask)
-      return false;
-  }
-
   // Skip if True has side effect.
   if (TII->get(TrueOpc).hasUnmodeledSideEffects())
     return false;
@@ -3930,24 +3910,13 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
       (Mask && !usesAllOnesMask(Mask, Glue)))
     return false;
 
-  // If we end up changing the VL or mask of True, then we need to make sure it
-  // doesn't raise any observable fp exceptions, since changing the active
-  // elements will affect how fflags is set.
-  if (TrueVL != VL || !IsMasked)
-    if (mayRaiseFPException(True.getNode()) &&
-        !True->getFlags().hasNoFPExcept())
-      return false;
+  // Make sure it doesn't raise any observable fp exceptions, since changing the
+  // active elements will affect how fflags is set.
+  if (mayRaiseFPException(True.getNode()) && !True->getFlags().hasNoFPExcept())
+    return false;
 
   SDLoc DL(N);
 
-  // From the preconditions we checked above, we know the mask and thus glue
-  // for the result node will be taken from True.
-  if (IsMasked) {
-    Mask = True->getOperand(Info->MaskOpIdx);
-    Glue = True->getOperand(True->getNumOperands() - 1);
-    assert(Glue.getValueType() == MVT::Glue);
-  }
-
   unsigned MaskedOpc = Info->MaskedPseudo;
 #ifndef NDEBUG
   const MCInstrDesc &MaskedMCID = TII->get(MaskedOpc);
@@ -3977,8 +3946,7 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
   Ops.push_back(False);
 
   const bool HasRoundingMode = RISCVII::hasRoundModeOp(TrueTSFlags);
-  const unsigned NormalOpsEnd = TrueVLIndex - IsMasked - HasRoundingMode;
-  assert(!IsMasked || NormalOpsEnd == Info->MaskOpIdx);
+  const unsigned NormalOpsEnd = TrueVLIndex - HasRoundingMode;
   Ops.append(True->op_begin() + HasTiedDest, True->op_begin() + NormalOpsEnd);
 
   Ops.push_back(Mask);
diff --git a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
index a612a03106f024..790a206f39e74c 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
@@ -65,7 +65,8 @@ class RISCVVectorPeephole : public MachineFunctionPass {
   bool convertToVLMAX(MachineInstr &MI) const;
   bool convertToWholeRegister(MachineInstr &MI) const;
   bool convertToUnmasked(MachineInstr &MI) const;
-  bool convertVMergeToVMv(MachineInstr &MI) const;
+  bool convertAllOnesVMergeToVMv(MachineInstr &MI) const;
+  bool convertSameMaskVMergeToVMv(MachineInstr &MI) const;
   bool foldUndefPassthruVMV_V_V(MachineInstr &MI);
   bool foldVMV_V_V(MachineInstr &MI);
 
@@ -342,17 +343,14 @@ bool RISCVVectorPeephole::convertToWholeRegister(MachineInstr &MI) const {
   return true;
 }
 
-// Transform (VMERGE_VVM_<LMUL> pt, false, true, allones, vl, sew) to
-// (VMV_V_V_<LMUL> pt, true, vl, sew). It may decrease uses of VMSET.
-bool RISCVVectorPeephole::convertVMergeToVMv(MachineInstr &MI) const {
+static unsigned getVMV_V_VOpcodeForVMERGE_VVM(const MachineInstr &MI) {
 #define CASE_VMERGE_TO_VMV(lmul)                                               \
   case RISCV::PseudoVMERGE_VVM_##lmul:                                         \
-    NewOpc = RISCV::PseudoVMV_V_V_##lmul;                                      \
+    return RISCV::PseudoVMV_V_V_##lmul;                                        \
     break;
-  unsigned NewOpc;
   switch (MI.getOpcode()) {
   default:
-    return false;
+    return 0;
     CASE_VMERGE_TO_VMV(MF8)
     CASE_VMERGE_TO_VMV(MF4)
     CASE_VMERGE_TO_VMV(MF2)
@@ -361,14 +359,68 @@ bool RISCVVectorPeephole::convertVMergeToVMv(MachineInstr &MI) const {
     CASE_VMERGE_TO_VMV(M4)
     CASE_VMERGE_TO_VMV(M8)
   }
+}
 
+/// Convert a PseudoVMERGE_VVM with an all ones mask to a PseudoVMV_V_V.
+///
+/// %x = PseudoVMERGE_VVM %passthru, %false, %true, %allones, sew, vl
+/// ->
+/// %x = PseudoVMV_V_V %passthru, %true, vl, sew, tu_mu
+bool RISCVVectorPeephole::convertAllOnesVMergeToVMv(MachineInstr &MI) const {
+  unsigned NewOpc = getVMV_V_VOpcodeForVMERGE_VVM(MI);
+  if (!NewOpc)
+    return false;
   assert(MI.getOperand(4).isReg() && MI.getOperand(4).getReg() == RISCV::V0);
   if (!isAllOnesMask(V0Defs.lookup(&MI)))
     return false;
 
   MI.setDesc(TII->get(NewOpc));
-  MI.removeOperand(2);  // False operand
-  MI.removeOperand(3);  // Mask operand
+  MI.removeOperand(2); // False operand
+  MI.removeOperand(3); // Mask operand
+  MI.addOperand(
+      MachineOperand::CreateImm(RISCVII::TAIL_UNDISTURBED_MASK_UNDISTURBED));
+
+  // vmv.v.v doesn't have a mask operand, so we may be able to inflate the
+  // register class for the destination and passthru operands e.g. VRNoV0 -> VR
+  MRI->recomputeRegClass(MI.getOperand(0).getReg());
+  if (MI.getOperand(1).getReg() != RISCV::NoRegister)
+    MRI->recomputeRegClass(MI.getOperand(1).getReg());
+  return true;
+}
+
+/// If a PseudoVMERGE_VVM's true operand is a masked pseudo and both have the
+/// same mask, and the masked pseudo's passthru is the same as the false
+/// operand, we can convert the PseudoVMERGE_VVM to a PseudoVMV_V_V.
+///
+/// %true = PseudoVADD_VV_M1_MASK %false, %x, %y, %mask, vl1, sew, policy
+/// %x = PseudoVMERGE_VVM %passthru, %false, %true, %mask, vl2, sew
+/// ->
+/// %true = PseudoVADD_VV_M1_MASK %false, %x, %y, %mask, vl1, sew, policy
+/// %x = PseudoVMV_V_V %passthru, %true, vl2, sew, tu_mu
+bool RISCVVectorPeephole::convertSameMaskVMergeToVMv(MachineInstr &MI) const {
+  unsigned NewOpc = getVMV_V_VOpcodeForVMERGE_VVM(MI);
+  if (!NewOpc)
+    return false;
+  MachineInstr *True = MRI->getVRegDef(MI.getOperand(3).getReg());
+  if (!True || !RISCV::getMaskedPseudoInfo(True->getOpcode()) ||
+      !hasSameEEW(MI, *True))
+    return false;
+
+  // True's passthru needs to be equivalent to False
+  Register TruePassthruReg = True->getOperand(1).getReg();
+  Register FalseReg = MI.getOperand(2).getReg();
+  if (TruePassthruReg != RISCV::NoRegister && TruePassthruReg != FalseReg)
+    return false;
+
+  const MachineInstr *TrueV0Def = V0Defs.lookup(True);
+  const MachineInstr *MIV0Def = V0Defs.lookup(&MI);
+  assert(TrueV0Def->isCopy() && MIV0Def->isCopy());
+  if (TrueV0Def->getOperand(1).getReg() != MIV0Def->getOperand(1).getReg())
+    return false;
+
+  MI.setDesc(TII->get(NewOpc));
+  MI.removeOperand(2); // False operand
+  MI.removeOperand(3); // Mask operand
   MI.addOperand(
       MachineOperand::CreateImm(RISCVII::TAIL_UNDISTURBED_MASK_UNDISTURBED));
 
@@ -622,7 +674,8 @@ bool RISCVVectorPeephole::runOnMachineFunction(MachineFunction &MF) {
       Changed |= tryToReduceVL(MI);
       Changed |= convertToUnmasked(MI);
       Changed |= convertToWholeRegister(MI);
-      Changed |= convertVMergeToVMv(MI);
+      Changed |= convertAllOnesVMergeToVMv(MI);
+      Changed |= convertSameMaskVMergeToVMv(MI);
       if (foldUndefPassthruVMV_V_V(MI)) {
         Changed |= true;
         continue; // MI is erased
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
index e57b6a22dd6eab..569ada7949b1b5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
@@ -62,12 +62,11 @@ define void @gather_masked(ptr noalias nocapture %A, ptr noalias nocapture reado
 ; CHECK-NEXT:    li a4, 5
 ; CHECK-NEXT:  .LBB1_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vmv1r.v v9, v8
-; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, mu
-; CHECK-NEXT:    vlse8.v v9, (a1), a4, v0.t
-; CHECK-NEXT:    vle8.v v10, (a0)
-; CHECK-NEXT:    vadd.vv v9, v10, v9
-; CHECK-NEXT:    vse8.v v9, (a0)
+; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, ma
+; CHECK-NEXT:    vlse8.v v8, (a1), a4, v0.t
+; CHECK-NEXT:    vle8.v v9, (a0)
+; CHECK-NEXT:    vadd.vv v8, v9, v8
+; CHECK-NEXT:    vse8.v v8, (a0)
 ; CHECK-NEXT:    addi a0, a0, 32
 ; CHECK-NEXT:    addi a1, a1, 160
 ; CHECK-NEXT:    bne a0, a2, .LBB1_1
@@ -344,12 +343,11 @@ define void @scatter_masked(ptr noalias nocapture %A, ptr noalias nocapture read
 ; CHECK-NEXT:    li a4, 5
 ; CHECK-NEXT:  .LBB7_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, mu
-; CHECK-NEXT:    vle8.v v9, (a1)
-; CHECK-NEXT:    vmv1r.v v10, v8
-; CHECK-NEXT:    vlse8.v v10, (a0), a4, v0.t
-; CHECK-NEXT:    vadd.vv v9, v10, v9
-; CHECK-NEXT:    vsse8.v v9, (a0), a4, v0.t
+; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, ma
+; CHECK-NEXT:    vle8.v v8, (a1)
+; CHECK-NEXT:    vlse8.v v9, (a0), a4, v0.t
+; CHECK-NEXT:    vadd.vv v8, v9, v8
+; CHECK-NEXT:    vsse8.v v8, (a0), a4, v0.t
 ; CHECK-NEXT:    addi a1, a1, 32
 ; CHECK-NEXT:    addi a0, a0, 160
 ; CHECK-NEXT:    bne a1, a2, .LBB7_1
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
index 19a918148e6eb8..875d4229bbc6e1 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
@@ -68,3 +68,73 @@ body: |
     $v0 = COPY %mask
     %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, %avl, 5
 ...
+---
+name: same_mask
+body: |
+  bb.0:
+    liveins: $v8, $v9, $v0
+    ; CHECK-LABEL: name: same_mask
+    ; CHECK: liveins: $v8, $v9, $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %pt:vr = COPY $v8
+    ; CHECK-NEXT: %false:vrnov0 = COPY $v9
+    ; CHECK-NEXT: %mask:vr = COPY $v0
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %x:vr = PseudoVMV_V_V_M1 %pt, %true, 8, 5 /* e32 */, 0 /* tu, mu */
+    %pt:vrnov0 = COPY $v8
+    %false:vrnov0 = COPY $v9
+    %mask:vr = COPY $v0
+    $v0 = COPY %mask
+    %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    $v0 = COPY %mask
+    %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+...
+---
+# Shouldn't be converted because false operands are different
+name: same_mask_different_false
+body: |
+  bb.0:
+    liveins: $v8, $v9, $v0
+    ; CHECK-LABEL: name: same_mask_different_false
+    ; CHECK: liveins: $v8, $v9, $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %pt:vrnov0 = COPY $v8
+    ; CHECK-NEXT: %false:vrnov0 = COPY $v9
+    ; CHECK-NEXT: %mask:vr = COPY $v0
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %true:vrnov0 = PseudoVADD_VV_M1_MASK %pt, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+    %pt:vrnov0 = COPY $v8
+    %false:vrnov0 = COPY $v9
+    %mask:vr = COPY $v0
+    $v0 = COPY %mask
+    %true:vrnov0 = PseudoVADD_VV_M1_MASK %pt, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    $v0 = COPY %mask
+    %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+...
+---
+# Shouldn't be converted because EEWs are different
+name: same_mask_different_eew
+body: |
+  bb.0:
+    liveins: $v8, $v9, $v0
+    ; CHECK-LABEL: name: same_mask_different_eew
+    ; CHECK: liveins: $v8, $v9, $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %pt:vrnov0 = COPY $v8
+    ; CHECK-NEXT: %false:vrnov0 = COPY $v9
+    ; CHECK-NEXT: %mask:vr = COPY $v0
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+    %pt:vrnov0 = COPY $v8
+    %false:vrnov0 = COPY $v9
+    %mask:vr = COPY $v0
+    $v0 = COPY %mask
+    %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 4 /* e16 */, 0 /* tu, mu */
+    $v0 = COPY %mask
+    %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */

llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp

topperc

LGTM

llvm-ci · 2024-09-06T01:35:51Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 10 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/5046

Here is the relevant piece of the build log for the reference

Step 10 (Add check check-offload) failure: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
...
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/test_libc.cpp (869 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49779.cpp (870 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug47654.cpp (871 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug50022.cpp (872 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/wtime.c (873 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp (874 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp (875 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp (876 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (877 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp (878 of 879)
command timed out: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1235.780149

rofirrim · 2024-09-09T07:04:46Z

llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp

+  assert(TrueV0Def && TrueV0Def->isCopy() && MIV0Def && MIV0Def->isCopy());
+  if (TrueV0Def->getOperand(1).getReg() != MIV0Def->getOperand(1).getReg())
+    return false;
+


Hi @lukel97, shouldn't we check around here that the mask for the false operand also matches?

I'm seeing this case

$v0 = COPY %15:vr
%20:vrnov0 = PseudoVRSUB_VI_MF8_MASK $noreg(tied-def 0), killed %19:vrnov0, -2, $v0, %12:gprnox0, 3, 1
…
$v0 = COPY %14:vr ; <---- Different mask!
%26:vrnov0 = PseudoVOR_VI_MF8_MASK $noreg(tied-def 0), killed %25:vrnov0, 1, $v0, %12:gprnox0, 3, 1
$v0 = COPY %15:vr
%27:vrnov0 = PseudoVMERGE_VVM_MF8 $noreg(tied-def 0), killed %26:vrnov0, killed %20:vrnov0, $v0, %12:gprnox0, 3

being turned into

$v0 = COPY %15:vr
%27:vr = PseudoVMV_V_V_MF8 $noreg(tied-def 0), killed %20:vrnov0, %12:gprnox0, 3, 0

which I don't think is equivalent.

I can open an issue with a reproducer if that helps.

Thanks for catching this. I think the False == TruePassthru check needs to be the other way round, i.e. we should check that False is NoRegister, not TruePassthru. The MIR should be enough to recreate a test case; I'll take a look into this now.

I've opened #107827 to fix it, I believe the underlying issue was that we were discarding the false operand when the true operand's passthru was undef.

In your example above, in theory I think we can still replace the VMERGE_VVM with VMV_V_V as long as the VRSUB_VI's passthru becomes the false operand. In practice though we would need to move the VRSUB_VI down to access %26, but we can't move past a copy to $v0, so the peephole should just bail instead after the patch.

This fixes the issue raised in llvm#106108 (comment). True's passthru needs to be equivalent to vmerge's false, but we also allow true's passthru to be undef. However, if it's undef then we need to replace it with vmerge's false, otherwise we end up discarding the false operand entirely. The changes in fixed-vectors-strided-load-store-asm.ll undo the changes in llvm#106108, where we introduced this miscompile.
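
The miscompile and the fix described above can be sketched with a small lane-wise model. This is an illustration under stated assumptions, not the code from #107827: all names are hypothetical, VL is assumed to cover every lane, and an undef passthru is modelled with a sentinel value `UndefLane` standing in for "any value".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical lane-wise model; none of these names exist in LLVM.
constexpr std::size_t NumLanes = 4;
using Vec = std::array<std::int32_t, NumLanes>;
using Mask = std::array<bool, NumLanes>;

// Stand-in for an undefined lane: the hardware may leave any value here.
constexpr std::int32_t UndefLane = -12345;

// Models a masked op (here: add one). With no passthru (undef), inactive
// lanes hold an arbitrary value; otherwise they keep the passthru.
Vec maskedAddOne(const std::optional<Vec> &Passthru, const Vec &X,
                 const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = M[I] ? X[I] + 1 : (Passthru ? (*Passthru)[I] : UndefLane);
  return R;
}

// Models vmerge.vvm: take TrueV where the mask is set, FalseV elsewhere.
Vec vmergeVVM(const Vec &FalseV, const Vec &TrueV, const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = M[I] ? TrueV[I] : FalseV[I];
  return R;
}

const Vec FalseV = {10, 20, 30, 40};
const Vec X = {1, 2, 3, 4};
const Mask M = {true, false, true, false};

// What the original program computes: a vmerge over a true operand whose
// passthru is undef.
Vec originalResult() {
  return vmergeVVM(FalseV, maskedAddOne(std::nullopt, X, M), M);
}

// Buggy rewrite: turn the vmerge into a copy of true and leave true's
// passthru undef, so the false operand is lost on inactive lanes.
bool buggyFoldMatches() {
  return maskedAddOne(std::nullopt, X, M) == originalResult();
}

// Fixed rewrite: first replace true's undef passthru with false, then copy.
bool fixedFoldMatches() {
  return maskedAddOne(FalseV, X, M) == originalResult();
}
```

In this model `buggyFoldMatches()` is false and `fixedFoldMatches()` is true: the inactive lanes only recover the false operand once it has been installed as true's passthru. The model says nothing about whether false is actually available at true's position in the instruction stream, which is the reason given above for bailing in the reported case.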


commit 56905dab7da50bccfcceaeb496b206ff476127e1 Author: JinjinLi868 <lijinjin.868@bytedance.com> Date: Tue Sep 10 10:47:33 2024 +0800 [clang] fix half && bfloat16 convert node expr codegen (#89051) Data type conversion between fp16 and bf16 will generate fptrunc and fpextend nodes, but they are actually bitcast nodes. commit ffcff4af59712792712b33648f8ea148b299c364 Author: Yingwei Zheng <dtcxzyw2333@gmail.com> Date: Tue Sep 10 10:38:21 2024 +0800 [ValueTracking] Infer is-power-of-2 from assumptions. (#107745) This patch tries to infer is-power-of-2 from assumptions. I don't see that this kind of assumption exists in my dataset. Related issue: https://github.com/rust-lang/rust/issues/129795 Close https://github.com/llvm/llvm-project/issues/58996. commit eb0e4b1415800e34b86319ce1d57ad074d5ca202 Author: Petr Hosek <phosek@google.com> Date: Mon Sep 9 19:21:59 2024 -0700 [Fuzzer] Passthrough zlib CMake paths into the test (#107926) We shouldn't assume that we're using system zlib installation. commit 761bf333e378b52614cf36cd5db2837d5e4e0ae4 Author: Yuxuan Chen <ych@fb.com> Date: Mon Sep 9 18:57:39 2024 -0700 [LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897) After landing https://github.com/llvm/llvm-project/pull/99285 we found that the call graph update was causing the following crash when expensive checks are turned on ``` llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(Targe tRC)) && "New call edge is not trivial!"' failed. ``` I have to admit I believe that the call graph update process I did for that patch could be wrong. 
After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced that `CoroAnnotationElidePass` can be a FunctionPass and rely on the adaptor to update the call graph for us, so long as we properly invalidate the caller's analyses. After this patch, `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer fails under expensive checks. commit 7a8e9dfe5cc6f049f918e528ef476d9e7aada8a5 Author: Jordan Rupprecht <rupprecht@google.com> Date: Mon Sep 9 20:34:43 2024 -0500 [bazel][libc][NFC] Add missing layering deps (#107947) After 277371943fa48f2550df02870951f5e5a77efef5 e.g. ``` external/llvm-project/libc/test/src/math/smoke/NextTowardTest.h:12:10: error: module llvm-project//libc/test/src/math/smoke:nexttowardf_test does not depend on a module exporting 'src/__support/CPP/bit.h' ``` commit 1ca411ca451e0e86caf9207779616f32ed9fd908 Author: wanglei <wanglei@loongson.cn> Date: Tue Sep 10 09:28:15 2024 +0800 [LoongArch] Codegen for concat_vectors with LASX Fixes: #107355 Reviewed By: SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/107523 commit e64a1c00c1d612dccd976c06fdac85afa3b06fbe Author: Mircea Trofin <mtrofin@google.com> Date: Mon Sep 9 18:25:50 2024 -0700 Fix unintended extra commit in PR #107499 commit f7479b5ff43261a20258743da5fa583a0c729564 Author: Rahul Joshi <rjoshi@nvidia.com> Date: Mon Sep 9 18:24:07 2024 -0700 [NFC][TableGen] Simplify DirectiveEmitter using range for loops (#107909) Make constructors that take const Record * implicit, allowing us to simplify some range based loops to use that class instance as the loop variable. Change remaining constructor calls to use () instead of {} to construct objects. 
commit a111f9119a5ec77c19a514ec09454218f739454f Author: Yingwei Zheng <dtcxzyw2333@gmail.com> Date: Tue Sep 10 09:19:39 2024 +0800 [LoongArch][ISel] Check the number of sign bits in `PatGprGpr_32` (#107432) After https://github.com/llvm/llvm-project/pull/92205, LoongArch ISel selects `div.w` for `trunc i64 (sdiv i64 3202030857, (sext i32 X to i64)) to i32`. It is incorrect since `3202030857` is not a signed 32-bit constant. It will produce wrong result when `X == 2`: https://alive2.llvm.org/ce/z/pzfGZZ This patch adds additional `sexti32` checks to operands of `PatGprGpr_32`. Alive2 proof: https://alive2.llvm.org/ce/z/AkH5Mp Fix #107414. commit f3b4e47b34e59625e2c8420ce8bf789373177d6d Author: Longsheng Mou <longshengmou@gmail.com> Date: Tue Sep 10 09:19:22 2024 +0800 [mlir][linalg][NFC] Drop redundant rankReductionStrategy (#107875) This patch drop redundant rankReductionStrategy in `populateFoldUnitExtentDimsViaSlicesPatterns` and fixes comment typos. commit 3b2261809471a018de50e745c0d475b048c66fd4 Author: Mircea Trofin <mtrofin@google.com> Date: Mon Sep 9 18:16:24 2024 -0700 [ctx_prof] Insert the ctx prof flattener after the module inliner (#107499) This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287) commit b0d2411b53a0b55baf6d6dc7986d285ce59807fa Author: Alex MacLean <amaclean@nvidia.com> Date: Mon Sep 9 17:37:09 2024 -0700 [NVPTX] Support copysign PTX instruction (#107800) Lower `fcopysign` SDNodes into `copysign` PTX instructions where possible. See [PTX ISA: 9.7.3.2. 
Floating Point Instructions: copysign] (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-copysign). commit 81ef8e2fdbdfac4e186e12a874242b294d05d4e0 Author: Vitaly Buka <vitalybuka@google.com> Date: Mon Sep 9 17:00:06 2024 -0700 [NFC][sanitizer] Extract GetDTLSRange (#107934) commit ae02211eaef305f957b419e5c39499aa472b956e Author: vporpo <vporpodas@google.com> Date: Mon Sep 9 16:52:54 2024 -0700 [SandboxIR] Implement UndefValue (#107628) This patch implements sandboxir::UndefValue mirroring llvm::UndefValue. commit 33c1325a73c4bf6bacdb865c2550038afe4377d2 Author: Anton Korobeynikov <anton@korobeynikov.info> Date: Mon Sep 9 16:34:41 2024 -0700 [PAC] Make __is_function_overridden pauth-aware on ELF platforms (#107498) Apparently, there are two almost identical implementations: one for MachO and another one for ELF. The ELF bits somehow slipped while https://github.com/llvm/llvm-project/pull/84573 was reviewed. The particular implementation is identical to MachO case. commit 88bd507dc2dd9c235b54d718cf84e4ef80d94bc9 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon Sep 9 11:07:38 2024 -0700 [X86] Handle shifts + and in `LowerSELECTWithCmpZero` shifts are the same as sub where rhs == 0 is identity. and is the inverted case where: `SELECT (AND(X,1) == 0), (AND Y, Z), Y` -> `(AND Y, (OR NEG(AND(X, 1)), Z))` With -1 as the identity. Closes #107910 commit d148a1a40461ed27863f4b17ac2bd5914499f413 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon Sep 9 11:07:36 2024 -0700 [X86] Add tests support shifts + and in `LowerSELECTWithCmpZero`; NFC commit 26b786ae2f15bfbf6f0925856a788ae0bfb2f8c1 Author: Artem Belevich <tra@google.com> Date: Mon Sep 9 16:15:00 2024 -0700 [NVPTX] Restrict combining to properly aligned v16i8 vectors. (#107919) Fixes generation of invalid loads leading to misaligned access errors. The bug got exposed by SLP vectorizer change ec360d6 which allowed SLP to produce `v16i8` vectors. 
Also updated the tests to use automatic check generator. commit f12e10b513686a12f20f0c897dcc9ffc00cbce09 Author: vporpo <vporpodas@google.com> Date: Mon Sep 9 15:41:30 2024 -0700 [SandboxVec] Implement Pass class (#107617) This patch implements the Pass base class and the FunctionPass sub-class that operate on Sandbox IR. commit bdf02249e7f8f95177ff58c881caf219699acb98 Author: Rahul Joshi <rjoshi@nvidia.com> Date: Mon Sep 9 14:33:21 2024 -0700 [TableGen] Change CGIOperandList::OperandInfo::Rec to const pointer (#107858) Change CGIOperandList::OperandInfo::Rec and CGIOperandList::TheDef to const pointer. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089 commit a9a5a18a0e99b0251c0fe6ce61c5e699bf6b379b Author: Tim Gymnich <tgymnich@icloud.com> Date: Mon Sep 9 23:27:27 2024 +0200 [SPIRV] Add sign intrinsic part 1 (#101987) partially fixes #70078 - Added `int_spv_sign` intrinsic in `IntrinsicsSPIRV.td` - Added lowering and map to `int_spv_sign in `SPIRVInstructionSelector.cpp` - Added SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/sign.ll` - https://github.com/llvm/llvm-project/pull/101988 - https://github.com/llvm/llvm-project/pull/101989 commit 66e9078f827383f77c1c239f6c09f2b07a963649 Author: Steven Wu <stevenwu@apple.com> Date: Mon Sep 9 14:12:12 2024 -0700 [LTO] Fix a use-after-free in legacy LTO C APIs (#107896) Fix a bug that `lto_runtime_lib_symbols_list` is returning the address of a local variable that will be freed when getting out of scope. This is a regression from #98512 that rewrites the runtime libcall function lists into a SmallVector. 
rdar://135559037 commit d9a996020394a8181d17e4f0a0fc89d59371f9af Author: ChiaHungDuan <chiahungduan@google.com> Date: Mon Sep 9 13:59:03 2024 -0700 [scudo] Add fragmentation info for each memory group (#107475) This information helps with tuning the heuristic of selecting memory groups to release the unused pages. commit 6f8d2781f604cfcf9ea6facecc0bea8e4d682e1e Author: Sterling-Augustine <56981066+Sterling-Augustine@users.noreply.github.com> Date: Mon Sep 9 20:49:49 2024 +0000 [SandboxIR] Add missing VectorType functions (#107650) Fills in many missing functions from VectorType commit 53a81d4d26f0409de8a0655d7af90f2bea222a12 Author: Charlie Barto <chbarto@microsoft.com> Date: Mon Sep 9 13:41:08 2024 -0700 Reland [asan][windows] Eliminate the static asan runtime on windows (#107899) This reapplies 8fa66c6ca7272268747835a0e86805307b62399c ([asan][windows] Eliminate the static asan runtime on windows) for a second time. That PR bounced off the tests because it caused failures in the other sanitizer runtimes, these have been fixed by only building interception, sanitizer_common, and asan with /MD, and continuing to build the rest of the runtimes with /MT. This does mean that any usage of the static ubsan/fuzzer/etc runtimes will mean you're mixing different runtime library linkages in the same app, the interception, sanitizer_common, and asan runtimes are designed for this, however it does result in some linker warnings. Additionally, it turns out when building in release-mode with LLVM_ENABLE_PDBs the build system forced /OPT:ICF. This totally breaks asan's "new" method of doing "weak" functions on windows, and so /OPT:NOICF was explicitly added to asan's link flags. --------- Co-authored-by: Amy Wishnousky <amyw@microsoft.com> commit 34034381b7d54da864f8794f578d9c501d6d4f3b Author: Florian Hahn <flo@fhahn.com> Date: Mon Sep 9 21:35:59 2024 +0100 [VPlan] Consistently use VTC for vector trip count in vplan-printing.ll. 
The inconsistency surfaced in https://github.com/llvm/llvm-project/pull/95305. Split off to reduce the diff. commit 3f22756f391e20040fa3581206b77c409433bd9f Author: Justin Bogner <mail@justinbogner.com> Date: Mon Sep 9 13:21:22 2024 -0700 [DirectX] Lower `@llvm.dx.typedBufferLoad` to DXIL ops The `@llvm.dx.typedBufferLoad` intrinsic is lowered to `@dx.op.bufferLoad`. There's some complexity here in translating to scalarized IR, which I've abstracted out into a function that should be useful for samples, gathers, and CBuffer loads. I've also updated the DXILResources.rst docs to match what I'm doing here and the proposal in llvm/wg-hlsl#59. I've removed the content about stores and raw buffers for now with the expectation that it will be added along with the work. Note that this change includes a bit of a hack in how it deals with `getOverloadKind` for the `dx.ResRet` types - we need to adjust how we deal with operation overloads to generate a table directly rather than proxy through the OverloadKind enum, but that's left for a later change here. Part of #91367 Pull Request: https://github.com/llvm/llvm-project/pull/104252 commit 985600dcd3fcef4095097bea5b556e84c8143a7f Author: Rahul Joshi <rjoshi@nvidia.com> Date: Mon Sep 9 13:09:53 2024 -0700 [TableGen] Migrate CodeGenHWModes to use const RecordKeeper (#107851) Migrate CodeGenHWModes to use const RecordKeeper and const Record pointers.
This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089 commit b3d2d5039b9b8aa10a86c593387f200b15c02aef Author: Alexey Bataev <a.bataev@outlook.com> Date: Mon Sep 9 12:32:45 2024 -0700 [SLP][NFC]Reorder code for better structural complexity, NFC commit e62bf7cd0beb530bc0842bb7aa8ff162607a82b9 Author: Sean Perry <perry@ca.ibm.com> Date: Mon Sep 9 15:24:16 2024 -0400 [z/OS] Set the default arch for z/OS to be arch10 (#89854) The default arch level on z/OS is arch10. Update the code so z/OS has arch10 without changing the default for zLinux. commit 98815f7878c3240e27f516e331255532087f5fcb Author: c8ef <c8ef@outlook.com> Date: Tue Sep 10 03:13:29 2024 +0800 [clang][docs] Add clang-tutor to External Clang Examples (#107665) commit 3681d8552fb9e6cb15e9d45849ff2e34a25c518e Author: Nikita Popov <nikita.ppv@gmail.com> Date: Mon Sep 9 21:10:12 2024 +0200 Revert "[Clang][Sema] Use the correct lookup context when building overloaded 'operator->' in the current instantiation (#104458)" This reverts commit 3cdb30ebbc18fa894d3bd67aebcff76ce7c741ac. Breaks clang bootstrap. 
commit ab82f83dae065a9aa4716618524eddf4aad5fcf0 Author: Mingming Liu <mingmingl@google.com> Date: Mon Sep 9 11:53:07 2024 -0700 [LTO][NFC] Fix forward declaration (#107902) Fix after https://github.com/llvm/llvm-project/pull/107792 commit 6776d65ceaea84fe815845da3c41b2f1621521fb Author: NoumanAmir-10xe <66777536+NoumanAmir657@users.noreply.github.com> Date: Mon Sep 9 23:49:22 2024 +0500 [libc++] Implement LWG3953 (#107535) Closes #105303 commit eec1ee8ef10820c61c03b00b68d242d8c87d478a Author: Abhina Sree <Abhina.Sreeskantharajan@ibm.com> Date: Mon Sep 9 14:37:53 2024 -0400 [SystemZ][z/OS] Enable lit testing for z/OS (#107631) This patch fixes various errors to enable llvm-lit to run on z/OS commit 78c1009c3e54e59b6177deb4d74dd3a3083a3f01 Author: Rahul Joshi <rjoshi@nvidia.com> Date: Mon Sep 9 11:35:13 2024 -0700 [NFC][TableGen] DirectiveEmitter code cleanup (#107775) Eliminate unnecessary llvm:: prefix as this code is in llvm namespace. Use ArrayRef<> instead of std::vector references when appropriate. Use .empty() instead of .size() == 0. commit 99ea357f7b5e7e01e42b8d68dd211dc304b3115b Author: Aiden Grossman <aidengrossman@google.com> Date: Mon Sep 9 11:34:53 2024 -0700 [MLGO] Fix logging verbosity in scripts (#107818) This patch fixes issues related to logging verbosity in the MLGO python scripts. This was an oversight when converting from absl.logging to the python logging API as absl natively supports a --verbosity flag to set the desired logging level. This patch adds a flag to support similar functionality in Python's logging library and additionally updates docstrings where relevant to point to the new values. commit a7c26aaf2eca61cd5d885194872471c63d68f3bc Author: Zequan Wu <zequanwu@google.com> Date: Mon Sep 9 11:34:13 2024 -0700 Revert "[Coverage] Ignore unused functions if the count is 0." 
(#107901) Reverts llvm/llvm-project#107661 Breaks llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp commit 02fff933d0eff71db8ff44f4acf1641bb1ad4d38 Author: Aiden Grossman <aidengrossman@google.com> Date: Mon Sep 9 18:28:23 2024 +0000 [MLGO] Remove unused imports Remove unused imports from python files in the MLGO library. commit 048e46ad53bedef076df868524f0a15eb7cbd38c Author: Brian Cain <bcain@quicinc.com> Date: Mon Sep 9 13:27:13 2024 -0500 [clang, hexagon] Update copyright, license text (#107161) When this file was first contributed - `28b01c59c93d ([hexagon] Add {hvx,}hexagon_{protos,circ_brev...}, 2021-06-30)` - I incorrectly included a QuIC copyright statement with "All rights reserved". I should have contributed this file with the `Apache+LLVM exception` license. commit b1b9b7b853fc4301aedd9ad6b7c22b75f5546b94 Author: Eduard Satdarov <sath@yandex-team.ru> Date: Mon Sep 9 21:17:53 2024 +0300 [libc++] Cache file attributes during directory iteration (#93316) This patch adds caching of file attributes during directory iteration on Windows. This improves the performance when working with files being iterated on in a directory. commit 09b231cb38755e1bd122dbab9c57c4847bf64204 Author: Mingming Liu <mingmingl@google.com> Date: Mon Sep 9 11:16:58 2024 -0700 Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107792) Fix the use-after-free bug and re-apply https://github.com/llvm/llvm-project/pull/106193 * Without the fix, the string referenced by `objSym.Name` could be destroyed even if string saver keeps a copy of the referenced string. This caused use-after-free. * The fix ([latest commit](https://github.com/llvm/llvm-project/pull/107792/commits/9776ed44cfb26172480145aed8f59ba78a6fa2ea)) updates `objSym.Name` to reference (via `StringRef`) the string saver's copy. Test: 1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible with `-DLLVM_USE_SANITIZER=Address` and gone with the fix. 3. 
Run all tests by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes. * Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at `@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined [here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30) * With the fix, the [multi-stage test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh) pass stage2 {asan, ubsan, masan}. This is also the test used by https://lab.llvm.org/buildbot/#/builders/169 **Original commit message** `StringMap<T>` creates a [copy of the string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58) for entry insertions and intentionally keep copies [since the implementation optimizes string memory usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124). On the other hand, linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3]. This change proposes to optimize away string copies inside [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats. The optimization takes place for lld (ELF) only. For the rest of use cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep copies and use global resolution key for de-duplication. Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. 
Thanks to the optimization in https://github.com/llvm/llvm-project/commit/329ba523ccbbe68a12434926c92fd9a86494d958, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage. * Regarding correctness, the set of [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739) [per-module symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191) is a subset of [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120). And bitcode symbol parsing saves symbol name when iterating `obj->symbols` in `BitcodeFile::parse` already. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols. * Presumably the undefined symbols in a LTO unit (copied in this patch in linker unique saver) is a small set compared with the set of symbols in global-resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks. [1] ELF https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748 [2] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863 [3] https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995 commit 277371943fa48f2550df02870951f5e5a77efef5 Author: lntue <35648136+lntue@users.noreply.github.com> Date: Mon Sep 9 14:15:46 2024 -0400 [libc][bazel] Update bazel overlay for math functions and their tests. (#107862) commit 4a501a4556bb191bd6eb5398a7330a28437e5087 Author: Artem Belevich <tra@google.com> Date: Mon Sep 9 11:14:41 2024 -0700 [CUDA/HIP] propagate -cuid to a host-only compilation. 
(#107483) Right now we're bailing out too early, and `-cuid` does not get set for the host-only compilations. commit 6850410562123b6e4fbb039e7ba4a2325b994b84 Author: Zequan Wu <zequanwu@google.com> Date: Mon Sep 9 11:14:21 2024 -0700 [Coverage] Ignore unused functions if the count is 0. (#107661) Relax the condition to ignore the case when count is 0. This fixes a bug on https://github.com/llvm/llvm-project/commit/381e9d2386facea7f2acc0f8c16a6d0731267f80. This was reported at https://discourse.llvm.org/t/coverage-from-multiple-test-executables/81024/. commit 5f74671c85877e03622e8d308aee15ed73ccee7c Author: Tarun Prabhu <tarun@lanl.gov> Date: Mon Sep 9 12:10:16 2024 -0600 [flang][Driver] Support -Xlinker in flang (#107472) Partially addresses: https://github.com/llvm/llvm-project/issues/89888 commit 0f349b7a9cde0080e626f6cfd362885341eb63b4 Author: Sarah Spall <spall@users.noreply.github.com> Date: Mon Sep 9 11:07:20 2024 -0700 [HLSL] Implement support for HLSL intrinsic - select (#107129) Implement support for HLSL intrinsic select. This would close issue #75377 commit 34e3007c69eb91c16f23f20548305a2fb8feb75e Author: Kazu Hirata <kazu@google.com> Date: Mon Sep 9 10:51:52 2024 -0700 [ARM] Fix a warning This patch fixes: llvm/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h:214:5: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] commit 6cc0138ca3dbdb21f4c4a5fa39cf05c38da4bb75 Author: Chris B <chris.bieneman@me.com> Date: Mon Sep 9 12:34:50 2024 -0500 Fix implicit conversion rank ordering (#106811) DXC prefers dimension-preserving conversions over precision-losing conversions. This means a double4 -> float4 conversion is preferred over a double4 -> double3 or double4 -> double conversion. 
commit cd8229bb4bfa4de45528ce101d9dceb9be8bff9e Author: Valentin Clement (バレンタインクレメン) <clementval@gmail.com> Date: Mon Sep 9 10:32:35 2024 -0700 [flang][cuda] Support c_devptr in c_f_pointer intrinsic (#107470) This is an extension of CUDA Fortran. The iso_c_binding intrinsic can accept a `TYPE(c_devptr)` as its first argument. This patch relax the semantic check to accept it and update the lowering to unwrap the cptr field from the c_devptr. commit 7543d09b852695187d08aa5d56d50016fea8f706 Author: Andrew Ng <andrew.ng@sony.com> Date: Mon Sep 9 18:18:41 2024 +0100 [llvm-ml] Fix RIP-relative addressing for ptr operands (#107618) Fixes #54773 commit 7f90479b2300b3758fd90015a2e6e7e94cfcf1e7 Author: Leandro Lupori <leandro.lupori@linaro.org> Date: Mon Sep 9 14:09:45 2024 -0300 [flang][OpenMP] Don't abort when default is used on an invalid directive (#107586) The previous assert was not considering programs with semantic errors. Fixes https://github.com/llvm/llvm-project/issues/107495 Fixes https://github.com/llvm/llvm-project/issues/93437 commit 95831f012d76558fe78f5f3e71b1003a773384e5 Author: David Green <david.green@arm.com> Date: Mon Sep 9 18:04:38 2024 +0100 [ARM] Add a default unreachable case to AddrModeToString. NFC Fixes #107739 commit c36c462cc719d47aa2408bca91a028300b2be6d4 Author: Kazu Hirata <kazu@google.com> Date: Mon Sep 9 09:44:37 2024 -0700 [LTO] Simplify calculateCallGraphRoot (NFC) (#107765) The function returns an instance of FunctionSummary populated by calculateCallGraphRoot regardless of whether Edges is empty or not. commit 7d371725cdf993d16f6debf74cf740c3aea84f9b Author: Mingming Liu <mingmingl@google.com> Date: Mon Sep 9 09:43:47 2024 -0700 [NFCI][BitcodeReader]Read real GUID from VI as opposed to storing it in map (#107735) Currently, `ValueIdToValueInfoMap` [1] stores `std::tuple<ValueInfo, GlobalValue::GUID /* original GUID */, GlobalValue::GUID /* real GUID*/ >`. 
This change updates the stored value type to `std::pair<ValueInfo, GlobalValue::GUID /* original GUID */>`, and reads real GUID from ValueInfo. When an entry is inserted into `ValueIdToValueInfoMap`, ValueInfo is created or inserted using real GUID [2]. ValueInfo keeps a pointer to GlobalValueMap [3], using either `GUID` or `{GUID, Name}` [4] when reading per-module summaries to create a combined summary. [1] owned by per module-summary bitcode reader https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L947-L950 [2] [first](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7130-L7133), [second](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7221-L7222), [third](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7622-L7623) [3] https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1427-L1431 [4] https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1631 and https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1621 --------- Co-authored-by: Kazu Hirata <kazu@google.com> commit 60f052edc66a5b5b346635656f231930c436a008 Author: Petr Hosek <phosek@google.com> Date: Mon Sep 9 09:43:02 2024 -0700 [CMake] Passthrough variables for packages to subbuilds (#107611) These packaged are imported by LLVMConfig.cmake and so we should be passing through the necessary variables from the parent build into the subbuilds. We use `CMAKE_CACHE_DEFAULT_ARGS` so subbuilds can override these variables if needed. 
commit 5c8fd1eece8fff69871cef57a2363dc0f734a7d1 Author: Sam Clegg <sbc@chromium.org> Date: Mon Sep 9 09:28:08 2024 -0700 [lld][WebAssembly] Fix use of uninitialized stack data with --wasm64 (#107780) In the case of `--wasm64` we were setting the type of the init expression to be 64-bit but were only setting the low 32-bits of the value (by assigning to Int32). Fixes: https://github.com/emscripten-core/emscripten/issues/22538 commit 95753ffa49f57c284a4682a8ca03e05d59f2c112 Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com> Date: Mon Sep 9 16:13:05 2024 +0000 [gn build] Port ea2da571c761 commit db6051dae085c35020c1273ae8d38508c9958bc7 Author: Pavel Skripkin <paskripkin@gmail.com> Date: Mon Sep 9 19:12:38 2024 +0300 [analyzer] fix crash on binding to symbolic region with `void *` type (#107572) As reported in https://github.com/llvm/llvm-project/pull/103714#issuecomment-2295769193. CSA crashes on trying to bind value to symbolic region with `void *`. This happens when such region gets passed as inline asm input and engine tries to bind `UnknownVal` to that region. Fix it by changing type from void to char before calling `GetElementZeroRegion` commit 3cdb30ebbc18fa894d3bd67aebcff76ce7c741ac Author: Krystian Stasiowski <sdkrystian@gmail.com> Date: Mon Sep 9 12:06:45 2024 -0400 [Clang][Sema] Use the correct lookup context when building overloaded 'operator->' in the current instantiation (#104458) Currently, clang erroneously rejects the following: ``` struct A { template<typename T> void f(); }; template<typename T> struct B { void g() { (*this)->template f<int>(); // error: no member named 'f' in 'B<T>' } A* operator->(); }; ``` This happens because `Sema::ActOnStartCXXMemberReference` does not adjust the `ObjectType` parameter when `ObjectType` is a dependent type (except when the type is a `PointerType` and the class member access is the `->` form). 
Since the (possibly adjusted) `ObjectType` parameter (`B<T>` in the above example) is passed to `Parser::ParseOptionalCXXScopeSpecifier`, we end up looking up `f` in `B` rather than `A`. This patch fixes the issue by identifying cases where the type of the object expression `T` is a dependent, non-pointer type and: - `T` is the current instantiation and lookup for `operator->` finds a member of the current instantiation, or - `T` has at least one dependent base case, and `operator->` is not found in the current instantiation and using `ASTContext::DependentTy` as the type of the object expression when the optional _nested-name-specifier_ is parsed. Fixes #104268. commit eba6160deec5a32e4b31c2a446172d0e388195c9 Author: Tarun Prabhu <tarun@lanl.gov> Date: Mon Sep 9 09:57:49 2024 -0600 [flang][Driver] Support --no-warnings option (#107455) Because of the way visibility is implemented in Options.td, options that are aliases do not inherit the visibility of the option being aliased. Therefore, explicitly set the visibility of the alias to be the same as the aliased option. This partially addresses https://github.com/llvm/llvm-project/issues/89888 commit 914ab366c24cf494a798ce3a178686456731861a Author: sstipanovic <146831748+sstipanovic@users.noreply.github.com> Date: Mon Sep 9 17:54:30 2024 +0200 [AMDGPU] Overload image atomic swap to allow float as well. (#107283) LLPC can generate llvm.amdgcn.image.atomic.swap intrinsic with data argument as float type as well as float return type. This went unnoticed until CreateIntrinsic with implicit mangling was used. 
commit ea2da571c761066542f8d2273933d2523279e631 Author: Tyler Nowicki <tyler.nowicki@amd.com> Date: Mon Sep 9 11:50:27 2024 -0400 [Coroutines] Move the SuspendCrossingInfo analysis helper into its own header/source (#106306) * Move the SuspendCrossingInfo analysis helper into its own header/source See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057 Co-authored-by: tnowicki <tnowicki.nowicki@amd.com> commit 1651014960b90bd1398f61bec0866d4a187910ef Author: Rahul Joshi <rjoshi@nvidia.com> Date: Mon Sep 9 08:47:42 2024 -0700 [TableGen] Change SetTheory set/vec to use const Record * (#107692) Change SetTheory::RecSet/RecVec to use const Record pointers. commit e46f03bc31a61a903416f1d3c68063ab75aebe6e Author: Teresa Johnson <tejohnson@google.com> Date: Mon Sep 9 08:17:41 2024 -0700 [MemProf] Remove unnecessary data structure (NFC) (#107643) Recent change #106623 added the CallToFunc map, but I subsequently realized the same information is already available for the calls being examined in the StackIdToMatchingCalls map we're iterating through. commit 86e5c5468ae3fcd65b23fd7b3cb0182e676829bd Author: Nicolas van Kempen <nvankemp@gmail.com> Date: Mon Sep 9 11:15:28 2024 -0400 [clang-tidy][run-clang-tidy] Fix minor shutdown noise (#105724) On my new machine, the script outputs some shutdown noise: ``` Ctrl-C detected, goodbye. 
Traceback (most recent call last): File "/home/nvankempen/llvm-project/./clang-tools-extra/clang-tidy/tool/run-clang-tidy.py", line 626, in <module> asyncio.run(main()) File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run return loop.run_until_complete(main) File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete self.run_forever() File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever self._run_once() File "/usr/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once event_list = self._selector.select(timeout) File "/usr/lib/python3.10/selectors.py", line 469, in select fd_event_list = self._selector.poll(timeout, max_ev) KeyboardInterrupt ``` This fixes it. Also remove an unused typing import. Relevant documentation: https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption commit 763bc9249cf0b7da421182e24716d9a569fb5184 Author: Jakub Kuderski <jakub@nod-labs.com> Date: Mon Sep 9 11:12:26 2024 -0400 [mlir][amdgpu] Align Chipset with TargetParser (#107720) Update the Chipset struct to follow the `IsaVersion` definition from llvm's `TargetParser`. This is a follow up to https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012. * Add the stepping version. Note: This may break downstream code that compares against the minor version directly. * Use comparisons with full Chipset version where possible. Note that we can't use the code in `TargetParser` directly because the chipset utility is outside of `mlir/Target` that re-exports llvm's target library. commit 6cc3bf7d1d343f910b40cee24d4cda873a6ddd55 Author: Quinn Dawkins <quinn.dawkins@gmail.com> Date: Mon Sep 9 11:05:37 2024 -0400 [mlir][tensor] Add canonicalization to fold consecutive tensor.pad ops (#107302) `tensor.pad(tensor.pad)` with the same constant padding value can be combined into a single pad that pads to the sum of the high and low padding amounts. 
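The `tensor.pad(tensor.pad)` fold described in the commit above can be checked on a small model. This is a minimal sketch, assuming a 1-D integer vector stands in for a tensor; `pad` and `padTwiceFolded` are hypothetical helpers for illustration and are not MLIR APIs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-in for tensor.pad: prepend `low` and append `high`
// copies of `value` to `input`.
std::vector<int> pad(const std::vector<int> &input, std::size_t low,
                     std::size_t high, int value) {
  std::vector<int> result;
  result.reserve(low + input.size() + high);
  result.insert(result.end(), low, value);
  result.insert(result.end(), input.begin(), input.end());
  result.insert(result.end(), high, value);
  return result;
}

// Folded form: a single pad whose low/high amounts are the sums of the inner
// and outer amounts. This is only equivalent because both pads use the same
// constant `value`.
std::vector<int> padTwiceFolded(const std::vector<int> &input,
                                std::size_t innerLow, std::size_t innerHigh,
                                std::size_t outerLow, std::size_t outerHigh,
                                int value) {
  return pad(input, innerLow + outerLow, innerHigh + outerHigh, value);
}
```

With differing padding values the inner padding would be visible in the result, so the two pads could not be merged this way.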
commit ea9204505cf1099b98b1fdcb898f0bd35e463984 Author: Lei Huang <lei@ca.ibm.com> Date: Mon Sep 9 11:01:22 2024 -0400 Fix codegen for transparent_union function params (#104816) Update codegen for func param with transparent_union attr to be that of the first union member. This is a followup to #101738 to fix non-ppc codegen and closes #76773. commit 6634d44e5e6079e19efe54c2de35e2e63108b085 Author: Amy Wang <kai.ting.wang@huawei.com> Date: Mon Sep 9 10:57:13 2024 -0400 [MLIR][Transform] Allow stateInitializer and stateExporter for applyTransforms (#101186) This is discussed in RFC: https://discourse.llvm.org/t/rfc-making-the-constructor-of-the-transformstate-class-protected/80377 commit 111932d5cae0199d9c59669b37232a011f8b8757 Author: Luke Lau <luke@igalia.com> Date: Mon Sep 9 22:45:44 2024 +0800 [RISCV] Fix same mask vmerge peephole discarding false operand (#107827) This fixes the issue raised in https://github.com/llvm/llvm-project/pull/106108#discussion_r1749677510 True's passthru needs to be equivalent to vmerge's false, but we also allow true's passthru to be undef. However if it's undef then we need to replace it with false, otherwise we end up discarding the false operand entirely. The changes in fixed-vectors-strided-load-store-asm.ll undo the changes in #106108 where we introduced this miscompile. commit 2d338bed00b2bba713bceb4915400063b95929b2 Author: Tobias Stadler <mail@stadler-tobias.de> Date: Mon Sep 9 16:30:44 2024 +0200 [CodeGen] Refactor DeadMIElim isDead and GISel isTriviallyDead (#105956) Merge GlobalISel's isTriviallyDead and DeadMachineInstructionElim's isDead code and remove all unnecessary checks from the hot path by looping over the operands before doing any other checks. See #105950 for why DeadMIElim needs to remove LIFETIME markers even though they probably shouldn't generally be considered dead. 
x86 CTMark O3: -0.1% AArch64 GlobalISel CTMark O0: -0.6%, O2: -0.2% commit a2f659c1349cb70c09b183eb214e2a24cf04c2c6 Author: Kazu Hirata <kazu@google.com> Date: Mon Sep 9 07:15:12 2024 -0700 [StructurizeCFG] Avoid repeated hash lookups (NFC) (#107797) commit ab95ed5ce0b099913eb5c9b03fef7f322c24acd2 Author: Kazu Hirata <kazu@google.com> Date: Mon Sep 9 07:14:40 2024 -0700 [IPO] Avoid repeated hash lookups (NFC) (#107796) commit 3940a1ba1454afec916be86385bb2031526e3e13 Author: Kazu Hirata <kazu@google.com> Date: Mon Sep 9 07:13:52 2024 -0700 [Float2Int] Avoid repeated hash lookups (NFC) (#107795) commit 563dc226fe17f7638d02a957d1b2870dfa968f01 Author: Kazu Hirata <kazu@google.com> Date: Mon Sep 9 07:13:27 2024 -0700 [Analysis] Avoid repeated hash lookups (NFC) (#107794) commit 620b8d994b8abdcf31271d9f4db7e7422fc9bd65 Author: Samuel Thibault <samuel.thibault@ens-lyon.org> Date: Mon Sep 9 15:53:33 2024 +0200 [hurd] Fix accessing f_type field of statvfs (#71851) f4719c4d2cda ("Add support for GNU Hurd in Path.inc and other places") made llvm use an internal __f_type name for the f_type field (which it is not supposed to since accessing double-underscore names is explicitly not supported by standards). In glibc 2.39 this field was renamed to f_type so application can now access the field as the standard says. commit eaac4a26136ca8e3633bf91795343cd060d7af87 Author: Pierre van Houtryve <pierre.vanhoutryve@amd.com> Date: Mon Sep 9 15:35:28 2024 +0200 [AMDGPU] Document & Finalize GFX12 Memory Model (#98599) Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation. commit 1a5a1e97817c9a3db4d1f9795789c99790cf88e2 Author: Florian Hahn <flo@fhahn.com> Date: Mon Sep 9 14:26:08 2024 +0100 [VPlan] Assert that VFxUF is always used. Add assertion to ensure invariant discussed in https://github.com/llvm/llvm-project/pull/95305. 
commit 1f2a634c44dedef11f590956f297b2c7a1659fcf Author: Sergey Kachkov <sergey.kachkov@syntacore.com> Date: Wed Sep 4 17:42:03 2024 +0300 Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice. 
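The accumulate-then-process-once idea can be sketched outside LLVM. This is a minimal illustration, assuming a small insertion-ordered set in place of `llvm::SetVector`; `OrderedSet` and `formExitPhis` are hypothetical names, and `formExitPhis` only models the effect of one exit phi per processed instruction rather than the real `formLCSSAForInstructions`.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set, used here as a stand-in for llvm::SetVector.
class OrderedSet {
public:
  // Returns true if `item` was newly inserted, false if it was already present.
  bool insert(const std::string &item) {
    if (!seen.insert(item).second)
      return false;
    order.push_back(item);
    return true;
  }

  const std::vector<std::string> &items() const { return order; }

private:
  std::unordered_set<std::string> seen;
  std::vector<std::string> order;
};

// Hypothetical model of the LCSSA update: one exit phi name is produced for
// each instruction it is asked to process.
std::vector<std::string> formExitPhis(const std::vector<std::string> &insts) {
  std::vector<std::string> phis;
  for (const std::string &inst : insts)
    phis.push_back(inst + ".lcssa");
  return phis;
}
```

Inserting `%lsr.iv.next` twice into the set and calling `formExitPhis` once yields a single `%lsr.iv.next.lcssa` entry, whereas calling it once per insertion would produce the duplicate seen in the IR dump.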
Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are moved into separate duplicated-phis.ll test with explicit x86 target requirement to fix bots which are not building this target. commit 17f0c5dfaab8bc72e19cb68e73b0944e5ee27b88 Author: Sergey Kachkov <sergey.kachkov@syntacore.com> Date: Fri Aug 30 16:00:42 2024 +0300 [LSR][NFC] Add pre-commit test commit aa158bf40285925d3c019d9e697cd2c88421297a Author: Florian Hahn <flo@fhahn.com> Date: Mon Sep 9 14:10:12 2024 +0100 [LV] Update tests to replace some code with loop varying instructions. Update some tests with loop-invariant instructions, where hoisting them out of the loop changes the vectorization decision. This should preserve their original spirit when making further improvements. commit e25eb1433110d94d16fd69e5aca9bdf72259263d Author: Florian Hahn <flo@fhahn.com> Date: Mon Sep 9 13:05:54 2024 +0100 [ConstraintElim] Add tests for loops with chained header conditions. commit 1199e5b9ce5a001445463ba8da1f70fa4558fbcc Author: Nikita Popov <npopov@redhat.com> Date: Mon Sep 9 12:45:48 2024 +0200 [MemCpyOpt] Add more tests for memcpy passed to readonly arg (NFC) commit cf8fb4320f1be29c55909adf5ff8ad47e02b2dbe Author: Momchil Velikov <momchil.velikov@arm.com> Date: Mon Sep 9 13:34:41 2024 +0100 [AArch64] Implement NEON vamin/vamax intrinsics (#99041) This patch implements the intrinsics of the form floatNxM_t vamin[q]_fN(floatNxM_t vn, floatNxM_t vm); floatNxM_t vamax[q]_fN(floatNxM_t vn, floatNxM_t vm); as defined in https://github.com/ARM-software/acle/pull/324 --------- Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com> commit 32cef07885e112d05bc2b1c285f40e353d80e18f Author: Rahul Joshi <rjoshi@nvidia.com> Date: Mon Sep 9 05:27:38 2024 -0700 [LLDB][TableGen] Migrate lldb-tblgen to use const RecordKeeper (#107536) Migrate LLDB TableGen backend to use const RecordKeeper. 
This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089 commit cca54e347ac34912cdfb9983533c61836db135e0 Author: Martin Storsjö <martin@martin.st> Date: Mon Sep 9 15:08:19 2024 +0300 Revert "Reapply "[Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer" (#97308)" This reverts commit 45c8766973bb3bb73dd8d996231e114dcf45df9f and 049512e39d96995cb373a76cf2d009a86eaf3aab. This change triggers failed asserts on inputs like this: struct a { } constexpr b; class c { public: c(a); }; class B { public: using d = int; struct e { enum { f } g; int h; c i; d j{}; }; }; B::e k{B::e::f, int(), b}; Compiled like this: clang -target x86_64-linux-gnu -c repro.cpp clang: ../../clang/lib/CodeGen/CGExpr.cpp:3105: clang::CodeGen::LValue clang::CodeGen::CodeGenFunction::EmitDeclRefLValue(const clang::DeclRefExpr*): Assertion `(ND->isUsed(false) || !isa<VarDecl>(ND) || E->isNonOdrUse() || !E->getLocation().isValid()) && "Should not use decl without marking it used!"' failed. commit 7a930ce327fdbc5c77b50ee6304645084100c037 Author: Jeremy Morse <jeremy.morse@sony.com> Date: Mon Sep 9 12:54:45 2024 +0100 [DWARF] Emit a minimal line-table for totally empty functions (#107267) In degenerate but legal inputs, we can have functions that have no source locations at all -- all the DebugLocs attached to instructions are empty. LLVM didn't produce any source location for the function; with this patch it will at least emit the function-scope source location. Demonstrated by empty-line-info.ll The XCOFF test modified has similar symptoms -- with this patch, the size of the ".dwline" section grows a bit, thus shifting some of the file internal offsets, which I've updated. 
commit 959d84044a70da08923fe221f999f4e406094ee9
Author: pvanhout <pierre.vanhoutryve@amd.com>
Date: Mon Sep 9 13:50:48 2024 +0200

[AMDGPU] Remove unused SplitGraph::Node::getFullCost

commit b8b8fbe19dea2825b801c4738ff78dbf26aae430
Author: Rahul Joshi <rjoshi@nvidia.com>
Date: Mon Sep 9 04:18:55 2024 -0700

[NFC][TableGen] Migrate LLVM Attribute Emitter to const RecordKeeper (#107698)

Migrate LLVM Attribute Emitter to const RecordKeeper.

commit d84d9559bdc7aeb4ce14c251f6a3490c66db8d3a
Author: Nicolas van Kempen <nvankemp@gmail.com>
Date: Mon Sep 9 07:12:46 2024 -0400

[clang][analyzer] Fix #embed crash (#107764)

Fix #107724.

commit 09c00b6f0463f6936be5d2100f9d47c0077700f8
Author: Benjamin Kramer <benny.kra@googlemail.com>
Date: Mon Sep 9 13:03:38 2024 +0200

[bazel] Add missing dependencies for 345cc47ba7a28811ae4ec7d113059ccb39c500a3

commit 049512e39d96995cb373a76cf2d009a86eaf3aab
Author: yronglin <yronglin777@gmail.com>
Date: Mon Sep 9 19:01:11 2024 +0800

[NFC][clang] Fix clang version in the test for the implementation of cwg1815 (#107838)

This PR fix the clang version in https://github.com/llvm/llvm-project/pull/97308 .

Signed-off-by: yronglin <yronglin777@gmail.com>

commit 345cc47ba7a28811ae4ec7d113059ccb39c500a3
Author: Daniil Fukalov <dfukalov@gmail.com>
Date: Mon Sep 9 12:44:03 2024 +0200

[NFC] Add explicit #include llvm-config.h where its macros are used, lldb part. (#107603)

(this is lldb part)

Without these explicit includes, removing other headers, who implicitly include llvm-config.h, may have non-trivial side effects. For example, `clangd` may report even `llvm-config.h` as "no used" in case it defines a macro, that is explicitly used with #ifdef. It is actually amplified with different build configs which use different set of macros.
commit dbd81ba2e85c2f244f22c983d96a106eae65c06a
Author: Mikhail Goncharov <goncharov.mikhail@gmail.com>
Date: Mon Sep 9 11:47:47 2024 +0200

complete rename of __orc_rt namespace

for 3e04ad428313dde40c779af6d675b162e150125e

it's bizarre that none of the buildbots were broken, only bazel build https://buildkite.com/llvm-project/upstream-bazel/builds/109623#0191d5d0-2b3e-4ee7-b8dd-1e2580977e9b

commit 663e9cec9c96169aa4e72ab9b6bf08b2d6603093
Author: Artem Kroviakov <71938912+akroviakov@users.noreply.github.com>
Date: Mon Sep 9 11:49:16 2024 +0200

[Func][GPU] Use SymbolUserOpInterface in func::ConstantOp (#107748)

This PR enables `func::ConstantOp` creation and usage for device functions inside GPU modules. The current main returns error for referencing device functions via `func::ConstantOp`, because during the `ConstantOp` verification it only checks symbols in `ModuleOp` symbol table, which, of course, does not contain device functions that are defined in `GPUModuleOp`. This PR proposes a more general solution.

Co-authored-by: Artem Kroviakov <artem.kroviakov@tum.de>

commit aa21ce4a792c170074193c32e8ba8dd35e57c628
Author: Jonas Rickert <Jonas.Rickert@amd.com>
Date: Mon Sep 9 11:48:13 2024 +0200

[mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations (#105529)

This changes the function `resetToken` to not update `lastToken`. The member `lastToken` is the last token that was consumed by the parser. Resetting the lexer position to a different position does not cause any token to be consumed, so `lastToken` should not be updated. Setting it to `curToken` can cause the scopeLoc.end location of `OperationDefinition` to be off-by-one, pointing to the first token after the operation.
An example for an operation for which the scopeLoc.end location was wrong before is:

```
%0 = torch.vtensor.literal(dense_resource<__elided__> : tensor<768xbf16>) : !torch.vtensor<[768],bf16>
```

Here the scope end loc always pointed to the next token.

This also adds a test for the Locations of `OperationDefinitions`. Without the change to `resetToken` the test fails, with the scope end location for `llvm.mlir.undef` pointing to the `func.return` in the next line.

commit b98aa6fb1d5f5fa904ce6d789a8fa4a245a90ee6
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Mon Sep 9 10:29:04 2024 +0100

[X86] LowerABD - lower i8/i16 cases directly to CMOV(SUB(X,Y),SUB(Y,X)) pattern

Better codegen (shorter dependency chain for better ILP) than via the TRUNC(ABS(SUB(EXT(LHS),EXT(RHS)))) expansion

commit d57be195e37f9c11a26e8e3fe8da5ef62bb921af
Author: Lukacma <Marian.Lukac@arm.com>
Date: Mon Sep 9 10:28:01 2024 +0100

[AArch64] replace SVE intrinsics with no active lanes with zero (#107413)

This patch extends https://github.com/llvm/llvm-project/pull/73964 and optimises SVE intrinsics into zero constants when predicate is zero.

commit 476b1a661f6846537d232e9a3bc5a68c5f15efb3
Author: Jerry-Ge <jerry.ge@arm.com>
Date: Mon Sep 9 02:26:39 2024 -0700

[TOSA] Update input name for Sin and Cos operators (#107606)

Update the dialect input names from input to input1 for Sin/Cos for consistency.

Signed-off-by: Jerry Ge <jerry.ge@arm.com>

commit da11ede57d034767a6f5d5e211c06c1c3089d7fd
Author: vabridgers <58314289+vabridgers@users.noreply.github.com>
Date: Mon Sep 9 03:47:39 2024 -0500

[analyzer] Remove overzealous "No dispatcher registered" assertion (#107294)

Random testing revealed it's possible to crash the analyzer with the command line invocation:

    clang -cc1 -analyze -analyzer-checker=nullability empty.c

where the source file, empty.c is an empty source file.
```
clang: <root>/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp:56: void clang::ento::CheckerManager::finishedCheckerRegistration(): Assertion `Event.second.HasDispatcher && "No dispatcher registered for an event"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/
Stack dump:
0. Program arguments: clang -cc1 -analyze -analyzer-checker=nullability nullability-nocrash.c
...
clang::AnalyzerOptions&, clang::Preprocessor const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, llvm::ArrayRef<std::function<void (clang::ento::CheckerRegistry&)>>)
```

This commit removes the assertion which failed here, because it was logically incorrect: it required that if an Event is handled by some (enabled) checker, then there must be an **enabled** checker which can emit that kind of Event. It should be OK to disable the event-producing checkers but enable an event-consuming checker which has different responsibilities in addition to handling the events.

Note that this assertion was in an `#ifndef NDEBUG` block, so this change does not impact the non-debug builds.

Co-authored-by: Vince Bridgers <vince.a.bridgers@ericsson.com>

commit 04742f34b343af87dda93edacbb06f6e98a1d80f
Author: Nikita Popov <npopov@redhat.com>
Date: Mon Sep 9 10:24:54 2024 +0200

[SCCP] Add test for nonnull argument inference (NFC)

commit 3b1146e050657f40954e8e1f977837f884df2488
Author: Aiden Grossman <aidengrossman@google.com>
Date: Mon Sep 9 01:27:22 2024 -0700

[llvm-exegesis] Use MCRegister instead of unsigned to hold registers (#107820)

commit 74ad2540523ec78122ba5a32e35e0b65ee27b7b3
Author: Aiden Grossman <aidengrossman@google.com>
Date: Mon Sep 9 08:10:11 2024 +0000

[Github][MLGO] Fix mlgo-utils path in new-prs-labeler

This patch (hopefully) fixes the mlgo-utils path in new-prs-labeler so that it actually matches all files in that directory.
Currently it is not catching the files as they are relatively deeply nested within the folder.

commit 3e04ad428313dde40c779af6d675b162e150125e
Author: Lang Hames <lhames@gmail.com>
Date: Mon Sep 9 17:59:47 2024 +1000

[ORC-RT] Remove double underscore from the orc_rt namespace.

We should use `orc_rt` as the public C++ API namespace for the ORC runtime and control symbol visibility to hide implementation details, rather than rely on the '__' prefix.

commit d5f6f30664ed53ef27d949fad0ce3994ea9988dd
Author: Aiden Grossman <aidengrossman@google.com>
Date: Mon Sep 9 07:49:54 2024 +0000

[MLGO] Add spaces at the end of lines in multiline string

This patch adds spaces at the end of lines in multiline strings in the extract_ir script. Without this patch, the warning/info messages will be printed without spaces between words when there is a line break in the source which looks/reads weird.

commit 8549b324bc1f450f4477f46f18db67439dbf6d75
Author: Younan Zhang <zyn7109@gmail.com>
Date: Mon Sep 9 15:09:43 2024 +0800

[Clang] Don't assert non-empty packs for FunctionParmPackExprs (#107561)

`FunctionParmPackExpr`s are peculiar in that they have to be of unexpanded dependency while they don't introduce any unexpanded packs. So this patch rules them out in the non-empty pack assertion in `DiagnoseUnexpandedParameterPack()`.

There was a fix #69224, but that turned out to be insufficient.

I also moved the separate tests to a pre-existing file.

Fixes https://github.com/llvm/llvm-project/issues/86361

commit 022b3c27e27832f27c61683095899227c26e0cca
Author: Piyou Chen <piyou.chen@sifive.com>
Date: Mon Sep 9 15:07:39 2024 +0800

[Clang][RISCV] Recognize unsupport target feature by supporting isValidFeatureName (#106495)

This patch makes unsupported target attributes emit a warning and ignore the target attribute during semantic checks. The changes include:

1. Adding the RISCVTargetInfo::isValidFeatureName function.
2. Rejecting non-full-arch strings in the handleFullArchString function.
3. Adding test cases to demonstrate the warning behavior.

commit 9347b66cfcd9acf84dbbd500b6344041c587f6a9
Author: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
Date: Mon Sep 9 09:06:34 2024 +0200

Reland "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)" (#107076)

Relands #104763 with
- Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator failing if the input is shuffled first)
- Fix for broken proposal selection
- c3cb27370af40e491446164840766478d3258429 included

Original commit description below

---

Major rewrite of the AMDGPUSplitModule pass in order to better support it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and messy to be of any real use. It was also a real pain to use and made the code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support
  - The graph represents the module accurately, with bidirectional, typed edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a set of nodes as a BitVector. This makes comparing 2 sets of nodes to find common dependencies a trivial task. Merging two clusters of nodes together is also really trivial.
- No more defaulting to "P0" for external calls
  - Roots that can reach non-copyable dependencies (such as external calls) are now grouped together in a single "cluster" that can go into any partition.
- No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded and compared.
- Graph-search algorithm that can explore multiple branches/assignments for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 propositions to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth.
That greedy approach uses almost identical heuristics to the previous version of the pass.

All of this gives us a lot of room to experiment with new heuristics or even entirely different splitting strategies if we need to. For instance, the graph representation has room for abstract nodes, e.g. if we need to represent some global variables or external constraints. We could also introduce more edge types to model other type of relations between nodes, etc.

I also designed the graph representation & the splitting strategies to be as fast as possible, and it seems to have paid off. Some quick tests showed that we spend pretty much all of our time in the CloneModule function, with the actual splitting logic being >1% of the runtime.

commit bdcbfa7fb4ac6f23262095c401d28309d689225e
Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Date: Mon Sep 9 06:28:13 2024 +0000

[gn build] Port a416267a5f3f

commit a416267a5f3fffb3d1e9d8d53245aef8169c5ddb
Author: Yuxuan Chen <ych@fb.com>
Date: Sun Sep 8 23:09:40 2024 -0700

[LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (#99285)

This patch is episode three of the middle end implementation for the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

After we attribute the calls to some coroutines as "coro_elide_safe" in the C++ FE and creating a `noalloc` ramp function, we use a new middle end pass to move the call to coroutines to the noalloc variant.

This pass should be run after CoroSplit. For each node we process in CoroSplit, we look for its callers and replace the attributed ones in presplit coroutines to the noalloc one. The transformed `noalloc` ramp function will also require a frame pointer to a block of memory it can use as an activation frame. We allocate this on the caller's frame with an alloca.
Please note that we cannot safely transform such attributed calls in post-split coroutines due to memory lifetime reasons. The CoroSplit pass is responsible for creating the coroutine frame spills for all the allocas in the coroutine. Therefore it will be unsafe to create new allocas like this one in post-split coroutines. This happens relatively rarely because CGSCC performs the passes on the callees before the caller. However, if multiple coroutines coexist in one SCC, this situation does happen (and prevents us from having potentially unbound frame size due to recursion.)

You can find episode 1: Clang FE of this patch series at https://github.com/llvm/llvm-project/pull/99282
Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283

commit 234cc81625030e934651d6ae0ace66e37138ba4a
Author: Yuxuan Chen <ych@fb.com>
Date: Sun Sep 8 23:09:20 2024 -0700

[LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (#99283)

This patch is episode two of the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

Previously CoroElide depends on inlining, and its analysis does not work very well with code generated by the C++ frontend due the existence of many customization points. There has been issue reported to upstream how ineffective the original CoroElide was in real world applications.

For C++ users, this set of patches aim to fix this problem by providing library authors and users deterministic HALO behaviour for some well-behaved coroutine `Task` types. The stack begins with a library side attribute on the `Task` class that guarantees no unstructured concurrency when coroutines are awaited directly with `co_await`ed as a prvalue. This attribute on Task types gives us lifetime guarantees and makes C++ FE capable to telling the ME which coroutine calls are elidable.
We convey such information from FE through the attribute `coro_elide_safe`.

This patch modifies CoroSplit to create a variant of the coroutine ramp function that
1) does not use heap allocated frame, instead take an additional parameter as the pointer to the frame. Such parameter is attributed with `dereferenceable` and `align` to convey size and align requirements for the frame.
2) always stores cleanup instead of destroy address for `coro.destroy()` actions.

In a later patch, we will have a new pass that runs right after CoroSplit to find usages of the callee coroutine attributed `coro_elide_safe` in presplit coroutine callers, allocates the frame on its "stack", transform those usages to call the `noalloc` ramp function variant.

(note I put quotes on the word "stack" here, because for presplit coroutine, any alloca will be spilled into the frame when it's being split)

The C++ Frontend attribute implementation that works with this change can be found at https://github.com/llvm/llvm-project/pull/99282
The pass that makes use of the new `noalloc` split can be found at https://github.com/llvm/llvm-project/pull/99285

commit e17a39bc314f97231e440c9e68d9f46a9c07af6d
Author: Yuxuan Chen <ych@fb.com>
Date: Sun Sep 8 23:08:58 2024 -0700

[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282)

This patch is the frontend implementation of the coroutine elide improvement project detailed in this discourse post: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute `[[clang::coro_await_elidable]]`. This notion of await elidable task gives developers and library authors a certainty that coroutine heap elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen.
Take this as an example:

```
Task foo();
Task bar() {
  co_await foo();
}
```

For CoroElide to happen, the ramp function of `foo` must be inlined into `bar`. This inlining happens after `foo` has been split but `bar` is usually still a presplit coroutine. If `foo` is indeed a coroutine, the inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide then runs an analysis to figure out whether the SSA value of `coro.begin()` of `foo` gets destroyed before `bar` terminates.

`Task` types are rarely simple enough for the destroy logic of the task to reference the SSA value from `coro.begin()` directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us the…

lukel97 mentioned this pull request Aug 26, 2024

[RISCV] Preserve tail agnostic policy in foldVMV_V_V #105788

Merged

lukel97 added 2 commits September 5, 2024 23:56

Precommit tests

b45b366

lukel97 force-pushed the vector-peephole-vmerge-to-vmv.v.v-same-mask branch from b11c15a to 34c2cf0 Compare September 5, 2024 16:30

lukel97 changed the title ~~[RISCV] Convert vmerge.vvm with same mask as true to vmv.v.v~~ [RISCV] Move vmerge same mask peephole to RISCVVectorPeephole Sep 5, 2024

lukel97 requested review from preames, wangpc-pp and topperc September 5, 2024 16:31

lukel97 marked this pull request as ready for review September 5, 2024 16:32

llvmbot added the backend:RISC-V label Sep 5, 2024

topperc reviewed Sep 5, 2024

Review comment on llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (outdated, resolved)

Review comment on llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (outdated, resolved)

lukel97 added 2 commits September 6, 2024 07:50

Remove dead break

f1b20d5

Assert v0defs non null

22684ed

topperc approved these changes Sep 6, 2024

lukel97 merged commit 2949720 into llvm:main Sep 6, 2024
6 of 7 checks passed

lukel97 mentioned this pull request Sep 6, 2024

[RISCV] Move performCombineVMergeAndVOps into RISCVFoldMasks #71764

Closed

rofirrim reviewed Sep 9, 2024

lukel97 mentioned this pull request Sep 9, 2024

[RISCV] Fix same mask vmerge peephole discarding false operand #107827

Merged
