[RISCV] Move vmerge same mask peephole to RISCVVectorPeephole #106108

Merged

@lukel97 lukel97 commented Aug 26, 2024

We currently fold a vmerge.vvm into its true operand if the true operand is a masked pseudo with the same mask.

We can move this over to RISCVVectorPeephole by instead splitting it up into a smaller peephole which converts it to a vmv.v.v first. The existing foldVMV_V_V peephole will then take care of folding it if needed.

This is very similar to the existing all-ones mask peephole and we could potentially do it inside of it. I opted to put it in a separate peephole to make it easier to reason about, given that the duplication is small, but I could be persuaded either way.
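
To see why the same-mask conversion is sound, here is a minimal C++ model of the lane semantics. It is only a sketch under stated assumptions: the helper names (`maskedAdd`, `vmerge`, `vmv`, `sameMaskFoldIsSound`) are made up for illustration and are not LLVM APIs, and every lane is assumed to be within VL, so tail elements and the vmerge's own passthru are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int32_t>;
using Mask = std::vector<bool>;

// Masked add with a mask-undisturbed policy: lanes where the mask is clear
// keep the value from `passthru`.
Vec maskedAdd(const Vec &passthru, const Vec &x, const Vec &y, const Mask &m) {
  Vec r(passthru);
  for (std::size_t i = 0; i < r.size(); ++i)
    if (m[i])
      r[i] = x[i] + y[i];
  return r;
}

// vmerge.vvm: pick `t` where the mask is set, `f` elsewhere.
Vec vmerge(const Vec &f, const Vec &t, const Mask &m) {
  Vec r(f);
  for (std::size_t i = 0; i < r.size(); ++i)
    if (m[i])
      r[i] = t[i];
  return r;
}

// vmv.v.v: an unmasked copy of the source.
Vec vmv(const Vec &src) { return src; }

// If the masked op's passthru is the vmerge's false operand and both use the
// same mask, the vmerge yields the same lanes as a plain copy of `true`.
bool sameMaskFoldIsSound(const Vec &f, const Vec &x, const Vec &y,
                         const Mask &m) {
  Vec t = maskedAdd(/*passthru=*/f, x, y, m);
  return vmerge(f, t, m) == vmv(t);
}
```

In this model the inactive lanes of `t` already hold the false operand, so the vmerge has nothing left to select and a copy is enough.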

@lukel97 lukel97 force-pushed the vector-peephole-vmerge-to-vmv.v.v-same-mask branch from b11c15a to 34c2cf0 Compare September 5, 2024 16:30
@lukel97 lukel97 changed the title [RISCV] Convert vmerge.vvm with same mask as true to vmv.v.v [RISCV] Move vmerge same mask peephole to RISCVVectorPeephole Sep 5, 2024
@lukel97 lukel97 marked this pull request as ready for review September 5, 2024 16:32
@llvmbot
llvmbot commented Sep 5, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

We currently fold a vmerge.vvm into its true operand if the true operand is a masked pseudo with the same mask.

We can move this over to RISCVVectorPeephole by instead splitting it up into a smaller peephole which converts it to a vmv.v.v first. The existing foldVMV_V_V peephole will then take care of folding it if needed.

This is very similar to the existing all-ones mask peephole and we could potentially do it inside of it. I opted to put it in a separate peephole to make it easier to reason about, given that the duplication is small, but I could be persuaded either way.


Full diff: https://github.com/llvm/llvm-project/pull/106108.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (+5-37)
  • (modified) llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (+63-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+10-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir (+70)
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 4580f3191d1389..ff4c0e9bbd50e7 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3833,15 +3833,8 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
   uint64_t TrueTSFlags = TrueMCID.TSFlags;
   bool HasTiedDest = RISCVII::isFirstDefTiedToFirstUse(TrueMCID);
 
-  bool IsMasked = false;
   const RISCV::RISCVMaskedPseudoInfo *Info =
       RISCV::lookupMaskedIntrinsicByUnmasked(TrueOpc);
-  if (!Info && HasTiedDest) {
-    Info = RISCV::getMaskedPseudoInfo(TrueOpc);
-    IsMasked = true;
-  }
-  assert(!(IsMasked && !HasTiedDest) && "Expected tied dest");
-
   if (!Info)
     return false;
 
@@ -3853,19 +3846,6 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
       return false;
   }
 
-  // If True is masked then the vmerge must have either the same mask or an all
-  // 1s mask, since we're going to keep the mask from True.
-  if (IsMasked) {
-    // FIXME: Support mask agnostic True instruction which would have an
-    // undef passthru operand.
-    SDValue TrueMask =
-        getMaskSetter(True->getOperand(Info->MaskOpIdx),
-                      True->getOperand(True->getNumOperands() - 1));
-    assert(TrueMask);
-    if (!usesAllOnesMask(Mask, Glue) && getMaskSetter(Mask, Glue) != TrueMask)
-      return false;
-  }
-
   // Skip if True has side effect.
   if (TII->get(TrueOpc).hasUnmodeledSideEffects())
     return false;
@@ -3930,24 +3910,13 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
       (Mask && !usesAllOnesMask(Mask, Glue)))
     return false;
 
-  // If we end up changing the VL or mask of True, then we need to make sure it
-  // doesn't raise any observable fp exceptions, since changing the active
-  // elements will affect how fflags is set.
-  if (TrueVL != VL || !IsMasked)
-    if (mayRaiseFPException(True.getNode()) &&
-        !True->getFlags().hasNoFPExcept())
-      return false;
+  // Make sure it doesn't raise any observable fp exceptions, since changing the
+  // active elements will affect how fflags is set.
+  if (mayRaiseFPException(True.getNode()) && !True->getFlags().hasNoFPExcept())
+    return false;
 
   SDLoc DL(N);
 
-  // From the preconditions we checked above, we know the mask and thus glue
-  // for the result node will be taken from True.
-  if (IsMasked) {
-    Mask = True->getOperand(Info->MaskOpIdx);
-    Glue = True->getOperand(True->getNumOperands() - 1);
-    assert(Glue.getValueType() == MVT::Glue);
-  }
-
   unsigned MaskedOpc = Info->MaskedPseudo;
 #ifndef NDEBUG
   const MCInstrDesc &MaskedMCID = TII->get(MaskedOpc);
@@ -3977,8 +3946,7 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
   Ops.push_back(False);
 
   const bool HasRoundingMode = RISCVII::hasRoundModeOp(TrueTSFlags);
-  const unsigned NormalOpsEnd = TrueVLIndex - IsMasked - HasRoundingMode;
-  assert(!IsMasked || NormalOpsEnd == Info->MaskOpIdx);
+  const unsigned NormalOpsEnd = TrueVLIndex - HasRoundingMode;
   Ops.append(True->op_begin() + HasTiedDest, True->op_begin() + NormalOpsEnd);
 
   Ops.push_back(Mask);
diff --git a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
index a612a03106f024..790a206f39e74c 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
@@ -65,7 +65,8 @@ class RISCVVectorPeephole : public MachineFunctionPass {
   bool convertToVLMAX(MachineInstr &MI) const;
   bool convertToWholeRegister(MachineInstr &MI) const;
   bool convertToUnmasked(MachineInstr &MI) const;
-  bool convertVMergeToVMv(MachineInstr &MI) const;
+  bool convertAllOnesVMergeToVMv(MachineInstr &MI) const;
+  bool convertSameMaskVMergeToVMv(MachineInstr &MI) const;
   bool foldUndefPassthruVMV_V_V(MachineInstr &MI);
   bool foldVMV_V_V(MachineInstr &MI);
 
@@ -342,17 +343,14 @@ bool RISCVVectorPeephole::convertToWholeRegister(MachineInstr &MI) const {
   return true;
 }
 
-// Transform (VMERGE_VVM_<LMUL> pt, false, true, allones, vl, sew) to
-// (VMV_V_V_<LMUL> pt, true, vl, sew). It may decrease uses of VMSET.
-bool RISCVVectorPeephole::convertVMergeToVMv(MachineInstr &MI) const {
+static unsigned getVMV_V_VOpcodeForVMERGE_VVM(const MachineInstr &MI) {
 #define CASE_VMERGE_TO_VMV(lmul)                                               \
   case RISCV::PseudoVMERGE_VVM_##lmul:                                         \
-    NewOpc = RISCV::PseudoVMV_V_V_##lmul;                                      \
+    return RISCV::PseudoVMV_V_V_##lmul;                                        \
     break;
-  unsigned NewOpc;
   switch (MI.getOpcode()) {
   default:
-    return false;
+    return 0;
     CASE_VMERGE_TO_VMV(MF8)
     CASE_VMERGE_TO_VMV(MF4)
     CASE_VMERGE_TO_VMV(MF2)
@@ -361,14 +359,68 @@ bool RISCVVectorPeephole::convertVMergeToVMv(MachineInstr &MI) const {
     CASE_VMERGE_TO_VMV(M4)
     CASE_VMERGE_TO_VMV(M8)
   }
+}
 
+/// Convert a PseudoVMERGE_VVM with an all ones mask to a PseudoVMV_V_V.
+///
+/// %x = PseudoVMERGE_VVM %passthru, %false, %true, %allones, sew, vl
+/// ->
+/// %x = PseudoVMV_V_V %passthru, %true, vl, sew, tu_mu
+bool RISCVVectorPeephole::convertAllOnesVMergeToVMv(MachineInstr &MI) const {
+  unsigned NewOpc = getVMV_V_VOpcodeForVMERGE_VVM(MI);
+  if (!NewOpc)
+    return false;
   assert(MI.getOperand(4).isReg() && MI.getOperand(4).getReg() == RISCV::V0);
   if (!isAllOnesMask(V0Defs.lookup(&MI)))
     return false;
 
   MI.setDesc(TII->get(NewOpc));
-  MI.removeOperand(2);  // False operand
-  MI.removeOperand(3);  // Mask operand
+  MI.removeOperand(2); // False operand
+  MI.removeOperand(3); // Mask operand
+  MI.addOperand(
+      MachineOperand::CreateImm(RISCVII::TAIL_UNDISTURBED_MASK_UNDISTURBED));
+
+  // vmv.v.v doesn't have a mask operand, so we may be able to inflate the
+  // register class for the destination and passthru operands e.g. VRNoV0 -> VR
+  MRI->recomputeRegClass(MI.getOperand(0).getReg());
+  if (MI.getOperand(1).getReg() != RISCV::NoRegister)
+    MRI->recomputeRegClass(MI.getOperand(1).getReg());
+  return true;
+}
+
+/// If a PseudoVMERGE_VVM's true operand is a masked pseudo and both have the
+/// same mask, and the masked pseudo's passthru is the same as the false
+/// operand, we can convert the PseudoVMERGE_VVM to a PseudoVMV_V_V.
+///
+/// %true = PseudoVADD_VV_M1_MASK %false, %x, %y, %mask, vl1, sew, policy
+/// %x = PseudoVMERGE_VVM %passthru, %false, %true, %mask, vl2, sew
+/// ->
+/// %true = PseudoVADD_VV_M1_MASK %false, %x, %y, %mask, vl1, sew, policy
+/// %x = PseudoVMV_V_V %passthru, %true, vl2, sew, tu_mu
+bool RISCVVectorPeephole::convertSameMaskVMergeToVMv(MachineInstr &MI) const {
+  unsigned NewOpc = getVMV_V_VOpcodeForVMERGE_VVM(MI);
+  if (!NewOpc)
+    return false;
+  MachineInstr *True = MRI->getVRegDef(MI.getOperand(3).getReg());
+  if (!True || !RISCV::getMaskedPseudoInfo(True->getOpcode()) ||
+      !hasSameEEW(MI, *True))
+    return false;
+
+  // True's passthru needs to be equivalent to False
+  Register TruePassthruReg = True->getOperand(1).getReg();
+  Register FalseReg = MI.getOperand(2).getReg();
+  if (TruePassthruReg != RISCV::NoRegister && TruePassthruReg != FalseReg)
+    return false;
+
+  const MachineInstr *TrueV0Def = V0Defs.lookup(True);
+  const MachineInstr *MIV0Def = V0Defs.lookup(&MI);
+  assert(TrueV0Def->isCopy() && MIV0Def->isCopy());
+  if (TrueV0Def->getOperand(1).getReg() != MIV0Def->getOperand(1).getReg())
+    return false;
+
+  MI.setDesc(TII->get(NewOpc));
+  MI.removeOperand(2); // False operand
+  MI.removeOperand(3); // Mask operand
   MI.addOperand(
       MachineOperand::CreateImm(RISCVII::TAIL_UNDISTURBED_MASK_UNDISTURBED));
 
@@ -622,7 +674,8 @@ bool RISCVVectorPeephole::runOnMachineFunction(MachineFunction &MF) {
       Changed |= tryToReduceVL(MI);
       Changed |= convertToUnmasked(MI);
       Changed |= convertToWholeRegister(MI);
-      Changed |= convertVMergeToVMv(MI);
+      Changed |= convertAllOnesVMergeToVMv(MI);
+      Changed |= convertSameMaskVMergeToVMv(MI);
       if (foldUndefPassthruVMV_V_V(MI)) {
         Changed |= true;
         continue; // MI is erased
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
index e57b6a22dd6eab..569ada7949b1b5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
@@ -62,12 +62,11 @@ define void @gather_masked(ptr noalias nocapture %A, ptr noalias nocapture reado
 ; CHECK-NEXT:    li a4, 5
 ; CHECK-NEXT:  .LBB1_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vmv1r.v v9, v8
-; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, mu
-; CHECK-NEXT:    vlse8.v v9, (a1), a4, v0.t
-; CHECK-NEXT:    vle8.v v10, (a0)
-; CHECK-NEXT:    vadd.vv v9, v10, v9
-; CHECK-NEXT:    vse8.v v9, (a0)
+; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, ma
+; CHECK-NEXT:    vlse8.v v8, (a1), a4, v0.t
+; CHECK-NEXT:    vle8.v v9, (a0)
+; CHECK-NEXT:    vadd.vv v8, v9, v8
+; CHECK-NEXT:    vse8.v v8, (a0)
 ; CHECK-NEXT:    addi a0, a0, 32
 ; CHECK-NEXT:    addi a1, a1, 160
 ; CHECK-NEXT:    bne a0, a2, .LBB1_1
@@ -344,12 +343,11 @@ define void @scatter_masked(ptr noalias nocapture %A, ptr noalias nocapture read
 ; CHECK-NEXT:    li a4, 5
 ; CHECK-NEXT:  .LBB7_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, mu
-; CHECK-NEXT:    vle8.v v9, (a1)
-; CHECK-NEXT:    vmv1r.v v10, v8
-; CHECK-NEXT:    vlse8.v v10, (a0), a4, v0.t
-; CHECK-NEXT:    vadd.vv v9, v10, v9
-; CHECK-NEXT:    vsse8.v v9, (a0), a4, v0.t
+; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, ma
+; CHECK-NEXT:    vle8.v v8, (a1)
+; CHECK-NEXT:    vlse8.v v9, (a0), a4, v0.t
+; CHECK-NEXT:    vadd.vv v8, v9, v8
+; CHECK-NEXT:    vsse8.v v8, (a0), a4, v0.t
 ; CHECK-NEXT:    addi a1, a1, 32
 ; CHECK-NEXT:    addi a0, a0, 160
 ; CHECK-NEXT:    bne a1, a2, .LBB7_1
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
index 19a918148e6eb8..875d4229bbc6e1 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
@@ -68,3 +68,73 @@ body: |
     $v0 = COPY %mask
     %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, %avl, 5
 ...
+---
+name: same_mask
+body: |
+  bb.0:
+    liveins: $v8, $v9, $v0
+    ; CHECK-LABEL: name: same_mask
+    ; CHECK: liveins: $v8, $v9, $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %pt:vr = COPY $v8
+    ; CHECK-NEXT: %false:vrnov0 = COPY $v9
+    ; CHECK-NEXT: %mask:vr = COPY $v0
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %x:vr = PseudoVMV_V_V_M1 %pt, %true, 8, 5 /* e32 */, 0 /* tu, mu */
+    %pt:vrnov0 = COPY $v8
+    %false:vrnov0 = COPY $v9
+    %mask:vr = COPY $v0
+    $v0 = COPY %mask
+    %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    $v0 = COPY %mask
+    %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+...
+---
+# Shouldn't be converted because false operands are different
+name: same_mask_different_false
+body: |
+  bb.0:
+    liveins: $v8, $v9, $v0
+    ; CHECK-LABEL: name: same_mask_different_false
+    ; CHECK: liveins: $v8, $v9, $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %pt:vrnov0 = COPY $v8
+    ; CHECK-NEXT: %false:vrnov0 = COPY $v9
+    ; CHECK-NEXT: %mask:vr = COPY $v0
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %true:vrnov0 = PseudoVADD_VV_M1_MASK %pt, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+    %pt:vrnov0 = COPY $v8
+    %false:vrnov0 = COPY $v9
+    %mask:vr = COPY $v0
+    $v0 = COPY %mask
+    %true:vrnov0 = PseudoVADD_VV_M1_MASK %pt, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    $v0 = COPY %mask
+    %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+...
+---
+# Shouldn't be converted because EEWs are different
+name: same_mask_different_eew
+body: |
+  bb.0:
+    liveins: $v8, $v9, $v0
+    ; CHECK-LABEL: name: same_mask_different_eew
+    ; CHECK: liveins: $v8, $v9, $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %pt:vrnov0 = COPY $v8
+    ; CHECK-NEXT: %false:vrnov0 = COPY $v9
+    ; CHECK-NEXT: %mask:vr = COPY $v0
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+    %pt:vrnov0 = COPY $v8
+    %false:vrnov0 = COPY $v9
+    %mask:vr = COPY $v0
+    $v0 = COPY %mask
+    %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 4 /* e16 */, 0 /* tu, mu */
+    $v0 = COPY %mask
+    %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */

@topperc topperc left a comment

LGTM

@lukel97 lukel97 merged commit 2949720 into llvm:main Sep 6, 2024
6 of 7 checks passed
@llvm-ci
llvm-ci commented Sep 6, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 10 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/5046

Here is the relevant piece of the build log for the reference
Step 10 (Add check check-offload) failure: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
...
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/test_libc.cpp (869 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49779.cpp (870 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug47654.cpp (871 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug50022.cpp (872 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/wtime.c (873 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp (874 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp (875 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp (876 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (877 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp (878 of 879)
command timed out: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1235.780149

assert(TrueV0Def && TrueV0Def->isCopy() && MIV0Def && MIV0Def->isCopy());
if (TrueV0Def->getOperand(1).getReg() != MIV0Def->getOperand(1).getReg())
return false;

Collaborator

Hi @lukel97, shouldn't we check around here that the mask for the false operand also matches?

I'm seeing this case

$v0 = COPY %15:vr
%20:vrnov0 = PseudoVRSUB_VI_MF8_MASK $noreg(tied-def 0), killed %19:vrnov0, -2, $v0, %12:gprnox0, 3, 1
…
$v0 = COPY %14:vr ; <---- Different mask!
%26:vrnov0 = PseudoVOR_VI_MF8_MASK $noreg(tied-def 0), killed %25:vrnov0, 1, $v0, %12:gprnox0, 3, 1
$v0 = COPY %15:vr
%27:vrnov0 = PseudoVMERGE_VVM_MF8 $noreg(tied-def 0), killed %26:vrnov0, killed %20:vrnov0, $v0, %12:gprnox0, 3

being turned into

$v0 = COPY %15:vr
%27:vr = PseudoVMV_V_V_MF8 $noreg(tied-def 0), killed %20:vrnov0, %12:gprnox0, 3, 0

which I don't think is equivalent.

I can open an issue with a reproducer if that helps.

Contributor Author

Thanks for catching this. I think the False == TruePassthru check needs to be the other way round, i.e. we should check that False is NoRegister, not TruePassthru. The MIR should be enough to recreate a test case; I'll look into this now.

@lukel97 lukel97 Sep 9, 2024

I've opened #107827 to fix it. I believe the underlying issue was that we were discarding the false operand when the true operand's passthru was undef.

In your example above, in theory I think we can still replace the VMERGE_VVM with VMV_V_V as long as the VRSUB_VI's passthru becomes the false operand. In practice though we would need to move the VRSUB_VI down to access %26, but we can't move past a copy to $v0, so the peephole should just bail instead after the patch.
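
The miscompile can be reproduced in a small C++ lane model. This is a sketch only: `maskedRsub`, `vmerge` and `foldChangesResult` are hypothetical helpers, an undef passthru is modelled as an arbitrary fill value, and all lanes are assumed to be within VL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int32_t>;
using Mask = std::vector<bool>;

// Masked reverse-subtract (imm - x), mask undisturbed: inactive lanes keep
// whatever `passthru` holds. For an undef passthru that is arbitrary data.
Vec maskedRsub(const Vec &passthru, const Vec &x, int32_t imm, const Mask &m) {
  Vec r(passthru);
  for (std::size_t i = 0; i < r.size(); ++i)
    if (m[i])
      r[i] = imm - x[i];
  return r;
}

// vmerge.vvm: pick `t` where the mask is set, `f` elsewhere.
Vec vmerge(const Vec &f, const Vec &t, const Mask &m) {
  Vec r(f);
  for (std::size_t i = 0; i < r.size(); ++i)
    if (m[i])
      r[i] = t[i];
  return r;
}

// Returns true when replacing the vmerge with a plain copy of `true`
// (what vmv.v.v does) would produce different lanes.
bool foldChangesResult(const Vec &truePassthru, const Vec &f, const Vec &x,
                       const Mask &m) {
  Vec t = maskedRsub(truePassthru, x, /*imm=*/-2, m);
  return vmerge(f, t, m) != t;
}
```

When the passthru of `true` is not the false operand, the inactive lanes of the copy carry the passthru's contents instead of the false operand's, which is the discarded-operand problem described above.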

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 9, 2024
This fixes the issue raised in llvm#106108 (comment)

True's passthru needs to be equivalent to vmerge's false, but we also allow true's passthru to be undef.

However if it's undef then we need to replace it with vmerge's false, otherwise we end up discarding the false operand entirely.

The changes in fixed-vectors-strided-load-store-asm.ll undo the changes in llvm#106108 where we introduced this miscompile.
lukel97 added a commit that referenced this pull request Sep 9, 2024
This fixes the issue raised in
#106108 (comment)

True's passthru needs to be equivalent to vmerge's false, but we also
allow true's passthru to be undef.

However if it's undef then we need to replace it with false, otherwise
we end up discarding the false operand entirely.

The changes in fixed-vectors-strided-load-store-asm.ll undo the changes
in #106108 where we introduced this miscompile.
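
The rule described in this commit message can be sketched as a small predicate. This is not the code from #107827: `MaskedOp`, `VMerge`, `tryConvertSameMask` and the `falseAvailableAtTrue` flag are hypothetical names, registers are plain integers, and the flag stands in for whatever check the real pass uses to make sure the false operand can be used at the point where true is defined.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the two instructions involved.
struct MaskedOp {
  std::optional<unsigned> passthru; // std::nullopt models an undef passthru
  unsigned maskSrc;                 // register copied into $v0 before the op
};

struct VMerge {
  unsigned falseReg; // the vmerge's false operand
  unsigned maskSrc;  // register copied into $v0 before the vmerge
};

// Returns true if the vmerge may be rewritten to vmv.v.v. An undef passthru
// on `True` is replaced with the vmerge's false operand so that operand is
// not silently dropped.
bool tryConvertSameMask(MaskedOp &True, const VMerge &Merge,
                        bool falseAvailableAtTrue) {
  if (True.maskSrc != Merge.maskSrc)
    return false; // masks differ
  if (True.passthru && *True.passthru != Merge.falseReg)
    return false; // passthru is defined but is not the false operand
  if (!True.passthru) {
    if (!falseAvailableAtTrue)
      return false; // cannot install false as the passthru, so bail
    True.passthru = Merge.falseReg;
  }
  return true;
}
```

For the MIR in the review comment, the false operand (`%26`) is defined after the true operand (`%20`), so in this sketch the flag would be false and the conversion is rejected.
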
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
…06108)

VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
…107827)

MichelleCDjunaidi added a commit to MichelleCDjunaidi/llvm-project that referenced this pull request Oct 25, 2024
commit 56905dab7da50bccfcceaeb496b206ff476127e1
Author: JinjinLi868 <lijinjin.868@bytedance.com>
Date:   Tue Sep 10 10:47:33 2024 +0800

    [clang] fix half && bfloat16 convert node expr codegen (#89051)

    Data type conversion between fp16 and bf16 will generate fptrunc and
    fpextend nodes, but they are actually bitcast nodes.

commit ffcff4af59712792712b33648f8ea148b299c364
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Tue Sep 10 10:38:21 2024 +0800

    [ValueTracking] Infer is-power-of-2 from assumptions. (#107745)

    This patch tries to infer is-power-of-2 from assumptions. I don't see
    that this kind of assumption exists in my dataset.
    Related issue: https://github.com/rust-lang/rust/issues/129795

    Close https://github.com/llvm/llvm-project/issues/58996.

commit eb0e4b1415800e34b86319ce1d57ad074d5ca202
Author: Petr Hosek <phosek@google.com>
Date:   Mon Sep 9 19:21:59 2024 -0700

    [Fuzzer] Passthrough zlib CMake paths into the test (#107926)

    We shouldn't assume that we're using system zlib installation.

commit 761bf333e378b52614cf36cd5db2837d5e4e0ae4
Author: Yuxuan Chen <ych@fb.com>
Date:   Mon Sep 9 18:57:39 2024 -0700

    [LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897)

    After landing https://github.com/llvm/llvm-project/pull/99285 we found
    that the call graph update was causing the following crash when
    expensive checks are turned on
    ```
    llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(Targe
    tRC)) && "New call edge is not trivial!"' failed.
    ```
    I have to admit I believe that the call graph update process I did for
    that patch could be wrong.

    After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced
    that `CoroAnnotationElidePass` can be a FunctionPass and rely on the
    adaptor to update the call graph for us, so long as we properly
    invalidate the caller's analyses.

    After this patch,
    `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer
    fails under expensive checks.

commit 7a8e9dfe5cc6f049f918e528ef476d9e7aada8a5
Author: Jordan Rupprecht <rupprecht@google.com>
Date:   Mon Sep 9 20:34:43 2024 -0500

    [bazel][libc][NFC] Add missing layering deps (#107947)

    After 277371943fa48f2550df02870951f5e5a77efef5

    e.g.

    ```
    external/llvm-project/libc/test/src/math/smoke/NextTowardTest.h:12:10: error: module llvm-project//libc/test/src/math/smoke:nexttowardf_test does not depend on a module exporting 'src/__support/CPP/bit.h'
    ```

commit 1ca411ca451e0e86caf9207779616f32ed9fd908
Author: wanglei <wanglei@loongson.cn>
Date:   Tue Sep 10 09:28:15 2024 +0800

    [LoongArch] Codegen for concat_vectors with LASX

    Fixes: #107355

    Reviewed By: SixWeining

    Pull Request: https://github.com/llvm/llvm-project/pull/107523

commit e64a1c00c1d612dccd976c06fdac85afa3b06fbe
Author: Mircea Trofin <mtrofin@google.com>
Date:   Mon Sep 9 18:25:50 2024 -0700

    Fix unintended extra commit in PR #107499

commit f7479b5ff43261a20258743da5fa583a0c729564
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 18:24:07 2024 -0700

    [NFC][TableGen] Simplify DirectiveEmitter using range for loops (#107909)

    Make constructors that take const Record * implicit, allowing us to
    simplify some range based loops to use that class instance as the loop
    variable.

    Change remaining constructor calls to use () instead of {} to construct
    objects.

commit a111f9119a5ec77c19a514ec09454218f739454f
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Tue Sep 10 09:19:39 2024 +0800

     [LoongArch][ISel] Check the number of sign bits in `PatGprGpr_32` (#107432)

    After https://github.com/llvm/llvm-project/pull/92205, LoongArch ISel
    selects `div.w` for `trunc i64 (sdiv i64 3202030857, (sext i32 X to
    i64)) to i32`. It is incorrect since `3202030857` is not a signed 32-bit
    constant. It will produce wrong result when `X == 2`:
    https://alive2.llvm.org/ce/z/pzfGZZ

    This patch adds additional `sexti32` checks to operands of
    `PatGprGpr_32`.
    Alive2 proof: https://alive2.llvm.org/ce/z/AkH5Mp

    Fix #107414.

commit f3b4e47b34e59625e2c8420ce8bf789373177d6d
Author: Longsheng Mou <longshengmou@gmail.com>
Date:   Tue Sep 10 09:19:22 2024 +0800

    [mlir][linalg][NFC] Drop redundant rankReductionStrategy (#107875)

    This patch drop redundant rankReductionStrategy in
    `populateFoldUnitExtentDimsViaSlicesPatterns` and fixes comment typos.

commit 3b2261809471a018de50e745c0d475b048c66fd4
Author: Mircea Trofin <mtrofin@google.com>
Date:   Mon Sep 9 18:16:24 2024 -0700

    [ctx_prof] Insert the ctx prof flattener after the module inliner (#107499)

    This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore.

    Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)

commit b0d2411b53a0b55baf6d6dc7986d285ce59807fa
Author: Alex MacLean <amaclean@nvidia.com>
Date:   Mon Sep 9 17:37:09 2024 -0700

    [NVPTX] Support copysign PTX instruction (#107800)

    Lower `fcopysign` SDNodes into `copysign` PTX instructions where
    possible. See [PTX ISA: 9.7.3.2. Floating Point Instructions: copysign]
    (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-copysign).

commit 81ef8e2fdbdfac4e186e12a874242b294d05d4e0
Author: Vitaly Buka <vitalybuka@google.com>
Date:   Mon Sep 9 17:00:06 2024 -0700

    [NFC][sanitizer] Extract GetDTLSRange (#107934)

commit ae02211eaef305f957b419e5c39499aa472b956e
Author: vporpo <vporpodas@google.com>
Date:   Mon Sep 9 16:52:54 2024 -0700

    [SandboxIR] Implement UndefValue (#107628)

    This patch implements sandboxir::UndefValue mirroring llvm::UndefValue.

commit 33c1325a73c4bf6bacdb865c2550038afe4377d2
Author: Anton Korobeynikov <anton@korobeynikov.info>
Date:   Mon Sep 9 16:34:41 2024 -0700

    [PAC] Make __is_function_overridden pauth-aware on ELF platforms (#107498)

    Apparently, there are two almost identical implementations: one for
    MachO and another one for ELF. The ELF bits somehow slipped while
    https://github.com/llvm/llvm-project/pull/84573 was reviewed.

    The particular implementation is identical to MachO case.

commit 88bd507dc2dd9c235b54d718cf84e4ef80d94bc9
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Mon Sep 9 11:07:38 2024 -0700

    [X86] Handle shifts + and in `LowerSELECTWithCmpZero`

    shifts are the same as sub where rhs == 0 is identity.
    and is the inverted case where:
        `SELECT (AND(X,1) == 0), (AND Y, Z), Y`
            -> `(AND Y, (OR NEG(AND(X, 1)), Z))`
    With -1 as the identity.

    Closes #107910

commit d148a1a40461ed27863f4b17ac2bd5914499f413
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Mon Sep 9 11:07:36 2024 -0700

    [X86] Add tests support shifts + and in `LowerSELECTWithCmpZero`; NFC

commit 26b786ae2f15bfbf6f0925856a788ae0bfb2f8c1
Author: Artem Belevich <tra@google.com>
Date:   Mon Sep 9 16:15:00 2024 -0700

    [NVPTX] Restrict combining to properly aligned v16i8 vectors. (#107919)

    Fixes generation of invalid loads leading to misaligned access errors.
    The bug got exposed by SLP vectorizer change ec360d6 which allowed SLP
    to produce `v16i8` vectors.

    Also updated the tests to use automatic check generator.

commit f12e10b513686a12f20f0c897dcc9ffc00cbce09
Author: vporpo <vporpodas@google.com>
Date:   Mon Sep 9 15:41:30 2024 -0700

    [SandboxVec] Implement Pass class (#107617)

    This patch implements the Pass base class and the FunctionPass sub-class
    that operate on Sandbox IR.

commit bdf02249e7f8f95177ff58c881caf219699acb98
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 14:33:21 2024 -0700

    [TableGen] Change CGIOperandList::OperandInfo::Rec to const pointer (#107858)

    Change CGIOperandList::OperandInfo::Rec and CGIOperandList::TheDef to
    const pointer.

    This is a part of effort to have better const correctness in TableGen
    backends:

    https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

commit a9a5a18a0e99b0251c0fe6ce61c5e699bf6b379b
Author: Tim Gymnich <tgymnich@icloud.com>
Date:   Mon Sep 9 23:27:27 2024 +0200

    [SPIRV] Add sign intrinsic part 1 (#101987)

    partially fixes #70078
    - Added `int_spv_sign` intrinsic in `IntrinsicsSPIRV.td`
    - Added lowering and map to `int_spv_sign in
    `SPIRVInstructionSelector.cpp`
    - Added SPIR-V backend test case in
    `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/sign.ll`
    - https://github.com/llvm/llvm-project/pull/101988
    - https://github.com/llvm/llvm-project/pull/101989

commit 66e9078f827383f77c1c239f6c09f2b07a963649
Author: Steven Wu <stevenwu@apple.com>
Date:   Mon Sep 9 14:12:12 2024 -0700

    [LTO] Fix a use-after-free in legacy LTO C APIs (#107896)

    Fix a bug where `lto_runtime_lib_symbols_list` returns the address
    of a local variable that is freed when it goes out of scope. This
    is a regression from #98512, which rewrote the runtime libcall function
    lists into a SmallVector.
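
    A minimal sketch of this bug class and one way to avoid it, in plain
    C++. The names `buildSymbolList` and `symbolList` are hypothetical, and
    the function-local static is only one possible fix, not necessarily the
    one this commit uses:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical stand-in for the runtime libcall symbol list.
    static std::vector<const char *> buildSymbolList() {
      return {"memcpy", "memset", "__stack_chk_fail"};
    }

    // Buggy pattern, left commented out so the sketch stays well-defined:
    //   const char *const *brokenList(std::size_t *Size) {
    //     std::vector<const char *> Local = buildSymbolList();
    //     *Size = Local.size();
    //     return Local.data(); // dangles once Local is destroyed
    //   }

    // Possible fix: keep the storage alive for the rest of the program.
    const char *const *symbolList(std::size_t *Size) {
      assert(Size && "caller must provide an out-parameter");
      static const std::vector<const char *> Storage = buildSymbolList();
      *Size = Storage.size();
      return Storage.data();
    }
    ```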

    rdar://135559037

commit d9a996020394a8181d17e4f0a0fc89d59371f9af
Author: ChiaHungDuan <chiahungduan@google.com>
Date:   Mon Sep 9 13:59:03 2024 -0700

    [scudo] Add fragmentation info for each memory group (#107475)

    This information helps with tuning the heuristic of selecting memory
    groups to release the unused pages.

commit 6f8d2781f604cfcf9ea6facecc0bea8e4d682e1e
Author: Sterling-Augustine <56981066+Sterling-Augustine@users.noreply.github.com>
Date:   Mon Sep 9 20:49:49 2024 +0000

    [SandboxIR] Add missing VectorType functions (#107650)

    Fills in many missing functions from VectorType

commit 53a81d4d26f0409de8a0655d7af90f2bea222a12
Author: Charlie Barto <chbarto@microsoft.com>
Date:   Mon Sep 9 13:41:08 2024 -0700

    Reland [asan][windows] Eliminate the static asan runtime on windows (#107899)

    This reapplies 8fa66c6ca7272268747835a0e86805307b62399c ([asan][windows]
    Eliminate the static asan runtime on windows) for a second time.

    That PR bounced off the tests because it caused failures in the other
    sanitizer runtimes. These have been fixed by building only interception,
    sanitizer_common, and asan with /MD, and continuing to build the rest of
    the runtimes with /MT. This does mean that any usage of the static
    ubsan/fuzzer/etc. runtimes mixes different runtime library linkages in
    the same app. The interception, sanitizer_common, and asan runtimes are
    designed for this, but it does result in some linker warnings.

    Additionally, it turns out when building in release-mode with
    LLVM_ENABLE_PDBs the build system forced /OPT:ICF. This totally breaks
    asan's "new" method of doing "weak" functions on windows, and so
    /OPT:NOICF was explicitly added to asan's link flags.

    ---------

    Co-authored-by: Amy Wishnousky <amyw@microsoft.com>

commit 34034381b7d54da864f8794f578d9c501d6d4f3b
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 21:35:59 2024 +0100

    [VPlan] Consistently use VTC for vector trip count in vplan-printing.ll.

    The inconsistency surfaced in
    https://github.com/llvm/llvm-project/pull/95305. Split off to reduce
    the diff.

commit 3f22756f391e20040fa3581206b77c409433bd9f
Author: Justin Bogner <mail@justinbogner.com>
Date:   Mon Sep 9 13:21:22 2024 -0700

    [DirectX] Lower `@llvm.dx.typedBufferLoad` to DXIL ops

    The `@llvm.dx.typedBufferLoad` intrinsic is lowered to `@dx.op.bufferLoad`.
    There's some complexity here in translating to scalarized IR, which I've
    abstracted out into a function that should be useful for samples, gathers, and
    CBuffer loads.

    I've also updated the DXILResources.rst docs to match what I'm doing here and
    the proposal in llvm/wg-hlsl#59. I've removed the content about stores and raw
    buffers for now with the expectation that it will be added along with the work.

    Note that this change includes a bit of a hack in how it deals with
    `getOverloadKind` for the `dx.ResRet` types - we need to adjust how we deal
    with operation overloads to generate a table directly rather than proxy through
    the OverloadKind enum, but that's left for a later change here.

    Part of #91367

    Pull Request: https://github.com/llvm/llvm-project/pull/104252

commit 985600dcd3fcef4095097bea5b556e84c8143a7f
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 13:09:53 2024 -0700

    [TableGen] Migrate CodeGenHWModes to use const RecordKeeper (#107851)

    Migrate CodeGenHWModes to use const RecordKeeper and const Record
    pointers.

    This is a part of effort to have better const correctness in TableGen
    backends:

    https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

commit b3d2d5039b9b8aa10a86c593387f200b15c02aef
Author: Alexey Bataev <a.bataev@outlook.com>
Date:   Mon Sep 9 12:32:45 2024 -0700

    [SLP][NFC]Reorder code for better structural complexity, NFC

commit e62bf7cd0beb530bc0842bb7aa8ff162607a82b9
Author: Sean Perry <perry@ca.ibm.com>
Date:   Mon Sep 9 15:24:16 2024 -0400

    [z/OS] Set the default arch for z/OS to be arch10 (#89854)

    The default arch level on z/OS is arch10. Update the code so z/OS has
    arch10 without changing the default for zLinux.

commit 98815f7878c3240e27f516e331255532087f5fcb
Author: c8ef <c8ef@outlook.com>
Date:   Tue Sep 10 03:13:29 2024 +0800

    [clang][docs] Add clang-tutor to External Clang Examples (#107665)

commit 3681d8552fb9e6cb15e9d45849ff2e34a25c518e
Author: Nikita Popov <nikita.ppv@gmail.com>
Date:   Mon Sep 9 21:10:12 2024 +0200

    Revert "[Clang][Sema] Use the correct lookup context when building overloaded 'operator->' in the current instantiation (#104458)"

    This reverts commit 3cdb30ebbc18fa894d3bd67aebcff76ce7c741ac.

    Breaks clang bootstrap.

commit ab82f83dae065a9aa4716618524eddf4aad5fcf0
Author: Mingming Liu <mingmingl@google.com>
Date:   Mon Sep 9 11:53:07 2024 -0700

    [LTO][NFC] Fix forward declaration (#107902)

    Fix after https://github.com/llvm/llvm-project/pull/107792

commit 6776d65ceaea84fe815845da3c41b2f1621521fb
Author: NoumanAmir-10xe <66777536+NoumanAmir657@users.noreply.github.com>
Date:   Mon Sep 9 23:49:22 2024 +0500

    [libc++] Implement LWG3953 (#107535)

    Closes #105303

commit eec1ee8ef10820c61c03b00b68d242d8c87d478a
Author: Abhina Sree <Abhina.Sreeskantharajan@ibm.com>
Date:   Mon Sep 9 14:37:53 2024 -0400

    [SystemZ][z/OS] Enable lit testing for z/OS (#107631)

    This patch fixes various errors to enable llvm-lit to run on z/OS

commit 78c1009c3e54e59b6177deb4d74dd3a3083a3f01
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 11:35:13 2024 -0700

    [NFC][TableGen] DirectiveEmitter code cleanup (#107775)

    Eliminate unnecessary llvm:: prefix as this code is in llvm namespace.
    Use ArrayRef<> instead of std::vector references when appropriate.
    Use .empty() instead of .size() == 0.

commit 99ea357f7b5e7e01e42b8d68dd211dc304b3115b
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 11:34:53 2024 -0700

    [MLGO] Fix logging verbosity in scripts (#107818)

    This patch fixes issues related to logging verbosity in the MLGO python
    scripts. This was an oversight when converting from absl.logging to the
    python logging API as absl natively supports a --verbosity flag to set
    the desired logging level. This patch adds a flag to support similar
    functionality in Python's logging library and additionally updates
    docstrings where relevant to point to the new values.

commit a7c26aaf2eca61cd5d885194872471c63d68f3bc
Author: Zequan Wu <zequanwu@google.com>
Date:   Mon Sep 9 11:34:13 2024 -0700

    Revert "[Coverage] Ignore unused functions if the count is 0." (#107901)

    Reverts llvm/llvm-project#107661

    Breaks llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp

commit 02fff933d0eff71db8ff44f4acf1641bb1ad4d38
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 18:28:23 2024 +0000

    [MLGO] Remove unused imports

    Remove unused imports from python files in the MLGO library.

commit 048e46ad53bedef076df868524f0a15eb7cbd38c
Author: Brian Cain <bcain@quicinc.com>
Date:   Mon Sep 9 13:27:13 2024 -0500

    [clang, hexagon] Update copyright, license text (#107161)

    When this file was first contributed - `28b01c59c93d ([hexagon] Add
    {hvx,}hexagon_{protos,circ_brev...}, 2021-06-30)` - I incorrectly
    included a QuIC copyright statement with "All rights reserved". I should
    have contributed this file with the `Apache+LLVM exception` license.

commit b1b9b7b853fc4301aedd9ad6b7c22b75f5546b94
Author: Eduard Satdarov <sath@yandex-team.ru>
Date:   Mon Sep 9 21:17:53 2024 +0300

    [libc++] Cache file attributes during directory iteration (#93316)

    This patch adds caching of file attributes during directory iteration
    on Windows. This improves the performance when working with files being
    iterated on in a directory.

commit 09b231cb38755e1bd122dbab9c57c4847bf64204
Author: Mingming Liu <mingmingl@google.com>
Date:   Mon Sep 9 11:16:58 2024 -0700

    Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107792)

    Fix the use-after-free bug and re-apply
    https://github.com/llvm/llvm-project/pull/106193
    * Without the fix, the string referenced by `objSym.Name` could be
    destroyed even if string saver keeps a copy of the referenced string.
    This caused use-after-free.
    * The fix ([latest
    commit](https://github.com/llvm/llvm-project/pull/107792/commits/9776ed44cfb26172480145aed8f59ba78a6fa2ea))
    updates `objSym.Name` to reference (via `StringRef`) the string saver's
    copy.
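
    The lifetime rule in the two bullets above can be sketched with
    standard C++ types. `NameSaver`, `Symbol`, and `makeSymbol` are
    hypothetical stand-ins (not the lld or LLVM classes), and
    `std::string_view` plays the role of `StringRef`:

    ```cpp
    #include <deque>
    #include <string>
    #include <string_view>

    // Hypothetical saver: owns copies of strings for as long as it lives.
    // std::deque keeps references to existing elements valid on emplace_back.
    class NameSaver {
      std::deque<std::string> Storage;

    public:
      std::string_view save(std::string_view S) {
        Storage.emplace_back(S);
        return Storage.back();
      }
    };

    struct Symbol {
      std::string_view Name; // non-owning, like StringRef
    };

    // Point the symbol at the saver's copy, not at the transient input, so
    // the name stays valid after the input string is destroyed.
    Symbol makeSymbol(NameSaver &Saver, const std::string &Transient) {
      return Symbol{Saver.save(Transient)};
    }
    ```

    Referencing `Transient` directly would compile just as well, which is
    why this kind of bug tends to show up only under a sanitizer build.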

    Test:
    1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible
    with `-DLLVM_USE_SANITIZER=Address` and gone with the fix.
    2. Run all tests by following
    https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes.
    * Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at
    `@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined
    [here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30)
    * With the fix, the [multi-stage
    test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh)
    passes stage2 {asan, ubsan, masan}. This is also the test used by
    https://lab.llvm.org/buildbot/#/builders/169

    **Original commit message**

    `StringMap<T>` creates a [copy of the
    string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58)
    for entry insertions and intentionally keeps copies [since the
    implementation optimizes string memory
    usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124).
    On the other hand, the linker keeps copies of symbol names [1] in
    `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].

    This change proposes to optimize away string copies inside
    [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409),
    which will make LTO indexing more memory efficient for ELF. There are
    similar opportunities for other (COFF, wasm, MachO) formats.

    The optimization takes place for lld (ELF) only. For the rest of use
    cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep
    copies and use global resolution key for de-duplication.

    Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
    more memory efficient, we see a ~20% peak memory usage reduction in a
    binary where peak memory usage needs to go down. Thanks to the
    optimization in
    https://github.com/llvm/llvm-project/commit/329ba523ccbbe68a12434926c92fd9a86494d958,
    the max (as opposed to the sum) of `ComputeCrossModuleImport` or
    `GlobalResolution` shows up in peak memory usage.
    * Regarding correctness, the set of
    [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739)
    [per-module
    symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191)
    is a subset of
    [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120).
    And bitcode symbol parsing saves symbol name when iterating
    `obj->symbols` in `BitcodeFile::parse` already. This change updates
    `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
    * Presumably the undefined symbols in an LTO unit (copied in this patch
    into the linker's unique saver) are a small set compared with the set of
    symbols in global-resolution (copied before this patch), making this a
    worthwhile trade-off. Benchmarking this change alone shows measurable
    memory savings across various benchmarks.

    [1] ELF
    https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748
    [2]
    https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863
    [3]
    https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995

commit 277371943fa48f2550df02870951f5e5a77efef5
Author: lntue <35648136+lntue@users.noreply.github.com>
Date:   Mon Sep 9 14:15:46 2024 -0400

    [libc][bazel] Update bazel overlay for math functions and their tests. (#107862)

commit 4a501a4556bb191bd6eb5398a7330a28437e5087
Author: Artem Belevich <tra@google.com>
Date:   Mon Sep 9 11:14:41 2024 -0700

    [CUDA/HIP] propagate -cuid to a host-only compilation. (#107483)

    Right now we're bailing out too early, and `-cuid` does not get set for
    the host-only compilations.

commit 6850410562123b6e4fbb039e7ba4a2325b994b84
Author: Zequan Wu <zequanwu@google.com>
Date:   Mon Sep 9 11:14:21 2024 -0700

    [Coverage] Ignore unused functions if the count is 0. (#107661)

    Relax the condition to ignore the case when count is 0.

    This fixes a bug on
    https://github.com/llvm/llvm-project/commit/381e9d2386facea7f2acc0f8c16a6d0731267f80.
    This was reported at
    https://discourse.llvm.org/t/coverage-from-multiple-test-executables/81024/.

commit 5f74671c85877e03622e8d308aee15ed73ccee7c
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Mon Sep 9 12:10:16 2024 -0600

    [flang][Driver] Support -Xlinker in flang (#107472)

    Partially addresses: https://github.com/llvm/llvm-project/issues/89888

commit 0f349b7a9cde0080e626f6cfd362885341eb63b4
Author: Sarah Spall <spall@users.noreply.github.com>
Date:   Mon Sep 9 11:07:20 2024 -0700

    [HLSL] Implement support for HLSL intrinsic - select (#107129)

    Implement support for HLSL intrinsic select.
    This would close issue #75377

commit 34e3007c69eb91c16f23f20548305a2fb8feb75e
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 10:51:52 2024 -0700

    [ARM] Fix a warning

    This patch fixes:

      llvm/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h:214:5: error: default
      label in switch which covers all enumeration values
      [-Werror,-Wcovered-switch-default]

commit 6cc0138ca3dbdb21f4c4a5fa39cf05c38da4bb75
Author: Chris B <chris.bieneman@me.com>
Date:   Mon Sep 9 12:34:50 2024 -0500

    Fix implicit conversion rank ordering (#106811)

    DXC prefers dimension-preserving conversions over precision-losing
    conversions. This means a double4 -> float4 conversion is preferred over
    a double4 -> double3 or double4 -> double conversion.

commit cd8229bb4bfa4de45528ce101d9dceb9be8bff9e
Author: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
Date:   Mon Sep 9 10:32:35 2024 -0700

    [flang][cuda] Support c_devptr in c_f_pointer intrinsic (#107470)

    This is an extension of CUDA Fortran. The iso_c_binding intrinsic can
    accept a `TYPE(c_devptr)` as its first argument. This patch relaxes the
    semantic check to accept it and updates the lowering to unwrap the cptr
    field from the c_devptr.

commit 7543d09b852695187d08aa5d56d50016fea8f706
Author: Andrew Ng <andrew.ng@sony.com>
Date:   Mon Sep 9 18:18:41 2024 +0100

    [llvm-ml] Fix RIP-relative addressing for ptr operands (#107618)

    Fixes #54773

commit 7f90479b2300b3758fd90015a2e6e7e94cfcf1e7
Author: Leandro Lupori <leandro.lupori@linaro.org>
Date:   Mon Sep 9 14:09:45 2024 -0300

    [flang][OpenMP] Don't abort when default is used on an invalid directive (#107586)

    The previous assert was not considering programs with semantic errors.

    Fixes https://github.com/llvm/llvm-project/issues/107495
    Fixes https://github.com/llvm/llvm-project/issues/93437

commit 95831f012d76558fe78f5f3e71b1003a773384e5
Author: David Green <david.green@arm.com>
Date:   Mon Sep 9 18:04:38 2024 +0100

    [ARM] Add a default unreachable case to AddrModeToString. NFC

    Fixes #107739

commit c36c462cc719d47aa2408bca91a028300b2be6d4
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 09:44:37 2024 -0700

    [LTO] Simplify calculateCallGraphRoot (NFC) (#107765)

    The function returns an instance of FunctionSummary populated by
    calculateCallGraphRoot regardless of whether Edges is empty or not.

commit 7d371725cdf993d16f6debf74cf740c3aea84f9b
Author: Mingming Liu <mingmingl@google.com>
Date:   Mon Sep 9 09:43:47 2024 -0700

    [NFCI][BitcodeReader]Read real GUID from VI as opposed to storing it in map (#107735)

    Currently, `ValueIdToValueInfoMap` [1] stores `std::tuple<ValueInfo,
    GlobalValue::GUID /* original GUID */, GlobalValue::GUID /* real GUID*/
    >`. This change updates the stored value type to `std::pair<ValueInfo,
    GlobalValue::GUID /* original GUID */>`, and reads real GUID from
    ValueInfo.

    When an entry is inserted into `ValueIdToValueInfoMap`, ValueInfo is
    created or inserted using real GUID [2]. ValueInfo keeps a pointer to
    GlobalValueMap [3], using either `GUID` or `{GUID, Name}` [4] when
    reading per-module summaries to create a combined summary.

    [1] owned by per module-summary bitcode reader
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L947-L950
    [2]
    [first](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7130-L7133),
    [second](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7221-L7222),
    [third](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7622-L7623)
    [3]
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1427-L1431
    [4]
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1631
    and
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1621

    ---------

    Co-authored-by: Kazu Hirata <kazu@google.com>

commit 60f052edc66a5b5b346635656f231930c436a008
Author: Petr Hosek <phosek@google.com>
Date:   Mon Sep 9 09:43:02 2024 -0700

    [CMake] Passthrough variables for packages to subbuilds (#107611)

    These packages are imported by LLVMConfig.cmake and so we should be
    passing through the necessary variables from the parent build into the
    subbuilds.

    We use `CMAKE_CACHE_DEFAULT_ARGS` so subbuilds can override these
    variables if needed.

commit 5c8fd1eece8fff69871cef57a2363dc0f734a7d1
Author: Sam Clegg <sbc@chromium.org>
Date:   Mon Sep 9 09:28:08 2024 -0700

    [lld][WebAssembly] Fix use of uninitialized stack data with --wasm64 (#107780)

    In the case of `--wasm64` we were setting the type of the init expression
    to be 64-bit but were only setting the low 32-bits of the value (by
    assigning to Int32).
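
    A minimal sketch of the pattern, assuming a simplified init-expression
    layout. `InitExpr` and `makeConst` are hypothetical names and do not
    mirror the actual lld data structures:

    ```cpp
    #include <cstdint>

    // Hypothetical, simplified init expression: a width flag plus a value.
    struct InitExpr {
      bool Is64;
      union {
        std::int32_t Int32;
        std::int64_t Int64;
      } Value;
    };

    // Write the union member that matches the declared width. Assigning only
    // Int32 while declaring a 64-bit type would leave the upper 32 bits of
    // the value unset.
    InitExpr makeConst(std::uint64_t V, bool Is64) {
      InitExpr E{};
      E.Is64 = Is64;
      if (Is64)
        E.Value.Int64 = static_cast<std::int64_t>(V);
      else
        E.Value.Int32 = static_cast<std::int32_t>(V);
      return E;
    }
    ```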

    Fixes: https://github.com/emscripten-core/emscripten/issues/22538

commit 95753ffa49f57c284a4682a8ca03e05d59f2c112
Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Date:   Mon Sep 9 16:13:05 2024 +0000

    [gn build] Port ea2da571c761

commit db6051dae085c35020c1273ae8d38508c9958bc7
Author: Pavel Skripkin <paskripkin@gmail.com>
Date:   Mon Sep 9 19:12:38 2024 +0300

    [analyzer] fix crash on binding to symbolic region with `void *` type (#107572)

    As reported in
    https://github.com/llvm/llvm-project/pull/103714#issuecomment-2295769193.
    CSA crashes when trying to bind a value to a symbolic region with
    `void *` type. This happens when such a region gets passed as an inline
    asm input and the engine tries to bind `UnknownVal` to that region.

    Fix it by changing type from void to char before calling
    `GetElementZeroRegion`

commit 3cdb30ebbc18fa894d3bd67aebcff76ce7c741ac
Author: Krystian Stasiowski <sdkrystian@gmail.com>
Date:   Mon Sep 9 12:06:45 2024 -0400

    [Clang][Sema] Use the correct lookup context when building overloaded 'operator->' in the current instantiation (#104458)

    Currently, clang erroneously rejects the following:
    ```
    struct A
    {
        template<typename T>
        void f();
    };

    template<typename T>
    struct B
    {
        void g()
        {
            (*this)->template f<int>(); // error: no member named 'f' in 'B<T>'
        }

        A* operator->();
    };
    ```

    This happens because `Sema::ActOnStartCXXMemberReference` does not adjust the `ObjectType` parameter when `ObjectType` is a dependent type (except when the type is a `PointerType` and the class member access is the `->` form). Since the (possibly adjusted) `ObjectType` parameter (`B<T>` in the above example) is passed to `Parser::ParseOptionalCXXScopeSpecifier`, we end up looking up `f` in `B` rather than `A`.

    This patch fixes the issue by identifying cases where the type of the object expression `T` is a dependent, non-pointer type and:
    - `T` is the current instantiation and lookup for `operator->` finds a member of the current instantiation, or
    - `T` has at least one dependent base class, and `operator->` is not found in the current instantiation

    and using `ASTContext::DependentTy` as the type of the object expression when the optional _nested-name-specifier_ is parsed.

    Fixes #104268.

commit eba6160deec5a32e4b31c2a446172d0e388195c9
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Mon Sep 9 09:57:49 2024 -0600

    [flang][Driver] Support --no-warnings option (#107455)

    Because of the way visibility is implemented in Options.td, options that
    are aliases do not inherit the visibility of the option being aliased.
    Therefore, explicitly set the visibility of the alias to be the same as
    the aliased option.

    This partially addresses
    https://github.com/llvm/llvm-project/issues/89888

commit 914ab366c24cf494a798ce3a178686456731861a
Author: sstipanovic <146831748+sstipanovic@users.noreply.github.com>
Date:   Mon Sep 9 17:54:30 2024 +0200

    [AMDGPU] Overload image atomic swap to allow float as well. (#107283)

    LLPC can generate llvm.amdgcn.image.atomic.swap intrinsic with data
    argument as float type as well as float return type. This went unnoticed
    until CreateIntrinsic with implicit mangling was used.

commit ea2da571c761066542f8d2273933d2523279e631
Author: Tyler Nowicki <tyler.nowicki@amd.com>
Date:   Mon Sep 9 11:50:27 2024 -0400

    [Coroutines] Move the SuspendCrossingInfo analysis helper into its own header/source (#106306)

    * Move the SuspendCrossingInfo analysis helper into its own
    header/source

    See RFC for more info:
    https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

    Co-authored-by: tnowicki <tnowicki.nowicki@amd.com>

commit 1651014960b90bd1398f61bec0866d4a187910ef
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 08:47:42 2024 -0700

    [TableGen] Change SetTheory set/vec to use const Record * (#107692)

    Change SetTheory::RecSet/RecVec to use const Record pointers.

commit e46f03bc31a61a903416f1d3c68063ab75aebe6e
Author: Teresa Johnson <tejohnson@google.com>
Date:   Mon Sep 9 08:17:41 2024 -0700

    [MemProf] Remove unnecessary data structure (NFC) (#107643)

    Recent change #106623 added the CallToFunc map, but I subsequently
    realized the same information is already available for the calls being
    examined in the StackIdToMatchingCalls map we're iterating through.

commit 86e5c5468ae3fcd65b23fd7b3cb0182e676829bd
Author: Nicolas van Kempen <nvankemp@gmail.com>
Date:   Mon Sep 9 11:15:28 2024 -0400

    [clang-tidy][run-clang-tidy] Fix minor shutdown noise (#105724)

    On my new machine, the script outputs some shutdown noise:
    ```
    Ctrl-C detected, goodbye.
    Traceback (most recent call last):
      File "/home/nvankempen/llvm-project/./clang-tools-extra/clang-tidy/tool/run-clang-tidy.py", line 626, in <module>
        asyncio.run(main())
      File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
        self.run_forever()
      File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
        self._run_once()
      File "/usr/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once
        event_list = self._selector.select(timeout)
      File "/usr/lib/python3.10/selectors.py", line 469, in select
        fd_event_list = self._selector.poll(timeout, max_ev)
    KeyboardInterrupt
    ```

    This fixes it. Also remove an unused typing import.
    Relevant documentation:
    https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption

commit 763bc9249cf0b7da421182e24716d9a569fb5184
Author: Jakub Kuderski <jakub@nod-labs.com>
Date:   Mon Sep 9 11:12:26 2024 -0400

    [mlir][amdgpu] Align Chipset with TargetParser (#107720)

    Update the Chipset struct to follow the `IsaVersion` definition from
    llvm's `TargetParser`. This is a follow up to
    https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012.

    * Add the stepping version. Note: This may break downstream code that
    compares against the minor version directly.
    * Use comparisons with full Chipset version where possible.

    Note that we can't use the code in `TargetParser` directly because the
    chipset utility is outside of `mlir/Target` that re-exports llvm's
    target library.

commit 6cc3bf7d1d343f910b40cee24d4cda873a6ddd55
Author: Quinn Dawkins <quinn.dawkins@gmail.com>
Date:   Mon Sep 9 11:05:37 2024 -0400

    [mlir][tensor] Add canonicalization to fold consecutive tensor.pad ops (#107302)

    `tensor.pad(tensor.pad)` with the same constant padding value can be
    combined into a single pad that pads to the sum of the high and low
    padding amounts.
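
    The arithmetic behind the fold can be checked on a 1-D model in plain
    C++. The `pad` helper below is a hypothetical stand-in for `tensor.pad`
    on a rank-1 tensor with a constant padding value, not MLIR code:

    ```cpp
    #include <cstddef>
    #include <vector>

    // Pad a 1-D sequence with `Low` copies of `Value` in front and `High`
    // copies behind.
    std::vector<int> pad(const std::vector<int> &In, std::size_t Low,
                         std::size_t High, int Value) {
      std::vector<int> Out(Low, Value);
      Out.insert(Out.end(), In.begin(), In.end());
      Out.insert(Out.end(), High, Value);
      return Out;
    }
    ```

    With the same padding value, `pad(pad(x, 1, 2, 0), 3, 4, 0)` and
    `pad(x, 4, 6, 0)` produce the same sequence, which is the equivalence
    the canonicalization relies on.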

commit ea9204505cf1099b98b1fdcb898f0bd35e463984
Author: Lei Huang <lei@ca.ibm.com>
Date:   Mon Sep 9 11:01:22 2024 -0400

    Fix codegen for transparent_union function params (#104816)

    Update codegen for func param with transparent_union attr to be that of
    the first union member.

    This is a followup to #101738 to fix non-ppc codegen and closes #76773.

commit 6634d44e5e6079e19efe54c2de35e2e63108b085
Author: Amy Wang <kai.ting.wang@huawei.com>
Date:   Mon Sep 9 10:57:13 2024 -0400

    [MLIR][Transform] Allow stateInitializer and stateExporter for applyTransforms (#101186)

    This is discussed in RFC:

    https://discourse.llvm.org/t/rfc-making-the-constructor-of-the-transformstate-class-protected/80377

commit 111932d5cae0199d9c59669b37232a011f8b8757
Author: Luke Lau <luke@igalia.com>
Date:   Mon Sep 9 22:45:44 2024 +0800

    [RISCV] Fix same mask vmerge peephole discarding false operand (#107827)

    This fixes the issue raised in
    https://github.com/llvm/llvm-project/pull/106108#discussion_r1749677510

    True's passthru needs to be equivalent to vmerge's false, but we also
    allow true's passthru to be undef.

    However if it's undef then we need to replace it with false, otherwise
    we end up discarding the false operand entirely.
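
    A scalar model of the reasoning above, as a sketch only. The helper
    names are hypothetical, "add one" stands in for an arbitrary masked
    operation, and an undef passthru is modeled as `std::nullopt` with an
    arbitrary value in the inactive lanes:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <optional>
    #include <vector>

    using Vec = std::vector<int>;
    using Mask = std::vector<bool>;

    // vmerge.vvm model: lane i takes True[i] if the mask bit is set,
    // otherwise False[i].
    Vec vmerge(const Vec &False, const Vec &True, const Mask &M) {
      Vec Out(False.size());
      for (std::size_t I = 0; I < Out.size(); ++I)
        Out[I] = M[I] ? True[I] : False[I];
      return Out;
    }

    // Masked op model: active lanes compute Src[i] + 1, inactive lanes take
    // the passthru. With an undef passthru any value is allowed there; -1 is
    // an arbitrary choice for this sketch.
    Vec maskedAddOne(const std::optional<Vec> &Passthru, const Vec &Src,
                     const Mask &M) {
      Vec Out(Src.size());
      for (std::size_t I = 0; I < Out.size(); ++I)
        Out[I] = M[I] ? Src[I] + 1 : (Passthru ? (*Passthru)[I] : -1);
      return Out;
    }

    // Folded form: the masked op alone must reproduce the vmerge result, so
    // its passthru has to be False even when True's passthru was undef.
    Vec foldSameMask(const Vec &False, const std::optional<Vec> &TruePassthru,
                     const Vec &Src, const Mask &M) {
      assert((!TruePassthru || *TruePassthru == False) &&
             "passthru must be undef or equivalent to false");
      return maskedAddOne(False, Src, M);
    }
    ```

    Keeping the undef passthru in the folded op would leave the inactive
    lanes unconstrained, which is how the false operand gets lost.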

    The changes in fixed-vectors-strided-load-store-asm.ll undo the changes
    in #106108 where we introduced this miscompile.

commit 2d338bed00b2bba713bceb4915400063b95929b2
Author: Tobias Stadler <mail@stadler-tobias.de>
Date:   Mon Sep 9 16:30:44 2024 +0200

    [CodeGen] Refactor DeadMIElim isDead and GISel isTriviallyDead (#105956)

    Merge GlobalISel's isTriviallyDead and DeadMachineInstructionElim's
    isDead code and remove all unnecessary checks from the hot path by
    looping over the operands before doing any other checks.

    See #105950 for why DeadMIElim needs to remove LIFETIME markers even
    though they probably shouldn't generally be considered dead.

    x86 CTMark O3: -0.1%
    AArch64 GlobalISel CTMark O0: -0.6%, O2: -0.2%

commit a2f659c1349cb70c09b183eb214e2a24cf04c2c6
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:15:12 2024 -0700

    [StructurizeCFG] Avoid repeated hash lookups (NFC) (#107797)

commit ab95ed5ce0b099913eb5c9b03fef7f322c24acd2
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:14:40 2024 -0700

    [IPO] Avoid repeated hash lookups (NFC) (#107796)

commit 3940a1ba1454afec916be86385bb2031526e3e13
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:13:52 2024 -0700

    [Float2Int] Avoid repeated hash lookups (NFC) (#107795)

commit 563dc226fe17f7638d02a957d1b2870dfa968f01
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:13:27 2024 -0700

    [Analysis] Avoid repeated hash lookups (NFC) (#107794)

commit 620b8d994b8abdcf31271d9f4db7e7422fc9bd65
Author: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date:   Mon Sep 9 15:53:33 2024 +0200

    [hurd] Fix accessing f_type field of statvfs (#71851)

    f4719c4d2cda ("Add support for GNU Hurd in Path.inc and other places")
    made llvm use an internal __f_type name for the f_type field (which it
    is not supposed to since accessing double-underscore names is explicitly
    not supported by standards). In glibc 2.39 this field was renamed to
    f_type so applications can now access the field as the standard says.

commit eaac4a26136ca8e3633bf91795343cd060d7af87
Author: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
Date:   Mon Sep 9 15:35:28 2024 +0200

    [AMDGPU] Document & Finalize GFX12 Memory Model (#98599)

    Documents the memory model implemented as of #98591, with some
    fixes/optimizations to the implementation.

commit 1a5a1e97817c9a3db4d1f9795789c99790cf88e2
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 14:26:08 2024 +0100

    [VPlan] Assert that VFxUF is always used.

    Add assertion to ensure invariant discussed in
    https://github.com/llvm/llvm-project/pull/95305.

commit 1f2a634c44dedef11f590956f297b2c7a1659fcf
Author: Sergey Kachkov <sergey.kachkov@syntacore.com>
Date:   Wed Sep 4 17:42:03 2024 +0300

    Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)

    Motivating example: https://godbolt.org/z/eb97zrxhx
    Here we have 2 induction variables in the loop: one corresponds to the
    i variable (add rdx, 4), the other to res (add rax, 2). The second
    induction variable can be removed by the rewriteLoopExitValues() method
    (the final value of res at loop exit is unroll_iter * -2); however, this
    doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
    ```
    ; Preheader:
    for.body.preheader.new:                           ; preds = %for.body.preheader
      %unroll_iter = and i64 %N, -4
      br label %for.body

    ; Loop:
    for.body:                                         ; preds = %for.body, %for.body.preheader.new
      %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
      %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
      %inc.3 = add nuw i64 %i.07, 4
      %lsr.iv.next = add nsw i64 %lsr.iv, -2
      %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
      br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

    ; Exit blocks
    for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
      %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
      %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
      %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
      br label %for.end.loopexit.unr-lcssa
    ```
    rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses:
    one in the LCSSA phi node, the other in the induction phi node. Here we have 3
    uses of this value because of the duplicated LCSSA nodes, so the transform
    doesn't apply, which leaves an extra add operation inside the loop. The
    proposed solution is to accumulate the inserted instructions that will
    require an LCSSA form update into a SetVector and then call
    formLCSSAForInstructions on this SetVector once, so the same
    instructions are not processed twice.
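
    The batching idea can be sketched outside LLVM as follows. This is only
    an illustration, not the LSR code: `InsertionOrderedSet` is a hypothetical
    stand-in for a SetVector, instructions are modelled as plain ints, and
    `countBatchedUpdates` stands in for the single update call.

    ```cpp
    #include <cstddef>
    #include <unordered_set>
    #include <vector>

    // Hypothetical stand-in for a SetVector: keeps insertion order and
    // ignores duplicates.
    class InsertionOrderedSet {
    public:
      // Returns true if the value was newly inserted.
      bool insert(int value) {
        if (!seen_.insert(value).second)
          return false;
        order_.push_back(value);
        return true;
      }
      const std::vector<int> &items() const { return order_; }

    private:
      std::unordered_set<int> seen_;
      std::vector<int> order_;
    };

    // Collects instruction ids that need an LCSSA update and returns how many
    // distinct instructions the single batched update would visit.
    std::size_t countBatchedUpdates(const std::vector<int> &requests) {
      InsertionOrderedSet worklist;
      for (int id : requests)
        worklist.insert(id);
      return worklist.items().size();
    }
    ```

    With requests {7, 7, 3, 7}, only two distinct instructions are visited,
    so no instruction gets a second LCSSA phi.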

    The reland fixes the issue with the preserve-lcssa.ll test: it fails when
    the x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
    moved into a separate duplicated-phis.ll test with an explicit x86 target
    requirement to fix bots that do not build this target.

commit 17f0c5dfaab8bc72e19cb68e73b0944e5ee27b88
Author: Sergey Kachkov <sergey.kachkov@syntacore.com>
Date:   Fri Aug 30 16:00:42 2024 +0300

    [LSR][NFC] Add pre-commit test

commit aa158bf40285925d3c019d9e697cd2c88421297a
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 14:10:12 2024 +0100

    [LV] Update tests to replace some code with loop varying instructions.

    Update some tests with loop-invariant instructions, where hoisting them
    out of the loop changes the vectorization decision. This should preserve
    their original spirit when making further improvements.

commit e25eb1433110d94d16fd69e5aca9bdf72259263d
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 13:05:54 2024 +0100

    [ConstraintElim] Add tests for loops with chained header conditions.

commit 1199e5b9ce5a001445463ba8da1f70fa4558fbcc
Author: Nikita Popov <npopov@redhat.com>
Date:   Mon Sep 9 12:45:48 2024 +0200

    [MemCpyOpt] Add more tests for memcpy passed to readonly arg (NFC)

commit cf8fb4320f1be29c55909adf5ff8ad47e02b2dbe
Author: Momchil Velikov <momchil.velikov@arm.com>
Date:   Mon Sep 9 13:34:41 2024 +0100

    [AArch64] Implement NEON vamin/vamax intrinsics (#99041)

    This patch implements the intrinsics of the form

        floatNxM_t vamin[q]_fN(floatNxM_t vn, floatNxM_t vm);
        floatNxM_t vamax[q]_fN(floatNxM_t vn, floatNxM_t vm);

    as defined in https://github.com/ARM-software/acle/pull/324

    ---------

    Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>

commit 32cef07885e112d05bc2b1c285f40e353d80e18f
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 05:27:38 2024 -0700

    [LLDB][TableGen] Migrate lldb-tblgen to use const RecordKeeper (#107536)

    Migrate LLDB TableGen backend to use const RecordKeeper.

    This is a part of effort to have better const correctness in TableGen
    backends:

    https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

commit cca54e347ac34912cdfb9983533c61836db135e0
Author: Martin Storsjö <martin@martin.st>
Date:   Mon Sep 9 15:08:19 2024 +0300

    Revert "Reapply "[Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer" (#97308)"

    This reverts commit 45c8766973bb3bb73dd8d996231e114dcf45df9f
    and 049512e39d96995cb373a76cf2d009a86eaf3aab.

    This change triggers failed asserts on inputs like this:

        struct a {
        } constexpr b;
        class c {
        public:
          c(a);
        };
        class B {
        public:
          using d = int;
          struct e {
            enum { f } g;
            int h;
            c i;
            d j{};
          };
        };
        B::e k{B::e::f, int(), b};

    Compiled like this:

        clang -target x86_64-linux-gnu -c repro.cpp
        clang: ../../clang/lib/CodeGen/CGExpr.cpp:3105: clang::CodeGen::LValue
        clang::CodeGen::CodeGenFunction::EmitDeclRefLValue(const clang::DeclRefExpr*):
        Assertion `(ND->isUsed(false) || !isa<VarDecl>(ND) || E->isNonOdrUse() ||
        !E->getLocation().isValid()) && "Should not use decl without marking it used!"' failed.

commit 7a930ce327fdbc5c77b50ee6304645084100c037
Author: Jeremy Morse <jeremy.morse@sony.com>
Date:   Mon Sep 9 12:54:45 2024 +0100

    [DWARF] Emit a minimal line-table for totally empty functions (#107267)

    In degenerate but legal inputs, we can have functions that have no source
    locations at all -- all the DebugLocs attached to instructions are empty.
    LLVM didn't produce any source location for the function; with this patch
    it will at least emit the function-scope source location. Demonstrated by
    empty-line-info.ll

    The XCOFF test modified has similar symptoms -- with this patch, the size
    of the ".dwline" section grows a bit, thus shifting some of the file
    internal offsets, which I've updated.

commit 959d84044a70da08923fe221f999f4e406094ee9
Author: pvanhout <pierre.vanhoutryve@amd.com>
Date:   Mon Sep 9 13:50:48 2024 +0200

    [AMDGPU] Remove unused SplitGraph::Node::getFullCost

commit b8b8fbe19dea2825b801c4738ff78dbf26aae430
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 04:18:55 2024 -0700

    [NFC][TableGen] Migrate LLVM Attribute Emitter to const RecordKeeper (#107698)

    Migrate LLVM Attribute Emitter to const RecordKeeper.

commit d84d9559bdc7aeb4ce14c251f6a3490c66db8d3a
Author: Nicolas van Kempen <nvankemp@gmail.com>
Date:   Mon Sep 9 07:12:46 2024 -0400

    [clang][analyzer] Fix #embed crash (#107764)

    Fix #107724.

commit 09c00b6f0463f6936be5d2100f9d47c0077700f8
Author: Benjamin Kramer <benny.kra@googlemail.com>
Date:   Mon Sep 9 13:03:38 2024 +0200

    [bazel] Add missing dependencies for 345cc47ba7a28811ae4ec7d113059ccb39c500a3

commit 049512e39d96995cb373a76cf2d009a86eaf3aab
Author: yronglin <yronglin777@gmail.com>
Date:   Mon Sep 9 19:01:11 2024 +0800

    [NFC][clang] Fix clang version in the test for the implementation of cwg1815 (#107838)

    This PR fixes the clang version in
    https://github.com/llvm/llvm-project/pull/97308 .

    Signed-off-by: yronglin <yronglin777@gmail.com>

commit 345cc47ba7a28811ae4ec7d113059ccb39c500a3
Author: Daniil Fukalov <dfukalov@gmail.com>
Date:   Mon Sep 9 12:44:03 2024 +0200

    [NFC] Add explicit #include llvm-config.h where its macros are used, lldb part. (#107603)

    (this is lldb part)

    Without these explicit includes, removing other headers that implicitly
    include llvm-config.h may have non-trivial side effects. For example,
    `clangd` may even report `llvm-config.h` as "not used" when it defines
    a macro that is explicitly used with #ifdef. The problem is amplified
    by different build configs, which use different sets of macros.
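
    A minimal sketch of why a missing include goes unnoticed. The macro
    DEMO_HAVE_FEATURE is hypothetical and stands in for any macro that a
    config header would define; it is not a real llvm-config.h macro.

    ```cpp
    // Hypothetical stand-in for a macro that a generated config header would
    // define. Deleting this line models a build where no included header
    // provides the macro any more.
    #define DEMO_HAVE_FEATURE 1

    // #ifdef does not diagnose an undefined macro; it silently takes the
    // #else branch, so the code still compiles but behaves differently.
    #ifdef DEMO_HAVE_FEATURE
    constexpr bool kFeatureEnabled = true;
    #else
    constexpr bool kFeatureEnabled = false;
    #endif

    bool featureEnabled() { return kFeatureEnabled; }
    ```

    Because both branches compile, only an explicit include of the defining
    header guarantees that the intended branch is selected.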

commit dbd81ba2e85c2f244f22c983d96a106eae65c06a
Author: Mikhail Goncharov <goncharov.mikhail@gmail.com>
Date:   Mon Sep 9 11:47:47 2024 +0200

    complete rename of __orc_rt namespace

    for 3e04ad428313dde40c779af6d675b162e150125e

    it's bizarre that none of the buildbots were broken, only bazel build
    https://buildkite.com/llvm-project/upstream-bazel/builds/109623#0191d5d0-2b3e-4ee7-b8dd-1e2580977e9b

commit 663e9cec9c96169aa4e72ab9b6bf08b2d6603093
Author: Artem Kroviakov <71938912+akroviakov@users.noreply.github.com>
Date:   Mon Sep 9 11:49:16 2024 +0200

    [Func][GPU] Use SymbolUserOpInterface in func::ConstantOp  (#107748)

    This PR enables `func::ConstantOp` creation and usage for device
    functions inside GPU modules.
    The current main returns an error for referencing device functions via
    `func::ConstantOp`, because during the `ConstantOp` verification it only
    checks symbols in `ModuleOp` symbol table, which, of course, does not
    contain device functions that are defined in `GPUModuleOp`. This PR
    proposes a more general solution.

    Co-authored-by: Artem Kroviakov <artem.kroviakov@tum.de>

commit aa21ce4a792c170074193c32e8ba8dd35e57c628
Author: Jonas Rickert <Jonas.Rickert@amd.com>
Date:   Mon Sep 9 11:48:13 2024 +0200

    [mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations (#105529)

    This changes the function `resetToken` to not update `lastToken`.

    The member `lastToken` is the last token that was consumed by the
    parser.
    Resetting the lexer position to a different position does not cause any
    token to be consumed, so `lastToken` should not be updated.
    Setting it to `curToken` can cause the scopeLoc.end location of
    `OperationDefinition` to be off-by-one, pointing to the
    first token after the operation.

    An example for an operation for which the scopeLoc.end location was
    wrong before is:
    ```
    %0 = torch.vtensor.literal(dense_resource<__elided__> : tensor<768xbf16>) : !torch.vtensor<[768],bf16>
    ```
    Here the scope end loc always pointed to the next token.

    This also adds a test for the Locations of `OperationDefinitions`.
    Without the change to `resetToken` the test fails, with the scope end
    location for `llvm.mlir.undef` pointing to the `func.return` in the next
    line.
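
    The invariant can be illustrated with a toy parser. This is a sketch
    under assumptions, not the MLIR AsmParser: `ToyParser`, `consume`, and
    `reset` are hypothetical names that only mirror the roles of `curToken`,
    `lastToken`, and `resetToken`.

    ```cpp
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy token-stream parser that tracks positions as indices.
    class ToyParser {
    public:
      explicit ToyParser(std::vector<std::string> tokens)
          : tokens_(std::move(tokens)) {}

      // Consuming a token is the only operation that updates lastToken.
      void consume() {
        last_ = pos_;
        ++pos_;
      }

      // Moving the lexer position consumes nothing, so lastToken stays put.
      void reset(std::size_t pos) { pos_ = pos; }

      std::size_t curToken() const { return pos_; }
      std::size_t lastToken() const { return last_; }

    private:
      std::vector<std::string> tokens_;
      std::size_t pos_ = 0;
      std::size_t last_ = 0;
    };

    // Consumes three tokens, rewinds, and reports the last consumed index.
    std::size_t lastTokenAfterReset() {
      std::vector<std::string> tokens = {"%0", "=", "op", "next"};
      ToyParser parser(tokens);
      parser.consume();
      parser.consume();
      parser.consume();
      parser.reset(1);
      return parser.lastToken();
    }
    ```

    After the reset the last consumed token is still index 2, so an end
    location derived from it does not drift to a later token.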

commit b98aa6fb1d5f5fa904ce6d789a8fa4a245a90ee6
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Mon Sep 9 10:29:04 2024 +0100

    [X86] LowerABD - lower i8/i16 cases directly to CMOV(SUB(X,Y),SUB(Y,X)) pattern

    Better codegen (shorter dependency chain for better ILP) than via the TRUNC(ABS(SUB(EXT(LHS),EXT(RHS)))) expansion
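
    A scalar sketch of the two forms for the unsigned i8 case only. The
    function names are hypothetical and this models the arithmetic, not the
    actual DAG lowering.

    ```cpp
    #include <cstdint>
    #include <cstdlib>

    // Select form: compute both differences and pick one, which is what a
    // CMOV(SUB(X,Y),SUB(Y,X)) pattern models. The two subtractions are
    // independent of each other.
    std::uint8_t abduSelect(std::uint8_t x, std::uint8_t y) {
      std::uint8_t xy = static_cast<std::uint8_t>(x - y);
      std::uint8_t yx = static_cast<std::uint8_t>(y - x);
      return x > y ? xy : yx;
    }

    // Widening form: extend, subtract, take the absolute value, truncate.
    // Each step depends on the previous one.
    std::uint8_t abduWiden(std::uint8_t x, std::uint8_t y) {
      int diff = static_cast<int>(x) - static_cast<int>(y);
      return static_cast<std::uint8_t>(std::abs(diff));
    }

    // Exhaustively checks that both forms agree for every pair of inputs.
    bool formsAgree() {
      for (int x = 0; x < 256; ++x)
        for (int y = 0; y < 256; ++y)
          if (abduSelect(static_cast<std::uint8_t>(x),
                         static_cast<std::uint8_t>(y)) !=
              abduWiden(static_cast<std::uint8_t>(x),
                        static_cast<std::uint8_t>(y)))
            return false;
      return true;
    }
    ```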

commit d57be195e37f9c11a26e8e3fe8da5ef62bb921af
Author: Lukacma <Marian.Lukac@arm.com>
Date:   Mon Sep 9 10:28:01 2024 +0100

    [AArch64] replace SVE intrinsics with no active lanes with zero (#107413)

    This patch extends https://github.com/llvm/llvm-project/pull/73964 and
    optimises SVE intrinsics into zero constants when the predicate is zero.

commit 476b1a661f6846537d232e9a3bc5a68c5f15efb3
Author: Jerry-Ge <jerry.ge@arm.com>
Date:   Mon Sep 9 02:26:39 2024 -0700

    [TOSA] Update input name for Sin and Cos operators (#107606)

    Update the dialect input names from input to input1 for Sin/Cos for
    consistency.

    Signed-off-by: Jerry Ge <jerry.ge@arm.com>

commit da11ede57d034767a6f5d5e211c06c1c3089d7fd
Author: vabridgers <58314289+vabridgers@users.noreply.github.com>
Date:   Mon Sep 9 03:47:39 2024 -0500

    [analyzer] Remove overzealous "No dispatcher registered" assertion (#107294)

    Random testing revealed it's possible to crash the analyzer with the
    command line invocation:

    clang -cc1 -analyze -analyzer-checker=nullability empty.c

    where the source file, empty.c is an empty source file.

    ```
    clang: <root>/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp:56:
       void clang::ento::CheckerManager::finishedCheckerRegistration():
         Assertion `Event.second.HasDispatcher && "No dispatcher registered for an event"' failed.

    PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/

    Stack dump:
    0.      Program arguments: clang -cc1 -analyze -analyzer-checker=nullability nullability-nocrash.c
     ...
                 clang::AnalyzerOptions&, clang::Preprocessor const&,
                 llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>,
                 std::allocator<char>>>, llvm::ArrayRef<std::function<void (clang::ento::CheckerRegistry&)>>)
    ```

    This commit removes the assertion which failed here, because it was
    logically incorrect: it required that if an Event is handled by some
    (enabled) checker, then there must be an **enabled** checker which can
    emit that kind of Event. It should be OK to disable the event-producing
    checkers but enable an event-consuming checker which has different
    responsibilities in addition to handling the events.

    Note that this assertion was in an `#ifndef NDEBUG` block, so this
    change does not impact the non-debug builds.

    Co-authored-by: Vince Bridgers <vince.a.bridgers@ericsson.com>

commit 04742f34b343af87dda93edacbb06f6e98a1d80f
Author: Nikita Popov <npopov@redhat.com>
Date:   Mon Sep 9 10:24:54 2024 +0200

    [SCCP] Add test for nonnull argument inference (NFC)

commit 3b1146e050657f40954e8e1f977837f884df2488
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 01:27:22 2024 -0700

    [llvm-exegesis] Use MCRegister instead of unsigned to hold registers (#107820)

commit 74ad2540523ec78122ba5a32e35e0b65ee27b7b3
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 08:10:11 2024 +0000

    [Github][MLGO] Fix mlgo-utils path in new-prs-labeler

    This patch (hopefully) fixes the mlgo-utils path in new-prs-labeler so
    that it actually matches all files in that directory. Currently it is
    not catching the files as they are relatively deeply nested within the
    folder.

commit 3e04ad428313dde40c779af6d675b162e150125e
Author: Lang Hames <lhames@gmail.com>
Date:   Mon Sep 9 17:59:47 2024 +1000

    [ORC-RT] Remove double underscore from the orc_rt namespace.

    We should use `orc_rt` as the public C++ API namespace for the ORC runtime and
    control symbol visibility to hide implementation details, rather than rely on
    the '__' prefix.

commit d5f6f30664ed53ef27d949fad0ce3994ea9988dd
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 07:49:54 2024 +0000

    [MLGO] Add spaces at the end of lines in multiline string

    This patch adds spaces at the end of lines in multiline strings in the
    extract_ir script. Without this patch, the warning/info messages will be
    printed without spaces between words when there is a line break in the
    source which looks/reads weird.
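
    The extract_ir script is not shown here, but the same pitfall exists in
    C++, where adjacent string literals are also joined with nothing in
    between. A small sketch with a hypothetical message text:

    ```cpp
    #include <string>

    // Adjacent literals are concatenated directly, so the words run together.
    std::string messageWithoutSpace() {
      return "could not find"
             "the module";
    }

    // A trailing space at the end of the first line keeps the words apart.
    std::string messageWithSpace() {
      return "could not find "
             "the module";
    }
    ```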

commit 8549b324bc1f450f4477f46f18db67439dbf6d75
Author: Younan Zhang <zyn7109@gmail.com>
Date:   Mon Sep 9 15:09:43 2024 +0800

    [Clang] Don't assert non-empty packs for FunctionParmPackExprs (#107561)

    `FunctionParmPackExpr`s are peculiar in that they have to be of
    unexpanded dependency while they don't introduce any unexpanded packs.
    So this patch rules them out in the non-empty pack assertion in
    `DiagnoseUnexpandedParameterPack()`.

    There was a fix #69224, but that turned out to be insufficient.

    I also moved the separate tests to a pre-existing file.

    Fixes https://github.com/llvm/llvm-project/issues/86361

commit 022b3c27e27832f27c61683095899227c26e0cca
Author: Piyou Chen <piyou.chen@sifive.com>
Date:   Mon Sep 9 15:07:39 2024 +0800

    [Clang][RISCV] Recognize unsupport target feature by supporting isValidFeatureName (#106495)

    This patch makes unsupported target attributes emit a warning and ignore
    the target attribute during semantic checks. The changes include:

    1. Adding the RISCVTargetInfo::isValidFeatureName function.
    2. Rejecting non-full-arch strings in the handleFullArchString function.
    3. Adding test cases to demonstrate the warning behavior.

commit 9347b66cfcd9acf84dbbd500b6344041c587f6a9
Author: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
Date:   Mon Sep 9 09:06:34 2024 +0200

    Reland "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)" (#107076)

    Relands #104763 with
    - Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator
      failing if the input is shuffled first)
    - Fix for broken proposal selection
    - c3cb27370af40e491446164840766478d3258429 included

    Original commit description below
    ---

    Major rewrite of the AMDGPUSplitModule pass in order to better support
    it long-term.

    Highlights:
    - Removal of the "SML" logging system in favor of just using CL options
      and LLVM_DEBUG, like any other pass in LLVM.
      - The SML system started from good intentions, but it was too flawed and
        messy to be of any real use. It was also a real pain to use and made the
        code more annoying to maintain.
    - Graph-based module representation with DOTGraph printing support
      - The graph represents the module accurately, with bidirectional, typed
        edges between nodes (a node usually represents one function).
      - Nodes are assigned IDs starting from 0, which allows us to represent a
        set of nodes as a BitVector. This makes comparing 2 sets of nodes to
        find common dependencies a trivial task. Merging two clusters of nodes
        together is also really trivial.
    - No more defaulting to "P0" for external calls
      - Roots that can reach non-copyable dependencies (such as external
        calls) are now grouped together in a single "cluster" that can go into
        any partition.
    - No more defaulting to "P0" for indirect calls
    - New representation for module splitting proposals that can be graded
      and compared.
    - Graph-search algorithm that can explore multiple branches/assignments
      for a cluster of functions, up to a maximum depth.
      - With the default max depth of 8, we can create up to 256 propositions
        to try and find the best one.
      - We can still fall back to a greedy approach upon reaching max depth.
        That greedy approach uses almost identical heuristics to the previous
        version of the pass.
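
    The node-set operations described in the highlights can be sketched with
    a fixed-size bitset. This is an illustration only: `NodeSet`, `makeSet`,
    and the other names are hypothetical, std::bitset replaces the BitVector
    mentioned above, and `maxProposals` assumes each search level branches
    two ways, which is what 2^8 = 256 suggests.

    ```cpp
    #include <bitset>
    #include <cstddef>
    #include <initializer_list>

    // Hypothetical fixed capacity for the sketch.
    constexpr std::size_t kMaxNodes = 64;
    using NodeSet = std::bitset<kMaxNodes>;

    // Builds a set from node IDs, which start at 0.
    NodeSet makeSet(std::initializer_list<std::size_t> ids) {
      NodeSet set;
      for (std::size_t id : ids)
        set.set(id);
      return set;
    }

    // Nodes that both sets depend on: a single bitwise AND.
    NodeSet commonDependencies(const NodeSet &a, const NodeSet &b) {
      return a & b;
    }

    // Merging two clusters: a single bitwise OR.
    NodeSet mergeClusters(const NodeSet &a, const NodeSet &b) { return a | b; }

    // Upper bound on proposals if every level makes a two-way choice.
    constexpr unsigned maxProposals(unsigned depth) { return 1u << depth; }
    ```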

    All of this gives us a lot of room to experiment with new heuristics or
    even entirely different splitting strategies if we need to. For
    instance, the graph representation has room for abstract nodes, e.g. if
    we need to represent some global variables or external constraints. We
    could also introduce more edge types to model other type of relations
    between nodes, etc.

    I also designed the graph representation & the splitting strategies to
    be as fast as possible, and it seems to have paid off. Some quick tests
    showed that we spend pretty much all of our time in the CloneModule
    function, with the actual splitting logic being <1% of the runtime.

commit bdcbfa7fb4ac6f23262095c401d28309d689225e
Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Date:   Mon Sep 9 06:28:13 2024 +0000

    [gn build] Port a416267a5f3f

commit a416267a5f3fffb3d1e9d8d53245aef8169c5ddb
Author: Yuxuan Chen <ych@fb.com>
Date:   Sun Sep 8 23:09:40 2024 -0700

    [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (#99285)

    This patch is episode three of the middle end implementation for the
    coroutine HALO improvement project published on discourse:
    https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

    After we attribute the calls to some coroutines as "coro_elide_safe" in
    the C++ FE and create a `noalloc` ramp function, we use a new middle
    end pass to move the calls to coroutines to the noalloc variant.

    This pass should be run after CoroSplit. For each node we process in
    CoroSplit, we look for its callers and replace the attributed ones in
    presplit coroutines to the noalloc one. The transformed `noalloc` ramp
    function will also require a frame pointer to a block of memory it can
    use as an activation frame. We allocate this on the caller's frame with
    an alloca.

    Please note that we cannot safely transform such attributed calls in
    post-split coroutines due to memory lifetime reasons. The CoroSplit pass
    is responsible for creating the coroutine frame spills for all the
    allocas in the coroutine. Therefore it will be unsafe to create new
    allocas like this one in post-split coroutines. This happens relatively
    rarely because CGSCC performs the passes on the callees before the
    caller. However, if multiple coroutines coexist in one SCC, this
    situation does happen (and prevents us from having a potentially unbounded
    frame size due to recursion).

    You can find episode 1: Clang FE of this patch series at
    https://github.com/llvm/llvm-project/pull/99282
    Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283

commit 234cc81625030e934651d6ae0ace66e37138ba4a
Author: Yuxuan Chen <ych@fb.com>
Date:   Sun Sep 8 23:09:20 2024 -0700

    [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (#99283)

    This patch is episode two of the coroutine HALO improvement project
    published on discourse:
    https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

    Previously CoroElide depended on inlining, and its analysis does not work
    very well with code generated by the C++ frontend due to the existence of
    many customization points. Issues have been reported upstream about how
    ineffective the original CoroElide was in real world applications.

    For C++ users, this set of patches aims to fix this problem by providing
    library authors and users deterministic HALO behaviour for some
    well-behaved coroutine `Task` types. The stack begins with a library
    side attribute on the `Task` class that guarantees no unstructured
    concurrency when coroutines are `co_await`ed directly as a
    prvalue. This attribute on Task types gives us lifetime guarantees and
    makes the C++ FE capable of telling the ME which coroutine calls are
    elidable. We convey such information from the FE through the attribute
    `coro_elide_safe`.

    This patch modifies CoroSplit to create a variant of the coroutine ramp
    function that 1) does not use a heap-allocated frame and instead takes an
    additional parameter as the pointer to the frame. This parameter is
    attributed with `dereferenceable` and `align` to convey the size and
    alignment requirements for the frame. 2) always stores the cleanup address
    instead of the destroy address for `coro.destroy()` actions.
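
    As a rough analogy in plain C++ (no real coroutines involved), the
    difference between the two ramp variants looks like the sketch below.
    `Frame`, `rampHeap`, and `rampNoAlloc` are hypothetical names; the real
    transformation happens on LLVM IR, not in source code.

    ```cpp
    #include <memory>
    #include <new>

    // Hypothetical stand-in for a coroutine frame.
    struct Frame {
      int state = 0;
      bool heapAllocated = false;
    };

    // Analogy for the regular ramp function: the frame lives on the heap.
    std::unique_ptr<Frame> rampHeap() {
      auto frame = std::make_unique<Frame>();
      frame->heapAllocated = true;
      return frame;
    }

    // Analogy for the noalloc variant: the caller passes storage that is
    // large enough and suitably aligned, and nothing is allocated here.
    Frame *rampNoAlloc(void *storage) {
      return new (storage) Frame(); // placement new into caller storage
    }

    // Caller that keeps the callee frame in its own automatic storage.
    bool callerUsesOwnStorage() {
      alignas(Frame) unsigned char storage[sizeof(Frame)];
      Frame *frame = rampNoAlloc(storage);
      bool ok = !frame->heapAllocated &&
                static_cast<void *>(frame) == static_cast<void *>(storage);
      frame->~Frame(); // the caller owns the storage, so it only destroys
      return ok;
    }
    ```

    The size and alignment that the caller must honour here play the role
    that the parameter attributes play for the real frame pointer.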

    In a later patch, we will have a new pass that runs right after
    CoroSplit to find usages of the callee coroutine attributed
    `coro_elide_safe` in presplit coroutine callers, allocate the frame on
    the caller's "stack", and transform those usages to call the `noalloc`
    ramp function variant.

    (note I put quotes on the word "stack" here, because for presplit
    coroutine, any alloca will be spilled into the frame when it's being
    split)

    The C++ Frontend attribute implementation that works with this change
    can be found at https://github.com/llvm/llvm-project/pull/99282
    The pass that makes use of the new `noalloc` split can be found at
    https://github.com/llvm/llvm-project/pull/99285

commit e17a39bc314f97231e440c9e68d9f46a9c07af6d
Author: Yuxuan Chen <ych@fb.com>
Date:   Sun Sep 8 23:08:58 2024 -0700

    [Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282)

    This patch is the frontend implementation of the coroutine elide
    improvement project detailed in this discourse post:
    https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

    This patch proposes a C++ struct/class attribute
    `[[clang::coro_await_elidable]]`. This notion of await elidable task
    gives developers and library authors a certainty that coroutine heap
    elision happens in a predictable way.

    Originally, after we lower a coroutine to LLVM IR, CoroElide is
    responsible for analysis of whether an elision can happen. Take this as
    an example:
    ```
    Task foo();
    Task bar() {
      co_await foo();
    }
    ```
    For CoroElide to happen, the ramp function of `foo` must be inlined into
    `bar`. This inlining happens after `foo` has been split but `bar` is
    usually still a presplit coroutine. If `foo` is indeed a coroutine, the
    inlined `coro.id` intrinsic of `foo` is visible within `bar`. CoroElide
    then runs an analysis to figure out whether the SSA value of
    `coro.begin()` of `foo` gets destroyed before `bar` terminates.

    `Task` types are rarely simple enough for the destroy logic of the task
    to reference the SSA value from `coro.begin()` directly. Hence, the pass
    is very ineffective for even the most trivial C++ Task types. Improving
    CoroElide by implementing more powerful analyses is possible, however it
    doesn't give us the…