DAG: Simplify demanded bits for truncating atomic_store #90113

arsenm · 2024-04-25T19:56:21Z

It's really unfortunate that STORE and ATOMIC_STORE are separate opcodes. This duplicates the basic SimplifyDemandedBits handling that visitSTORE already does for the truncating case. This avoids some AMDGPU lit regressions in a future patch.

I'm not sure how to craft a test that exposes this without first introducing the regressions by promoting half to i16.
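To illustrate why the combine is safe, here is a minimal standalone sketch, not LLVM code. It assumes a 32-bit value stored to 16-bit memory; `truncStore16`, `maskLow16`, `signExtendInReg16` and `checkHighBitsAreDead` are hypothetical names chosen for this example. A truncating store only writes the low bits, so any operation on the stored value that only affects the high bits can be dropped.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a truncating store: only the low 16 bits of the 32-bit
// value reach memory. Hypothetical helper, not an LLVM API.
inline uint16_t truncStore16(uint32_t value) {
  return static_cast<uint16_t>(value);
}

// An operation that only changes bits 16..31 (it clears them).
inline uint32_t maskLow16(uint32_t value) { return value & 0xFFFFu; }

// Sign extension from bit 15 likewise only changes bits 16..31.
// Assumes two's complement narrowing (guaranteed since C++20).
inline uint32_t signExtendInReg16(uint32_t value) {
  int16_t narrow = static_cast<int16_t>(static_cast<uint16_t>(value));
  return static_cast<uint32_t>(static_cast<int32_t>(narrow));
}

// The store ignores bits 16..31, so wrapping the stored value in either
// operation leaves the bytes written to memory unchanged.
inline void checkHighBitsAreDead(uint32_t x) {
  assert(truncStore16(maskLow16(x)) == truncStore16(x));
  assert(truncStore16(signExtendInReg16(x)) == truncStore16(x));
}
```

Calling `checkHighBitsAreDead` on any input should pass, which is the property that lets a demanded-bits simplification strip such operations from the operand of a truncating store.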


llvmbot · 2024-04-25T19:56:40Z

@llvm/pr-subscribers-llvm-selectiondag

Author: Matt Arsenault (arsenm)

Changes

It's really unfortunate that STORE and ATOMIC_STORE are separate opcodes. This duplicates a basic simplify demanded for the truncating case. This avoids some AMDGPU lit regressions in a future patch.

I'm not sure how to craft a test that exposes this without first introducing the regressions by promoting half to i16.

Full diff: https://github.com/llvm/llvm-project/pull/90113.diff

1 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+20)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index aa746f1c7b7b3b..f115a39a6953ce 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -530,6 +530,7 @@ namespace {
     bool refineExtractVectorEltIntoMultipleNarrowExtractVectorElts(SDNode *N);
 
     SDValue visitSTORE(SDNode *N);
+    SDValue visitATOMIC_STORE(SDNode *N);
     SDValue visitLIFETIME_END(SDNode *N);
     SDValue visitINSERT_VECTOR_ELT(SDNode *N);
     SDValue visitEXTRACT_VECTOR_ELT(SDNode *N);
@@ -1909,6 +1910,7 @@ SDValue DAGCombiner::visit(SDNode *N) {
   case ISD::BR_CC:              return visitBR_CC(N);
   case ISD::LOAD:               return visitLOAD(N);
   case ISD::STORE:              return visitSTORE(N);
+  case ISD::ATOMIC_STORE:       return visitATOMIC_STORE(N);
   case ISD::INSERT_VECTOR_ELT:  return visitINSERT_VECTOR_ELT(N);
   case ISD::EXTRACT_VECTOR_ELT: return visitEXTRACT_VECTOR_ELT(N);
   case ISD::BUILD_VECTOR:       return visitBUILD_VECTOR(N);
@@ -21096,6 +21098,24 @@ SDValue DAGCombiner::replaceStoreOfInsertLoad(StoreSDNode *ST) {
                       ST->getMemOperand()->getFlags());
 }
 
+SDValue DAGCombiner::visitATOMIC_STORE(SDNode *N) {
+  AtomicSDNode *ST = cast<AtomicSDNode>(N);
+  SDValue Val = ST->getVal();
+  EVT VT = Val.getValueType();
+  EVT MemVT = ST->getMemoryVT();
+
+  if (MemVT.bitsLT(VT)) { // Is truncating store
+    APInt TruncDemandedBits = APInt::getLowBitsSet(VT.getScalarSizeInBits(),
+                                                   MemVT.getScalarSizeInBits());
+    // See if we can simplify the operation with SimplifyDemandedBits, which
+    // only works if the value has a single use.
+    if (SimplifyDemandedBits(Val, TruncDemandedBits))
+      return SDValue(N, 0);
+  }
+
+  return SDValue();
+}
+
 SDValue DAGCombiner::visitSTORE(SDNode *N) {
   StoreSDNode *ST  = cast<StoreSDNode>(N);
   SDValue Chain = ST->getChain();
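
As a rough standalone model of the guard and mask computation in the new `visitATOMIC_STORE`, the sketch below uses plain bit widths instead of `EVT` and a `uint64_t` instead of `APInt`. The names `lowBitsSet` and `truncDemandedBits` are hypothetical and the model assumes widths of at most 64 bits; it is an approximation for illustration, not the code in the patch. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for APInt::getLowBitsSet(numBits, loBitsSet), restricted to
// widths of at most 64 bits. Returns a mask with the low loBits set.
inline uint64_t lowBitsSet(unsigned numBits, unsigned loBits) {
  assert(numBits >= 1 && numBits <= 64 && loBits <= numBits);
  if (loBits == 0)
    return 0;
  if (loBits == 64)
    return ~uint64_t{0};
  return (uint64_t{1} << loBits) - 1;
}

// Models the `MemVT.bitsLT(VT)` check: only a store whose memory type is
// narrower than the value type has undemanded high bits. Returns the
// demanded-bits mask for that case, or nullopt when nothing can be dropped.
inline std::optional<uint64_t> truncDemandedBits(unsigned valueBits,
                                                 unsigned memBits) {
  if (memBits >= valueBits)
    return std::nullopt; // not a truncating store
  return lowBitsSet(valueBits, memBits);
}
```

For an i32 value stored to i16 memory, `truncDemandedBits(32, 16)` yields the mask `0xFFFF`; for a non-truncating store it yields no mask, matching the early `return SDValue()` path in the patch.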

jayfoad

LGTM

jayfoad · 2024-04-26T09:02:06Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+                                                   MemVT.getScalarSizeInBits());
+    // See if we can simplify the operation with SimplifyDemandedBits, which
+    // only works if the value has a single use.
+    if (SimplifyDemandedBits(Val, TruncDemandedBits))


Do you need any of the worklist fiddling that the corresponding code in visitSTORE does?

arsenm · 2024-04-26

The comments there suggest it's due to merge optimizations triggering, which I assume don't apply in the atomic case. It's probably not important to revisit these aggressively in any case.


arsenm added the llvm:SelectionDAG label Apr 25, 2024

arsenm requested review from jyknight, preames, RKSimon, topperc and Pierre-vh April 25, 2024 19:56

arsenm mentioned this pull request Apr 25, 2024

AMDGPU: Don't bitcast float typed atomic store in IR #90116

Merged

jayfoad approved these changes Apr 26, 2024


arsenm merged commit 405c018 into llvm:main Apr 26, 2024
5 of 6 checks passed

arsenm deleted the dag-simplify-demanded-atomic-store-trunc branch April 26, 2024 13:21

This was referenced Apr 29, 2024

main #90439

Closed

[AArch64] Add support for Cortex-R82AE and improve Cortex-R82 #90440

Merged

arsenm added a commit that referenced this pull request May 7, 2024

AMDGPU: Don't bitcast float typed atomic store in IR (#90116)

82bb253

Implement the promotion in the DAG. Depends #90113
