[RISCV][TTI] Cost non-power-of-two size changing casts #101047

preames · 2024-07-29T17:45:40Z

For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied to casts instead.

For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type. Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types. This is analogous to d3fd28a, but with the same reasoning applied to casts instead.

llvmbot · 2024-07-29T17:46:11Z

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied to casts instead.

Full diff: https://github.com/llvm/llvm-project/pull/101047.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+17-6)
(modified) llvm/test/Analysis/CostModel/RISCV/cast.ll (+15-15)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index a61a9f10be86c..b91388aa213d8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1030,17 +1030,28 @@ InstructionCost RISCVTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
   if (!IsVectorType)
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
-  bool IsTypeLegal = isTypeLegal(Src) && isTypeLegal(Dst) &&
-                     (Src->getScalarSizeInBits() <= ST->getELen()) &&
-                     (Dst->getScalarSizeInBits() <= ST->getELen());
-
-  // FIXME: Need to compute legalizing cost for illegal types.
-  if (!IsTypeLegal)
+  // FIXME: Need to compute legalizing cost for illegal types.  The current
+  // code handles only legal types and those which can be trivially
+  // promoted to legal.
+  if (Src->getScalarSizeInBits() > ST->getELen() ||
+      Dst->getScalarSizeInBits() > ST->getELen())
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
   std::pair<InstructionCost, MVT> SrcLT = getTypeLegalizationCost(Src);
   std::pair<InstructionCost, MVT> DstLT = getTypeLegalizationCost(Dst);
 
+  // Our actual lowering for the case where a wider legal type is available
+  // uses promotion to the wider type.  This is reflected in the result of
+  // getTypeLegalizationCost, but BasicTTI assumes the widened cases are
+  // scalarized if the legalized Src and Dst are not equal sized.
+  const DataLayout &DL = this->getDataLayout();
+  if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
+                           SrcLT.second.getSizeInBits()) ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
+                           DstLT.second.getSizeInBits()))
+    return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+
   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   assert(ISD && "Invalid opcode");
 
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast.ll b/llvm/test/Analysis/CostModel/RISCV/cast.ll
index 669e7028ff54d..68e633d6f0550 100644
--- a/llvm/test/Analysis/CostModel/RISCV/cast.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/cast.ll
@@ -4317,24 +4317,24 @@ define void @uitofp() {
 
 define void @oddvec_sizes() {
 ; CHECK-LABEL: 'oddvec_sizes'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 30 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %10 = bitcast <3 x i32> undef to <3 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %11 = bitcast <7 x i32> undef to <7 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %12 = bitcast <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %19 = fptosi <3 x float> undef to <3 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %20 = fptosi <7 x float> undef to <7 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %21 = fptosi <15 x float> undef to <15 x i32>

llvmbot · 2024-07-29T17:46:11Z

@llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Changes

For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied to casts instead.

Full diff: https://github.com/llvm/llvm-project/pull/101047.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+17-6)
(modified) llvm/test/Analysis/CostModel/RISCV/cast.ll (+15-15)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index a61a9f10be86c..b91388aa213d8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1030,17 +1030,28 @@ InstructionCost RISCVTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
   if (!IsVectorType)
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
-  bool IsTypeLegal = isTypeLegal(Src) && isTypeLegal(Dst) &&
-                     (Src->getScalarSizeInBits() <= ST->getELen()) &&
-                     (Dst->getScalarSizeInBits() <= ST->getELen());
-
-  // FIXME: Need to compute legalizing cost for illegal types.
-  if (!IsTypeLegal)
+  // FIXME: Need to compute legalizing cost for illegal types.  The current
+  // code handles only legal types and those which can be trivially
+  // promoted to legal.
+  if (Src->getScalarSizeInBits() > ST->getELen() ||
+      Dst->getScalarSizeInBits() > ST->getELen())
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
   std::pair<InstructionCost, MVT> SrcLT = getTypeLegalizationCost(Src);
   std::pair<InstructionCost, MVT> DstLT = getTypeLegalizationCost(Dst);
 
+  // Our actual lowering for the case where a wider legal type is available
+  // uses promotion to the wider type.  This is reflected in the result of
+  // getTypeLegalizationCost, but BasicTTI assumes the widened cases are
+  // scalarized if the legalized Src and Dst are not equal sized.
+  const DataLayout &DL = this->getDataLayout();
+  if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
+                           SrcLT.second.getSizeInBits()) ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
+                           DstLT.second.getSizeInBits()))
+    return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+
   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   assert(ISD && "Invalid opcode");
 
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast.ll b/llvm/test/Analysis/CostModel/RISCV/cast.ll
index 669e7028ff54d..68e633d6f0550 100644
--- a/llvm/test/Analysis/CostModel/RISCV/cast.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/cast.ll
@@ -4317,24 +4317,24 @@ define void @uitofp() {
 
 define void @oddvec_sizes() {
 ; CHECK-LABEL: 'oddvec_sizes'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 30 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %10 = bitcast <3 x i32> undef to <3 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %11 = bitcast <7 x i32> undef to <7 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %12 = bitcast <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %19 = fptosi <3 x float> undef to <3 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %20 = fptosi <7 x float> undef to <7 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %21 = fptosi <15 x float> undef to <15 x i32>

preames · 2024-07-29T17:47:35Z

llvm/test/Analysis/CostModel/RISCV/cast.ll

+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>


This exposes a bug in the base costing for truncs - we're using the destination LMUL where we should be using the source LMUL (since it's wider). Will fix that separately. (Edit, see #101051)

For the fp<->int cases a few lines below, that's the known lack of LMUL in the cost model. See #87506

arcbbb · 2024-07-30T02:16:43Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+  if (Src->getScalarSizeInBits() > ST->getELen() ||
+      Dst->getScalarSizeInBits() > ST->getELen())


need ST->hasVInstructions() before calling ST->getELen

Done, thanks.

arcbbb

LGTM. Thanks!

arcbbb · 2024-08-07T02:01:49Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+  const DataLayout &DL = this->getDataLayout();
+  if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
+                           SrcLT.second.getSizeInBits()) ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
+                           DstLT.second.getSizeInBits()))
+    return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+


Before the patch, the split cost is done by BasicTTIImplBase::getCastInstrCost.
Could we add a check SrcLT.first > 1 || DstLT.first > 1 here to keep the split cost since isTypeLegal() is removed?

The size checks should catch this case. The only way it wouldn't would be if we split to larger types; that would be quite weird.

preames requested review from arcbbb, topperc and luke957 July 29, 2024 17:45

llvmbot added backend:RISC-V llvm:analysis labels Jul 29, 2024

preames commented Jul 29, 2024

View reviewed changes

arcbbb reviewed Jul 30, 2024

View reviewed changes

Address review comment

facafef

arcbbb approved these changes Jul 31, 2024

View reviewed changes

arcbbb mentioned this pull request Aug 6, 2024

[RISCV][TTI] Scale the cost of FP-Int conversion with LMUL #87506

Merged

arcbbb reviewed Aug 7, 2024

View reviewed changes

preames merged commit ac6e1fd into llvm:main Aug 13, 2024
8 checks passed

preames deleted the pr-riscv-tti-cast-promote-cost branch August 13, 2024 21:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RISCV][TTI] Cost non-power-of-two size changing casts #101047

[RISCV][TTI] Cost non-power-of-two size changing casts #101047

preames commented Jul 29, 2024

llvmbot commented Jul 29, 2024

llvmbot commented Jul 29, 2024

preames Jul 29, 2024 •

edited

Loading

arcbbb Jul 30, 2024

preames Jul 30, 2024

arcbbb left a comment

arcbbb Aug 7, 2024

preames Aug 13, 2024

		if (Src->getScalarSizeInBits() > ST->getELen() \|\|
		Dst->getScalarSizeInBits() > ST->getELen())

[RISCV][TTI] Cost non-power-of-two size changing casts #101047

[RISCV][TTI] Cost non-power-of-two size changing casts #101047

Conversation

preames commented Jul 29, 2024

llvmbot commented Jul 29, 2024

llvmbot commented Jul 29, 2024

preames Jul 29, 2024 • edited Loading

Choose a reason for hiding this comment

arcbbb Jul 30, 2024

Choose a reason for hiding this comment

preames Jul 30, 2024

Choose a reason for hiding this comment

arcbbb left a comment

Choose a reason for hiding this comment

arcbbb Aug 7, 2024

Choose a reason for hiding this comment

preames Aug 13, 2024

Choose a reason for hiding this comment

preames Jul 29, 2024 •

edited

Loading