Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV][TTI] Cost non-power-of-two size changing casts #101047

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

preames
Copy link
Collaborator

@preames preames commented Jul 29, 2024

For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied to casts instead.

For a cast with src and destination size being unequal, we were
costing the cast as if it were being scalarized, when in fact we
can often promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i,
and ptrtoint) the generic logic in BasicTTI already assumed
promotion.  It just doesn't handle the cast where source and
destination are both promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied
to casts instead.
@llvmbot
Copy link
Member

llvmbot commented Jul 29, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied to casts instead.


Full diff: https://github.com/llvm/llvm-project/pull/101047.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+17-6)
  • (modified) llvm/test/Analysis/CostModel/RISCV/cast.ll (+15-15)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index a61a9f10be86c..b91388aa213d8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1030,17 +1030,28 @@ InstructionCost RISCVTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
   if (!IsVectorType)
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
-  bool IsTypeLegal = isTypeLegal(Src) && isTypeLegal(Dst) &&
-                     (Src->getScalarSizeInBits() <= ST->getELen()) &&
-                     (Dst->getScalarSizeInBits() <= ST->getELen());
-
-  // FIXME: Need to compute legalizing cost for illegal types.
-  if (!IsTypeLegal)
+  // FIXME: Need to compute legalizing cost for illegal types.  The current
+  // code handles only legal types and those which can be trivially
+  // promoted to legal.
+  if (Src->getScalarSizeInBits() > ST->getELen() ||
+      Dst->getScalarSizeInBits() > ST->getELen())
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
   std::pair<InstructionCost, MVT> SrcLT = getTypeLegalizationCost(Src);
   std::pair<InstructionCost, MVT> DstLT = getTypeLegalizationCost(Dst);
 
+  // Our actual lowering for the case where a wider legal type is available
+  // uses promotion to the wider type.  This is reflected in the result of
+  // getTypeLegalizationCost, but BasicTTI assumes the widened cases are
+  // scalarized if the legalized Src and Dst are not equal sized.
+  const DataLayout &DL = this->getDataLayout();
+  if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
+                           SrcLT.second.getSizeInBits()) ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
+                           DstLT.second.getSizeInBits()))
+    return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+
   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   assert(ISD && "Invalid opcode");
 
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast.ll b/llvm/test/Analysis/CostModel/RISCV/cast.ll
index 669e7028ff54d..68e633d6f0550 100644
--- a/llvm/test/Analysis/CostModel/RISCV/cast.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/cast.ll
@@ -4317,24 +4317,24 @@ define void @uitofp() {
 
 define void @oddvec_sizes() {
 ; CHECK-LABEL: 'oddvec_sizes'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 30 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %10 = bitcast <3 x i32> undef to <3 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %11 = bitcast <7 x i32> undef to <7 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %12 = bitcast <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %19 = fptosi <3 x float> undef to <3 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %20 = fptosi <7 x float> undef to <7 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %21 = fptosi <15 x float> undef to <15 x i32>

@llvmbot
Copy link
Member

llvmbot commented Jul 29, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Changes

For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type.

Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types.

This is analogous to d3fd28a, but with the same reasoning applied to casts instead.


Full diff: https://github.com/llvm/llvm-project/pull/101047.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+17-6)
  • (modified) llvm/test/Analysis/CostModel/RISCV/cast.ll (+15-15)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index a61a9f10be86c..b91388aa213d8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1030,17 +1030,28 @@ InstructionCost RISCVTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
   if (!IsVectorType)
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
-  bool IsTypeLegal = isTypeLegal(Src) && isTypeLegal(Dst) &&
-                     (Src->getScalarSizeInBits() <= ST->getELen()) &&
-                     (Dst->getScalarSizeInBits() <= ST->getELen());
-
-  // FIXME: Need to compute legalizing cost for illegal types.
-  if (!IsTypeLegal)
+  // FIXME: Need to compute legalizing cost for illegal types.  The current
+  // code handles only legal types and those which can be trivially
+  // promoted to legal.
+  if (Src->getScalarSizeInBits() > ST->getELen() ||
+      Dst->getScalarSizeInBits() > ST->getELen())
     return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 
   std::pair<InstructionCost, MVT> SrcLT = getTypeLegalizationCost(Src);
   std::pair<InstructionCost, MVT> DstLT = getTypeLegalizationCost(Dst);
 
+  // Our actual lowering for the case where a wider legal type is available
+  // uses promotion to the wider type.  This is reflected in the result of
+  // getTypeLegalizationCost, but BasicTTI assumes the widened cases are
+  // scalarized if the legalized Src and Dst are not equal sized.
+  const DataLayout &DL = this->getDataLayout();
+  if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
+                           SrcLT.second.getSizeInBits()) ||
+      !TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
+                           DstLT.second.getSizeInBits()))
+    return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+
   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   assert(ISD && "Invalid opcode");
 
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast.ll b/llvm/test/Analysis/CostModel/RISCV/cast.ll
index 669e7028ff54d..68e633d6f0550 100644
--- a/llvm/test/Analysis/CostModel/RISCV/cast.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/cast.ll
@@ -4317,24 +4317,24 @@ define void @uitofp() {
 
 define void @oddvec_sizes() {
 ; CHECK-LABEL: 'oddvec_sizes'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 30 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %10 = bitcast <3 x i32> undef to <3 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %11 = bitcast <7 x i32> undef to <7 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %12 = bitcast <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 21 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 45 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %19 = fptosi <3 x float> undef to <3 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %20 = fptosi <7 x float> undef to <7 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %21 = fptosi <15 x float> undef to <15 x i32>

; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
Copy link
Collaborator Author

@preames preames Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exposes a bug in the base costing for truncs - we're using the destination LMUL where we should be using the source LMUL (since it's wider). Will fix that separately. (Edit, see #101051)

For the fp<->int cases a few lines below, that's the known lack of LMUL in the cost model. See #87506

Comment on lines 1036 to 1037
if (Src->getScalarSizeInBits() > ST->getELen() ||
Dst->getScalarSizeInBits() > ST->getELen())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need ST->hasVInstructions() before calling ST->getELen

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

Copy link
Contributor

@arcbbb arcbbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Comment on lines +1047 to +1054
const DataLayout &DL = this->getDataLayout();
if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
!TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
SrcLT.second.getSizeInBits()) ||
!TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
DstLT.second.getSizeInBits()))
return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the patch, the split cost is done by BasicTTIImplBase::getCastInstrCost.
Could we add a check SrcLT.first > 1 || DstLT.first > 1 here to keep the split cost since isTypeLegal() is removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size checks should catch this case. The only way it wouldn't would be if we split to larger types; that would be quite weird.

@preames preames merged commit ac6e1fd into llvm:main Aug 13, 2024
8 checks passed
@preames preames deleted the pr-riscv-tti-cast-promote-cost branch August 13, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants