-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV][TTI] Cost non-power-of-two size changing casts #101047
Conversation
For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type. Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types. This is analogous to d3fd28a, but with the same reasoning applied to casts instead.
@llvm/pr-subscribers-backend-risc-v Author: Philip Reames (preames) ChangesFor a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type. Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types. This is analogous to d3fd28a, but with the same reasoning applied to casts instead. Full diff: https://github.com/llvm/llvm-project/pull/101047.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index a61a9f10be86c..b91388aa213d8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1030,17 +1030,28 @@ InstructionCost RISCVTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
if (!IsVectorType)
return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
- bool IsTypeLegal = isTypeLegal(Src) && isTypeLegal(Dst) &&
- (Src->getScalarSizeInBits() <= ST->getELen()) &&
- (Dst->getScalarSizeInBits() <= ST->getELen());
-
- // FIXME: Need to compute legalizing cost for illegal types.
- if (!IsTypeLegal)
+ // FIXME: Need to compute legalizing cost for illegal types. The current
+ // code handles only legal types and those which can be trivially
+ // promoted to legal.
+ if (Src->getScalarSizeInBits() > ST->getELen() ||
+ Dst->getScalarSizeInBits() > ST->getELen())
return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
std::pair<InstructionCost, MVT> SrcLT = getTypeLegalizationCost(Src);
std::pair<InstructionCost, MVT> DstLT = getTypeLegalizationCost(Dst);
+ // Our actual lowering for the case where a wider legal type is available
+ // uses promotion to the wider type. This is reflected in the result of
+ // getTypeLegalizationCost, but BasicTTI assumes the widened cases are
+ // scalarized if the legalized Src and Dst are not equal sized.
+ const DataLayout &DL = this->getDataLayout();
+ if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
+ !TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
+ SrcLT.second.getSizeInBits()) ||
+ !TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
+ DstLT.second.getSizeInBits()))
+ return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+
int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast.ll b/llvm/test/Analysis/CostModel/RISCV/cast.ll
index 669e7028ff54d..68e633d6f0550 100644
--- a/llvm/test/Analysis/CostModel/RISCV/cast.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/cast.ll
@@ -4317,24 +4317,24 @@ define void @uitofp() {
define void @oddvec_sizes() {
; CHECK-LABEL: 'oddvec_sizes'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %10 = bitcast <3 x i32> undef to <3 x float>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %11 = bitcast <7 x i32> undef to <7 x float>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %12 = bitcast <15 x i32> undef to <15 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %19 = fptosi <3 x float> undef to <3 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %20 = fptosi <7 x float> undef to <7 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %21 = fptosi <15 x float> undef to <15 x i32>
|
@llvm/pr-subscribers-llvm-analysis Author: Philip Reames (preames) ChangesFor a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type. Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types. This is analogous to d3fd28a, but with the same reasoning applied to casts instead. Full diff: https://github.com/llvm/llvm-project/pull/101047.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index a61a9f10be86c..b91388aa213d8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1030,17 +1030,28 @@ InstructionCost RISCVTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
if (!IsVectorType)
return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
- bool IsTypeLegal = isTypeLegal(Src) && isTypeLegal(Dst) &&
- (Src->getScalarSizeInBits() <= ST->getELen()) &&
- (Dst->getScalarSizeInBits() <= ST->getELen());
-
- // FIXME: Need to compute legalizing cost for illegal types.
- if (!IsTypeLegal)
+ // FIXME: Need to compute legalizing cost for illegal types. The current
+ // code handles only legal types and those which can be trivially
+ // promoted to legal.
+ if (Src->getScalarSizeInBits() > ST->getELen() ||
+ Dst->getScalarSizeInBits() > ST->getELen())
return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
std::pair<InstructionCost, MVT> SrcLT = getTypeLegalizationCost(Src);
std::pair<InstructionCost, MVT> DstLT = getTypeLegalizationCost(Dst);
+ // Our actual lowering for the case where a wider legal type is available
+ // uses promotion to the wider type. This is reflected in the result of
+ // getTypeLegalizationCost, but BasicTTI assumes the widened cases are
+ // scalarized if the legalized Src and Dst are not equal sized.
+ const DataLayout &DL = this->getDataLayout();
+ if (!SrcLT.second.isVector() || !DstLT.second.isVector() ||
+ !TypeSize::isKnownLE(DL.getTypeSizeInBits(Src),
+ SrcLT.second.getSizeInBits()) ||
+ !TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst),
+ DstLT.second.getSizeInBits()))
+ return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+
int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast.ll b/llvm/test/Analysis/CostModel/RISCV/cast.ll
index 669e7028ff54d..68e633d6f0550 100644
--- a/llvm/test/Analysis/CostModel/RISCV/cast.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/cast.ll
@@ -4317,24 +4317,24 @@ define void @uitofp() {
define void @oddvec_sizes() {
; CHECK-LABEL: 'oddvec_sizes'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = sext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = sext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %3 = sext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = zext <3 x i8> undef to <3 x i16>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = zext <7 x i8> undef to <7 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %10 = bitcast <3 x i32> undef to <3 x float>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %11 = bitcast <7 x i32> undef to <7 x float>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %12 = bitcast <15 x i32> undef to <15 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = sitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %14 = sitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %15 = sitofp <15 x i32> undef to <15 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %16 = uitofp <3 x i32> undef to <3 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %17 = uitofp <7 x i32> undef to <7 x float>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %18 = uitofp <15 x i32> undef to <15 x float>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %19 = fptosi <3 x float> undef to <3 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %20 = fptosi <7 x float> undef to <7 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %21 = fptosi <15 x float> undef to <15 x i32>
|
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %6 = zext <15 x i8> undef to <15 x i32> | ||
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = trunc <3 x i32> undef to <3 x i8> | ||
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = trunc <7 x i32> undef to <7 x i8> | ||
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %9 = trunc <15 x i32> undef to <15 x i8> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Src->getScalarSizeInBits() > ST->getELen() || | ||
Dst->getScalarSizeInBits() > ST->getELen()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need ST->hasVInstructions()
before calling ST->getELen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
const DataLayout &DL = this->getDataLayout(); | ||
if (!SrcLT.second.isVector() || !DstLT.second.isVector() || | ||
!TypeSize::isKnownLE(DL.getTypeSizeInBits(Src), | ||
SrcLT.second.getSizeInBits()) || | ||
!TypeSize::isKnownLE(DL.getTypeSizeInBits(Dst), | ||
DstLT.second.getSizeInBits())) | ||
return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the patch, the split cost is done by BasicTTIImplBase::getCastInstrCost.
Could we add a check SrcLT.first > 1 || DstLT.first > 1
here to keep the split cost since isTypeLegal() is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size checks should catch this case. The only way it wouldn't would be if we split to larger types; that would be quite weird.
For a cast with src and destination size being unequal, we were costing the cast as if it were being scalarized, when in fact we can often promote such cases to a wider legal type.
Note that for casts with equal size (i.e. bitcast, some fp<->i, and ptrtoint) the generic logic in BasicTTI already assumed promotion. It just doesn't handle the cast where source and destination are both promoted to non-equal types.
This is analogous to d3fd28a, but with the same reasoning applied to casts instead.