Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Add invalid 1 x vscale costs for reductions and reduction-operations. #102105

Merged
merged 2 commits into from
Aug 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -541,7 +541,15 @@ static InstructionCost getHistogramCost(const IntrinsicCostAttributes &ICA) {
InstructionCost
AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind) {
// The code-generator is currently not able to handle scalable vectors
// of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
// it. This change will be removed when code-generation for these types is
// sufficiently reliable.
auto *RetTy = ICA.getReturnType();
if (auto *VTy = dyn_cast<ScalableVectorType>(RetTy))
if (VTy->getElementCount() == ElementCount::getScalable(1))
return InstructionCost::getInvalid();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a similar check for vector operands (e.g. cntpop would return a scalar value, but might have a scalable vector operand).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was aiming for the things that we can produce a reduction from. I've added a tests for @llvm.experimental.cttz.elts.i64.nxv1i1, but it looks like it already produces an invalid cost before this patch. llvm.ctpop will return a vector and I couldn't see any other intrinsics that produced scalars and took vectors, which were not reductions that should be handled in another function.
We could add the extra check for the operand type too if you think that's worth-while for future-compatibility or there is another intrinsic you were thinking of.

switch (ICA.getID()) {
case Intrinsic::experimental_vector_histogram_add:
if (!ST->hasSVE2())
Expand Down Expand Up @@ -3070,6 +3078,14 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
ArrayRef<const Value *> Args,
const Instruction *CxtI) {

// The code-generator is currently not able to handle scalable vectors
// of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
// it. This change will be removed when code-generation for these types is
// sufficiently reliable.
if (auto *VTy = dyn_cast<ScalableVectorType>(Ty))
if (VTy->getElementCount() == ElementCount::getScalable(1))
return InstructionCost::getInvalid();

// TODO: Handle more cost kinds.
if (CostKind != TTI::TCK_RecipThroughput)
return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
Expand Down Expand Up @@ -3844,6 +3860,14 @@ InstructionCost
AArch64TTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
FastMathFlags FMF,
TTI::TargetCostKind CostKind) {
// The code-generator is currently not able to handle scalable vectors
// of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
// it. This change will be removed when code-generation for these types is
// sufficiently reliable.
if (auto *VTy = dyn_cast<ScalableVectorType>(Ty))
if (VTy->getElementCount() == ElementCount::getScalable(1))
return InstructionCost::getInvalid();

std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);

if (LT.second.getScalarType() == MVT::f16 && !ST->hasFullFP16())
Expand Down Expand Up @@ -3888,6 +3912,14 @@ InstructionCost
AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
std::optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind) {
// The code-generator is currently not able to handle scalable vectors
// of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
// it. This change will be removed when code-generation for these types is
// sufficiently reliable.
if (auto *VTy = dyn_cast<ScalableVectorType>(ValTy))
if (VTy->getElementCount() == ElementCount::getScalable(1))
return InstructionCost::getInvalid();

if (TTI::requiresOrderedReduction(FMF)) {
if (auto *FixedVTy = dyn_cast<FixedVectorType>(ValTy)) {
InstructionCost BaseCost =
Expand Down
4 changes: 4 additions & 0 deletions llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ define void @fadd() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F16 = fadd <vscale x 4 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8F16 = fadd <vscale x 8 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16F16 = fadd <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V1F32 = fadd <vscale x 1 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2F32 = fadd <vscale x 2 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F32 = fadd <vscale x 4 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8F32 = fadd <vscale x 8 x float> undef, undef
Expand All @@ -19,6 +20,7 @@ define void @fadd() {
%V8F16 = fadd <vscale x 8 x half> undef, undef
%V16F16 = fadd <vscale x 16 x half> undef, undef

%V1F32 = fadd <vscale x 1 x float> undef, undef
%V2F32 = fadd <vscale x 2 x float> undef, undef
%V4F32 = fadd <vscale x 4 x float> undef, undef
%V8F32 = fadd <vscale x 8 x float> undef, undef
Expand All @@ -34,6 +36,7 @@ define void @fsub() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F16 = fsub <vscale x 4 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8F16 = fsub <vscale x 8 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16F16 = fsub <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V1F32 = fsub <vscale x 1 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2F32 = fsub <vscale x 2 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F32 = fsub <vscale x 4 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8F32 = fsub <vscale x 8 x float> undef, undef
Expand All @@ -45,6 +48,7 @@ define void @fsub() {
%V8F16 = fsub <vscale x 8 x half> undef, undef
%V16F16 = fsub <vscale x 16 x half> undef, undef

%V1F32 = fsub <vscale x 1 x float> undef, undef
%V2F32 = fsub <vscale x 2 x float> undef, undef
%V4F32 = fsub <vscale x 4 x float> undef, undef
%V8F32 = fsub <vscale x 8 x float> undef, undef
Expand Down
2 changes: 2 additions & 0 deletions llvm/test/Analysis/CostModel/AArch64/cttz_elts.ll
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@

define void @foo_no_vscale_range() {
; CHECK-LABEL: 'foo_no_vscale_range'
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %res.i64.nxv1i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv1i1(<vscale x 1 x i1> undef, i1 true)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res.i64.nxv2i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv2i1(<vscale x 2 x i1> undef, i1 true)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res.i64.nxv4i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv4i1(<vscale x 4 x i1> undef, i1 true)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res.i64.nxv8i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv8i1(<vscale x 8 x i1> undef, i1 true)
Expand Down Expand Up @@ -45,6 +46,7 @@ define void @foo_no_vscale_range() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %res.i32.v32i1.nzip = call i32 @llvm.experimental.cttz.elts.i32.v32i1(<32 x i1> undef, i1 false)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%res.i64.nxv1i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv1i1(<vscale x 1 x i1> undef, i1 true)
%res.i64.nxv2i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv2i1(<vscale x 2 x i1> undef, i1 true)
%res.i64.nxv4i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv4i1(<vscale x 4 x i1> undef, i1 true)
%res.i64.nxv8i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv8i1(<vscale x 8 x i1> undef, i1 true)
Expand Down
21 changes: 21 additions & 0 deletions llvm/test/Analysis/CostModel/AArch64/sve-arith.ll
Original file line number Diff line number Diff line change
Expand Up @@ -43,13 +43,34 @@ define void @scalable_mul() #0 {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mul_nxv8i16 = mul <vscale x 8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mul_nxv4i32 = mul <vscale x 4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mul_nxv2i64 = mul <vscale x 2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %mul_nxv1i64 = mul <vscale x 1 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
entry:
%mul_nxv16i8 = mul <vscale x 16 x i8> undef, undef
%mul_nxv8i16 = mul <vscale x 8 x i16> undef, undef
%mul_nxv4i32 = mul <vscale x 4 x i32> undef, undef
%mul_nxv2i64 = mul <vscale x 2 x i64> undef, undef
%mul_nxv1i64 = mul <vscale x 1 x i64> undef, undef

ret void
}

define void @scalable_add() #0 {
; CHECK-LABEL: 'scalable_add'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv16i8 = add <vscale x 16 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv8i16 = add <vscale x 8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv4i32 = add <vscale x 4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv2i64 = add <vscale x 2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %add_nxv1i64 = add <vscale x 1 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
entry:
%add_nxv16i8 = add <vscale x 16 x i8> undef, undef
%add_nxv8i16 = add <vscale x 8 x i16> undef, undef
%add_nxv4i32 = add <vscale x 4 x i32> undef, undef
%add_nxv2i64 = add <vscale x 2 x i64> undef, undef
%add_nxv1i64 = add <vscale x 1 x i64> undef, undef

ret void
}
Expand Down
Loading
Loading