Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update P7 32-bit partial vector load cost #108261

Merged
merged 5 commits into from
Oct 3, 2024
Merged

Conversation

RolandF77
Copy link
Collaborator

Update cost model to reflect codegen change to use lfiwzx from #104507.

@llvmbot
Copy link
Member

llvmbot commented Sep 11, 2024

@llvm/pr-subscribers-llvm-analysis

Author: None (RolandF77)

Changes

Update cost model to reflect codegen change to use lfiwzx from #104507.


Full diff: https://github.com/llvm/llvm-project/pull/108261.diff

2 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+10-4)
  • (modified) llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll (+5-2)
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index b7bdbeb535d526..df0047022a2c04 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -802,12 +802,18 @@ InstructionCost PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
   // explicitly check this case. There are also corresponding store
   // instructions.
   unsigned MemBytes = Src->getPrimitiveSizeInBits();
-  if (ST->hasVSX() && IsAltivecType &&
-      (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32)))
-    return 1;
+  Align AlignBytes = Alignment ? *Alignment : Align(1);
+  unsigned SrcBytes = LT.second.getStoreSize();
+  if (ST->hasVSX() && IsAltivecType) {
+    if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))
+      return 1;
+    // Use lfiwax/xxspltw
+    if (Opcode == Instruction::Load && MemBytes == 32)
+      if (AlignBytes < SrcBytes || Cost > 2)
+        return 2;
+  }
 
   // Aligned loads and stores are easy.
-  unsigned SrcBytes = LT.second.getStoreSize();
   if (!SrcBytes || !Alignment || *Alignment >= SrcBytes)
     return Cost;
 
diff --git a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
index 54cafa0ae59f39..0e7e89c18c1cba 100644
--- a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
+++ b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
@@ -1,18 +1,21 @@
 ; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s
+; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx | FileCheck --check-prefix=P7 %s
 target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"
 
 define i32 @loads(i32 %arg) {
   ; CHECK: cost of 1 {{.*}} load
+  ; P7: cost of 2 {{.*}} load
   load <4 x i8>, ptr undef, align 1
 
-  ; CHECK: cost of 1 {{.*}} load
+  ; CHECK, P7: cost of 1 {{.*}} load
   load <8 x i8>, ptr undef, align 1
 
   ; CHECK: cost of 1 {{.*}} load
+  ; P7: cost of 2 {{.*}} load
   load <2 x i16>, ptr undef, align 2
 
-  ; CHECK: cost of 1 {{.*}} load
+  ; CHECK, P7: cost of 1 {{.*}} load
   load <4 x i16>, ptr undef, align 2
 
   ret i32 undef

@llvmbot
Copy link
Member

llvmbot commented Sep 11, 2024

@llvm/pr-subscribers-backend-powerpc

Author: None (RolandF77)

Changes

Update cost model to reflect codegen change to use lfiwzx from #104507.


Full diff: https://github.com/llvm/llvm-project/pull/108261.diff

2 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+10-4)
  • (modified) llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll (+5-2)
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index b7bdbeb535d526..df0047022a2c04 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -802,12 +802,18 @@ InstructionCost PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
   // explicitly check this case. There are also corresponding store
   // instructions.
   unsigned MemBytes = Src->getPrimitiveSizeInBits();
-  if (ST->hasVSX() && IsAltivecType &&
-      (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32)))
-    return 1;
+  Align AlignBytes = Alignment ? *Alignment : Align(1);
+  unsigned SrcBytes = LT.second.getStoreSize();
+  if (ST->hasVSX() && IsAltivecType) {
+    if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))
+      return 1;
+    // Use lfiwax/xxspltw
+    if (Opcode == Instruction::Load && MemBytes == 32)
+      if (AlignBytes < SrcBytes || Cost > 2)
+        return 2;
+  }
 
   // Aligned loads and stores are easy.
-  unsigned SrcBytes = LT.second.getStoreSize();
   if (!SrcBytes || !Alignment || *Alignment >= SrcBytes)
     return Cost;
 
diff --git a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
index 54cafa0ae59f39..0e7e89c18c1cba 100644
--- a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
+++ b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
@@ -1,18 +1,21 @@
 ; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s
+; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx | FileCheck --check-prefix=P7 %s
 target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"
 
 define i32 @loads(i32 %arg) {
   ; CHECK: cost of 1 {{.*}} load
+  ; P7: cost of 2 {{.*}} load
   load <4 x i8>, ptr undef, align 1
 
-  ; CHECK: cost of 1 {{.*}} load
+  ; CHECK, P7: cost of 1 {{.*}} load
   load <8 x i8>, ptr undef, align 1
 
   ; CHECK: cost of 1 {{.*}} load
+  ; P7: cost of 2 {{.*}} load
   load <2 x i16>, ptr undef, align 2
 
-  ; CHECK: cost of 1 {{.*}} load
+  ; CHECK, P7: cost of 1 {{.*}} load
   load <4 x i16>, ptr undef, align 2
 
   ret i32 undef

return 1;
// Use lfiwax/xxspltw
if (Opcode == Instruction::Load && MemBytes == 32)
if (AlignBytes < SrcBytes || Cost > 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic if(Cost > 2) return 2; seems not perfect. Can we exclude the case at the place where Cost is set to be bigger than 2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cost check removed. It was left over from an earlier version of the change.

if (ST->hasVSX() && IsAltivecType &&
(MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32)))
return 1;
Align AlignBytes = Alignment ? *Alignment : Align(1);
Copy link
Contributor

@diggerlin diggerlin Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can put the definition of the variable before the
if (Opcode == Instruction::Load && MemBytes == 32 && AlignBytes < SrcBytes) )

target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

define i32 @loads(i32 %arg) {
; CHECK: cost of 1 {{.*}} load
; P7: cost of 2 {{.*}} load
Copy link
Contributor

@diggerlin diggerlin Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change to

opt < %s -passes="print" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s -DCOST=1

opt < %s -passes="print" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx | FileCheck %s -DCOST=2

add change

 ; CHECK: cost of 1 {{.*}} load
  ; P7: cost of 2 {{.*}} load

to
CHECK: cost of [[COST]] {{.*}} load

it is only a suggestion, feel free to keep it if you do not want to modify

return 1;
unsigned SrcBytes = LT.second.getStoreSize();
if (ST->hasVSX() && IsAltivecType) {
if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))
Copy link
Contributor

@diggerlin diggerlin Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this maybe not related to the patch,

I think the variable MemBytes should be MemBits , otherwise it maybe cause mis_understand (it looks like compare with 64bytes and 32bytes)

if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))

I think we can modify the variable name in the patch by the way?


// Use lfiwax/xxspltw
Align AlignBytes = Alignment ? *Alignment : Align(1);
if (Opcode == Instruction::Load && MemBytes == 32 && AlignBytes < SrcBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious that why need the AlignBytes < SrcBytes here ?

Copy link
Collaborator Author

@RolandF77 RolandF77 Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a partial vector (< 128 bits) is being loaded with a full vector aligned address (>= 128 bits), the load will be done as a full vector load since we know from alignment that it is safe. Therefore the cost of a partial vector load does not apply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for explaining.

Copy link
Contributor

@diggerlin diggerlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have further comment on it. thanks for addressing the comments

Copy link
Contributor

@amy-kwan amy-kwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall LGTM. I have one question I was wondering about.

Is it more accurate to update the title so that it doesn't mention v4i8 specifically? For example, doesn't v2i16 also count?

@RolandF77 RolandF77 changed the title update P7 v4i8 load cost update P7 32-bit partial vector load cost Oct 2, 2024
@RolandF77 RolandF77 merged commit 06c8210 into llvm:main Oct 3, 2024
8 checks passed
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
Update cost model to reflect codegen change to use lfiwzx 
for 32-bit partial vector loads on pwr7 with
llvm#104507.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants