[AMDGPU][PromoteAlloca] Support memsets to ptr allocas #80678
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Fixes #80366

Full diff: https://github.com/llvm/llvm-project/pull/80678.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 5e73411cae9b70..c1b244f50d93f8 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -521,10 +521,18 @@ static Value *promoteAllocaUserToVector(
// For memset, we don't need to know the previous value because we
// currently only allow memsets that cover the whole alloca.
Value *Elt = MSI->getOperand(1);
- if (DL.getTypeStoreSize(VecEltTy) > 1) {
- Value *EltBytes =
- Builder.CreateVectorSplat(DL.getTypeStoreSize(VecEltTy), Elt);
- Elt = Builder.CreateBitCast(EltBytes, VecEltTy);
+ const unsigned BytesPerElt = DL.getTypeStoreSize(VecEltTy);
+ if (BytesPerElt > 1) {
+ Value *EltBytes = Builder.CreateVectorSplat(BytesPerElt, Elt);
+
+ // If the element type of the vector is a pointer, we need to first cast
+ // to an integer, then use a PtrCast.
+ if (VecEltTy->isPointerTy()) {
+ Type *PtrInt = Builder.getIntNTy(BytesPerElt * 8);
+ Elt = Builder.CreateBitCast(EltBytes, PtrInt);
+ Elt = Builder.CreateIntToPtr(Elt, VecEltTy);
+ } else
+ Elt = Builder.CreateBitCast(EltBytes, VecEltTy);
}
return Builder.CreateVectorSplat(VectorTy->getElementCount(), Elt);
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
index 15af1f17e230ec..829e7a1b84e90c 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
@@ -84,4 +84,16 @@ entry:
ret void
}
+define amdgpu_kernel void @memset_ptr_alloca(ptr %out) {
+; CHECK-LABEL: @memset_ptr_alloca(
+; CHECK-NEXT: store i64 0, ptr [[OUT:%.*]], align 8
+; CHECK-NEXT: ret void
+;
+ %alloca = alloca [6 x ptr], align 16, addrspace(5)
+ call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false)
+ %load = load i64, ptr addrspace(5) %alloca
+ store i64 %load, ptr %out
+ ret void
+}
+
declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg)
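To illustrate what the patched code computes, here is a small Python model (not LLVM code; the helper name is made up) of splatting the memset byte across one element's bytes and reinterpreting them as a single integer, which `inttoptr` then turns into the pointer stored in each lane:

```python
def splat_memset_byte(byte: int, elt_size: int) -> int:
    """Model of CreateVectorSplat over elt_size bytes followed by a
    bitcast to an (elt_size * 8)-bit integer, little-endian."""
    return int.from_bytes(bytes([byte]) * elt_size, "little")

# memset with 0 over an 8-byte ptr lane yields the integer 0,
# which inttoptr converts to a null pointer.
print(splat_memset_byte(0, 8))          # 0
print(hex(splat_memset_byte(0xAB, 4)))  # 0xabababab
```

This matches the test above: a zero memset of the `[6 x ptr]` alloca makes every lane a null pointer, so the `i64` load folds to `store i64 0`.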
> // If the element type of the vector is a pointer, we need to first cast
> // to an integer, then use a PtrCast.
> if (VecEltTy->isPointerTy()) {
Will fail the same way for vector of pointers
What do you mean vector of pointers?
This only looks at the element type of the vector type we're promoting to, which is always a primitive.
I mean [6 x <2 x ptr>]
Added the test as well; that case isn't promoted either.

On a side note, PromoteAlloca still needs some love. I'm wondering if it's worth the effort to support things like nested arrays. I'd assume we'd hit the upper limit very fast with X * Y elements, but maybe not?
It's still not trying to intelligently choose which allocas are the most profitable to promote
@mariusz-sikora-at-amd I'm not sure, no strong opinion. I was thinking of doing it by flattening arrays (e.g. [2 x [3 x float]] becomes [6 x float]). I think the tricky part is resolving the GEPs correctly; it might be a bigger refactoring than it looks at first glance.
One alternative may be to have some kind of "alloca canonicalization" pass earlier that does the flattening for us to enable PromoteAlloca better.
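The index math such a flattening would need can be sketched as follows (a hypothetical helper, not part of the patch): an access [i][j] into [N x [M x T]] maps to index i * M + j in the flattened [N*M x T], which is the rewrite the GEPs would have to undergo:

```python
def flatten_gep_index(i: int, j: int, inner_len: int) -> int:
    # [2 x [3 x float]] becomes [6 x float]; element [i][j]
    # lands at index i * inner_len + j in the flat array.
    return i * inner_len + j

print(flatten_gep_index(1, 2, 3))  # 5, the last element of [2 x [3 x float]]
```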
@arsenm I haven't lost track of that but I also didn't find the time for it yet :/
Last time I thought about it I thought about changing the pass so it collects allocas, then sorts them by profitability (number of users + whether there are uses in loops), then just greedily promotes them one by one until it runs out of budget. Would that be good?
Yes, that's what I was thinking for a first step
Also, I thought SROA did try to flatten nested arrays already
I was thinking about splitting a two-dimensional array into multiple one-dimensional arrays because:
- (this may be graphics specific) often only a single component / column of the array is used, and by splitting we can remove the unused columns / components.
- some shaders have very low VGPR usage but contain "big" (more than 32 elements) arrays like [10 x [4 x float]]. If we split it into four one-dimensional arrays, then we will be able to put everything into VGPRs.
- the movrel instruction (which I'm aiming to generate) supports up to 32 elements.
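The column split described above can be modeled with a small hypothetical helper: element [i][c] of [N x [4 x float]] moves to index i of the c-th one-dimensional array, so each column can become its own [N x float] alloca:

```python
def split_columns(rows):
    """Turn an N x M nested array into M flat column arrays,
    one per component, each of length N."""
    return [list(col) for col in zip(*rows)]

print(split_columns([[1, 2], [3, 4], [5, 6]]))  # [[1, 3, 5], [2, 4, 6]]
```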
Fixes llvm#80366 (cherry picked from commit 4e958ab)