-
Notifications
You must be signed in to change notification settings - Fork 12.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952)
This commit adds the -lower-buffer-fat-pointers pass, which is applicable to all AMDGCN compilations. The purpose of this pass is to remove the type `ptr addrspace(7)` from incoming IR. This must be done at the LLVM IR level because `ptr addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled by SelectionDAG. The detailed operation of the pass is described in comments, but, in summary, the removal proceeds by: 1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of i160 (including vectors and aggregates). This is needed because the in-register representation of these pointers will stop matching their in-memory representation in step 2, and so ptrtoint/inttoptr operations are used to preserve the expected memory layout 2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two parts of a buffer fat pointer (the 128-bit address space 8 resource and the 32-bit address space 6 offset) visible in the IR. This also impacts the argument and return types of functions. 3. *Splitting* the resource and offset parts. All instructions that produce or consume buffer fat pointers (like GEP or load) are rewritten to produce or consume the resource and offset parts separately. For example, GEP updates the offset part of the result and a load uses the resource and offset parts to populate the relevant llvm.amdgcn.raw.ptr.buffer.load intrinsic call. At the end of this process, the original mutated instructions are replaced by their new split counterparts, ensuring no invalidly-typed IR escapes this pass. (For operations like call, where the struct form is needed, insertelement operations are inserted). Compared to LGC's PatchBufferOp ( https://github.com/GPUOpen-Drivers/llpc/blob/32cda89776980202597d5bf4ed4447a1bae64047/lgc/patch/PatchBufferOp.cpp ): this pass - Also handles vectors of ptr addrspace(7)s - Also handles function boundaries - Includes the same uniform buffer optimization for loops and conditionals - Does *not* handle memcpy() and friends (this is future work) - Does *not* break up large loads and stores into smaller parts. This should be handled by extending the legalization of *.buffer.{load,store} to handle larger types by producing multiple instructions (the same way ordinary LOAD and STORE are legalized). That work is planned for a followup commit. - Does *not* have special logic for handling divergent buffer descriptors. The logic in LGC is, as far as I can tell, incorrect in general, and, per discussions with @nhaehnle, isn't widely used. Therefore, divergent descriptors are handled with waterfall loops later in legalization. As a final matter, this commit updates atomic expansion to treat buffer operations analogously to global ones. (One question for reviewers: is the new pass is the right place? Should it be later in the pipeline?) Differential Revision: https://reviews.llvm.org/D158463
- Loading branch information
Showing
16 changed files
with
3,898 additions
and
14 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2,012 changes: 2,012 additions & 0 deletions
2,012
llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
68 changes: 63 additions & 5 deletions
68
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces-vectors.ll
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,73 @@ | ||
; RUN: not --crash llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - -stop-after=irtranslator < %s | ||
; REQUIRES: asserts | ||
|
||
; Confirm that no one's gotten vectors of addrspace(7) pointers to go through the | ||
; IR translater incidentally. | ||
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2 | ||
; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - -stop-after=irtranslator < %s | FileCheck %s | ||
|
||
define <2 x ptr addrspace(7)> @no_auto_constfold_gep_vector() { | ||
; CHECK-LABEL: name: no_auto_constfold_gep_vector | ||
; CHECK: bb.1 (%ir-block.0): | ||
; CHECK-NEXT: [[C:%[0-9]+]]:_(p8) = G_CONSTANT i128 0 | ||
; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x p8>) = G_BUILD_VECTOR [[C]](p8), [[C]](p8) | ||
; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 123 | ||
; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[C1]](s32), [[C1]](s32) | ||
; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32), [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[BUILD_VECTOR]](<2 x p8>) | ||
; CHECK-NEXT: [[UV8:%[0-9]+]]:_(s32), [[UV9:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[BUILD_VECTOR1]](<2 x s32>) | ||
; CHECK-NEXT: $vgpr0 = COPY [[UV]](s32) | ||
; CHECK-NEXT: $vgpr1 = COPY [[UV1]](s32) | ||
; CHECK-NEXT: $vgpr2 = COPY [[UV2]](s32) | ||
; CHECK-NEXT: $vgpr3 = COPY [[UV3]](s32) | ||
; CHECK-NEXT: $vgpr4 = COPY [[UV4]](s32) | ||
; CHECK-NEXT: $vgpr5 = COPY [[UV5]](s32) | ||
; CHECK-NEXT: $vgpr6 = COPY [[UV6]](s32) | ||
; CHECK-NEXT: $vgpr7 = COPY [[UV7]](s32) | ||
; CHECK-NEXT: $vgpr8 = COPY [[UV8]](s32) | ||
; CHECK-NEXT: $vgpr9 = COPY [[UV9]](s32) | ||
; CHECK-NEXT: SI_RETURN implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3, implicit $vgpr4, implicit $vgpr5, implicit $vgpr6, implicit $vgpr7, implicit $vgpr8, implicit $vgpr9 | ||
%gep = getelementptr i8, <2 x ptr addrspace(7)> zeroinitializer, <2 x i32> <i32 123, i32 123> | ||
ret <2 x ptr addrspace(7)> %gep | ||
} | ||
|
||
define <2 x ptr addrspace(7)> @gep_vector_splat(<2 x ptr addrspace(7)> %ptrs, i64 %idx) { | ||
; CHECK-LABEL: name: gep_vector_splat | ||
; CHECK: bb.1 (%ir-block.0): | ||
; CHECK-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 | ||
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 | ||
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2 | ||
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3 | ||
; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4 | ||
; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr5 | ||
; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr6 | ||
; CHECK-NEXT: [[COPY7:%[0-9]+]]:_(s32) = COPY $vgpr7 | ||
; CHECK-NEXT: [[MV:%[0-9]+]]:_(p8) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32) | ||
; CHECK-NEXT: [[MV1:%[0-9]+]]:_(p8) = G_MERGE_VALUES [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32) | ||
; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x p8>) = G_BUILD_VECTOR [[MV]](p8), [[MV1]](p8) | ||
; CHECK-NEXT: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr8 | ||
; CHECK-NEXT: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr9 | ||
; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32) | ||
; CHECK-NEXT: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr10 | ||
; CHECK-NEXT: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr11 | ||
; CHECK-NEXT: [[MV2:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY10]](s32), [[COPY11]](s32) | ||
; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<2 x s64>) = G_IMPLICIT_DEF | ||
; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0 | ||
; CHECK-NEXT: [[DEF1:%[0-9]+]]:_(<2 x p8>) = G_IMPLICIT_DEF | ||
; CHECK-NEXT: [[DEF2:%[0-9]+]]:_(<2 x s32>) = G_IMPLICIT_DEF | ||
; CHECK-NEXT: [[IVEC:%[0-9]+]]:_(<2 x s64>) = G_INSERT_VECTOR_ELT [[DEF]], [[MV2]](s64), [[C]](s64) | ||
; CHECK-NEXT: [[SHUF:%[0-9]+]]:_(<2 x s64>) = G_SHUFFLE_VECTOR [[IVEC]](<2 x s64>), [[DEF]], shufflemask(0, 0) | ||
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<2 x s32>) = G_TRUNC [[SHUF]](<2 x s64>) | ||
; CHECK-NEXT: [[ADD:%[0-9]+]]:_(<2 x s32>) = G_ADD [[BUILD_VECTOR1]], [[TRUNC]] | ||
; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32), [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[BUILD_VECTOR]](<2 x p8>) | ||
; CHECK-NEXT: [[UV8:%[0-9]+]]:_(s32), [[UV9:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[ADD]](<2 x s32>) | ||
; CHECK-NEXT: $vgpr0 = COPY [[UV]](s32) | ||
; CHECK-NEXT: $vgpr1 = COPY [[UV1]](s32) | ||
; CHECK-NEXT: $vgpr2 = COPY [[UV2]](s32) | ||
; CHECK-NEXT: $vgpr3 = COPY [[UV3]](s32) | ||
; CHECK-NEXT: $vgpr4 = COPY [[UV4]](s32) | ||
; CHECK-NEXT: $vgpr5 = COPY [[UV5]](s32) | ||
; CHECK-NEXT: $vgpr6 = COPY [[UV6]](s32) | ||
; CHECK-NEXT: $vgpr7 = COPY [[UV7]](s32) | ||
; CHECK-NEXT: $vgpr8 = COPY [[UV8]](s32) | ||
; CHECK-NEXT: $vgpr9 = COPY [[UV9]](s32) | ||
; CHECK-NEXT: SI_RETURN implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3, implicit $vgpr4, implicit $vgpr5, implicit $vgpr6, implicit $vgpr7, implicit $vgpr8, implicit $vgpr9 | ||
%gep = getelementptr i8, <2 x ptr addrspace(7)> %ptrs, i64 %idx | ||
ret <2 x ptr addrspace(7)> %gep | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.