[AMDGPU] Fix negative immediate offset for unbuffered smem loads #89165
Conversation
For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553.
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu

Author: None (vangthao95)

Changes: For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of #79553.

Patch is 33.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89165.diff

11 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index bba7682cd7a0d1..bf65244255f341 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -1984,8 +1984,10 @@ bool AMDGPUDAGToDAGISel::SelectScratchSVAddr(SDNode *N, SDValue Addr,
// not null) offset. If Imm32Only is true, match only 32-bit immediate
// offsets available on CI.
bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,
- SDValue *SOffset, SDValue *Offset,
- bool Imm32Only, bool IsBuffer) const {
+ SDValue *SBase, SDValue *SOffset,
+ SDValue *Offset, bool Imm32Only,
+ bool IsBuffer,
+ bool HasSOffset) const {
assert((!SOffset || !Offset) &&
"Cannot match both soffset and offset at the same time!");
@@ -2016,7 +2018,14 @@ bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,
AMDGPU::getSMRDEncodedOffset(*Subtarget, ByteOffset, IsBuffer);
if (EncodedOffset && Offset && !Imm32Only) {
*Offset = CurDAG->getTargetConstant(*EncodedOffset, SL, MVT::i32);
- return true;
+ if (EncodedOffset >= 0 || IsBuffer || HasSOffset ||
+ !Subtarget->hasSignedSMRDImmOffset())
+ return true;
+ // For unbuffered smem loads, it is illegal for the Immediate Offset to be
+ // negative if the resulting (Offset + (M0 or SOffset or zero) is negative.
+ // Handle the case where the Immediate Offset is negative and there is no
+ // SOffset.
+ return false;
}
// SGPR and literal offsets are unsigned.
@@ -2072,13 +2081,34 @@ SDValue AMDGPUDAGToDAGISel::Expand32BitAddress(SDValue Addr) const {
// true, match only 32-bit immediate offsets available on CI.
bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,
SDValue *SOffset, SDValue *Offset,
- bool Imm32Only,
- bool IsBuffer) const {
+ bool Imm32Only, bool IsBuffer,
+ bool HasSOffset) const {
if (SOffset && Offset) {
assert(!Imm32Only && !IsBuffer);
SDValue B;
- return SelectSMRDBaseOffset(Addr, B, nullptr, Offset) &&
- SelectSMRDBaseOffset(B, SBase, SOffset, nullptr);
+ if (!SelectSMRDBaseOffset(Addr, B, nullptr, Offset, false, false, true))
+ return false;
+
+ if (!SelectSMRDBaseOffset(B, SBase, SOffset, nullptr, false, false, true))
+ return false;
+
+ if (IsBuffer || Imm32Only || !Subtarget->hasSignedSMRDImmOffset())
+ return true;
+
+ // For unbuffered smem loads, it is illegal for the Immediate Offset to be
+ // negative if the resulting (Offset + (M0 or SOffset or zero) is negative.
+ // Handle the case where the Immediate Offset + SOffset is negative.
+ if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(*Offset)) {
+ int64_t ByteOffset = C->getSExtValue();
+ if (ByteOffset >= 0)
+ return true;
+
+ KnownBits SKnown = CurDAG->computeKnownBits(*SOffset);
+ if (ByteOffset + SKnown.getMinValue().getSExtValue() < 0)
+ return false;
+ }
+
+ return true;
}
// A 32-bit (address + offset) should not cause unsigned 32-bit integer
@@ -2097,11 +2127,14 @@ bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,
}
if (!N0 || !N1)
return false;
- if (SelectSMRDOffset(N1, SOffset, Offset, Imm32Only, IsBuffer)) {
+
+ if (SelectSMRDOffset(N1, &N0, SOffset, Offset, Imm32Only, IsBuffer,
+ HasSOffset)) {
SBase = N0;
return true;
}
- if (SelectSMRDOffset(N0, SOffset, Offset, Imm32Only, IsBuffer)) {
+ if (SelectSMRDOffset(N0, &N1, SOffset, Offset, Imm32Only, IsBuffer,
+ HasSOffset)) {
SBase = N1;
return true;
}
@@ -2149,14 +2182,14 @@ bool AMDGPUDAGToDAGISel::SelectSMRDSgprImm(SDValue Addr, SDValue &SBase,
}
bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue N, SDValue &Offset) const {
- return SelectSMRDOffset(N, /* SOffset */ nullptr, &Offset,
+ return SelectSMRDOffset(N, /*SBase=*/nullptr, /* SOffset */ nullptr, &Offset,
/* Imm32Only */ false, /* IsBuffer */ true);
}
bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm32(SDValue N,
SDValue &Offset) const {
assert(Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS);
- return SelectSMRDOffset(N, /* SOffset */ nullptr, &Offset,
+ return SelectSMRDOffset(N, /*SBase=*/nullptr, /* SOffset */ nullptr, &Offset,
/* Imm32Only */ true, /* IsBuffer */ true);
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
index f987b747c0e21b..b1ad16af3c35a2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
@@ -183,13 +183,15 @@ class AMDGPUDAGToDAGISel : public SelectionDAGISel {
bool SelectScratchSVAddr(SDNode *N, SDValue Addr, SDValue &VAddr,
SDValue &SAddr, SDValue &Offset) const;
- bool SelectSMRDOffset(SDValue ByteOffsetNode, SDValue *SOffset,
- SDValue *Offset, bool Imm32Only = false,
- bool IsBuffer = false) const;
+ bool SelectSMRDOffset(SDValue ByteOffsetNode, SDValue *SBase,
+ SDValue *SOffset, SDValue *Offset,
+ bool Imm32Only = false, bool IsBuffer = false,
+ bool HasSOffset = false) const;
SDValue Expand32BitAddress(SDValue Addr) const;
bool SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase, SDValue *SOffset,
SDValue *Offset, bool Imm32Only = false,
- bool IsBuffer = false) const;
+ bool IsBuffer = false,
+ bool HasSOffset = false) const;
bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue *SOffset,
SDValue *Offset, bool Imm32Only = false) const;
bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
@@ -201,6 +203,8 @@ class AMDGPUDAGToDAGISel : public SelectionDAGISel {
bool SelectSMRDBufferImm32(SDValue N, SDValue &Offset) const;
bool SelectSMRDBufferSgprImm(SDValue N, SDValue &SOffset,
SDValue &Offset) const;
+ bool SelectSMRDPrefetchImm(SDValue Addr, SDValue &SBase,
+ SDValue &Offset) const;
bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;
bool SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &SrcMods,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index e13c13913d4e82..10dda8a9e1eaac 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -4211,6 +4211,17 @@ bool AMDGPUInstructionSelector::selectSmrdOffset(MachineOperand &Root,
Base = GEPI2.SgprParts[0];
*SOffset = OffsetReg;
*Offset = *EncodedImm;
+ if (*Offset >= 0 || !STI.hasSignedSMRDImmOffset())
+ return true;
+
+ // For unbuffered smem loads, it is illegal for the Immediate Offset
+ // to be negative if the resulting (Offset + (M0 or SOffset or zero)
+ // is negative. Handle the case where the Immediate Offset + SOffset
+ // is negative.
+ auto SKnown = KB->getKnownBits(*SOffset);
+ if (*Offset + SKnown.getMinValue().getSExtValue() < 0)
+ return false;
+
return true;
}
}
@@ -4221,7 +4232,13 @@ bool AMDGPUInstructionSelector::selectSmrdOffset(MachineOperand &Root,
if (Offset && GEPI.SgprParts.size() == 1 && EncodedImm) {
Base = GEPI.SgprParts[0];
*Offset = *EncodedImm;
- return true;
+ if (*Offset >= 0 || !STI.hasSignedSMRDImmOffset())
+ return true;
+ // For unbuffered smem loads, it is illegal for the Immediate Offset to be
+ // negative if the resulting (Offset + (M0 or SOffset or zero is negative.
+ // Handle the case where the Immediate Offset is negative and there is no
+ // SOffset.
+ return false;
}
// SGPR offset is unsigned.
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 8a4a46ce50d1d7..25c24c924f0a20 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -1315,6 +1315,10 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
// of sign-extending.
bool hasGetPCZeroExtension() const { return GFX12Insts; }
+ // \returns true if the target supports signed immediate offset for SMRD
+ // instructions.
+ bool hasSignedSMRDImmOffset() const { return getGeneration() >= GFX9; }
+
/// \returns SGPR allocation granularity supported by the subtarget.
unsigned getSGPRAllocGranule() const {
return AMDGPU::IsaInfo::getSGPRAllocGranule(this);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir
index c44477273dad09..504f7697a0fcca 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir
@@ -1234,7 +1234,15 @@ body: |
; GFX10: liveins: $sgpr0_sgpr1
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
- ; GFX10-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]], -1, 0 :: (load (s32), addrspace 4)
+ ; GFX10-NEXT: [[S_MOV_B:%[0-9]+]]:sreg_64 = S_MOV_B64_IMM_PSEUDO -1
+ ; GFX10-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY [[COPY]].sub0
+ ; GFX10-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY [[S_MOV_B]].sub0
+ ; GFX10-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY [[COPY]].sub1
+ ; GFX10-NEXT: [[COPY4:%[0-9]+]]:sreg_32 = COPY [[S_MOV_B]].sub1
+ ; GFX10-NEXT: [[S_ADD_U32_:%[0-9]+]]:sreg_32 = S_ADD_U32 [[COPY1]], [[COPY2]], implicit-def $scc
+ ; GFX10-NEXT: [[S_ADDC_U32_:%[0-9]+]]:sreg_32 = S_ADDC_U32 [[COPY3]], [[COPY4]], implicit-def dead $scc, implicit $scc
+ ; GFX10-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[S_ADD_U32_]], %subreg.sub0, [[S_ADDC_U32_]], %subreg.sub1
+ ; GFX10-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[REG_SEQUENCE]], 0, 0 :: (load (s32), addrspace 4)
; GFX10-NEXT: $sgpr0 = COPY [[S_LOAD_DWORD_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(s64) = G_CONSTANT i64 -1
@@ -1304,7 +1312,15 @@ body: |
; GFX10: liveins: $sgpr0_sgpr1
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
- ; GFX10-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]], -524288, 0 :: (load (s32), addrspace 4)
+ ; GFX10-NEXT: [[S_MOV_B:%[0-9]+]]:sreg_64 = S_MOV_B64_IMM_PSEUDO -524288
+ ; GFX10-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY [[COPY]].sub0
+ ; GFX10-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY [[S_MOV_B]].sub0
+ ; GFX10-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY [[COPY]].sub1
+ ; GFX10-NEXT: [[COPY4:%[0-9]+]]:sreg_32 = COPY [[S_MOV_B]].sub1
+ ; GFX10-NEXT: [[S_ADD_U32_:%[0-9]+]]:sreg_32 = S_ADD_U32 [[COPY1]], [[COPY2]], implicit-def $scc
+ ; GFX10-NEXT: [[S_ADDC_U32_:%[0-9]+]]:sreg_32 = S_ADDC_U32 [[COPY3]], [[COPY4]], implicit-def dead $scc, implicit $scc
+ ; GFX10-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sreg_64_xexec = REG_SEQUENCE [[S_ADD_U32_]], %subreg.sub0, [[S_ADDC_U32_]], %subreg.sub1
+ ; GFX10-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[REG_SEQUENCE]], 0, 0 :: (load (s32), addrspace 4)
; GFX10-NEXT: $sgpr0 = COPY [[S_LOAD_DWORD_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(s64) = G_CONSTANT i64 -524288
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/smrd.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/smrd.ll
index 139f82b3dc9f7a..9ee0acf2aa2db6 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/smrd.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/smrd.ll
@@ -88,11 +88,13 @@ entry:
ret void
}
-; GFX9_10 can use a signed immediate byte offset
+; GFX9+ can use a signed immediate byte offset but not without sgpr[offset]
; GCN-LABEL: {{^}}smrd6:
; SICIVI: s_add_u32 s{{[0-9]}}, s{{[0-9]}}, -4
; SICIVI: s_load_dword s{{[0-9]}}, s[{{[0-9]:[0-9]}}], 0x0
-; GFX9_10: s_load_dword s{{[0-9]}}, s[{{[0-9]:[0-9]}}], -0x4
+; GFX9_10: s_add_u32 s2, s2, -4
+; GFX9_10: s_addc_u32 s3, s3, -1
+; GFX9_10: s_load_dword s{{[0-9]}}, s[{{[0-9]:[0-9]}}], 0x0
define amdgpu_kernel void @smrd6(ptr addrspace(1) %out, ptr addrspace(4) %ptr) #0 {
entry:
%tmp = getelementptr i32, ptr addrspace(4) %ptr, i64 -1
diff --git a/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-smem.ll b/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-smem.ll
index 54dc5b8b9d3dd6..41d2360dd5e1e6 100644
--- a/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-smem.ll
+++ b/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-smem.ll
@@ -297,9 +297,11 @@ define amdgpu_cs void @test_sink_smem_offset_neg400(ptr addrspace(4) inreg %ptr,
; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: .LBB5_1: ; %loop
; GFX9-NEXT: ; =>This Inner Loop Header: Depth=1
-; GFX9-NEXT: s_waitcnt lgkmcnt(0)
-; GFX9-NEXT: s_load_dword s3, s[0:1], -0x190
; GFX9-NEXT: s_add_i32 s2, s2, -1
+; GFX9-NEXT: s_add_u32 s4, s0, 0xfffffe70
+; GFX9-NEXT: s_addc_u32 s5, s1, -1
+; GFX9-NEXT: s_waitcnt lgkmcnt(0)
+; GFX9-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX9-NEXT: s_cmp_lg_u32 s2, 0
; GFX9-NEXT: s_cbranch_scc1 .LBB5_1
; GFX9-NEXT: ; %bb.2: ; %end
@@ -307,10 +309,14 @@ define amdgpu_cs void @test_sink_smem_offset_neg400(ptr addrspace(4) inreg %ptr,
;
; GFX12-LABEL: test_sink_smem_offset_neg400:
; GFX12: ; %bb.0: ; %entry
+; GFX12-NEXT: s_movk_i32 s4, 0xfe70
+; GFX12-NEXT: s_mov_b32 s5, -1
+; GFX12-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT: s_add_nc_u64 s[0:1], s[0:1], s[4:5]
; GFX12-NEXT: .LBB5_1: ; %loop
; GFX12-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: s_load_b32 s3, s[0:1], -0x190
+; GFX12-NEXT: s_load_b32 s3, s[0:1], 0x0
; GFX12-NEXT: s_add_co_i32 s2, s2, -1
; GFX12-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX12-NEXT: s_cmp_lg_u32 s2, 0
diff --git a/llvm/test/CodeGen/AMDGPU/gfx12_scalar_subword_loads.ll b/llvm/test/CodeGen/AMDGPU/gfx12_scalar_subword_loads.ll
index c69207c0472e7c..08da89ec0fb229 100644
--- a/llvm/test/CodeGen/AMDGPU/gfx12_scalar_subword_loads.ll
+++ b/llvm/test/CodeGen/AMDGPU/gfx12_scalar_subword_loads.ll
@@ -19,15 +19,31 @@ define amdgpu_ps void @test_s_load_i8(ptr addrspace(4) inreg %in, ptr addrspace(
}
define amdgpu_ps void @test_s_load_i8_imm(ptr addrspace(4) inreg %in, ptr addrspace(1) %out) {
-; GCN-LABEL: test_s_load_i8_imm:
-; GCN: ; %bb.0:
-; GCN-NEXT: s_load_i8 s0, s[0:1], -0x64
-; GCN-NEXT: s_wait_kmcnt 0x0
-; GCN-NEXT: v_mov_b32_e32 v2, s0
-; GCN-NEXT: global_store_b32 v[0:1], v2, off
-; GCN-NEXT: s_nop 0
-; GCN-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
-; GCN-NEXT: s_endpgm
+; DAG-LABEL: test_s_load_i8_imm:
+; DAG: ; %bb.0:
+; DAG-NEXT: s_movk_i32 s2, 0xff9c
+; DAG-NEXT: s_mov_b32 s3, -1
+; DAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; DAG-NEXT: s_add_nc_u64 s[0:1], s[0:1], s[2:3]
+; DAG-NEXT: s_load_i8 s0, s[0:1], 0x0
+; DAG-NEXT: s_wait_kmcnt 0x0
+; DAG-NEXT: v_mov_b32_e32 v2, s0
+; DAG-NEXT: global_store_b32 v[0:1], v2, off
+; DAG-NEXT: s_nop 0
+; DAG-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; DAG-NEXT: s_endpgm
+;
+; GISEL-LABEL: test_s_load_i8_imm:
+; GISEL: ; %bb.0:
+; GISEL-NEXT: s_add_co_u32 s0, s0, 0xffffff9c
+; GISEL-NEXT: s_add_co_ci_u32 s1, s1, -1
+; GISEL-NEXT: s_load_i8 s0, s[0:1], 0x0
+; GISEL-NEXT: s_wait_kmcnt 0x0
+; GISEL-NEXT: v_mov_b32_e32 v2, s0
+; GISEL-NEXT: global_store_b32 v[0:1], v2, off
+; GISEL-NEXT: s_nop 0
+; GISEL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-NEXT: s_endpgm
%gep = getelementptr i8, ptr addrspace(4) %in, i64 -100
%ld = load i8, ptr addrspace(4) %gep
%sext = sext i8 %ld to i32
@@ -195,15 +211,31 @@ define amdgpu_ps void @test_s_load_i16(ptr addrspace(4) inreg %in, ptr addrspace
}
define amdgpu_ps void @test_s_load_i16_imm(ptr addrspace(4) inreg %in, ptr addrspace(1) %out) {
-; GCN-LABEL: test_s_load_i16_imm:
-; GCN: ; %bb.0:
-; GCN-NEXT: s_load_i16 s0, s[0:1], -0xc8
-; GCN-NEXT: s_wait_kmcnt 0x0
-; GCN-NEXT: v_mov_b32_e32 v2, s0
-; GCN-NEXT: global_store_b32 v[0:1], v2, off
-; GCN-NEXT: s_nop 0
-; GCN-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
-; GCN-NEXT: s_endpgm
+; DAG-LABEL: test_s_load_i16_imm:
+; DAG: ; %bb.0:
+; DAG-NEXT: s_movk_i32 s2, 0xff38
+; DAG-NEXT: s_mov_b32 s3, -1
+; DAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; DAG-NEXT: s_add_nc_u64 s[0:1], s[0:1], s[2:3]
+; DAG-NEXT: s_load_i16 s0, s[0:1], 0x0
+; DAG-NEXT: s_wait_kmcnt 0x0
+; DAG-NEXT: v_mov_b32_e32 v2, s0
+; DAG-NEXT: global_store_b32 v[0:1], v2, off
+; DAG-NEXT: s_nop 0
+; DAG-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; DAG-NEXT: s_endpgm
+;
+; GISEL-LABEL: test_s_load_i16_imm:
+; GISEL: ; %bb.0:
+; GISEL-NEXT: s_add_co_u32 s0, s0, 0xffffff38
+; GISEL-NEXT: s_add_co_ci_u32 s1, s1, -1
+; GISEL-NEXT: s_load_i16 s0, s[0:1], 0x0
+; GISEL-NEXT: s_wait_kmcnt 0x0
+; GISEL-NEXT: v_mov_b32_e32 v2, s0
+; GISEL-NEXT: global_store_b32 v[0:1], v2, off
+; GISEL-NEXT: s_nop 0
+; GISEL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-NEXT: s_endpgm
%gep = getelementptr i16, ptr addrspace(4) %in, i64 -100
%ld = load i16, ptr addrspace(4) %gep
%sext = sext i16 %ld to i32
diff --git a/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll b/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
index d9cbbc11f9a738..2f7e91faa41847 100644
--- a/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
@@ -157,12 +157,25 @@ define amdgpu_ps float @global_load_saddr_i8_offset_neg4096(ptr addrspace(1) inr
; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: ; return to shader part epilog
;
-; GFX12-LABEL: global_load_saddr_i8_offset_neg4096:
-; GFX12: ; %bb.0:
-; GFX12-NEXT: s_load_u8 s0, s[2:3], -0x1000
-; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: v_mov_b32_e32 v0, s0
-; GFX12-NEXT: ; return to shader part epilog
+; GFX12-SDAG-LABEL: global_load_saddr_i8_offset_neg4096:
+; GFX12-SDAG: ; %bb.0:
+; GFX12-SDAG-NEXT: s_movk_i32 s0, 0xf000
+; GFX12-SDAG-NEXT: s_mov_b32 s1, -1
+; GFX12-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-SDAG-NEXT: s_add_nc_u64 s[0:1], s[2:3], s[0:1]
+; GFX12-SDAG-NEXT: s_load_u8 s0, s[0:1], 0x0
+; GFX12-SDAG-NEXT: s_wait_kmcnt 0x0
+; GFX12-SDAG-NEXT: v_mov_b32_e32 v0, s0
+; GFX12-SDAG-NEXT: ; return to shader part epilog
+;
+; GFX12-GISEL-LABEL: global_load_saddr_i8_offset_neg4096:
+; GFX12-GISEL: ; %bb.0:
+; GFX12-GISEL-NEXT: s_add_co_u32 s0, s2, 0xfffff000
+; GFX12-GISEL-NEXT: s_add_co_ci_u32 s1, s3, -1
+; GFX12-GISEL-NEXT: s_load_u8 s0, s[0:1], 0x0
+; GFX12-GISEL-NEXT: s_wait_kmcnt 0x0
+; GFX12-GISEL-NEXT: v_mov_b32_e32 v0, s0
+; GFX12-GISEL-NEXT: ; return to shader part epilog
%gep0 = getelementptr inbounds i8, ptr addrspace(1) %sbase, i64 -4096
%load = load i8, ptr addrspace(1) %gep0
%zext = zext i8 %load to i32
@@ -198,12 +211,25 @@ define amdgpu_ps float @global_load_saddr_i8_offset_neg4097(ptr addrspace(1) inr
; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: ; return to shader part epilog
;
-; GFX12-LABEL: global_load_saddr_i8_offset_neg4097:
-; GFX12: ; %bb.0:
-; GFX12-NEXT: s_load_u8 s0, s[2:3], -0x1001
-; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: v_mov_b32_e32 v0, s0
-; GFX12-NEXT: ; return to shader part epilog
+; GFX12-SDAG-LABEL: global_load_saddr_i8_offset_neg4097:
+; GFX12-SDAG: ; %bb.0:
+; GFX12-SDAG-NEXT: ...
[truncated]
if (EncodedOffset >= 0 || IsBuffer || HasSOffset ||
    !Subtarget->hasSignedSMRDImmOffset())
  return true;
// For unbuffered smem loads, it is illegal for the Immediate Offset to be
// negative if the resulting (Offset + (M0 or SOffset or zero) is negative.
// Handle the case where the Immediate Offset is negative and there is no
// SOffset.
return false;
I'm having trouble following the relationship with getSMRDEncodedOffset. If I look in there, it's checking for legal unsigned offsets. Is there a bug somewhere down in that helper? Should we be rejecting the same offsets in the assembler?
Also, you can directly return the boolean expression.
I moved parts of the check inside getSMRDEncodedOffset but checking SOffset is a bit more complicated so I did not move that part inside the helper.
@@ -4221,7 +4232,13 @@ bool AMDGPUInstructionSelector::selectSmrdOffset(MachineOperand &Root,
  if (Offset && GEPI.SgprParts.size() == 1 && EncodedImm) {
    Base = GEPI.SgprParts[0];
    *Offset = *EncodedImm;
    return true;
    if (*Offset >= 0 || !STI.hasSignedSMRDImmOffset())
Same as in the DAG version; it would be better if the code was shared in the low-level offset-is-valid function.
Looks reasonable to me. I don't fully understand all the logic, but we have tested this in some graphics cases that previously crashed due to abusing negative offsets, and the results were good.
assert((!SOffset || !Offset) &&
       "Cannot match both soffset and offset at the same time!");

ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);
if (!C) {
  if (!SOffset)
    return false;
  bool Changed = false;
There shouldn't be any need to track a "Changed" state. Nothing is changing here, it's a series of preconditions that succeeds or not. The end case should just return true If the mode matched.
// For unbuffered smem loads, it is illegal for the Immediate Offset to be
// negative if the resulting (Offset + (M0 or SOffset or zero) is negative.
// Handle the case where the Immediate Offset + SOffset is negative.
if (AMDGPU::hasSMRDSignedImmOffset(*Subtarget) && Changed &&
Changed is just the same as *SOffset having a value?
int64_t ImmOff = 0;
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(*Offset))
  ImmOff = C->getSExtValue();
Parsing out the constant offset here is harder to follow; SelectSMRDBaseOffset can do this just as well.
Reiterating this: you're splitting up the offset parsing.
bool Imm32Only, bool IsBuffer) const {
bool Imm32Only, bool IsBuffer,
bool HasSOffset,
int64_t ImmOffset) const {
The offset is already passed in, having a second copy of it in the immediate case is harder to follow
Is there a way to get the ImmOffset value? When selecting for SOffset, the Offset pointer is a nullptr.
Can't you just handle that case as ImmOffset=0?
Sorry, I mean in the case where it is attempting to select both an SOffset and Offset (Immediate). It will first attempt to select the immediate Offset then the SOffset. When selecting for SOffset, the Offset pointer is not passed in so when selecting SOffset, it has no knowledge of the immediate Offset value.
nullptr is passed for Offset here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp#L2082
ping
…lim) This is a cherry-pick of an unmerged upstream change llvm#89165 Once the upstream change is finished and merged, this will need reverting. Original commit message: For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553. Change-Id: I235ac5d0de5da2a1544760ab3c9749665340310d
I think this code needs a cleanup. I have a hard time following what's going on with the recursion and splitting up the offsets. The addressing isn't that complex
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/530 Here is the relevant piece of the build log, for reference:
…m#89165) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553. Change-Id: Ic920b497c378cbc6a7cc2bbca300e9391893b3a0
…m#89165) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553.
…m#89165) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553. Change-Id: Ieb5681777d517fbc6ea59014af8babc9980c2b1e
For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative.
New PR of #79553.