-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MachineLICM] Correctly Apply Register Masks #95746
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Pierre van Houtryve (Pierre-vh) ChangesFull diff: https://github.com/llvm/llvm-project/pull/95746.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index 6c5170e918e00..237527a165e85 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -426,38 +426,33 @@ static bool InstructionStoresToFI(const MachineInstr *MI, int FI) {
static void applyBitsNotInRegMaskToRegUnitsMask(const TargetRegisterInfo &TRI,
BitVector &RUs,
const uint32_t *Mask) {
- // Iterate over the RegMask raw to avoid constructing a BitVector, which is
- // expensive as it implies dynamically allocating memory.
- //
- // We also work backwards.
+ const unsigned NumRUs = TRI.getNumRegUnits();
+ BitVector RUsInMask(NumRUs);
+ BitVector RUsNotInMask(NumRUs);
const unsigned NumRegs = TRI.getNumRegs();
const unsigned MaskWords = (NumRegs + 31) / 32;
for (unsigned K = 0; K < MaskWords; ++K) {
- // We want to set the bits that aren't in RegMask, so flip it.
- uint32_t Word = ~Mask[K];
-
- // Iterate all set bits, starting from the right.
- while (Word) {
- const unsigned SetBitIdx = countr_zero(Word);
-
- // The bits are numbered from the LSB in each word.
- const unsigned PhysReg = (K * 32) + SetBitIdx;
-
- // Clear the bit at SetBitIdx. Doing it this way appears to generate less
- // instructions on x86. This works because negating a number will flip all
- // the bits after SetBitIdx. So (Word & -Word) == (1 << SetBitIdx), but
- // faster.
- Word ^= Word & -Word;
-
+ uint32_t Word = Mask[K];
+ for (unsigned Bit = 0; Bit < 32; ++Bit) {
+ const unsigned PhysReg = (K * 32) + Bit;
if (PhysReg == NumRegs)
- return;
+ break;
- if (PhysReg) {
- for (MCRegUnitIterator RUI(PhysReg, &TRI); RUI.isValid(); ++RUI)
- RUs.set(*RUI);
- }
+ if (!PhysReg)
+ continue;
+
+ // Extract the bit and apply it to the appropriate mask.
+ auto &Mask = ((Word >> Bit) & 1) ? RUsInMask : RUsNotInMask;
+ for (MCRegUnitIterator RUI(PhysReg, &TRI); RUI.isValid(); ++RUI)
+ Mask.set(*RUI);
}
}
+
+ // If a RU needs to be set because it's not in the RegMask, only set it
+ // if all registers from that RU are not in the mask either.
+ RUsNotInMask &= RUsInMask.flip();
+
+ RUs |= RUsNotInMask;
}
/// Examine the instruction for potentai LICM candidate. Also
diff --git a/llvm/test/CodeGen/AMDGPU/indirect-call.ll b/llvm/test/CodeGen/AMDGPU/indirect-call.ll
index da8aa54469835..7799b9509ceb0 100644
--- a/llvm/test/CodeGen/AMDGPU/indirect-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/indirect-call.ll
@@ -886,12 +886,12 @@ define void @test_indirect_call_vgpr_ptr_inreg_arg(ptr %fptr) {
; GCN-NEXT: v_writelane_b32 v40, s62, 30
; GCN-NEXT: v_writelane_b32 v40, s63, 31
; GCN-NEXT: s_mov_b64 s[6:7], exec
+; GCN-NEXT: s_movk_i32 s4, 0x7b
; GCN-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s8, v0
; GCN-NEXT: v_readfirstlane_b32 s9, v1
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[8:9], v[0:1]
; GCN-NEXT: s_and_saveexec_b64 s[10:11], vcc
-; GCN-NEXT: s_movk_i32 s4, 0x7b
; GCN-NEXT: s_swappc_b64 s[30:31], s[8:9]
; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
; GCN-NEXT: s_xor_b64 exec, exec, s[10:11]
@@ -980,12 +980,12 @@ define void @test_indirect_call_vgpr_ptr_inreg_arg(ptr %fptr) {
; GISEL-NEXT: v_writelane_b32 v40, s62, 30
; GISEL-NEXT: v_writelane_b32 v40, s63, 31
; GISEL-NEXT: s_mov_b64 s[6:7], exec
+; GISEL-NEXT: s_movk_i32 s4, 0x7b
; GISEL-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1
; GISEL-NEXT: v_readfirstlane_b32 s8, v0
; GISEL-NEXT: v_readfirstlane_b32 s9, v1
; GISEL-NEXT: v_cmp_eq_u64_e32 vcc, s[8:9], v[0:1]
; GISEL-NEXT: s_and_saveexec_b64 s[10:11], vcc
-; GISEL-NEXT: s_movk_i32 s4, 0x7b
; GISEL-NEXT: s_swappc_b64 s[30:31], s[8:9]
; GISEL-NEXT: ; implicit-def: $vgpr0
; GISEL-NEXT: s_xor_b64 exec, exec, s[10:11]
|
You shouldn't need both diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index 237527a165e8..dd71a9a3fc79 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -424,38 +424,33 @@ static bool InstructionStoresToFI(const MachineInstr *MI, int FI) {
}
static void applyBitsNotInRegMaskToRegUnitsMask(const TargetRegisterInfo &TRI,
BitVector &RUs,
const uint32_t *Mask) {
const unsigned NumRUs = TRI.getNumRegUnits();
- BitVector RUsInMask(NumRUs);
- BitVector RUsNotInMask(NumRUs);
+ BitVector ClobberedRUs(NumRUs, true);
const unsigned NumRegs = TRI.getNumRegs();
const unsigned MaskWords = (NumRegs + 31) / 32;
for (unsigned K = 0; K < MaskWords; ++K) {
uint32_t Word = Mask[K];
for (unsigned Bit = 0; Bit < 32; ++Bit) {
const unsigned PhysReg = (K * 32) + Bit;
if (PhysReg == NumRegs)
break;
if (!PhysReg)
continue;
- // Extract the bit and apply it to the appropriate mask.
- auto &Mask = ((Word >> Bit) & 1) ? RUsInMask : RUsNotInMask;
- for (MCRegUnitIterator RUI(PhysReg, &TRI); RUI.isValid(); ++RUI)
- Mask.set(*RUI);
+ if ((Word >> Bit) & 1) {
+ for (MCRegUnitIterator RUI(PhysReg, &TRI); RUI.isValid(); ++RUI)
+ ClobberedRUs.reset(*RUI);
+ }
}
}
- // If a RU needs to be set because it's not in the RegMask, only set it
- // if all registers from that RU are not in the mask either.
- RUsNotInMask &= RUsInMask.flip();
-
- RUs |= RUsNotInMask;
+ RUs |= ClobberedRUs;
}
/// Examine the instruction for potentai LICM candidate. Also
/// gather register def and frame object update information.
void MachineLICMBase::ProcessMI(MachineInstr *MI, BitVector &RUDefs,
BitVector &RUClobbers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to describe the regression in the first line and leave the hash for the body of the message
This commit causes miscompilations for aarch64 (where the generated code does the wrong thing). This can be observed in openh264 unit tests, that suddenly are failing where they previously were passing. To reproduce, run the following sequence on aarch64 linux:
For closer investigation of what's happening, the culprit object file for this test is If there's no straightforward fix in sight, can we revert this change until the issue is resolved? (As this commit is a fix for a regression, do we need to revert that original change meanwhile as well?) |
@mstorsjo Did you verify that reverting this, and only this commit fixes it? Can you please create a lit test case that shows how codegen goes wrong? I don't have access to a AArch64 machine. I would like to first see what the issue is before deciding whether to revert or not. |
I bisected it; with commit 770393b it fails, with the preceding one it works, so it's definitely pinpointed to this one. It doesn't revert cleanly on git main, as 457e895 also touches the same file.
First off - I'm approaching this primarily as a integrator, I continuously build latest llvm/clang and test projects with it, so I'm not very well versed with the LLVM internals. I'm not very familiar with pinpointing the error in codegen... (I've done it some times, but it takes me quite a long time, and I'm doing this as a side hobby essentially.) I tried adding I guess I can try to split the source file into minimal functions, if that allows pinpointing the codegen issue better. |
You can just do Thank you :) |
Ok, here goes:
Otherwise I'm afraid that this actually would be a lingering aarch64 target bug that just happens to appear due to this, otherwise potentially valid, change. For that, we'd have to invoke some aarch64 target maintainer... |
Investigating right now. As it doesn't break any CI I won't revert yet if that's okay. I'll revert if some AArch64 maintainer wants me to (@aemerson / @sjoerdmeijer perhaps?) or if I can't tell what the problem is by tomorrow. |
@mstorsjo I must be missing some flags. When I run it locally like this, LICM doesn't kick in.
I also had to add names to the EH_LABEL instructions because they didn't have any and the MIR couldn't be parsed without that. The only difference is that in the "after" patch, we have
being hoisted I wonder if some clobber is missing from an instruction somewhere and this just exposes it? Just to be sure, you checked latest main too and it still fails over there? |
I also observe that some predecessors are added to the IR after it goes through In other words, MachineLICM isn't at fault but some earlier pass that changes the CFG is. Can you also provide the unoptimized IR? You can get it through clang by disabling all LLVM passes ( |
Sorry, no idea about that... In this case, it's triggered with a compilation command like this:
Yes, it's still failing with latest main too. |
@davemgreen might be able to help. |
Sure: With this, I can repro the same issue by compiling this IR as |
I think I have a fix, the issue is with how we apply the regmask indeed so this patch is at fault. |
Fix regression introduced in d4b8b72