-
Notifications
You must be signed in to change notification settings - Fork 12.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. #73553
[LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. #73553
Conversation
b6d1671
to
2336350
Compare
Some callee-saved registers may be implicitly used by a return/terminator instruction in return blocks, e.g. LR on ARM. When computing the live-ins for a return block, add all callee-saved registers from the current frame info. This is in line with how PEI updates liveness in updateLiveness. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Almost all est changes are in MTE, where we have a sequence of tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc With this patch, the exit block is assumed to use LR implicitly, even though POP_RET restores it to PC and doesn't use LR. I am not sure if there's a way around that without more accurate modeling of uses of LR. It would be great to teach the machine-verifier to check liveness of LR on ARM, but I couldn't find an appropriate hook to teach it about which instructions implicitly use LR.
2336350
to
9f71389
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
Is this about computing live-outs of the return block as the code suggests? (The summary currently talks about live-ins?) |
I don't remember the situation on aarch64, but if by chance LR is modeled with this "pristine register" concept, then maybe the caller needs to use |
Thanks, it should be live-outs in the description, updated!
I am not sure, but looking at |
I still feel like I am missing something here, and it's been a while since I looked at this. But my impression is that LLVM modeling is "cheating" a bit in that technically all the callee-saves should be implicit-uses of the return instruction (and not live afterwards) but we don't model it that way and instead make them appear as live-outs of the return block? So doesn't seem like overestimating the liveness because of our current modeling? If I'm reading your patch correctly it would mean we would start adding all pristine registers for the return block[1]. I am just confused so far because this is happening in a function called [1] Pristine == "callee saved but happens to be unused and hence not saved/restored" which is what happens when you remove that |
Which code/pass is using |
Looking at the |
Yep, the current modeling in general may overestimates the liveness. With overestimating I meant that with this patch we overestimate in more places, but that's by design.
I am not sure what exactly the definition for pristines is (and not super familiar with this code in general), and maybe the function name needs to be changed. The main thing to note is that it won't add all pristines;
The issue is in |
I haven't looked closely to the patch, but I share @MatzeB's concerns here. Essentially this patch is reverting https://reviews.llvm.org/D36160, which was fixing a modeling issue with LR on ARM to begin with! |
Thanks for sharing the additional context and where IIUC D36160 tries to track LR liveness more accurately and doesn't fix a miscompile, but potentially introduced an mis-compile due to underestimating liveness of LR. I don't think the current interface allows to properly check if all exit blocks are covered by POP insts in Without changing the API, we could check if LR is marked as not restored, and if it is check if there are multiple return blocks, as sketched in https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973. This could be further refined to check if POP could be used for all returns (not sure if it is worth it given the small impact on the tests) or the API could be changed to pass all return blocks to avoid re-scanning for returns on each call (not sure if we should extend the general API even more for this workaround though). WDYT |
I think we should get rid of the "restored" flag. The problem is specific to LR, and how LR is handled depends on the calling convention. If the convention expects the caller to preserve its LR, then LR is never live out of the call. If the callee restores the LR to the value before the call (e.g. by popping it from the stack), then LR is a live-out. I haven't looked at the updated testcases in detail, but I see that most of the changes are in treating LR as live (whereas it was dead before). At the first glance that doesn't look right. |
@kparzysz please take a loo at https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973, which has much more limited impact.
We might now overestimate liveness, which only results in missed perf correct? Although I think this is mostly theoretical at this point, as there's no test cases that show that. The issue is that underestimating as we currently do leads to incorrect results, in particular with tail calls that use LR implicitly. If LR isn't marked as live in that case, other passes are free to clobber LR (e.g. the machine-outliner by introducing calls using BL, as in https://github.com/llvm/llvm-project/blob/20f634f275b431ff256ba45cbcbb6dc5bd945fb3/llvm/test/CodeGen/Thumb2/outlined-fn-may-clobber-lr-in-caller.ll |
If it's an urgent issue, then it's fine to have a limited-impact fix. I think the root issue still remains though.
Does the outliner run after PEI? If the callee has to ensure the correct value of the LR before returning (via a tail call for example), then the PEI code should take care of that. It would see the call to the outlined procedure and do the right thing. After PEI the liveness of LR needs to be accurately reflected and tail calls could (should?) always "use" LR. That would either prevent outlining or cause the outliner to preserve LR across introduced calls. On the caller side, the call instruction clobbers LR, so it can't really be considered live-out (unless the calling convention requires it to be preserved, which I think it doesn't). |
I'll elaborate on this a bit. I think long-term, the way we want to model this is that pre-PEI, LR is treated as if it were a callee-save register. (Nothing outside the prologue/epilogue should interact with the return address directly except for a few special cases like __builtin_return_address.) Then PEI marks the LR use on the relevant "return" instructions, and post-PEI it's not callee-save anymore. I think that correctly models the underlying weirdness: at a machine level, the return address is basically just an argument to the function, but it's special to PEI because of the interaction with the frame layout/exception handling/pop instructions/etc. Stuff before PEI doesn't need to be aware it's an argument, and stuff after PEI can just treat it as a normal register.
Correct, it's not really live-out. Short-term, can we make this fix a bit more targeted? The point of the "isRestored" bit is that it's supposed to avoid marking liveness when the function ends in a "pop pc". The relevant code is ARMFrameLowering::emitPopInst: it calls |
Just looked at https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973 ; that seems roughly appropriate. It's a little ugly to set the bit to false, then set it back to true, though; I'd rather just explicitly check whether all return instructions are LDMIA_RET/t2LDMIA_RET/tPOP_RET. |
Thanks @efriedma-quic, I'll prepare an updated version |
Thanks for all the feedback. I created #75527 which uses the approach based on https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973 with the tweak suggested by @efriedma-quic |
…75527) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of there terminators uses LR implicitly, it is not save to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to #73553
…lvm#75527) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of there terminators uses LR implicitly, it is not save to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to llvm#73553 (cherry-picked from b1a5ee1)
…lvm#75527) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of there terminators uses LR implicitly, it is not save to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to llvm#73553 (cherry picked from commit b1a5ee1)
…lvm#75527) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of there terminators uses LR implicitly, it is not save to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to llvm#73553 (cherry picked from commit b1a5ee1)
Some callee-saved registers may be implicitly used by a
return/terminator instruction in return blocks, e.g. LR on ARM. When
computing the live-outs for a return block, add all callee-saved
registers from the current frame info. This is in line with how PEI
updates liveness in updateLiveness.
This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.
Almost all est changes are in MTE, where we have a sequence of
tPUSH 14 /* CC::al /, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
tPOP_RET 14 / CC::al */, $noreg, def $r7, def $pc
With this patch, the exit block is assumed to use LR implicitly, even
though POP_RET restores it to PC and doesn't use LR. I am not sure if
there's a way around that without more accurate modeling of uses of LR.
It would be great to teach the machine-verifier to check liveness of LR
on ARM, but I couldn't find an appropriate hook to teach it abou