Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LivePhysRegs] Add callee-saved regs from MFI in addLiveOutsNoPristines. #73553

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Nov 27, 2023

Some callee-saved registers may be implicitly used by a
return/terminator instruction in return blocks, e.g. LR on ARM. When
computing the live-outs for a return block, add all callee-saved
registers from the current frame info. This is in line with how PEI
updates liveness in updateLiveness.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Almost all est changes are in MTE, where we have a sequence of
tPUSH 14 /* CC::al /, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
tPOP_RET 14 /
CC::al */, $noreg, def $r7, def $pc

With this patch, the exit block is assumed to use LR implicitly, even
though POP_RET restores it to PC and doesn't use LR. I am not sure if
there's a way around that without more accurate modeling of uses of LR.

It would be great to teach the machine-verifier to check liveness of LR
on ARM, but I couldn't find an appropriate hook to teach it abou

@fhahn fhahn force-pushed the users/fhahn/live-phys-regs-arm-callee-saved-mfi branch from b6d1671 to 2336350 Compare November 27, 2023 18:11
Some callee-saved registers may be implicitly used by a
return/terminator instruction in return blocks, e.g. LR on ARM. When
computing the live-ins for a return block, add all callee-saved
registers from the current frame info. This is in line with how PEI
updates liveness in updateLiveness.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Almost all est changes are in MTE, where we have a sequence of
    tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
    tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc

With this patch, the exit block is assumed to use LR implicitly, even
though POP_RET restores it to PC and doesn't use LR. I am not sure if
there's a way around that without more accurate modeling of uses of LR.

It would be great to teach the machine-verifier to check liveness of LR
on ARM, but I couldn't find an appropriate hook to teach it about which
instructions implicitly use LR.
@fhahn fhahn force-pushed the users/fhahn/live-phys-regs-arm-callee-saved-mfi branch from 2336350 to 9f71389 Compare November 27, 2023 18:13
Copy link

github-actions bot commented Nov 27, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn fhahn requested a review from MatzeB November 27, 2023 18:19
@MatzeB MatzeB requested a review from qcolombet November 27, 2023 18:50
@MatzeB
Copy link
Contributor

MatzeB commented Nov 27, 2023

Is this about computing live-outs of the return block as the code suggests? (The summary currently talks about live-ins?)

@MatzeB
Copy link
Contributor

MatzeB commented Nov 27, 2023

I don't remember the situation on aarch64, but if by chance LR is modeled with this "pristine register" concept, then maybe the caller needs to use addLiveIns() rather than addLiveInsNoPristines()?

@fhahn
Copy link
Contributor Author

fhahn commented Nov 27, 2023

Is this about computing live-outs of the return block as the code suggests? (The summary currently talks about live-ins?)

Thanks, it should be live-outs in the description, updated!

I don't remember the situation on aarch64, but if by chance LR is modeled with this "pristine register" concept, then maybe the caller needs to use addLiveIns() rather than addLiveInsNoPristines()?

I am not sure, but looking at updateLiveness (https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/PrologEpilogInserter.cpp#L582) it looks like it uses the saved registers from MFI. Pristine registers I think contain all callee-saved registers for the target, which may be overestimating the liveness quite a bit.

@MatzeB
Copy link
Contributor

MatzeB commented Nov 27, 2023

I still feel like I am missing something here, and it's been a while since I looked at this. But my impression is that LLVM modeling is "cheating" a bit in that technically all the callee-saves should be implicit-uses of the return instruction (and not live afterwards) but we don't model it that way and instead make them appear as live-outs of the return block? So doesn't seem like overestimating the liveness because of our current modeling?

If I'm reading your patch correctly it would mean we would start adding all pristine registers for the return block[1]. I am just confused so far because this is happening in a function called addLiveOutsNoPristines...

[1] Pristine == "callee saved but happens to be unused and hence not saved/restored" which is what happens when you remove that Info.isRestored() check?

@MatzeB
Copy link
Contributor

MatzeB commented Nov 27, 2023

Which code/pass is using LivePhysRegs that is causing you trouble here?

@MatzeB
Copy link
Contributor

MatzeB commented Nov 27, 2023

Looking at the MachineOutliner it seems to already be using addLiveOuts() so I must be missing something on why this change has any effect anyway...

@fhahn
Copy link
Contributor Author

fhahn commented Nov 27, 2023

I still feel like I am missing something here, and it's been a while since I looked at this. But my impression is that LLVM modeling is "cheating" a bit in that technically all the callee-saves should be implicit-uses of the return instruction (and not live afterwards) but we don't model it that way and instead make them appear as live-outs of the return block? So doesn't seem like overestimating the liveness because of our current modeling?

Yep, the current modeling in general may overestimates the liveness. With overestimating I meant that with this patch we overestimate in more places, but that's by design.

If I'm reading your patch correctly it would mean we would start adding all pristine registers for the return block[1]. I am just confused so far because this is happening in a function called addLiveOutsNoPristines...

I am not sure what exactly the definition for pristines is (and not super familiar with this code in general), and maybe the function name needs to be changed. The main thing to note is that it won't add all pristines; addPristines adds all callee saved registers (via TRI) and removes the ones which are in he machine function's CalleeSavedInfo. The patch adds pristines, but only those that have been added to CalleeSavedInfo.

[1] Pristine == "callee saved but happens to be unused and hence not saved/restored" which is what happens when you remove that Info.isRestored() check?

Which code/pass is using LivePhysRegs that is causing you trouble here?

The issue is in BranchFolding, where after splitting, the liveness of LR isn't preserved in the new or merged blocks. It could be fixed locally there by iterating over the registers in CalleeSavedInfo and checking if they were live-in in the original block (see below), but I am worried that fixing this locally leaves us open for similar issues in other parts of the codebase.

@qcolombet
Copy link
Collaborator

I haven't looked closely to the patch, but I share @MatzeB's concerns here.

Essentially this patch is reverting https://reviews.llvm.org/D36160, which was fixing a modeling issue with LR on ARM to begin with!

@qcolombet qcolombet requested a review from kparzysz November 28, 2023 11:05
@fhahn
Copy link
Contributor Author

fhahn commented Nov 28, 2023

I haven't looked closely to the patch, but I share @MatzeB's concerns here.

Essentially this patch is reverting https://reviews.llvm.org/D36160, which was fixing a modeling issue with LR on ARM to begin with!

Thanks for sharing the additional context and where IsRestored is actually set. Taking a look at the original patch, it seems like it doesn't properly account for the fact that there could be multiple return blocks which may not be using POP to restore LR to PC. This could be due to shrink-wrapping + tail-calls in some exit blocks (like in outlined-fn-may-clobber-lr-in-caller.ll) or possibly some return instructions not using POP.

IIUC D36160 tries to track LR liveness more accurately and doesn't fix a miscompile, but potentially introduced an mis-compile due to underestimating liveness of LR.

I don't think the current interface allows to properly check if all exit blocks are covered by POP insts in restoreCalleeSavedRegisters, as it works on single return blocks only.

Without changing the API, we could check if LR is marked as not restored, and if it is check if there are multiple return blocks, as sketched in https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973.

This could be further refined to check if POP could be used for all returns (not sure if it is worth it given the small impact on the tests) or the API could be changed to pass all return blocks to avoid re-scanning for returns on each call (not sure if we should extend the general API even more for this workaround though). WDYT

@kparzysz
Copy link
Contributor

I think we should get rid of the "restored" flag. The problem is specific to LR, and how LR is handled depends on the calling convention. If the convention expects the caller to preserve its LR, then LR is never live out of the call. If the callee restores the LR to the value before the call (e.g. by popping it from the stack), then LR is a live-out.

I haven't looked at the updated testcases in detail, but I see that most of the changes are in treating LR as live (whereas it was dead before). At the first glance that doesn't look right.

@fhahn
Copy link
Contributor Author

fhahn commented Nov 28, 2023

@kparzysz please take a loo at https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973, which has much more limited impact.

I haven't looked at the updated testcases in detail, but I see that most of the changes are in treating LR as live (whereas it was dead before). At the first glance that doesn't look right.

We might now overestimate liveness, which only results in missed perf correct? Although I think this is mostly theoretical at this point, as there's no test cases that show that. The issue is that underestimating as we currently do leads to incorrect results, in particular with tail calls that use LR implicitly. If LR isn't marked as live in that case, other passes are free to clobber LR (e.g. the machine-outliner by introducing calls using BL, as in https://github.com/llvm/llvm-project/blob/20f634f275b431ff256ba45cbcbb6dc5bd945fb3/llvm/test/CodeGen/Thumb2/outlined-fn-may-clobber-lr-in-caller.ll

@kparzysz
Copy link
Contributor

@kparzysz please take a loo at https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973, which has much more limited impact.

If it's an urgent issue, then it's fine to have a limited-impact fix. I think the root issue still remains though.

If LR isn't marked as live in that case, other passes are free to clobber LR (e.g. the machine-outliner by introducing calls using BL, as in https://github.com/llvm/llvm-project/blob/20f634f275b431ff256ba45cbcbb6dc5bd945fb3/llvm/test/CodeGen/Thumb2/outlined-fn-may-clobber-lr-in-caller.ll

Does the outliner run after PEI? If the callee has to ensure the correct value of the LR before returning (via a tail call for example), then the PEI code should take care of that. It would see the call to the outlined procedure and do the right thing. After PEI the liveness of LR needs to be accurately reflected and tail calls could (should?) always "use" LR. That would either prevent outlining or cause the outliner to preserve LR across introduced calls.

On the caller side, the call instruction clobbers LR, so it can't really be considered live-out (unless the calling convention requires it to be preserved, which I think it doesn't).

@efriedma-quic
Copy link
Collaborator

After PEI the liveness of LR needs to be accurately reflected and tail calls could (should?) always "use" LR. That would either prevent outlining or cause the outliner to preserve LR across introduced calls.

I'll elaborate on this a bit. I think long-term, the way we want to model this is that pre-PEI, LR is treated as if it were a callee-save register. (Nothing outside the prologue/epilogue should interact with the return address directly except for a few special cases like __builtin_return_address.) Then PEI marks the LR use on the relevant "return" instructions, and post-PEI it's not callee-save anymore. I think that correctly models the underlying weirdness: at a machine level, the return address is basically just an argument to the function, but it's special to PEI because of the interaction with the frame layout/exception handling/pop instructions/etc. Stuff before PEI doesn't need to be aware it's an argument, and stuff after PEI can just treat it as a normal register.

On the caller side, the call instruction clobbers LR, so it can't really be considered live-out

Correct, it's not really live-out.


Short-term, can we make this fix a bit more targeted? The point of the "isRestored" bit is that it's supposed to avoid marking liveness when the function ends in a "pop pc". The relevant code is ARMFrameLowering::emitPopInst: it calls Info.setRestored(false); to do precisely this. But it's triggering in a case where it shouldn't. I think the problem is that the code isn't accounting for the possibility that there are other paths that return from the function.

@efriedma-quic
Copy link
Collaborator

Just looked at https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973 ; that seems roughly appropriate. It's a little ugly to set the bit to false, then set it back to true, though; I'd rather just explicitly check whether all return instructions are LDMIA_RET/t2LDMIA_RET/tPOP_RET.

@fhahn
Copy link
Contributor Author

fhahn commented Dec 13, 2023

Just looked at https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973 ; that seems roughly appropriate. It's a little ugly to set the bit to false, then set it back to true, though; I'd rather just explicitly check whether all return instructions are LDMIA_RET/t2LDMIA_RET/tPOP_RET.

Thanks @efriedma-quic, I'll prepare an updated version

@fhahn
Copy link
Contributor Author

fhahn commented Dec 14, 2023

Thanks for all the feedback. I created #75527 which uses the approach based on https://gist.github.com/fhahn/67937125b64440a8a414909c4a1b7973 with the tweak suggested by @efriedma-quic

@fhahn fhahn closed this Dec 14, 2023
@fhahn fhahn deleted the users/fhahn/live-phys-regs-arm-callee-saved-mfi branch December 14, 2023 20:43
fhahn added a commit that referenced this pull request Dec 20, 2023
…75527)

emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to #73553
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 4, 2024
…lvm#75527)

emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to llvm#73553

(cherry-picked from b1a5ee1)
nikic pushed a commit to nikic/llvm-project that referenced this pull request Jan 9, 2024
…lvm#75527)

emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to llvm#73553

(cherry picked from commit b1a5ee1)
MingcongBai pushed a commit to AOSC-Tracking/llvm-project that referenced this pull request Mar 26, 2024
…lvm#75527)

emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to llvm#73553

(cherry picked from commit b1a5ee1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants