-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LSRA: Honor the register aversion decision during allocation #92709
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsTaken from idea by @jakobbotsch in #92586 (comment) and slightly tweaked it.
|
src/coreclr/jit/lsra.cpp
Outdated
regMaskTP updatedPreferences = (preferences & ~currentInterval->registerAversion); | ||
if (updatedPreferences != RBM_NONE) | ||
{ | ||
// registerAversion is the best-effort opportunity. Do not update the preference | ||
// if registerAversion contains all the registers present in preferences. | ||
preferences = updatedPreferences; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This defeats a lot of the purpose of having this -- I would expect the anti preferences to be more important than the preferences as allocating an anti-preferenced register is (always?) going to result in a spill, while allocating an unpreferred register is just going to result in a reg-reg move. Does this PR with this change fix my example in #92586?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we can get to (preferences & ~currentInterval->antiPreferences) == RBM_NONE
that your patch has is because during mergeRegisterPreferences()
we do not stop the update from happening if the registerPreferences
contains registers we want to avoid. The right way to do this is to in addition of tracking registerAversion
make sure registerPreferences
is up to date with that information throughout the allocation, not just do it during allocation of that interval/RefPosition. Reason being, we check if the interval's assigned physical register (physReg
) is already part of registerPreferences
and if yes, we skip the allocation and just assign the same register at the RefPosition. This happens regardless of if that register is present in registerAversion
mask. By making sure we never put register in registerPreferences
that are already in registerAversion
we avoid cases like following:
For V01
, if we do not keep registerPreferences
up to date, we end up with:
Interval 1: (V01) ref RefPositions {#0@0 #3@9 #18@35 #40@53} physReg:rdx Preferences=[rdx r8] Aversions=[rcx rdx]
When we allocate for RefPosition #0
, we see rdx
is in registerPreferences
and assign it.
By keeping things up to date, we get nice diff for such scenarios:
Now, I still have a check of RBM_NONE
which I ideally won't to get rid of and just add an assert(updatedPreferences != RBM_NONE)
, but I need to see if there are any corner JitStressRegs cases where this is not correct.
Does this PR with this change fix my example in #92586?
It does:
mov rax, rcx
cmp qword ptr [r8], 0
mov rcx, rdx
cmovl rcx, rax
mov qword ptr [r8], rcx
mov rcx, rdx
mov rdx, rax
call [helloworld._92586:Bar(long,long)]
nop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
during mergeRegisterPreferences() we do not stop the update from happening if the registerPreferences contains registers we want to avoid
IMO that behavior is fine and better than trying to keep registerPreferences
as a mix of preferences and anti preferences during build, where both preferences and anti preferences may be found in any order. It seems simpler to me to only combine preferences
and antiPreferences
at the last step right before we need it (in effect making both sets "add-only" during the build step). But maybe it complicates things during allocation stage given that we also want to update the preferences there.
Now, I still have a check of RBM_NONE which I ideally won't to get rid of and just add an assert(updatedPreferences != RBM_NONE), but I need to see if there are any corner JitStressRegs cases where this is not correct.
I suppose it depends on what the model is... for example, I would think that preferences & ~antiPreferences == RBM_NONE
just means that at some point the interval would prefer to be in registers that at other points the interval would prefer not to be in. That seems like a perfectly reasonable situation that can occur.
It does:
With the change from 985c96a I also think that this case shouldn't occur anymore, but the comment in reset
is confusing to me since both preferences and anti preferences are best-effort and I think that anti preferences are more important.
We should check memory impact of this change too since adding a field to Interval
is pretty much equivalent to adding a field to GenTree
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check memory impact of this change too since adding a field to Interval is pretty much equivalent to adding a field to GenTree.
Can you remind me how to do it? I know that we spit that information as part of JITDump, but how do we get the aggregate data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flip the define here to enable memory stats in release builds:
Line 511 in dd8e9aa
#define MEASURE_MEM_ALLOC 0 // You can set this to 1 to get memory stats in retail, as well |
Then set DOTNET_JitMemStats=1
and run superpmi.exe with release JIT for some collection (without parallelization). It should output the aggregate results at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For benchmarks.run collection. Let's see how much asmdiff improvements we see and should justify the cost.
- LSRA_Interval | 123169280 | 2.67%
+ LSRA_Interval | 135486208 | 2.92%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm personally all for taking the memory hit to improve LSRA...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, so I have updated the logic to what you have proposed, but we still need to make sure that we honor the aversion
even while updating the preferences as done in here. Doing that gives us some more improvements as seen below for real-world win/x64 collection:
[14:12:51] 126 contexts with diffs (69 improvements, 16 regressions, 41 same size)
[14:12:51] -272/+36 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. There seems to be a somewhat significant TP hit, especially in MinOpts... is it just from the change in reset
?
It would also be good to check the memory impact for MinOpts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a somewhat significant TP hit, especially in MinOpts
It is impacted more because of changes in mergePreferences
which is called both by associateRefPositionWithInterval
and updateRegisterPreferences
. I could avoid calling updateRegisterPreferences
at 2 places where we update registerAversion
, but not sure if it is worth spreading the logic at other places.
?associateRefPosWithInterval@LinearScan@@AEAAXPEAVRefPosition@@@Z : 13402194 : +5.14% : 58.86% : +0.0829%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z : 6813596 : +0.76% : 29.93% : +0.0421%
?newInterval@LinearScan@@AEAAPEAVInterval@@W4var_types@@@Z : 2273723 : +2.56% : 9.99% : +0.0141%
?updateRegisterPreferences@Interval@@QEAAX_K@Z : 228420 : +35.73% : 1.00% : +0.0014%
?mergeRegisterPreferences@Interval@@QEAAX_K@Z : 38714 : NA : 0.17% : +0.0002%
asp.net collection tier0 MinOpts collection of windows/x64:
- LSRA_Interval | 181897840 | 7.39%
+ LSRA_Interval | 200087624 | 8.07%
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
@jakobbotsch - let me know if you have any other feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The TP cost / memory impact is perhaps a bit high for the diffs we're getting, but given that linux-x64 host sees TP improvements that might not reflect much. I'll leave it up to you.
I think there's some follow-up simplifications we can make, e.g. mergeRegisterPreferences
tries to guess when the mask represents aversions/anti-preferences. I don't think we need that kind of guessing anymore, so likely we can simplify a lot of stuff there.
Given that this PR is a bit old, I will go ahead and merge this one and do a follow-up PR. I need to see how other minor changes that I called out in #92709 (comment) could change things. |
Taken from idea by @jakobbotsch in #92586 (comment) and slightly tweaked it.