Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSRA: Honor the register aversion decision during allocation #92709

Merged
merged 6 commits into from
Nov 9, 2023

Conversation

kunalspathak
Copy link
Member

Taken from idea by @jakobbotsch in #92586 (comment) and slightly tweaked it.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 27, 2023
@ghost ghost assigned kunalspathak Sep 27, 2023
@ghost
Copy link

ghost commented Sep 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Taken from idea by @jakobbotsch in #92586 (comment) and slightly tweaked it.

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

Comment on lines 11522 to 11528
regMaskTP updatedPreferences = (preferences & ~currentInterval->registerAversion);
if (updatedPreferences != RBM_NONE)
{
// registerAversion is the best-effort opportunity. Do not update the preference
// if registerAversion contains all the registers present in preferences.
preferences = updatedPreferences;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This defeats a lot of the purpose of having this -- I would expect the anti preferences to be more important than the preferences as allocating an anti-preferenced register is (always?) going to result in a spill, while allocating an unpreferred register is just going to result in a reg-reg move. Does this PR with this change fix my example in #92586?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we can get to (preferences & ~currentInterval->antiPreferences) == RBM_NONE that your patch has is because during mergeRegisterPreferences() we do not stop the update from happening if the registerPreferences contains registers we want to avoid. The right way to do this is to in addition of tracking registerAversion make sure registerPreferences is up to date with that information throughout the allocation, not just do it during allocation of that interval/RefPosition. Reason being, we check if the interval's assigned physical register (physReg) is already part of registerPreferences and if yes, we skip the allocation and just assign the same register at the RefPosition. This happens regardless of if that register is present in registerAversion mask. By making sure we never put register in registerPreferences that are already in registerAversion we avoid cases like following:

For V01, if we do not keep registerPreferences up to date, we end up with:

Interval  1: (V01) ref RefPositions {#0@0 #3@9 #18@35 #40@53} physReg:rdx Preferences=[rdx r8] Aversions=[rcx rdx]

When we allocate for RefPosition #0, we see rdx is in registerPreferences and assign it.

image

By keeping things up to date, we get nice diff for such scenarios:

image

Now, I still have a check of RBM_NONE which I ideally won't to get rid of and just add an assert(updatedPreferences != RBM_NONE), but I need to see if there are any corner JitStressRegs cases where this is not correct.

Does this PR with this change fix my example in #92586?

It does:

       mov      rax, rcx
       cmp      qword ptr [r8], 0
       mov      rcx, rdx
       cmovl    rcx, rax
       mov      qword ptr [r8], rcx
       mov      rcx, rdx
       mov      rdx, rax
       call     [helloworld._92586:Bar(long,long)]
       nop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during mergeRegisterPreferences() we do not stop the update from happening if the registerPreferences contains registers we want to avoid

IMO that behavior is fine and better than trying to keep registerPreferences as a mix of preferences and anti preferences during build, where both preferences and anti preferences may be found in any order. It seems simpler to me to only combine preferences and antiPreferences at the last step right before we need it (in effect making both sets "add-only" during the build step). But maybe it complicates things during allocation stage given that we also want to update the preferences there.

Now, I still have a check of RBM_NONE which I ideally won't to get rid of and just add an assert(updatedPreferences != RBM_NONE), but I need to see if there are any corner JitStressRegs cases where this is not correct.

I suppose it depends on what the model is... for example, I would think that preferences & ~antiPreferences == RBM_NONE just means that at some point the interval would prefer to be in registers that at other points the interval would prefer not to be in. That seems like a perfectly reasonable situation that can occur.

It does:

With the change from 985c96a I also think that this case shouldn't occur anymore, but the comment in reset is confusing to me since both preferences and anti preferences are best-effort and I think that anti preferences are more important.

We should check memory impact of this change too since adding a field to Interval is pretty much equivalent to adding a field to GenTree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check memory impact of this change too since adding a field to Interval is pretty much equivalent to adding a field to GenTree.

Can you remind me how to do it? I know that we spit that information as part of JITDump, but how do we get the aggregate data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flip the define here to enable memory stats in release builds:

#define MEASURE_MEM_ALLOC 0 // You can set this to 1 to get memory stats in retail, as well

Then set DOTNET_JitMemStats=1 and run superpmi.exe with release JIT for some collection (without parallelization). It should output the aggregate results at the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For benchmarks.run collection. Let's see how much asmdiff improvements we see and should justify the cost.

-          LSRA_Interval |  123169280 |   2.67%
+         LSRA_Interval |  135486208 |   2.92%

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally all for taking the memory hit to improve LSRA...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, so I have updated the logic to what you have proposed, but we still need to make sure that we honor the aversion even while updating the preferences as done in here. Doing that gives us some more improvements as seen below for real-world win/x64 collection:

[14:12:51] 126 contexts with diffs (69 improvements, 16 regressions, 41 same size)
[14:12:51]   -272/+36 bytes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. There seems to be a somewhat significant TP hit, especially in MinOpts... is it just from the change in reset?

It would also be good to check the memory impact for MinOpts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a somewhat significant TP hit, especially in MinOpts

It is impacted more because of changes in mergePreferences which is called both by associateRefPositionWithInterval and updateRegisterPreferences. I could avoid calling updateRegisterPreferences at 2 places where we update registerAversion, but not sure if it is worth spreading the logic at other places.

?associateRefPosWithInterval@LinearScan@@AEAAXPEAVRefPosition@@@Z                    : 13402194 : +5.14%  : 58.86% : +0.0829%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z : 6813596  : +0.76%  : 29.93% : +0.0421%
?newInterval@LinearScan@@AEAAPEAVInterval@@W4var_types@@@Z                           : 2273723  : +2.56%  : 9.99%  : +0.0141%
?updateRegisterPreferences@Interval@@QEAAX_K@Z                                       : 228420   : +35.73% : 1.00%  : +0.0014%
?mergeRegisterPreferences@Interval@@QEAAX_K@Z                                        : 38714    : NA      : 0.17%  : +0.0002%

asp.net collection tier0 MinOpts collection of windows/x64:

-         LSRA_Interval |  181897840 |   7.39%
+         LSRA_Interval |  200087624 |   8.07%

@kunalspathak kunalspathak changed the title Honor the register aversion decision during allocation LSRA: Honor the register aversion decision during allocation Sep 27, 2023
@ghost ghost closed this Oct 28, 2023
@ghost
Copy link

ghost commented Oct 28, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@kunalspathak kunalspathak marked this pull request as ready for review November 7, 2023 22:14
@kunalspathak
Copy link
Member Author

@jakobbotsch - let me know if you have any other feedback.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The TP cost / memory impact is perhaps a bit high for the diffs we're getting, but given that linux-x64 host sees TP improvements that might not reflect much. I'll leave it up to you.

I think there's some follow-up simplifications we can make, e.g. mergeRegisterPreferences tries to guess when the mask represents aversions/anti-preferences. I don't think we need that kind of guessing anymore, so likely we can simplify a lot of stuff there.

@kunalspathak
Copy link
Member Author

kunalspathak commented Nov 8, 2023

I think there's some follow-up simplifications we can make, e.g. mergeRegisterPreferences tries to guess when the mask represents aversions/anti-preferences. I don't think we need that kind of guessing anymore, so likely we can simplify a lot of stuff there.

Given that this PR is a bit old, I will go ahead and merge this one and do a follow-up PR. I need to see how other minor changes that I called out in #92709 (comment) could change things.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants