
JIT: Process all register kills in bulk in LSRA #101760

Merged (8 commits) May 2, 2024

Conversation

jakobbotsch (Member) commented May 1, 2024:

`RefTypeKill` is one of the most common types of ref position because we
create a separate one for every register whenever registers are killed,
and registers are killed with very high frequency (for example, all
callee-trashed registers at calls). This can be seen in metrics: in
`libraries_tests.run`, `RefTypeKill` is as common as `RefTypeUse` and
`RefTypeDef` combined on x64, and 3x as common as them combined on arm64
due to arm64 having more registers.

The main complication in changing `RefTypeKill` to represent killing
a full set of registers is that we want to be able to easily
look up the next `RefPosition` at which a particular register is killed
or used. Today, this is very simple: all kills (`RefTypeKill`) and uses
(`RefTypeFixedReg`) participate in an intrusive linked list, and finding
the next `RefPosition` is as simple as following this list.

This PR changes LSRA to model the kills in bulk instead of creating an
individual `RefPosition` for every kill. To handle finding the next
fixed reg ref position, LSRA is updated to take into account that the
`RefPosition`s corresponding to a register can come from two different
linked lists: the `RefTypeFixedReg` intrusive linked list, and also a
linked list of all `RefTypeKill` ref positions. The latter linked list
may take some searching (not every kill necessarily kills the register
we are looking for), but the reduction in the number of `RefPosition`s
created more than makes up for this.

No diffs are expected.

This change reduces memory usage of the JIT by around 12% for arm64,
and by around 7% for x64.

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 1, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch jakobbotsch marked this pull request as ready for review May 1, 2024 17:43
jakobbotsch (Member, Author) commented:

cc @dotnet/jit-contrib PTAL @kunalspathak

No asm diffs. Large TP improvements (4 to 8% on MinOpts arm64), and large reduction in memory use (~12% on arm64, ~7% on x64).

This mostly pays off the cost of predicate registers in MinOpts. It will also make it slightly less expensive to add predicate registers since we no longer pay for multiple new ref positions for their kills around calls.

[image attachment]

kunalspathak (Member) left a comment:

Thanks for doing this. When I tried this last year, #87294 (comment), I had a different approach to getting the next RefPosition, but your approach is simpler and gives the expected TP win.

This mostly pays off the cost of predicate registers in MinOpts.

Yes, I had a TODO here (since removed) about zeroing out predicate registers, but this will definitely lower the TP cost.

@@ -1395,7 +1395,7 @@ int LinearScan::BuildCall(GenTreeCall* call)

     // We could use RefTypeKill, but RefTypeFixedReg is used less commonly, so the check for delayRegFree
     // during register allocation should be cheaper in terms of TP.
-    RefPosition* pos = newRefPosition(REG_SWIFT_ERROR, currentLoc, RefTypeFixedReg, call, RBM_SWIFT_ERROR);
+    RefPosition* pos = newRefPosition(REG_SWIFT_ERROR, currentLoc + 1, RefTypeFixedReg, call, RBM_SWIFT_ERROR);
Member: what is this change?

jakobbotsch (Member, Author):

We were hitting this assert:

// We can't have two RefPositions on a RegRecord at the same location, unless they are different types.
assert((regRecord->lastRefPosition == nullptr) || (regRecord->lastRefPosition->nodeLocation < theLocation) ||
(regRecord->lastRefPosition->refType != theRefType));

In the baseline we create these RefPositions:

DefList: { N047.t48. PUTARG_REG; N051.t49. PUTARG_REG }
N075 ( 20,  8) [000016] --CXG------                         ▌  CALL ind unman long   REG NA $103
<RefPosition #69  @75  RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #70  @75  RefTypeUse <Ivl:17> BB02 regmask=[x0] minReg=1 last fixed wt=100.00>
<RefPosition #71  @75  RefTypeFixedReg <Reg:x21> BB02 regmask=[x21] minReg=1 wt=100.00>
<RefPosition #72  @75  RefTypeUse <Ivl:19> BB02 regmask=[x21] minReg=1 last fixed wt=100.00>
Last use of V09 between PUTARG and CALL. Removing occupied arg regs from preferences: [x0 x21]
<RefPosition #73  @75  RefTypeUse <Ivl:2 V09> LCL_VAR BB02 regmask=[x0-xip0 x19-x28] minReg=1 last wt=400.00>
<RefPosition #74  @76  RefTypeKill <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #75  @76  RefTypeKill <Reg:x1 > BB02 regmask=[x1] minReg=1 wt=100.00>
<RefPosition #76  @76  RefTypeKill <Reg:x2 > BB02 regmask=[x2] minReg=1 wt=100.00>
<RefPosition #77  @76  RefTypeKill <Reg:x3 > BB02 regmask=[x3] minReg=1 wt=100.00>
<RefPosition #78  @76  RefTypeKill <Reg:x4 > BB02 regmask=[x4] minReg=1 wt=100.00>
<RefPosition #79  @76  RefTypeKill <Reg:x5 > BB02 regmask=[x5] minReg=1 wt=100.00>
<RefPosition #80  @76  RefTypeKill <Reg:x6 > BB02 regmask=[x6] minReg=1 wt=100.00>
<RefPosition #81  @76  RefTypeKill <Reg:x7 > BB02 regmask=[x7] minReg=1 wt=100.00>
<RefPosition #82  @76  RefTypeKill <Reg:x8 > BB02 regmask=[x8] minReg=1 wt=100.00>
<RefPosition #83  @76  RefTypeKill <Reg:x9 > BB02 regmask=[x9] minReg=1 wt=100.00>
<RefPosition #84  @76  RefTypeKill <Reg:x10> BB02 regmask=[x10] minReg=1 wt=100.00>
<RefPosition #85  @76  RefTypeKill <Reg:x11> BB02 regmask=[x11] minReg=1 wt=100.00>
<RefPosition #86  @76  RefTypeKill <Reg:x12> BB02 regmask=[x12] minReg=1 wt=100.00>
<RefPosition #87  @76  RefTypeKill <Reg:x13> BB02 regmask=[x13] minReg=1 wt=100.00>
<RefPosition #88  @76  RefTypeKill <Reg:x14> BB02 regmask=[x14] minReg=1 wt=100.00>
<RefPosition #89  @76  RefTypeKill <Reg:x15> BB02 regmask=[x15] minReg=1 wt=100.00>
<RefPosition #90  @76  RefTypeKill <Reg:xip0> BB02 regmask=[xip0] minReg=1 wt=100.00>
<RefPosition #91  @76  RefTypeKill <Reg:xip1> BB02 regmask=[xip1] minReg=1 wt=100.00>
<RefPosition #92  @76  RefTypeKill <Reg:x21> BB02 regmask=[x21] minReg=1 wt=100.00>
<RefPosition #93  @76  RefTypeKill <Reg:lr > BB02 regmask=[lr] minReg=1 wt=100.00>
<RefPosition #94  @76  RefTypeKillGCRefs CALL BB02 regmask=[x8-xip0 x19-x28] minReg=1 wt=400.00>
Interval 22: long RefPositions {} physReg:NA Preferences=[x0-xip0 x19-x28] Aversions=[]
<RefPosition #95  @76  RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #96  @76  RefTypeDef <Ivl:22> CALL BB02 regmask=[x0] minReg=1 fixed wt=400.00>
<RefPosition #97  @75  RefTypeFixedReg <Reg:x21> CALL BB02 regmask=[x21] minReg=1 wt=400.00>

Notice that RefPosition #97 is at location 75 even though it comes after a RefPosition at location 76, which on its own seems wrong (and is why I changed this). But it doesn't hit the assert because between RefPosition #71 and RefPosition #97 there is the kill at RefPosition #92.

With this change and this PR we now create these RefPositions:

DefList: { N047.t48. PUTARG_REG; N051.t49. PUTARG_REG }
N075 ( 20,  8) [000016] --CXG------                         ▌  CALL ind unman long   REG NA $103
<RefPosition #51  @75  RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #52  @75  RefTypeUse <Ivl:17> BB02 regmask=[x0] minReg=1 last fixed wt=100.00>
<RefPosition #53  @75  RefTypeFixedReg <Reg:x21> BB02 regmask=[x21] minReg=1 wt=100.00>
<RefPosition #54  @75  RefTypeUse <Ivl:19> BB02 regmask=[x21] minReg=1 last fixed wt=100.00>
Last use of V09 between PUTARG and CALL. Removing occupied arg regs from preferences: [x0 x21]
<RefPosition #55  @75  RefTypeUse <Ivl:2 V09> LCL_VAR BB02 regmask=[x0-xip0 x19-x28] minReg=1 last wt=400.00>
<RefPosition #56  @76  RefTypeKill BB02 regmask=[x0-xip1 x21 lr] minReg=1 wt=100.00>
<RefPosition #57  @76  RefTypeKillGCRefs CALL BB02 regmask=[x8-xip0 x19-x28] minReg=1 wt=400.00>
Interval 22: long RefPositions {} physReg:NA Preferences=[x0-xip0 x19-x28] Aversions=[]
<RefPosition #58  @76  RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #59  @76  RefTypeDef <Ivl:22> CALL BB02 regmask=[x0] minReg=1 fixed wt=400.00>
<RefPosition #60  @76  RefTypeFixedReg <Reg:x21> CALL BB02 regmask=[x21] minReg=1 wt=400.00>

Looking at this, however, it seems like the ref position is not inserted in the right location. We need to insert it before the definition of the call to ensure the call's result doesn't get allocated into x21. In fact we should probably just mark the existing RefPosition #53 as delay freed. I'll open a separate issue about this.

@@ -11726,8 +11773,6 @@ void LinearScan::verifyFreeRegisters(regMaskTP regsToFree)
         assert(nextIntervalRef[reg] == MaxLocation);
         assert(spillCost[reg] == 0);
     }
-    LsraLocation thisNextFixedRef = physRegRecord->getNextRefLocation();
Member: why is this not needed anymore?

jakobbotsch (Member, Author) commented May 2, 2024:

The assert isn't true anymore: nextFixedRef[reg] holds either the location of the next RefTypeFixedReg or the location of the next RefTypeKill that kills the register (to match the behavior before this change), but physRegRecord->getNextRefLocation() is always the location of the next RefTypeFixedReg after this change.

@jakobbotsch jakobbotsch merged commit f95a76b into dotnet:main May 2, 2024
105 of 107 checks passed
@jakobbotsch jakobbotsch deleted the bulk-kill-registers branch May 2, 2024 09:24
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 2, 2024