JIT: Process all register kills in bulk in LSRA #101760
Conversation
`RefTypeKill` is one of the most common types of ref position because we create a separate one for every register whenever registers are killed, and registers are killed with very high frequency (for example, all callee-trashed registers at calls). This shows up in the metrics: in `libraries_tests.run`, `RefTypeKill` is as common as `RefTypeUse` and `RefTypeDef` combined on x64, and 3x as common as them combined on arm64, which has more registers. The main complication in changing `RefTypeKill` to represent killing a full set of registers is that we want to be able to easily look up the next `RefPosition` at which a particular register is killed or used. Today this is very simple: all kills (`RefTypeKill`) and uses (`RefTypeFixedReg`) participate in an intrusive linked list, and finding the next `RefPosition` is as simple as following this list. This PR changes LSRA to model kills in bulk instead of creating an individual `RefPosition` for every kill. To find the next fixed-reg ref position, LSRA is updated to take into account that `RefPosition`s corresponding to a register can come from two different linked lists: the `RefTypeFixedReg` intrusive linked list, and also a linked list of all `RefTypeKill` ref positions. The latter list may take some searching (not every kill necessarily kills the register we are looking for), but the reduction in the number of `RefPosition`s created more than makes up for this.
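As a rough sketch of the lookup described above (simplified, with hypothetical names and types that only loosely follow the JIT's actual code), the next event for a register becomes the earlier of its next `RefTypeFixedReg` on the per-register list and the next bulk `RefTypeKill` whose mask covers it:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the two-list lookup. Each bulk kill
// carries a mask of every register it kills, so finding the next kill of
// one register means scanning the shared kill list for the first entry
// whose mask covers that register.
using LsraLocation = uint32_t;
using regMaskTP    = uint64_t;
constexpr LsraLocation MaxLocation = UINT32_MAX;

struct KillRef
{
    LsraLocation location;   // point in the allocation order
    regMaskTP    killedRegs; // full set of registers killed at once
};

LsraLocation nextKillLocation(const std::vector<KillRef>& kills, int reg, LsraLocation from)
{
    for (const KillRef& k : kills)
    {
        // Not every kill necessarily kills the register we are looking for.
        if ((k.location >= from) && (((k.killedRegs >> reg) & 1) != 0))
        {
            return k.location;
        }
    }
    return MaxLocation;
}

// The next event for 'reg' is the earlier of its next fixed-reg ref
// position and the next bulk kill that covers it.
LsraLocation nextEvent(LsraLocation nextFixedReg, const std::vector<KillRef>& kills, int reg, LsraLocation from)
{
    LsraLocation kill = nextKillLocation(kills, reg, from);
    return (nextFixedReg < kill) ? nextFixedReg : kill;
}
```

The trade-off the description mentions is visible here: the scan over the shared kill list is linear, but there are far fewer kill entries overall than the one-per-register scheme produced.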
This reverts commit c52c798.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
cc @dotnet/jit-contrib PTAL @kunalspathak No asm diffs. Large TP improvements (4 to 8% on MinOpts arm64), and large reduction in memory use (~12% on arm64, ~7% on x64). This mostly pays off the cost of predicate registers in MinOpts. It will also make it slightly less expensive to add predicate registers since we no longer pay for multiple new ref positions for their kills around calls.
Thanks for doing this. When I tried this last year #87294 (comment), I had a different approach to get the next RefPosition, but your approach is simpler and gives the expected TP win.
This mostly pays off the cost of predicate registers in MinOpts.
Yes, I had a TODO in here (which I removed) about zeroing out predicate registers, and this will definitely lower the TP cost.
@@ -1395,7 +1395,7 @@ int LinearScan::BuildCall(GenTreeCall* call)

     // We could use RefTypeKill, but RefTypeFixedReg is used less commonly, so the check for delayRegFree
     // during register allocation should be cheaper in terms of TP.
-    RefPosition* pos = newRefPosition(REG_SWIFT_ERROR, currentLoc, RefTypeFixedReg, call, RBM_SWIFT_ERROR);
+    RefPosition* pos = newRefPosition(REG_SWIFT_ERROR, currentLoc + 1, RefTypeFixedReg, call, RBM_SWIFT_ERROR);
what is this change?
We were hitting this assert:
runtime/src/coreclr/jit/lsrabuild.cpp, lines 533 to 535 in 71f8fb6:

    // We can't have two RefPositions on a RegRecord at the same location, unless they are different types.
    assert((regRecord->lastRefPosition == nullptr) || (regRecord->lastRefPosition->nodeLocation < theLocation) ||
           (regRecord->lastRefPosition->refType != theRefType));
In the baseline we create these RefPositions:
DefList: { N047.t48. PUTARG_REG; N051.t49. PUTARG_REG }
N075 ( 20, 8) [000016] --CXG------ ▌ CALL ind unman long REG NA $103
<RefPosition #69 @75 RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #70 @75 RefTypeUse <Ivl:17> BB02 regmask=[x0] minReg=1 last fixed wt=100.00>
<RefPosition #71 @75 RefTypeFixedReg <Reg:x21> BB02 regmask=[x21] minReg=1 wt=100.00>
<RefPosition #72 @75 RefTypeUse <Ivl:19> BB02 regmask=[x21] minReg=1 last fixed wt=100.00>
Last use of V09 between PUTARG and CALL. Removing occupied arg regs from preferences: [x0 x21]
<RefPosition #73 @75 RefTypeUse <Ivl:2 V09> LCL_VAR BB02 regmask=[x0-xip0 x19-x28] minReg=1 last wt=400.00>
<RefPosition #74 @76 RefTypeKill <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #75 @76 RefTypeKill <Reg:x1 > BB02 regmask=[x1] minReg=1 wt=100.00>
<RefPosition #76 @76 RefTypeKill <Reg:x2 > BB02 regmask=[x2] minReg=1 wt=100.00>
<RefPosition #77 @76 RefTypeKill <Reg:x3 > BB02 regmask=[x3] minReg=1 wt=100.00>
<RefPosition #78 @76 RefTypeKill <Reg:x4 > BB02 regmask=[x4] minReg=1 wt=100.00>
<RefPosition #79 @76 RefTypeKill <Reg:x5 > BB02 regmask=[x5] minReg=1 wt=100.00>
<RefPosition #80 @76 RefTypeKill <Reg:x6 > BB02 regmask=[x6] minReg=1 wt=100.00>
<RefPosition #81 @76 RefTypeKill <Reg:x7 > BB02 regmask=[x7] minReg=1 wt=100.00>
<RefPosition #82 @76 RefTypeKill <Reg:x8 > BB02 regmask=[x8] minReg=1 wt=100.00>
<RefPosition #83 @76 RefTypeKill <Reg:x9 > BB02 regmask=[x9] minReg=1 wt=100.00>
<RefPosition #84 @76 RefTypeKill <Reg:x10> BB02 regmask=[x10] minReg=1 wt=100.00>
<RefPosition #85 @76 RefTypeKill <Reg:x11> BB02 regmask=[x11] minReg=1 wt=100.00>
<RefPosition #86 @76 RefTypeKill <Reg:x12> BB02 regmask=[x12] minReg=1 wt=100.00>
<RefPosition #87 @76 RefTypeKill <Reg:x13> BB02 regmask=[x13] minReg=1 wt=100.00>
<RefPosition #88 @76 RefTypeKill <Reg:x14> BB02 regmask=[x14] minReg=1 wt=100.00>
<RefPosition #89 @76 RefTypeKill <Reg:x15> BB02 regmask=[x15] minReg=1 wt=100.00>
<RefPosition #90 @76 RefTypeKill <Reg:xip0> BB02 regmask=[xip0] minReg=1 wt=100.00>
<RefPosition #91 @76 RefTypeKill <Reg:xip1> BB02 regmask=[xip1] minReg=1 wt=100.00>
<RefPosition #92 @76 RefTypeKill <Reg:x21> BB02 regmask=[x21] minReg=1 wt=100.00>
<RefPosition #93 @76 RefTypeKill <Reg:lr > BB02 regmask=[lr] minReg=1 wt=100.00>
<RefPosition #94 @76 RefTypeKillGCRefs CALL BB02 regmask=[x8-xip0 x19-x28] minReg=1 wt=400.00>
Interval 22: long RefPositions {} physReg:NA Preferences=[x0-xip0 x19-x28] Aversions=[]
<RefPosition #95 @76 RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #96 @76 RefTypeDef <Ivl:22> CALL BB02 regmask=[x0] minReg=1 fixed wt=400.00>
<RefPosition #97 @75 RefTypeFixedReg <Reg:x21> CALL BB02 regmask=[x21] minReg=1 wt=400.00>
Notice that RefPosition #97 is at location 75 even though it comes after a RefPosition at location 76, which on its own seems wrong (hence why I changed this). But it doesn't hit the assert, because between RefPosition #71 and RefPosition #97 there is the kill at RefPosition #92.
With this change and this PR we now create these RefPositions:
DefList: { N047.t48. PUTARG_REG; N051.t49. PUTARG_REG }
N075 ( 20, 8) [000016] --CXG------ ▌ CALL ind unman long REG NA $103
<RefPosition #51 @75 RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #52 @75 RefTypeUse <Ivl:17> BB02 regmask=[x0] minReg=1 last fixed wt=100.00>
<RefPosition #53 @75 RefTypeFixedReg <Reg:x21> BB02 regmask=[x21] minReg=1 wt=100.00>
<RefPosition #54 @75 RefTypeUse <Ivl:19> BB02 regmask=[x21] minReg=1 last fixed wt=100.00>
Last use of V09 between PUTARG and CALL. Removing occupied arg regs from preferences: [x0 x21]
<RefPosition #55 @75 RefTypeUse <Ivl:2 V09> LCL_VAR BB02 regmask=[x0-xip0 x19-x28] minReg=1 last wt=400.00>
<RefPosition #56 @76 RefTypeKill BB02 regmask=[x0-xip1 x21 lr] minReg=1 wt=100.00>
<RefPosition #57 @76 RefTypeKillGCRefs CALL BB02 regmask=[x8-xip0 x19-x28] minReg=1 wt=400.00>
Interval 22: long RefPositions {} physReg:NA Preferences=[x0-xip0 x19-x28] Aversions=[]
<RefPosition #58 @76 RefTypeFixedReg <Reg:x0 > BB02 regmask=[x0] minReg=1 wt=100.00>
<RefPosition #59 @76 RefTypeDef <Ivl:22> CALL BB02 regmask=[x0] minReg=1 fixed wt=400.00>
<RefPosition #60 @76 RefTypeFixedReg <Reg:x21> CALL BB02 regmask=[x21] minReg=1 wt=400.00>
Looking at this, however, it seems like the ref position is not inserted in the right location. We need to insert it before the definition of the call to ensure that it doesn't get allocated into x21. In fact, we should probably just mark the existing RefPosition #53 as delay-freed. I'll open a separate issue about this.
@@ -11726,8 +11773,6 @@ void LinearScan::verifyFreeRegisters(regMaskTP regsToFree)

         assert(nextIntervalRef[reg] == MaxLocation);
         assert(spillCost[reg] == 0);
     }
-    LsraLocation thisNextFixedRef = physRegRecord->getNextRefLocation();
why is this not needed anymore?
The assert isn't true anymore: `nextFixedRef[reg]` has either the location of the next `RefTypeFixedReg` or of the next `RefTypeKill` that kills the register (to match the behavior before this change). But `physRegRecord->getNextRefLocation()` is always the next `RefTypeFixedReg` after this change.
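In other words (a hypothetical, simplified illustration; the names mirror this discussion rather than the actual JIT code), the cached `nextFixedRef[reg]` is now a minimum over two sources, so it can legitimately be earlier than what the register's own fixed-ref list reports:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using LsraLocation = uint32_t;

// nextFixedRef[reg] is the minimum over two sources: the next RefTypeFixedReg
// on the register's own list, and the next RefTypeKill whose mask covers the
// register. The RegRecord's list only knows about the former, so an assert
// equating the two values no longer holds whenever a covering kill comes first.
LsraLocation cachedNextFixedRef(LsraLocation nextFixedRegRefLoc, LsraLocation nextCoveringKillLoc)
{
    return std::min(nextFixedRegRefLoc, nextCoveringKillLoc);
}
```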