-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Rewrite register parameter homing #100572
Conversation
Generalize register parameter homing to handle float and integer parameters simultaneously, and to handle all parameters (including the Swift self register). Base it on the new ABI representation. The new algorithm constructs a graph in which nodes are the source and destination registers of all parameters. Edges in the graph indicate that (part of) a register has to be moved into (part of) another register. To home parameters we repeatedly pick a register (preferring nodes without any outgoing edges) and perform the reg-reg moves indicated by its incoming edges. If we pick a register that has any outgoing edges it means there is circularity, so we need a temporary register to save its value.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn |
Azure Pipelines successfully started running 3 pipeline(s). |
I added a stress mode that makes LSRA pick initial argument registers in a way that is much harder to resolve. It works by preferencing every parameter to a random parameter register. ****** START compiling HFATest.TestMan:Sum19_HFA01(float,double,float,double,float,double,float,double,float,double,float,double,float,HFATest.HFA01):float (MethodHash=a83b2040)
...
Arg #0 passed in register(s) f0
Arg #1 passed in register(s) f2/f3
Arg #2 passed in register(s) f1
Arg #3 passed in register(s) f4/f5
Arg #4 passed in register(s) f6
Arg #5 passed in register(s) f8/f9
Arg #6 passed in register(s) f7
Arg #7 passed in register(s) f10/f11
Arg #8 passed in register(s) f12
Arg #9 passed in register(s) f14/f15
Arg #10 passed in register(s) f13
...
*************** In genHomeRegisterParams()
11 registers in register parameter interference graph
f2 (double)
<- f1 (float)
<- f0 (float) (offset: 4)
f4 (double)
<- f14 (double)
f8 (double)
<- f4 (double)
f10 (double)
<- f8 (double)
f14 (double)
<- f13 (float)
<- f6 (float) (offset: 4)
f0 (float)
<- f2 (double)
f1 (float)
f6 (float)
<- f10 (double)
f7 (float)
f12 (float)
<- f7 (float)
f13 (float)
<- f12 (float) I'll add some logic to conservatively detect and allocate a callee save in these cases. |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
cc @dotnet/jit-contrib PTAL @AndyAyersMS. Also FYI @kunalspathak since I'm adding a new stress mode to LSRA (and some reg mask related code...). The failures are #99163 and #100754. Diffs. TP improvements in most collections with some small regressions in a few others.
|
I will just note that it is simply lovely to see a 1.4K lines function which is rather challenging to reason about collapsed to a 200 lines one that is possible to understand on first read. |
src/coreclr/jit/codegencommon.cpp
Outdated
|
||
regSet.verifyRegUsed(xtraReg); | ||
// ----------------------------------------------------------------------------- | ||
// FindNodeToHandle: Find the next register node to handle incoming moves to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit -- maybe use "process" instead of "handle", since the latter has other meanings...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed.
for (unsigned i = 0; i < abiInfo.NumSegments; i++) | ||
{ | ||
const ABIPassingSegment& seg = abiInfo.Segments[i]; | ||
if (seg.IsPassedInRegister() && ((paramRegs & genRegMask(seg.GetRegister())) != 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm confused by the second part of the &&
here -- what are the cases where we won't (or don't need) to store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically the liveness check, but LSRA uses a check that's a bit different when creating intervals and marking live registers in this set:
runtime/src/coreclr/jit/lsrabuild.cpp
Lines 2329 to 2338 in 8e2655b
// Only reserve a register if the argument is actually used. | |
// Is it dead on entry? If compJmpOpUsed is true, then the arguments | |
// have to be kept alive, so we have to consider it as live on entry. | |
// Use lvRefCnt instead of checking bbLiveIn because if it's volatile we | |
// won't have done dataflow on it, but it needs to be marked as live-in so | |
// it will get saved in the prolog. | |
if (!compiler->compJmpOpUsed && argDsc->lvRefCnt() == 0 && !compiler->opts.compDbgCode) | |
{ | |
continue; | |
} |
Not totally convinced that LSRA and this code shouldn't be able to use more precise bbLiveIn
, but I guess that's something for another day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this particular one you're commenting on has to be based on lvRefCnt()
, either directly or through this reg mask, since it is for MinOpts. But the one inside genSpillOrAddRegisterParam
could probably use real liveness instead.
#endif | ||
void genEnregisterIncomingStackArgs(); | ||
void genEstablishFramePointer(int delta, bool reportUnwindData); | ||
void genHomeRegisterParams(regNumber initReg, bool* initRegStillZeroed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bool could be a return value, but seems ok this way too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also most functions are using this pattern so it seems good to be consistent.
Generalize register parameter homing to handle float and integer parameters simultaneously, and to handle all parameters (including the Swift self register). Base it on the new ABI representation. This implements the alternate ideal fix mentioned in dotnet#96439. The new algorithm constructs a graph in which nodes are the source and destination registers of all parameters. Edges in the graph indicate that (part of) a register has to be moved into (part of) another register. To home parameters we repeatedly pick a register (preferring nodes without any outgoing edges) and perform the reg-reg moves indicated by its incoming edges. If we pick a register that has any outgoing edges it means there is circularity, so we need a temporary register to save its value. LA64 and RISCV64 should be able to reuse this implementation and get rid of their two own copies as well, but first it requires adding the new style ABI classifiers for them. Diffs from: - Improvements on xarch due to stack homing not rounding up sizes of structs to TYP_I_IMPL size unnecessarily, leading to smaller encodings - Improvements from no longer zeroing initReg after register homing when unnecessary because it is still zero - Improvements on arm32 from storing doubles to stack in a single store rather than two stores - Some improvements and regressions on arm64 from different homing order, leading to differences in stp optimizations - Some regressions on arm32 from reserving an extra callee save in functions with all callee trashed float registers used by parameters
Generalize register parameter homing to handle float and integer parameters simultaneously, and to handle all parameters (including the Swift self register). Base it on the new ABI representation. This implements the alternate ideal fix mentioned in #96439.
The new algorithm constructs a graph in which nodes are the source and destination registers of all parameters. Edges in the graph indicate that (part of) a register has to be moved into (part of) another register. To home parameters we repeatedly pick a register (preferring nodes without any outgoing edges) and perform the reg-reg moves indicated by its incoming edges. If we pick a register that has any outgoing edges it means there is circularity, so we need a temporary register to save its value.
LA64 and RISCV64 should be able to reuse this implementation and get rid of their two own copies as well, but first it requires adding the new style ABI classifiers for them.