
JIT: Rewrite register parameter homing #100572

Merged
jakobbotsch merged 27 commits into dotnet:main from register-homing
Apr 9, 2024

Conversation

jakobbotsch
Member

Generalize register parameter homing to handle float and integer parameters simultaneously, and to handle all parameters (including the Swift self register). Base it on the new ABI representation. This implements the alternate ideal fix mentioned in #96439.

The new algorithm constructs a graph in which nodes are the source and destination registers of all parameters. Edges in the graph indicate that (part of) a register has to be moved into (part of) another register. To home parameters we repeatedly pick a register (preferring nodes without any outgoing edges) and perform the reg-reg moves indicated by its incoming edges. If we pick a register that has any outgoing edges, it means there is circularity, so we need a temporary register to save its value.

LA64 and RISCV64 should be able to reuse this implementation and get rid of their own copies as well, but that first requires adding the new-style ABI classifiers for them.
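
As an illustration of the algorithm described above, here is a minimal, self-contained sketch. It is not the JIT's actual implementation: the Move struct, plain integer register numbering, and the temporary-register handling are simplified placeholders, and partial-register moves/offsets are ignored.

// Hedged sketch of the graph-based homing order: registers are plain ints and
// an edge dst <- src means "src must be copied into dst before src is overwritten".
#include <cstddef>
#include <cstdio>
#include <vector>

struct Move
{
    int dst;
    int src;
};

static const int TEMP = -1; // stand-in for the temporary register

static void PrintMove(int dst, int src)
{
    if (src == TEMP)
        std::printf("mov r%d <- temp\n", dst);
    else
        std::printf("mov r%d <- r%d\n", dst, src);
}

// Emit the moves in an order that never overwrites a still-needed source.
static void HomeRegisters(std::vector<Move> moves)
{
    while (!moves.empty())
    {
        // Prefer a destination register with no outgoing edges, i.e. one that
        // is not the source of any pending move.
        std::size_t pick = moves.size();
        for (std::size_t i = 0; i < moves.size(); i++)
        {
            bool hasOutgoing = false;
            for (const Move& m : moves)
                hasOutgoing |= (m.src == moves[i].dst);
            if (!hasOutgoing)
            {
                pick = i;
                break;
            }
        }

        if (pick != moves.size())
        {
            PrintMove(moves[pick].dst, moves[pick].src);
            moves.erase(moves.begin() + static_cast<std::ptrdiff_t>(pick));
        }
        else
        {
            // Every remaining destination has an outgoing edge: circularity.
            // Save one register to the temporary and redirect its readers.
            int victim = moves[0].dst;
            std::printf("mov temp <- r%d\n", victim);
            for (Move& m : moves)
                if (m.src == victim)
                    m.src = TEMP;
        }
    }
}

int main()
{
    // A cycle r0 -> r1 -> r2 -> r0 plus an acyclic move into r3.
    HomeRegisters({{1, 0}, {2, 1}, {0, 2}, {3, 2}});
    return 0;
}

Running this on the example emits the acyclic move into r3 first and then breaks the cycle with a single save to the temporary, which is exactly the order the description above aims for.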

@dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 3, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch
Member Author

jakobbotsch commented Apr 5, 2024

I added a stress mode that makes LSRA pick initial argument registers in a way that is much harder to resolve. It works by preferencing every parameter to a random parameter register.
With that we do hit the hypothetical case I told you about yesterday @AndyAyersMS, where on arm32 parameters can occupy all callee-trashed float registers, so we may need to make an additional callee-saved float register available to the homing logic. Relevant dump details are:

****** START compiling HFATest.TestMan:Sum19_HFA01(float,double,float,double,float,double,float,double,float,double,float,double,float,HFATest.HFA01):float (MethodHash=a83b2040)
... 
Arg #0    passed in register(s) f0
Arg #1    passed in register(s) f2/f3
Arg #2    passed in register(s) f1
Arg #3    passed in register(s) f4/f5
Arg #4    passed in register(s) f6
Arg #5    passed in register(s) f8/f9
Arg #6    passed in register(s) f7
Arg #7    passed in register(s) f10/f11
Arg #8    passed in register(s) f12
Arg #9    passed in register(s) f14/f15
Arg #10    passed in register(s) f13
...
*************** In genHomeRegisterParams()
11 registers in register parameter interference graph
  f2 (double)
    <- f1 (float)
    <- f0 (float) (offset: 4)
  f4 (double)
    <- f14 (double)
  f8 (double)
    <- f4 (double)
  f10 (double)
    <- f8 (double)
  f14 (double)
    <- f13 (float)
    <- f6 (float) (offset: 4)
  f0 (float)
    <- f2 (double)
  f1 (float)
  f6 (float)
    <- f10 (double)
  f7 (float)
  f12 (float)
    <- f7 (float)
  f13 (float)
    <- f12 (float)

I'll add some logic to conservatively detect and allocate a callee save in these cases.
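
A conservative check of that kind might look roughly like the following (a hedged sketch; the mask type, parameter names, and helper are illustrative placeholders rather than the actual change):

#include <cstdint>

typedef std::uint64_t regMaskTP; // stand-in for the JIT's register mask type

// floatParamRegs: float registers holding incoming register parameters
// fltCalleeTrash: the callee-trashed float register set (e.g. on arm32)
bool needsExtraFloatCalleeSave(regMaskTP floatParamRegs, regMaskTP fltCalleeTrash)
{
    // Conservative: no attempt to prove that a cycle actually exists; just
    // check whether the parameters cover every callee-trashed float register.
    return (floatParamRegs & fltCalleeTrash) == fltCalleeTrash;
}

Erring on the side of reserving the extra register keeps the check simple at the cost of occasionally saving a register that turns out not to be needed.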

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS. Also FYI @kunalspathak since I'm adding a new stress mode to LSRA (and some reg mask related code...).

The failures are #99163 and #100754.

Diffs. TP improvements in most collections, with some small regressions in a few others.
The diffs come from:

  • Improvements on xarch from stack homing no longer unnecessarily rounding struct sizes up to TYP_I_IMPL size, leading to smaller encodings
  • Improvements from no longer zeroing initReg after register homing when it is unnecessary because the register is still zero
  • Improvements on arm32 from storing doubles to the stack in a single store rather than two
  • Some improvements and regressions on arm64 from a different homing order, leading to differences in stp optimizations
  • Some regressions on arm32 from reserving an extra callee-saved register in functions where parameters occupy all callee-trashed float registers

@jakobbotsch marked this pull request as ready for review April 8, 2024 08:30
@SingleAccretion
Contributor

I will just note that it is simply lovely to see a 1.4K-line function, which was rather challenging to reason about, collapsed into a 200-line one that can be understood on first read.


regSet.verifyRegUsed(xtraReg);
// -----------------------------------------------------------------------------
// FindNodeToHandle: Find the next register node to handle incoming moves to.
Member


Nit -- maybe use "process" instead of "handle", since the latter has other meanings...?

Member Author


Renamed.

for (unsigned i = 0; i < abiInfo.NumSegments; i++)
{
    const ABIPassingSegment& seg = abiInfo.Segments[i];
    if (seg.IsPassedInRegister() && ((paramRegs & genRegMask(seg.GetRegister())) != 0))
Member


I guess I'm confused by the second part of the && here -- what are the cases where we won't (or don't need to) store?

Member Author


It's basically the liveness check, but LSRA uses a check that's a bit different when creating intervals and marking live registers in this set:

// Only reserve a register if the argument is actually used.
// Is it dead on entry? If compJmpOpUsed is true, then the arguments
// have to be kept alive, so we have to consider it as live on entry.
// Use lvRefCnt instead of checking bbLiveIn because if it's volatile we
// won't have done dataflow on it, but it needs to be marked as live-in so
// it will get saved in the prolog.
if (!compiler->compJmpOpUsed && argDsc->lvRefCnt() == 0 && !compiler->opts.compDbgCode)
{
    continue;
}

Not totally convinced that LSRA and this code shouldn't be able to use more precise bbLiveIn, but I guess that's something for another day.

Member Author

@jakobbotsch Apr 8, 2024


I suppose this particular one you're commenting on has to be based on lvRefCnt(), either directly or through this reg mask, since it is for MinOpts. But the one inside genSpillOrAddRegisterParam could probably use real liveness instead.

#endif
void genEnregisterIncomingStackArgs();
void genEstablishFramePointer(int delta, bool reportUnwindData);
void genHomeRegisterParams(regNumber initReg, bool* initRegStillZeroed);
Member


The bool could be a return value, but seems ok this way too.

Member Author


Also, most functions use this pattern, so it seems good to be consistent.

@jakobbotsch merged commit 0764864 into dotnet:main Apr 9, 2024
108 of 110 checks passed
@jakobbotsch deleted the register-homing branch April 9, 2024 15:00
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
@github-actions bot locked and limited conversation to collaborators May 10, 2024