-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: handle case where we are cloning adjacent sibling loops #70874
JIT: handle case where we are cloning adjacent sibling loops #70874
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsWe can sometimes see adjacent sibling loops (where L1.bottom == L2.head) Detect this case and give L2 a temporary preheader before cloning L1. Fixes #70569.
|
@BruceForstall PTAL Not the most elegant solution, I'll admit, but the "temporary" preheader may get reinstated later on so the block may end up being useful. No diffs. The original issue was from random PGO stress in libraries-pgo. Final local flow graph for the example over in #70596 below, showing both siblings successfully cloned. Note RBO gets rid of the in-loop type test in the two fast loops (but not in the two slow ones, as on the slow path the in-loop type test is not dominated by cloning condition type test; control can also reach the slow loop with a null ref). |
/azp run runtime-coreclr superpmi-diffs |
Azure Pipelines successfully started running 1 pipeline(s). |
Installer test hit some kind of timing/permissions issue. Will retry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a hack. Ideally, I'd like to get rid of the 'head' concept; I think it's pretty broken/limiting.
Would it be cleaner to do some kind of pre-pass, say right after loop recognition, to create a new 'head' block (not a pre-header as defined) whenever a 'head' and 'bottom' coincide?
src/coreclr/jit/loopcloning.cpp
Outdated
} | ||
|
||
assert(sibling <= optLoopCount); | ||
if (optLoopTable[sibling].lpTop != x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not check optLoopTable[sibling].lpHead != b
directly?
src/coreclr/jit/loopcloning.cpp
Outdated
// If B is the header of a sibling loop, give that loop a preheader | ||
// that will serve as X, the join point above the sibling. | ||
// | ||
if ((b->bbJumpKind == BBJ_COND) && (ambientLoop != BasicBlock::NOT_IN_LOOP)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why check ambientLoop
here? I guess it's so you can walk the sibling list. But can't the same situation exist for top-level adjacent loops, in which case you need to walk the entire loop table looking for loops without a parent.
src/coreclr/jit/loopcloning.cpp
Outdated
// | ||
if ((loop.lpFlags & LPFLG_HAS_PREHEAD) != 0) | ||
{ | ||
JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd); | |
JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd, h->bbNum); |
Yeah, I can look into doing a prepass instead. With a prepass we may alter the flow graph even if we're not going to clone anything, so it might lead to some diffs. |
Seems like maybe this fits into |
We can sometimes see adjacent sibling loops (where L1.bottom == L2.head) and if so, cloning L1 will repurpose L1.bottom and so leave L2 in an inconsistent state. Detect this case during optCanonicalizeLoop, and add an intermediary block to serve as L2's head. Fixes dotnet#70569.
251b176
to
cae78ae
Compare
Reworked this to run as part of I saw 4 methods with minor diffs in local SPMI. |
@BruceForstall I changed my approach here. See what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach, as it happens during "normal" canonicalization, and establishes a new invariant for the loop table.
Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable()
to ensure this invariant?
I think we have an ordering requirement on the loop table, but I don't see that Compiler::fgDebugCheckLoopTable()
enforces one. I.e., for two top-level loops l1
and l2
, they must be disjoint. If the blocks of l1
precede l2
, is l1
guaranteed to appear before (at a lower index) than l2
in the loop table? If so, there's no reason to loop over the entire table; for loop index loopInd
, look at loopInd - 1
(or the first preceding top-level loop), and you're done. Something similar should apply for siblings, but I guess we don't have a way to walk backward in the sibling list.
One thing I'm a little worried about is it looks like we allow a child loop to share "top" with its parent:
static bool lpContains(const unsigned* blockNumMap,
const LoopDsc* loop,
const BasicBlock* top,
const BasicBlock* bottom)
{
return (blockNumMap[loop->lpTop->bbNum] <= blockNumMap[top->bbNum]) &&
(blockNumMap[bottom->bbNum] < blockNumMap[loop->lpBottom->bbNum]);
}
(<=
instead of <
).
Do you know if we canonicalize to prevent this?
I think this will be true right after loop recognition, but I don't think this remains true forever, we may reorder entire ranges of blocks later on based on profile data.
This is a thing canonicalization fixes: it unshares tops. Until recently, this was the only thing canonicalization did. So, it should only be true for a short time (between registering the loop and canonicalizing the loop). I've now added a second kind of canonicalization (#70809, unshare top with non-loop backedges) and this PR adds a third kind (unshare bottom with head), and we really should add a fourth kind (unshare entry with non-loop backedges -- which instead I've worked around for now, see #70959). And possibly more: it seems like we could have a mid-entry loop whose entry is also the top of a nested loop. so we might want to unshare top and entry. An alternative to explore is to do backedge canonicalization so that each loop has one actual backedge (from inside the loop to entry) and this might collapse some of the things we'd currently call nested do-while loops to be a single do-while with multiple exits. And we might want to "describe" the non-loop like things we bail on today, because pretending they're not loops seems to be causing some problems. I'm a fan of the loop recognition advocated by Havlak though it might be there are more recent approaches that work out better. All of this comes about because we are now cloning more general things that we still recognize as loops; previously we'd restrict ourselves to cloning only iterable loops and those are more restricted in shape. |
Seems like we should change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment still applies:
Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable() to ensure this invariant?
but I'll go ahead and approve. We can add this later (if we remember).
Added to 7.0 so should be hard to forget: #71084 |
We can sometimes see adjacent sibling loops (where L1.bottom == L2.head)
and if so, cloning L1 will repurpose L1.bottom and so leave L2 in
an inconsistent state.
Detect this case and give L2 a temporary preheader before cloning L1.
Revoke preheader status of L2 if we then go on to clone L2, as cloning
does not preserve preheaders currently.
Fixes #70569.