-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: morph blocks in RPO #94247
JIT: morph blocks in RPO #94247
Conversation
When optimizing, process blocks in RPO. Disallow creation of new blocks and new flow edges (the latter with certain preapproved exceptions). Morph does not yet take advantage of the RPO to enable more optimization. Contributes to dotnet#93246.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsWhen optimizing, process blocks in RPO. Disallow creation of new blocks and new flow edges (the latter with certain preapproved exceptions). Morph does not yet take advantage of the RPO to enable more optimization. Contributes to #93246.
|
@jakobbotsch PTAL Not yet a general purpose utility -- will have to wait for a second "customer" to figure out how to best refactor the RPO traversal. A bit higher TP impact than I'd like, around 0.25%. Some of this was paid for in advance by #93704, which saved about 0.15%. I will keep looking for ways to pare this down. Some diffs from places where morph is order sensitive. Mainly seems to be |
Re the order sensitivity (at least for address exposure): one could perhaps argue that any locals introduced during morph shouldn't count... that might reduce or remove this source of diffs all together. |
// | ||
fgRenumberBlocks(); | ||
EnsureBasicBlockEpoch(); | ||
fgComputeEnterBlocksSet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much of the cost comes from having to do the above setup calls before even computing the order?
When I've tried to compute the order before it wasn't clear to me why I would need to renumber blocks to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can grab a new profile, but the old one is probably pretty accurate: see #86822 (comment)
Renumbering likely isn't needed and not renumbering should save a bit of TP. So let me try that.
Also looking back at that early PR reminded me that I should take a look at if SSA's topological sort, it might be more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also possible we can make DfsBlockEntry
cheaper in the same way as #86839 (even though GetSucc
looks cheap, it would probably still help a bit).
It also confuses me a bit that we have multiple notions of start nodes with fgEnterBlks
and fgDomFindStartNodes
. I assume the nodes found exclusively by fgDomFindStartNodes
are unreachable (?) so ideally we would get rid of them earlier, but even if not, isn't it optimizable by using bbRefs
? If not, then I think using VisitRegularSuccs
would be more efficient.
Anyway, don't feel the need to address it in this PR -- but might be a couple of avenues of future cleanup/improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like for now at least renumbering is needed. Let me add some notes to #93246 about making all this more efficient. I'll come back to that after I get the assertion prop part working.
Another source of these is that block morphing depends on whether structs are marked as DNER, and morph can mark structs as DNER in some cases. I think it's fine (and preferable) to take this churn – it at least makes it more well defined and depending on a logical order when this may happen, so probably reduces spurious diffs in the long run. |
/azp run runtime-coreclr jitstress |
Azure Pipelines successfully started running 1 pipeline(s). |
// Allow edge creation to genReturnBB (target of return merging) | ||
// and the scratch block successor (target for tail call to loop). | ||
// This will also disallow dataflow into these blocks. | ||
// | ||
if (genReturnBB != nullptr) | ||
{ | ||
genReturnBB->bbFlags |= BBF_CAN_ADD_PRED; | ||
} | ||
if (fgFirstBBisScratch()) | ||
{ | ||
fgFirstBB->Next()->bbFlags |= BBF_CAN_ADD_PRED; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate we need these special cases (and the new block flag), but I understand you tried to avoid it and this is a compromise.
fgFirstBB->Next()->bbFlags |= BBF_CAN_ADD_PRED; | ||
} | ||
|
||
unsigned const bbNumMax = fgBBNumMax; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you capture fgBBNumMax
because the number of blocks can change during morph? But you've disabled block creation. Do you want an assert after the for
loop to assert no new blocks were created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a vestige of my earlier versions where the number of blocks could change, before I removed all the bits that create new blocks.
You think this should have assert(fgBBNumMax == bbNumMax
)?
If so, sure. I can add that in my next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You think this should have
assert(fgBBNumMax == bbNumMax
)?
Right. That would clarify the assumption that the set of blocks doesn't change across this loop.
Jit stress failure looks like #93321, probably unrelated. |
When optimizing, process blocks in RPO. Disallow creation of new blocks and new flow edges (the latter with certain preapproved exceptions).
Morph does not yet take advantage of the RPO to enable more optimization.
Contributes to #93246.