-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
@briansull, @AndyAyersMS PTAL This change acheives the 30% in the microbenchmark from #9692 (added here to perf suite), as well as the 6% in fasta and 5% in TreeSort identified in #11192, plus 6% in fankuch-redux, 5% in Ackermann, and 1% in Puzzle. @redknightlois, it would be great to know if this fixes the issues in your code, or if that would require improved loop detection or discontiguous EH regions or something. Throughput impact negligible (0.07% instrs retired increased in release System.Private.CoreLib crossgen). Results from jit-diff:
|
@JosephTremoulet I have the right benchmark for those :) ... We will be moving the codebase to 2.0 by next week (hopefully) so I could be able to try this. Loop code flow has given us massive improvements, so even if you dont catch all versions it should be a net win performance wise for tight code. |
@JosephTremoulet, if this fixes the problem, it'd be great to also undo the workarounds employed in a variety of places, e.g.
https://github.com/dotnet/corefx/blob/b769a73ecf278f92993a8b789aaaaedbc55a7dc3/src/System.Net.Http/src/System/Net/Http/Headers/HeaderUtilities.cs#L334 https://github.com/dotnet/corefx/blob/cab2165f92380e8424589909158a26ce79e18683/src/System.Memory/src/System/SpanHelpers.byte.cs#L158 (I think there are also some cases where a comment wasn't left, so it might be worth searching for gotos.) |
@JosephTremoulet this change calls for full desktop testing plus tagging dotnet-bot to run a bunch of stress modes. |
@stephentoub sounds good, I'll verify and revert as the compiler change propagates |
@dotnet-bot help |
Welcome to the dotnet/coreclr Repository The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
@dotnet-bot test Windows_NT x86 Release gcstress15_pri1r2r test Windows_NT x86 corefx_jitstressregs3 test Windows_NT x86 Checked jitstress2 test Windows_NT Checked r2r_jitstressregs8 test Windows_NT jitstress1 test Windows_NT gcstress0xc test Ubuntu jitstress2_jitstressregs1 @BruceForstall feel free to add if I missed any important ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See inline comments...
src/jit/optimizer.cpp
Outdated
} | ||
} | ||
// The "back-edge" we identified isn't actually part of the flow cycle containing ENTRY | ||
goto NO_LOOP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to make sure I follow here -- we have a lexical "backedge" from BOTTOM to TOP, and a branch from HEAD which is before TOP to ENTRY which sits between TOP and BOTTOM (inclusive), and there is a cycle at ENTRY that does not involve BOTTOM.
So it must be the case that ENTRY != TOP and presumably we may rediscover this loop later on as HEAD moves down and we find another backedge candidate contained within this one.
Do we need a similar check for TOP? Seems like we want to ensure that TOP and BOTTOM define the lexical extent of the loop so that the range-inclusive checks later work out.
Also in pathological cases where we do bail out here, we might trace the in-loop blocks from ENTRY multiple times. Given that we are working from outer-inner the in-loop block set may be large and so we may be repeating something that is potentially costly. Is there some way to instrument and see how likely or frequent this may be? If it is frequent, could we cache the loop sets when we bail out like this?
Or could we avoid the walks and detect this case early? Say if ENTRY doesn't dominate both BOTTOM and TOP then bail out, because BOTTOM->TOP can't be an in-loop edge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're following correctly. Our loop detection algorithm (both before and after this change) already walks all inclusive blocks of each loop and is hence quadratic, regardless of this bail-out. We cap the number of loops we track at 16, which I thought was what prevented this from running away; I've just now discovered that the code didn't actually bail out but instead kept looking for loops that it was doomed to discard, so I've added a bail-out in the update I just pushed.
Regarding "Do we need a similar check for TOP? Seems like we want to ensure that TOP and BOTTOM define the lexical extent of the loop so that the range-inclusive checks later work out" -- yes, we did, thanks. I was thinking that finding BOTTOM in the set was sufficient because it has TOP as a successor, but I had that backwards because the walk is visiting predecessors -- so finding TOP is sufficient because it has BOTTOM as a predecessor, and I've pushed an update accordingly.
src/jit/optimizer.cpp
Outdated
if ((destNum >= top->bbNum) && (destNum <= bottom->bbNum) && | ||
!loopBlocks.isMember(destNum)) | ||
{ | ||
// Reversing this branch out of block `next` could confuse this algorithm, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit confused me -- we're not going to change anything about the way next
branches if we insert blocks before it. So should these tests be looking at moveAfter
(in which case -- assuming BBJ_NONE is not possible, only the BBJ_COND check makes sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we're not happy that moveAfter
splits up fall-through, so we're checking to see if next
is a viable candidate to become our moveAfter
-- in which case we would be inserting blocks after it. So I believe the code is correct but hard to follow. I've added a comment up where next
is defined to try to clarify a bit, let me know if you think something else would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you have an invariant that moveAfter
is always a viable (if not optimal) place to move the blocks.
Maybe the other break
cases confused me -- presumably you have a range of candidate blocks to insert after that starts with BOTTOM and extends all the way to the end of the EH region that contains the blocks you'd like to move.
So for these "reversing would confuse" cases why wouldn't you keep going instead of breaking the search there and accepting the current moveAfter
? And if just after BOTTOM is a viable point then presumably you could just search from there to the end for a non-fallthrough and if you find one, use it, otherwise just insert after BOTTOM.
Also note your search loop doesn't have to worry about walking off the end of the method since the last block is guaranteed to be non-fallthrough. So if you are not in an EH region you might just move the blocks to the end of the method and skip the searching all together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those branches would still be reversed if we moved the blocks anywhere after next
.
The clause where we don't search into a different EH region is one where we could later find another viable candidate (if the blocks after the loop fall through into a new try
and then back out of it), but I'm not sure it's worth the complexity to keep searching for those cases.
Regarding "if you find one, use it, otherwise just insert after BOTTOM", were you just sketching out how an algorithm that kept walking would work, or do you have a reason to think that BOTTOM is a better insertion point than the last legal block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those branches would still be reversed if we moved the blocks anywhere after next.
That's the bit I'm missing. If the code can find a fallthrough, then the "fixup" for the in loop blocks is something it can already handle, so I don't see yet how skipping past a problematic next to some non-fallthrough spot below it wouldn't work. Maybe you just need to be a bit more explicit on what exactly about reversal leads to confusion.
Regarding "if you find one, use it, otherwise just insert after BOTTOM", were you just sketching out how an algorithm that kept walking would work, or do you have a reason to think that BOTTOM is a better insertion point than the last legal block?
Just thinking how I would have started writing this bit. I probably would have tried to leverage fgFindInsertPoint
to do the actual searching, and looking at it now, it seems like there might be logic there you need here, eg a BBJ_CALLFINALLY does not fall through but you can't safely move new blocks after it.
As far as a "best" insertion point -- not creating a new block is preferable from a TP standpoint. Other than that I would say try and avoid creating new lexically backwards branches. Insert after BOTTOM would guarantee this, though you could get the same guarantee if you moved the run of blocks before any post-loop block they reach, and if there is just one such block and maybe try and move the run just before it.
I suppose it would also be nice to avoid moving the blocks into another loop where you'll probably just end up moving them again later on, but that might be tricky to pull off or maybe doesn't happen with any frequency. You could perhaps use BBF_BACKWARD_JUMP
as a screen.
None of this is all that important, if what you have now is correctly moving blocks. If so I would not worry as much about trying to make it optimal, unless you already see issues that come from particular placement choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you just need to be a bit more explicit on what exactly about reversal leads to confusion.
The algorithm expects that if it can find an edge from a block with higher number to a block with lower number, that it has found a lexically backward edge and therefore that walking bbNext
pointers from the successor will find the predecessor. It also assumes, when walking the loop blocks, that any block it visits with number between TOP and BOTTOM is lexically between the two.
Performing any motion that reverses the lexical direction of a flow edge risks breaking those invariants.
(and the motion we do perform can mean that blocks lexically between TOP and BOTTOM in a subsequent loop have numbers less than TOP's number, but I believe that can only cause conservativism [we'll treat flow into or out of that region as exits and side-entries when we maybe didn't need to], not correctness issues)
avoid creating new lexically backwards branches
I think that's a correctness constraint, and that's what this code is trying to do.
Insert after BOTTOM would guarantee this, though ...
I started with always inserting just after bottom, but the diffs showed lots of cases where that led to ugly layout; in particular, given a small method with a search loop that has an early return and the normal loop exit falls through to the other return, it unnecessarily puts an unconditional jump on the normal exit path.
you could get the same guarantee if you moved the run of blocks before any post-loop block they reach, and if there is just one such block and maybe try and move the run just before it.
That's really what this code is trying to do. And since I can't guarantee correctness if I move past any such block, the search stops at the first such block and we move the run just before it.
I probably would have tried to leverage
fgFindInsertPoint
to do the actual searching, and looking at it now, it seems like there might be logic there you need here, eg aBBJ_CALLFINALLY
does not fall through but you can't safely move new blocks after it.
Thanks, I didn't know about fgFindInsertPoint
or that constraint; I'll have to take a look and update...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thought I've had: defer the actual motion to a post-pass, and just identify the set of blocks that do not belong and can be moved out. I suppose you might also need to simulate moving them...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through fgFindInsertPoint
and think the Call/Always pair check is the only thing to bring over, so have pushed an update that includes it.
defer the actual motion to a post-pass, and just identify the set of blocks that do not belong and can be moved out. I suppose you might also need to simulate moving them...
I considered that, but wanted to first see if it was tenable to do the in-place transform to avoid the extra heap traffic, and I'm happy with the result. And I agree, the "simulate moving them" bookkeeping might end up just as hairy.
src/jit/optimizer.cpp
Outdated
// This code is lexically between TOP and BOTTOM, but it does not | ||
// participate in the flow cycle. Check for a run of consecutive | ||
// such blocks. | ||
BasicBlock* lastExitBlock = block; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastExitBlock
sounds a lot like a name for a block that is in a loop. Maybe lastNonLoopBlock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/jit/optimizer.cpp
Outdated
BasicBlock* moveAfter = nullptr; | ||
for (BasicBlock* previous = top->bbPrev; previous != bottom;) | ||
{ | ||
BasicBlock* block = previous->bbNext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this would be a clearer if it was conceptually split up as follows (with appropriate bits of state passed in and out):
for block in range [top..bottom]
previous = block->bbPrev;
if block is in loop: CountLoopExit(...); next = block->bbNext
if block is not in loop: next = MoveBlockRangeOutOfLoopOrTolerateInLoop(...) // ret null if fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the function is more monolithic than ideal, so I'll see if I can find an appropriate way to refactor, but at first blush I don't think it would end up reading better, given the amount of state that would need to get passed in and particularly that would need to get passed out, not to mention I'd need to pull the LoopBlockSet
definition out of line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this would be a clearer if it was conceptually split up...
I agree the function is more monolithic than ideal... the amount of state that would need to get passed in and particularly that would need to get passed out,..
Of course, I could wrap that state up in a class and pass it around implicitly. @AndyAyersMS, let me know if you think JosephTremoulet@0861de1 is an improvement and I can clean it up and push it to the PR branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for considering this -- I think it is an improvement and really helps clarify what is going on.
src/jit/optimizer.cpp
Outdated
|
||
if (moveAfter->bbFallsThrough()) | ||
{ | ||
// We've just inserted blocks between moveAfter and moveBefore which it was supposed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be clearer to me if moveBefore
was actually a thing here. It took me a while to figure out that lastExitBlock->bbNext
was the block that moveAfter
falls through to originally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, defined moveBefore
in the update.
src/jit/optimizer.cpp
Outdated
exitPoint = loopBlock->bbJumpDest; | ||
if (blockNum > oldBlockMaxNum) | ||
{ | ||
BlockSetOps::AddElemD(comp, newBlocksInLoop, blockNum - oldBlockMaxNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to add a comment explaining that newBlocksInLoop
will handle only cover one extra block per existing block, and is represented using (blockNum - oldBlockMaxNum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
src/jit/optimizer.cpp
Outdated
{ | ||
if (predEntry->flBlock->bbNum >= top->bbNum && predEntry->flBlock->bbNum <= bottom->bbNum) | ||
BasicBlock* block = worklist.back(); | ||
worklist.pop_back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the worklist contain blocks that are (blockNum > oldBlockMaxNum)
?
If so, fgDominate
won't work and you will have to use positionNum []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can; good catch, thanks. Updated, but rather than using positionNum
(since fgDominate
takes the blocks), I just defer the check until processing the predecessor.
src/jit/optimizer.cpp
Outdated
@@ -1483,12 +1483,17 @@ void Compiler::optFindNaturalLoops() | |||
|
|||
*/ | |||
|
|||
bool mod = false; | |||
BasicBlock* head; | |||
BasicBlock* top; | |||
BasicBlock* bottom; | |||
BasicBlock* entry; | |||
BasicBlock* exit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to rename this local: exit
as it appears to be colored as if it is a keyword.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I separated lastExit
and onlyExit
while at it, which makes it read more clearly I think.
} | ||
|
||
// Can't move out of loop without crossing try region boundary; should leave in loop | ||
// Lopo should still be recognized, and invariant multiplication hoisted out of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: Lopo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
I pushed an update to fix an issue found in desktop testing where creating a goto-next caused an assertion failure. If desktop and normal CI testing are green with the fix, I'll squash the updates and re-trigger stress runs. |
src/jit/optimizer.cpp
Outdated
continue; | ||
} | ||
|
||
// There are multiple entries to this loop, don't consider it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "side entry" could also be from a disjoint part of the loop. It might be interesting someday to work through relaxing this condition. Something like: let the pred closure happen, since it seems like the new dominates check should prevent wandering outside the loop, and see how many disjoint bodies there are out there.
If these are common, we may now have the machinery to compact them too.
Updated to include refactoring. |
@dotnet-bot re-test OSX10.12 x64 Checked Build and Test (infrastructure failure) |
@dotnet-bot test Windows_NT x86 Release gcstress15_pri1r2r test Windows_NT x86 corefx_jitstressregs3 test Windows_NT x86 Checked jitstress2 test Windows_NT Checked r2r_jitstressregs8 test Windows_NT jitstress1 test Windows_NT gcstress0xc test Ubuntu jitstress2_jitstressregs1 |
src/jit/optimizer.cpp
Outdated
// but not the outer loop. ???) | ||
// TOP - the target of the backward edge from BOTTOM. In most cases FIRST and TOP are the same. | ||
// BOTTOM - the lexically last block in the loop (i.e. the block from which we jump to the top) | ||
// EXIT - the predecessor of loop's unique exit edge, if it has a unique exit edge; else nullptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be changed to LastExit?
BasicBlock* top, | ||
BasicBlock* entry, | ||
BasicBlock* bottom, | ||
BasicBlock* exit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be changed to lastexit?
*/ | ||
|
||
void Compiler::optRecordLoop(BasicBlock* head, | ||
bool Compiler::optRecordLoop(BasicBlock* head, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit => lastexit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be lastExit
because that is just something we track while identifying the loop and then throw it away if there were multiple exits. The code that checks for multiples has a local called onlyExit
that it populates iff lastExit
was unique. So if we want to change the name of this parameter and the comments describing loop structure and the lpExit
field in the loop table, we could change it to onlyExit
or perhaps better uniqueExit
, but that seems unnecessarily verbose to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
I pushed a fix for an issue discovered by stress testing. |
src/jit/optimizer.cpp
Outdated
// HEAD - the basic block that flows into the loop ENTRY block (Currently MUST be lexically before entry). | ||
// Not part of the looping of the loop. | ||
// FIRST - the lexically first basic block (in bbNext order) within this loop. (May be part of a nested loop, | ||
// but not the outer loop. ???) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???) [](start = 39, length = 4)
You copied the "???" from before, but since you've been working on this code: do you know the answer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question. Is the question whether fist
may be part of "a nested loop", and also whether it may be part of "the outer loop"? If so, "nested" and "outer" with respect to what? Why is it "a nested loop" vs. "the outer loop"? If we're talking about the full loop nest forest, then there aren't any restrictions on how deeply nested a loop with a "first" is. If we're talking about whether a given loop's "first" may be included in any "outer" loops enclosing the given one, then yes of course it is a member of all of them. If we're talking about whether a given loop's "first" may be included in any "nested" loops enclosed by the given one, then:
- as far as
LoopSearch
is concerned, there's nothing conceptually barring that - there's code called after
loopSearch
, but still withinoptFindNaturalLoops
, that canonicalizes loops; its comments say the point is to have no two loops sharetop
but then the implementation talks about sharingfist
- in the current implementation,
first
andtop
are always the same, and the search algorithm inLoopSearch
will never find two loops that share one
So I don't know what was supposed to be said here. Should I just remove the whole parenthetical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I just remove the whole parenthetical?
I'd be ok with that.
Perhaps a discussion somehere (here?) about the relationship between all the concepts here (FIRST/TOP/etc.) and nested loops. e.g., can a FIRST be the FIRST of multiple loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified comments in 73f1167
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for prompting me to take a closer look at this. I had misread and thought the old code wouldn't identify loops that shared a TOP
or ENTRY
or FIRST
-- on closer read, it allowed it as long as entry
's predecessor list had an inner-loop predecessor prior to having an outer-loop predecessor (oddly). So I've fixed my code to recognize nested loops that share TOP
/FIRST
/ENTRY
in 13bcb53 (before refactoring) and 1091e89 + a9c20d2 (after refactoring), without the bizarro restriction on predecessor list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is the whole "keep the pred list sorted" thing that came up over in #13322. If loop recognition no longer needs this then maybe we can drop the whole thing.
src/jit/optimizer.cpp
Outdated
// v | ||
// head | ||
// | | ||
// | top/beg <--+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "beg" in "top/beg" be "first"? Should there be an example where TOP and FIRST are not the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should be "first". We do not currently have any examples where TOP and FIRST are not the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 73f1167
This is mainly done to increase readability, as `optFindNaturalLoops` had grown excessively large. It also facilitates re-using code to fix up fallthrough, and skipping past CallFinally/BBJ_ALWAYS pairs rather than aborting once they're found.
@dotnet-bot test Windows_NT x86 corefx_jitstressregs3 test Windows_NT x86 Checked jitstress2 test Windows_NT Checked r2r_jitstressregs8 test Windows_NT jitstress1 test Ubuntu jitstress2_jitstressregs1 |
Memstats reports 0.1% more allocated bytes in release System.Private.Corelib crossgen: diff --git "a/memstats-base.txt" "b/memstats-diff.txt"
index baa2dee93..9650c0d99 100644
--- "a/memstats-base.txt"
+++ "b/memstats-diff.txt"
@@ -1,133 +1,133 @@
D:\Source\coreclr>D:\Source\coreclr\bin\Product\Windows_NT.x64.Release\crossgen.exe /Platform_Assemblies_Paths D:\Source\coreclr\bin\Product\Windows_NT.x64.Release\IL /out D:\Source\coreclr\bin\Product\Windows_NT.x64.Release\System.Private.CoreLib.dll D:\Source\coreclr\bin\Product\Windows_NT.x64.Release\IL\System.Private.CoreLib.dll
All allocations:
For 26291 methods:
- count: 20198079 (avg 768 per method)
+ count: 20269450 (avg 770 per method)
- alloc size : 1813175957 (avg 68965 per method)
+ alloc size : 1815665942 (avg 69060 per method)
max alloc : 86912
- allocateMemory : 3336175616 (avg 126894 per method)
+ allocateMemory : 3338403840 (avg 126978 per method)
- nraUsed : 2546966432 (avg 96875 per method)
+ nraUsed : 2549453232 (avg 96970 per method)
Alloc'd bytes by kind:
kind | size | pct
---------------------+------------+--------
- AssertionProp | 170463080 | 9.40%
+ AssertionProp | 170463080 | 9.39%
- ASTNode | 332156679 | 18.32%
+ ASTNode | 332244447 | 18.30%
- InstDesc | 49788288 | 2.75%
+ InstDesc | 49833836 | 2.74%
ImpStack | 498576 | 0.03%
- BasicBlock | 59676128 | 3.29%
+ BasicBlock | 59727608 | 3.29%
- fgArgInfo | 10293192 | 0.57%
+ fgArgInfo | 10293752 | 0.57%
- fgArgInfoPtrArr | 1470456 | 0.08%
+ fgArgInfoPtrArr | 1470536 | 0.08%
- FlowList | 7445776 | 0.41%
+ FlowList | 8109200 | 0.45%
- TreeStatementList | 1820768 | 0.10%
+ TreeStatementList | 1823072 | 0.10%
- SiScope | 8998360 | 0.50%
+ SiScope | 8832344 | 0.49%
FlatFPStateX87 | 0 | 0.00%
- DominatorMemory | 5762000 | 0.32%
+ DominatorMemory | 6364728 | 0.35%
- LSRA | 88207680 | 4.86%
+ LSRA | 88227684 | 4.86%
- LSRA_Interval | 55517616 | 3.06%
+ LSRA_Interval | 55520168 | 3.06%
- LSRA_RefPosition | 171685080 | 9.47%
+ LSRA_RefPosition | 171686480 | 9.46%
Reachability | 504992 | 0.03%
- SSA | 43350424 | 2.39%
+ SSA | 43346104 | 2.39%
- ValueNumber | 338346518 | 18.66%
+ ValueNumber | 338356155 | 18.64%
- LvaTable | 82502848 | 4.55%
+ LvaTable | 82502872 | 4.54%
- UnwindInfo | 191968 | 0.01%
+ UnwindInfo | 191744 | 0.01%
- hashBv | 5633656 | 0.31%
+ hashBv | 5633568 | 0.31%
- bitset | 28625144 | 1.58%
+ bitset | 28714696 | 1.58%
FixedBitVect | 1056 | 0.00%
- AsIAllocator | 150184316 | 8.28%
+ AsIAllocator | 150910712 | 8.31%
IndirAssignMap | 85328 | 0.00%
FieldSeqStore | 4782488 | 0.26%
ZeroOffsetFieldMap | 1283880 | 0.07%
- ArrayInfoMap | 2489776 | 0.14%
+ ArrayInfoMap | 2490016 | 0.14%
- MemoryPhiArg | 945952 | 0.05%
+ MemoryPhiArg | 946720 | 0.05%
- CSE | 38697824 | 2.13%
+ CSE | 38698800 | 2.13%
- GC | 65404271 | 3.61%
+ GC | 65393663 | 3.60%
CorSig | 6770920 | 0.37%
- Inlining | 36225376 | 2.00%
+ Inlining | 36172960 | 1.99%
- ArrayStack | 13125504 | 0.72%
+ ArrayStack | 13500672 | 0.74%
- DebugInfo | 9907920 | 0.55%
+ DebugInfo | 9911400 | 0.55%
DebugOnly | 0 | 0.00%
Codegen | 0 | 0.00%
- LoopOpt | 67200 | 0.00%
+ LoopOpt | 67440 | 0.00%
- LoopHoist | 1586920 | 0.09%
+ LoopHoist | 1621912 | 0.09%
- Unknown | 18677997 | 1.03%
+ Unknown | 18682333 | 1.03%
Largest method:
-count: 78538, size: 4085578, max = 59992
+count: 80075, size: 4146298, max = 59992
-allocateMemory: 4259840, nraUsed: 4153976
+allocateMemory: 4325376, nraUsed: 4214760
Alloc'd bytes by kind:
kind | size | pct
---------------------+------------+--------
AssertionProp | 6460 | 0.16%
- ASTNode | 1033792 | 25.30%
+ ASTNode | 1035328 | 24.97%
- InstDesc | 68804 | 1.68%
+ InstDesc | 68804 | 1.66%
ImpStack | 192 | 0.00%
- BasicBlock | 88272 | 2.16%
+ BasicBlock | 90384 | 2.18%
- fgArgInfo | 22064 | 0.54%
+ fgArgInfo | 22064 | 0.53%
fgArgInfoPtrArr | 3152 | 0.08%
- FlowList | 62640 | 1.53%
+ FlowList | 73232 | 1.77%
- TreeStatementList | 10528 | 0.26%
+ TreeStatementList | 10528 | 0.25%
SiScope | 8960 | 0.22%
FlatFPStateX87 | 0 | 0.00%
- DominatorMemory | 41120 | 1.01%
+ DominatorMemory | 48048 | 1.16%
- LSRA | 114708 | 2.81%
+ LSRA | 117876 | 2.84%
- LSRA_Interval | 141944 | 3.47%
+ LSRA_Interval | 141944 | 3.42%
- LSRA_RefPosition | 446656 | 10.93%
+ LSRA_RefPosition | 446656 | 10.77%
Reachability | 32 | 0.00%
- SSA | 55236 | 1.35%
+ SSA | 55428 | 1.34%
- ValueNumber | 342067 | 8.37%
+ ValueNumber | 342491 | 8.26%
- LvaTable | 82776 | 2.03%
+ LvaTable | 82776 | 2.00%
UnwindInfo | 32 | 0.00%
hashBv | 5400 | 0.13%
- bitset | 635680 | 15.56%
+ bitset | 644168 | 15.54%
FixedBitVect | 0 | 0.00%
- AsIAllocator | 660700 | 16.17%
+ AsIAllocator | 685644 | 16.54%
IndirAssignMap | 0 | 0.00%
FieldSeqStore | 440 | 0.01%
ZeroOffsetFieldMap | 64 | 0.00%
- ArrayInfoMap | 15800 | 0.39%
+ ArrayInfoMap | 15800 | 0.38%
- MemoryPhiArg | 3184 | 0.08%
+ MemoryPhiArg | 3216 | 0.08%
CSE | 18528 | 0.45%
- GC | 96796 | 2.37%
+ GC | 96796 | 2.33%
CorSig | 8528 | 0.21%
- Inlining | 25680 | 0.63%
+ Inlining | 25680 | 0.62%
- ArrayStack | 40448 | 0.99%
+ ArrayStack | 42624 | 1.03%
DebugInfo | 9576 | 0.23%
DebugOnly | 0 | 0.00%
Codegen | 0 | 0.00%
- LoopOpt | 4320 | 0.11%
+ LoopOpt | 4320 | 0.10%
LoopHoist | 7928 | 0.19%
- Unknown | 23071 | 0.56%
+ Unknown | 23199 | 0.56%
---------------------------------------------------
Distribution of total memory allocated per method (in KB):
<= 20 ===> 0 count ( 0% of total)
21 .. 50 ===> 0 count ( 0% of total)
51 .. 75 ===> 11123 count ( 42% of total)
76 .. 100 ===> 0 count ( 42% of total)
- 101 .. 150 ===> 11091 count ( 84% of total)
+ 101 .. 150 ===> 11079 count ( 84% of total)
- 151 .. 250 ===> 2400 count ( 93% of total)
+ 151 .. 250 ===> 2407 count ( 93% of total)
- 251 .. 500 ===> 1455 count ( 99% of total)
+ 251 .. 500 ===> 1459 count ( 99% of total)
501 .. 1000 ===> 151 count ( 99% of total)
- 1001 .. 5000 ===> 71 count (100% of total)
+ 1001 .. 5000 ===> 72 count (100% of total)
---------------------------------------------------
Distribution of total memory used per method (in KB):
<= 20 ===> 0 count ( 0% of total)
21 .. 50 ===> 6051 count ( 23% of total)
51 .. 75 ===> 8470 count ( 55% of total)
- 76 .. 100 ===> 4944 count ( 74% of total)
+ 76 .. 100 ===> 4937 count ( 74% of total)
- 101 .. 150 ===> 3999 count ( 89% of total)
+ 101 .. 150 ===> 3997 count ( 89% of total)
- 151 .. 250 ===> 1964 count ( 96% of total)
+ 151 .. 250 ===> 1971 count ( 96% of total)
- 251 .. 500 ===> 683 count ( 99% of total)
+ 251 .. 500 ===> 675 count ( 99% of total)
- 501 .. 1000 ===> 109 count ( 99% of total)
+ 501 .. 1000 ===> 119 count ( 99% of total)
1001 .. 5000 ===> 71 count (100% of total)
Microsoft (R) CoreCLR Native Image Generator - Version 4.5.22220.0
Copyright (c) Microsoft Corporation. All rights reserved.
Native image D:\Source\coreclr\bin\Product\Windows_NT.x64.Release\System.Private.CoreLib.dll generated successfully. |
I've verified there are no SuperPMI desktop asm diffs for the refactoring (99edc2b). |
Awesome. Thanks. |
Remove some `goto`s that were added to work around #9692 (poor code layout for loop exit paths) -- the JIT's layout decisions were improved in dotnet#13314, and these particular `goto`s are no longer needed; crossgen of System.Private.CoreLib now produces the same machine code with or without this change. Part of #13466.
Remove some `goto`s that were added to work around #9692 (poor code layout for loop exit paths) -- the JIT's layout decisions were improved in #13314, and these particular `goto`s are no longer needed; crossgen of System.Private.CoreLib now produces the same machine code with or without this change. Part of #13466.
Remove some `goto`s that were added to work around dotnet/coreclr#9692 (poor code layout for loop exit paths) -- the JIT's layout decisions were improved in dotnet/coreclr#13314, and these particular `goto`s are no longer needed; the same machine code is generated with or without this change. Some `goto`s previously tagged as workarounds for dotnet/coreclr#9692 are still relevant for keeping codesize down pending dotnet/coreclr#13549; update their comments accordingly. Part of #23395.
Remove some `goto`s that were added to work around dotnet/coreclr#9692 (poor code layout for loop exit paths) -- the JIT's layout decisions were improved in dotnet/coreclr#13314, and these particular `goto`s are no longer needed; the same machine code is generated with or without this change. Some `goto`s previously tagged as workarounds for dotnet/coreclr#9692 are still relevant for keeping codesize down pending dotnet/coreclr#13549; update their comments accordingly. Part of #23395.
Remove some `goto`s that were added to work around dotnet/coreclr#9692 (poor code layout for loop exit paths) -- the JIT's layout decisions were improved in dotnet/coreclr#13314, and these particular `goto`s are no longer needed; the same machine code is generated with or without this change. Some `goto`s previously tagged as workarounds for dotnet/coreclr#9692 are still relevant for keeping codesize down pending dotnet/coreclr#13549; update their comments accordingly. Part of #23395.
Remove some `goto`s that were added to work around dotnet/coreclr#9692 (poor code layout for loop exit paths) -- the JIT's layout decisions were improved in dotnet/coreclr#13314, and these particular `goto`s are no longer needed; the same machine code is generated with or without this change. Some `goto`s previously tagged as workarounds for dotnet/coreclr#9692 are still relevant for keeping codesize down pending dotnet/coreclr#13549; update their comments accordingly. Part of #23395.
@redknightlois, this has made it into preview feeds as of Microsoft.NETCore.App 2.1.0-preview2-25628-01 -- with |
Remove some `goto`s that were added to work around undesirable jit layout (#9692, fixed in dotnet#13314) and epilog factoring (improved in dotnet#13792 and dotnet#13903), which are no longer needed. Resolves #13466.
Rearrange basic blocks during loop identification so that loop bodies
are kept contiguous when possible. Blocks in the lexical range of the
loop which do not participate in the flow cycle (which typically
correspond to code associated with early exits using
break
orreturn
) are moved out below the loop when possible without breaking EHregion nesting. The target insertion point, when possible, is chosen to
be the first spot below the loop that will not break up fall-through.
Layout can significantly affect the performance of loops, particularly
small search loops, by avoiding the taken branch on the hot path,
improving the locality of the code fetched while iterating the loop, and
potentially aiding loop stream detection.
Resolves #9692.