Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy prop equivalent phis #104458

Closed

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Jul 5, 2024

VN doesn't always give PHI defs the best possible values (in particular if there are backedge phi args). Look for equivalent phis later, during copy prop.

Addresses the regression case noted in #95645 (comment) where cross-block morph's copy prop plus loop bottom testing has created some unnecessary loop-carried values.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 5, 2024
@AndyAyersMS
Copy link
Member Author

@jakobbotsch FYI -- fixes a regression from cross-block local assertion prop, and unblocks a number of cases where we can now IV widen. But feels a bit iffy, maybe instead we should keep a side table of equivalent VNs or something.

Code from the case in #95645 (comment) now becomes
image
which is an IV widening's better than 8.0

@jakobbotsch
Copy link
Member

Nice, this looks similar to what I saw in #99048 (comment). Does it handle that case too?

@AndyAyersMS
Copy link
Member Author

Nice, this looks similar to what I saw in #99048 (comment). Does it handle that case too

There are diffs for that method in the NAOT collection where we now widen an IV. Seems like there should be more instances in the other collections perhaps? I will have to go and check.

@AndyAyersMS
Copy link
Member Author

Diffs are encouraging, but TP cost is pretty high.

I'd be surprised if the extra cost is from the phi arg scanning, but it's possible; phi arg lists can be big. Will need to dig in a bit more. I may also want to split out the other change to copy prop as its own PR.

@SingleAccretion
Copy link
Contributor

I'd be surprised if the extra cost is from the phi arg scanning, but it's possible; phi arg lists can be big. Will need to dig in a bit more.

optCopyProp is one of the hottest functions in the compiler, even tiny changes to it can result is big TP diffs. If I were to guess, the cost here is mainly due to the extra checks // Do we have phi definitions? Do the types match?... added to the hot path.

@AndyAyersMS AndyAyersMS marked this pull request as ready for review July 8, 2024 20:16
@AndyAyersMS
Copy link
Member Author

I will undo the AP side in favor of Egor's #104493. Some of whatever that may get will happen here anyways from the phi arg VN updates.

Comment on lines +209 to +212
// This happens because VN is one-pass and doesn't know the VNs for backedge PhiArgs,
// so the VNs for phi defs for loop entry blocks is always a novel opaque VN. But
// we can query those backedge VNs after VN is done, and if all Phi Arg VNs match, then
// the two Phi Defs are equivalent and could have had the same VN.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what some of these cases look like... the fact that we have a phi in the header should mean we saw a def inside the loop, so in this case that def has to be redefining the local to have the same value as when we come from outside the loop. I would (naively, maybe?) expect that to be rare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defining a local to have the same value as some other local at the start of a loop. So yes both have the same value on initial entry and end up with the same value along the backedge. #95645 (comment) has some notes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely the fallout from of pre-increment or similar, the code modifies a value and then uses the pre-modified value... eg

public bool MoveNext()
{
return ++_index < _array.Length;
}

This requires a temp in our IR. Morph and loop inversion conspire to make both old and new value live around the loop.

@AndyAyersMS
Copy link
Member Author

latest diffs

TP cost is lower but still 0.68% worst case on windows.

Profiling shows this is all in optCopyProp. I currently don't see any way to make it cheaper there. Commenting out the guts of the PHI scanning shows it's not the extra number of ssa defs available. So likely there are just a lot of phi def copy prop candidates to consider, and we do more work for each one.

We might consider refining the PhiArg VNs right after the main value numbering pass. We could (say) keep a list of all the impacted operands and just fix their VNs immediately. But we would not be able to rename phi def VNs without leading to other problems (lcl uses quite reasonably expect their phi def VNs to match their own VNs). Even if we kept ssa use lists so we could renumber direct uses, we'd run into cascading VN updates when one local is copied to another.

@AndyAyersMS
Copy link
Member Author

Some metrics from the real world collection (win x64)

 1,117,580 calls to optCopyProp
30,806,713 defs available as potential copies
30,617,105 defs with vn mismatches
   932,064 tree def is at cycle entry
    59,587 copy def is at same block as tree def
    13,096 copy def is phi def

So not surprisingly adding a bit of extra logic to the VN mismatch path increases TP, though the profile tails off pretty quickly and we only walk phis infrequently. Could be some of the TP cost will be rectified by PGO.

Having 30 copy prop candidates (on average) per use seems pretty astounding. One thing that might help is (again) keeping track of SSA uses for each def and doing multiple updates instead of redoing all this at each use. But it also seems like we could keep the candidate set partitioned somehow (say by type if nothing else) and cut down on a lot of fruitless searching.

The other thing I wonder about copy prop is whether it should have more of a global plan/execute model, this seems more common in the literature and probably avoids a lot of redundant work too (first build a copy equivalence graph, decide what things will be copied where, then carry it all out). Might be a bit trickier if we need to also avoid overlapping lifetimes but seems like it is worth considering. Right now if there are multiple possible copy candidates it seems somewhat random which ones we'll pick.

@AndyAyersMS
Copy link
Member Author

@jakobbotsch PTAL

@jakobbotsch
Copy link
Member

Should the title/description of the PR be updated?

if (!foundEquivalentPhi)
{
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nesting here is pretty deep... consider introducing a function with some early outs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was tempted to try that to see if helped TP any, but haven't yet.

@jakobbotsch
Copy link
Member

We might consider refining the PhiArg VNs right after the main value numbering pass. We could (say) keep a list of all the impacted operands and just fix their VNs immediately. But we would not be able to rename phi def VNs without leading to other problems (lcl uses quite reasonably expect their phi def VNs to match their own VNs). Even if we kept ssa use lists so we could renumber direct uses, we'd run into cascading VN updates when one local is copied to another.

Would it be possible to propagate the VNs immediately after VN'ing the store, and then refine the phis there? At that point we can iterate the containing loops (or even just keep the dominance frontiers around, since we computed it during SSA anyway) to find what to annotate. Not sure what the cost of that would look like, however.

@AndyAyersMS AndyAyersMS changed the title Assertion prop equivalent phis Coy prop equivalent phis Jul 9, 2024
@AndyAyersMS
Copy link
Member Author

We might consider refining the PhiArg VNs right after the main value numbering pass. We could (say) keep a list of all the impacted operands and just fix their VNs immediately. But we would not be able to rename phi def VNs without leading to other problems (lcl uses quite reasonably expect their phi def VNs to match their own VNs). Even if we kept ssa use lists so we could renumber direct uses, we'd run into cascading VN updates when one local is copied to another.

Would it be possible to propagate the VNs immediately after VN'ing the store, and then refine the phis there? At that point we can iterate the containing loops (or even just keep the dominance frontiers around, since we computed it during SSA anyway) to find what to annotate. Not sure what the cost of that would look like, however.

Maybe similar to that we could keep track of when the last cycle entry block predecessor has been value numbered and just walk the phis of the cycle entry block making updates. Updating the phi arg seems simple enough, but updating the phi def maybe less so (if say we want that to propagate to the uses, which I think we'd have to do, to keep things sane). But maybe instead we just record the updated conservative VN on the phi def.

However the TP results suggest the phi update is relatively cheap, it is determining when you can look for the equivalent VNs that is costly; just those extra few checks in the copy candidate scanning loop are enough to cause the TP hit. So I think a more promising avenue is to try and pare down the number of candidates that we consider as possible copy sources -- currently we match only about 1% of the time, and possibly some sizeable fraction of the mismatches could have simply been excluded from consideration via some other means.

The other option I considered is to do the phi recalcs during VN but record the results as an equivalence table somewhere on the side (a map from VN->VN), and if the basic VN comparison fails in copy prop, we then map each VN to its "canonically equivalent" VN and compare those.

@AndyAyersMS AndyAyersMS changed the title Coy prop equivalent phis Copy prop equivalent phis Jul 10, 2024
@jakobbotsch
Copy link
Member

Updating the phi arg seems simple enough, but updating the phi def maybe less so (if say we want that to propagate to the uses, which I think we'd have to do, to keep things sane).

If we are just doing this for loops then you could make use of the LoopLocalOccurrences class I'm using for the IV opts. From what I have seen it does not seem super costly to keep track of the uses if we are only doing it within loops.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It generally looks good to me, but it feels a bit misplaced to do this refinement during copy prop. I would suggest trying out some of the alternatives discussed, like doing this refinement in the VN phase.

One interesting thing it seems we could do is structure VN so that it follows loop structure: when we get to a loop header, make sure we visit the blocks of the loop in RPO (instead of using the DFS tree RPO). Then once we have finished a full loop we can do the refinement of the phis in the loop (using something akin to LoopLocalOccurrences), before we proceed with outer loops or subsequent blocks. It seems it would automatically allow us to propagate the refined phis outside the loop.

@AndyAyersMS
Copy link
Member Author

One interesting thing it seems we could do is structure VN so that it follows loop structure: when we get to a loop header, make sure we visit the blocks of the loop in RPO (instead of using the DFS tree RPO). Then once we have finished a full loop we can do the refinement of the phis in the loop (using something akin to LoopLocalOccurrences), before we proceed with outer loops or subsequent blocks. It seems it would automatically allow us to propagate the refined phis outside the loop.

I have VN reworked this way. Once it finishes with a loop it reinvokes value numbering for the PHIs in the loop header. Seems like we might not need to propagate the new VNs forward as most places don't care (some sporadic asserts in AP which we can probably sort later). At any rate this works to a degree, but there's a snag. A PhiDef VN also encodes its ssa def numbers, so even if the value part of a PHI is equal, the overall PhiDef VN differs, eg

OLD

***** BB03 [0001]
STMT00024 ( ??? ... ??? )
N004 (  0,  0) [000126] DA---------                         *  STORE_LCL_VAR int    V12 tmp7         d:4 $VN.Void
N003 (  0,  0) [000125] -----------                         \--*  PHI       int    $340
N001 (  0,  0) [000132] ----------- pred BB03                  +--*  PHI_ARG   int    V12 tmp7         u:5
N002 (  0,  0) [000130] ----------- pred BB02                  \--*  PHI_ARG   int    V12 tmp7         u:3 $c0

***** BB03 [0001]
STMT00023 ( ??? ... ??? )
N004 (  0,  0) [000124] DA---------                         *  STORE_LCL_VAR int    V09 tmp4         d:3 $VN.Void
N003 (  0,  0) [000123] -----------                         \--*  PHI       int    $341
N001 (  0,  0) [000133] ----------- pred BB03                  +--*  PHI_ARG   int    V09 tmp4         u:4
N002 (  0,  0) [000131] ----------- pred BB02                  \--*  PHI_ARG   int    V09 tmp4         u:2 $c0


NEW

***** BB03 [0001]
STMT00024 ( ??? ... ??? )
N004 (  0,  0) [000126] DA---------                         *  STORE_LCL_VAR int    V12 tmp7         d:4 $VN.Void
N003 (  0,  0) [000125] -----------                         \--*  PHI       int    $340
N001 (  0,  0) [000132] ----------- pred BB03                  +--*  PHI_ARG   int    V12 tmp7         u:5 $30b
N002 (  0,  0) [000130] ----------- pred BB02                  \--*  PHI_ARG   int    V12 tmp7         u:3 $c0

***** BB03 [0001]
STMT00023 ( ??? ... ??? )
N004 (  0,  0) [000124] DA---------                         *  STORE_LCL_VAR int    V09 tmp4         d:3 $VN.Void
N003 (  0,  0) [000123] -----------                         \--*  PHI       int    $341
N001 (  0,  0) [000133] ----------- pred BB03                  +--*  PHI_ARG   int    V09 tmp4         u:4 $30b
N002 (  0,  0) [000131] ----------- pred BB02                  \--*  PHI_ARG   int    V09 tmp4         u:2 $c0

So copy prop sees $340 vs $341 and ignores the potential copy.

It appears copy prop will still need to special case PHI defs by destructuring the funcs, or we'll need to reconsider why we are encoding PHI defs this way. Maybe we need to keep the ssa def numbers somewhere else (eg map from a PHI VN to the PHI IR node or suitably extracted data).

Even without this fix to copyprop I'm seeing interesting diffs, possibly because we now see some PHIs with all inputs the same, which we refine to be simply that input's VN.

If I suppress the PHI updates (that is, just run with the new VN traversal order) then the change is zero diff and minimal TP impact, which is nice to see.

Code diffs here if you're curious: main...AndyAyersMS:ValueNumRefinePhis

@jakobbotsch
Copy link
Member

One interesting thing it seems we could do is structure VN so that it follows loop structure: when we get to a loop header, make sure we visit the blocks of the loop in RPO (instead of using the DFS tree RPO). Then once we have finished a full loop we can do the refinement of the phis in the loop (using something akin to LoopLocalOccurrences), before we proceed with outer loops or subsequent blocks. It seems it would automatically allow us to propagate the refined phis outside the loop.

I have VN reworked this way. Once it finishes with a loop it reinvokes value numbering for the PHIs in the loop header. Seems like we might not need to propagate the new VNs forward as most places don't care (some sporadic asserts in AP which we can probably sort later). At any rate this works to a degree, but there's a snag. A PhiDef VN also encodes its ssa def numbers, so even if the value part of a PHI is equal, the overall PhiDef VN differs, eg

OLD

***** BB03 [0001]
STMT00024 ( ??? ... ??? )
N004 (  0,  0) [000126] DA---------                         *  STORE_LCL_VAR int    V12 tmp7         d:4 $VN.Void
N003 (  0,  0) [000125] -----------                         \--*  PHI       int    $340
N001 (  0,  0) [000132] ----------- pred BB03                  +--*  PHI_ARG   int    V12 tmp7         u:5
N002 (  0,  0) [000130] ----------- pred BB02                  \--*  PHI_ARG   int    V12 tmp7         u:3 $c0

***** BB03 [0001]
STMT00023 ( ??? ... ??? )
N004 (  0,  0) [000124] DA---------                         *  STORE_LCL_VAR int    V09 tmp4         d:3 $VN.Void
N003 (  0,  0) [000123] -----------                         \--*  PHI       int    $341
N001 (  0,  0) [000133] ----------- pred BB03                  +--*  PHI_ARG   int    V09 tmp4         u:4
N002 (  0,  0) [000131] ----------- pred BB02                  \--*  PHI_ARG   int    V09 tmp4         u:2 $c0


NEW

***** BB03 [0001]
STMT00024 ( ??? ... ??? )
N004 (  0,  0) [000126] DA---------                         *  STORE_LCL_VAR int    V12 tmp7         d:4 $VN.Void
N003 (  0,  0) [000125] -----------                         \--*  PHI       int    $340
N001 (  0,  0) [000132] ----------- pred BB03                  +--*  PHI_ARG   int    V12 tmp7         u:5 $30b
N002 (  0,  0) [000130] ----------- pred BB02                  \--*  PHI_ARG   int    V12 tmp7         u:3 $c0

***** BB03 [0001]
STMT00023 ( ??? ... ??? )
N004 (  0,  0) [000124] DA---------                         *  STORE_LCL_VAR int    V09 tmp4         d:3 $VN.Void
N003 (  0,  0) [000123] -----------                         \--*  PHI       int    $341
N001 (  0,  0) [000133] ----------- pred BB03                  +--*  PHI_ARG   int    V09 tmp4         u:4 $30b
N002 (  0,  0) [000131] ----------- pred BB02                  \--*  PHI_ARG   int    V09 tmp4         u:2 $c0

So copy prop sees $340 vs $341 and ignores the potential copy.

It appears copy prop will still need to special case PHI defs by destructuring the funcs, or we'll need to reconsider why we are encoding PHI defs this way. Maybe we need to keep the ssa def numbers somewhere else (eg map from a PHI VN to the PHI IR node or suitably extracted data).

Even without this fix to copyprop I'm seeing interesting diffs, possibly because we now see some PHIs with all inputs the same, which we refine to be simply that input's VN.

If I suppress the PHI updates (that is, just run with the new VN traversal order) then the change is zero diff and minimal TP impact, which is nice to see.

Code diffs here if you're curious: main...AndyAyersMS:ValueNumRefinePhis

Thanks for trying this out. Might be useful to PR that as a separate change, if the diffs look good.

It doesn't look like it will be super simple to switch the VN phis to store the VN from the pred instead of the SSA number. It seems only map selection is using it, but the way it currently works allows it to reason about the values on some backedges, provided the map selection happens after the definition on the backedge was VN'd. Not sure how important this is -- might be interesting to check the number of regressions if we just switch VNF_Phi to have VNs instead of SSA nums in the non-memory case.

We could probably keep the behavior by eagerly refining the VNs of the phi definitions as soon as we saw the relevant store. Not sure how expensive that would be -- I guess we could compute "(lclNum, ssaNum) -> (header) phi" map during optComputeLoopSideEffects. Maybe that wouldn't be too bad.

Anyway, beyond the cosmetic changes (like extracting a function) I am ok with the approach of this PR. I do think moving the refinement itself into VN would be nice, but without changing VNF_Phi it seems the destructuring has to be in copy prop (and it makes sense to me to do it here when VNF_Phi is represented like this).

@AndyAyersMS
Copy link
Member Author

Closing in favor of #104752

@github-actions github-actions bot locked and limited conversation to collaborators Aug 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants