-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor subrange assertion generation and propagation #55186
Refactor subrange assertion generation and propagation #55186
Conversation
d466b18
to
8e90830
Compare
@SingleAccretion - Is it possible to narrow this PR to just the fix for #54842 vs. improvement? We will take the fix for .NET6 and improvement can be merged in future milestone. |
I will see what can be done. |
6d5746d
to
eb68aa8
Compare
eb68aa8
to
77a6951
Compare
24a6b7c
to
e766cad
Compare
68fa9bd
to
8f63b0a
Compare
With the quirk in place, this PR is ready for review. Updated diffs: Now almost all are improvements, with the The only regressions (including the rather large - movzx rdx, sil
- mov r8, gword ptr [rbp-18H]
+ mov r8, gword ptr [rbp-20H] LSRA now decides to use a callee-saved register, or the prolog zeroing code gets a bit larger while the frame is smaller, etc. This can also happen due to new copy propagations. |
73234b1
to
471d02a
Compare
cc @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this refactoring, but I'm confused on a few points.
{ | ||
switch (toType) | ||
{ | ||
// CAST_OVF(uint <- uint) - [INT_MIN..INT_MAX] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this case a bit more?
src/coreclr/jit/assertionprop.cpp
Outdated
{ | ||
return nullptr; | ||
} | ||
|
||
// Skip over a GT_COMMA node(s), if necessary to get to the lcl. | ||
GenTree* lcl = op1; | ||
while (lcl->gtOper == GT_COMMA) | ||
while (lcl->OperIs(GT_COMMA)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well use gtEffectiveVal
here, since you're changing this bit.
@@ -1064,6 +1064,90 @@ class LclVarDsc | |||
|
|||
}; // class LclVarDsc | |||
|
|||
enum class SymbolicIntegerValue : int32_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we might want ULongMax
in here too, for completeness?
upperBound = SymbolicIntegerValue::IntMax; | ||
break; | ||
|
||
// CAST_OVF(ulong <- uint) - [INT_MIN..INT_MAX] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here, why is this not [0..UINT_MAX]
?
A very intentional part of the design here is that the This makes This makes a cast like
|
Ok, good. It is probably worth mentioning this in comments somewhere -- maybe in the header comment for Let me test my understanding. In
So we'd say -- yes this can overflow. Valid input range is indeed |
#58776 changed how address exposure is handled and is causing your builds to break. |
Yes, will be rebasing...
Will write something more substantial than what is in the header of |
Move it into its own function. No diffs.
They are not useful: they do not tell us anything about the value. They can be harmful: code operating on the assertions can wrongly remove a checked cast because of an assertion for a non-checked one.
This change completely refactors the way subrange assertions are generated and consumed. It rationalizes and generalizes the model for subrange assertions which are generated and propagated for casts, makes AssertionDsc::Range into an actual abstraction, eliminates bugs related to wrong assertions generated for TYP_BOOL (without CQ loss), enables propagation for checked casts to non-small types both on 64 and 32 bit, all the while making the AssertionDsc structure smaller. There are quite a few moving parts, but at the heart of this change are following items: 1. Make ranges symbolic, i. e. made up of symbolic upper and lower bounds, represented by the new SymbolicIntegerValue enum class. This is necessary to not regress memory consumption and throughput of operations on 32 bit while handling long subranges. 2. Split the concept of "input" and "output" for a cast node. To properly handle checked cast and make the code "obviously correct", it is necessary to split the two values, one the cast node produces and one it consumes, apart. The old code only handles the "easy" case of the "input" and "output" ranges matching (such is the case for small types), the new code must be, and is, robust in reasoning about, e. g. CAST_OVF(uint <- ulong) vs CAST_OVF(int <- long). The ranges that represent these values are consistently in the "signed" format, making handling them easy. 3. Tweak the propagation code to not remove representation-changing casts, make it explicit and obvious in what cases (and why) it retypes the local under the cast. As a side-effect of the above changes, TYP_BOOL is now treated as TYP_UBYTE in propagation, as it should be. Actually, the old code didn't remove redundant casts to TYP_BOOL, the new code does, which is also where the bulk of the positive diffs come from. The old code also generated "useless" ranges like [INT_MIN..INT_MAX], the new code only generates ranges that could be used in subsequent propagation. Also, some miscellaneous cleanup, such as adding function headers, was performed.
In global assertion propagation, tentative assertions are created for casts, with the expectation that proving them via implication will allow for removing the cast. The old code did this through the common path with local assertions, however, this cannot be done the same way anymore because the two types of assertions are fundamnetally different. In local assertion propagation, subranges are derived from ASGs, where the cast is on the RHS and the local - LHS, so the "output" range for the cast must be used - local is assigned value the cast **produces**. In contrast, global subrange assertions say that the value the cast **consumes** is within the correct bounds for the cast to become a no-op or non-overflowing. Because of this fundamnetal distinction, I chose to make global assertions for casts in a new method instead of plumbing it through "optCreateAssertion". Note that it would be possible to do it via "optCreateAssertion" as well - global propagation always passes OAK_SUBRANGE to it, while local - OAK_EQUAL, but I decided against that option for the reason that it seems unintuitive to me that it would work the way it would. These tentative assertions are a very special case, so I think it is better to treat them that way. This leads to some minor code duplication, but I think that's ok. No diffs for this commit, though that's expected as "proving" the generated assertion is quite rare.
Make IntegralRange the owner of the logic for various methods relating to it and decouple it from AssertionDsc. IntegralRange will be useful in the future for other code that deals with cast removal, e. g. in fgMorphCast. Also some miscellaneous formatting fixups.
Using a static array results in smaller and more efficient code on all compilers.
This change is for completeness, since "fgMorphCast" decomposes casts from FP types to small types today and as such assertions are generated for them anyway. No diffs for this commit as expected.
To avoid regressions.
They did not handle "long" cases and as such were too conservative.
c7d8459
to
c513b9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Thanks again for your high-quality contributions.
This PR adds a new abstraction for dealing with casts,
IntegralRange
, a tuple of two symbolic values that represent the lower and upper bounds for an integral value. The symbolic representation has been chosen so that the range can be efficiently used on 32 hosts, since the alternative (using 64 bit integers) would have been, I feel, unacceptably inefficient. Notably, the range always interprets values produces by IR nodes as signed (since we're already using the signed semantic in a number of places, and it is easier to reason about for the purposes of casts as using unsigned representation would have meant a discontinuous range).Because assertion propagation was using integral ranges already, that's where
IntegralRange
is used in this PR. This enables reasoning about long checked casts on 32 bit hosts (rare scenario) and ensures the correctness of how the assertions are interpreted, since they're created for both the cast's node output (in local assertion propagation) and input (in global assertion propagation). It also allows for, in the future, shrinking the size of theAssertionDsc
structure. I note that since we're allocating a pretty sizeable array of them upfront in global assertion propagation (3K+ bytes), any savings in the size here translate into large percentages in memory allocation reduction (e. g., removing 8 bytes from theAssertionDsc
, I've observed -2.2% reduction, and I think we can reduce the size by at least 20 bytes), though this is somewhat misleading as the memory is only actually used for active assertions. Note that this PR does not alter the size ofAssertionDsc
, the addedif !defined(HOST_64BIT)
is for clarity only.Naturally, code propagating the assertions for casts has also been refactored, mostly to avoid retyping a
long
local toint
if it happened to have a range ofint
. Such retyping is redundant with whatoptNarrowTree
does already, and is in fact incorrect in global assertion propagation as it did not update the value number properly (leaving a node that hasTYP_INT
with a VN ofTYP_LONG
). Also, the retyping has been restricted to the known case where it is needed (and a comment added on why it is needed).An interesting property of the old code was how it treated
TYP_BOOL
.TYP_BOOL
in the compiler is pretty much aTYP_UBYTE
, but with a hint that it comes fromCORINFO_TYPE_BOOL
. Assertion generation treated assignments with aTYP_BOOL
source as those restricting the destination to a[0..1]
range, however, that is incorrect as per the IL semantics of booleans being treated as equivalent touint8
s. This did not have ill effects except for perhaps dropping a checked cast once in a while, but it was an internal inconsistency in the IR. This PR just makesTYP_BOOL
s equivalent toTYP_UBYTE
s for the purposes of assertion generation.(Note: the text below is an explanation for the quirk)
Another interesting property of the old code was that it did not propagate the assertion for casts to
TYP_BOOL
(perhaps hinting at the reason for why it was "safe" to giveTYP_BOOL
[0..1
] ranges in the first place...). The new code does, by virtue of not special casing it, and that is where the bulk of the diffs (win-x64
,win-x86
) come from.Unfortunately, there are, though a small number of, regressions with this handling of
TYP_BOOL
. There are two sources of them.First, there is the VN problem, it accounts for the vast majority of the regressions (missing CSEs + missed redundant branch elimination). Say we have the following situation:
Today, local assertion propagation removes the normalizing cast for
a is 1
. Then in VN we value number the local without the cast and the normalizing cast for the same local differently:The core issue here is that the VNs for "normalize-on-load" locals are fundamentally "wrong". In the example above, both
[000008]
and[000010]
have the same VN, but sayV00
's home was on the stack. The first tree would have produced 2 bytes of the value it was assigned, sign-extended, while the second would produce a fullint
, with the upper 2 bytes set to whatever happened to be next toV00
on the stack (this is how it actually works - the width of the local node signifies to the codegen how wide the load should be). Because of this representation problem, fixing this appears to be not so straightforward.One possible fix is to recognize, in VN, when a cast a redundant. That runs into regressions with "false CSEs", since we leave the cast tree intact, while in actuality it is redundant and CSEing such a tree has negative profitability (but CSE cannot know that because we leave the costs intact as well).
Another possible fix is to turn these casts into no-ops while renaming the uses (ideally we'd be doing that in early prop, but that's much more expensive TP-wise). That runs into regressions because, sometimes, removing a cast inhibits redundant branch optimization. Consider the following:
I have spent a considerable amount of time trying out different approaches to fixing the above issues, but they all ran into various regressions, so I tabled fixing it in this PR.
The second source of regressions is the fact that in lowering we can recognize
EQ(CAST(bool/ubyte <- int), SMALL_CONST)
and retype the tree toUBYTE
, leading to, effectively, containment for the cast. Expanding this to consider smallLCL_VAR
s is not so trivial as well, because it can regress when we have those locals assignedRSI
,RDI
,RBP
orRSP
, which take up more space to encode in their byte form on Amd64, not to mention we can end up with byte-able register constraints on x86.Due to the above issues and the fact that I was not able to mitigate them, I will be quirking the handling of
TYP_BOOL
in this PR to get to a near zero substantial regressions state.Fixes #54842, but properly.
Fixes #12936