-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Physical value numbering #68712
Physical value numbering #68712
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsLet's see how unhappy is the CI. I still need to update the documentation.
|
c9273b6
to
531f95f
Compare
@dotnet/jit-contrib I would like to request outerloop, stress and libraries stress. |
/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 3 pipeline(s). |
We must know precise load sizes for numbering purposes.
It was breaking the "precise maps only have memory types" invariant.
Not needed in the "identity stores are bitcasts" scheme.
10 diffs across CLR and libraries tests, all correctness fixes.
531f95f
to
d1d5aea
Compare
Outerloop and stress were clean except for known failures / Windows ARM/64 timeouts. Libraries stress exposed an issue with the pre-existing bug, and there was also one failure I could not reproduce locally and that was not pre-existing, so will request another round of it. |
/azp run runtime-coreclr libraries-jitstress, Fuzzlyn |
Azure Pipelines successfully started running 2 pipeline(s). |
Both libraries stress and Fuzzlyn failures look pre-existing. @dotnet/jit-contrib |
wow, I was dreaming about deleting the whole old buggy VN for years! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @sandreenko above, this is fantastic work. I think the trade offs are more than reasonable, especially given future follow-ups. I wouldn't even say this is a more complex VN model -- while we now have both precise and physical maps, I think this representation is so much more obviously correct and easy to understand given the aliasing model.
I have a few questions:
- Are the new VNs costly in terms of memory? It does look like there are some obvious optimizations if they are (e.g. encoding physical selectors without creating a new VN for common small cases)
- Does this work mean that field sequences no longer need to be sequences/recursive?
- Is
doesFieldBelongToClass
still needed?
Also, thanks a lot for updating the documentation.
I want to run through a few examples locally for my own understanding, but other than that and a few documentation nits everything LGTM.
Not that expensive I suppose: # CG-ing CoreLib with a native compiler
./diff-mem x64
Relative difference is: 0.083%
./diff-mem x86
Relative difference is: -0.029% # Curiously enough, an improvement. There is a lot of trickiness with the representation of physical selectors, I spent some time exploring various alternatives. There are two factors that make the current "naive" scheme more optimal than one would expect:
I thought quite a bit about splitting the selection map on the physical / precise axis as well, introducing
Correct, the "sequence" part of the field sequence will go. We will have "field offset"
There are some not-in-VN dependencies on field sequences now (e. g.
Nope, it can (and should) be deleted. Not sure how soon will I get to that because it requires updating the Jit-EE interface. |
Sounds great, I look forward to the follow-ups. Ran through some more examples locally, nice to see us VN'ing field accesses through reinterpretations accurately now. Thanks again for a great change! [edit] just asked @SingleAccretion for a commit message to include with the change |
@SingleAccretion just out of curiosity does the new VN support such cases:
? |
@sandreenko "it could but it doesn't". Currently this will be selected as if (GetVNFunc(map, &funcApp))
{
}
+ else if (IsVNConstant(map))
+ {
+ assert(MapIsPhysical(map));
+ unsigned size;
+ unsigned offset = DecodePhysicalSelector(index, &size);
+
+ assert(size == genTypeSize(type));
+ return VNEvalBitCast(type, map, offset); // This would be the function to implement.
+ } Currently though, I believe there is not a lot of value in adding this reduction. |
I remember seeing many cases where a struct was initialized using zero-init where the old VN could not then figure out a field value. However, I possibly saw that often because there were many errors in such cases and not because such cases were often. |
Note that structs are handled by the current implementation: runtime/src/coreclr/jit/valuenum.cpp Lines 2589 to 2597 in 51d51c1
So if you were to modify your example to something like this: Guid a = default;
if (*(int*)&a) == 0) { ... } That branch would get folded (after #68986 is merged, I suppose). In general the physical selection mechanism is "arbitrarily precise", it will only "give up" on out-of-bounds loads or stores. So it can support |
The problem
As we know, value numbering disambiguates loads from and stores to fields using field handles as the selectors, essentially meaning that the aliasing model it has presumes no two field accesses of different handles will refer to the same location.
This assumption simply does not hold for struct fields in modern .NET, with people using methods like
Unsafe.As
more and more. It also does not match the compiler's own internal model well, where precise types are not a unit of type identity (rather, layout compatibility is). Essentially, the field sequence model is too "high level", both for a subset of user code and (in some ways, perhaps more importantly) the IR.The latter mismatch has caused a good number of bugs, past and present, especially those induced by "zero-offset field sequences", which represent the extreme case of IR-VN mismatch:
#67360
#64805
#64701
#62687
#57282
#32085
dotnet/coreclr#23932
dotnet/coreclr#27108
dotnet/coreclr#20838
The solution
It is the case that the invalid (fragmented) field sequences can only be detected in value numbering itself, no other phase has the information necessary to recognize that this:
is a possibly partial sequence.
Thus, to address the correctness problem we will need to call back into the VM for each field sequence formed, to see if the parent field's type is the owner of the current field (what
LocalAddressVisitor
does now).This would solve the correctness problems.
Unfortunately, it has two very significant drawbacks:
StructWithOneField.Field
not being equivalent to*(FieldType*)&StructWithOneField
).a[0] = *(ArrayElemType*)&promotedStruct;
, where the LHS's type does not match the RHS's.An alternative approach, implemented in this change, is to "lower" VN to the IR's level and start using physical selectors (offset and size) to select values instead of field handles. It has a good number of advantages:
LCL_FLD
nodes, enabling seamlessTYP_STRUCT
LCL_FLD
.Essentially, we get CQ, correctness, and simplify the compiler code outside of VN.
The disadvantages?
0.2%
TP degradation according to PIN for this initial implementation - once the field sequence code is removed, some of that will be gotten back.I believe the advantages of this change far outweigh the drawbacks.
The implementation
We introduce a new kind of map - "the physical map" - to represent "simple" locations: object fields, static fields, array elements, as well as subsections of them. The field sequence will now be used only to get "the first field map" (and completely unused for array elements), the rest of the struct fields will be represented as "physical selectors": offsets relative to the location the map being updated represents plus the load/store size.
Physical loads will be represented as
MapSelect(location, PhysicalSelector)
. Physical stores will be represented via a new functionMapPhysicalStore
, mirroringMapStore
with the difference being that the selector is always "physical".At selection time, we will traverse back through
MapPhysicalStore
, same as with the precise maps (the new term for the old map concept to differentiate it from the physical maps), only use more sophisticated logic to detect whether the selector for the value aliases any stores present. As with the precise maps, we will support map nesting, that is, it will be possible toMapPhysicalStore(parentMap, ..., MapPhysicalStore(...))
.We will also introduce a special kind of map,
BitCast
that represent the "identity" selection (i. e. that of the whole value), to satisfy our type system that exists outside the physical selection process, where it will be a no-op.For some more details on the choices made in this design, see the updated comment at the top of
valuenum.h
.The diffs
There are some diffs with change. There are two sources (+ a tiny number of diffs for the block morphing workaround).
The first is because we now number cases like below precisely:
And so tend to CSE or copy propagate more. On 64 bit platforms that mostly results in better CQ, but on x86 in particular, "not so much". A variation of that is when we "consolidate" what used to be some number of CSEs into one, with a longer lifetime.
I believe it would be prohibitively expensive (in terms of throughput and code clarity) to quirk the behavior to match the old sequence-based one, so the only thing I can propose is to accept the diffs.
The second, a regression, is due to the fact the physical selection scheme does not handle things like
*(int*)&long.MaxValue
, i. e. selecting from a constant. I am not too worried about this because:BitCast
.