Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Physical value numbering #68712

Merged
merged 17 commits into from
May 6, 2022
Merged

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Apr 29, 2022

The problem

As we know, value numbering disambiguates loads from and stores to fields using field handles as the selectors, essentially meaning that the aliasing model it has presumes no two field accesses of different handles will refer to the same location.

This assumption simply does not hold for struct fields in modern .NET, with people using methods like Unsafe.As more and more. It also does not match the compiler's own internal model well, where precise types are not a unit of type identity (rather, layout compatibility is). Essentially, the field sequence model is too "high level", both for a subset of user code and (in some ways, perhaps more importantly) the IR.

The latter mismatch has caused a good number of bugs, past and present, especially those induced by "zero-offset field sequences", which represent the extreme case of IR-VN mismatch:

#67360
#64805
#64701
#62687
#57282
#32085
dotnet/coreclr#23932
dotnet/coreclr#27108
dotnet/coreclr#20838

The solution

It is the case that the invalid (fragmented) field sequences can only be detected in value numbering itself, no other phase has the information necessary to recognize that this:

var a = ref array[0];

// Several blocks later

Use(Unsafe.As<A, B>(ref a).Field);

is a possibly partial sequence.

Thus, to address the correctness problem we will need to call back into the VM for each field sequence formed, to see if the parent field's type is the owner of the current field (what LocalAddressVisitor does now).

This would solve the correctness problems.

Unfortunately, it has two very significant drawbacks:

  1. It does not address the CQ issues associated with using symbols for inherently aliasable locations (e. g. zero-field sequences "polluting" VNs of addresses, even the simple cases of unions being completely "given up on" by numbering, and StructWithOneField.Field not being equivalent to *(FieldType*)&StructWithOneField).
  2. It will hide compiler bugs related to bad field sequence maintenance. Consider block morphing as an example: today, it will fail to attach proper sequences for a case like this: a[0] = *(ArrayElemType*)&promotedStruct;, where the LHS's type does not match the RHS's.

An alternative approach, implemented in this change, is to "lower" VN to the IR's level and start using physical selectors (offset and size) to select values instead of field handles. It has a good number of advantages:

  1. It is naturally very precise - handles unions, etc.
  2. It should allow us to delete the zero-offset field sequence map.
  3. It will also allow us to delete quite a bit of field sequence maintenance code.
  4. It is "correct by construction" when it comes to aliasing, matching the expectations user code can place on the compiler.
  5. It will free up one pointer-sized field on LCL_FLD nodes, enabling seamless TYP_STRUCT LCL_FLD.
  6. It allows for more equivalences to be discovered for struct values -- today, their VNs are implicitly tied to their precise types (handles), while the IR generally supports equivalence between "compatible" structs, which is a good thing CQ-wise.

Essentially, we get CQ, correctness, and simplify the compiler code outside of VN.

The disadvantages?

  1. Risk & review cost.
  2. Up to 0.2% TP degradation according to PIN for this initial implementation - once the field sequence code is removed, some of that will be gotten back.
  3. A more complex VN model.

I believe the advantages of this change far outweigh the drawbacks.

The implementation

We introduce a new kind of map - "the physical map" - to represent "simple" locations: object fields, static fields, array elements, as well as subsections of them. The field sequence will now be used only to get "the first field map" (and completely unused for array elements), the rest of the struct fields will be represented as "physical selectors": offsets relative to the location the map being updated represents plus the load/store size.

Physical loads will be represented as MapSelect(location, PhysicalSelector). Physical stores will be represented via a new function MapPhysicalStore, mirroring MapStore with the difference being that the selector is always "physical".

At selection time, we will traverse back through MapPhysicalStore, same as with the precise maps (the new term for the old map concept to differentiate it from the physical maps), only use more sophisticated logic to detect whether the selector for the value aliases any stores present. As with the precise maps, we will support map nesting, that is, it will be possible to MapPhysicalStore(parentMap, ..., MapPhysicalStore(...)).

We will also introduce a special kind of map, BitCast that represent the "identity" selection (i. e. that of the whole value), to satisfy our type system that exists outside the physical selection process, where it will be a no-op.

For some more details on the choices made in this design, see the updated comment at the top of valuenum.h.

The diffs

There are some diffs with change. There are two sources (+ a tiny number of diffs for the block morphing workaround).

The first is because we now number cases like below precisely:

N001 (  3,  4) [000463] ------------                        |  \--*  LCL_FLD   byref  V01 arg1         u:3[+0] Fseq[_reference] $287

N001 (  3,  4) [000330] ------------                        \--*  LCL_FLD   byref  V01 arg1         u:3[+0] Fseq[_reference, _value] $287

And so tend to CSE or copy propagate more. On 64 bit platforms that mostly results in better CQ, but on x86 in particular, "not so much". A variation of that is when we "consolidate" what used to be some number of CSEs into one, with a longer lifetime.

I believe it would be prohibitively expensive (in terms of throughput and code clarity) to quirk the behavior to match the old sequence-based one, so the only thing I can propose is to accept the diffs.

The second, a regression, is due to the fact the physical selection scheme does not handle things like *(int*)&long.MaxValue, i. e. selecting from a constant. I am not too worried about this because:

  1. It only shows up in test code.
  2. Support for it could be added easily enough, but I would like to do that separately (if we decide it's worth it at all) as part of adding constant folding to BitCast.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Apr 29, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 29, 2022
@ghost
Copy link

ghost commented Apr 29, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Let's see how unhappy is the CI.

I still need to update the documentation.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@SingleAccretion SingleAccretion force-pushed the Physical-VN-Upstream branch 3 times, most recently from c9273b6 to 531f95f Compare May 1, 2022 15:16
@SingleAccretion
Copy link
Contributor Author

@dotnet/jit-contrib I would like to request outerloop, stress and libraries stress.

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented May 2, 2022

Outerloop and stress were clean except for known failures / Windows ARM/64 timeouts.

Libraries stress exposed an issue with the pre-existing bug, and there was also one failure I could not reproduce locally and that was not pre-existing, so will request another round of it.

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@SingleAccretion SingleAccretion marked this pull request as ready for review May 2, 2022 21:02
@SingleAccretion
Copy link
Contributor Author

Both libraries stress and Fuzzlyn failures look pre-existing.

Diffs.

@dotnet/jit-contrib

@jakobbotsch jakobbotsch self-assigned this May 3, 2022
@jakobbotsch jakobbotsch self-requested a review May 3, 2022 19:25
@sandreenko
Copy link
Contributor

wow, I was dreaming about deleting the whole old buggy VN for years!
Great work @SingleAccretion!

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @sandreenko above, this is fantastic work. I think the trade offs are more than reasonable, especially given future follow-ups. I wouldn't even say this is a more complex VN model -- while we now have both precise and physical maps, I think this representation is so much more obviously correct and easy to understand given the aliasing model.

I have a few questions:

  1. Are the new VNs costly in terms of memory? It does look like there are some obvious optimizations if they are (e.g. encoding physical selectors without creating a new VN for common small cases)
  2. Does this work mean that field sequences no longer need to be sequences/recursive?
  3. Is doesFieldBelongToClass still needed?

Also, thanks a lot for updating the documentation.

I want to run through a few examples locally for my own understanding, but other than that and a few documentation nits everything LGTM.

src/coreclr/jit/valuenum.h Outdated Show resolved Hide resolved
src/coreclr/jit/valuenum.h Outdated Show resolved Hide resolved
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented May 5, 2022

Are the new VNs costly in terms of memory? It does look like there are some obvious optimizations if they are (e.g. encoding physical selectors without creating a new VN for common small cases)

Not that expensive I suppose:

# CG-ing CoreLib with a native compiler

./diff-mem x64
Relative difference is: 0.083%

./diff-mem x86
Relative difference is: -0.029% # Curiously enough, an improvement.

There is a lot of trickiness with the representation of physical selectors, I spent some time exploring various alternatives. There are two factors that make the current "naive" scheme more optimal than one would expect:

  1. It is fairly common that VNForMapSelectWork stops at the first store map (i. e. load just after the store), and currently that is fairly efficient requiring one selector encode and one VN comparison. E. g. encoding offset and size directly in MapPhysialStore turned out to be a regression (or at least not a clear win).
  2. There is a hot loop in loop hoisting (optVNIsLoopInvariant) where special-casing VNFuncs with non-VN arguments makes for measurable regressions.

I thought quite a bit about splitting the selection map on the physical / precise axis as well, introducing MapPhysicalSelect, but it would mean outlining the PHI selection logic, possibly templatizing it, and that did not seem like too great an idea for an already fairly extensive change. I may revisit that one day though. It is ultimately required for employing smarter physical selector encodings.

Does this work mean that field sequences no longer need to be sequences/recursive?

Correct, the "sequence" part of the field sequence will go. We will have "field offset" CNS_INT nodes, carrying class or static field handles (actually, pointers to FieldInfo), with the following reduction rules:

FIELD    + NO_FIELD => FIELD
NO_FIELD + FIELD    => FIELD
NO_FIELD + NO_FIELD => NO_FIELD
FIELD    + FIELD    => illegal 

There are some not-in-VN dependencies on field sequences now (e. g. gtGetStructHandle), they will be deleted.

Is doesFieldBelongToClass still needed?

Nope, it can (and should) be deleted. Not sure how soon will I get to that because it requires updating the Jit-EE interface.

@jakobbotsch
Copy link
Member

jakobbotsch commented May 6, 2022

Sounds great, I look forward to the follow-ups.

Ran through some more examples locally, nice to see us VN'ing field accesses through reinterpretations accurately now.

Thanks again for a great change!

[edit] just asked @SingleAccretion for a commit message to include with the change

@jakobbotsch jakobbotsch merged commit ed2472b into dotnet:main May 6, 2022
@SingleAccretion SingleAccretion deleted the Physical-VN-Upstream branch May 6, 2022 19:59
@sandreenko
Copy link
Contributor

@SingleAccretion just out of curiosity does the new VN support such cases:

long l = 0;
int i = Unsafe.As<long, int>(ref l);
if (i == 0) { // Always true.

}

?

@SingleAccretion
Copy link
Contributor Author

@sandreenko "it could but it doesn't".

Currently this will be selected as $VN.LngZero[0:3], reducing this to $VN.IntZero would involve adding something like below to VNForMapSelectWork:

     if (GetVNFunc(map, &funcApp))
     {
     }
+    else if (IsVNConstant(map))
+    {
+        assert(MapIsPhysical(map));
+        unsigned size;
+        unsigned offset = DecodePhysicalSelector(index, &size);
+    
+        assert(size == genTypeSize(type));
+        return VNEvalBitCast(type, map, offset); // This would be the function to implement.
+    }

Currently though, I believe there is not a lot of value in adding this reduction.

@sandreenko
Copy link
Contributor

@sandreenko "it could but it doesn't".

Currently this will be selected as $VN.LngZero[0:3], reducing this to $VN.IntZero would involve adding something like below to VNForMapSelectWork:

     if (GetVNFunc(map, &funcApp))
     {
     }
+    else if (IsVNConstant(map))
+    {
+        assert(MapIsPhysical(map));
+        unsigned size;
+        unsigned offset = DecodePhysicalSelector(index, &size);
+    
+        assert(size == genTypeSize(type));
+        return VNEvalBitCast(type, map, offset); // This would be the function to implement.
+    }

Currently though, I believe there is not a lot of value in adding this reduction.

I remember seeing many cases where a struct was initialized using zero-init where the old VN could not then figure out a field value. However, I possibly saw that often because there were many errors in such cases and not because such cases were often.

@SingleAccretion
Copy link
Contributor Author

I remember seeing many cases where a struct was initialized using zero-init where the old VN could not then figure out a field value. However, I possibly saw that often because there were many errors in such cases and not because such cases were often.

Note that structs are handled by the current implementation:

case VNF_ZeroObj:
assert(MapIsPhysical(map));
// TODO-CQ: support selection of TYP_STRUCT here.
if (type != TYP_STRUCT)
{
return VNZeroForType(type);
}
break;

So if you were to modify your example to something like this:

Guid a = default;

if (*(int*)&a) == 0) { ... }

That branch would get folded (after #68986 is merged, I suppose).

In general the physical selection mechanism is "arbitrarily precise", it will only "give up" on out-of-bounds loads or stores. So it can support TYP_BLK variables, using int locals as byte[4]-like arrays, etc.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants