PGO work planned for .NET 8 #74873

jakobbotsch · 2022-08-31T14:53:20Z

This issue captures the planned work for .NET 8. This list is expected to change throughout the release cycle according to ongoing planning and discussions, with possible additions and subtractions to the scope. Note that we have not finished .NET 8 planning so most items are only under consideration.

Completed in .NET 8

Opportunistic for .NET 8

Deferred

JIT: annotate pseudo-edges in pgo schema and/or internal model #81638

category:planning
theme:profile-feedback
skill-level:expert
cost:large
impact:large

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2022-08-31T14:53:25Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2022-08-31T14:53:41Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This issue captures the planned work for .NET 8. This list is expected to change throughout the release cycle according to ongoing planning and discussions, with possible additions and subtractions to the scope. Note that we have not finished .NET 8 planning so most items are only under consideration.

Planned for .NET 8

(EgorBo in PR) PGO: Add new tiers #70941

Under Consideration

Array interface method devirtualization #62457 (too many missing pieces)
Fix issues in profile maintenance: JIT: PGO: fgOptimizeUncondBranchToSimpleCond corrupts profile data #50419, [JIT][PGO] Incorrect weights produced by expand-casts optimizations #50030, optOptimizeLayout introducing edge profile inconsistencies #48476
PGO considerations for finally cloning #48925
JIT: refactor edge weight representation #46885
JIT: use normalized counts throughout #46883
Reduce performance overhead of Tier0 instrumentation
Enhance GDV-like capability for indirect calls (optimization of indirect calls #44610)
JIT: optimize redundant type tests created by PGO #46887 (needs SSA update)
Enhance delegate GDV
- Instance methods on value types (delegates pointing to unboxing stubs)
- Static methods (delegates pointing to shuffle thunks)
- R2R support
- Loop cloning support

Author:	jakobbotsch
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

jakobbotsch · 2022-08-31T14:53:49Z

cc @dotnet/jit-contrib

EgorBo · 2022-09-01T17:35:31Z

I'd add context-sensitive instrumentation, I do think it should be beneficial

jakobbotsch · 2022-09-02T13:00:27Z

@EgorBo feel free to edit the issue with anything you can think of 🙂

Enable edge based profiles for OSR, partial compilation, and optimized plus instrumented cases. For OSR this requires deferring flow graph modifications until after we have built the initial probe list, so that the initial list reflects the entirety of the method. This set of candidate edge probes is thus the same no matter how the method is compiled. A given compile may schematize a subset of these probes and materialize a subset of what gets schematized; this is tolerated by the PGO mechanism provided that the initial instrumented jitting produces a schema which is a superset of the schema produced by any subsequent instrumented rejitting. This is normally the case. Partial compilation may still need some work to ensure full schematization but it is currently off by default. Will address this subsequently. For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind of probe relocation abilities that we have in the BlockCountInstrumentor. In particular we need to move probes that might appear in return blocks that follow implicit tail call blocks, since those return blocks must remain empty. The details on how we do this are a bit different but the idea is the same: we create duplicate copies of any probe that was going to appear in the return block and instead instrument each pred. If the pred reached the return via a critical edge, we split the edge and put the probe there. This analysis relies on cheap preds, so to ensure we can use them we move all the critial edge splitting so it happens before we need the cheap pred lists. The ability to do block profiling is retained but will no longer be used without special config settings. There were also a few bug fixes in the spanning tree visitor. It must visit a superset of the blocks we end up importing and was missing visits in some cases. This should improve jit time and code quality for instrumented code. Fixes dotnet#47942. Fixes dotnet#66101. Contributes to dotnet#74873.

Enable edge based profiles for OSR, partial compilation, and optimized plus instrumented cases. For OSR this requires deferring flow graph modifications until after we have built the initial probe list, so that the initial list reflects the entirety of the method. This set of candidate edge probes is thus the same no matter how the method is compiled. A given compile may schematize a subset of these probes and materialize a subset of what gets schematized; this is tolerated by the PGO mechanism provided that the initial instrumented jitting produces a schema which is a superset of the schema produced by any subsequent instrumented rejitting. This is normally the case. Partial compilation may still need some work to ensure full schematization but it is currently off by default. Will address this subsequently. For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind of probe relocation abilities that we have in the BlockCountInstrumentor. In particular we need to move probes that might appear in return blocks that follow implicit tail call blocks, since those return blocks must remain empty. The details on how we do this are a bit different but the idea is the same: we create duplicate copies of any probe that was going to appear in the return block and instead instrument each pred. If the pred reached the return via a critical edge, we split the edge and put the probe there. This analysis relies on cheap preds, so to ensure we can use them we move all the critial edge splitting so it happens before we need the cheap pred lists. The ability to do block profiling is retained but will no longer be used without special config settings. There were also a few bug fixes in the spanning tree visitor. It must visit a superset of the blocks we end up importing and was missing visits in some cases. This should improve jit time and code quality for instrumented code. Fixes #47942. Fixes #66101. Contributes to #74873.

AndyAyersMS · 2023-02-08T20:30:06Z

Rough plan of attack for profile maintenance: enable profile post-phase checks for likelihoods (see #81738) by default and fix issues that arise. To approach this incrementally, I will start by enabling the checks for just the first phase, then fix all those issues, then move the disabling point back one more phase, etc.

Depending on the amount of churn this causes I may commit this work via a series of interim PRs where the checks are partially enabled. Right now I am still working on getting things clean after the profile incorporation phase, so can't say now many PRs will make sense.

OSR + PGO is continuing to cause problems. We expect the OSR method to see an inconsistent profile, because the profile data (assuming it comes from Tier0) will quite likely represent partial method executions. I have been trying to fix these since the SPMI collection I'm using to drive the profile work includes lots of OSR instances, but I may decide to defer checking for these methods initially as well.

Inconsistencies can arise in all methods, not just OSR methods, either because of exceptions or because the Tier1 rejitting happens while the Tier0 method instances may still be live and executing. So we need to tolerate inconsistencies and handle them, while at the same time also spotting flaws and incorrect behavior from profile reconstruction and maintenance in cases that are (or should be) fully consistent.

The profile checks will mainly focus on fixing issues with our new edge likelihood representation; the idea is to get this checked and plausibly clean through the important phases and then remove the current manipulations of block and edge weights and have everything rely on the new weights. This change will be quite disruptive.

AndyAyersMS · 2023-02-09T00:39:37Z

Looking at current ASP.NET SPMI collection, we reconstruct edge profiles for 31161 root methods, and have inconsistencies right after reconstruction in 2297 of them, around 7.3%. The OSR subset of this has slightly higher rate of issues (131 out of 1354, 10.2%). Both these are higher than I'd expect. Possible explanations:

loss of counter updates because we're not doing interlocked adds
profile data set tearing when tier1 rejit accesses live tier0 counter data
bugs in instrumentation or reconstruction
un-modelled control flow (OSR transfers, exceptions)
version skew (older static profile read into method with newer IL stream)

Example problem from very simple method. Here there are two counters, one for the 3->4 edge and one for the pseudo 4->1 edge (which would end up in block 4).

Note because of placement of the sparse counters all we really know is that the internal counter was hit about more often than the method return counter. We deduce BB01's weight from BB04.

This example is reading dynamic PGO data so version skew should not be possible. Since there are just two counters, it is hard to imagine how Tier1 rejit copying them both asynchronously could lead to such a large count discrepancy. Doesn't seem too likely that 1% of all calls to this method would throw exceptions and so not return. Reconstruction is simple and not buggy. This is not an OSR method.

So the most plausible explanation from the list above is that we are seeing count update losses because the instrumentation probes are not doing interlocked updates (lock inc say). Might be worth creating a mode that does these more costly updates just to see if it really does explain the high numbers of inconsistent profiles.

Also worth noting that (in general) we are using a very tight acceptance criteria here and even in a case like this, counts are in relatively good shape (just off by a bit more than 1%, so fgProfileWeightsConsistent is flagging them). So it would also be interesting to see how the numbers of failures change as we adjust the leniency factor there up and down.

MichalPetryka · 2023-02-09T15:51:49Z

Are there any plans to default Tiered PGO to being enabled in .NET 8 or 9?

AndyAyersMS · 2023-02-09T21:00:22Z

Here is a more detailed breakdown of consistency issues seen in the current aspnet.run SPMI collection.

PGO	Mode	Methods	Inconsistent	Consistent	%Inconsistent
Dynamic PGO	Tier0-MinOpts	9	0	9	0%
	Tier1	25737	985	24752	4%
	Tier1-OSR	1396	142	1254	10%
Dynamic PGO Total		27142	1127	26015	4%
Static PGO	FullOpts	28	12	16	43%
	Instrumented Tier1	2000	487	1513	24%
	Tier0-MinOpts	2	0	2	0%
	Tier1	2494	685	1809	27%
Static PGO Total		4524	1184	3340	26%
Grand Total		31666	2311	29355	7%

Looks like consistency of static PGO is considerably worse than dynamic PGO.

AndyAyersMS · 2023-02-09T21:03:48Z

Are there any plans to default Tiered PGO to being enabled in .NET 8 or 9?

I would not say "plan", but it's certainly our ambition. Currently there are some rough edges still needing attention. If and when we get those addressed, we will make a case for it to be on by default.

AndyAyersMS · 2023-02-10T01:43:23Z

With lock add (see #81934) I get no consistency issues from a bespoke ASP.NET SPMI collection for any of the dynamic PGO methods. So this suggests loss of counter updates because of update races is the main culprit there (and possibly for static PGO too, though that is harder to test).

AndyAyersMS · 2023-03-03T19:26:22Z

Given the larger than expected consistency issues I have shifted my plan of attack and will focus first on profile synthesis and repair. When an initial PGO profile is read, if it is inconsistent, we'll use the repair code to patch it up.

First bit of this work appears in #82926.

AndyAyersMS · 2023-04-01T02:41:14Z

We collect microbenchmark data in the lab for PGO for windows x64. Here are the 50 worst performing benchmarks (all regressions) with PGO enabled (data from last 7 days, ratio of median result with pgo to median without, for benchmarks that run for at least 2ns)

Benchmark	Pgo/noPgo	Pgo	noPgo
PerfLabTests.CastingPerf.CheckArrayIsInterfaceNo	0.08	398849.04	31215.90
PerfLabTests.CastingPerf.CheckArrayIsInterfaceYes	0.08	374212.91	31223.53
PerfLabTests.LowLevelPerf.TypeReflectionObjectGetType	0.13	234851.64	31199.62
PerfLabTests.LowLevelPerf.TypeReflectionArrayGetType	0.13	233117.95	31200.07
PerfLabTests.CastingPerf.CheckIsInstAnyIsInterfaceNo	0.23	272851.55	62375.07
PerfLabTests.CastingPerf.CheckObjIsInterfaceNo	0.23	272819.54	62375.07
PerfLabTests.CastingPerf.CheckObjIsInterfaceYes	0.29	218302.32	62375.47
PerfLabTests.CastingPerf.CheckIsInstAnyIsInterfaceYes	0.29	218295.23	62375.70
PerfLabTests.LowLevelPerf.ForeachOverList100Elements	0.32	22945136.54	7379551.48
SIMD.ConsoleMandel.ScalarDoubleSinglethreadADT	0.35	3020632385.00	1061635800.00
System.Text.Json.Tests.Perf_Get.GetByte	0.38	1534.81	580.82
System.Text.Json.Tests.Perf_Get.GetUInt16	0.40	1535.19	609.48
System.Text.Json.Tests.Perf_Get.GetInt16	0.41	1566.20	649.24
System.Text.Json.Tests.Perf_Get.GetSByte	0.43	1630.13	705.13
System.Text.Json.Tests.Perf_Get.GetUInt32	0.49	1194.25	580.18
System.Text.Json.Tests.Perf_Get.GetInt32	0.50	1232.12	613.24
PerfLabTests.LowLevelPerf.GenericClassWithIntGenericInstanceField	0.50	62373.91	31194.27
System.Collections.IterateFor.ImmutableArray(Size: 512)	0.52	324.57	169.43
PerfLabTests.CastingPerf2.CastingPerf.ScalarValueTypeObj	0.63	444482.56	281009.89
PerfLabTests.CastingPerf.FooObjCastIfIsa	0.69	528580.17	363016.07
PerfLabTests.CastingPerf2.CastingPerf.FooObjCastIfIsa	0.69	532391.38	367533.28
System.Collections.Sort.Array(Size: 512)	0.69	6653.33	4606.52
System.IO.Tests.Perf_Directory.Exists	0.71	15955.69	11325.38
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToWriter(Mode: SourceGen)	0.71	16719.32	11946.75
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToStream(Mode: SourceGen)	0.72	17271.54	12376.91
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToUtf8Bytes(Mode: SourceGen)	0.72	17532.93	12566.95
System.Collections.ContainsKeyTrue<Int32, Int32>.ConcurrentDictionary(Size: 512)	0.72	3844.47	2761.13
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToString(Mode: SourceGen)	0.72	18159.52	13103.63
System.Formats.Tar.Tests.Perf_TarFile.CreateFromDirectory_Stream_Async	0.72	269463.63	195243.43
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeObjectProperty(Mode: SourceGen)	0.73	18248.82	13402.00
PerfLabTests.LowLevelPerf.ObjectStringIsString	0.75	125009.63	93555.58
LinqBenchmarks.Count00ForX	0.75	163504176.92	123252745.00
System.IO.Tests.Perf_Path.GetDirectoryName	0.76	72.90	55.55
System.Memory.Span.Clear(Size: 4)	0.76	3.36	2.57
System.Text.Json.Tests.Perf_Segment.ReadMultiSegmentSequence(segmentSize: 4096, TestCase: Json4KB)	0.78	9855.06	7690.25
MicroBenchmarks.Serializers.Xml_ToStream.XmlSerializer_	0.78	546014.06	427784.01
System.Formats.Tar.Tests.Perf_TarFile.CreateFromDirectory_Stream	0.79	181593.06	143366.95
System.Text.Json.Tests.Perf_Get.GetUInt64	0.79	771.31	609.72
Benchstone.BenchI.MulMatrix.Test	0.80	495584423.08	395630186.67
System.Collections.Sort.List(Size: 512)	0.83	6379.94	5270.91
Benchstone.BenchF.NewtR.Test	0.83	99796125.00	82667905.00
ByteMark.BenchLUDecomp	0.83	1522751180.00	1261430178.57
System.Text.Json.Document.Tests.Perf_DocumentParse.Parse(IsDataIndented: False, TestRandomAccess: True, TestCase: Json400KB)	0.83	7413636.67	6143975.93
System.Buffers.Tests.RentReturnArrayPoolTests.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: True)	0.83	512.94	426.19
PerfLabTests.LowLevelPerf.GenericGenericMethod	0.83	187104.80	156016.86
System.Memory.Span.SequenceCompareToDifferent(Size: 4)	0.83	5.91	4.93
System.Perf_Convert.FromBase64String	0.84	90.49	75.64
System.Text.Tests.Perf_StringBuilder.Append_Char(length: 100000)	0.84	199735.26	167488.12
System.Collections.IndexerSet.Dictionary(Size: 512)	0.84	4257.67	3577.62
System.Tests.Perf_Char.Char_ToLowerInvariant(input: "Hello World!")	0.85	15.15	12.81

EgorBo · 2023-04-03T22:06:34Z

@AndyAyersMS let me know if you want me to look at some of these on your choice

AndyAyersMS · 2023-04-04T15:44:35Z

@AndyAyersMS let me know if you want me to look at some of these on your choice

If you want to get a head start perhaps look into SIMD.ConsoleMandel.ScalarDoubleSinglethreadADT.

Otherwise let me reassess after I get a fix in for this first issue -- it seems pretty widespread. Still working on it.

AndyAyersMS · 2023-07-05T18:37:43Z

At this point we've wrapped up all the planned work on Dynamic PGO for .NET 8.

So closing this; we'll revisit some of the open items here in the next release's planning.

ghost added the untriaged New issue has not been triaged by the area owner label Aug 31, 2022

jakobbotsch added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels Aug 31, 2022

jakobbotsch added this to the 8.0.0 milestone Aug 31, 2022

JulieLeeMSFT added the User Story A single user-facing feature. Can be grouped under an epic. label Aug 31, 2022

BruceForstall mentioned this issue Sep 28, 2022

Remove GT_ADDR nodes #11057

Closed

JulieLeeMSFT assigned jakobbotsch Oct 5, 2022

JulieLeeMSFT assigned AndyAyersMS and unassigned jakobbotsch Jan 9, 2023

AndyAyersMS mentioned this issue Jan 11, 2023

JIT: Enable edge based profiles for all scenarios #80481

Merged

JulieLeeMSFT mentioned this issue Feb 8, 2023

What's new in .NET 8 Preview 1 dotnet/core#8133

Closed

3 tasks

AndyAyersMS mentioned this issue Apr 3, 2023

Investigate microbenchmarks that regress with PGO enabled #84264

Closed

AndyAyersMS added the Priority:3 Work that is nice to have label Jun 5, 2023

AndyAyersMS closed this as completed Jul 5, 2023

ghost locked as resolved and limited conversation to collaborators Aug 4, 2023

JulieLeeMSFT added this to .NET Core CodeGen Jun 5, 2024

JulieLeeMSFT moved this to Done in .NET Core CodeGen Jun 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PGO work planned for .NET 8 #74873

PGO work planned for .NET 8 #74873

jakobbotsch commented Aug 31, 2022 •

edited by AndyAyersMS

Loading

dotnet-issue-labeler bot commented Aug 31, 2022

ghost commented Aug 31, 2022

Planned for .NET 8

Under Consideration

jakobbotsch commented Aug 31, 2022

EgorBo commented Sep 1, 2022

jakobbotsch commented Sep 2, 2022

AndyAyersMS commented Feb 8, 2023

AndyAyersMS commented Feb 9, 2023 •

edited

Loading

MichalPetryka commented Feb 9, 2023

AndyAyersMS commented Feb 9, 2023

AndyAyersMS commented Feb 9, 2023

AndyAyersMS commented Feb 10, 2023 •

edited

Loading

AndyAyersMS commented Mar 3, 2023

AndyAyersMS commented Apr 1, 2023

EgorBo commented Apr 3, 2023

AndyAyersMS commented Apr 4, 2023 •

edited

Loading

AndyAyersMS commented Jul 5, 2023

PGO work planned for .NET 8 #74873

PGO work planned for .NET 8 #74873

Comments

jakobbotsch commented Aug 31, 2022 • edited by AndyAyersMS Loading

Completed in .NET 8

Opportunistic for .NET 8

Deferred

dotnet-issue-labeler bot commented Aug 31, 2022

ghost commented Aug 31, 2022

Planned for .NET 8

Under Consideration

jakobbotsch commented Aug 31, 2022

EgorBo commented Sep 1, 2022

jakobbotsch commented Sep 2, 2022

AndyAyersMS commented Feb 8, 2023

AndyAyersMS commented Feb 9, 2023 • edited Loading

MichalPetryka commented Feb 9, 2023

AndyAyersMS commented Feb 9, 2023

AndyAyersMS commented Feb 9, 2023

AndyAyersMS commented Feb 10, 2023 • edited Loading

AndyAyersMS commented Mar 3, 2023

AndyAyersMS commented Apr 1, 2023

EgorBo commented Apr 3, 2023

AndyAyersMS commented Apr 4, 2023 • edited Loading

AndyAyersMS commented Jul 5, 2023

jakobbotsch commented Aug 31, 2022 •

edited by AndyAyersMS

Loading

AndyAyersMS commented Feb 9, 2023 •

edited

Loading

AndyAyersMS commented Feb 10, 2023 •

edited

Loading

AndyAyersMS commented Apr 4, 2023 •

edited

Loading