PGO work planned for .NET 8 #74873
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
This issue captures the planned work for .NET 8. This list is expected to change throughout the release cycle according to ongoing planning and discussions, with possible additions and subtractions to the scope. Note that we have not finished .NET 8 planning so most items are only under consideration.
Planned for .NET 8
Under Consideration
cc @dotnet/jit-contrib |
I'd add context-sensitive instrumentation, I do think it should be beneficial |
@EgorBo feel free to edit the issue with anything you can think of 🙂 |
Enable edge based profiles for OSR, partial compilation, and optimized plus instrumented cases.
For OSR this requires deferring flow graph modifications until after we have built the initial probe list, so that the initial list reflects the entirety of the method. This set of candidate edge probes is thus the same no matter how the method is compiled. A given compile may schematize a subset of these probes and materialize a subset of what gets schematized; this is tolerated by the PGO mechanism provided that the initial instrumented jitting produces a schema which is a superset of the schema produced by any subsequent instrumented rejitting. This is normally the case.
Partial compilation may still need some work to ensure full schematization but it is currently off by default. Will address this subsequently.
For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind of probe relocation abilities that we have in the BlockCountInstrumentor. In particular we need to move probes that might appear in return blocks that follow implicit tail call blocks, since those return blocks must remain empty.
The details on how we do this are a bit different but the idea is the same: we create duplicate copies of any probe that was going to appear in the return block and instead instrument each pred. If the pred reached the return via a critical edge, we split the edge and put the probe there. This analysis relies on cheap preds, so to ensure we can use them we move all the critical edge splitting so it happens before we need the cheap pred lists.
The ability to do block profiling is retained but will no longer be used without special config settings.
There were also a few bug fixes in the spanning tree visitor. It must visit a superset of the blocks we end up importing and was missing visits in some cases.
This should improve jit time and code quality for instrumented code.
Fixes dotnet#47942. Fixes dotnet#66101. Contributes to dotnet#74873.
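The probe relocation described above can be sketched on a toy flow graph. This is only an illustration under stated assumptions, not the JIT's implementation: `Block`, `relocate_probes`, and the block names are hypothetical, and the sketch splits the edge whenever the pred has more than one successor, which is a simplification of the critical-edge test.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical toy block; the real JIT uses its own basic block structures.
    name: str
    succs: list = field(default_factory=list)   # names of successor blocks
    probes: list = field(default_factory=list)  # probe ids placed in this block


def relocate_probes(blocks, ret_name):
    """Empty the return block's probes and re-create them on each incoming path."""
    ret = blocks[ret_name]
    probes, ret.probes = ret.probes, []
    if not probes:
        return
    # Snapshot the preds first, since splitting edges adds new blocks.
    preds = [b for b in list(blocks.values()) if ret_name in b.succs]
    for pred in preds:
        if len(pred.succs) == 1:
            # The pred only flows to the return block: instrument the pred itself.
            pred.probes.extend(probes)
        else:
            # Assumption: treat this as the critical-edge case and split the edge,
            # so the duplicated probe only counts flow along pred -> return.
            split = Block(f"{pred.name}_to_{ret_name}", [ret_name], list(probes))
            blocks[split.name] = split
            pred.succs = [split.name if s == ret_name else s for s in pred.succs]


# Toy graph: A branches to B or RET, B falls into RET, RET holds one probe.
graph = {
    "A": Block("A", ["B", "RET"]),
    "B": Block("B", ["RET"]),
    "RET": Block("RET", [], ["p0"]),
}
relocate_probes(graph, "RET")
```

After the call, `RET` holds no probes, `B` carries a copy of `p0`, and a new block `A_to_RET` sits on the split edge with the other copy.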
Rough plan of attack for profile maintenance: enable profile post-phase checks for likelihoods (see #81738) by default and fix issues that arise. To approach this incrementally, I will start by enabling the checks for just the first phase, then fix all those issues, then move the disabling point back one more phase, etc.
Depending on the amount of churn this causes I may commit this work via a series of interim PRs where the checks are partially enabled. Right now I am still working on getting things clean after the profile incorporation phase, so I can't say how many PRs will make sense.
OSR + PGO is continuing to cause problems. We expect the OSR method to see an inconsistent profile, because the profile data (assuming it comes from Tier0) will quite likely represent partial method executions. I have been trying to fix these since the SPMI collection I'm using to drive the profile work includes lots of OSR instances, but I may decide to defer checking for these methods initially as well.
Inconsistencies can arise in all methods, not just OSR methods, either because of exceptions or because the Tier1 rejitting happens while the Tier0 method instances may still be live and executing. So we need to tolerate inconsistencies and handle them, while at the same time also spotting flaws and incorrect behavior from profile reconstruction and maintenance in cases that are (or should be) fully consistent.
The profile checks will mainly focus on fixing issues with our new edge likelihood representation; the idea is to get this checked and plausibly clean through the important phases and then remove the current manipulations of block and edge weights and have everything rely on the new weights. This change will be quite disruptive.
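To make the idea of a likelihood-based consistency check concrete, here is a minimal sketch. It assumes a profile stored as per-block weights plus per-edge likelihoods; `check_profile`, the dictionary layout, and the tolerance are hypothetical and not the JIT's checker. It verifies that each block's outgoing likelihoods sum to 1 and that each non-entry block's weight matches the flow arriving from its preds.

```python
import math


def check_profile(weights, likelihoods, entry, rel_tol=1e-3):
    """Return a list of inconsistencies (empty if the profile looks clean).

    weights:     {block: weight}
    likelihoods: {block: {successor: likelihood}}
    entry:       entry block, whose weight comes from calls, not from preds
    """
    problems = []
    for src, out in likelihoods.items():
        total = sum(out.values())
        if out and not math.isclose(total, 1.0, rel_tol=rel_tol):
            problems.append(f"{src}: outgoing likelihoods sum to {total:.3f}")
    for blk, weight in weights.items():
        if blk == entry:
            continue
        # Flow into blk is each pred's weight scaled by the edge likelihood.
        inflow = sum(weights[s] * out.get(blk, 0.0) for s, out in likelihoods.items())
        if not math.isclose(inflow, weight, rel_tol=rel_tol, abs_tol=1e-9):
            problems.append(f"{blk}: weight {weight} but incoming flow {inflow:.1f}")
    return problems


# A diamond-shaped method: A branches to B or C, both rejoin at D.
edge_likelihoods = {"A": {"B": 0.7, "C": 0.3}, "B": {"D": 1.0}, "C": {"D": 1.0}, "D": {}}
consistent = {"A": 100.0, "B": 70.0, "C": 30.0, "D": 100.0}
skewed = dict(consistent, D=90.0)  # D's weight no longer matches its inflow

clean_report = check_profile(consistent, edge_likelihoods, entry="A")
skewed_report = check_profile(skewed, edge_likelihoods, entry="A")
```

A check like this has to be paired with some tolerance policy, since (as noted above) real profiles can be legitimately inconsistent because of exceptions, OSR, or concurrent Tier0 execution.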
Looking at the current ASP.NET SPMI collection, we reconstruct edge profiles for 31161 root methods, and have inconsistencies right after reconstruction in 2297 of them, around 7.3%. The OSR subset of this has a slightly higher rate of issues (131 out of 1354, 10.2%). Both of these are higher than I'd expect. Possible explanations:
Example problem from very simple method. Here there are two counters, one for the 3->4 edge and one for the pseudo 4->1 edge (which would end up in block 4). Note because of placement of the sparse counters all we really know is that the internal counter was hit about more often than the method return counter. We deduce BB01's weight from BB04. This example is reading dynamic PGO data so version skew should not be possible. Since there are just two counters, it is hard to imagine how Tier1 rejit copying them both asynchronously could lead to such a large count discrepancy. Doesn't seem too likely that 1% of all calls to this method would throw exceptions and so not return. Reconstruction is simple and not buggy. This is not an OSR method. So the most plausible explanation from the list above is that we are seeing count update losses because the instrumentation probes are not doing interlocked updates ( Also worth noting that (in general) we are using a very tight acceptance criteria here and even in a case like this, counts are in relatively good shape (just off by a bit more than 1%, so |
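The suspected loss of counts from non-interlocked probe updates can be illustrated with a deterministic replay. This is a simplified model and not the runtime's probe code: `run` and the schedule format are hypothetical, and each probe is modelled as a separate load followed by a store of the loaded value plus one.

```python
def run(schedule):
    """Replay (thread, step) pairs against one shared counter and return its value."""
    counter = 0
    loaded = {}  # value each thread last read from the counter
    for thread, step in schedule:
        if step == "load":
            loaded[thread] = counter
        elif step == "store":
            # Non-interlocked update: writes back a possibly stale value + 1.
            counter = loaded[thread] + 1
    return counter


# Two threads each hit the probe once.
serial = [("t1", "load"), ("t1", "store"), ("t2", "load"), ("t2", "store")]
racy = [("t1", "load"), ("t2", "load"), ("t1", "store"), ("t2", "store")]

serial_count = run(serial)  # 2: both updates are recorded
racy_count = run(racy)      # 1: t2 overwrites t1's update
```

In the racy interleaving both threads read the same value before either writes, so one increment is lost. An interlocked (atomic) increment rules this interleaving out, at some cost per probe.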
Are there any plans to default Tiered PGO to being enabled in .NET 8 or 9? |
Here is a more detailed breakdown of consistency issues seen in the current
Looks like consistency of static PGO is considerably worse than dynamic PGO. |
I would not say "plan", but it's certainly our ambition. Currently there are some rough edges still needing attention. If and when we get those addressed, we will make a case for it to be on by default. |
With |
Given the larger than expected consistency issues I have shifted my plan of attack and will focus first on profile synthesis and repair. When an initial PGO profile is read, if it is inconsistent, we'll use the repair code to patch it up. First bit of this work appears in #82926. |
We collect microbenchmark data in the lab for PGO for windows x64. Here are the 50 worst performing benchmarks (all regressions) with PGO enabled (data from last 7 days, ratio of median result with pgo to median without, for benchmarks that run for at least 2ns)
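For reference, the ratio described above could be computed along these lines. This is a sketch under assumptions: `pgo_ratios` is a hypothetical helper, the sample timings are made up, and the 2ns cutoff is applied to the median without PGO, which the comment does not specify.

```python
from statistics import median


def pgo_ratios(results, min_ns=2.0):
    """Rank benchmarks by median(with PGO) / median(without PGO), worst first.

    results: {benchmark: (times_with_pgo_ns, times_without_pgo_ns)}
    """
    ratios = {}
    for name, (with_pgo, without_pgo) in results.items():
        baseline = median(without_pgo)
        if baseline < min_ns:
            continue  # assumption: filter on the baseline (no PGO) median
        ratios[name] = median(with_pgo) / baseline
    # A ratio above 1.0 means the benchmark is slower with PGO enabled.
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical timings in nanoseconds, for illustration only.
sample = {
    "BenchA": ([12.0, 10.0, 11.0], [10.0, 10.0, 10.0]),
    "BenchB": ([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]),  # under 2ns, filtered out
    "BenchC": ([5.0, 5.0, 5.0], [10.0, 10.0, 10.0]),
}
ranking = pgo_ratios(sample)
```

Using medians rather than means keeps a single noisy run from dominating the ratio.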
@AndyAyersMS let me know if you want me to look at some of these of your choice
If you want to get a head start perhaps look into Otherwise let me reassess after I get a fix in for this first issue -- it seems pretty widespread. Still working on it. |
At this point we've wrapped up all the planned work on Dynamic PGO for .NET 8. So closing this; we'll revisit some of the open items here in the next release's planning. |
This issue captures the planned work for .NET 8. This list is expected to change throughout the release cycle according to ongoing planning and discussions, with possible additions and subtractions to the scope. Note that we have not finished .NET 8 planning so most items are only under consideration.
Completed in .NET 8
fgReplaceJumpTarget to maintain pred lists #81246
Interlocked profiling mode and see if it helps steady state perf.
Opportunistic for .NET 8
Deferred
category:planning
theme:profile-feedback
skill-level:expert
cost:large
impact:large