
Theme: Improve startup and throughput using runtime execution information (PGO) #5491

Closed · 4 tasks

richlander opened this issue Oct 29, 2020 · 14 comments

Labels: bulk-closed · Priority:2 (work that is important, but not critical for the release) · Team:Runtime · Theme (groups multiple epics)

richlander (Member) commented Oct 29, 2020

Profile guided optimization is a standard feature of many development platforms. We would like to make it a standard feature of .NET. It is already a feature of .NET in that the platform is PGO-optimized, but it doesn't extend beyond that. PGO for .NET is a multi-release journey, with the goal of significantly improved startup and throughput performance and working set reduction using both automatic and opt-in mechanisms/scenarios.

Today, both the native runtime and managed libraries are PGO-compiled using training data that we produce ourselves. The native runtime uses standard C++ compiler tools. The managed libraries use a technology called "Instruction Block Count" (IBC). IBC is notoriously hard to use (only a few people on the .NET team can use it effectively). Going forward, we intend to produce new tools that are straightforward to use, both for training (which produces PGO data) and for PGO-optimized compilation (which consumes PGO data).

A key split in the PGO domain is static versus dynamic PGO. We are interested in both, with the following prioritization/staging:

  • Enable optimizing ready-to-run images based on static PGO data.
  • Enable optimizing JITed code based on static PGO data, as a new feature of tiered compilation.
  • Enable optimizing JITed code based on PGO data collected (and not persisted) at runtime.
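As a rough illustration of that last bullet: at tier 0, instrumented code can count which concrete types reach a virtual call site, and the tier-1 JIT can then specialize for the dominant type. A toy C# model of that collection follows; every name here is illustrative (the runtime keeps this data in internal instrumentation schemas, not managed dictionaries):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model: count which concrete types reach a virtual call site.
// If one type dominates, a tier-1 JIT could emit a guarded,
// devirtualized fast path for it. Purely illustrative names.
class CallSiteProfile
{
    private readonly Dictionary<Type, int> _observed = new();

    public void Record(object receiver)
    {
        Type t = receiver.GetType();
        _observed[t] = _observed.TryGetValue(t, out int n) ? n + 1 : 1;
    }

    // The type the JIT would guard on, if it is "hot enough".
    public Type? DominantType(double threshold = 0.9)
    {
        int total = _observed.Values.Sum();
        var top = _observed.OrderByDescending(kv => kv.Value).FirstOrDefault();
        return total > 0 && top.Value >= total * threshold ? top.Key : null;
    }
}

class Demo
{
    static void Main()
    {
        var profile = new CallSiteProfile();
        for (int i = 0; i < 99; i++) profile.Record("a string receiver");
        profile.Record(42); // one rare boxed-int receiver
        Console.WriteLine(profile.DominantType()); // System.String
    }
}
```

In the real runtime this data is gathered in-process and discarded when the process exits, matching "collected (and not persisted)" above.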

More context:

User stories:

  • PGO collection tools can be used to collect PGO data from apps in a variety of scenarios: local machine, VMs and containers. (Developers can use PGO collection tools to collect PGO data from apps in a variety of scenarios #5492)
  • People that build .NET from source can use existing PGO data (stored in a repo in the dotnet org) to optimize the .NET runtime and libraries.
  • .NET applications start up and run faster and use less memory, using static PGO data.
    • Ready-to-run images are optimized using PGO data, including when assemblies are trimmed (which currently breaks the existing PGO system).
    • The JIT can use static PGO data, either when ready-to-run images are not available, or when tiering from tier 0 to tier 1. The JIT should produce better code than what is available in ready-to-run images.
  • .NET applications are progressively optimized at runtime, using dynamic PGO data, for example by inlining devirtualized virtual methods. (Developers can progressively optimize their .NET apps at runtime, using dynamic PGO data #5494)
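To make the devirtualization example concrete, here is a minimal C# sketch (all types hypothetical) of the kind of call site this targets: once runtime profiling shows that one implementation dominates, the JIT can guard on that type, devirtualize, and inline the call.

```csharp
using System;

// A virtual (interface) call in a hot loop. With dynamic PGO, tier-0
// instrumentation observes that 'shape' is almost always a Circle, so
// the tier-1 JIT can emit a guarded fast path, conceptually:
//   if (shape is Circle c) { /* inlined Area() */ } else { /* virtual call */ }
// The guard keeps behavior correct if a different type ever shows up.
interface IShape { double Area(); }

sealed class Circle : IShape
{
    public double Radius = 1.0;
    public double Area() => Math.PI * Radius * Radius;
}

class Program
{
    static double SumAreas(IShape shape, int n)
    {
        double total = 0;
        for (int i = 0; i < n; i++)
            total += shape.Area(); // candidate for guarded devirtualization
        return total;
    }

    static void Main() => Console.WriteLine(SumAreas(new Circle(), 1_000_000));
}
```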
richlander added the "Epic" label (groups multiple user stories; can be grouped under a theme) on Oct 29, 2020
richlander changed the title to "Epic: Improve startup and throughput using runtime execution information (PGO)" on Oct 29, 2020
richlander added the "Theme" label (groups multiple epics) and removed the "Epic" label on Oct 29, 2020
richlander changed the title to "Theme: Improve startup and throughput using runtime execution information (PGO)" on Oct 29, 2020
richlander swapped the "Priority:1" label (work that is critical for the release, but we could probably ship without) for "Priority:2" (work that is important, but not critical for the release) on Jan 21, 2021
danmoseley (Member) commented Apr 5, 2021

@billwert @DrewScoggins @davidwrighton will our microbenchmark data continue to be gathered on pre-PGO bits, or on PGOized bits? My assumption is the former, because (1) PGO may introduce unexpected "regressions" and "improvements" when training data is updated, which would hide real regressions (2) PGO is not designed to improve microbenchmarks, in general. cc @stephentoub

I recall from working on VS that when optimization data changed (or grew stale), perf moved around - but in that case, we were measuring E2E scenarios, which is what the optimization was specifically for, and it was valuable to know that, e.g., the data had become stale.

DrewScoggins (Member) commented:

My understanding is that when you do a normal release build you get PGO data applied. This is the pattern that we follow. I know that we do not use the JIT PGO opt-out currently. I know that in the past we have been able to see the signs of stale PGO data in microbenchmark numbers, but I will admit I am not an expert in this space.

danmoseley (Member) commented:

I see. I am interested in others' thoughts.
Cc @jeffschwMSFT @AndyAyersMS

In my mind, it is a non-goal for microbenchmark results to closely track customer experience; rather, the goal is to reliably and promptly detect regressions caused by changes in the implementation, by keeping as many other factors constant as possible. Perhaps I'm thinking about it wrongly.

jeffschwMSFT (Member) commented:

I would expect to use our previous release as a means to detect regression.

danmoseley (Member) commented:

I was talking about day to day measurements, but good point: it's probably inevitable that any GA baseline is PGO bits.

jeffschwMSFT (Member) commented:

Even for day-to-day measurements, I would only measure our shipping scenario (e.g., PGO) and then compare to the previous release for regressions/improvements.

AndyAyersMS (Member) commented:

Ideally we'd track both PGO and non-PGO performance anywhere we care to make measurements. This accomplishes a few things:

  • makes it easier to spot PGO-only regressions/improvements during triage
  • if aggregated, gives us an overall figure of merit for the impact of PGO (which, if stable, can also be used to detect stale PGO)

For local benchmarking disabling PGO may give more reliable results (as PGO data may not apply to changed code).

For BDN style measurements it is simple to disable PGO via a runtime option (COMPlus_JitDisablePgo=1), since all the code we measure is jitted. For end-to-end scenarios it is harder to measure non-PGO; PGO alters the R2R contents and so non-PGO measurements would require a special build.

Given resource constraints we can't measure both PGO and non-PGO all the time. Since the product default will (likely) be to use PGO, we should measure and focus on that as our primary metric, and try to measure non-PGO if/when we can -- maybe once a day or so would be sufficient.
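For the BDN case, a minimal sketch of comparing the two modes side by side, using the COMPlus_JitDisablePgo switch mentioned above via BenchmarkDotNet's per-job environment-variable support (the benchmark body is a placeholder):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

// Two jobs over the same benchmarks: the shipping default (JIT consumes
// static PGO data) and a job with COMPlus_JitDisablePgo=1, so a
// regression can be bucketed as PGO-related or not during triage.
public class PgoComparisonConfig : ManualConfig
{
    public PgoComparisonConfig()
    {
        AddJob(Job.Default.WithId("PGO"));
        AddJob(Job.Default
            .WithEnvironmentVariable("COMPlus_JitDisablePgo", "1")
            .WithId("NoPGO"));
    }
}

[Config(typeof(PgoComparisonConfig))]
public class ParsingBenchmarks
{
    [Benchmark]
    public int ParseInt() => int.Parse("12345"); // placeholder workload
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ParsingBenchmarks>();
}
```

Note this switch only affects jitted code, which is the caveat raised in the next comments.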

danmoseley (Member) commented:

Sounds good. We should maybe document the PGO disable flag in our perf workflow if we don't already.
Cc @adamsitnik

billwert (Member) commented Apr 7, 2021

For BDN style measurements it is simple to disable PGO via a runtime option (COMPlus_JitDisablePgo=1), since all the code we measure is jitted.

This is not true. The vast majority of the code we run through microbenchmark testing with BDN is library code, which is all precompiled and would have PGO applied. We would need to build the product in a release configuration without PGO applied to effectively test it in this manner.

billwert (Member) commented Apr 7, 2021

Perf measurements on non-PGO builds in general are of questionable value to me. For scenario measurements (startup, size, etc.), we count on PGO to fix pages and improve page density. Unoptimized binaries are also going to be somewhat more random day over day as code is added and removed, which can lead to unexplained changes in the non-PGO trend. This decreases the usefulness.

For microbenchmark style testing, non-PGO perhaps makes more sense. (At least until PGO starts doing heavy lifting on block rearrangement, intra-method splitting, etc.) (Maybe it already does?) However, let me pose a question: if the non-PGO trend moves (and it's not due to a random effect) and the PGO trend does not, what do you do? There's nothing to "fix" from the customer perspective.

AndyAyersMS (Member) commented:

The vast majority of the code we run through microbenchmark testing with BDN is library code, which is all precompiled and would have PGO applied

That would be true if we measured BDN perf with TC disabled. But when TC is on, we will only ever be measuring the perf of jitted code.

(At least until PGO starts doing heavy lifting on block rearrangement, intra-method splitting, etc.) (Maybe it already does?)

Block arrangement, yes; splitting, not yet -- though it is opportunistic for 6.0 (but not likely, I would guess).

Perf measurements on non-PGO builds in general are of questionable value to me.

Despite what I wrote above, I agree with that, for non-BDN scenarios. Especially so if we have some independent measure of "work", like instructions retired, to look at (we seem to have lost that ability recently on Windows, and have never had it elsewhere).

For scenario (startup, size, etc) we count on PGO to fix pages and improve page density.

We don't do those things (yet) for R2R images. Hopefully we get to it soon.

let me pose a question: if the non-PGO trend moves (and it's not due to a random effect) and the PGO trend does not, what do you do?

We only use the non-PGO results for diagnostic purposes, which is why we only run them every so often.

If both PGO and non-PGO regress, the issue is likely not a PGO issue (so we look at libraries, JIT, and runtime commits). If only PGO regresses, the issue is with the PGO data or the way the JIT interprets the PGO data (so we look at PGO data age/collection, and JIT commits). If only non-PGO regresses, we don't worry about it as much (provided instructions-retired data can verify that the non-PGO perf difference was purely in clocks and not in instructions).

AndyAyersMS (Member) commented:

Not to muddy the waters any further, but it would also be nice to start measuring dynamic PGO perf on a regular basis. I do this on a somewhat ad-hoc basis currently for parts of TechEmpower.

So I'd suggest:

  • frequent, dense collection of perf for the default case: static PGO. Triaged regularly. (we do this already)
  • infrequent, sparse collection of perf for dynamic PGO. Triaged less regularly. (new)
  • infrequent, sparse collection of perf for non-PGO. Used only during triage of the above two. (new)

billwert (Member) commented Apr 7, 2021

That would be true if we measured BDN perf with TC disabled. But when TC is on, we will only ever be measuring the perf of jitted code.

This is something I didn't appreciate. My belief was that R2R would be used even if TC was on, and that R2R would be the best code we could make. That of course changes things. (Of course that's not true, re: indirections and lots of other things; I just didn't think it through.)

mairaw (Contributor) commented May 26, 2023

Bulk closing .NET 6 epics and user stories. If you think this issue was closed in error, please reopen the issue and update it accordingly.

mairaw closed this as completed May 26, 2023