
Theme: Improve startup and throughput using runtime execution information (PGO) #5491

Closed · 4 tasks

richlander opened this issue Oct 29, 2020 · 14 comments

Labels: bulk-closed · Priority:2 (work that is important, but not critical for the release) · Team:Runtime · Theme (groups multiple epics)

richlander (Member) commented Oct 29, 2020

Profile guided optimization is a standard feature of many development platforms. We would like to make it a standard feature of .NET. It is already a feature of .NET in that the platform is PGO-optimized, but it doesn't extend beyond that. PGO for .NET is a multi-release journey, with the goal of significantly improved startup and throughput performance and working set reduction using both automatic and opt-in mechanisms/scenarios.

Today, both the native runtime and managed libraries are PGO-compiled using training data that we produce ourselves. The native runtime uses standard C++ compiler tools. The managed libraries use a technology called "Instruction Block Count" (IBC). IBC is notoriously hard to use (only a few people on the .NET team can use it effectively). Going forward, we intend to produce new tools that are straightforward to use, both for training (which produces PGO data) and for PGO-optimized compilation (which consumes PGO data).

A key split in the PGO domain is static versus dynamic PGO. We are interested in both, with the following prioritization/staging:

  • Enable optimizing ready-to-run images based on static PGO data.
  • Enable optimizing JITed code based on static PGO data, as a new feature of tiered compilation.
  • Enable optimizing JITed code based on PGO data collected (and not persisted) at runtime.
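As a rough illustration of that last bullet: at tier 0, instrumented code can count which concrete types reach a virtual call site, and the tier-1 JIT can then specialize for the dominant type. A toy C# model of that collection follows; every name here is illustrative (the runtime keeps this data in internal instrumentation schemas, not managed dictionaries):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model: count which concrete types reach a virtual call site.
// If one type dominates, a tier-1 JIT could emit a guarded,
// devirtualized fast path for it. Purely illustrative names.
class CallSiteProfile
{
    private readonly Dictionary<Type, int> _observed = new();

    public void Record(object receiver)
    {
        Type t = receiver.GetType();
        _observed[t] = _observed.TryGetValue(t, out int n) ? n + 1 : 1;
    }

    // The type the JIT would guard on, if it is "hot enough".
    public Type? DominantType(double threshold = 0.9)
    {
        int total = _observed.Values.Sum();
        var top = _observed.OrderByDescending(kv => kv.Value).FirstOrDefault();
        return total > 0 && top.Value >= total * threshold ? top.Key : null;
    }
}

class Demo
{
    static void Main()
    {
        var profile = new CallSiteProfile();
        for (int i = 0; i < 99; i++) profile.Record("a string receiver");
        profile.Record(42); // one rare boxed-int receiver
        Console.WriteLine(profile.DominantType()); // System.String
    }
}
```

In the real runtime this data is gathered in-process and discarded when the process exits, matching "collected (and not persisted)" above.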

More context:

User stories:

  • PGO collection tools can be used to collect PGO data from apps in a variety of scenarios: local machine, VMs and containers. (Developers can use PGO collection tools to collect PGO data from apps in a variety of scenarios #5492)
  • People that build .NET from source can use existing PGO data (stored in a repo in the dotnet org) to optimize the .NET runtime and libraries.
  • .NET applications start up and run faster and use less memory, using static PGO data.
    • Ready-to-run images are optimized using PGO data, including when assemblies are trimmed (which currently breaks the existing PGO system).
    • The JIT can use static PGO data, either when ready-to-run images are not available, or when tiering from tier 0 to tier 1. The JIT should produce better code than what is available in ready-to-run images.
  • .NET applications are progressively optimized at runtime, using dynamic PGO data, for example by inlining devirtualized virtual methods. (Developers can progressively optimize their .NET apps at runtime, using dynamic PGO data #5494)
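To make the devirtualization example concrete, here is a minimal C# sketch (all types hypothetical) of the kind of call site this targets: once runtime profiling shows that one implementation dominates, the JIT can guard on that type, devirtualize, and inline the call.

```csharp
using System;

// A virtual (interface) call in a hot loop. With dynamic PGO, tier-0
// instrumentation observes that 'shape' is almost always a Circle, so
// the tier-1 JIT can emit a guarded fast path, conceptually:
//   if (shape is Circle c) { /* inlined Area() */ } else { /* virtual call */ }
// The guard keeps behavior correct if a different type ever shows up.
interface IShape { double Area(); }

sealed class Circle : IShape
{
    public double Radius = 1.0;
    public double Area() => Math.PI * Radius * Radius;
}

class Program
{
    static double SumAreas(IShape shape, int n)
    {
        double total = 0;
        for (int i = 0; i < n; i++)
            total += shape.Area(); // candidate for guarded devirtualization
        return total;
    }

    static void Main() => Console.WriteLine(SumAreas(new Circle(), 1_000_000));
}
```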
richlander added the "Epic" label (groups multiple user stories; can be grouped under a theme) on Oct 29, 2020
richlander changed the title to "Epic: Improve startup and throughput using runtime execution information (PGO)" on Oct 29, 2020
richlander added the "Theme" label (groups multiple epics) and removed the "Epic" label on Oct 29, 2020
richlander changed the title to "Theme: Improve startup and throughput using runtime execution information (PGO)" on Oct 29, 2020
richlander swapped the "Priority:1" label (work that is critical for the release, but we could probably ship without) for "Priority:2" (work that is important, but not critical for the release) on Jan 21, 2021
danmoseley (Member) commented Apr 5, 2021

@billwert @DrewScoggins @davidwrighton will our microbenchmark data continue to be gathered on pre-PGO bits, or on PGOized bits? My assumption is the former, because (1) PGO may introduce unexpected "regressions" and "improvements" when training data is updated, which would hide real regressions (2) PGO is not designed to improve microbenchmarks, in general. cc @stephentoub

I recall from working on VS that when optimization data changed (or grew stale), perf moved around - but in that case, we were measuring E2E scenarios, which is what the optimization was specifically for, and it was valuable to know that, e.g., the data had become stale.

DrewScoggins (Member) commented:

My understanding is that when you do a normal release build you get PGO data applied. This is the pattern that we follow. I know that we do not use the JIT PGO opt-out currently. I know that in the past we have been able to see the signs of stale PGO data in microbenchmark numbers, but I will admit I am not an expert in this space.

danmoseley (Member) commented:

I see. I am interested in others' thoughts.
Cc @jeffschwMSFT @AndyAyersMS

In my mind, it is a non-goal for microbenchmark results to closely track customer experience; rather, the goal is to reliably and promptly detect regressions caused by changes in the implementation, by keeping as many other factors constant as possible. Perhaps I'm thinking about it wrongly.

jeffschwMSFT (Member) commented:

I would expect to use our previous release as a means to detect regression.

danmoseley (Member) commented:

I was talking about day to day measurements, but good point: it's probably inevitable that any GA baseline is PGO bits.

jeffschwMSFT (Member) commented:

Even for day-to-day measurements, I would only measure our shipping scenario (e.g., PGO) and then compare to the previous release for regressions/improvements.

AndyAyersMS (Member) commented:

Ideally we'd track both PGO and non-PGO performance anywhere we care to make measurements. This accomplishes a few things:

  • makes it easier to spot PGO-only regressions/improvements during triage
  • if aggregated, gives us an overall figure of merit for the impact of PGO (which, if stable, can also be used to detect stale PGO)

For local benchmarking disabling PGO may give more reliable results (as PGO data may not apply to changed code).

For BDN style measurements it is simple to disable PGO via a runtime option (COMPlus_JitDisablePgo=1), since all the code we measure is jitted. For end-to-end scenarios it is harder to measure non-PGO; PGO alters the R2R contents and so non-PGO measurements would require a special build.

Given resource constraints we can't measure both PGO and non-PGO all the time. Since the product default will (likely) be to use PGO, we should measure and focus on that as our primary metric, and try to measure non-PGO if/when we can -- maybe once a day or so would be sufficient.
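For the BDN case, a minimal sketch of comparing the two modes side by side, using the COMPlus_JitDisablePgo switch mentioned above via BenchmarkDotNet's per-job environment-variable support (the benchmark body is a placeholder):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

// Two jobs over the same benchmarks: the shipping default (JIT consumes
// static PGO data) and a job with COMPlus_JitDisablePgo=1, so a
// regression can be bucketed as PGO-related or not during triage.
public class PgoComparisonConfig : ManualConfig
{
    public PgoComparisonConfig()
    {
        AddJob(Job.Default.WithId("PGO"));
        AddJob(Job.Default
            .WithEnvironmentVariable("COMPlus_JitDisablePgo", "1")
            .WithId("NoPGO"));
    }
}

[Config(typeof(PgoComparisonConfig))]
public class ParsingBenchmarks
{
    [Benchmark]
    public int ParseInt() => int.Parse("12345"); // placeholder workload
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ParsingBenchmarks>();
}
```

Note this switch only affects jitted code, which is the caveat raised in the next comments.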

danmoseley (Member) commented:

Sounds good. We should maybe document the PGO disable flag in our perf workflow if we don't already.
Cc @adamsitnik

billwert (Member) commented Apr 7, 2021

For BDN style measurements it is simple to disable PGO via a runtime option (COMPlus_JitDisablePgo=1), since all the code we measure is jitted.

This is not true. The vast majority of the code we run through microbenchmark testing with BDN is library code, which is all precompiled and would have PGO applied. We would need to build the product in a release configuration without PGO applied to effectively test it in this manner.

billwert (Member) commented Apr 7, 2021

Perf measurements on non-PGO builds in general are of questionable value to me. For scenario measurements (startup, size, etc.), we count on PGO to fix pages and improve page density. Unoptimized binaries are also going to be somewhat more random day over day as code is added and removed, which can lead to unexplained changes in the non-PGO trend. This decreases the usefulness.

For microbenchmark style testing, non-PGO perhaps makes more sense. (At least until PGO starts doing heavy lifting on block rearrangement, intra-method splitting, etc.) (Maybe it already does?) However, let me pose a question: if the non-PGO trend moves (and it's not due to a random effect) and the PGO trend does not, what do you do? There's nothing to "fix" from the customer perspective.

AndyAyersMS (Member) commented:

The vast majority of the code we run through microbenchmark testing with BDN is library code, which is all precompiled and would have PGO applied

That would be true if we measured BDN perf with TC disabled. But when TC is on, we will only ever be measuring the perf of jitted code.

(At least until PGO starts doing heavy lifting on block rearrangement, intra-method splitting, etc.) (Maybe it already does?)

Block arrangement, yes; splitting, not yet -- though it is opportunistic for 6.0 (but not likely, I would guess).

Perf measurements on non-PGO builds in general are of questionable value to me.

Despite what I wrote above, I agree with that, for non-BDN scenarios. Especially so if we have some independent measure of "work", like instructions retired, to look at (we seem to have lost that ability recently on Windows, and have never had it elsewhere).

For scenario (startup, size, etc) we count on PGO to fix pages and improve page density.

We don't do those things (yet) for R2R images. Hopefully we get to it soon.

let me pose a question: if the non-PGO trend moves (and it's not due to a random effect) and the PGO trend does not, what do you do?

We only use the non-PGO results for diagnostic purposes, which is why we only run them every so often.

If both PGO and non-PGO regress, the issue is likely not a PGO issue (so we look at libraries, JIT, and runtime commits). If only PGO regresses, the issue is with the PGO data or the way the JIT interprets the PGO data (so we look at PGO data age/collection, and JIT commits). If only non-PGO regresses, we don't worry about it as much (provided instructions-retired data can verify that the non-PGO perf difference was purely in clocks and not in instructions).

AndyAyersMS (Member) commented:

Not to muddy the waters any further, but it would also be nice to start measuring dynamic PGO perf on a regular basis. I do this on a somewhat ad-hoc basis currently for parts of TechEmpower.

So I'd suggest:

  • frequent, dense collection of perf for the default case: static PGO. Triaged regularly. (we do this already)
  • infrequent, sparse collection of perf for dynamic PGO. Triaged less regularly. (new)
  • infrequent, sparse collection of perf for non-PGO. Used only during triage of the above two. (new)

billwert (Member) commented Apr 7, 2021

That would be true if we measured BDN perf with TC disabled. But when TC is on, we will only ever be measuring the perf of jitted code.

This is something I didn't appreciate. My belief was that R2R would be used even if TC was on, and that R2R would be the best code we could make. That of course changes things. (Of course that's not true, re: indirections and lots of other things; I just didn't think it through.)

mairaw (Contributor) commented May 26, 2023

Bulk closing .NET 6 epics and user stories. If you think this issue was closed in error, please reopen the issue and update it accordingly.

mairaw closed this as completed May 26, 2023