Theme: Improve startup and throughput using runtime execution information (PGO) #5491
@billwert @DrewScoggins @davidwrighton will our microbenchmark data continue to be gathered on pre-PGO bits, or on PGO-ized bits? My assumption is the former, because (1) PGO may introduce unexpected "regressions" and "improvements" when training data is updated, which would hide real regressions, and (2) PGO is not, in general, designed to improve microbenchmarks. cc @stephentoub I recall from working on VS that when optimization data changed (or grew stale), perf moved around; but in that case we were measuring E2E scenarios, which is what the optimization was specifically for, and it was valuable to know that, e.g., the data had become stale.
My understanding is that when you do a normal release build, you get PGO data applied. This is the pattern that we follow, and I know that we do not currently use the JIT PGO opt-out. In the past we have been able to see the signs of stale PGO data in microbenchmark numbers, but I will admit I am not an expert in this space.
I see. I am interested in others' thoughts. In my mind it is a non-goal for microbenchmark results to closely track customer experience; rather, the goal is to reliably and promptly detect regressions caused by changes in the implementation, by keeping as many other factors constant as possible. Perhaps I'm thinking about it wrongly.
I would expect to use our previous release as a means to detect regression.
I was talking about day-to-day measurements, but good point: it's probably inevitable that any GA baseline is PGO bits.
Even for day-to-day, I would only measure our shipping scenario (e.g., PGO) and then compare to the previous release for regressions/improvements.
Ideally we'd track both PGO and non-PGO performance anywhere we care to make measurements. This accomplishes a few things:

- For local benchmarking, disabling PGO may give more reliable results (as PGO data may not apply to changed code).
- For BDN-style measurements it is simple to disable PGO via a runtime option (…).

Given resource constraints we can't measure both PGO and non-PGO all the time. Since the product default will (likely) be to use PGO, we should measure and focus on that as our primary metric, and try to measure non-PGO if/when we can; maybe once a day or so would be sufficient.
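For illustration, here is a minimal BenchmarkDotNet sketch of this kind of side-by-side run. The exact option name is elided in the comment above, so the environment variables used here (`DOTNET_TieredPGO` and `DOTNET_ReadyToRun`; on older runtimes the prefix is `COMPlus_`) are assumptions about which knobs were meant, and the benchmark itself is just a placeholder:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

// Runs each benchmark twice: once with product defaults, once with
// PGO-related knobs turned off, so the two trends can be compared.
public class PgoComparisonConfig : ManualConfig
{
    public PgoComparisonConfig()
    {
        AddJob(Job.Default.WithId("Default"));
        AddJob(Job.Default
            .WithEnvironmentVariable("DOTNET_TieredPGO", "0")   // assumed knob: dynamic PGO off
            .WithEnvironmentVariable("DOTNET_ReadyToRun", "0")  // assumed knob: skip precompiled (statically optimized) code
            .WithId("NoPgo"));
    }
}

[Config(typeof(PgoComparisonConfig))]
public class StringBenchmarks
{
    [Benchmark]
    public string Concat() => string.Concat("a", "b", "c");
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<StringBenchmarks>();
}
```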
Sounds good. We should maybe document the PGO disable flag in our perf workflow if we don't already.
This is not true. The vast majority of the code we run through microbenchmark testing with BDN is library code, which is all precompiled and would have PGO applied. We would need to build the product in a release configuration without PGO applied to effectively test it in this manner.
Perf measurements on non-PGO builds are, in general, of questionable value to me. For scenarios (startup, size, etc.) we count on PGO to fix pages and improve page density. Unoptimized binaries are also going to be somewhat more random day over day as code is added and removed, which can lead to unexplained changes in the non-PGO trend; this decreases its usefulness. For microbenchmark-style testing, non-PGO perhaps makes more sense, at least until PGO starts doing heavy lifting on block rearrangement, intra-method splitting, etc. (Maybe it already does?) However, let me pose a question: if the non-PGO trend moves (and it's not due to a random effect) and the PGO trend does not, what do you do? There's nothing to "fix" from the customer perspective.
That would be true if we measured BDN perf with TC disabled. But when TC is on, we will only ever be measuring the perf of jitted code.
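For reference, a sketch of what a "TC disabled" BDN job could look like, assuming the standard `DOTNET_TieredCompilation` switch:

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// With tiering off there is no rejit: methods that have ReadyToRun bodies keep
// running that precompiled (statically PGO-optimized) code, and everything else
// is jitted once at full optimization. So BDN would time the R2R code rather
// than tier-1 jitted code.
public class NoTieringConfig : ManualConfig
{
    public NoTieringConfig()
    {
        AddJob(Job.Default
            .WithEnvironmentVariable("DOTNET_TieredCompilation", "0")
            .WithId("NoTC"));
    }
}
```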
Block arrangement, yes; splitting, not yet, though it is opportunistic for 6.0 (but not likely, I would guess).
Despite what I wrote above, I agree with that for non-BDN scenarios, especially if we have some independent measure of "work", like instructions retired, to look at (we seem to have lost that ability recently on Windows, and have never had it elsewhere).
We don't do those things (yet) for R2R images. Hopefully we get to it soon.
We only use the non-PGO results for diagnostic purposes, which is why we only run them every so often. If both PGO and non-PGO regress, the issue is likely not a PGO issue (so we look at libraries, jit, and runtime commits). If only PGO regresses, the issue is with the PGO data or the way the jit interprets the PGO data (so we look at PGO data age/collection and jit commits). If only non-PGO regresses, we don't worry about it as much (provided instructions-retired data can verify the non-PGO perf difference was purely in clocks and not in instructions).
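For clarity, the same triage rules restated as a tiny decision procedure (purely illustrative; the names are made up):

```csharp
// Illustrative only: encodes the triage logic described in the comment above.
public static class PgoTriage
{
    public static string Diagnose(bool pgoRegressed, bool nonPgoRegressed) =>
        (pgoRegressed, nonPgoRegressed) switch
        {
            (true, true)   => "Likely not PGO: look at libraries, jit, and runtime commits.",
            (true, false)  => "PGO data or its interpretation: check data age/collection and jit commits.",
            (false, true)  => "Lower priority: use instructions retired to check whether the difference is clocks-only.",
            (false, false) => "No regression."
        };
}
```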
Not to muddy the waters any further, but it would also be nice to start measuring dynamic PGO perf on a regular basis. I do this on a somewhat ad hoc basis currently for parts of TechEmpower. So I'd suggest:
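For reference, a sketch of how dynamic PGO could be switched on for such measurements in the .NET 6 timeframe, using the documented `DOTNET_TieredPGO`, `DOTNET_TC_QuickJitForLoops`, and `DOTNET_ReadyToRun` settings; a BDN-style harness is assumed here:

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Approximates the "full" dynamic PGO configuration of the .NET 6 era:
// instrument tier-0 code (including methods with loops) and skip precompiled
// code so the JIT can optimize from observed execution profiles.
public class DynamicPgoConfig : ManualConfig
{
    public DynamicPgoConfig()
    {
        AddJob(Job.Default
            .WithEnvironmentVariable("DOTNET_TieredPGO", "1")            // enable dynamic PGO
            .WithEnvironmentVariable("DOTNET_TC_QuickJitForLoops", "1")  // tier (and instrument) methods with loops too
            .WithEnvironmentVariable("DOTNET_ReadyToRun", "0")           // jit everything so it can be instrumented
            .WithId("DynamicPGO"));
    }
}
```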
This is something I didn't appreciate. My belief was that R2R would be used even if TC was on, and that R2R would be the best code we could make. That of course changes things. (Of course that's not true, re: indirections and lots of other things. I just didn't think it through.)
Bulk closing .NET 6 epics and user stories. If you think this issue was closed in error, please reopen the issue and update it accordingly.
Profile-guided optimization (PGO) is a standard feature of many development platforms, and we would like to make it a standard feature of .NET. It is already a feature of .NET in the sense that the platform itself is PGO-optimized, but it doesn't extend beyond that. PGO for .NET is a multi-release journey, with the goal of significantly improved startup and throughput performance and reduced working set, using both automatic and opt-in mechanisms/scenarios.
Today, both the native runtime and the managed libraries are PGO-compiled using training data that we produce ourselves. The native runtime uses standard C++ compiler tools. The managed libraries use a technology called "Instruction Block Count" (IBC). IBC is notoriously hard to use (only a few people on the .NET team can use it effectively). Going forward, we intend to produce new tools that are straightforward to use, for both training (producing PGO data) and PGO-optimized compilation (consuming PGO data).
A key split in the PGO domain is static versus dynamic PGO. We are interested in both, with the following prioritization/staging:
More context:
User stories: