JIT: Enable edge based profiles for all scenarios #80481
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
@EgorBo PTAL. Expect sizeable diffs in ASP.NET.
Nice diffs for the aspnet collection! Btw, perhaps it makes sense to enable Dynamic PGO for the coreclr_tests.run collection. I'll take a look tomorrow if you don't mind; I'm OOF today.
/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo, runtime-coreclr pgostress
Azure Pipelines successfully started running 3 pipeline(s).
Sure. It might be easier to ignore whitespace when you look at the diffs, as I moved and re-indented some of the block profiling code.
Extended pgo testing is seeing some failures:
@AndyAyersMS So, for easier testing, we can always use optimizations in the instrumentation tier, even if we rejit just IL-only tier0 (or make it stress-mode based).
Probably a good stress mode to add, yeah.
Recent failures all look like infra issues; retrying.
@EgorBo ping
Played with your branch locally and it looks great!
Wow, nice wins! Block counters seem to be expensive.
Enable edge based profiles for OSR, partial compilation, and optimized plus instrumented cases.
For OSR this requires deferring flow graph modifications until after we have built the initial probe list, so that the initial list reflects the entirety of the method. This set of candidate edge probes is thus the same no matter how the method is compiled. A given compile may schematize a subset of these probes and materialize a subset of what gets schematized; this is tolerated by the PGO mechanism provided that the initial instrumented jitting produces a schema which is a superset of the schema produced by any subsequent instrumented rejitting. This is normally the case.
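The superset requirement can be illustrated with a small model. This is a minimal sketch, not the JIT's PGO code: it assumes a schema can be modeled as a set of edge probes keyed by source and target IL offsets, and the names `EdgeProbe` and `schema_is_compatible` are hypothetical. It shows why building the candidate list before any flow graph modification matters: an OSR compile that schematizes only part of the method stays compatible, while a schema naming an edge that exists only after modification does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeProbe:
    """Hypothetical probe key: a flow edge named by its source/target IL offsets."""
    source_il: int
    target_il: int


def schema_is_compatible(initial: frozenset, rejit: frozenset) -> bool:
    """True if every probe in a later instrumented rejit's schema already
    appears in the schema produced by the initial instrumented jitting."""
    return rejit <= initial


# Candidate probes built from the whole method, before any flow graph changes.
# Edges: 0->10 (entry), 10->20, 20->10 (loop back edge), 20->30 (loop exit).
initial_schema = frozenset({
    EdgeProbe(0, 10), EdgeProbe(10, 20), EdgeProbe(20, 10), EdgeProbe(20, 30),
})

# A hypothetical OSR compile entering in the loop schematizes only a subset.
osr_schema = frozenset({EdgeProbe(10, 20), EdgeProbe(20, 10), EdgeProbe(20, 30)})

# A schema built after flow graph modification might name an edge
# (here 5->20) that the initial schema never contained.
modified_schema = frozenset({EdgeProbe(5, 20), EdgeProbe(20, 30)})

print(schema_is_compatible(initial_schema, osr_schema))       # True
print(schema_is_compatible(initial_schema, modified_schema))  # False
```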
Partial compilation may still need some work to ensure full schematization, but it is currently off by default. I will address this in a follow-up.
For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind of probe relocation abilities that we have in the BlockCountInstrumentor. In particular we need to move probes that might appear in return blocks that follow implicit tail call blocks, since those return blocks must remain empty.
The details on how we do this are a bit different but the idea is the same: we create duplicate copies of any probe that was going to appear in the return block and instead instrument each pred. If the pred reached the return via a critical edge, we split the edge and put the probe there. This analysis relies on cheap preds, so to ensure we can use them we move all the critical edge splitting so it happens before we need the cheap pred lists.
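The relocation idea can be sketched on a toy flow graph. This is a hypothetical illustration under stated assumptions, not the EfficientEdgeCountInstrumentor implementation: the graph is a dict mapping block names to successor lists, and `relocate_return_probes`, the block names, and the `"pred->ret"` naming of split blocks are all invented here. A pred whose only successor is the return block gets its own copy of the probe; a pred with other successors gets a new block on the split edge, which holds the copy instead.

```python
from collections import defaultdict


def relocate_return_probes(succs, ret_block, probe):
    """Hypothetical sketch: keep `ret_block` empty by giving each predecessor
    its own copy of `probe`, splitting the edge when the pred has other
    successors. Mutates and returns `succs`, plus a block -> probes map."""
    # Pred lists are computed once, up front (parallel edges de-duplicated).
    preds = defaultdict(list)
    for block, targets in succs.items():
        for target in dict.fromkeys(targets):
            preds[target].append(block)

    placements = defaultdict(list)
    for pred in preds[ret_block]:
        if len(succs[pred]) == 1:
            # pred always flows to the return, so its count equals the edge count.
            placements[pred].append(probe)
        else:
            # With a shared return block this is a critical edge: split it
            # and put the probe copy in the new block.
            split = f"{pred}->{ret_block}"
            succs[pred] = [split if t == ret_block else t for t in succs[pred]]
            succs[split] = [ret_block]
            placements[split].append(probe)
    return succs, dict(placements)


# B1 ends in an implicit tail call and falls into RET; B2 branches to RET or
# B3, so B2->RET is a critical edge.
cfg = {"B1": ["RET"], "B2": ["RET", "B3"], "B3": [], "RET": []}
cfg, probes = relocate_return_probes(cfg, "RET", "probe#7")
print(probes)     # {'B1': ['probe#7'], 'B2->RET': ['probe#7']}
print(cfg["B2"])  # ['B2->RET', 'B3']
```

Note that after the split, the `preds` map computed inside the function is stale (it still lists B2 as a pred of RET). That hints at why it helps to finish all edge splitting before relying on pred lists.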
The ability to do block profiling is retained but will no longer be used without special config settings.
There were also a few bug fixes in the spanning tree visitor. It must visit a superset of the blocks we end up importing and was missing visits in some cases.
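To make the superset requirement concrete, here is a hypothetical sketch of a spanning tree visitor in the general efficient-edge-profiling style: only edges that are not on the spanning tree get counters, and tree-edge counts are recovered afterwards from flow conservation. The function names, the dict-based graph, and the use of an exception handler entry as an extra root are assumptions for illustration; the PR does not say which cases were missing. The sketch shows how a visitor that starts only from the method entry can fail to see an imported block.

```python
def spanning_tree_probes(succs, roots):
    """Hypothetical sketch: DFS from each root, building a spanning forest.
    Returns (visited blocks, non-tree edges that would get counters)."""
    visited = set()
    probes = []
    for root in roots:
        if root in visited:
            continue
        visited.add(root)
        stack = [root]
        while stack:
            block = stack.pop()
            for target in succs.get(block, []):
                if target in visited:
                    probes.append((block, target))  # non-tree edge: instrument it
                else:
                    visited.add(target)  # tree edge: no counter needed
                    stack.append(target)
    return visited, probes


def covers_imported(visited, imported):
    """The visitor must see every block the importer will import."""
    return set(imported) <= visited


# B0 -> B1, a loop B1 <-> B2, exit B1 -> B3, and a handler H (reachable only
# via an exception) that flows to B3. Every block here is imported.
cfg = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B1"], "H": ["B3"], "B3": []}

entry_only, _ = spanning_tree_probes(cfg, ["B0"])
print(covers_imported(entry_only, cfg))  # False: H was never visited

visited, probes = spanning_tree_probes(cfg, ["B0", "H"])
print(covers_imported(visited, cfg))  # True
print(probes)  # [('B2', 'B1'), ('H', 'B3')]
```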
This should improve jit time and code quality for instrumented code.
Fixes #47942.
Fixes #66101.
Contributes to #74873.