[mono][interp] Add tiering within interpreter #68823
Tagging subscribers to this area: @BrzVlad
Compile code initially without optimizations, recompile methods with optimizations enabled once we reach a certain threshold.
just checking CI for now
Contributes to #63809
For a single MonoMethod*, we can have two InterpMethod* instances, one with the optimized flag set to false and the other with it set to true. When tiering is enabled, the first time we get an InterpMethod* for a MonoMethod* we set the optimized flag to false. When generating code for this method, if optimized is false we must emit a special MINT_TIER_ENTER_METHOD opcode at the start, and later in the codegen process we skip applying optimizations to the method code.
The MINT_TIER_ENTER_METHOD opcode is executed at every method start and bumps a counter. Once we hit the limit, the method is tiered up. This process consists of creating a new InterpMethod* instance which has optimized set and storing it in the interp_code_hash, replacing the old mapping for the MonoMethod*. The optimized and unoptimized methods use the same argument space, so tiering the method up only requires setting the ip to the start of the tiered up method code.
An additional problem with tiering is that we have to replace all instances of the untiered method in generated code. InterpMethod* instances are stored inside the data_items of other methods and also inside vtables. When generating code for any method, we have to store in a hash table mappings from the untiered InterpMethod* instance to the addresses where this instance was stored. When we tier up the unoptimized method, we traverse the list of addresses where this reference is stored and update each one to the optimized version.
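The counter and the reference patching described above can be sketched roughly as follows. This is a simplified model, not the interpreter's actual code: `Method`, `PatchSites`, `record_site`, `tier_enter_method`, and the `TIER_THRESHOLD` value are all hypothetical names and numbers chosen for illustration, a fixed array stands in for the hash table, and the optimized instance is passed in rather than created on demand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for InterpMethod; not the real layout. */
typedef struct Method {
    int optimized;        /* 0 = untiered, 1 = tiered up */
    unsigned entry_count; /* bumped on every entry to untiered code */
} Method;

#define TIER_THRESHOLD 3 /* assumed limit, for illustration only */
#define MAX_SITES 8

/* Addresses (e.g. data_items or vtable slots) that hold the untiered method.
 * The real code keeps these in a hash table keyed by the untiered instance. */
typedef struct PatchSites {
    Method **sites[MAX_SITES];
    size_t count;
} PatchSites;

/* Called during codegen whenever the untiered method is stored somewhere. */
static void record_site(PatchSites *ps, Method **slot)
{
    assert(ps->count < MAX_SITES);
    ps->sites[ps->count++] = slot;
}

/* Models the enter-method opcode: bumps the counter and, once the limit is
 * hit, patches every recorded slot. Returns the method whose code should run. */
static Method *tier_enter_method(Method *m, Method *opt, PatchSites *ps)
{
    if (m->optimized)
        return m;
    if (++m->entry_count < TIER_THRESHOLD)
        return m;
    for (size_t i = 0; i < ps->count; i++)
        *ps->sites[i] = opt; /* replace the untiered reference in place */
    ps->count = 0;
    return opt;
}
```

With a threshold of 3, the first two entries keep running the untiered method and leave the recorded slots alone; the third entry rewrites every recorded slot and returns the optimized instance.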
c1cfa79 to d08542b
Some optimizations might not be enabled by default, so add an option to enable them.
In unoptimized code, we add a patchpoint instruction when entering basic blocks that are targets of backward branches, if the stack state is empty. This means that when tiering up a frame we just need to jump to the equivalent basic block in the tiered up method and execution can continue, since the arguments and IL locals reside in the same space in both versions of the method (their offsets are computed in interp_method_compute_offsets).
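A minimal sketch of what a patchpoint does to a running frame, under simplifying assumptions: `Code`, `Frame`, `patchpoint_tier_up`, and the per-bblock offset table are hypothetical and only model the idea that both versions of a method expose the same basic blocks and the same locals layout.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BBLOCKS 4

/* Hypothetical code descriptor: start offset of each basic block,
 * indexed by a bblock index shared by both versions of the method. */
typedef struct Code {
    int bblock_offset[NUM_BBLOCKS];
} Code;

typedef struct Frame {
    const Code *code; /* version of the method currently executing */
    int ip;           /* offset into that version's code */
    int locals[4];    /* args and IL locals: same offsets in both versions */
} Frame;

/* Patchpoint at the start of bblock bb_index (stack state is empty):
 * switch the frame to the optimized code at the equivalent bblock. */
static void patchpoint_tier_up(Frame *f, const Code *optimized, int bb_index)
{
    assert(bb_index >= 0 && bb_index < NUM_BBLOCKS);
    f->code = optimized;
    f->ip = optimized->bblock_offset[bb_index];
    /* locals are left untouched: no copying or remapping is needed */
}
```

Because the execution stack is empty at the patchpoint and the locals occupy the same slots, the only frame state that changes in this model is the code pointer and the ip.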
We always take the jit_mm lock when finishing compilation of a method, so use it also for publishing InterpMethod* fields. This also prevents weird races where the method can be tiered up before we take the jit_mm lock, resulting in publishing the seq_points for the untiered method.
Once we emit a tailcall, execution in the current bblock is finished.
We were doing unsigned conversion before
d08542b to 197a847
/azp run runtime-wasm, runtime-extra-platforms
Azure Pipelines successfully started running 2 pipeline(s).
f11e710 to 58ee025
… enabled
When invoking these clauses we obtain the InterpMethod* from the MonoMethod* and make use of the jit info stored during frame unwinding. However, the method might have been tiered up since the jit info was stored, so the native offsets stored there would no longer be relative to the optimized imethod. Fetch the MonoJitInfo* again from the imethod that we will be executing.
58ee025 to 1cd24fc
This seems stable and ready for review. I tried running some benchmarks from our own repo and didn't notice any difference. I'm not sure, however, whether there was supposed to be any difference there, given that they run code in a loop, where unoptimized compilation wouldn't make an impact. I did run the default blazor wasm sample, however, and tiering gave a great startup improvement (around 20%). I measured it by doing performance profiling in Chrome and checking the scripting time, which was most of the time spent in startup.
I would agree with @SamMonoRT that it would be easier to merge with tiering enabled, have more iterations of test suites and benchmarking done automatically, and unset it as default (maybe just for the release) if any problems arise. @lewing Any thoughts on how to proceed with this? Do you know if blazor runs would pick up these changes relatively soon?
/azp run runtime-wasm, runtime-extra-platforms
Azure Pipelines successfully started running 2 pipeline(s).
I think we should merge it after the p5 branch and keep an eye out for regressions.
1cd24fc to e1d772e
This reverts commit 962a455. Prompted by dotnet#69864