-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enregister EH var that are single def #47307
Conversation
Below is the superpmi diffs for Windows x64/x86/arm64. I investigated some of the methods that showed bigger regression and concluded that either they are existing weakness of register allocation or a flavor of resolving edge (e.g. #47194). Windows x64 (libraries-pmi)
Details
Details
Windows x64 (tests)
Details
Details
Windows x86 (libraries-pmi)
Details
Details
Windows x86 (tests)
Details
Details
Windows arm64 (libraries-pmi)
Details
Details
Windows arm64 (tests)
Details
Details
|
@dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as a prerequisite here we should look for or build up some basic async perf tests, so we can get an idea of the impact.
Also am curious if the manual EH Write Thru that @benaadams did in key async methods (see eg #9514 and dotnet/coreclr#15629) can be undone at this stage or requires something more general. And if it can be undone, do we see any perf impact one way or the other?
src/coreclr/jit/lclvars.cpp
Outdated
@@ -2587,14 +2587,16 @@ void Compiler::lvaSetVarLiveInOutOfHandler(unsigned varNum) | |||
{ | |||
noway_assert(lvaTable[i].lvIsStructField); | |||
lvaTable[i].lvLiveInOutOfHndlr = 1; | |||
if (!lvaEnregEHVars) | |||
// For now, only enregister an EH Var if it is a single def. | |||
if (!lvaEnregEHVars || !lvaTable[i].lvSingleDef) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How trustworthy is lvSingleDef
?
I see it being set over in lvaMarkLclRefs
and that should have run not too long before LSRA if I remember correctly. Would be good to verify this.
I suspect it may not be entirely accurate (it may have both false positive and false negatives). So long as we're only using it as a profitability screen here that's probably ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - Even Carol mentioned it and I tried looking around, but didn't find much clue that proves it might not be safe. I will do some more research and sync up offline with you if any questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that it is only set for compiler temps (lvTemp) and it should always be correct for those variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that was the original implementation, but it has been extended to in fgFindJumpTargets to handle additional "root method locals"
// Now that we've seen the IL, set lvSingleDef for root method
// locals.
Most prominent benefiter of this change is |
Lines 28 to 38 in 6cf1b8e
and runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs Lines 156 to 173 in 6cf1b8e
Have mitigations for EH de-enregistering; so hopefully they can removed (and may hurt if they aren't) Should be able to find them all the methods by searching for |
The code in |
There was an issue in how we were setting AsyncMethodBuilderCore.StartBefore:
After:
Calculate benchmark codeprivate const int OperationsPerInvoke = 10000;
[Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
public async Task<int> Calculate()
{
int result = 0;
for (var i = 0; i < OperationsPerInvoke; i++)
{
result += await Task.Run(() => 1);
}
return result;
} Asmdiff: https://www.diffchecker.com/fOIS7SHG ExecutionContext.RunInternalBefore
After
Run benchmark codeprivate const int OperationsPerInvoke = 1000;
[Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
public void Run()
{
var ec = ExecutionContext.Capture();
for (var i = 0; i < OperationsPerInvoke; i++)
{
ExecutionContext.Run(ec, (o) => { }, null);
}
} Asmdiff: https://www.diffchecker.com/U8yRJWy8 Note: "After" asm include changes in jit + reverting the changes in FYI - @stephentoub |
@AndyAyersMS - if you agree with the benchmark code that I used above, I will add it to our Microbenchmarks. |
src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs
Show resolved
Hide resolved
Nice that's what I was hoping the C# code could look like |
1f6287c
to
1ab5263
Compare
1ab5263
to
b44d76d
Compare
3328ef1
to
6d1a7c1
Compare
301bed4
to
bf8e72b
Compare
bf8e72b
to
24b0af6
Compare
src/coreclr/jit/compiler.h
Outdated
@@ -454,6 +454,10 @@ class LclVarDsc | |||
// before lvaMarkLocalVars: identifies ref type locals that can get type updates | |||
// after lvaMarkLocalVars: identifies locals that are suitable for optAddCopies | |||
|
|||
unsigned char lvEhWriteThruCandidate : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Add comments.
src/coreclr/scripts/superpmi.py
Outdated
@@ -1565,7 +1565,7 @@ def replay_with_asm_diffs(self): | |||
"COMPlus_JitDiffableDasm": "1", | |||
"COMPlus_JitEnableNoWayAssert": "1", | |||
"COMPlus_JitNoForceFallback": "1", | |||
"COMPlus_JitDisasmWithGC": "1" } | |||
"COMPlus_JitDisasmWithGC": "0" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Revert this change.
I tried looking into some benchmarks that either has exception handling or use async/await and here are some of the results of % changes from this PR.
|
Benchmark numbers appreciated and encouraging. I'm still seeing a write thru failure in the weekend experimental CI runs, is this something you've looked into: https://dev.azure.com/dnceng/public/_build/results?buildId=1026714&view=ms.vss-test-web.build-test-results-tab ? |
Yes, that is addressed in ba57957. Basically, the failure is caused because one of the SIMD EH variable gets marked enregistered and we try to save/restore upper vector in a |
Final CodeSize and PerfScore diffs: Windows x64 (libraries-spmi)
Detail diffs
Detail diffs
Windows x64 (test-pmi)
Detail diffs
Detail diffs
Windows x64 (benchmarks)
Detail diffs
Detail diffs
Windows arm64 (libraries-spmi)
Detail diffs
Detail diffs
Windows arm64 (tests-spmi)
Detail diffs
Detail diffs
Windows arm64 (benchmarks)
Detail diffs
Detail diffs
|
Linux x64 (libraries-spmi)
Detail diffs
Detail diffs
Linux arm64 (libraries-spmi)
Detail diffs
Detail diffs
Windows x86 (libraries-spmi)
Detail diffs
Detail diffs
Windows x86 (tests-spmi)
Detail diffs
Detail diffs
Windows x86 (benchmarks)
Detail diffs
Detail diffs
Linux arm (libraries-spmi)
Detail diffs
Detail diffs
|
Takeaway:
|
075bca4
to
fc6823e
Compare
@AndyAyersMS - This should be ready for review. Once #49286 is merged, we should be green on jitstressregs pipelines for EH Write thru. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions and notes on comments.
src/coreclr/jit/lclvars.cpp
Outdated
bool needsExplicitZeroInit = fgVarNeedsExplicitZeroInit(lclNum, bbInALoop, bbIsReturn); | ||
|
||
if ((varDsc->lvEhWriteThruCandidate == true) || (needsExplicitZeroInit == true) || | ||
(tree->gtFlags & GTF_COLON_COND) || (tree->gtFlags & GTF_VAR_USEASG)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add comment on why these cause exclusion?
USEASG
I can guess, at though it seems it would be redundant with restricting to single def?
COLON_COND
is a mystery to me, why does it matter? It goes away during morph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I just copied the conditions of USEASG
and COLON_COND
from lvSingleDef
logic above. Yes, I don't think we need it specially because we already check for varDsc->lvEhWriteThruCandidate
.
// that was set by past phases. | ||
if (!isRecompute) | ||
{ | ||
varDsc->lvSingleDef = varDsc->lvIsParam; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this -- if we dead store something we can turn it from multiple def to single def. I suppose that won't happen for live into handler vars but it can for other vars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that won't happen for live into handler vars but it can for other vars.
I am not 100% sure if that is the case.
Currently, in the PR, I am marking lvEhWriteThruCandidate
in one of the early phase of lvaMarkLocalVars()
. I have to do it because that information is used during SSABuilder/Lowering to determine if a varDsc
should be enregistered or not. The weird part of this logic for lvaSetVarLiveInOutOfHandler()
is that during SSABuilder, we might mark a varDsc->lvDoNotEnregister=1
but later during Lowering, if things change and we want to reset lvDoNotEnregister
to enregister that EH variable, currently we don't do that. That should be fixed IMO.
As per your suggestion, I should definitely take into account dead store, etc. and defer the setting of lvEhWriteThruCandidate
until I recompute the ref count during Lowering, but it would be too late and we might have already marked the varDsc
to be "do not enregister".
So perhaps, there might be some refactoring involved to accomplish things to happen in right order and I might want to do it in a separate PR. Other thing that I want to do in the follow-up PR is to support local fields, because those are skipped in this PR from getting marked as lvEhWriteThruCandidate
. Perhaps, the logic of setting lvEhWriteThruCandidate
should ideally be around this code.
src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see this finally getting turned on.
* upstream/main: (83 commits) Fix a crash in llvm if the sreg of a setret is not set because the methods ends with a throw. (dotnet#49122) [macOS-arm64] Disable failing libraries tests (dotnet#49400) improve PriorityQueue documentation (dotnet#49392) [wasm] Fix debugger tests (dotnet#49206) [mono] Fix the emission of EnumEqualityComparer instances into the corlib AOT image. (dotnet#49402) jitutils M2M renaming reaction (dotnet#49430) WinHttpHandler: Read HTTP/2 trailing headers [RyuJIT] Make casthelpers cold for sealed classes (dotnet#49295) JIT: Non-void ThrowHelpers (dotnet#48589) Update package index for servicing (dotnet#49417) Remove unnecessary check on polymorphic serialization (dotnet#48464) Remove release build cron triggers from jitstress jobs (dotnet#49333) [main] Update dependencies from dotnet/arcade dotnet/llvm-project dotnet/runtime-assets (dotnet#49359) Implement AppleCryptoNative_X509GetRawData using SecCertificateCopyData [AndroidCrypto] Support a zero-length salt for HMACs. (dotnet#49384) Use managed implementation of pbkdf2 for Android's one-shot implementation. (dotnet#49314) Make 303 redirects do GET like Net Framework (dotnet#49095) Make sure event generation is incremental (dotnet#48903) Add amd and Surface arm64 perf runs (dotnet#49389) Enregister EH var that are single def (dotnet#47307) ...
In #35534, we saw mixed result of improvements and regressions. Hence the plan is to gradually enable this feature. This is the first step where we would enregister all EH vars in a method that are single def. In future, we will enable more scenarios.
Contributes to #35923