Optimize lsra for MinOpts #96386
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
|
I tried to investigate the windows-arm64 TP regression. It turns out that most of the slowdown is attributable to 2 method contexts of
If I exclude those 2 contexts from the 2999 diff and 15658 diff |
Yes, so here is the breakdown: FullOpts:
MinOpts:
Looking in more detail, I checked what percentage of contexts is affected in each configuration. For x64 it is around 28% (both windows and linux), and for arm64 it is around 15% (both windows and linux), so we at least see that the share of affected contexts is the same for a given architecture.
Linux/x64 analysis
Values are mostly in
- 400FB67F4C movzx rdi, byte ptr [rdi+0x4C]
+ 0FB6404C movzx rax, byte ptr [rax+0x4C]
Some other places, since value is mostly in
That could justify the improvements on linux, but if you want I can dig further. |
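For reference, the two encodings quoted in the diff above differ by exactly one byte: the first starts with `0x40`, which falls in the x64 REX prefix range (`0x40` to `0x4F`), and the second has no such prefix. The following is a minimal sketch for checking that arithmetic by hand. The helper names `parseHex` and `startsWithRex` are hypothetical and are not part of the JIT or SuperPMI, and the prefix check is deliberately naive (it only looks at the first byte and ignores legacy prefixes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert a hex string such as "0FB6404C" into raw bytes.
// Hypothetical helper for illustration only.
std::vector<std::uint8_t> parseHex(const std::string& hex)
{
    assert(hex.size() % 2 == 0);
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < hex.size(); i += 2)
    {
        // Parse two hex digits at a time.
        bytes.push_back(static_cast<std::uint8_t>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    }
    return bytes;
}

// In 64-bit mode, a byte in the range 0x40..0x4F is a REX prefix.
// Naive check: only the first byte is inspected.
bool startsWithRex(const std::vector<std::uint8_t>& bytes)
{
    return !bytes.empty() && (bytes[0] & 0xF0) == 0x40;
}
```

With these helpers, `parseHex("400FB67F4C").size()` is 5 and `parseHex("0FB6404C").size()` is 4, so each such instruction that moves from `rdi` to `rax` saves one byte in this particular pair.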
IMO, since the change is about MinOpts, it should be no-diff for FullOpts. How does the change end up affecting codegen for FullOpts in the cases where we have no tracking variables? Isn't that unexpected? And if it's expected, can the new logic be made to run only if
Edit: It makes sense to me that we'd still want the optimized heuristics for LIR temps, even when we don't have any locals to enregister, so I think we should base the use of the fast and less optimal heuristics around |
This change is not just for MinOpts; it applies whenever we decide that enregistering local vars is not needed. This is fairly common in MinOpts/Tier0/
We turn off
runtime/src/coreclr/jit/lsra.cpp, lines 1353 to 1358 in 62d33ee
I am not sure I follow what you are suggesting. |
The end result seems to be that we are regressing CQ in FullOpts for a throughput optimization. That doesn't seem right.
Call |
If @jakobbotsch's suggestion restricts the change to MinOpts, then this change could be only about MinOpts CQ + TP. @kunalspathak if you think a version of this change applies to FullOpts, then perhaps that could be made as a follow-up, to isolate the conversation about the FullOpts impact to that PR. |
Ah, I see. Yes, I was trying out a similar change locally to eliminate the FullOpts impact. |
superpmi failures are: OSError: [Errno 28] No space left on device
/azp run runtime-coreclr superpmi-replay |
/azp run runtime-coreclr superpmi-replay |
Azure Pipelines successfully started running 1 pipeline(s). |
Assembly diffs
Assembly diffs for linux/arm64 ran on windows/x64
Diffs are based on
Overall (
|
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.linux.arm64.checked.mch | 14,270,080 | +0 |
benchmarks.run_pgo.linux.arm64.checked.mch | 83,058,256 | |
benchmarks.run_tiered.linux.arm64.checked.mch | 21,983,044 | |
coreclr_tests.run.linux.arm64.checked.mch | 510,170,492 | |
libraries.crossgen2.linux.arm64.checked.mch | 55,689,604 | +0 |
libraries.pmi.linux.arm64.checked.mch | 75,931,904 | +0 |
libraries_tests.run.linux.arm64.Release.mch | 379,668,404 | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 162,138,568 | |
realworld.run.linux.arm64.checked.mch | 15,784,696 | +0 |
MinOpts (${\color{red}+148,204}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.linux.arm64.checked.mch | 218,508 | +0 |
benchmarks.run_pgo.linux.arm64.checked.mch | 26,552,600 | |
benchmarks.run_tiered.linux.arm64.checked.mch | 17,400,224 | |
coreclr_tests.run.linux.arm64.checked.mch | 349,264,408 | |
libraries.crossgen2.linux.arm64.checked.mch | 1,636 | +0 |
libraries.pmi.linux.arm64.checked.mch | 119,984 | +0 |
libraries_tests.run.linux.arm64.Release.mch | 214,980,496 | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 13,480,568 | |
realworld.run.linux.arm64.checked.mch | 575,816 | +0 |
Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on
Overall (${\color{green}-464,740}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.linux.x64.checked.mch | 13,063,008 | |
benchmarks.run_pgo.linux.x64.checked.mch | 64,981,453 | |
benchmarks.run_tiered.linux.x64.checked.mch | 17,221,305 | |
coreclr_tests.run.linux.x64.checked.mch | 439,635,202 | |
libraries.crossgen2.linux.x64.checked.mch | 38,636,779 | |
libraries.pmi.linux.x64.checked.mch | 59,927,414 | |
libraries_tests.run.linux.x64.Release.mch | 328,845,908 | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 129,732,524 | |
realworld.run.linux.x64.checked.mch | 13,174,701 | |
smoke_tests.nativeaot.linux.x64.checked.mch | 4,191,403 | +0 |
MinOpts (${\color{green}-464,740}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.linux.x64.checked.mch | 230,051 | |
benchmarks.run_pgo.linux.x64.checked.mch | 18,706,882 | |
benchmarks.run_tiered.linux.x64.checked.mch | 13,644,734 | |
coreclr_tests.run.linux.x64.checked.mch | 310,584,524 | |
libraries.crossgen2.linux.x64.checked.mch | 1,202 | |
libraries.pmi.linux.x64.checked.mch | 112,870 | |
libraries_tests.run.linux.x64.Release.mch | 183,459,665 | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 10,657,089 | |
realworld.run.linux.x64.checked.mch | 387,256 | |
smoke_tests.nativeaot.linux.x64.checked.mch | 911 | +0 |
Assembly diffs for osx/arm64 ran on windows/x64
Diffs are based on
Overall (${\color{red}+150,496}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.osx.arm64.checked.mch | 34,138,564 | |
benchmarks.run_tiered.osx.arm64.checked.mch | 15,442,448 | |
coreclr_tests.run.osx.arm64.checked.mch | 486,102,776 | |
libraries.crossgen2.osx.arm64.checked.mch | 55,570,716 | +0 |
libraries.pmi.osx.arm64.checked.mch | 79,907,992 | +0 |
libraries_tests.run.osx.arm64.Release.mch | 314,762,740 | |
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 159,436,088 | |
realworld.run.osx.arm64.checked.mch | 15,065,616 | +0 |
MinOpts (${\color{red}+150,496}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.osx.arm64.checked.mch | 16,484,956 | |
benchmarks.run_tiered.osx.arm64.checked.mch | 11,499,424 | |
coreclr_tests.run.osx.arm64.checked.mch | 332,741,972 | |
libraries.crossgen2.osx.arm64.checked.mch | 1,628 | +0 |
libraries.pmi.osx.arm64.checked.mch | 121,128 | +0 |
libraries_tests.run.osx.arm64.Release.mch | 203,635,640 | |
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | 13,136,596 | |
realworld.run.osx.arm64.checked.mch | 568,396 | +0 |
Assembly diffs for windows/arm64 ran on windows/x64
Diffs are based on
Overall (${\color{red}+148,732}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.windows.arm64.checked.mch | 45,619,340 | |
benchmarks.run_tiered.windows.arm64.checked.mch | 15,264,644 | |
coreclr_tests.run.windows.arm64.checked.mch | 496,117,708 | |
libraries.crossgen2.windows.arm64.checked.mch | 58,913,600 | +0 |
libraries.pmi.windows.arm64.checked.mch | 79,525,232 | +0 |
libraries_tests.run.windows.arm64.Release.mch | 319,612,456 | |
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 169,012,984 | |
realworld.run.windows.arm64.checked.mch | 15,917,176 | +0 |
MinOpts (${\color{red}+148,732}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run_pgo.windows.arm64.checked.mch | 16,205,148 | |
benchmarks.run_tiered.windows.arm64.checked.mch | 11,172,336 | |
coreclr_tests.run.windows.arm64.checked.mch | 339,719,232 | |
libraries.crossgen2.windows.arm64.checked.mch | 1,636 | +0 |
libraries.pmi.windows.arm64.checked.mch | 119,984 | +0 |
libraries_tests.run.windows.arm64.Release.mch | 203,904,856 | |
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 13,136,180 | |
realworld.run.windows.arm64.checked.mch | 568,424 | +0 |
Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on
Overall (${\color{green}-4,578}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
aspnet.run.windows.x64.checked.mch | 42,070,891 | |
benchmarks.run.windows.x64.checked.mch | 8,666,396 | |
benchmarks.run_pgo.windows.x64.checked.mch | 35,349,995 | |
benchmarks.run_tiered.windows.x64.checked.mch | 12,686,993 | |
coreclr_tests.run.windows.x64.checked.mch | 393,136,663 | |
libraries.crossgen2.windows.x64.checked.mch | 39,411,196 | |
libraries.pmi.windows.x64.checked.mch | 61,131,115 | |
libraries_tests.run.windows.x64.Release.mch | 279,572,690 | |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 133,740,839 | |
realworld.run.windows.x64.checked.mch | 14,188,290 | |
smoke_tests.nativeaot.windows.x64.checked.mch | 5,085,035 | +0 |
MinOpts (${\color{green}-4,578}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
aspnet.run.windows.x64.checked.mch | 14,658,725 | |
benchmarks.run.windows.x64.checked.mch | 361 | |
benchmarks.run_pgo.windows.x64.checked.mch | 14,238,248 | |
benchmarks.run_tiered.windows.x64.checked.mch | 9,174,699 | |
coreclr_tests.run.windows.x64.checked.mch | 273,514,441 | |
libraries.crossgen2.windows.x64.checked.mch | 1,189 | |
libraries.pmi.windows.x64.checked.mch | 113,519 | |
libraries_tests.run.windows.x64.Release.mch | 175,492,237 | |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 10,810,850 | |
realworld.run.windows.x64.checked.mch | 389,705 | |
smoke_tests.nativeaot.windows.x64.checked.mch | 909 | +0 |
Details here
Assembly diffs for linux/arm ran on windows/x86
Diffs are based on
Overall (${\color{green}-852,700}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.linux.arm.checked.mch | 14,172,892 | |
benchmarks.run_pgo.linux.arm.checked.mch | 58,450,272 | |
benchmarks.run_tiered.linux.arm.checked.mch | 17,364,030 | |
coreclr_tests.run.linux.arm.checked.mch | 321,914,498 | |
libraries.crossgen2.linux.arm.checked.mch | 37,768,318 | +0 |
libraries.pmi.linux.arm.checked.mch | 49,500,180 | |
libraries_tests.run.linux.arm.Release.mch | 240,715,714 | |
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | 92,863,410 | |
realworld.run.linux.arm.checked.mch | 13,608,080 |
MinOpts (${\color{green}-852,700}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.linux.arm.checked.mch | 334,862 | |
benchmarks.run_pgo.linux.arm.checked.mch | 11,222,492 | |
benchmarks.run_tiered.linux.arm.checked.mch | 7,194,320 | |
coreclr_tests.run.linux.arm.checked.mch | 212,840,102 | |
libraries.crossgen2.linux.arm.checked.mch | 1,230 | +0 |
libraries.pmi.linux.arm.checked.mch | 106,504 | |
libraries_tests.run.linux.arm.Release.mch | 122,210,132 | |
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | 10,081,872 | |
realworld.run.linux.arm.checked.mch | 449,296 |
Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on
Overall (${\color{red}+199,265}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.windows.x86.checked.mch | 7,037,944 | |
benchmarks.run_pgo.windows.x86.checked.mch | 43,076,375 | |
benchmarks.run_tiered.windows.x86.checked.mch | 8,947,008 | |
coreclr_tests.run.windows.x86.checked.mch | 305,219,044 | |
libraries.crossgen2.windows.x86.checked.mch | 31,623,120 | |
libraries.pmi.windows.x86.checked.mch | 48,807,890 | |
libraries_tests.run.windows.x86.Release.mch | 162,974,883 | |
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | 100,565,179 | |
realworld.run.windows.x86.checked.mch | 11,349,106 |
MinOpts (${\color{red}+199,265}$ bytes)
Collection | Base size (bytes) | Diff size (bytes) |
---|---|---|
benchmarks.run.windows.x86.checked.mch | 279 | |
benchmarks.run_pgo.windows.x86.checked.mch | 5,966,425 | |
benchmarks.run_tiered.windows.x86.checked.mch | 3,848,434 | |
coreclr_tests.run.windows.x86.checked.mch | 198,156,671 | |
libraries.crossgen2.windows.x86.checked.mch | 1,057 | |
libraries.pmi.windows.x86.checked.mch | 95,314 | |
libraries_tests.run.windows.x86.Release.mch | 77,656,212 | |
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | 7,273,446 | |
realworld.run.windows.x86.checked.mch | 295,700 |
Details here
Throughput diffs
Throughput diffs for linux/arm64 ran on windows/x64
Overall (${\color{green}-4.82\%}$ to ${\color{green}-0.04\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.arm64.checked.mch | |
benchmarks.run_pgo.linux.arm64.checked.mch | |
benchmarks.run_tiered.linux.arm64.checked.mch | |
coreclr_tests.run.linux.arm64.checked.mch | |
libraries.crossgen2.linux.arm64.checked.mch | |
libraries.pmi.linux.arm64.checked.mch | |
libraries_tests.run.linux.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | |
realworld.run.linux.arm64.checked.mch | |
smoke_tests.nativeaot.linux.arm64.checked.mch |
MinOpts (${\color{green}-10.51\%}$ to ${\color{green}-6.50\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.arm64.checked.mch | |
benchmarks.run_pgo.linux.arm64.checked.mch | |
benchmarks.run_tiered.linux.arm64.checked.mch | |
coreclr_tests.run.linux.arm64.checked.mch | |
libraries.crossgen2.linux.arm64.checked.mch | |
libraries.pmi.linux.arm64.checked.mch | |
libraries_tests.run.linux.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | |
realworld.run.linux.arm64.checked.mch | |
smoke_tests.nativeaot.linux.arm64.checked.mch |
FullOpts (${\color{green}-0.07\%}$ to ${\color{green}-0.03\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.arm64.checked.mch | |
benchmarks.run_pgo.linux.arm64.checked.mch | |
benchmarks.run_tiered.linux.arm64.checked.mch | |
coreclr_tests.run.linux.arm64.checked.mch | |
libraries.crossgen2.linux.arm64.checked.mch | |
libraries.pmi.linux.arm64.checked.mch | |
libraries_tests.run.linux.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | |
realworld.run.linux.arm64.checked.mch | |
smoke_tests.nativeaot.linux.arm64.checked.mch |
Throughput diffs for linux/x64 ran on windows/x64
Overall (${\color{green}-2.72\%}$ to ${\color{green}-0.05\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.x64.checked.mch | |
benchmarks.run_pgo.linux.x64.checked.mch | |
benchmarks.run_tiered.linux.x64.checked.mch | |
coreclr_tests.run.linux.x64.checked.mch | |
libraries.crossgen2.linux.x64.checked.mch | |
libraries.pmi.linux.x64.checked.mch | |
libraries_tests.run.linux.x64.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | |
realworld.run.linux.x64.checked.mch | |
smoke_tests.nativeaot.linux.x64.checked.mch |
MinOpts (${\color{green}-6.56\%}$ to ${\color{green}-4.49\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.x64.checked.mch | |
benchmarks.run_pgo.linux.x64.checked.mch | |
benchmarks.run_tiered.linux.x64.checked.mch | |
coreclr_tests.run.linux.x64.checked.mch | |
libraries.crossgen2.linux.x64.checked.mch | |
libraries.pmi.linux.x64.checked.mch | |
libraries_tests.run.linux.x64.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | |
realworld.run.linux.x64.checked.mch | |
smoke_tests.nativeaot.linux.x64.checked.mch |
FullOpts (${\color{green}-0.08\%}$ to ${\color{green}-0.04\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.x64.checked.mch | |
benchmarks.run_pgo.linux.x64.checked.mch | |
benchmarks.run_tiered.linux.x64.checked.mch | |
coreclr_tests.run.linux.x64.checked.mch | |
libraries.crossgen2.linux.x64.checked.mch | |
libraries.pmi.linux.x64.checked.mch | |
libraries_tests.run.linux.x64.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | |
realworld.run.linux.x64.checked.mch | |
smoke_tests.nativeaot.linux.x64.checked.mch |
Throughput diffs for osx/arm64 ran on windows/x64
Overall (${\color{green}-4.32\%}$ to ${\color{green}-0.04\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.osx.arm64.checked.mch | |
benchmarks.run_pgo.osx.arm64.checked.mch | |
benchmarks.run_tiered.osx.arm64.checked.mch | |
coreclr_tests.run.osx.arm64.checked.mch | |
libraries.crossgen2.osx.arm64.checked.mch | |
libraries.pmi.osx.arm64.checked.mch | |
libraries_tests.run.osx.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | |
realworld.run.osx.arm64.checked.mch |
MinOpts (${\color{green}-10.55\%}$ to ${\color{green}-7.99\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.osx.arm64.checked.mch | |
benchmarks.run_pgo.osx.arm64.checked.mch | |
benchmarks.run_tiered.osx.arm64.checked.mch | |
coreclr_tests.run.osx.arm64.checked.mch | |
libraries.crossgen2.osx.arm64.checked.mch | |
libraries.pmi.osx.arm64.checked.mch | |
libraries_tests.run.osx.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | |
realworld.run.osx.arm64.checked.mch |
FullOpts (${\color{green}-0.07\%}$ to ${\color{green}-0.03\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.osx.arm64.checked.mch | |
benchmarks.run_pgo.osx.arm64.checked.mch | |
benchmarks.run_tiered.osx.arm64.checked.mch | |
coreclr_tests.run.osx.arm64.checked.mch | |
libraries.crossgen2.osx.arm64.checked.mch | |
libraries.pmi.osx.arm64.checked.mch | |
libraries_tests.run.osx.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch | |
realworld.run.osx.arm64.checked.mch |
Throughput diffs for windows/arm64 ran on windows/x64
Overall (${\color{green}-4.16\%}$ to ${\color{green}-0.04\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.windows.arm64.checked.mch | |
benchmarks.run_pgo.windows.arm64.checked.mch | |
benchmarks.run_tiered.windows.arm64.checked.mch | |
coreclr_tests.run.windows.arm64.checked.mch | |
libraries.crossgen2.windows.arm64.checked.mch | |
libraries.pmi.windows.arm64.checked.mch | |
libraries_tests.run.windows.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | |
realworld.run.windows.arm64.checked.mch | |
smoke_tests.nativeaot.windows.arm64.checked.mch |
MinOpts (${\color{green}-10.55\%}$ to ${\color{green}-6.32\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.windows.arm64.checked.mch | |
benchmarks.run_pgo.windows.arm64.checked.mch | |
benchmarks.run_tiered.windows.arm64.checked.mch | |
coreclr_tests.run.windows.arm64.checked.mch | |
libraries.crossgen2.windows.arm64.checked.mch | |
libraries.pmi.windows.arm64.checked.mch | |
libraries_tests.run.windows.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | |
realworld.run.windows.arm64.checked.mch | |
smoke_tests.nativeaot.windows.arm64.checked.mch |
FullOpts (${\color{green}-0.07\%}$ to ${\color{green}-0.03\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.windows.arm64.checked.mch | |
benchmarks.run_pgo.windows.arm64.checked.mch | |
benchmarks.run_tiered.windows.arm64.checked.mch | |
coreclr_tests.run.windows.arm64.checked.mch | |
libraries.crossgen2.windows.arm64.checked.mch | |
libraries.pmi.windows.arm64.checked.mch | |
libraries_tests.run.windows.arm64.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | |
realworld.run.windows.arm64.checked.mch | |
smoke_tests.nativeaot.windows.arm64.checked.mch |
Throughput diffs for windows/x64 ran on windows/x64
Overall (${\color{green}-2.54\%}$ to ${\color{green}-0.08\%}$ )
Collection | PDIFF |
---|---|
aspnet.run.windows.x64.checked.mch | |
benchmarks.run.windows.x64.checked.mch | |
benchmarks.run_pgo.windows.x64.checked.mch | |
benchmarks.run_tiered.windows.x64.checked.mch | |
coreclr_tests.run.windows.x64.checked.mch | |
libraries.crossgen2.windows.x64.checked.mch | |
libraries.pmi.windows.x64.checked.mch | |
libraries_tests.run.windows.x64.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | |
realworld.run.windows.x64.checked.mch | |
smoke_tests.nativeaot.windows.x64.checked.mch |
MinOpts (${\color{green}-6.49\%}$ to ${\color{green}-4.45\%}$ )
Collection | PDIFF |
---|---|
aspnet.run.windows.x64.checked.mch | |
benchmarks.run.windows.x64.checked.mch | |
benchmarks.run_pgo.windows.x64.checked.mch | |
benchmarks.run_tiered.windows.x64.checked.mch | |
coreclr_tests.run.windows.x64.checked.mch | |
libraries.crossgen2.windows.x64.checked.mch | |
libraries.pmi.windows.x64.checked.mch | |
libraries_tests.run.windows.x64.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | |
realworld.run.windows.x64.checked.mch | |
smoke_tests.nativeaot.windows.x64.checked.mch |
FullOpts (${\color{green}-0.12\%}$ to ${\color{green}-0.07\%}$ )
Collection | PDIFF |
---|---|
aspnet.run.windows.x64.checked.mch | |
benchmarks.run.windows.x64.checked.mch | |
benchmarks.run_pgo.windows.x64.checked.mch | |
benchmarks.run_tiered.windows.x64.checked.mch | |
coreclr_tests.run.windows.x64.checked.mch | |
libraries.crossgen2.windows.x64.checked.mch | |
libraries.pmi.windows.x64.checked.mch | |
libraries_tests.run.windows.x64.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | |
realworld.run.windows.x64.checked.mch | |
smoke_tests.nativeaot.windows.x64.checked.mch |
Details here
Throughput diffs for linux/arm64 ran on linux/x64
Overall (${\color{green}-5.32\%}$ to ${\color{red}+0.04\%}$ )
Collection | PDIFF |
---|---|
libraries.crossgen2.linux.arm64.checked.mch | |
libraries_tests.run.linux.arm64.Release.mch | |
realworld.run.linux.arm64.checked.mch | |
coreclr_tests.run.linux.arm64.checked.mch | |
benchmarks.run_pgo.linux.arm64.checked.mch | |
libraries.pmi.linux.arm64.checked.mch | |
benchmarks.run_tiered.linux.arm64.checked.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | |
smoke_tests.nativeaot.linux.arm64.checked.mch |
MinOpts (${\color{green}-11.63\%}$ to ${\color{green}-6.46\%}$ )
Collection | PDIFF |
---|---|
libraries.crossgen2.linux.arm64.checked.mch | |
libraries_tests.run.linux.arm64.Release.mch | |
realworld.run.linux.arm64.checked.mch | |
coreclr_tests.run.linux.arm64.checked.mch | |
benchmarks.run_pgo.linux.arm64.checked.mch | |
libraries.pmi.linux.arm64.checked.mch | |
benchmarks.run.linux.arm64.checked.mch | |
benchmarks.run_tiered.linux.arm64.checked.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | |
smoke_tests.nativeaot.linux.arm64.checked.mch |
FullOpts (${\color{red}+0.02\%}$ to ${\color{red}+0.05\%}$ )
Collection | PDIFF |
---|---|
libraries.crossgen2.linux.arm64.checked.mch | |
libraries_tests.run.linux.arm64.Release.mch | |
realworld.run.linux.arm64.checked.mch | |
coreclr_tests.run.linux.arm64.checked.mch | |
benchmarks.run_pgo.linux.arm64.checked.mch | |
libraries.pmi.linux.arm64.checked.mch | |
benchmarks.run.linux.arm64.checked.mch | |
benchmarks.run_tiered.linux.arm64.checked.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | |
smoke_tests.nativeaot.linux.arm64.checked.mch |
Throughput diffs for linux/x64 ran on linux/x64
Overall (${\color{green}-2.87\%}$ to ${\color{green}-0.15\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.x64.checked.mch | |
benchmarks.run_tiered.linux.x64.checked.mch | |
libraries.crossgen2.linux.x64.checked.mch | |
libraries.pmi.linux.x64.checked.mch | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | |
realworld.run.linux.x64.checked.mch | |
smoke_tests.nativeaot.linux.x64.checked.mch | |
benchmarks.run.linux.x64.checked.mch | |
libraries_tests.run.linux.x64.Release.mch | |
coreclr_tests.run.linux.x64.checked.mch |
MinOpts (${\color{green}-6.69\%}$ to ${\color{green}-4.65\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.x64.checked.mch | |
benchmarks.run_tiered.linux.x64.checked.mch | |
libraries.crossgen2.linux.x64.checked.mch | |
libraries.pmi.linux.x64.checked.mch | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | |
realworld.run.linux.x64.checked.mch | |
smoke_tests.nativeaot.linux.x64.checked.mch | |
benchmarks.run.linux.x64.checked.mch | |
libraries_tests.run.linux.x64.Release.mch | |
coreclr_tests.run.linux.x64.checked.mch |
FullOpts (${\color{green}-0.22\%}$ to ${\color{green}-0.12\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run_pgo.linux.x64.checked.mch | |
benchmarks.run_tiered.linux.x64.checked.mch | |
libraries.crossgen2.linux.x64.checked.mch | |
libraries.pmi.linux.x64.checked.mch | |
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | |
realworld.run.linux.x64.checked.mch | |
smoke_tests.nativeaot.linux.x64.checked.mch | |
benchmarks.run.linux.x64.checked.mch | |
libraries_tests.run.linux.x64.Release.mch | |
coreclr_tests.run.linux.x64.checked.mch |
Details here
Throughput diffs for linux/arm ran on windows/x86
Overall (${\color{green}-4.15\%}$ to ${\color{green}-0.08\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.arm.checked.mch | |
benchmarks.run_pgo.linux.arm.checked.mch | |
benchmarks.run_tiered.linux.arm.checked.mch | |
coreclr_tests.run.linux.arm.checked.mch | |
libraries.crossgen2.linux.arm.checked.mch | |
libraries.pmi.linux.arm.checked.mch | |
libraries_tests.run.linux.arm.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | |
realworld.run.linux.arm.checked.mch |
MinOpts (${\color{green}-11.56\%}$ to ${\color{green}-9.30\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.arm.checked.mch | |
benchmarks.run_pgo.linux.arm.checked.mch | |
benchmarks.run_tiered.linux.arm.checked.mch | |
coreclr_tests.run.linux.arm.checked.mch | |
libraries.crossgen2.linux.arm.checked.mch | |
libraries.pmi.linux.arm.checked.mch | |
libraries_tests.run.linux.arm.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | |
realworld.run.linux.arm.checked.mch |
FullOpts (${\color{green}-0.15\%}$ to ${\color{green}-0.05\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.linux.arm.checked.mch | |
benchmarks.run_pgo.linux.arm.checked.mch | |
benchmarks.run_tiered.linux.arm.checked.mch | |
coreclr_tests.run.linux.arm.checked.mch | |
libraries.crossgen2.linux.arm.checked.mch | |
libraries.pmi.linux.arm.checked.mch | |
libraries_tests.run.linux.arm.Release.mch | |
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch | |
realworld.run.linux.arm.checked.mch |
Throughput diffs for windows/x86 ran on windows/x86
Overall (${\color{green}-1.68\%}$ to ${\color{red}+0.02\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.windows.x86.checked.mch | |
benchmarks.run_pgo.windows.x86.checked.mch | |
benchmarks.run_tiered.windows.x86.checked.mch | |
coreclr_tests.run.windows.x86.checked.mch | |
libraries.crossgen2.windows.x86.checked.mch | |
libraries.pmi.windows.x86.checked.mch | |
libraries_tests.run.windows.x86.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | |
realworld.run.windows.x86.checked.mch |
MinOpts (${\color{green}-5.03\%}$ to ${\color{green}-3.05\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.windows.x86.checked.mch | |
benchmarks.run_pgo.windows.x86.checked.mch | |
benchmarks.run_tiered.windows.x86.checked.mch | |
coreclr_tests.run.windows.x86.checked.mch | |
libraries.crossgen2.windows.x86.checked.mch | |
libraries.pmi.windows.x86.checked.mch | |
libraries_tests.run.windows.x86.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | |
realworld.run.windows.x86.checked.mch |
FullOpts (${\color{red}+0.01\%}$ to ${\color{red}+0.02\%}$ )
Collection | PDIFF |
---|---|
benchmarks.run.windows.x86.checked.mch | |
benchmarks.run_pgo.windows.x86.checked.mch | |
benchmarks.run_tiered.windows.x86.checked.mch | |
coreclr_tests.run.windows.x86.checked.mch | |
libraries.crossgen2.windows.x86.checked.mch | |
libraries.pmi.windows.x86.checked.mch | |
libraries_tests.run.windows.x86.Release.mch | |
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch | |
realworld.run.windows.x86.checked.mch |
Details here
@BruceForstall @jakobbotsch - can you please take another look? Thanks! |
src/coreclr/jit/lsra.cpp
Outdated
#ifdef DEBUG
// Validate the current state just after we've freed the registers. This ensures that any pending
// freed registers will have had their state updated to reflect the intervals they were holding.
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
Is it possible/straightforward to factor this validation into a method and use it from both allocateRegisters and allocateRegistersMinimal?
A related question: what is the fundamental difference between the two, apart from the reduced set of heuristics used in the minimal version?
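As a rough illustration of the factoring asked about above, the sketch below moves a consistency check into one helper that two allocator entry points both call. Everything in it is a hypothetical stand-in: `ToyRegRecord`, `registerStateIsConsistent`, `allocateToyFull`, and `allocateToyMinimal` are invented for this example, and the invariant checked (a free register holds no interval) is an assumption, not the JIT's actual validation logic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-register bookkeeping; the JIT's real record has more fields.
struct ToyRegRecord
{
    bool isFree;     // allocator believes the register is free
    int  intervalId; // id of the interval held, or -1 when none
};

// Shared check (assumed invariant): a free register must not still
// claim to hold an interval.
bool registerStateIsConsistent(const std::vector<ToyRegRecord>& regs)
{
    for (const ToyRegRecord& reg : regs)
    {
        if (reg.isFree && reg.intervalId != -1)
        {
            return false;
        }
    }
    return true;
}

// Both entry points reuse the same helper instead of duplicating the loop.
// assert() is compiled out when NDEBUG is defined, similar in spirit to
// guarding the check with a debug-only #ifdef.
void allocateToyFull(std::vector<ToyRegRecord>& regs)
{
    // ... the full set of heuristics would run here ...
    assert(registerStateIsConsistent(regs));
}

void allocateToyMinimal(std::vector<ToyRegRecord>& regs)
{
    // ... the reduced set of heuristics would run here ...
    assert(registerStateIsConsistent(regs));
}
```

The point of the shape is only that the validation loop lives in one place; how well that maps onto the real allocator state is a question for the PR author.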
I've skimmed the code. I'm assuming allocateRegistersMinimal was copied mostly from allocateRegisters. I had a question on what the differences between them are, apart from the use of a reduced set of heuristics. I'm assuming it wouldn't be straightforward to unify them to avoid code duplication (for example, by the use of another template parameter)?
The trade-offs in CQ vs TP look just fine to me. We should definitely be taking this kind of trade-off in MinOpts, and the TP improvements are amazing.
Pretty much. To easily spot the difference, I extracted it out here:
Correct, that was the direction I started with, but I quickly noticed that it scatters the template parameter everywhere and makes it very hard to see what gets executed for MinOpts. In future, it will be good to focus on
Yes, that's the main change, and then removing the code for things that never happen for MinOpts, like we don't see |
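To make the trade-off discussed above concrete, here is a toy sketch of the two shapes: one allocator specialized by a template parameter, and a separate minimal function with the local-variable handling removed. All names (`ToyInterval`, `allocateToy`, `allocateToyMinimal`) and the trivial "next free register" policy are assumptions made for illustration; this is not the code in lsra.cpp.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an LSRA interval; not the JIT's Interval type.
struct ToyInterval
{
    bool isLocalVar;
    int  assignedReg;
};

// Option A: a single allocator specialized by a template parameter.
// Every piece of local-variable handling must be wrapped in a check on
// the parameter, so the MinOpts path is interleaved with the full path.
template <bool localVarsEnregistered>
int allocateToy(std::vector<ToyInterval>& intervals)
{
    int nextReg       = 0;
    int localVarsSeen = 0;
    for (ToyInterval& interval : intervals)
    {
        if (localVarsEnregistered)
        {
            // Stand-in for the extra work done only when locals are enregistered.
            if (interval.isLocalVar)
            {
                localVarsSeen++;
            }
        }
        interval.assignedReg = nextReg++;
    }
    return localVarsSeen;
}

// Option B: a separate minimal allocator. The code is duplicated, but the
// path taken when locals are not enregistered can be read top to bottom.
int allocateToyMinimal(std::vector<ToyInterval>& intervals)
{
    int nextReg = 0;
    for (ToyInterval& interval : intervals)
    {
        interval.assignedReg = nextReg++;
    }
    return 0;
}
```

In this toy, `allocateToy<false>` and `allocateToyMinimal` assign identical registers; the difference is purely in how easy each version is to read and maintain, which is the concern raised in the reply above.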
Diff results for #96386
Assembly diffs
Assembly diffs for linux/arm64 ran on windows/x64
Diffs are based on 2,498,771 contexts (1,011,240 MinOpts, 1,487,531 FullOpts).
MISSED contexts: 6,580 (0.26%)
Overall (+147,632 bytes)
MinOpts (+147,632 bytes)
Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 2,505,340 contexts (977,766 MinOpts, 1,527,574 FullOpts).
MISSED contexts: 6,922 (0.28%)
Overall (-485,333 bytes)
MinOpts (-485,333 bytes)
Assembly diffs for osx/arm64 ran on windows/x64
Diffs are based on 2,229,922 contexts (927,360 MinOpts, 1,302,562 FullOpts).
MISSED contexts: 6,095 (0.27%)
Overall (+150,252 bytes)
MinOpts (+150,252 bytes)
Assembly diffs for windows/arm64 ran on windows/x64
Diffs are based on 2,308,445 contexts (929,692 MinOpts, 1,378,753 FullOpts).
MISSED contexts: 6,353 (0.27%)
Overall (+148,112 bytes)
MinOpts (+148,112 bytes)
Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on 2,366,385 contexts (928,740 MinOpts, 1,437,645 FullOpts).
MISSED contexts: 6,816 (0.29%)
Overall (-62,154 bytes)
MinOpts (-62,154 bytes)
Details here
Assembly diffs for linux/arm ran on windows/x86
Diffs are based on 2,230,528 contexts (825,130 MinOpts, 1,405,398 FullOpts).
MISSED contexts: 77,529 (3.36%)
Overall (-868,478 bytes)
MinOpts (-868,478 bytes)
Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on 2,246,531 contexts (794,865 MinOpts, 1,451,666 FullOpts).
MISSED contexts: base: 7,010 (0.30%), diff: 52,597 (2.29%)
Overall (+190,955 bytes)
MinOpts (+190,955 bytes)
Details here
Throughput diffs
Throughput diffs for linux/arm64 ran on windows/x64
Overall (-5.32% to -0.04%)
MinOpts (-10.39% to -6.49%)
FullOpts (-0.07% to -0.03%)
Throughput diffs for linux/x64 ran on windows/x64
Overall (-2.61% to -0.05%)
MinOpts (-6.57% to -4.49%)
FullOpts (-0.08% to -0.04%)
Throughput diffs for osx/arm64 ran on windows/x64
Overall (-4.23% to -0.05%)
MinOpts (-10.43% to -7.92%)
FullOpts (-0.07% to -0.03%)
Throughput diffs for windows/arm64 ran on windows/x64
Overall (-4.15% to -0.04%)
MinOpts (-10.42% to -6.32%)
FullOpts (-0.07% to -0.03%)
Throughput diffs for windows/x64 ran on windows/x64
Overall (-2.54% to -0.08%)
MinOpts (-6.48% to -4.45%)
FullOpts (-0.12% to -0.07%)
Details here
Throughput diffs for linux/arm ran on windows/x86
Overall (-4.16% to -0.08%)
MinOpts (-11.57% to -9.31%)
FullOpts (-0.15% to -0.05%)
Throughput diffs for windows/x86 ran on windows/x86
Overall (-1.69% to +0.02%)
MinOpts (-5.04% to -3.05%)
FullOpts (+0.01% to +0.02%)
Details here
Throughput diffs for linux/arm64 ran on linux/x64
Overall (-5.82% to +0.04%)
MinOpts (-11.62% to -6.54%)
FullOpts (+0.02% to +0.06%)
Throughput diffs for linux/x64 ran on linux/x64
Overall (-2.72% to -0.15%)
MinOpts (-6.70% to -4.66%)
FullOpts (-0.22% to -0.12%)
Details here |
enregisterLocalVars=false specific code in it
REG_ORDER and REG_NUM heuristics when enregisterLocalVars=false