-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Add simple late layout pass #107483
JIT: Add simple late layout pass #107483
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs show the usual amount of churn with layout changes, though seems to be a net PerfScore improvement. TP cost isn't too bad. |
I added a check to see that the RPO layout is enabled since we're still running the old layout in CI, and it probably doesn't make sense to mix them. |
I happen to know the
Any idea what's up with windows x86? Given how few allocatable registers there on x86 are it is a good stress test for LSRA inserted blocks. |
I took a look at the top regressions from
To this:
In other cases, the size increases come from the JIT moving new blocks in-line, breaking up existing fallthrough. For example, the layout in
To this:
In this example, we seem to be interleaving more paths, but the loop bodies are more compact (
I think you're right. I'm seeing this same pattern of block movement on other platforms, but we aren't introducing as many new blocks as we are on x86, hence the larger diffs in both directions. |
commit 7ae87de Author: Larry Ewing <lewing@microsoft.com> Date: Mon Sep 9 22:11:12 2024 -0500 [wasm] more cases when looking up unmanaged delegates (dotnet#107113) Make the association between the wasm_native_to_interp_ftndescs generation and the lookup from unmanaged more robust so that we don't see problems like dotnet#107212 where the same slot was being reused for multiple methods with different signatures. To do this we change the Key(s) we use and fix the string escaping it relies on, and attempt to lookup by token first. Next , we rewrite the C code generation to make it easier to read and modify and mitigate some potentially negative memory side effects of that we introduce a gratuitous custom text writer that understands the idea of concatenated strings and use that where possible when building the output. Next, we change the import code generation to use binary rather than linear search for both the module and symbol. And finally, we update the ICall table generation to use the extensions. part of dotnet#104391 and dotnet#107212 commit 1808129 Author: Elinor Fung <elfung@microsoft.com> Date: Mon Sep 9 20:03:34 2024 -0700 Remove FCThrowRes from AssemblyNative::IsDynamic (dotnet#107574) commit 5cb6a06 Author: Aman Khalid <amankhalid@microsoft.com> Date: Tue Sep 10 02:38:23 2024 +0000 JIT: Add simple late layout pass (dotnet#107483) commit c762b75 Author: Martin Costello <martin@martincostello.com> Date: Tue Sep 10 03:15:53 2024 +0100 Add [DebuggerDisplay] to CancellationTokenSource (dotnet#105764) * Add [DebuggerDisplay] to CancellationTokenSource Add `[DebuggerDisplay]` to `CancellationTokenSource` to show whether cancelled or disposed. Relates to dotnet#105698. * Update src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs --------- Co-authored-by: Stephen Toub <stoub@microsoft.com> commit b77b71e Author: Katelyn Gadd <kg@luminance.org> Date: Mon Sep 9 17:40:14 2024 -0700 [wasm] Clean up some FIXMEs in the jiterpreter (dotnet#107562) * Cleanup some fixmes in the jiterpreter * Flow through size of the var in MINT_LDLOCA_S so jiterpreter can do accurate invalidation commit c21d90e Author: Pavel Savara <pavel.savara@gmail.com> Date: Tue Sep 10 02:40:00 2024 +0200 [WASI] improve single-threaded threadpool (dotnet#107395) * fix dotnet#104803 * PollWasiEventLoopUntilResolvedVoid * more * wip * CPU-bound work to do * fix exit * Update src/mono/sample/wasi/http-p2/Program.cs Co-authored-by: Larry Ewing <lewing@microsoft.com> * feedback --------- Co-authored-by: Larry Ewing <lewing@microsoft.com> commit 61de5df Author: Elinor Fung <elfung@microsoft.com> Date: Mon Sep 9 17:14:07 2024 -0700 Make DAC and ProfToEEInterfaceImpl stop using BaseDomain (dotnet#107570) `BaseDomain` should no longer be needed now that we only have one `AppDomain` and the `SystemDomain` can be treated as separate. This makes the DAC and ProfToEEInterfaceImpl use `AppDomain` directly and check against `SystemDomain::System()` to determine if a pointer is the system domain. commit 76dbb27 Author: Stephen Toub <stoub@microsoft.com> Date: Mon Sep 9 19:59:54 2024 -0400 Use SearchValues in Uri.CheckForUnicodeOrEscapedUnreserved (dotnet#107357) commit 149d4bb Author: Miha Zupan <mihazupan.zupan1@gmail.com> Date: Mon Sep 9 16:54:00 2024 -0700 Extend the list of recognized SearchValues<char> field names in Regex (dotnet#107402) commit e591fbf Author: Kunal Pathak <Kunal.Pathak@microsoft.com> Date: Mon Sep 9 16:38:42 2024 -0700 Arm: Fix the base register used for restoring register from stack (dotnet#107564) * Use correct baseReg for vstr, similar to vldr * add test cases * Mark internal test methods private commit 51c350c Author: Elinor Fung <elfung@microsoft.com> Date: Mon Sep 9 16:35:02 2024 -0700 Make missing framework error message list other architectures that were found (dotnet#107156) When erroring on a missing framework, check if there are versions of the framework for other architectures and list them for the user. commit 2ed43b6 Author: Alan Hayward <a74nh@users.noreply.github.com> Date: Mon Sep 9 23:53:45 2024 +0100 ARM64-SVE: Allow op inside conditionalselect to be non HWintrinsic (dotnet#107180) * ARM64-SVE: Allow op inside conditionselect to be non HWintrinsic * Add Sve.IsSupported check to test commit ac4b7c6 Author: Kunal Pathak <Kunal.Pathak@microsoft.com> Date: Mon Sep 9 15:52:00 2024 -0700 Arm: Consider the fact that targetReg can be second half during resolution (dotnet#107493) * Arm: Consider the fact that targetReg can be second half during resolution * add test case * Make sure we only handle float registers * fix test case's public methods commit 18eedbe Author: Aaron Robinson <arobins@microsoft.com> Date: Mon Sep 9 14:02:51 2024 -0700 Convert Thread FCalls to QCalls (dotnet#107495) * Convert Thread.IsAlive property * Convert Thread.GetCurrentThread() * Convert Thread.ThreadState property * Convert Thread.Initialize() commit d45ccfd Author: Michal Strehovský <MichalStrehovsky@users.noreply.github.com> Date: Tue Sep 10 05:28:57 2024 +0900 Fix reflection-calling `Set` method on arrays (dotnet#107529) The test added in dotnet#106787 found an issue in the implementation of reflection calls to array `Set` methods. We used to throw the wrong exception type. There were probably other corner case bugs (like what exception is thrown when both element type is wrong and index is out of range and when/how value coercion should happen). This should fix that. commit c534080 Author: Tom McDonald <tommcdon@microsoft.com> Date: Mon Sep 9 15:21:41 2024 -0400 Avoid using OpenThread for out of process SetThreadContext debugging (dotnet#107511) * Avoid using OpenThread in out of process thread context scenarios * Add comments * Update src/coreclr/debug/di/process.cpp Co-authored-by: mikelle-rogers <45022607+mikelle-rogers@users.noreply.github.com> * Update src/coreclr/debug/di/process.cpp Co-authored-by: mikelle-rogers <45022607+mikelle-rogers@users.noreply.github.com> * Update src/coreclr/debug/di/process.cpp Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com> --------- Co-authored-by: mikelle-rogers <45022607+mikelle-rogers@users.noreply.github.com> Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com> commit d2c7db0 Author: Tanner Gooding <tagoo@outlook.com> Date: Mon Sep 9 11:06:45 2024 -0700 Disable TensorExtensionsTwoSpanInFloatOut due to dotnet#107254 (dotnet#107555) commit b7b91cb Author: Aaron Robinson <arobins@microsoft.com> Date: Mon Sep 9 09:08:31 2024 -0700 Convert some handle APIs to QCalls (dotnet#107513) Convert RuntimeTypeHandle.GetAssembly() Convert RuntimeTypeHandle.GetModule() Convert RuntimeAssembly.GetManifestModule() commit 600f6bd Author: David Wrighton <davidwr@microsoft.com> Date: Mon Sep 9 09:04:51 2024 -0700 Fix thread static cleanup paths (dotnet#107438) * Fix thread static cleanup paths - Do not destroy GC handles while holding the spin lock - Free the pLoaderHandle array when the thread is terminated * When using a ThreadStatics stress test on collectible assemblies, a few more issues were found - Fix issue where the LoaderAllocator's SegmentedHandleIndex wasn't being freed - Fix issue where the logic to re-use TLSIndex values wasn't working properly commit fe7a52d Author: Linus Hamlin <78953007+lilinus@users.noreply.github.com> Date: Mon Sep 9 17:57:31 2024 +0200 Remove ActiveIssue for solved issues in Vector tests (dotnet#107127) commit 0c33c6f Author: Elinor Fung <elfung@microsoft.com> Date: Mon Sep 9 08:21:16 2024 -0700 Fix module being set as tenured too early (dotnet#107489) commit 2fb3629 Author: Elinor Fung <elfung@microsoft.com> Date: Mon Sep 9 08:03:27 2024 -0700 Remove `BaseDomain` use in `LoaderAllocator` and event tracing helpers (dotnet#107481) - Remove `BaseDomain` member on `LoaderAllocator` - Add asserts in functions using `AppDomain` that the loader allocator is collectible and the type is `LAT_Assembly` (so `AssemblyLoaderAllocator` which always had `AppDomain`) - Remove unnecessary `BaseDomain`/`AppDomain` parameters from event tracing helpers - They were always being called with the current app domain commit 62133e0 Author: dotnet-maestro[bot] <42748379+dotnet-maestro[bot]@users.noreply.github.com> Date: Mon Sep 9 16:56:30 2024 +0200 [main] Update dependencies from dotnet/xharness (dotnet#107291) * Update dependencies from https://github.com/dotnet/xharness build 20240902.2 Microsoft.DotNet.XHarness.CLI , Microsoft.DotNet.XHarness.TestRunners.Common , Microsoft.DotNet.XHarness.TestRunners.Xunit From Version 9.0.0-prerelease.24452.1 -> To Version 9.0.0-prerelease.24452.2 * Update dependencies from https://github.com/dotnet/xharness build 20240903.1 Microsoft.DotNet.XHarness.CLI , Microsoft.DotNet.XHarness.TestRunners.Common , Microsoft.DotNet.XHarness.TestRunners.Xunit From Version 9.0.0-prerelease.24452.2 -> To Version 9.0.0-prerelease.24453.1 * Update dependencies from https://github.com/dotnet/xharness build 20240904.2 Microsoft.DotNet.XHarness.CLI , Microsoft.DotNet.XHarness.TestRunners.Common , Microsoft.DotNet.XHarness.TestRunners.Xunit From Version 9.0.0-prerelease.24453.1 -> To Version 10.0.0-prerelease.24454.2 * Update dependencies from https://github.com/dotnet/xharness build 20240906.1 Microsoft.DotNet.XHarness.CLI , Microsoft.DotNet.XHarness.TestRunners.Common , Microsoft.DotNet.XHarness.TestRunners.Xunit From Version 10.0.0-prerelease.24454.2 -> To Version 10.0.0-prerelease.24456.1 * Update dependencies from https://github.com/dotnet/xharness build 20240909.1 Microsoft.DotNet.XHarness.CLI , Microsoft.DotNet.XHarness.TestRunners.Common , Microsoft.DotNet.XHarness.TestRunners.Xunit From Version 10.0.0-prerelease.24456.1 -> To Version 10.0.0-prerelease.24459.1 --------- Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com> Co-authored-by: Ilona Tomkowicz <32700855+ilonatommy@users.noreply.github.com> commit 4c0973e Author: Jeremi Kurdek <59935235+jkurdek@users.noreply.github.com> Date: Mon Sep 9 17:53:45 2024 +0300 Fix passing assemblies using relative path (dotnet#107536) commit 67e5768 Author: Katelyn Gadd <kg@luminance.org> Date: Mon Sep 9 06:19:10 2024 -0700 [wasm] Implement MINT_NEWARR in jiterpreter (dotnet#107430) commit 176754d Author: Matous Kozak <55735845+matouskozak@users.noreply.github.com> Date: Mon Sep 9 13:35:01 2024 +0200 [mono][infra] decrease CPU count for fullAOT CI build (dotnet#107531) commit 49bf719 Author: Pavel Savara <pavel.savara@gmail.com> Date: Mon Sep 9 12:30:47 2024 +0200 [browser][MT] fix feature detection on webworker (dotnet#107452) commit aa418fc Author: Preeyan Parmar <4997904+preeyan@users.noreply.github.com> Date: Sun Sep 8 22:44:27 2024 +0100 Remove unused declarations from clsload.hpp (dotnet#107509) * Remove unused declarations from clsload.hpp * also remove unused ClassLoader::TryEnsureLoaded commit 7d68c7f Author: Steve <hez2010@outlook.com> Date: Mon Sep 9 06:36:18 2024 +0900 Implement getClassAssemblyName (dotnet#106959) * Add getClassAssemblyName * Handle nullptrs * Remove CORINFO_ASSEMBLY_HANDLE * Address feedbacks Co-authored-by: Jan Kotas <jkotas@microsoft.com> commit 39c84a3 Author: Jan Kotas <jkotas@microsoft.com> Date: Sun Sep 8 11:24:13 2024 -0700 Fix corner-case accounting bug in new codeheap allocation (dotnet#107492) The size of internal CodeHeap structures was not included in codeheap memory reservation. It caused false OOM exception to be thrown when JITed method code size was near 64kB multiple commit 10f6c4c Author: Aaron Robinson <arobins@microsoft.com> Date: Sun Sep 8 11:02:41 2024 -0700 Convert WaitHandle FCalls to QCalls (dotnet#107488) commit b523ec5 Author: Aman Khalid <amankhalid@microsoft.com> Date: Sun Sep 8 14:42:04 2024 +0000 JIT: Simplify block insertion logic during loop canonicalization (dotnet#107371)
@AndyAyersMS @LoopedBard3 thanks for organizing the perf diffs. If LSRA introduces new blocks, then re-running layout should be the right decision, assuming the profile is maintained correctly. I suspect some of these regressions may be from the JCC erratum mitigation now triggering due to layout churn. I'm curious to see if the arm64 diffs reflect this by being more conservative in both directions. |
LSRA may introduce new blocks after we've run layout (see #107419). Ideally, we'd do
optOptimizeLayout
andfgDetermineFirstColdBlock
after LSRA so that we only need to reorder once, but many of the transformations infgUpdateFlowGraph
aren't designed to be run after lowering, and will hit asserts. Once some of the later flowgraph optimizations likeoptSwitchRecognition
are decoupled from lexical block order, we can look into only runningfgUpdateFlowGraph
before lowering/LSRA, and then run only reordering afterwards. For now, I think we can get away with just rerunning RPO layout; hopefully this is relatively cheap.