Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat System.Runtime.CompilerServices.Unsafe as intrinsic #68739

Merged
merged 30 commits into from
May 12, 2022

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented May 1, 2022

This is a draft to get initial feedback and confirm that tests pass, remaining APIs (currently just return nullptr) should also be handled if this looks good. That could happen as part of this PR or as a separate PR/work item.

Most of these methods are extremely trivial and currently cause the JIT to do more work overall to inline them and optimize away the various boundaries that otherwise exist. Treating them as intrinsic allows them to be "inlined directly" without the normal overhead of the inliner, which should increase code quality and overall JIT throughput.

@ghost ghost assigned tannergooding May 1, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 1, 2022
@ghost
Copy link

ghost commented May 1, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a draft to get initial feedback and confirm that tests pass.

Most of these methods are extremely trivial and currently cause the JIT to do more work overall to inline them and optimize away the various boundaries that otherwise exist. Treating them as intrinsic allows them to be "inlined directly" without the normal overhead of the inliner, which should increase code quality and overall JIT throughput.

Author: tannergooding
Assignees: tannergooding
Labels:

area-CodeGen-coreclr

Milestone: -

src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
@tannergooding
Copy link
Member Author

tannergooding commented May 1, 2022

Resolved feedback and added handling for SkipInit, Subtract, and SubtractByteOffset which were also "easy" to handle. The remaining unhandled ones are:

  • NI_SRCS_UNSAFE_Copy
  • NI_SRCS_UNSAFE_CopyBlock
  • NI_SRCS_UNSAFE_CopyBlockUnaligned
  • NI_SRCS_UNSAFE_InitBlock
  • NI_SRCS_UNSAFE_InitBlockUnaligned
  • NI_SRCS_UNSAFE_Read
  • NI_SRCS_UNSAFE_ReadUnaligned
  • NI_SRCS_UNSAFE_Unbox
  • NI_SRCS_UNSAFE_Write
  • NI_SRCS_UNSAFE_WriteUnaligned

The SuperPMI diffs look promising. Throughput has minor wins across the board:

Collection Base # instructions Diff # instructions PDIFF
coreclr_tests.pmi.Linux.arm64.checked.mch 390,711,391,340 387,258,405,254 -0.88%
libraries.crossgen2.Linux.arm64.checked.mch 102,091,516,884 101,971,562,036 -0.12%
libraries.pmi.Linux.arm64.checked.mch 218,744,363,716 218,294,238,960 -0.21%
libraries_tests.pmi.Linux.arm64.checked.mch 505,701,324,806 504,964,706,203 -0.15%
--- --- --- ---
benchmarks.run.Linux.x64.checked.mch 56,226,856,371 55,122,836,457 -1.96%
coreclr_tests.pmi.Linux.x64.checked.mch 325,400,500,401 322,203,176,985 -0.98%
libraries.crossgen2.Linux.x64.checked.mch 54,780,062,904 54,665,293,674 -0.21%
libraries.pmi.Linux.x64.checked.mch 200,134,462,080 199,679,252,664 -0.23%
libraries_tests.pmi.Linux.x64.checked.mch 456,285,132,862 455,499,094,398 -0.17%
--- --- --- ---
benchmarks.run.OSX.arm64.checked.mch 33,225,430,371 33,178,219,315 -0.14%
coreclr_tests.pmi.OSX.arm64.checked.mch 388,882,063,761 385,480,961,069 -0.87%
libraries.crossgen2.OSX.arm64.checked.mch 134,748,643,681 134,628,797,898 -0.09%
libraries.pmi.OSX.arm64.checked.mch 214,764,256,836 214,343,583,121 -0.20%
libraries_tests.pmi.OSX.arm64.checked.mch 501,853,839,272 501,123,210,562 -0.15%
--- --- --- ---
benchmarks.run.windows.arm64.checked.mch 52,638,095,633 51,533,561,274 -2.10%
coreclr_tests.pmi.windows.arm64.checked.mch 390,028,521,007 386,622,605,944 -0.87%
libraries.crossgen2.windows.arm64.checked.mch 140,609,834,764 140,488,907,461 -0.09%
libraries.pmi.windows.arm64.checked.mch 233,215,883,839 232,752,057,433 -0.20%
libraries_tests.pmi.windows.arm64.checked.mch 539,401,187,163 538,593,273,957 -0.15%
--- --- --- ---
aspnet.run.windows.x64.checked.mch 55,154,779,661 54,844,118,570 -0.56%
benchmarks.run.windows.x64.checked.mch 50,463,801,456 49,397,203,131 -2.11%
coreclr_tests.pmi.windows.x64.checked.mch 327,800,136,063 324,635,766,586 -0.97%
libraries.crossgen2.windows.x64.checked.mch 126,326,426,875 126,197,264,432 -0.10%
libraries.pmi.windows.x64.checked.mch 215,148,959,735 214,678,797,630 -0.22%
libraries_tests.pmi.windows.x64.checked.mch 488,523,763,272 487,654,417,773 -0.18%

Code Size diffs looks to be generally an improvement, but there are both small wins and small regressions. Will need to spot check some of them

@tannergooding
Copy link
Member Author

Just spot checking, several of the regressions are things like "now the register allocator decides to use a different register, which causes some downstream changes"
For example Decimal.GetBytes uses one less stack slot in favor of using r14. This causes an extra push r14 in the prologue and pop r14 in the epilogue, increasing code size

Several of the cases are reduced local count and others are because we end up with other opts triggering that didn't before. The biggest difference that looks to cause most of these is that we have cases where a byref becomes live "sooner".

I'm probably going to need to get a proper jit dump to get a more in depth look here, but will probably get to it later in the evening.

@tannergooding
Copy link
Member Author

Reverted NI_SRCS_UNSAFE_Subtract and NI_SRCS_UNSAFE_SubtractByteOffset as it was causing an AV. It needs further investigation.

{
assert(sig->sigInst.methInstCount == 1);

// ldarg.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the comments describing these :)

@tannergooding tannergooding marked this pull request as ready for review May 2, 2022 13:32
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
@tannergooding
Copy link
Member Author

#68851 resolves the AV that was occuring and removes the need for the workaround.

This is ready for review in either case.

@jkotas
Copy link
Member

jkotas commented May 5, 2022

Try to workaround the JIT issue by spilling op1/op2 for Unsafe.Add

Revert these workarounds and wait for the proper fix to get merged?

@tannergooding
Copy link
Member Author

Revert these workarounds and wait for the proper fix to get merged?

I was planning on doing that when the fix was merged to avoid unnecessary CI churn

@jakobbotsch
Copy link
Member

#68851 was just merged so the workaround should no longer be necessary.

@tannergooding
Copy link
Member Author

Merged with dotnet/main and reverted the unnecessary spilling. I kept the extracted impSpillSideEffect helper since it simplifies the necessary spill in Unsafe.ByteOffset and will ideally help with #54956 as well

@jkotas
Copy link
Member

jkotas commented May 5, 2022

Revert 982660e too? Or is there a problem with it?

@tannergooding
Copy link
Member Author

Shouldn't be, just forgot to include it.

@tannergooding
Copy link
Member Author

runtime-dev-innerloop (Build OSX x64 release Runtime_Debug)

Is unrelated. It's complaining about an ApiCompat issue with System.IO.FileStream. CC. @ericstj

@tannergooding
Copy link
Member Author

tannergooding commented May 6, 2022

Updated #68739 (comment) with the throughput numbers. Still all positive there (anywhere from -0.1% to -2.0% on the # instructions).

Will log an issue tracking the remaining Unsafe APIs that aren't intrinsic yet.

For the asm-diffs there are wins and regressions with the overall average being an improvement. The wins vs regressions are the same on all platforms (with actual percentages differing by platform) with the regressions being in libraries_tests.pmi and benchmarks.run.

Spot checking a few, they are generally in cases where we now make different inlining decisions. For example, in System.Diagnostics.FileVersionInfo:GetVersionInfoForCodePage we go from:

- 268 single block inlinees; 165 inlinees without PGO data
+ 137 single block inlinees; 223 inlinees without PGO data

We end up doing more inlining for things like Debug.Assert which causes the overall method size to increase. This is a "natural" fallout from the profitability of the Unsafe APIs being higher, better IR being created, and more optimization opportunities kicking in.

@tannergooding
Copy link
Member Author

@dotnet/jit-contrib for review.

* i is the stack entry which will be checked and spilled.
*/

inline void Compiler::impSpillSideEffect(bool spillGlobEffects, unsigned i DEBUGARG(const char* reason))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so this was just a move of code to its own function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that way it could be reused more easily.

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The changes are pretty straightforward and the throughput benefits are great.

@tannergooding
Copy link
Member Author

Thanks! Logged #69220 to track the remaining methods.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Jun 12, 2022

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants