Optimize library code using arm64 intrinsics #33308
Comments
@jkotas It's my understanding that ASP.NET is currently not running benchmarks on arm64. Hopefully that will change; then we'd be able to see the benefits of optimizing these.
Also, BitArray was chosen to be implemented first because it's simple, has benchmarks defined, and can easily be used as a proof-of-concept of the basics of the intrinsics implementation -- not because it's the most important class.
CC. @CarolEidt, @echesakovMSFT, @GrabYourPitchforks, @TamarChristinaArm
Would love to help with this! Though, I remember the majority of the ARM intrinsics being unavailable the last time I looked (only available via an experimental NuGet package). Has this been addressed? I have seen a few API review sessions where ARM intrinsics were discussed, but I don't know if they are implemented.
One concern is that I can't really run tests and benchmarks locally, since I do not own an ARM machine (well, I do, but it's an RPi 3B+). I know those exist as part of CI/CD, but are the perf numbers available to non-MSFT people?
The implementation is in progress, so there will be feature gaps.
That's a bigger problem. The CI will do feature testing on ARM64, but there is currently no perf testing available to non-MSFT contributors (and minimal perf testing available internally). You probably don't want to depend on the CI to do all functional testing for you. I think it's possible to use the RPi for this if you install, say, Ubuntu 64-bit.
I have successfully run perf tests on an RPi 4 with Ubuntu 64-bit; I assume it would also work on the RPi 3B+.
I think we also could add |
For However, ARM/ARM64 do support |
I attempted to use Arm64 intrinsics to optimize |
Also |
At this time, we don't believe there will be enough time (or resources) to implement the following optimizations for .NET 5: System.Numerics.Matrix4x4 #33565. As @jkotas writes above, this will leave some potential web workload performance improvements on the table. cc @tannergooding @danmosemsft
I have a meeting Monday morning to onboard 3-4 others to porting these remaining types. #33565 is one that I'll be using to help walk them through the process of porting the other code. What is the current deadline for getting these changes in? CC. @jeffhandley
@tannergooding That's great to hear. The deadline is whatever your team chooses as the deadline, for how you manage 5.0 work. The CLR CodeGen team doesn't expect to have enough time/resources to do these ourselves for 5.0 (we chose to do lots of this libraries work initially, to validate the intrinsics work and seed the optimizations), but certainly other teams/people can do them instead.
The platform teams (us) are aiming to be feature complete by the Preview 8 snap, which is July 15 (internal link)
@BruceForstall - we'll keep you updated on our planning and progress after we kick off the work next Monday.
@tannergooding what is our test strategy for all these? Presumably we're just relying on our regular test bed and having enough hardware variation. In particular, how do we get coverage on the software fallback paths? In the past, we've said that the ARM machines cover this path for us.
@danmosemsft We have two AzDO pipelines for stressing the ISAs, which includes disabling hardware intrinsics: runtime-coreclr jitstress-isas-arm: https://dev.azure.com/dnceng/public/_build?definitionId=665 These are defined using the following modes (in eng\pipelines\common\templates\runtimes
@tannergooding can comment on whether that is still sufficient for x86/x64 and arm, as well as for scenarios like R2R. Note that ARM32 doesn't support hardware intrinsics, so it requires the fallback path.
@jeffhandley is the work planned for 5.0 complete? If so, let's please close this now.
@danmosemsft - There are still some APIs under |
@danmosemsft I'm aggregating the status of these and will make a decision today whether the remaining items should be moved to 6.0.
Here's the list of intrinsics efforts not merged into
I will take the following actions:
This all looks good to me. Thanks @jeffhandley for summarizing and closing out the 5.0 release. And thanks everyone for the great work with all these arm64 improvements!
Agreed. And I think we can close this now.
+100! And appreciation to @BruceForstall for establishing clarity originally by creating this list.
The following classes/functions in the libraries have Intel x86/x64 intrinsics usage. These are where `_ISA_.IsSupported()` is called. This information was collected manually and might not be complete. Some of these function names represent many overloads. There are some vectorized helper methods not shown here -- where a function calls `IsSupported` and then calls a specific helper function to do the actual work, such as for SSE2 or AVX2 specifically. There are other cases where `Vector<T>` is used, but arm64 already supports that (it should be verified that the arm64 `Vector<T>` code is complete and performant).

When each of these has added an arm64-specific intrinsics optimization, it should be "checked off".
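The `IsSupported` dispatch pattern described above can be sketched roughly as follows. This is a minimal illustration rather than the libraries' actual code: the class name `PopCountSketch` and its method layout are hypothetical, and it assumes .NET 5 or later, where the `System.Runtime.Intrinsics.X86.Popcnt` and `System.Runtime.Intrinsics.Arm.AdvSimd` classes are available. Each ISA check guards one hardware-specific path, and a portable software fallback covers platforms that support neither.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical example class; not part of the .NET libraries.
public static class PopCountSketch
{
    // Counts the set bits in 'value', picking a path based on ISA support.
    public static int PopCount(uint value)
    {
        if (Popcnt.IsSupported)
        {
            // x86/x64 path: a single POPCNT instruction.
            return (int)Popcnt.PopCount(value);
        }

        if (AdvSimd.Arm64.IsSupported)
        {
            // arm64 path: the population count operates per byte on a
            // vector, so move the scalar into a vector, count bits in
            // each byte, then sum the per-byte counts across the vector.
            Vector64<uint> input = Vector64.CreateScalar(value);
            Vector64<byte> perByte = AdvSimd.PopCount(input.AsByte());
            Vector64<byte> total = AdvSimd.Arm64.AddAcross(perByte);
            return total.ToScalar();
        }

        // Neither ISA is available (for example ARM32, or intrinsics disabled).
        return SoftwareFallback(value);
    }

    // Portable bit-twiddling fallback (parallel bit count).
    public static int SoftwareFallback(uint value)
    {
        const uint c1 = 0x55555555u;
        const uint c2 = 0x33333333u;
        const uint c3 = 0x0F0F0F0Fu;
        const uint c4 = 0x01010101u;

        value -= (value >> 1) & c1;
        value = (value & c2) + ((value >> 2) & c2);
        value = (((value + (value >> 4)) & c3) * c4) >> 24;
        return (int)value;
    }
}
```

Because `IsSupported` is treated as a constant by the JIT, the untaken branches are expected to be dropped from the generated code, so the checks themselves should cost nothing at run time. Keeping the fallback as a separate method also makes it easy to compare its results against the accelerated paths in a test.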
The sections below are ordered in the presumed priority order that they should be implemented in. (There is no assumed priority order for the individual functions in each section.)
It is expected that `System.Collections.BitArray`, `System.Numerics`, and `System.SpanHelpers` will be "arm64 intrinsi-fied" for .NET 5. If possible, `System.Buffers` and `System.Text` will as well, but that is not considered required.

System.Collections.BitArray #33309
System.Runtime.Intrinsics #33496
Vector64
Vector128
Vector256
System.Numerics
System.Numerics.BitOperations #33495
System.Numerics.Matrix4x4 #33565
System.SpanHelpers #33707
[ ] System.SpanHelpers.SequenceCompareTo(byte) (SIMD vector implementation is fast enough)
[ ] System.SpanHelpers.SequenceEqual(byte) (SIMD vector implementation is fast enough)
[ ] System.SpanHelpers.LocateFirstFoundByte() (Only used by SIMD version of `IndexOf` and `IndexOfAny` which are already optimized by ARM64 intrinsics)

System.Buffers #35033
(Not completed in 5.0.0; moved to 6.0.0)
System.Text
System.Text.ASCIIUtility #35034
(Not completed in 5.0.0; #41292 contains the items moved to 6.0.0)
System.Text.Unicode #35035
System.Text.Encodings.Web #35036