Accelerate additional cross platform hardware intrinsics #61649
Conversation
Tagging subscribers to this area: @JulieLeeMSFT
case NI_VectorT128_ConvertToUInt32:
{
    assert(simdBaseType == TYP_FLOAT);
    return gtNewSimdHWIntrinsicNode(retType, op1, NI_AdvSimd_ConvertToUInt32RoundToZero,
                                    simdBaseJitType, simdSize, /* isSimdAsHWIntrinsic */ true);
}

case NI_VectorT128_ConvertToUInt64:
{
    assert(simdBaseType == TYP_DOUBLE);
    return gtNewSimdHWIntrinsicNode(retType, op1, NI_AdvSimd_Arm64_ConvertToUInt64RoundToZero,
                                    simdBaseJitType, simdSize, /* isSimdAsHWIntrinsic */ true);
}
It's worth calling out that these weren't accelerated at all on ARM64 before; now they are, via a single instruction.
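For illustration, a minimal sketch of the managed pattern this accelerates, assuming the System.Numerics.Vector API (the helper name is invented for the example):

using System.Numerics;

// Hypothetical helper; Vector.ConvertToUInt32 is the real System.Numerics API.
// On ARM64 this call now lowers to AdvSimd.ConvertToUInt32RoundToZero,
// i.e. a single fcvtzu instruction, instead of a scalar fallback.
static Vector<uint> TruncateToUInt32(Vector<float> value)
    => Vector.ConvertToUInt32(value);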
SIMD_INTRINSIC("ConvertToInt32", false, ConvertToInt32, "ConvertToInt32", TYP_STRUCT, 1, {TYP_STRUCT, TYP_UNDEF, TYP_UNDEF}, {TYP_FLOAT, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF}) | ||
// Convert double to long | ||
SIMD_INTRINSIC("ConvertToInt64", false, ConvertToInt64, "ConvertToInt64", TYP_STRUCT, 1, {TYP_STRUCT, TYP_UNDEF, TYP_UNDEF}, {TYP_DOUBLE, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF, TYP_UNDEF}) | ||
|
We're getting near the point where the rest of this "legacy" SIMD intrinsic support can be removed entirely, as nearly everything has moved onto SIMDAsHWIntrinsic now.
if (Sse2.IsSupported)
{
    // Based on __m256d int64_to_double_fast_precise(const __m256i v)
    // from https://stackoverflow.com/a/41223013/12860347. CC BY-SA 4.0
The three bits of code here are new algorithms and are significantly faster than the previous ones.
They are also correct, whereas the old algorithm would, for various inputs, actually return a different result than the scalar versions.
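As context, a minimal sketch of the magic-number technique from the cited Stack Overflow answer, applied here to Vector128 with SSE2 (an illustration under that assumption, not the PR's exact code):

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Convert two Int64 lanes to Double lanes exactly, using only SSE2.
// Each lane is split into its low and high 32 bits; each half is planted
// into the mantissa of a double with a known exponent, and the encoding
// biases are then subtracted away.
static Vector128<double> ConvertToDouble(Vector128<long> v)
{
    Vector128<long> magicLo  = Vector128.Create(0x4330000000000000L); // 2^52
    Vector128<long> magicHi  = Vector128.Create(0x4530000080000000L); // 2^84 + 2^63
    Vector128<long> magicAll = Vector128.Create(0x4530000080100000L); // 2^84 + 2^63 + 2^52

    // Low 32 bits of each lane, placed in the mantissa of 2^52.
    Vector128<long> lo = Sse2.Or(Sse2.And(v, Vector128.Create(0x00000000FFFFFFFFL)), magicLo);

    // High 32 bits with the sign bit flipped, placed in the mantissa of 2^84.
    Vector128<long> hi = Sse2.Xor(Sse2.ShiftRightLogical(v, 32), magicHi);

    // (hi - (2^84 + 2^63 + 2^52)) + lo reconstructs the signed value exactly.
    Vector128<double> hiDbl = Sse2.Subtract(hi.AsDouble(), magicAll.AsDouble());
    return Sse2.Add(hiDbl, lo.AsDouble());
}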
Only diffs for
Test failures are because the current RyuJIT algorithm for
This can be repro'd using
Rebased onto dotnet/main; this is still pending review and would be very beneficial to get merged so the PRs adding new SIMD logic can trivially support ARM from the start.
Have some questions, but other than that, it looks good.
Improvement on win-arm64: dotnet/perf-autofiling-issues#2974
This continues the work on #49397, which started with #53450 and #60094.
In particular, this moves several APIs to be implemented using the general SIMDAsHWIntrinsic logic and then has the new APIs in Vector64/128/256 use the same shared entry points.
There will likely be one or two more PRs after this one covering:
- Sum
- APIs that are approved but NYI, such as ShiftLeft/ShiftRight
Once this is in, the library-side work to switch over to using the xplat APIs can also happen.
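For illustration, a hypothetical before/after of that library-side switch, using Vector128.Abs as a stand-in for the xplat APIs (the helper names are invented for the example):

using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Before: every call site branches per ISA and needs explicit ARM64 support.
static Vector128<int> AbsPerIsa(Vector128<int> value)
{
    if (Ssse3.IsSupported)
        return Ssse3.Abs(value).AsInt32();
    if (AdvSimd.IsSupported)
        return AdvSimd.Abs(value).AsInt32();
    throw new PlatformNotSupportedException();
}

// After: one cross-platform entry point; the JIT picks the right instruction
// on x64 and ARM64, so new SIMD code supports ARM from the start.
static Vector128<int> AbsXplat(Vector128<int> value)
    => Vector128.Abs(value);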