-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Span<T>.Fill implementation #51365
Conversation
Tagging subscribers to this area: @GrabYourPitchforks, @carlossanlop Issue DetailsThis optimizes
The central SIMD loop doesn't attempt to perform any type of alignment optimization. We can consider adding this in the future if benchmarking shows this to be a worthwhile addition. I also didn't investigate any other call sites throughout the runtime + libraries to see if they should be migrated from whatever existing code they might have to this Benchmark results: byte(Note: The internal memset routine uses nontemporal stores, which could explain the blazing fast runtime.)
char
int
long
float
double
decimal(This exercises non-SIMD code paths.)
string(This exercises non-SIMD code paths.)
Resolves #24806. /cc @carlossanlop @jozkee
|
System.Linq.Expressions.Tests failures are known issue #51346. |
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
A note on the |
@SingleAccretion good observation. I stepped through the memset implementation (used by initblk) and noticed that it performed alignment fixup before entering the main loop. So I wonder if this comment is stale? |
Very possible. My comment was the result of me measuring this a few months ago, and seeing the expected 2x regression (Ivy Bridge). If it is no longer accurate, that's great! |
Notable changes in latest commit:
No major changes to the overall design behind the logic. |
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs
Outdated
Show resolved
Hide resolved
Notable change in latest commit: Mono interpreter's implementation of initblk performs a null check on the incoming address before delegating to memset. Restoring the "don't call initblk for empty spans" logic seemed like a more surgical solution than trying to change the interpreter's initblk logic, which might have unintentional ripple effects. I'm only restoring this logic for mono, as coreclr's initblk logic handles the zero-length case just fine, and a zero-length input should be very rare. |
Thanks all for the feedback! If there are any additional comments we can send a cleanup PR. |
Unsafe.InitBlockUnaligned(ref Unsafe.As<T, byte>(ref _pointer.Value), Unsafe.As<T, byte>(ref value), (uint)_length); | ||
#if MONO | ||
// Mono runtime's implementation of initblk performs a null check on the address. | ||
// We'll perform a length check here to avoid passing a null address in the empty span case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a compatibility thing? Should we change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment at #51365 (comment). I think I know where to make the change, but I was concerned that it would have broader implications and that I'd risk inadvertently breaking some other component. Putting the check here seemed the safest option. (This code had a check originally, so it's not a behavioral change from before this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File an issue? Making the runtimes behave the same would be best if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already filed - #51411 😄.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the same bug?
This optimizes
Span<T>.Fill
via three primary mechanisms:The central SIMD loop doesn't attempt to perform any type of alignment optimization. We can consider adding this in the future if benchmarking shows this to be a worthwhile addition.
I also didn't investigate any other call sites throughout the runtime + libraries to see if they should be migrated from whatever existing code they might have to this
Span<T>.Fill
implementation. That can come as a future commit to this PR or as a future PR.Benchmark code
Benchmark results:
byte
(Note: The internal memset routine uses nontemporal stores, which could explain the blazing fast runtime.)
char
int
long
float
double
decimal
(This exercises non-SIMD code paths.)
string
(This exercises non-SIMD code paths.)
Resolves #24806.
Resolves #7049.
/cc @carlossanlop @jozkee