Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate ways to optimize Span.Fill and Span.Clear for small buffers #24806

Closed
ahsonkhan opened this issue Jan 26, 2018 · 7 comments · Fixed by #51365
Closed

Investigate ways to optimize Span.Fill and Span.Clear for small buffers #24806

ahsonkhan opened this issue Jan 26, 2018 · 7 comments · Fixed by #51365
Labels
Milestone

Comments

@ahsonkhan
Copy link
Member

We need to reduce the overhead of calling Span and ReadOnlySpan Fill and improve its performance for small buffers so that it can be used in common scenarios.

See additional context:
dotnet/corefx#26598 (comment)
dotnet/corefx#26289 (comment)

Edit: As part of this change, investigate and remove the TODOs:
https://github.com/dotnet/corefx/blob/79d708b2faf8a75089b1873fbb101b0a957c1fbd/src/System.Memory/src/System/SpanHelpers.Clear.cs#L69
https://github.com/dotnet/corefx/blob/79d708b2faf8a75089b1873fbb101b0a957c1fbd/src/System.Memory/src/System/SpanHelpers.Clear.cs#L105

https://github.com/dotnet/coreclr/blob/22f1bc00d018a49f9550ee3b564f5f7737960b0d/src/mscorlib/shared/System/Span.NonGeneric.cs#L707

cc @dotnet/corefxlab-contrib, @jkotas, @stephentoub

@GrabYourPitchforks
Copy link
Member

We should look at Span<T>.Clear() at the same time.

@GrabYourPitchforks GrabYourPitchforks self-assigned this Jan 29, 2018
@ahsonkhan ahsonkhan changed the title Investigate ways to optimize Span.Fill for small buffers Investigate ways to optimize Span.Fill and Span.Clear for small buffers Jan 29, 2018
@GrabYourPitchforks
Copy link
Member

I know this issue specifically deals with small buffers, but do we know the calling patterns for how these APIs are likely to be used? For example, is it likely that Span<T>.Fill(...) will ever be called when T is a reference type (not simply a contains-references type)? Is the common use case that T will be a fully blittable type?

@jkotas
Copy link
Member

jkotas commented Jan 30, 2018

I think it going to be most frequently used with spans of primitive types (byte, char, int), but it needs to work for all types.

@GrabYourPitchforks
Copy link
Member

I investigated Span.Clear for small buffers of blittable T. Was able to get significant performance improvements for buffer sizes <= 16 bytes (approx.), but at the expense of making Clear slower for larger buffers. Will keep investigating, but the regressions for larger buffers is steering me away from my current thinking.

One alternative is to expose a specialty API SmallClear which is optimized for smaller buffers and which defers to Clear for larger buffers. The API would only be for callers who really need this in perf-critical code paths and who already know that their spans are small enough to benefit from this. We'd need to mark this non-browseable or hide it behind a package so that we don't needlessly confuse the 99.9% developer audience.

@jkotas
Copy link
Member

jkotas commented Mar 7, 2018

Where is the regression for larger buffers coming from?

@jkotas
Copy link
Member

jkotas commented Mar 7, 2018

expense of making Clear slower for larger buffers

There is always step or bend in the performance curve where implementation switches to a different strategy. It should get amortized for larger buffers pretty quickly.

@GrabYourPitchforks
Copy link
Member

Where is the regression for larger buffers coming from?

@jkotas Currently Span.Clear and Span.ClearWithoutReferences get inlined into their caller, which means that the caller (user code) directly invokes JIT_Memset or MemoryNative::Clear, depending on the size of the span.

When I was experimenting with optimizing for small buffers the code looked like the following.

private static void ClearWithoutReferences(/* ... */) {
    if (length > 4096) { goto PInvoke; }
    else if (length > 16) { goto JIT_Memset; }
    else {
        /* optimize for small cases */
    }
}

The modified ClearWithoutReferences was large enough where it was no longer inlined into its caller. So now not only is there an additional if check, but the call stack goes caller -> ClearWithoutReferences -> JIT_Memset. This was showing up in the microbenchmark and regressing performance by around 30% at the boundary.

I should note that the microbenchmark was using a fully primed branch predictor.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@GrabYourPitchforks GrabYourPitchforks removed their assignment Feb 12, 2020
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 16, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 17, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants