Investigate ways to optimize Span.Fill and Span.Clear for small buffers #24806

ahsonkhan · 2018-01-26T19:47:22Z

We need to reduce the overhead of calling Span and ReadOnlySpan Fill and improve its performance for small buffers so that it can be used in common scenarios.

See additional context:
dotnet/corefx#26598 (comment)
dotnet/corefx#26289 (comment)

Edit: As part of this change, investigate and remove the TODOs:
https://github.com/dotnet/corefx/blob/79d708b2faf8a75089b1873fbb101b0a957c1fbd/src/System.Memory/src/System/SpanHelpers.Clear.cs#L69
https://github.com/dotnet/corefx/blob/79d708b2faf8a75089b1873fbb101b0a957c1fbd/src/System.Memory/src/System/SpanHelpers.Clear.cs#L105

https://github.com/dotnet/coreclr/blob/22f1bc00d018a49f9550ee3b564f5f7737960b0d/src/mscorlib/shared/System/Span.NonGeneric.cs#L707

cc @dotnet/corefxlab-contrib, @jkotas, @stephentoub

GrabYourPitchforks · 2018-01-29T06:22:41Z

We should look at Span<T>.Clear() at the same time.

GrabYourPitchforks · 2018-01-30T08:01:51Z

I know this issue specifically deals with small buffers, but do we know the calling patterns for how these APIs are likely to be used? For example, is it likely that Span<T>.Fill(...) will ever be called when T is a reference type (not simply a contains-references type)? Is the common use case that T will be a fully blittable type?

jkotas · 2018-01-30T09:06:42Z

I think it going to be most frequently used with spans of primitive types (byte, char, int), but it needs to work for all types.

GrabYourPitchforks · 2018-03-07T18:01:55Z

I investigated Span.Clear for small buffers of blittable T. Was able to get significant performance improvements for buffer sizes <= 16 bytes (approx.), but at the expense of making Clear slower for larger buffers. Will keep investigating, but the regressions for larger buffers is steering me away from my current thinking.

One alternative is to expose a specialty API SmallClear which is optimized for smaller buffers and which defers to Clear for larger buffers. The API would only be for callers who really need this in perf-critical code paths and who already know that their spans are small enough to benefit from this. We'd need to mark this non-browseable or hide it behind a package so that we don't needlessly confuse the 99.9% developer audience.

jkotas · 2018-03-07T22:29:19Z

Where is the regression for larger buffers coming from?

jkotas · 2018-03-07T22:34:20Z

expense of making Clear slower for larger buffers

There is always step or bend in the performance curve where implementation switches to a different strategy. It should get amortized for larger buffers pretty quickly.

GrabYourPitchforks · 2018-03-07T22:44:37Z

Where is the regression for larger buffers coming from?

@jkotas Currently Span.Clear and Span.ClearWithoutReferences get inlined into their caller, which means that the caller (user code) directly invokes JIT_Memset or MemoryNative::Clear, depending on the size of the span.

When I was experimenting with optimizing for small buffers the code looked like the following.

private static void ClearWithoutReferences(/* ... */) {
    if (length > 4096) { goto PInvoke; }
    else if (length > 16) { goto JIT_Memset; }
    else {
        /* optimize for small cases */
    }
}

The modified ClearWithoutReferences was large enough where it was no longer inlined into its caller. So now not only is there an additional if check, but the call stack goes caller -> ClearWithoutReferences -> JIT_Memset. This was showing up in the microbenchmark and regressing performance by around 30% at the boundary.

I should note that the microbenchmark was using a fully primed branch predictor.

GrabYourPitchforks self-assigned this Jan 29, 2018

ahsonkhan changed the title ~~Investigate ways to optimize Span.Fill for small buffers~~ Investigate ways to optimize Span.Fill and Span.Clear for small buffers Jan 29, 2018

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

GrabYourPitchforks removed their assignment Feb 12, 2020

GrabYourPitchforks mentioned this issue Apr 16, 2021

Optimize Span<T>.Fill implementation #51365

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 16, 2021

GrabYourPitchforks closed this as completed in #51365 Apr 17, 2021

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 17, 2021

ghost locked as resolved and limited conversation to collaborators May 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate ways to optimize Span.Fill and Span.Clear for small buffers #24806

Investigate ways to optimize Span.Fill and Span.Clear for small buffers #24806

ahsonkhan commented Jan 26, 2018

GrabYourPitchforks commented Jan 29, 2018

GrabYourPitchforks commented Jan 30, 2018

jkotas commented Jan 30, 2018

GrabYourPitchforks commented Mar 7, 2018

jkotas commented Mar 7, 2018

jkotas commented Mar 7, 2018

GrabYourPitchforks commented Mar 7, 2018

Investigate ways to optimize Span.Fill and Span.Clear for small buffers #24806

Investigate ways to optimize Span.Fill and Span.Clear for small buffers #24806

Comments

ahsonkhan commented Jan 26, 2018

GrabYourPitchforks commented Jan 29, 2018

GrabYourPitchforks commented Jan 30, 2018

jkotas commented Jan 30, 2018

GrabYourPitchforks commented Mar 7, 2018

jkotas commented Mar 7, 2018

jkotas commented Mar 7, 2018

GrabYourPitchforks commented Mar 7, 2018