Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature suggestion: zero-allocation "params" via Span<T> / ReadOnlySpan<T> #535

Closed
mgravell opened this issue May 4, 2017 · 51 comments
Closed

Comments

@mgravell
Copy link
Member

mgravell commented May 4, 2017

This is broadly related to #178 and #179

C# has the params keyword that allows convenient varadic signatures, for example:

    var tokens = line.Split(',',':');

Here, the signature is string[] string.Split(params char[] separator).

This usage is awkward because the compiler emits a new char[] every time it is invoked; this might be GEN-0 collected if you get lucky, but it still has overhead.

The advent of Span<T> offers new possibilities. For example, what if the following signature was possible:

    SomeReturnType SomeMethod(params Span<int> values) // should also work with ReadOnlySpan<T>

Then:

    var foo = SomeMethod(1, 2, 3);

Could be compiled as the equivalent of:

    SomeType foo;
    unsafe
    {
        var vals = stackalloc int[3];
        vals[0] = 1;
        vals[1] = 2;
        vals[2] = 3;
        foo = SomeMethod(new Span<int>(vals, 3));
    }

Since this is a compiler generated implementation, it should bypass project-level unsafe restrictions
and use the Span<T> type guarantees to ensure stack-referring safety, etc.

This makes it possible to have zero-allocation params varadic usage. Obviously the usual "last parameter" etc rules would apply.

If unsafe is not available (because of iterator blocks, async, etc), it can be implemented as a span over an array instead, just like regular params might have emitted:

    var foo = SomeMethod(new Span<int>(new int[] { 1, 2, 3 }));

although actually, I'm struggling to think of a scenario involving primitives where this would be needed, even in an iterator block etc, since the call-site is so localized; it might be possible to avoid this entirely. If the target method can't use spans because of async etc, then the target method won't have span in the signature, so: bullet dodged. Note that if this array version is required, and if the values are constant / literals and the target is ReadOnlySpan<T> (not Span<T>), then this array could be cached as a static:

    static int[] __backingField; // generated unpronounceable name

    var foo = SomeMethod(new ReadOnlySpan<int>(
        __backingField ?? (__backingField = new int[] { 1, 2, 3 })));

Thoughts....?


Additional idea for extending pre-existing APIs: if (haven't thought it through) it is possible to have two params overloads for params Foo[] and params Span<Foo> against the same target, then the params Span<Foo> should take precedence if the caller is using compiler-provided params support, since it can be called in a lower overhead way. Meaning, given:

    SomeType Foo(params int[] values);
    SomeType Foo(params Span<int> values);

then:

    var x = Foo(1, 2, 3);

would call the second method. However, if the caller is more explicit with an int[] local / expression, then the first method is called.


Additional unknown: what to do if the T is not one that can normally be stackalloc'd (so: not a known primitive), for example string? Just use the array version? or use something more subtle like:

    var vals = stackalloc byte[{len} * sizeof(string)]; // sizeof here not valid C#, but is valid IL
    var span = new Span<sting>(vals, {len});
    span[0] = ...
@mgravell mgravell changed the title Span<T> and params possibility Feature suggestion: zero-allocation "params" via Span<T> / ReadOnlySpan<T> May 4, 2017
@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

Quick bit of exploration; at the moment the Span<T> version is marginally slower, but zero allocations:

           Method |      Mean |     Error |    StdDev |          Op/s |  Gen 0 |  Gen 1 | Allocated |
----------------- |----------:|----------:|----------:|--------------:|-------:|-------:|----------:|
      ParamsArray |  9.178 ns | 0.1797 ns | 0.3002 ns | 108,957,295.6 | 0.0076 |      - |      48 B |
    ExplicitArray |  9.116 ns | 0.1726 ns | 0.1530 ns | 109,700,882.9 | 0.0076 | 0.0000 |      48 B |
             Span | 10.123 ns | 0.2090 ns | 0.1955 ns |  98,789,162.3 |      - |      - |       0 B |
 StaticBackedSpan |  6.126 ns | 0.1170 ns | 0.1149 ns | 163,230,300.8 |      - |      - |       0 B |

(that is comparing Execute(1,2,3,4,5) vs Execute(new int[] {1,2,3,4,5}) vs Execute(new Span<int>(args, 5)) as per full gist here)

Update: these numbers are out of date now; there's a better way - see SpanTuple etc

@mikedn
Copy link

mikedn commented May 4, 2017

Additional unknown: what to do if the T is not one that can normally be stackalloc'd (so: not a known primitive), for example string? Just use the array version? or use something more subtle like:

Create a struct containing the necessary number of T fields on the stack and somehow (DangerousCreate?) build a span pointing to the first field of the struct.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@mikedn that would definitely work, but I'm not sure that it is necessary to add the additional struct, when the raw data can also just be stackalloc'd - certainly a valid consideration, though

@mikedn
Copy link

mikedn commented May 4, 2017

when the raw data can also just be stackalloc'd - certainly a valid consideration, though

For primitive types yes, you'd probably keep using stackalloc. The struct is only needed for reference types. Or who knows, maybe the struct would be also good for primitive types, it depends which construct is best for the JIT. Probably neither since both imply taking the address of a local :)

@mattnischan
Copy link

mattnischan commented May 4, 2017

Yeah, and in these cases you're really just stackalloc'ing the references, which are a fixed size (sizeof(IntPtr)). The struct would be an interesting take, but I'm not sure what it gains you. It is a cool idea, though.

@mikedn
Copy link

mikedn commented May 4, 2017

Yeah, and in these cases you're really just stackalloc'ing the references, which are a fixed size (sizeof(IntPtr)).

The GC needs to be able to track those references so they can't be in stackallocated memory, the GC doesn't track that.

Anyway, all this sounds very tempting but it looks like the main problem is that the resulting IL code is unverifiable/unsafe/unanyhting.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

added benchmark for static-backed ReadOnlySpan<int> version for literals etc; is now faster than params int[] version - I see this as the idea for when fixed values are used as the args

for the stackalloc case, would need to think about when the stackalloc space is recovered, especially in the case of a loop like above; is that allocating on the stack once? or many times separately? A quick test done by bumping REPEATS_PER_ITEM up to 1M shows that it would currently blow the stack if used naively.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@mikedn once they're inside the span, they should be essentially ref T; and once they're ref T accessible via ref T on the stack (not the stackalloc part), they should be fully reachable from the root, no?. This does show that we need to get the span into a formal local before populating it, though (in the case of reference types).

@mikedn
Copy link

mikedn commented May 4, 2017

for the stackalloc case, would need to think about when the stackalloc space is recovered, especially in the case of a loop like above; is that allocating on the stack once? or many times separately?

Stackallocated space cannot be freed before the method returns. That is, if you do allocate in a loop you'll end up with stack overflow. That's actually one reason to stick to struct because the same local could be reused. At the risk of getting some funny side effects if the called method somehow manages to persist the span for later use.

once they're on the span, they should be essentially ref T; and once they're ref T via a parameter on the stack (not the stackalloc part), they should be fully reachable from the root, no?.

Hmm, I doubt that. That would imply that GC understands span and scans it but AFAIK that's not the case. The way things work is that the GC scans the array/object the span points to. But in our case the span doesn't point to a GC object, it points to random memory so the GC can't do anything. Sure, random memory in this case is actually stack memory and the GC scans the stack but in a different manner.

@benaadams
Copy link
Member

ValueTuple?

@mattnischan
Copy link

mattnischan commented May 4, 2017 via email

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@mattnischan it is simpler than that, I think - just something like (this syntax won't compile currently - is purely for indication):

var _ = (1,2,3,4,5);
SomeMethod(new Span<int>(ref _.Item1, 5));

This does assume that ValueTuple<...> has specific layout; probably OK, but...

Testing with SpanTuple doing:

var args = (1,2,3,4,5);
total += Execute(new Span<int>(Unsafe.AsPointer(ref args.Item1), 5));

gives:

           Method |     Mean |     Error |    StdDev |          Op/s |  Gen 0 |  Gen 1 | Allocated |
----------------- |---------:|----------:|----------:|--------------:|-------:|-------:|----------:|
      ParamsArray | 8.225 ns | 0.1822 ns | 0.1705 ns | 121,579,023.0 | 0.0076 | 0.0000 |      48 B |
    ExplicitArray | 8.421 ns | 0.1631 ns | 0.1812 ns | 118,748,775.0 | 0.0076 |      - |      48 B |
             Span | 9.736 ns | 0.1167 ns | 0.1092 ns | 102,707,992.8 |      - |      - |       0 B |
        SpanTuple | 5.812 ns | 0.1052 ns | 0.0984 ns | 172,067,653.3 |      - |      - |       0 B |
 StaticBackedSpan | 5.816 ns | 0.0776 ns | 0.0726 ns | 171,930,191.3 |      - |      - |       0 B |

which is really good

@mattnischan
Copy link

I see what you mean. I was thinking SomeMethod(params ValueTuple<?> things).

I think that layout stuff gets tricky with Rest, maybe.

@HaloFour
Copy link
Contributor

HaloFour commented May 4, 2017

Each tuple size is a completely different and incompatible type. You'd be forced to pass the tuple representing the maximum number of elements.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@benaadams see SpanTuple perf above (in an edit); also cc @mikedn - if ValueTuple isn't an option, then the example above but with your idea of a per-call-site struct would also work great.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@HaloFour you can cheat via Span - see the illustration above

@HaloFour
Copy link
Contributor

HaloFour commented May 4, 2017

@mgravell

Ah, I see. Had to refresh the page.

Perf does look good but the fact that you have to break out the unsafe+pointers give me pause.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@HaloFour that's just a "what works today" so I can run the perf test; as I understand it, once the stack-referring ref-like specification is viable, it might be possible to do this more directly without the Unsafe. That's a question for @VSadov though.

@mikedn
Copy link

mikedn commented May 4, 2017

if ValueTuple isn't an option, then the example above but with your idea of a per-call-site struct would also work great.

ValueTuple does have public fields rather than properties so maybe it could be used because you can get your hands on a ref T. But it also uses LayoutKind.Auto which makes it a bit risky to use in this scenario, sequential layout would be preferable to avoid surprises.

But the type isn't really important. It can be any type that has a suitable number of reference type fields. The fields can even be object (though that might require using Unsafe.As to go from ref object to ref T).

@ufcpp
Copy link

ufcpp commented May 4, 2017

@mgravell
If my understanding is right, refering ValueTuple will be still unsafe even if ref-like become viable. The SpanTuple would need blittable constraints. However ValueTuple is not blittable because its layout is 'Auto'. See also: https://github.com/dotnet/coreclr/issues/10448

@CyrusNajmabadi
Copy link
Member

image

@CyrusNajmabadi
Copy link
Member

which makes it a bit risky to use in this scenario, sequential layout would be preferable to avoid surprises.

That seems like a safe change that could be made. Paging @jcouv

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@mikedn @cyrus yeah, the ValueTuple thing is probably going too far into implementation details. It could just as easily be a struct ParamsHolder<T> that has a number of fields of type T plus explicit layout, and if it isn't enough: fall back to arrays. It doesn't matter if we use 3 of 16 (or whatever) - we aren't copying it on the stack, we are passing a reference. The actual type used doesn't matter so much as the concept.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

The other "fix" of course being: use stackalloc but do it so that it only allocates once per stack frame.

@VSadov
Copy link
Member

VSadov commented May 4, 2017

ValueTuple is generic. "Auto" allows JIT to pack/pad/align fields as appropriate to their size to make accesses more efficient.
I believe ValueTuple is "Auto" on purpose.

@VSadov
Copy link
Member

VSadov commented May 4, 2017

I wonder how pass-by span compares to _arglist()
That also does not allocate, but calee needs to do typechecks.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@VSadov kinda related and might simplify a lot of this... Thinking aloud here:

ref T foo = ref stackalloc T[len];

Or even:

Span<T> foo = stackalloc T[len];

I.e. wrap a lot of the ugly here up a level into the stackalloc itself.

@VSadov
Copy link
Member

VSadov commented May 4, 2017

A struct at the callee side seems more suitable to hold the variables. Possibly compiler-generated one - like in Fixed.
Stackalloc limits you to unmanaged types and has a problem with re-allocation.

The idea of using Span as an indexer that is not coupled with the underlying storage and yet is itself a struct is interesting.

Some care would be needed to not let such spans escaping their scope. There could be some challenges here. - What if callee returns the span or assigns to another ref/out parameter . . .

@VSadov
Copy link
Member

VSadov commented May 4, 2017

@mgravell

Span<T> foo = stackalloc T[len];

is the current idea for safe stackallock'd spans. It did not get a lot of scrutiny though, so cannot be absolutely sure if it will work.

Note that stackalloc allocates every time. "Leaks" are ok, since all will be released when method returns, but can easily SO if it is done in a loop.
Also it is not tracked by GC, so T cannot have managed types.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@VSadov in the case of params usage, the compiler is fully in charge of the scope, so it can't escape. For more general purposes, that's surely where your ref-like stack referring stuff comes in, no?

@mikedn
Copy link

mikedn commented May 4, 2017

So X calls a span params method Y and Y returns the span it got as a parameter. X now return the span returned from Y to its caller. Ooops, the caller of X now has a span that points into the former frame of X.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@mikedn the call to Y would be marked as a stack-referring expression, so the result is stack-referring and thus cannot be used as a return or out/ref result; that is, I believe, part of the ref-like proposal that is a work in progress separate to this - basically: that's already considered separately, if i understand the ref-like type spec proposal; span-t is a ref-like type for this purpose.

@mikedn
Copy link

mikedn commented May 4, 2017

@mgravell I'm not familiar with that proposal but it sounds kind of fishy. So I call a span params method and that method allocates an array, makes a span out of it and returns it. Now the compiler thinks that the returned span refers to the stack and thus limits my ability to return the span? Oh well, I suppose span isn't too different from ref in this regard so it probably makes sense.

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@VSadov
Copy link
Member

VSadov commented May 4, 2017

Slightly more updated version of the span safety proposal:
https://github.com/VSadov/csharplang/blob/3ab16a77b9fefe3805fa81044ba72d6c254e00f9/proposals/span-safety.md

It is easier to follow/comment via a PR:
#472

@mgravell
Copy link
Member Author

mgravell commented May 4, 2017

@VSadov hoorah! I guessed correctly on the declaration syntax (ref struct)! Ok to be fair it was an obvious choice, but: yay. I see the stackalloc / span assignment in there too, so: indeed a lot of parallels between the direction of that design and the things I've been talking about here. This sounds reassuring, in that we're thinking of similar solutions to the problem scenarios. Looking good.

@mikedn
Copy link

mikedn commented May 5, 2017

Yep, as I suspected this stack-referring thing is rather infectious. It's cool and at the same time kind of scary, many C# developers will probably have trouble understanding all this wizardry. Sometimes I wonder what C# language turns into :)

@mgravell
Copy link
Member Author

mgravell commented May 5, 2017

@mikedn I think of it very much like pointers: most c# devs barely know they are a thing in c#, but for certain areas they are heavily used. As long as the scenarios "most devs" are likely to see are clear and simple, the subtle depths don't worry me. Still better than pointers.

@omariom
Copy link

omariom commented May 6, 2017

Sometimes I wonder what C# language turns into :)

When C# was born MS said it was closer to C++ than to Java :)

@ygc369
Copy link

ygc369 commented Jul 4, 2017

I like this proposal.

@alrz
Copy link
Member

alrz commented Dec 29, 2017

I think #179 cover this proposal (if we allow any type for params that is assignable from T[]), we just need to emit stackalloc where possible (as an optimization).

@mgravell
Copy link
Member Author

mgravell commented Dec 29, 2017 via email

@alrz
Copy link
Member

alrz commented Dec 29, 2017

I was talking about semantics where these are all permitted, but I missed the fact that Span is a ref struct and doesn't implement any interfaces. so it should be a special case anyways.

@bbarry
Copy link
Contributor

bbarry commented Jan 3, 2018

Presumably a compiler generated nested struct would be safer than relying on the internals of ValueTuple right?

[Benchmark(OperationsPerInvoke = REPEATS_PER_ITEM)]
public unsafe int SpanGenerated()
{
    int total = 0;
    for (int i = 0; i < REPEATS_PER_ITEM; i++)
    {
        var args = new Generated5Int(1, 2, 3, 4, 5);
        total += Execute(new Span<int>(Unsafe.AsPointer(ref args.I1), 5));
    }
    return total;
}
[Benchmark(OperationsPerInvoke = REPEATS_PER_ITEM)]
public unsafe int ReadOnlySpanGenerated()
{
    int total = 0;
    for (int i = 0; i < REPEATS_PER_ITEM; i++)
    {
        var args = new Generated5Int(1, 2, 3, 4, 5);
        total += Execute(new ReadOnlySpan<int>(Unsafe.AsPointer(ref args.I1), 5));
    }
    return total;
}

[StructLayout(LayoutKind.Sequential)]
struct Generated5Int
{
    public int I1;
    int I2;
    int I3;
    int I4;
    int I5;
    public Generated5Int(int i1, int i2, int i3, int i4, int i5) {
        I1 = i1;
        I2 = i2;
        I3 = i3;
        I4 = i4;
        I5 = i5;
    }
}

                Method |     Mean |     Error |    StdDev |          Op/s |  Gen 0 |  Gen 1 | Allocated |
---------------------- |---------:|----------:|----------:|--------------:|-------:|-------:|----------:|
           ParamsArray | 6.000 ns | 0.0219 ns | 0.0205 ns | 166,673,417.6 | 0.0114 | 0.0000 |      48 B |
         ExplicitArray | 5.943 ns | 0.0159 ns | 0.0141 ns | 168,271,717.2 | 0.0114 | 0.0000 |      48 B |
                  Span | 9.500 ns | 0.0217 ns | 0.0203 ns | 105,262,022.6 |      - |      - |       0 B |
             SpanTuple | 2.182 ns | 0.0056 ns | 0.0052 ns | 458,245,714.4 |      - |      - |       0 B |
      StaticBackedSpan | 2.177 ns | 0.0056 ns | 0.0050 ns | 459,410,418.4 |      - |      - |       0 B |
         SpanGenerated | 2.181 ns | 0.0044 ns | 0.0041 ns | 458,545,682.8 |      - |      - |       0 B |
 ReadOnlySpanGenerated | 2.181 ns | 0.0040 ns | 0.0037 ns | 458,582,157.2 |      - |      - |       0 B |

@alrz
Copy link
Member

alrz commented Jan 15, 2018

I started working on stackalloc initializers (dotnet/roslyn#24249), now we should be able to reuse its lowering for params Span. For reference types we fall back to array though.

Is there anyone from LDT interested to champion this proposal?

@bbarry
Copy link
Contributor

bbarry commented Jan 16, 2018

I don't know if stackalloc initializers are worth doing with their performance cost relative to allocating an array (Span vs ParamsArray in various tables above).

A generated struct used for a Span pointer would be nice (relying on Tuple is suspect due to auto layout) but that has the same problem that #992, #1147 and https://github.com/dotnet/corefx/issues/26228 do, doesn't it?

This is valid code (and almost certainly buggy):

public unsafe Span<int> SpanGenerated()
{    
    var args = new Generated5Int(1, 2, 3, 4, 5);
    return Execute(new Span<int>(Unsafe.AsPointer(ref args.I1), 5));
}
Span<int> Execute(/*params*/ Span<int> input) => input;

Does stackalloc have that issue as well?


Edit: what I mean is supporting Span<T> would be nice for params in your patch but I don't know if it is worth waiting for zero allocation methods of doing it assuming you would be supporting other array like params things.

@benaadams
Copy link
Member

Does stackalloc have that issue as well?

stackalloc is fixed memory so it doesn't have the problem; however you can't return it from the method as that stack space is then invalid; as it gets unwound - though you can call into other methods with it (as its still valid going deeper into the stack)

@alrz
Copy link
Member

alrz commented Jan 16, 2018

@bbarry

supporting Span<T> would be nice for params in your patch but I don't know if it is worth waiting for zero allocation methods of doing it assuming you would be supporting other array like params things.

Not sure if I understand correctly, do you mean runtime helpers that would allow block init? I guess "other array like params things" refers to #179? and what it has to do with those methods?

@bbarry
Copy link
Contributor

bbarry commented Jan 16, 2018

Sorry I was just saying #179 should be unrelated to this even if #179 were to support Foo(params Span<T>) signatures via a Foo(new Span<int>(new int[] {1, 2, 3})) calling convention.

As I am seeing it, safe code stackalloc seems pretty dead in the water, so do the other zero allocation strategies presented here, without some fundamental new feature for tracking lifetimes (like #1130) to enable #535 (this issue), #923, #992, #1147, and https://github.com/dotnet/corefx/issues/26228.

@alrz
Copy link
Member

alrz commented Jan 19, 2018

This is how I imagine we can make this work:

Currently array initializers are lowered to a static field of a struct with a size of all elements combined, iff all values are constants (plus a few other conditions), for example, new int[3] {1, 2, 3} produces the following:

// Code size 19 (0x13)
.maxstack 8

IL_0000: ldc.i4.3
IL_0001: newarr [mscorlib]System.Int32
IL_0006: dup
IL_0007: ldtoken field valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=12' '<PrivateImplementationDetails>'::E429CCA3F703A39CC5954A6572FEC9086135B34E
IL_000c: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
IL_0011: pop
IL_0012: ret

stackalloc initializers can take advantage of the same mechanism but instead of RuntimeHelpers.InitializeArray we use memcpy (the cpblk instruction) to copy the content of that field directly to the stackalloc variable (this is implemented in the PR), for example stackalloc int[3] {1, 2, 3} produces the following:

// Code size       21 (0x15)
.maxstack  4
IL_0000:  ldc.i4.s   12
IL_0002:  conv.u
IL_0003:  localloc
IL_0005:  dup
IL_0006:  ldsflda    ""<PrivateImplementationDetails>.__StaticArrayInitTypeSize=12 <PrivateImplementationDetails>.E429CCA3F703A39CC5954A6572FEC9086135B34E""
IL_000b:  ldc.i4.s   12
IL_000d:  cpblk
IL_000f:  call       ""void C.Print(int*)""
IL_0014:  ret

Span<T> conversion just works out of the box since we're passing an already-initialized pointer to the constructor.

Finally, params on arrays is using the same lowering logic for array construction so we get all optimizations from there. I propose we use the above lowering for params Span<T> where possible, for reference types we can either fallback to the array variation or use the suggested lowering if feasible,

var vals = stackalloc byte[{len} * sizeof(string)]; // sizeof here not valid C#, but is valid IL
var span = new Span<sting>(vals, {len});
span[0] = ...

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests