Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reevaluate params ReadOnlySpan<string> overloads from #77873 #101261

Closed
jozkee opened this issue Apr 18, 2024 · 21 comments · Fixed by #101499
Closed

Reevaluate params ReadOnlySpan<string> overloads from #77873 #101261

jozkee opened this issue Apr 18, 2024 · 21 comments · Fixed by #101499
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@jozkee
Copy link
Member

jozkee commented Apr 18, 2024

Existing compiler rules make that types that define implicit operators for string and string[] incur in a source breaking change.
Example:

class Program
{
    static void Main()
    {
        var c = new C();
        Join(",", c); // error CS0121: The call is ambiguous between the following methods or properties: 'Program.Join(string, params string[])' and 'Program.Join(string, params ReadOnlySpan<string>)'
    }

    public static string Join(string separator, params string[] value) => "array";

    public static string Join(string separator, params ReadOnlySpan<string> value) => "span";
}

class C
{
    public static implicit operator string(C c) => "c";

    public static implicit operator string[](C c) => ["c"];
}

One real-world example is Microsoft.Extensions.Primitives.StringValues which is passed into string.Join(string, string[]) on aspnetcore, and that it failed to compile when the changes in #100898 flowed into their repo.

API Proposal

Not quite a new API proposal, but a proposal for keeping as is the APIs on #77873.
These APIs would make existing callsites of their analogous params string[] overloads to fail if a StringValues is being passed as argument.

System.String.Concat(params ReadOnlySpan<string?> values)
System.String.Join(char separator, params ReadOnlySpan<string?> value)
System.String.Join(string? separator, params ReadOnlySpan<string?> value)
System.IO.Path.Combine(params ReadOnlySpan<string> paths)
System.IO.Path.Join(params ReadOnlySpan<string?> paths)
System.Text.StringBuilder.AppendJoin(string? separator, params ReadOnlySpan<string?> values)
System.Text.StringBuilder.AppendJoin(char separator, params ReadOnlySpan<string?> values)

Considering that having the ambiguity of string and string[] implicit operators feels like is against FDGs 5.7.2 Conversion Operators

DO NOT provide a conversion operator if such a conversion is not clearly expected by the end users.

I think the best path would be to accept the source breaking change and expect users to update their codebase, preferably picking the new ReadOnlySpan<string> overloads.

Alternatively

@dotnet/roslyn could the compiler rules be adjusted to accommodate StringValues? Why adding a public static implicit operator ReadonlySpan<string> to StringValues doesn't make the compiler select string.Join(string, ReadonlySpan<string>)? It does make the error go away, though, by selecting the string[] overload.

@jozkee jozkee added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory labels Apr 18, 2024
@jozkee jozkee added this to the 9.0.0 milestone Apr 18, 2024
@jozkee jozkee self-assigned this Apr 18, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

@jozkee jozkee changed the title Reevaluate params ReadOnlySpan<string> overloads from https://github.com/dotnet/runtime/issues/77873 Reevaluate params ReadOnlySpan<string> overloads from #77873 Apr 18, 2024
@RikkiGibson
Copy link
Contributor

could the compiler rules be adjusted to accommodate StringValues? Why adding a public static implicit operator ReadonlySpan<string> to StringValues doesn't make the compiler select string.Join(string, ReadOnlySpan<string>)? It does make the error go away, though, by selecting the string[] overload.

string[] is considered better because implicit conversion from string[] to ReadOnlySpan<string> exists, and not the other way around. Thus the compiler thinks of string[] as "more derived/specific" than ReadOnlySpan<string>.

It seems like when overload resolution is comparing a conversion from StringValues->string to invoke Join(ReadOnlySpan) in expanded form, with a conversion from StringValues->string[] to invoke Join(string[]) in normal form, there's no rule which indicates the conversion to string is better than the conversion to string[].

However once an operator StringValues->ReadOnlySpan<string> is added, now the overloads can be compared as: oh, I can either invoke in Join(ReadOnlySpan) in normal form, or Join(string[]) in normal form, and the conversion from string[] to ReadOnlySpan indicates that string[] is better.

I'm not sure what change is most appropriate to fix the case in this issue, though. I will say that choosing ReadOnlySpan<string> in expanded form feels undesirable for this case as it converts the StringValues to string and implicitly joins the inner strings as part of that.

@jozkee
Copy link
Member Author

jozkee commented Apr 18, 2024

However once an operator StringValues->ReadOnlySpan is added, now the overloads can be compared as: oh, I can either invoke in Join(ReadOnlySpan) in normal form, or Join(string[]) in normal form, and the conversion from string[] to ReadOnlySpan indicates that string[] is better.

I would expect conversion of StringValues->ReadOnlySpan<string> to be better than StringValues->string[] and avoid weight-in the array_type->span_type conversion in such indirect cases. If that would be possible, I think we could add the implicit ReadOnlySpan<string> operator to fix the issue on our end.

@jozkee
Copy link
Member Author

jozkee commented Apr 18, 2024

cc @dotnet/fxdc

@RikkiGibson
Copy link
Contributor

I would expect conversion of StringValues->ReadOnlySpan<string> to be better than StringValues->string[]

It looks like "overload resolution priority" was designed to address this specific kind of problem. dotnet/csharplang#7906

@CyrusNajmabadi
Copy link
Member

could the compiler rules be adjusted to accommodate StringValues? Why adding a public static implicit operator ReadonlySpan<string> to StringValues doesn't make the compiler select string.Join(string, ReadOnlySpan<string>)? It does make the error go away, though, by selecting the string[] overload.

string[] is considered better because implicit conversion from string[] to ReadOnlySpan<string> exists, and not the other way around. Thus the compiler thinks of string[] as "more derived/specific" than ReadOnlySpan<string>.

It seems like when overload resolution is comparing a conversion from StringValues->string to invoke Join(ReadOnlySpan) in expanded form, with a conversion from StringValues->string[] to invoke Join(string[]) in normal form, there's no rule which indicates the conversion to string is better than the conversion to string[].

However once an operator StringValues->ReadOnlySpan<string> is added, now the overloads can be compared as: oh, I can either invoke in Join(ReadOnlySpan) in normal form, or Join(string[]) in normal form, and the conversion from string[] to ReadOnlySpan indicates that string[] is better.

I'm not sure what change is most appropriate to fix the case in this issue, though. I will say that choosing ReadOnlySpan<string> in expanded form feels undesirable for this case as it converts the StringValues to string and implicitly joins the inner strings as part of that.

Isn't this was @333fred had a proposal for?

@AlekseyTs
Copy link
Contributor

I would expect conversion of StringValues->ReadOnlySpan<string> to be better than StringValues->string[] and avoid weight-in the array_type->span_type conversion in such indirect cases.

What is the basis for these expectations? Usually, expectations should be based on something.

@jozkee
Copy link
Member Author

jozkee commented Apr 19, 2024

What is the basis for these expectations?

  1. Given that we implicitly convert arrays into [ReadOnly]Spans in other contexts, feels counter-intuitive to prefer the opposite here.
  2. When Span & the array conversion were introduced, I doubt it was considered that it would resolve conversion fights in other types in favor of the array.
  3. The number of efforts put into spreading Span benefits, e.g. although I'm not sure if such benefits exist in conversions.

@AlekseyTs
Copy link
Contributor

Given that we implicitly convert arrays into [ReadOnly]Spans in other contexts, feels counter-intuitive to prefer the opposite here.

Based on what? I would expect exactly the behavior that we have, and that expectation is based on language rules, not on feelings. From your perspective span types are unconditionally "better" than array types, but compiler cannot get into one's head and read one's mind/feelings. There is a rational behind the language rules. To simplify, language prefers a candidate that takes a "narrower" type for a given argument. For example, an int parameter is preferred over a long parameter. The process of determining what type is "narrower" is based on the set of available implicit conversions. If type A is implicitly convertible to type B and B is not implicitly convertible to A, type A is "narrower" than B. Thus, it is "better" than B. By this rule, int is better than long, and array is "better"/"narrower" than span.

I guess, the point that I am trying to make. There is nothing wrong to have a desire that compiler behaves a certain way. But the desire itself is not a good basis (I would even say it is not a basis at all) to have an expectation that compiler is supposed to behave this way. Basically, there is an analogy between a contract for an API and language rules. The contract defines the expectations, not the other way around. It is, obviously, fine to ask to adjust the language rules to align expected behavior of the compiler with the desired outcome. But, in my opinion, that is quite different from simply expecting things to work a certain way just because there is a desire for that at given moment in time.

@jozkee jozkee added blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Apr 19, 2024
@CyrusNajmabadi
Copy link
Member

@AlekseyTs the ldm has expressed many times the desire to have the Span based forms "win". So far we've been accomplishing that by patching certain sections of the spec to make that happen. Fred also has a more generalized proposal to nip things in the bud and try to take care of it more uniformly.

It seems very easy and reasonable to me that users would take away from this an intuition on how things should behave. That's how we operate as well. I agree the spec defines what will actually happen. But it is our intuitions that drive is to get the spec to the place we want it to be.

@bartonjs
Copy link
Member

bartonjs commented Apr 23, 2024

Video

  • We are aware of the break, and think that it's fine going forward. It would be nice if the language handled it.
  • Adding an implicit conversion to ReadOnlySpan<string> will fix the problem for this pattern, and we've decided to proactively add it to StringValues. It's up to the owners of the library to decide if this new conversion is available on .NET Standard 2.0, or only on .NET 9+
namespace Microsoft.Extensions.Primitives
{
    public readonly partial struct StringValues
    {
        public static implicit operator ReadOnlySpan<string>(in StringValues value);
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 23, 2024
@RikkiGibson
Copy link
Contributor

RikkiGibson commented Apr 23, 2024

It looks like in StringValues value was probably meant to be scoped in StringValues value. Assuming the goal of in was to minimize value copies, and not to allow the reference to escape into the return value.

@stephentoub
Copy link
Member

stephentoub commented Apr 23, 2024

It looks like in StringValues value was probably meant to be scoped in StringValues value. Assuming the goal of in was to minimize value copies, and not to allow the reference to escape into the return value.

The reference needs to escape into the return value. The argument is in to allow for this to be allocation-free. StringValues stores either a string or a string[], and we want to be able to do something like this:

public static implicit operator ReadOnlySpan<string>(in StringValues value) =>
    value._values is string ? new ReadOnlySpan<string>(in Unsafe.As<object, string>(ref Unsafe.AsRef(in value._values))) :
    new ReadOnlySpan<string>((string[])value._values);

Does adding scoped add any benefits? We've generally only added it when it's necessary.

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Apr 23, 2024

Oh interesting! So you want result to be able to hold a reference to the struct field in the single item case. I didn't understand this aspect of the internal layout of StringValues. Carry on :)

Does adding scoped add any benefits? We've generally only added it when it's necessary.

I would lean toward using it when it is meaningful to do so (when the method has references both as inputs and outputs.) Use it unless you either want to return a reference to an input, or want to reserve the ability to do so in the future.

@stephentoub
Copy link
Member

Does adding scoped add any benefits? We've generally only added it when it's necessary.

I would lean toward using it when it is meaningful to do so (when the method has references both as inputs and outputs.) Use it unless you either want to return a reference to an input, or want to reserve the ability to do so in the future.

We've opted not to do that because we'd end up explicitly adding it to a boatload of APIs where it doesn't actually affect behavior. We only use it when it's not a nop.

@stephentoub
Copy link
Member

@MihaZupan highlighted that we can't do public static implicit operator ReadOnlySpan<string>(in StringValues value) and still have it be safe: after calling this, the returned ReadOnlySpan<string> will be referencing the original location of the StringValues, which could be overwritten, resulting in type safety violations e.g.
SharpLab

At that point, we're back to there being no way to make this API safe and keep it allocation-free, since the only way to make it allocation-free for a single string is to take a reference to the original location, which we can't safely do.

@Neme12

This comment was marked as resolved.

@stephentoub
Copy link
Member

It can't be non-allocating on .NET Core, either.

@Neme12
Copy link

Neme12 commented Apr 24, 2024

Sorry, I missed that. Yeah, that example worries me so much because I have done span-returning methods pointing to readonly structs. Somehow I assumed if it's a readonly struct it's ok. Damn, thanks for the example, but damn... this worries me because this is something you could do with completely safe code. Well, except for the cast. But you'd still be able to overwrite the value of the string by reassigning the struct.

EDIT: Although... I guess then that is kind of expected because readonly span only means you can't change the thing it's pointing to, not that it cannot be changed. Hmmm, we need an ImmutableSpan 😄

But I assumed the struct stored the fields separately for some reason (I guess because that's how I would have done it).

@Neme12

This comment was marked as resolved.

@Neme12
Copy link

Neme12 commented Apr 24, 2024

the ldm has expressed many times the desire to have the Span based forms "win"

I think that would be unfortunate. There are many reasons where a method takes a span as well as an array/string, and the span overload potentially has to allocate, whether for legacy reasons (e.g. Stream, where the default implementation has to potentially allocate and make a copy and forward to the original array-taking one - and I would guess the majority of real-world stream do not override the span one from what I've seen in other people's code), and I've seen it in cases other than legacy reasons as well where a new API was added with overloads for both either string/array and a span and the span-taking one had to allocate. I think it was for String implementing ISpanParsable - the regular parsable doesn't allocate, the span-taking one does. But it was still added for it to be usable in generic scenarios where you only have a span and not a string. But if you do have a string, you should still prefer the string-taking one. After all, string/arrays are more derived and more usable, and can be reused. Spans cannot be reused, and so sometimes you have to allocate an array/string from them.

The only case where preferring span overloads adds value is for params, where the array version allocates invisibly by default (but I might be missing other cases as well in the language where there are implicit allocations similar to this that would be mitigated by span). Of course, If C# didn't have invisible allocations from the start, even this wouldn't be a problem.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants