Add 'split' support for ReadOnlySpan<char> similar to string #934

Closed
ahsonkhan opened this issue Jan 23, 2018 · 129 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@ahsonkhan
Member

ahsonkhan commented Jan 23, 2018

Edited by @stephentoub on 6/26/2024:

public static class MemoryExtensions
{
    // Alternative name: EnumerateSplits, but not sure what SplitAny would be called
+   public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, T separator);
+   public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> separator);
+   public static SpanSplitEnumerator<T> SplitAny<T>(this ReadOnlySpan<T> source, params ReadOnlySpan<T> separators);

    // Optional:
+   public static SpanSplitEnumerator<char> SplitAny(this ReadOnlySpan<char> source, params ReadOnlySpan<string> separators);

+   public ref struct SpanSplitEnumerator<T>
+   {
+       public SpanSplitEnumerator<T> GetEnumerator();
+       public bool MoveNext();
+       public Range Current { get; }
+   }
}

Older proposals:

partial class MemoryExtensions
{
    public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator,
        StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

    public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
    {
        public SpanSplitEnumerator<T> GetEnumerator() { return this;  }
        public bool MoveNext();
        public Range Current { get; }
    }
}

Previously approved API Proposal

public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator() { return this; }
    public bool MoveNext();
    public ReadOnlySpan<T> Current { get; }
}

Split off from https://github.com/dotnet/corefx/issues/21395#issuecomment-359342832

From @Joe4evr on January 21, 2018 23:16

Can I throw in another suggestion? I'd really like to see some ability to split a ReadOnlySpan<char>. Obviously, you can't return a collection of Spans directly, but isn't that what Memory<T> is for?

// equivalent to the overloads of 'String.Split()'
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] separator, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] separator, StringSplitOptions options);

The reason for choosing IReadOnlyList<T> is to make the resulting collection indexable, just like string[]. It would be nice if the implementation is ImmutableArray<T>, but I'm not sure if that's a concern for distribution and such.


From @ahsonkhan on January 22, 2018 01:36

@Joe4evr, out of curiosity, do you have a scenario atm where these APIs would be useful? If so, can you please show the code sample?

I would replace the char[] overloads with ReadOnlySpan<char>. However, I am not sure about adding the split APIs, in general, given they have to allocate. Is there a way to avoid allocating? Also, given these are string-like APIs for span, it is strange to have an overload that takes a string[]. Maybe all these would fit better on ReadOnlyMemory instead, especially given the return type.


From @Joe4evr on January 22, 2018 08:35

My scenario is taking a relatively big string of user input and then parsing that to populate a more complex object. So rather than take the whole string at once, I'd like to parse it in pieces at a time. It'd be pretty nice if this can be facilitated by the Span/Memory<T> APIs so that this code won't have to allocate an extra 30-40 tiny strings whenever it runs.

Admittedly, I only started on this particular case earlier today, mostly to experiment and find out how much I could get out of the Span APIs at this time.

Maybe it was a bit naive of me to expect a collection like I did, but I'd at least like to see some API to deal with this scenario a little easier, because I'll probably not be the only one looking to split a span up into smaller chunks like this.


From @stephentoub on January 22, 2018 08:41

Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation).

I would instead expect a design more like an iterator implemented as a ref struct, e.g.

public ref struct CharSpanSplitter
{
    public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options);
    public bool TryMoveNext(out ReadOnlySpan<char> result);
}

cc @KrzysztofCwalina, @stephentoub, @Joe4evr
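
For illustration, here is a minimal sketch of how a TryMoveNext-style splitter like the one outlined above might be implemented. Only the CharSpanSplitter shape comes from the comment; the field names, the handling of RemoveEmptyEntries, and the CharSpanSplitterDemo helper are assumptions made for this example, not a shipped BCL type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the ref struct splitter; not part of the BCL.
public ref struct CharSpanSplitter
{
    private ReadOnlySpan<char> _remaining;
    private readonly char _separator;
    private readonly bool _removeEmpty;
    private bool _done;

    public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options)
    {
        _remaining = value;
        _separator = separator;
        _removeEmpty = (options & StringSplitOptions.RemoveEmptyEntries) != 0;
        _done = false;
    }

    public bool TryMoveNext(out ReadOnlySpan<char> result)
    {
        while (!_done)
        {
            int index = _remaining.IndexOf(_separator);
            if (index < 0)
            {
                // No separator left: the rest of the input is the last segment.
                result = _remaining;
                _remaining = default;
                _done = true;
            }
            else
            {
                result = _remaining.Slice(0, index);
                _remaining = _remaining.Slice(index + 1);
            }

            if (!_removeEmpty || !result.IsEmpty)
            {
                return true;
            }
        }

        result = default;
        return false;
    }
}

public static class CharSpanSplitterDemo
{
    // Copies each segment to a string only so the result is easy to inspect;
    // allocation-sensitive callers would consume each span in place.
    public static List<string> Collect(string input, char separator, StringSplitOptions options)
    {
        var segments = new List<string>();
        var splitter = new CharSpanSplitter(input.AsSpan(), separator, options);
        while (splitter.TryMoveNext(out ReadOnlySpan<char> slice))
        {
            segments.Add(slice.ToString());
        }
        return segments;
    }
}
```

All of the splitter's state lives in the struct itself, so the loop does no heap allocation until the demo helper calls ToString.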

@Joe4evr
Contributor

Joe4evr commented Jan 23, 2018

Presumed usage would be like this?

var splitter = span.Split(',', StringSplitOptions.RemoveEmptyEntries);
while (splitter.TryMoveNext(out var slice))
{
    //....
}

Yeah, looks good.

@svick
Contributor

svick commented Jan 23, 2018

Would it be possible to implement the enumerable pattern, even if it's not possible to implement IEnumerable<T>?

I think the interface would look something like:

public static SpanSplitEnumerable Split(this ReadOnlySpan<char> span, char separator);

public ref struct SpanSplitEnumerable
{
    public SpanSplitEnumerator GetEnumerator();
}

public ref struct SpanSplitEnumerator
{
    public bool MoveNext();
    public ReadOnlySpan<char> Current { get; }
}

That way, the usage could be:

foreach (var slice in span.Split(','))
{}

If indexing is important, something like ref struct SpanSplitList would probably work, but I think it couldn't guarantee zero allocations.

@khellang
Member

Sorry, didn't see this issue before posting https://github.com/dotnet/corefx/issues/21395#issuecomment-359802015.

As mentioned, I'd love to see something like Google Guava's CharMatcher for this. It could be useful for other scenarios as well.

@grahamehorner

It would be great if the split worked also with Span and returned ReadOnlyMemory<ReadOnlyMemory>, ReadOnlySpan<ReadOnlyMemory>

@ahsonkhan
Member Author

It would be great if the split worked also with Span and returned ReadOnlyMemory<ReadOnlyMemory>, ReadOnlySpan<ReadOnlyMemory>

Clarifying the comment so the generic types show up:
It would be great if the split worked also with Span<byte> and returned ReadOnlyMemory<ReadOnlyMemory<T>>, ReadOnlySpan<ReadOnlyMemory<T>>

@LordJZ

LordJZ commented May 23, 2018

Simple implementation for SpanSplitEnumerable<T> Split<T>(this ReadOnlySpan<T> span, T separator) interface where the return type is foreachable. Unit tests included.

https://gist.github.com/LordJZ/92b7decebe52178a445a0b82f63e585a

The sentinel is a "clever trick" to avoid having a boolean field, can be replaced with a backing field.

@dougbu
Member

dougbu commented Sep 7, 2018

We needed features like this on top of ASP.NET's StringTokenizer type in the aspnet/WebHooks repo. See that repo's TrimmingTokenizer; we have seen some interest (aspnet/WebHooks#324) in making it more broadly available.

It would be great to point customers to something in the BCL instead of (say) adding TrimmingTokenizer features to StringTokenizer. What is the expected timeframe for the "Future" milestone that would include a dotnet/corefx#26528 fix?

/cc @davidfowl

@danmoseley
Member

@ahsonkhan if this proposal makes sense to you, perhaps you could help shepherd it to review, as it seems there would be volunteers to implement it?

@danmoseley
Member

cc @JeremyKuhne, who has been thinking about low-allocation string operations.

@JeremyKuhne
Member

StringBuilder added an enumerator for the next version that we should consider when deciding on a pattern for enumerating spans. https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Text/StringBuilder.cs#L587

@ahsonkhan
Member Author

if this proposal makes sense to you, perhaps you could help shepherd it to review, as it seems there would be volunteers to implement it?

Let me make sure the API shape is clear. Incorporating the recent feedback (to get foreach support, make it generic), here is the API:

public static SpanSplitEnumerable<T> Split<T>(this ReadOnlySpan<T> span, T separator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerable<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator();
}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public bool MoveNext();
    public ReadOnlySpan<T> Current { get; }
}

@JeremyKuhne
Member

@KrzysztofCwalina this is the enumerating spans issue we discussed yesterday.

@khellang
Member

Does StringSplitOptions really make sense for a generic T?

@candoumbe

candoumbe commented Sep 12, 2018

Does StringSplitOptions really make sense for a generic T?

I suppose a SpanSplitOptions which would mimic StringSplitOptions makes more sense here.

@KrzysztofCwalina
Member

Should the enumerator and the enumerable be combined? i.e. what's the reason to have two types?

@svick
Contributor

svick commented Sep 13, 2018

@KrzysztofCwalina I think it makes the API cleaner and easier to understand. The foreach pattern requires an enumerable and an enumerator, and that's exactly what the proposed API provides.

Is there any advantage in combining them, apart from decreasing the size of the API surface?

@terrajobst
Member

terrajobst commented Sep 25, 2018

Video

We don't think we want these APIs:

  1. They allocate
  2. They aren't as convenient as String.Split

If you care about allocations, then you want a different API. And if you don't care about allocations, well, then you can just use String.Split.

So what API would we like to see? It could be a struct-based enumerator that allows the consumer to foreach the individual spans without allocations. Alternatively (or additionally) we could have a method that allows the consumer to pass in a buffer with the locations, which, for the most part, could be stackallocated.

@schungx

schungx commented Oct 25, 2018

We don't think we want these APIs:

1. They allocate

I readily concur with this. If I don't care about allocations, I'll simply be using string.Split. If I care enough to want to avoid allocating substrings, then obviously I would like to avoid any allocations whatsoever.

Suggest ReadOnlySpan<T>.Split returns ReadOnlySpan<ReadOnlySpan<T>> on the stack so there is no more allocation.

If I want to use foreach, which allocates, then I can still enumerate over the ReadOnlySpan<ReadOnlySpan<T>>. Otherwise, I'll use a simple for loop to have zero allocation.

@khellang
Member

Can you use ReadOnlySpan<T> as a generic argument?

@svick
Contributor

svick commented Oct 25, 2018

@schungx

Suggest ReadOnlySpan<T>.Split returns ReadOnlySpan<ReadOnlySpan<T>> on the stack so there is no more allocation.

Split can't return a Span that was stack allocated by itself. It could take that Span as an additional parameter, but then you have to somehow handle the case where that Span is not big enough. I believe this is what @terrajobst meant by "we could have a method that allows the consumer to pass in a buffer with the locations".
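
As a point of reference, the caller-provided-buffer idea can be sketched with the char-based Split overloads that were added in .NET 8 (mentioned later in this thread), which write Range values into a destination span. The snippet assumes .NET 8 or later; the RangeBufferSplitDemo class and the capacity of 4 are hypothetical choices for this example.

```csharp
using System;

public static class RangeBufferSplitDemo
{
    // Splits on ',' into a stack-allocated buffer of ranges supplied by the caller.
    // Assumes .NET 8+, where MemoryExtensions.Split(ReadOnlySpan<char>, Span<Range>, char) exists.
    public static string[] SplitIntoBuffer(string input)
    {
        ReadOnlySpan<char> source = input.AsSpan();
        Span<Range> ranges = stackalloc Range[4]; // illustrative capacity
        int written = source.Split(ranges, ',');  // number of ranges actually filled in

        // Strings are created here only to make the result easy to inspect.
        var segments = new string[written];
        for (int i = 0; i < written; i++)
        {
            segments[i] = source[ranges[i]].ToString();
        }
        return segments;
    }
}
```

As far as I understand the documented behavior, when the buffer is too small the final range covers the unsplit remainder of the input, much like string.Split with a count argument. That is one answer to the "not big enough" question, but it is worth checking against the official docs before relying on it.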

@KrzysztofCwalina
Member

@terrajobst, the API that @svick proposed above does not seem to allocate. Am I missing something? Or are you saying that we want both convenience and no allocations?

@svick, I wonder how important it is for this enumerator to be easy to understand. The classic enumerator pattern has two types for many reasons that are less and less applicable once you deal with by ref structs that cannot implement enumerator interfaces and get copied when passed around. I do agree we should discuss pros and cons and maybe indeed it's better to have two types, but I wanted to open the discussion as it seems such an overkill to add two by ref types just so we can split.

@svick
Contributor

svick commented Oct 25, 2018

@KrzysztofCwalina I agree that my argument of "this version of the API is slightly easier to understand" is fairly weak in this case. But in my opinion, the argument of "this version of the API is slightly smaller" is even weaker, which is why I prefer to have two types.

But ultimately it doesn't make that much of a difference, as long as I'll be able to have foreach (var slice in span.Split(',')) without allocations, I'll be happy.

@schungx

schungx commented Oct 31, 2018

@svick I suspect a normal foreach will always allocate because the current state must be kept somewhere, even if you somehow come up with value-based iterators...

And what is the purpose of this again? That's to prevent allocations in parsing code in tight loops, where all we're doing is manipulating streams of text without ever changing them.

@svick
Contributor

svick commented Oct 31, 2018

@schungx There's nothing really new to come up with. For example, foreach over a List<T> already does not allocate on the heap, because the current state is kept in the enumerator struct on the stack. A very similar approach can be used here.

@ahsonkhan
Member Author

ahsonkhan commented Nov 13, 2018

  1. They allocate

It could be a struct-based enumerator that allows the consumer to foreach the individual spans without allocations.

Should the enumerator and the enumerable be combined? i.e. what's the reason to have two types?

The APIs, as suggested here, don't allocate and are essentially what was recommended. Maybe it was missed since it wasn't at the top. Will copy it over to the top post.

Making it into a single type:

public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator() { return this; }
    public bool MoveNext();
    public ReadOnlySpan<T> Current { get; }
}

@terrajobst
Member

terrajobst commented Nov 27, 2018

  • We don't think it should be constrained to just char, so leaving it as T is fine.
  • While we don't want to expose an overload that works on Span<T>, we should make sure we can
    • We should rename SpanSplitEnumerator to ReadOnlySpanSplitEnumerator to
  • The enumerator should probably live on the same type as the method, which would be MemoryExtensions
  • Using StringSplitOptions might feel odd, but (1) we've never extended it and (2) the operations are about how the split is performed, which applies to this API too
  • If we ever need to expose char[] (matches any) we'd add a new method called SplitAny so that we can also have the equivalent of string (matches the sequence)
  • We might want to provide a similar method for memories, but let's wait until that's required

@Gnbrkm41
Contributor

Are we going to have an overload that takes ReadOnlySpan as the delimiter as well?

@Gnbrkm41
Contributor

I tried implementing the APIs above, and additionally two more split methods, in my personal repository. The types and methods themselves can be found in ConsoleApp4 (lol) folder, and the tests can be found in SplitTests folder.

public static ReadOnlySpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T delimiter,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}
public static ReadOnlySpanSplitBySequenceEnumerator<T> Split<T>(this ReadOnlySpan<T> span,
    ReadOnlySpan<T> delimiter, StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}
public static ReadOnlySpanSplitByAnyEnumerator<T> SplitByAny<T>(this ReadOnlySpan<T> span,
    ReadOnlySpan<T> delimiters, StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

The first one replaces string.Split(char), the second one replaces string.Split(string), and the third replaces string.Split(char[]). The last two are not really included in the original discussion (and would take a bit of time to go through approval process even if it does), but I personally think it is worth including to have the complementary alternatives as well.

It's perhaps not great code, but I'd love to open a PR for this issue (with the code above). Although the last two methods are not approved, at least I can open one for the regular T version.

@grant-d
Contributor

grant-d commented Apr 16, 2019

A slightly different take on this. Considering that Range is now available, and the non-zero cost of Slice on potentially-many results, what if the enumerable returned Ranges instead of sliced RoS. In other words, a list of delineations. The consumer could then decide when/if to Slice(range)

foreach (var range in span.Split(','))
{
    var x = span.Slice(range);
}

Perhaps that's too abstract for a public api.
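
To make the Range-based idea concrete, here is a rough, self-contained sketch of an enumerator that yields Range values and leaves slicing to the consumer. RangeSplitEnumerator, SplitRanges, and Demo are hypothetical names chosen for this example (SplitRanges avoids clashing with any built-in Split extension); this is not how any BCL type is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: yields Range values instead of sliced spans.
public ref struct RangeSplitEnumerator<T> where T : IEquatable<T>
{
    private readonly ReadOnlySpan<T> _source;
    private readonly T _separator;
    private int _nextStart; // index where the next segment begins; -1 once finished

    public Range Current { get; private set; }

    public RangeSplitEnumerator(ReadOnlySpan<T> source, T separator)
    {
        _source = source;
        _separator = separator;
        _nextStart = 0;
        Current = default;
    }

    // Returning itself lets a single type satisfy the foreach pattern.
    public RangeSplitEnumerator<T> GetEnumerator() => this;

    public bool MoveNext()
    {
        if (_nextStart < 0)
        {
            return false;
        }

        int offset = _source.Slice(_nextStart).IndexOf(_separator);
        if (offset < 0)
        {
            // Final segment runs to the end of the source.
            Current = new Range(_nextStart, _source.Length);
            _nextStart = -1;
        }
        else
        {
            Current = new Range(_nextStart, _nextStart + offset);
            _nextStart += offset + 1;
        }
        return true;
    }
}

public static class RangeSplitExtensions
{
    public static RangeSplitEnumerator<T> SplitRanges<T>(this ReadOnlySpan<T> source, T separator)
        where T : IEquatable<T> => new RangeSplitEnumerator<T>(source, separator);

    // Slices only when the consumer decides to; strings are for inspection only.
    public static List<string> Demo(string input)
    {
        ReadOnlySpan<char> span = input.AsSpan();
        var parts = new List<string>();
        foreach (Range range in span.SplitRanges(','))
        {
            parts.Add(span[range].ToString());
        }
        return parts;
    }
}
```

Because Current is a plain Range, the same ranges could be applied to a Span<T>, a Memory<T>, or the original string, which is one argument for this shape.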

@bbartels
Contributor

bbartels commented Apr 17, 2019

Did some experimenting with this as well.
Not sure if it's an optimal solution though.
https://github.com/bbartels/coreclr/blob/master/src/System.Private.CoreLib/shared/System/MemoryExtensions.Split.cs

@stephentoub
Member

@jeffhandley, this has been languishing and I'd like for us to ship a solution in .NET 9. Can you help?

@GrabYourPitchforks
Member

Adding some comments here that I mentioned to folks offline.

If we want to use the same enumerator type as a return value from all the different overloads, then we're essentially inventing a polymorphic system. (The data storage and lookup algorithm will depend on what overload is called, and we're abstracting all of those away behind a single projection.) Polymorphism in .NET is typically implemented using a base class and derived types, but since people in this thread have said that non-allocation is vital, we can't use the typical .NET mechanisms here where state is captured by derived types which contain the concrete implementations.

So there are a few options:

  • List all possible implementations' backing fields, and promote each of them to a top-level backing field of the sole Enumerator<T> object, and have the MoveNext() method switch on what algorithm is in use. This potentially bloats the size of the struct itself, and I don't know how the JIT will optimize it.
  • Use a clever union-style backing field for implementation-specific details, and use a delegate or fnptr or something to jump to the appropriate implementation. This keeps the size of the struct small but adds a level of indirection, which could also impact the JIT.
  • Have different method overloads return different concrete types. For example, the method which splits on a ROS<char> needle would return a different type than the method which splits on a ROS<string> needle, etc. This could limit our ability to perform certain optimizations. For example, if the API is Split(this ROS<char> input, params ROS<char> needle), then mySpan.Split() [which is intended to split on whitespace] would necessarily return the same implementation as mySpan.Split('a', 'b', 'c'), and we might not be able to optimize it as fully as if we could return a whitespace-only specific enumerator.
  • Drop the non-allocating requirement and allow internal state capture.

There are pros and cons to all of these. My previous holdup was based on a concern that whatever enumerator shape we choose for v1 of this API will have consequences for overloads and capabilities we may wish to add in the future. That is, if we decide to ship a "minimal" v1 API and we ship whatever API shape happens to fall out of that, that may unintentionally bind our ability to improve this area in v2. So I want to be sure that when the v1 API is added, the choice of shape is deliberate and is done with an eye toward future proofing. As long as that is done, I'm happy. :)

@stephentoub
Member

stephentoub commented Jun 26, 2024

The most common use case I've seen here is to want to take an existing use of string.Split and be able to iterate through the results in a non-allocating manner. I think we should expose the APIs to do that in .NET 9, following the same structure as the Split span-based APIs we added in .NET 8, plus handling generics (in particular in support of ReadOnlySpan<byte> for UTF8 text). We don't need the StringSplitOptions as those are really only important when it impacts allocation, which there isn't here: the consumer can easily skip empty entries and trim as desired. And the count option is generally a small value, such that the existing Split methods introduced in .NET 8 are better options. At which point the generics and chars would look the same, so we can just have generic overloads (and optimize chars as an implementation detail):

public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, T separator);
public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> separator);
public static SpanSplitEnumerator<T> SplitAny<T>(this ReadOnlySpan<T> source, params ReadOnlySpan<T> separators);

public ref struct SpanSplitEnumerator<T>
{
    SpanSplitEnumerator<T> GetEnumerator();
    bool MoveNext();
    Range Current { get; }
}

To fully optimize that to the max we would likely need several different return types, and I don't believe that's warranted. The size of the returned enumerator isn't particularly important, as it's generally going to be at most one copy and it won't be put on the heap because it's a ref struct. The relevant overheads then will likely be for extra branches on each MoveNext to determine which path to take, and those should be few. Given all the work associated with split, I'm not particularly concerned about those costs. So I think we should keep this API simple and just have the single return type for all the overloads.

We've been punting on this for years. We should just do it now.

(Note the one use case the above generic overloads lack is support for splitting chars based on multiple strings... if we want to support that, we could also have a non-generic public static SpanSplitEnumerator<char> SplitAny(this ReadOnlySpan<char> source, params ReadOnlySpan<string> separators), and if/when we add that we could choose to give it a different return type or just augment the existing one... that doesn't need to be decided until/if it's actually added. I've left it out because I don't see it used commonly.)
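
To illustrate the point about StringSplitOptions, here is a small sketch of how a consumer might trim segments and skip empty ones on top of a Range-returning Split. It assumes .NET 9 or later, where the generic Split overload proposed above is available; the SplitOptionsEmulation class and its method name are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SplitOptionsEmulation
{
    // Assumes .NET 9+, where MemoryExtensions.Split<T>(ReadOnlySpan<T>, T) yields Range values.
    public static List<string> SplitTrimmedNonEmpty(string input, char separator)
    {
        ReadOnlySpan<char> source = input.AsSpan();
        var result = new List<string>();
        foreach (Range range in source.Split(separator))
        {
            // Rough equivalent of StringSplitOptions.TrimEntries.
            ReadOnlySpan<char> segment = source[range].Trim();

            // Rough equivalent of StringSplitOptions.RemoveEmptyEntries.
            if (segment.IsEmpty)
            {
                continue;
            }

            // Strings are created here only to make the result easy to inspect.
            result.Add(segment.ToString());
        }
        return result;
    }
}
```

Trimming and skipping happen on spans, so neither step allocates; only the final ToString calls in this demo do.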

@stephentoub stephentoub self-assigned this Jun 26, 2024
@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Jun 26, 2024
@bbartels
Contributor

bbartels commented Jun 26, 2024

Any chance I could pick this up for implementation once the API is reviewed? I have a lot of free time to crack this out pretty quickly, especially given I'd just base it on what was already (temporarily) merged. Spent a lot of time on this over the years :')

@stephentoub
Member

Any chance I could pick this up for implementation once the API is reviewed? I have a lot of free time to crack this out pretty quickly, especially given I'd just base it on what was already (temporarily) merged. Spent a lot of time on this over the years :')

Yes, thanks :)

@colejohnson66

Is there a reason this isn’t called “EnumerateSplits”? Having “Split” return an enumerator, while it matches what many do with the result, doesn’t match the existing API of returning a complete result.

@stephentoub
Member

stephentoub commented Jul 1, 2024

I'd be fine calling it EnumerateSplits, matching what we exposed on Regex. But I don't have a good answer for what the corresponding SplitAny would be called.

@iSazonov
Contributor

iSazonov commented Jul 2, 2024

EnumerateAllSplits, EnumerateAnySplits... Although it looks too long.

@wstaelens

EnumerateSplitResult(s) ?

@bartonjs
Member

bartonjs commented Jul 2, 2024

Video

  • We decided Split is better than EnumerateSplits
  • We're leaving SplitAny(ROS-char, ROS-string) out for now, and only adding the generic ones.
  • We added a SearchValues overload to SplitAny, though it might or might not get implemented this release
  • These may need an IEquatable constraint... if they do, add it.

public static class MemoryExtensions
{
    public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, T separator);
    public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> source, ReadOnlySpan<T> separator);
    public static SpanSplitEnumerator<T> SplitAny<T>(this ReadOnlySpan<T> source, params ReadOnlySpan<T> separators);
    public static SpanSplitEnumerator<T> SplitAny<T>(this ReadOnlySpan<T> source, SearchValues<T> separators);
    
    public ref struct SpanSplitEnumerator<T>
    {
        public SpanSplitEnumerator<T> GetEnumerator();
        public bool MoveNext();
        public Range Current { get; }
    }
}
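
A short usage sketch of the approved shape, assuming .NET 9 or later where these overloads are available. It exercises the generic Split on UTF-8 bytes and SplitAny on chars; the ApprovedShapeDemo class and its method names are hypothetical, and Utf8Parser is used only to give the byte example something to do.

```csharp
using System;
using System.Buffers.Text;

public static class ApprovedShapeDemo
{
    // Sums comma-separated integers directly from UTF-8 bytes, without decoding to string.
    public static int SumUtf8Fields(ReadOnlySpan<byte> utf8Line)
    {
        int sum = 0;
        foreach (Range range in utf8Line.Split((byte)','))
        {
            // Fields that fail to parse are ignored in this sketch.
            if (Utf8Parser.TryParse(utf8Line[range], out int value, out _))
            {
                sum += value;
            }
        }
        return sum;
    }

    // Counts tokens separated by either ',' or ';'.
    public static int CountTokens(string text)
    {
        ReadOnlySpan<char> source = text.AsSpan();
        ReadOnlySpan<char> separators = ",;";
        int count = 0;
        foreach (Range range in source.SplitAny(separators))
        {
            count++;
        }
        return count;
    }
}
```

Both loops use the same enumerator type with Range as Current, which is the single-return-type design argued for earlier in the thread.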

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 2, 2024
@bbartels
Contributor

bbartels commented Jul 2, 2024

I'll get on it 🙂

@hez2010
Contributor

hez2010 commented Jul 9, 2024

Should SpanSplitEnumerator<T> implement IEnumerator<T>?

I would like to use it like

T[] Collect<T, TEnum>(TEnum e) where TEnum : IEnumerator<T>
{
    var list = new List<T>();
    while (e.MoveNext()) list.Add(e.Current);
    return list.ToArray();
}

@colejohnson66

colejohnson66 commented Jul 9, 2024

It can't because it's a ref struct, and ref structs being able to implement interfaces was removed from C# 13.

@hez2010
Contributor

hez2010 commented Jul 10, 2024

It can't because it's a ref struct, and ref structs being able to implement interfaces was removed from C# 13.

It's not being removed. It will be shipped as a preview feature instead, and several APIs in BCL have already adopted this feature.

@stephentoub
Member

It will be shipped as a preview feature instead, and several APIs in BCL have already adopted this feature.

We have not implemented interfaces on any public ref structs, and we won't on non-experimental public ref structs as long as the language feature is in preview. We have no way to mark just the interface inheritance as experimental / preview, which means if the language feature were to change or disappear, we could be left with non-preview surface area we're unable to maintain.

@clipperhouse

+1 on this overall proposal, I’m a fan of making splits into enumerators, and (hopefully) allocation-free.

I’ve implemented similar here: https://github.com/clipperhouse/uax29.net

@clipperhouse

Having watched the design review above, on Current being a Range: another possibility is to hang a Ranges property off the SpanSplitEnumerator, as I’ve done here. That way, you can preserve the API of string.Split, where the enumerator returns the splits, but give consumers the ranges to use where that’s advantageous.

Though I am not sure that satisfies the requirements mentioned in the discussion. IIUC, the goal was to a) allow Memory inputs to avoid allocation and b) perhaps it’s confusing if the input is Memory but the output is ReadOnlySpan or c) avoid multiplying the overloads.

(I appreciate that you likely feel the API has been sufficiently debated and thanks for indulging.)

@stephentoub
Member

Fixed by #104534

@github-actions github-actions bot locked and limited conversation to collaborators Aug 18, 2024