Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Infix Concatenation Operator for IEnumerable<T> #7884

Closed
dubrowgn opened this issue Jan 11, 2016 · 22 comments
Closed

Proposal: Infix Concatenation Operator for IEnumerable<T> #7884

dubrowgn opened this issue Jan 11, 2016 · 22 comments

Comments

@dubrowgn
Copy link

Basically, this would be a '+' infix operator for IEnumerable that works the same as it does for strings:

// given:
IEnumerable<T> seqA;
IEnumerable<T> seqB;
T itemA;
T itemB;

// with two IEnumerables
seqA + seqB; // => seqA.Concat(seqB); (eg "as" + "df")
null + seqB; // => seqB; (eg null + "df")
seqA + null; // => seqA; (eg "as" + null)

// with singular items
seqA + itemA; // append itemA to seqA as a new IEnumerable<T> (eg "asd" + 'f')
itemB + seqB; // prepend itemB to seqB as a new IEnumerable<T> (eg 'a' + "sdf")

// a more complicated example from production code
// given:
IEnumerable<Filter> filters;
IEnumerable<Sort> sorts;
IEnumerable<Select> selects;

IEnumerable<int> allPropertyIds =
    filters.OfType<PropertyFilter>().Select(f => f.PropertyId) +
    sorts.OfType<PropertySort>().Select(s => s.PropertyId) +
    selects.OfType<PropertySelect>().Select(s => s.PropertyId);

Somewhat related: #3673, as this could be implemented as a set of extension methods if #3673 was implemented.

@MgSam
Copy link

MgSam commented Jan 11, 2016

Being that Concat is O(n^2), I don't think the language should make it easier for people to use it.

@dubrowgn
Copy link
Author

Definitely good to know. Although, nobody in their right mind would ever guess Concat was O(n^2). This seems like a bug to me, that should obviously be addressed.

Edit: I'm definitely not seeing O(n^2) for Concat on my machine (see results). Maybe this was addressed already?

@orthoxerox
Copy link
Contributor

@dubrowgn the N is the number of IEnumerables you're concatting, not the number of items in each.

@dubrowgn
Copy link
Author

@orthoxerox Ah my mistake. It still seems to me this should be addressed though.

@dsaf
Copy link

dsaf commented Jan 11, 2016

I wanted to request something similar the other day, but decided that it will become achievable through other more general means #6136, #5165 that are already coming. Wouldn't extension operators work? Then it becomes a standard library support matter.

@paulomorgado
Copy link

I'm not sure I understand the point/justification on that blog post.

Take this modified example:

var stopWatch = new Stopwatch();
for (int length = 0; length <= 10000; length += 1000)
{
    var list = Enumerable.Range(1, 1);
    IEnumerable<int> ones = list;
    for (int i = 0; i < length; ++i)
        ones = ones.Concat(list);

    stopWatch.Reset();
    stopWatch.Start();

    var count = 0;
    foreach (var item in ones) count++;

    stopWatch.Stop();
    Console.WriteLine($"Length: {length} Count: {count} Time: {stopWatch.ElapsedMilliseconds}");
}

and this:

var stopWatch = new Stopwatch();
for (int length = 0; length <= 10000; length += 1000)
{
    var list = Enumerable.Range(1, 10);
    IEnumerable<int> ones = list;
    for (int i = 0; i < length; ++i)
        ones = ones.Concat(list);

    stopWatch.Reset();
    stopWatch.Start();

    var count = 0;
    foreach (var item in ones) count++;

    stopWatch.Stop();
    Console.WriteLine($"Length: {length} Count: {count} Time: {stopWatch.ElapsedMilliseconds}");
}

The number of elements has weight on the execution time. It's not just the number o enumerables. The number of yields is directly related to the number of elements.

@svick
Copy link
Contributor

svick commented Jan 12, 2016

@paulomorgado I'm not sure what is it that you don't understand. Of course increasing the number of elements increases execution time, but it increases linearly, as you would expect, and it can't be improved. But when increasing the number of enumerables, the time increases quadratically. That's not what most people would except, and it can be improved.

In other words, the total time it takes to enumerate the result of concating n sequences, each having m elements with the current implementation of Concat is O(m · n2).

@alrz
Copy link
Member

alrz commented Jan 12, 2016

For immutable lists this would make sense but not just any IEnumerable.

@orthoxerox
Copy link
Contributor

@dubrowgn #15 is the original issue covering the problem of naive recursive yielding. As far as I understand, it's fixable, the C# team knows how to fix it, but it's a big and complex rewrite. Generators are already doing some complex transformations of your apparently trivial code, so it's the question of time and effort vs the number of improved codebases.

In this specific case, you can write your own Concat that will return not just an IEnumerable, but an INestedEnumerable, and will flatten your linked list of IEnumerables into an array if at least one of the parameters is an INestedEnumerable. Then your execution time will become linear again.

@MgSam
Copy link

MgSam commented Jan 12, 2016

@orthoxerox I don't believe it's correct to say the team knows how to fix it. There are a lot of tradeoffs. See this discussion from CodePlex.

@dubrowgn
Copy link
Author

@dsaf #6136 would be awesome, and solve a bunch of other annoyances as well. I think that would probably be the way to do this.

@alrz This is really just sugar on top of Concat, so it effectively already exists. Why is Concat in the codebase at all if it didn't make sense for IEnumerable?

@orthoxerox Good to know the issue got some attention already. I had a similar thought about using linked lists instead of trees, since IEnumerable only ever traverses in one direction. I'm failing to come up with instances where a tree could solve a problem a linked list couldn't, but I'm sure there probably are cases. Regarding INestedEnumerable, is there a reason the library version of Concat can't be changed to use INestedEnumerable today?

@orthoxerox
Copy link
Contributor

@dubrowgn Well, INestedEnumerable would only work for linked lists of enumerables, not trees or other structures, plus you waste a lot of time and space building a list of them to traverse. If you have millions of enumerables you're better off writing your own custom visitor instead than using any generated iterators.

@MgSam
Copy link

MgSam commented Jan 12, 2016

Why is Concat in the codebase at all if it didn't make sense for IEnumerable?

@dubrowgn It makes sense if used appropriately (eg, not in loops). Making it into an operator gives an implicit signal to the developer "hey, this is the right way to do this!" after which the developer often assumes that since its the "right" way, they don't have to worry about performance implications. If you're using it so often that making it an operator would really be helpful to you, perhaps you should reconsider if you're really using it appropriately in the first place.

In this vein, some people feel having string overload the + operator was itself a mistake for this reason (though personally I feel the benefits of string + outweigh the costs).

@dubrowgn
Copy link
Author

@MgSam I definitely understand the concern, and currently that makes sense. However, adding the + operator for all IEnumerable types makes C# more consistent. Why should string be special? After all, string is just an IEnumerable<char> with some fancy optimizations.

So, a couple of thoughts. First, consistency matters. If a sequence of chars gets an operator, a sequence of anything else should be able to use the same operator with similar assumptions. Right now strings get special treatment, and string.Concat has a completely different time complexity than IEnumerable<T>.Concat. Second, if we can do Concat faster for chars, we should be able to do it just as fast for other types too. Third, it seems a bit backwards to say "don't use this feature" because it is slow. String operations are heavily optimized because people find them useful, not the other way around.

@MgSam
Copy link

MgSam commented Jan 12, 2016

String operations are heavily optimized because people find them useful, not the other way around.

In fact, quite the opposite. The only optimizations the compiler does are to join compile-time constant strings. Concatenation of lots of strings using + is extremely wasteful of memory.

From a performance perspective, + for strings was a mistake. It is extremely common for beginners to abuse the operator and use it in a loop. C# is supposed to be a "pit of success" language; i.e, where doing the naive thing is also the "right" thing.

I personally feel string + was worth the cost of mistaken usage because its extremely common for UI or logging to want to join a couple strings together, and forcing the user to make a nest of method calls to do this would be extremely annoying.

IEnumerable does not have a similar position. The typical application should rarely, if ever, be doing IEnumerable.Concat.

@dubrowgn
Copy link
Author

String operations are heavily optimized because people find them useful, not the other way around.

I was talking about string operations in general, in code.

Concatenation of strings using + is extremely wasteful of memory.

It's only wasteful if you concatenate more than once in succession. Plus, most of string's performance issues come from being an immutable, array-backed datatype. I don't think anyone wants IEnumerable<T>.Concat to flatten the results into an Array.

IEnumerable does not have a similar position. The typical application should rarely, if ever, be doing IEnumerable.Concat.

We do it all the time, just not with IEnumerable directly. List.Add, List.AddRange, Queue.Enqueue, Stack.Push are all examples of exactly the same operation, just with a specific backing datatype.

@MgSam
Copy link

MgSam commented Jan 12, 2016

We do it all the time, just not with IEnumerable directly. List.Add, List.AddRange, Queue.Enqueue, Stack.Push are all examples of exactly the same operation, just with a specific backing datatype.

These are all different operations, each with different effects and performance characteristics. The similarity ends at the conceptual level of "putting something new in a datastructure". Enumerable.Concat is similarly very different from all of these.

You inadvertently make another point at why overloading + is a usually bad idea- it makes very different operations all appear to be the same thing. It not only leads to more mistakes by the developer but it tricks later readers of the code as well.

@dubrowgn
Copy link
Author

Of course they have different performance characteristics. The point is, we add items to collections, and collections to other collections. Saying IEnumerable<T>.Concat should rarely or never happen is like saying all of those operations should likewise rarely or never happen, but they happen quite often.

@MgSam
Copy link

MgSam commented Jan 12, 2016

The existence of those methods is why Concat should rarely or never be used. If you need to be adding and removing things, you should be using a "real" data structure (BCL or custom type), not an IEnumerable. The reason being that "real" data structures can and usually are specifically optimized for adding a lot of things to them at once. Certainly more-performant than IEnumerable.Concat.

Notice that the BCL lacks IEnumerable.Append or IEnumerable.Prepend even though both methods are trivial to implement yourself. I'd guess that the poor performance characteristics of using such methods is the reason they don't exist in the BCL.

@dubrowgn
Copy link
Author

You have to store all of the results at once to use existing data structures. The whole point of enumerating items one at a time, is that you don't have to have them all in memory at once.

@weltkante
Copy link
Contributor

Just for reference I'll link the Microsoft research paper about a non-quadratic implementation of yield foreach. This is also covered in issue #15

@gafter
Copy link
Member

gafter commented Mar 20, 2017

We are now taking language feature discussion on https://github.com/dotnet/csharplang for C# specific issues, https://github.com/dotnet/vblang for VB-specific features, and https://github.com/dotnet/csharplang for features that affect both languages.

@gafter gafter closed this as completed Mar 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants