New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Collection expressions: inline collections in spreads and foreach #7864

Open

cston wants to merge 17 commits into dotnet:main from cston:spread-conditional

Member

cston commented Jan 23, 2024 •

edited

Loading

Support collection expressions inline in expression contexts where the collection type is unspecified.

In spreads to add elements conditionally to the containing collection:

int[] items = [x, y, .. b ? [z] : []];

In foreach:

foreach (bool b in [false, true]) { }


          Collection expressions: spread conditional expression

5afdd36

cston force-pushed the spread-conditional branch from 91db6cf to 5afdd36 Compare

January 23, 2024 07:51

cston added 2 commits

January 23, 2024 08:59


          Refactor rules

6b7a720


          Renames

81e151a

jnm2 reviewed

View reviewed changes

proposals/csharp-12.0/collection-expressions.md Outdated

+              An expression `E` is *spreadable* as values of type `Tₑ` if one of the following holds:
+              * `E` is a *collection expression* and for each element `Eᵢ` in the collection expression there is an implicit conversion to `Tₑ`.
+              * `E` is a [*conditional expression*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/expressions.md#1115-conditional-operator) `b ? x : y`, and `x` and `y` are *spreadable* as values of type `Tₑ`.

Contributor

jnm2 Jan 23, 2024

Only curious: Is there anywhere in the spec that states this in the opposite form, for purposes like target-typed new inside a conditional expression looking outward? Could both directions be unified somehow in some definition of type inference for any language constructs that support "seeing through" them?

jnm2 reviewed

View reviewed changes

proposals/csharp-12.0/collection-expressions.md Outdated

               * For each element in order:
                 * If the element is an *expression element*, the initialization instance *indexer* is invoked to add the evaluated expression at the current index.
                 * If the element is a *spread element* then one of the following is used:
+                  * If the spread element expression is a *collection expression*, then the collection expression elements are evaluated in order as if the elements were in the containing collection expression.
+                  * If the spread element expression is a *conditional expression*, then the condition is evaluated and if `true` the second-, otherwise the third-operand is evaluated as if the operand was an element in the containing collection expression.

Contributor

jnm2 Jan 23, 2024

What if we did not specify treating conditional expressions differently during construction than other expressions, and leave compiler optimizations discretionary?

jnm2 approved these changes

View reviewed changes

cston added 5 commits

January 24, 2024 20:06


          Move proposal to separate doc

bf15073


          Add foreach

55a9f3f


          Misc.

5ef984e


          Misc.

f317fb3


          Add compile time *collection_type*

adca12d

Member

CyrusNajmabadi commented Feb 3, 2024

I like the col<T> concept. We might want to call out an 'anonymous collection type'. Both to draw a correspondence to the existing anonymous -type feature, and to make it clear why it would not be possible to reference in source or metadata.


          Add natural type alternative

8a9d513

cston changed the title ~~Collection expressions: spread conditional expression~~ Collection expressions: inline collections in spreads and foreach

cston mentioned this pull request

Collection expressions: implicit collections in spreads and foreach #7534

Closed

cston added 4 commits

February 3, 2024 20:15


          Merge remote-tracking branch 'upstream/main' into spread-conditional

7df30e1


          Misc

fc81418


          More detail

17a690f


          Update table

681611d

cston marked this pull request as ready for review

February 4, 2024 18:06

cston requested a review from a team as a code owner

February 4, 2024 18:06

cston requested a review from CyrusNajmabadi

February 4, 2024 18:06


          Update table

fafd194

CyrusNajmabadi reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated


		### Conversions

		To support conversions for inline collection expressions, an abstract collection_type is introduced.

Member

CyrusNajmabadi Feb 4, 2024

abstract is a bit ambiguous. I would say anonymous.

Member Author

cston Mar 5, 2024

Replaced "abstract" with "compile time".

CyrusNajmabadi reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated


		The [iteration type](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of a collection_type `col<E>` is `E`.

		An implicit collection_type conversion exists from a type with an [iteration type](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the collection_type `col<Tₑ>`.

Member

CyrusNajmabadi Feb 4, 2024

Suggested change

      
            An implicit *collection_type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.
          
            An implicit *collection type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.

Member

CyrusNajmabadi Feb 4, 2024

do you want to refer to iteration type? or should we say element type? (just thinking upon the last few ldm talks).

CyrusNajmabadi reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated

+              >
+              > The implicit *collection expression conversion* exists if the type has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` where for each *element* `Eᵢ` in the collection expression:
+              > * If `Eᵢ` is an *expression element*, there is an implicit conversion from `Eᵢ` to `Tₑ`.
+              > * If `Eᵢ` is a *spread element* `..Sᵢ`, there is an implicit conversion **from `Sᵢ` to the *collection_type* `col<Tₑ>`**.

Member

CyrusNajmabadi Feb 4, 2024

nice!

CyrusNajmabadi reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated


		The elements of a nested collection expression within a spread can be added to the containing collection instance directly, without instantiating an intermediate collection.

		The elements of the nested collection expression are evaluated in order within the nested collection. There is no implied evaluation order between the elements of the nested collection and other elements within the containing collection expression.

Member

CyrusNajmabadi Feb 4, 2024

this seems wrong to me. there should def be an implied evaluation order. if i have

[a, b, .. c ? [d] : [e], f, g]

i would expect that d and e would be evaluated before f and g.

Member Author

cston Mar 5, 2024

Changed to:

The elements of a collection expression within a spread are evaluated in order as if the spread elements were declared in the containing collection expression directly.

CyrusNajmabadi reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated

+              * If `Eᵢ` is an *expression element*, the type of `Eᵢ`.
+              * If `Eᵢ` is a *spread element* `..Sᵢ`, the [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of `Sᵢ`.
+              Since collection expressions do not have a *type* or an *iteration type*, the *best common type* does not consider elements from nested collection expressions.

Member

CyrusNajmabadi Feb 4, 2024

pity. mght be something to reconsider later. for example:

[a, b, .. c ? [d] : [e] f, g]. I could imaginet he type is the bct of a, c, d, e, f, g

CyrusNajmabadi reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated

+              ```csharp
+              byte[] x = [1, .. b ? [2] : []];       // error: cannot convert int to byte
+              int[]  y = [1, .. b ? [default] : []]; // error: no type for [default]
+              int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>

Member

CyrusNajmabadi Feb 4, 2024

Suggested change

      
            int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
          
            int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
          
            Predicate<int>[] predicates = [a, .. b ? [i => i.ToString()] : []]; // No type for lambda.
          
            Predicate<int>[] predicates = [a, .. b ? [(int i) => i.ToString()] : []]; // Func<int, string> not compatible with Predicate<int>

CyrusNajmabadi reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated

+              Natural type would also allow a subset of the scenarios that are supported above. But since the above proposal relies on target typing and natural type relies on *best common type* of the elements within the collection expression, there will be some limitations if we don't also support target typing with nested collection expressions.
+              ```csharp
+              byte[] x = [1, .. b ? [2] : []];       // error: cannot convert int to byte
+              int[]  y = [1, .. b ? [default] : []]; // error: no type for [default]

Member

CyrusNajmabadi Feb 4, 2024

Suggested change

      
            int[]  y = [1, .. b ? [default] : []]; // error: no type for [default]
          
            int[]  y = [1, .. b ? [default] : []]; // error: no type for [default]
          
            string?[]  y = [1, .. b ? [null] : []]; // error: no type for [null]

CyrusNajmabadi mentioned this pull request

[Proposal]: Collection Expressions Next (C#13 and beyond) #7913

Closed

4 tasks

KevinCathcart reviewed

View reviewed changes

proposals/collection-expressions-inline.md Outdated


		For a collection expression used as the collection in a `foreach` statement, the compiler may use any conforming representation for the collection instance, including eliding the collection.

		The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler may evaluate subsequent elements before executing the loop body for preceding elements.

KevinCathcart Feb 23, 2024 •

edited

Loading

The compiler may evaluate subsequent elements before executing the loop body for preceding elements.

Is that really something that should be left unspecified? The compiler cannot easilly change it later, as that would be an observable breaking change, and one that feels pretty likely to break real programs, whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()]), especially if someFunctionWithSideEffects might mutate myFoo.

Edit: Actually an even better example of a case where users would absolutely notice a change if one is made later is if some element throws an exception when evaluated. That is a case where it could make a really big difference to users as to whether the exception occurs before any loop iterations, or not.

I'd honestly expect all the elements to be evaluated first before any loop executions, simply because that it what would happen if the expression were converted to literally any runtime type.

The only sensible implementation strategies I can even think of that would allow for delayed evaluation would be unrolling the loop body (even if an unusual unrolling like: loop and process these initial literal elements, then loop and process these elements from a spread), or the creation of a generator that yields the elements one by one, and then iterating over that generator's output.

Member

CyrusNajmabadi Feb 23, 2024

Is that really something that should be left unspecified?

Yes. It allows the compiler great leeway in the impl strategy for converting the expressions to a collection.

The compiler cannot easilly change it later, as that would be an observable breaking change

It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.

and one that feels pretty likely to break real programs,

I would definitely hope not.

whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()]), especially if someFunctionWithSideEffects might mutate myFoo.

NOte: it's fine for someFunctionWithSideEffects to have side effects. Those will happen after myFoo is evaluated, and before any other elements are evaluated. What is unspecified is what it means is if the iteration of that evaluated value itself mutates. That would be a pattern we consider too 'out there', and we're willing to say that if your iterators do this, then collection expressions are not a good fit for you. We want to optimzie for the 99.99% case, where iteration order will not have an impact.

Note: to reiterate. Evaluation-order of the elements is still guaranteed to be left-to-right, just as it always has been in C#. So side effects from prior element evaluations will be seen by later element evaluations.

I'd honestly expect all the elements to be evaluated first before any loop executions

That would require stack space for all their values. That would def be undesirable in many cases. Simply imagine a collection expr with 1000 elements. If we require all that evaluation up front, we need to store all those values somewhere (like teh stack) then copy them into the dest. However, if we give the compiler freedom, it can do something like use a single space on the stack, evaluate each element it it, add to the dest, and the move on to the next (reusing that single space).

KevinCathcart Feb 23, 2024 •

edited

Loading

and one that feels pretty likely to break real programs,
I would definitely hope not.

A better example is if evaluation of any element throws. I can predict with high certainty that users will notice and care about the difference between processing the loop body for the first handful elements and then throwing vs throwing before processing any elements. Imagine if say the loop body is writing to a file and the 15th element throws when evaluated. Pretty significant difference if the file ends up empty, vs if it has lines for the previously processed elements.

If initially implemented with one at a time evaluation, then users may come to depend on the file having the lines from all elements before the throwing was was evaluated. If initially pre-evaluating all elements then users could easily become reliant on that behavior.

And files are hardly the only such scenarios. This could also apply to GUI scenarios like adding elements to listbox, or whatever.

Obviously users understand they can get partial execution of the loop if the body throws. And users quickly learn about lazy Enumerables, like those created by generators and or LINQ, where they can get partial execution from the enumerator throwing. But users are used to any type of foreach-able construct either being eagerly evaluated or lazily evaluated.

But here you are trying to reserve the right for the compiler to sometimes treat the collection expression as eagerly evaluated (like arrays or spans or most collection classes), and other times to be lazily evaluated like a generator or LINQ Enumerables, and possibly even mix the two within one expression in undefined ways that may change in the future. Which is very much not something users are used to, and something which will be hard to explain to users.

It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.

Average users don't read the compiler specification closely enough to have any idea what is marked as unspecified or not. They see the new feature announcement which gives the basic overview (which probably won't have a giant warning about eagerness being unspecified) or they discover the feature in some codebase. Either way they play around with it a bit, and check (experimentally) to see the current behavior for things they might care about (like eager or lazy evaluation) and then assume it won't change in the future.

Member

CyrusNajmabadi Feb 23, 2024

A better example is if evaluation of any element throws

As mentioned previously, evaluation order of elements is strictly defined.

Member

CyrusNajmabadi Feb 23, 2024

and something which will be hard to explain to users

For the tiny set of people that run into an issue here, the situation can be explained.

and then assume it won't change in the future.

Then that is the consequence of behaving that way. We have made a decision balancing the needs of the ecosystem as a whole. If developers run into an issue because of that, we are ok with it on balance.

Member

CyrusNajmabadi Feb 24, 2024

This could also apply to GUI scenarios like adding elements to listbox, or whatever.

It shouldn't. You can only get the final collection if no throwing occurred during the creation of the collectino. It is all assigned to a temp. So you either get it all, or you get nothing (since the actual code that cn reference that newly created collection doesn't run due to the exception).

Member

CyrusNajmabadi Feb 24, 2024

which probably won't have a giant warning about eagerness being unspecified

For nearly all users, there's no need to specify. Nearly all users with nearly all cases will simply get collections that work well including wrt to the efficiency of producing the result. Users that absolutely depend not only on element evaluation order, but enumeration order of those elements are going to be extremely slim. To the point that we literally haven't heard of a single user being affected by this in the entire beta and release period since release.

This indicates to me that this is not the sort of footgun being warned about. But is rather just a small part of the design that most people won't ever need to know about or be affected by. This is in line with many other features we have where there's some niche case that might result in slightly unexpected or undefined behavior if the user is doing something very far out of the norm. As such, designing and documenting it in the same vein seems totally ok :)

KevinCathcart Feb 24, 2024

@CyrusNajmabadi I really don't think we are talking about the same thing, and that may explain some confusion.

As best I can tell, you seem to be talking about part of the already shipped collection expression feature for building collections where iteration order of spreads when building a collection may vary relative to the evaluation of elements. If that is what you are talking about then yes, I fully agree that it is really unlikely that changes there break people in practice.

But what I am talking about here is part of proposed specification for foreach loops over collection expressions. What I am looking at seems to say executions of the body of the foreach loop can interleave in unspecified ways with evaluating the elements of collection expression passed to the foreach loop:

The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler may evaluate subsequent elements before executing the loop body for preceding elements.

Which says that foreach (int i in [1, myfunc()]) { foo(i); } might do:

evaluate 1
run foo(1)
evaluate myfunc() (which happens to throw)

or it might do:

evaluate 1.
evaluate myfunc() (which happens to throw).
Would now try to run loop body for each, which won't happen because of the throw.

And furthermore which one it does might change in the future. That is what I have concerns about.

Member

CyrusNajmabadi Feb 24, 2024

I understand your concerns. We'll def consider them during the ldm discussions on this topic.

KevinCathcart commented Feb 24, 2024

If I understand correctly, this specification would not enable users to do:

foreach(int digit in isEven ? [0,2,4,6,8] : [1,3,5,7,9] )
{...}

That is because the proposed changes for foreach are defined in terms of the collection directly being a collection expression. So instead users would need to wrap that with another collection expression and a spread: foreach(int digit in [..isEven ? [0,2,4,6,8] : [1,3,5,7,9]]). (As an aside, if I am reading this right, that wrapping won't work with an implicitly typed iteration variable because it won't have any best common type, which feels unfortunate, so users would be forced to explicitly type the iteration variable for this scenario.)

Allowing foreach to support any col<Tₑ> when there is an explicitly typed iteration variable (of type Tₑ), seems like it might be possible. This could be done be by offering col<Tₑ> as a target type after exhausting the previously defined foreach binding approaches. Admittedly the lowering could get somewhat complicated. For example, the code could have a conditional operator with an IEnumerable<char> on one side, and a ReadOnlySpan<char> on the other, and as far as I can tell that conditional would happily target type to col<char>, but the lowerings for each side of that conditional are different. A combination lowering should be possible to implement though, as I show later.

Supporting similar for implicitly typed iteration variables, like in foreach(var digit in isEven ? [0,2,4,6,8] : [1,3,5,7,9] ) would seem harder to specify. Perhaps it can be done by specifying that if the type of the collection cannot be found without target typing, then recurse into the expression, collecting up a bunch of types/expressions) to use to calculate a best common type as follows:

for an expression with an iteration type, that iteration type gets contributed towards finding the best common type.
for a conditional expression recurse each side with this algorithm
for a collection expression each non-spread element gets contributed towards finding the best common type while the spreads would recurse.

I know that isn't correct specification language, but hopefully it gets the idea across.

Now it may be that further extending foreach beyond what this proposal has to support that scenario is too much work for too little gain, but thought it was worth considering.

Example of lowering a conditional `col` when sides do not share a type.

public void Z(bool b, IEnumerable<char> left, ReadOnlySpan<Char> right)
{
    foreach(var c in b ? left : right)
    {
        //body
    }
}

could be lowered as:

public void Z(bool b, IEnumerable<char> left, ReadOnlySpan<char> right)
{
    // Save off condition, as we only want to evaluate it once.
    bool _cond = b;
    // left side helper variable(s)
    IEnumerator<char>? _enumerator = default;
    // right side helper variable(s)
    ReadOnlySpan<char> _span = default;
    int _num = default;

    if(_cond) {
        // left setup;
        _enumerator = left.GetEnumerator();
    } else {
        //right setup;
         _span = right;
        _num=0;
    }

    try{
        while(_cond ? _enumerator!.MoveNext() : _num<_span.Length)
        {
            char c = _cond ? _enumerator!.Current : _span[_num];

            // Body goes here

            if(_cond) {
                //left side end of loop
            } else {
                // right side end of loop
                _num++;
            }
        }
    }
    finally
    {
        if (_cond)
        {
            // left side finally
            _enumerator?.Dispose();
        }
        else
        {
            //right side finally
        }
    }
}

I am obviously using the convention of leading underscore indicating helper variables not visible to user written code, rather than indicating fields or similar.

Member

CyrusNajmabadi commented Feb 24, 2024

If I understand correctly, this specification would not enable users to do:

The intent is for that to be possible. And our last design meeting on the topic indicated a means by which that would be possible.

Member

CyrusNajmabadi commented Feb 24, 2024

The general idea (similar to what you were saying) is that we will target to some collection_type<Te> (just for the purposes of binding all the elements. But we will also effectively say that that collection_type itself is irrelevant and that the compiler is free to substitute any type it wants there (or eschew using a type at all).

cston added 2 commits

March 3, 2024 16:25


          Updates

c242f78


          Merge remote-tracking branch 'upstream/main' into spread-conditional

053a2d7

hez2010 commented Mar 31, 2024 •

edited

Loading

I think some part of this proposal needs to be updated with #7994
which makes using Span<T>, ReadOnlySpan<T> as the underlying type a thing in async code as well.

KennethHoff commented Mar 31, 2024 •

edited

Loading

I realize this isn't really intended to add new syntax, but instead make existing syntax work inside collections (.. it's also a PR, and not an issue or even a discussion), but it would be cool to have a binary operator that conditionally does something and nothing otherwise, like the following:

List<int> ints = [1, 2, 3, a && ..[4], 10];

Which equates to

List<int> ints = [1, 2, 3, ..a ? [4] : [], 10)


          Merge remote-tracking branch 'upstream/main' into spread-conditional

527ed1a

cston mentioned this pull request

Collection expressions: inferred element type #8582

Open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet