-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collection expressions: inline collections in spreads and foreach #7864
base: main
Are you sure you want to change the base?
Conversation
91db6cf
to
5afdd36
Compare
|
||
An expression `E` is *spreadable* as values of type `Tₑ` if one of the following holds: | ||
* `E` is a *collection expression* and for each element `Eᵢ` in the collection expression there is an implicit conversion to `Tₑ`. | ||
* `E` is a [*conditional expression*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/expressions.md#1115-conditional-operator) `b ? x : y`, and `x` and `y` are *spreadable* as values of type `Tₑ`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only curious: Is there anywhere in the spec that states this in the opposite form, for purposes like target-typed new inside a conditional expression looking outward? Could both directions be unified somehow in some definition of type inference for any language constructs that support "seeing through" them?
@@ -250,6 +259,8 @@ If the target type is an *array*, a *span*, a type with a *[create method](#crea | |||
* For each element in order: | |||
* If the element is an *expression element*, the initialization instance *indexer* is invoked to add the evaluated expression at the current index. | |||
* If the element is a *spread element* then one of the following is used: | |||
* If the spread element expression is a *collection expression*, then the collection expression elements are evaluated in order as if the elements were in the containing collection expression. | |||
* If the spread element expression is a *conditional expression*, then the condition is evaluated and if `true` the second-, otherwise the third-operand is evaluated as if the operand was an element in the containing collection expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we did not specify treating conditional expressions differently during construction than other expressions, and leave compiler optimizations discretionary?
I like the |
|
||
### Conversions | ||
|
||
To support conversions for inline collection expressions, an abstract *collection_type* is introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abstract
is a bit ambiguous. I would say anonymous
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced "abstract" with "compile time".
|
||
The [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of a *collection_type* `col<E>` is `E`. | ||
|
||
An implicit *collection_type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An implicit *collection_type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`. | |
An implicit *collection type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to refer to iteration type? or should we say element type
? (just thinking upon the last few ldm talks).
> | ||
> The implicit *collection expression conversion* exists if the type has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` where for each *element* `Eᵢ` in the collection expression: | ||
> * If `Eᵢ` is an *expression element*, there is an implicit conversion from `Eᵢ` to `Tₑ`. | ||
> * If `Eᵢ` is a *spread element* `..Sᵢ`, there is an implicit conversion **from `Sᵢ` to the *collection_type* `col<Tₑ>`**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
|
||
The elements of a nested collection expression within a spread can be added to the containing collection instance directly, without instantiating an intermediate collection. | ||
|
||
The elements of the nested collection expression are evaluated in order within the nested collection. There is no implied evaluation order between the elements of the nested collection and other elements within the containing collection expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems wrong to me. there should def be an implied evaluation order. if i have
[a, b, .. c ? [d] : [e], f, g]
i would expect that d
and e
would be evaluated before f
and g
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to:
The elements of a collection expression within a spread are evaluated in order as if the spread elements were declared in the containing collection expression directly.
* If `Eᵢ` is an *expression element*, the type of `Eᵢ`. | ||
* If `Eᵢ` is a *spread element* `..Sᵢ`, the [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of `Sᵢ`. | ||
|
||
Since collection expressions do not have a *type* or an *iteration type*, the *best common type* does not consider elements from nested collection expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pity. mght be something to reconsider later. for example:
[a, b, .. c ? [d] : [e] f, g]
. I could imaginet he type is the bct of a, c, d, e, f, g
```csharp | ||
byte[] x = [1, .. b ? [2] : []]; // error: cannot convert int to byte | ||
int[] y = [1, .. b ? [default] : []]; // error: no type for [default] | ||
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null> | |
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null> | |
Predicate<int>[] predicates = [a, .. b ? [i => i.ToString()] : []]; // No type for lambda. | |
Predicate<int>[] predicates = [a, .. b ? [(int i) => i.ToString()] : []]; // Func<int, string> not compatible with Predicate<int> |
Natural type would also allow a subset of the scenarios that are supported above. But since the above proposal relies on target typing and natural type relies on *best common type* of the elements within the collection expression, there will be some limitations if we don't also support target typing with nested collection expressions. | ||
```csharp | ||
byte[] x = [1, .. b ? [2] : []]; // error: cannot convert int to byte | ||
int[] y = [1, .. b ? [default] : []]; // error: no type for [default] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int[] y = [1, .. b ? [default] : []]; // error: no type for [default] | |
int[] y = [1, .. b ? [default] : []]; // error: no type for [default] | |
string?[] y = [1, .. b ? [null] : []]; // error: no type for [null] |
|
||
For a collection expression used as the collection in a `foreach` statement, the compiler may use any conforming representation for the collection instance, including eliding the collection. | ||
|
||
The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler *may* evaluate subsequent elements before executing the loop body for preceding elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler may evaluate subsequent elements before executing the loop body for preceding elements.
Is that really something that should be left unspecified? The compiler cannot easilly change it later, as that would be an observable breaking change, and one that feels pretty likely to break real programs, whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()])
, especially if someFunctionWithSideEffects might mutate myFoo
.
Edit: Actually an even better example of a case where users would absolutely notice a change if one is made later is if some element throws an exception when evaluated. That is a case where it could make a really big difference to users as to whether the exception occurs before any loop iterations, or not.
I'd honestly expect all the elements to be evaluated first before any loop executions, simply because that it what would happen if the expression were converted to literally any runtime type.
The only sensible implementation strategies I can even think of that would allow for delayed evaluation would be unrolling the loop body (even if an unusual unrolling like: loop and process these initial literal elements, then loop and process these elements from a spread), or the creation of a generator that yields the elements one by one, and then iterating over that generator's output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that really something that should be left unspecified?
Yes. It allows the compiler great leeway in the impl strategy for converting the expressions to a collection.
The compiler cannot easilly change it later, as that would be an observable breaking change
It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.
and one that feels pretty likely to break real programs,
I would definitely hope not.
whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()]), especially if someFunctionWithSideEffects might mutate myFoo.
NOte: it's fine for someFunctionWithSideEffects
to have side effects. Those will happen after myFoo
is evaluated, and before any other elements are evaluated. What is unspecified is what it means is if the iteration of that evaluated value itself mutates. That would be a pattern we consider too 'out there', and we're willing to say that if your iterators do this, then collection expressions are not a good fit for you. We want to optimzie for the 99.99% case, where iteration order will not have an impact.
Note: to reiterate. Evaluation-order of the elements is still guaranteed to be left-to-right, just as it always has been in C#. So side effects from prior element evaluations will be seen by later element evaluations.
I'd honestly expect all the elements to be evaluated first before any loop executions
That would require stack space for all their values. That would def be undesirable in many cases. Simply imagine a collection expr with 1000 elements. If we require all that evaluation up front, we need to store all those values somewhere (like teh stack) then copy them into the dest. However, if we give the compiler freedom, it can do something like use a single space on the stack, evaluate each element it it, add to the dest, and the move on to the next (reusing that single space).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and one that feels pretty likely to break real programs,
I would definitely hope not.
A better example is if evaluation of any element throws. I can predict with high certainty that users will notice and care about the difference between processing the loop body for the first handful elements and then throwing vs throwing before processing any elements. Imagine if say the loop body is writing to a file and the 15th element throws when evaluated. Pretty significant difference if the file ends up empty, vs if it has lines for the previously processed elements.
If initially implemented with one at a time evaluation, then users may come to depend on the file having the lines from all elements before the throwing was was evaluated. If initially pre-evaluating all elements then users could easily become reliant on that behavior.
And files are hardly the only such scenarios. This could also apply to GUI scenarios like adding elements to listbox, or whatever.
Obviously users understand they can get partial execution of the loop if the body throws. And users quickly learn about lazy Enumerables, like those created by generators and or LINQ, where they can get partial execution from the enumerator throwing. But users are used to any type of foreach-able construct either being eagerly evaluated or lazily evaluated.
But here you are trying to reserve the right for the compiler to sometimes treat the collection expression as eagerly evaluated (like arrays or spans or most collection classes), and other times to be lazily evaluated like a generator or LINQ Enumerables, and possibly even mix the two within one expression in undefined ways that may change in the future. Which is very much not something users are used to, and something which will be hard to explain to users.
It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.
Average users don't read the compiler specification closely enough to have any idea what is marked as unspecified or not. They see the new feature announcement which gives the basic overview (which probably won't have a giant warning about eagerness being unspecified) or they discover the feature in some codebase. Either way they play around with it a bit, and check (experimentally) to see the current behavior for things they might care about (like eager or lazy evaluation) and then assume it won't change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better example is if evaluation of any element throws
As mentioned previously, evaluation order of elements is strictly defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and something which will be hard to explain to users
For the tiny set of people that run into an issue here, the situation can be explained.
and then assume it won't change in the future.
Then that is the consequence of behaving that way. We have made a decision balancing the needs of the ecosystem as a whole. If developers run into an issue because of that, we are ok with it on balance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also apply to GUI scenarios like adding elements to listbox, or whatever.
It shouldn't. You can only get the final collection if no throwing occurred during the creation of the collectino. It is all assigned to a temp. So you either get it all, or you get nothing (since the actual code that cn reference that newly created collection doesn't run due to the exception).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which probably won't have a giant warning about eagerness being unspecified
For nearly all users, there's no need to specify. Nearly all users with nearly all cases will simply get collections that work well including wrt to the efficiency of producing the result. Users that absolutely depend not only on element evaluation order, but enumeration order of those elements are going to be extremely slim. To the point that we literally haven't heard of a single user being affected by this in the entire beta and release period since release.
This indicates to me that this is not the sort of footgun being warned about. But is rather just a small part of the design that most people won't ever need to know about or be affected by. This is in line with many other features we have where there's some niche case that might result in slightly unexpected or undefined behavior if the user is doing something very far out of the norm. As such, designing and documenting it in the same vein seems totally ok :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CyrusNajmabadi I really don't think we are talking about the same thing, and that may explain some confusion.
As best I can tell, you seem to be talking about part of the already shipped collection expression feature for building collections where iteration order of spreads when building a collection may vary relative to the evaluation of elements. If that is what you are talking about then yes, I fully agree that it is really unlikely that changes there break people in practice.
But what I am talking about here is part of proposed specification for foreach loops over collection expressions. What I am looking at seems to say executions of the body of the foreach loop can interleave in unspecified ways with evaluating the elements of collection expression passed to the foreach loop:
The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler may evaluate subsequent elements before executing the loop body for preceding elements.
Which says that foreach (int i in [1, myfunc()]) { foo(i); }
might do:
- evaluate 1
- run foo(1)
- evaluate myfunc() (which happens to throw)
or it might do:
- evaluate 1.
- evaluate myfunc() (which happens to throw).
- Would now try to run loop body for each, which won't happen because of the throw.
And furthermore which one it does might change in the future. That is what I have concerns about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concerns. We'll def consider them during the ldm discussions on this topic.
If I understand correctly, this specification would not enable users to do: foreach(int digit in isEven ? [0,2,4,6,8] : [1,3,5,7,9] )
{...} That is because the proposed changes for foreach are defined in terms of the collection directly being a collection expression. So instead users would need to wrap that with another collection expression and a spread: Allowing foreach to support any Supporting similar for implicitly typed iteration variables, like in
I know that isn't correct specification language, but hopefully it gets the idea across. Now it may be that further extending foreach beyond what this proposal has to support that scenario is too much work for too little gain, but thought it was worth considering. Example of lowering a conditional `col` when sides do not share a type.public void Z(bool b, IEnumerable<char> left, ReadOnlySpan<Char> right)
{
foreach(var c in b ? left : right)
{
//body
}
} could be lowered as: public void Z(bool b, IEnumerable<char> left, ReadOnlySpan<char> right)
{
// Save off condition, as we only want to evaluate it once.
bool _cond = b;
// left side helper variable(s)
IEnumerator<char>? _enumerator = default;
// right side helper variable(s)
ReadOnlySpan<char> _span = default;
int _num = default;
if(_cond) {
// left setup;
_enumerator = left.GetEnumerator();
} else {
//right setup;
_span = right;
_num=0;
}
try{
while(_cond ? _enumerator!.MoveNext() : _num<_span.Length)
{
char c = _cond ? _enumerator!.Current : _span[_num];
// Body goes here
if(_cond) {
//left side end of loop
} else {
// right side end of loop
_num++;
}
}
}
finally
{
if (_cond)
{
// left side finally
_enumerator?.Dispose();
}
else
{
//right side finally
}
}
} I am obviously using the convention of leading underscore indicating helper variables not visible to user written code, rather than indicating fields or similar. |
The intent is for that to be possible. And our last design meeting on the topic indicated a means by which that would be possible. |
The general idea (similar to what you were saying) is that we will target to some |
I think some part of this proposal needs to be updated with #7994 |
I realize this isn't really intended to add new syntax, but instead make existing syntax work inside collections (.. it's also a PR, and not an issue or even a discussion), but it would be cool to have a binary operator that conditionally does something and nothing otherwise, like the following: List<int> ints = [1, 2, 3, a && ..[4], 10]; Which equates to List<int> ints = [1, 2, 3, ..a ? [4] : [], 10) |
Support collection expressions inline in expression contexts where the collection type is unspecified.
In spreads to add elements conditionally to the containing collection:
In
foreach
: