
.Skip allows a Collection to be modified during enumeration #40415

Closed
Tracked by #64601
joshbartley opened this issue Aug 5, 2020 · 17 comments
Labels
area-System.Linq, enhancement (Product code improvement that does NOT require public API changes/additions), needs-further-triage (Issue has been initially triaged, but needs deeper consideration or reconsideration)
Milestone
Future

Comments

@joshbartley

Description

I flipped a variable and found out that .Skip allows the IEnumerable to be modified during enumeration, which caused an infinite loop. This is related to a similar issue that was moved from dotnet/corefx (#32278).

    var example = new List<string>() { "test0", "test1" };
    var i = 0;
    foreach (var item in example.Skip(1))
    {
        example.Add("test");
        i++;
        if (i > 1000)
        {
            Console.WriteLine("Modified IEnumerable through LINQ");
            break;
        }
    }

Tested without LINQ, and the expected exception is thrown saying that you cannot modify the collection while it is being enumerated.
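
For comparison, a minimal sketch of the non-LINQ case described above, assuming a plain List<string>: enumerating the list directly and mutating it mid-loop throws on the next iteration.

    using System.Collections.Generic;

    var example = new List<string>() { "test0", "test1" };

    // List<T>.Enumerator tracks the list's version, so the Add below makes the
    // next MoveNext throw:
    // InvalidOperationException: Collection was modified; enumeration operation may not execute.
    foreach (var item in example)
    {
        example.Add("test");
    }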

Tried with .OrderBy(): it allowed the code to run, but the newly added entries did not appear in the enumeration and the loop only ran twice, which could be a separate bug.

I have not tried with a LINQ/lambda-only statement, as the first time this happened it locked up a Ryzen machine with 24 GB of RAM.

Configuration

.NET Core 3.1, Windows 10 x64

Regression?

Previously, any modification of the collection while enumerating caused an exception.

Other information

An optimization caused a similar infinite-loop issue with IEnumerable that was reported against 2.1 (dotnet/corefx#32278). It was decided to roll back that pull request to fix the issue.

@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot added the untriaged label (New issue has not been triaged by the area owner) on Aug 5, 2020
@Clockwork-Muse
Contributor

OrderBy has to create a sorted copy of the input enumerable (O(n log n)) in order to serve up the results in reasonable time (O(n) to enumerate), since otherwise it would have to rescan the list for every element it yields (O(n^2)), so that's not surprising. What did you mean by "only run twice", though?

@joshbartley
Author

When using OrderBy(x => x) instead of Skip(1), the loop only runs twice, once for each original item. An exception should still be thrown for OrderBy(), since the example modifies the collection while it is being enumerated, which is not allowed.
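
For illustration, a minimal sketch of why the OrderBy version only loops twice, assuming the buffering behavior described above: OrderBy copies the source into a buffer when enumeration begins, so items added afterwards are never observed and no exception is thrown.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    var example = new List<string>() { "test0", "test1" };
    var iterations = 0;

    foreach (var item in example.OrderBy(x => x))
    {
        // The sorted buffer was captured when enumeration started, so this Add
        // grows the list but not the sequence being iterated.
        example.Add("test");
        iterations++;
    }

    Console.WriteLine(iterations);    // 2 - only the original items were yielded
    Console.WriteLine(example.Count); // 4 - the list itself still grew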

@benaadams
Member

It doesn't look like it uses the IEnumerable; it uses IList indexing instead (hence the modified-collection check is not triggered).

However, it should probably use sourceList.Count - count as the upper bound rather than int.MaxValue, so the maximum number of enumerations is snapshotted at creation?

    private static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count) =>
        source is IList<TSource> sourceList ?
            (IEnumerable<TSource>)new ListPartition<TSource>(sourceList, count, int.MaxValue) :

It uses both _source.Count and _maxIndexInclusive as guards on enumeration, but that's not helpful if _source.Count keeps increasing and _maxIndexInclusive is set to int.MaxValue (about 2 billion) rather than to the starting count - skip:

    int index = _state - 1;
    if (unchecked((uint)index <= (uint)(_maxIndexInclusive - _minIndexInclusive) && index < _source.Count - _minIndexInclusive))
    {
        _current = _source[_minIndexInclusive + index];
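
For illustration, a standalone sketch of that suggestion (hypothetical code, not the runtime's actual SkipIterator/ListPartition<TSource>): capture the list's count when the partition is created, so a source that keeps growing cannot extend the enumeration indefinitely.

    using System.Collections.Generic;

    static class SkipSketch
    {
        // Hypothetical helper, for illustration only.
        public static IEnumerable<T> SkipWithSnapshottedBound<T>(IList<T> source, int count)
        {
            int maxExclusive = source.Count; // evaluated eagerly, when the "partition" is created

            return Iterate(source, count, maxExclusive);

            static IEnumerable<T> Iterate(IList<T> src, int start, int maxExclusive)
            {
                // The second check mirrors the existing Count guard so a shrinking
                // list ends the loop early instead of indexing out of range.
                for (int i = start; i < maxExclusive && i < src.Count; i++)
                {
                    yield return src[i];
                }
            }
        }
    }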

@Clockwork-Muse
Contributor

When using OrderBy(x => x) instead of Skip(1), the loop only runs twice, once for each original item. An exception should still be thrown for OrderBy(), since the example modifies the collection while it is being enumerated, which is not allowed.

Not a bug in itself, since it's iterating over the "original" items. It appears to be using indices directly, so it's likely snapshotting the range.

@benaadams - possibly, although that's still not going to throw in all the same cases (i.e., add + remove, which will throw for an enumerator but not under index checking). Likely good enough for this case, though.

@benaadams
Member

possibly, although that's still not going to throw in all the same cases

It wouldn't throw at all; however, throwing is a peculiarity of some collections, not something guaranteed by IEnumerable. Looping effectively forever is more of an issue?

@Clockwork-Muse
Contributor

True, yes.

@ghost

ghost commented Aug 6, 2020

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

@eiriktsarpalis added this to the Future milestone on Aug 11, 2020
@eiriktsarpalis removed the untriaged label (New issue has not been triaged by the area owner) on Aug 11, 2020
@stephentoub
Member

However, it should probably use sourceList.Count - count as the upper bound rather than int.MaxValue, so the maximum number of enumerations is snapshotted at creation?

No, there shouldn't be a difference functionally between:

    List<T> list = ...;
    IEnumerable<T> e = list.SomeLinqOps();
    list.Add(...)/Remove(...)/etc.;
    foreach (T t in e) { ... }

and

    List<T> list = ...;
    list.Add(...)/Remove(...)/etc.;
    IEnumerable<T> e = list.SomeLinqOps();
    foreach (T t in e) { ... }

Snapshotting of anything to do with the contents of the list shouldn't happen until the enumerable is actually enumerated.
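
For illustration, a minimal concrete version of that equivalence, assuming a plain List<int>: because Skip defers all work until enumeration, both orderings should yield the same elements.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Build the query first, then mutate, then enumerate.
    var list1 = new List<int> { 1, 2, 3 };
    IEnumerable<int> a = list1.Skip(1);
    list1.Add(4);
    Console.WriteLine(string.Join(",", a)); // 2,3,4

    // Mutate first, then build the query and enumerate.
    var list2 = new List<int> { 1, 2, 3 };
    list2.Add(4);
    IEnumerable<int> b = list2.Skip(1);
    Console.WriteLine(string.Join(",", b)); // 2,3,4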

@eiriktsarpalis
Member

We could perhaps improve it so that the list count is snapshotted on GetEnumerator(). Note, however, that ListPartition<T> implements IEnumerator itself, so any changes to that effect would have to be made carefully.
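
As a rough illustration of that idea (a hypothetical wrapper, not the actual ListPartition<T>, which implements IEnumerator<T> on itself): the count is captured inside GetEnumerator(), before the deferred iterator is created, so a source that grows during enumeration cannot extend the loop.

    using System.Collections;
    using System.Collections.Generic;

    // Hypothetical type, for illustration only.
    sealed class SnapshotCountPartition<T> : IEnumerable<T>
    {
        private readonly IList<T> _source;
        private readonly int _skip;

        public SnapshotCountPartition(IList<T> source, int skip)
        {
            _source = source;
            _skip = skip;
        }

        public IEnumerator<T> GetEnumerator()
        {
            // Runs when GetEnumerator() is called, not on the first MoveNext,
            // because this method is not itself an iterator.
            int count = _source.Count;
            return Iterate(count);
        }

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

        private IEnumerator<T> Iterate(int count)
        {
            for (int i = _skip; i < count && i < _source.Count; i++)
            {
                yield return _source[i];
            }
        }
    }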

@eiriktsarpalis added the enhancement label (Product code improvement that does NOT require public API changes/additions) on Dec 1, 2020
@eiriktsarpalis
Member

I tried prototyping snapshotting logic for ListPartition, but I'm not particularly satisfied with the result. Due to the current design of the internal Iterator<TResult> type, the snapshotting needs to happen in the first MoveNext() call, which could have performance ramifications.

The question then is whether we consider the potential infinite loop serious enough to make such an inclusion.

@jeffhandley
Member

I'm inclined to close this without resolution, but I'd like to confirm a couple of understandings I have about it.

  1. Guarding against mutation will incur a performance penalty; the more robustly we guard against it, the steeper the penalty
  2. Such an infinite loop bug would likely be encountered during development/testing
  3. It's conceivable that this loophole could be used for good; I can imagine a scenario where, under certain circumstances, new elements are added so that they are included in the enumeration, but without causing an infinite loop

@jeffhandley added the needs-author-action label (An issue or pull request that requires more info or actions from the author) on Feb 9, 2022
@ghost

ghost commented Feb 9, 2022

This issue has been marked needs-author-action since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

@joshbartley
Author

I believe the root cause of this is a performance optimization that exposed the issue. I understand that performance is important and this is a bit of an edge case. In my case it was an accident during testing: I was writing unit tests and creating a lot of test data. With int.MaxValue being so large, it crashed the whole system allocating all those objects.

Keeping the performance penalty in mind, could there be a simple guard in a debug configuration? It's really unpleasant to run into this; it reminds me of classic ASP, where forgetting the .MoveNext would crash the system too. For example, if the list size is less than int.MaxValue, _maxIndexInclusive could be set to a smaller, more reasonable number (twice the list count, or .Capacity, whichever is higher?). It doesn't feel like int.MaxValue was picked for any reason other than being the highest possible value. I don't personally know of any collections that are 2 billion elements large, but maybe you know those use cases better since you work on the framework.

@ghost added the needs-further-triage label (Issue has been initially triaged, but needs deeper consideration or reconsideration) and removed the needs-author-action label (An issue or pull request that requires more info or actions from the author) on Feb 9, 2022
@Clockwork-Muse
Contributor

It's conceivable that this loophole could be used for good; I can imagine a scenario where, under certain circumstances, new elements are added so that they are included in the enumeration, but without causing an infinite loop

There are no scenarios where this is reasonable, even without the infinite loop bug.

Modifying a collection during enumeration is always user error, and should be fixed by one or more different mechanisms (yielding an additional item, building a separate list and concatenating it on the end, and so on).
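
For instance, a minimal sketch of the "build a list to concatenate" alternative: collect the extra items separately and append them only after the loop finishes, rather than mutating the source mid-enumeration.

    using System.Collections.Generic;
    using System.Linq;

    var example = new List<string>() { "test0", "test1" };
    var extras = new List<string>();

    foreach (var item in example.Skip(1))
    {
        extras.Add("test"); // collect instead of mutating `example` inside the loop
    }

    example.AddRange(extras); // mutate only after enumeration has finished

    // Alternatively, avoid mutating `example` at all and combine lazily instead:
    // IEnumerable<string> combined = example.Skip(1).Concat(extras);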

@eiriktsarpalis
Member

eiriktsarpalis commented Feb 9, 2022

I should clarify that the workaround I had proposed does not guard against mutation in general; it simply attaches an upper bound, specifically to prevent the type of blow-up encountered in the original example. The following would still be legal:

    var example = new List<string>() { "test0", "test1" };
    foreach (var item in example.Skip(1))
    {
        // Shrinking the list mid-enumeration just ends the loop early; no exception is thrown.
        example.RemoveAt(example.Count - 1);
    }

I suspect a "correct" fix would involve removing the ListPartition<T> optimization altogether, since any LINQ method using it under wraps would be susceptible to the source being modified.

@eiriktsarpalis
Member

Closing as by-design: combining mutation with LINQ is arguably a corner case (or rather an anti-pattern) that can result in surprising behavior in the presence of performance optimizations. I don't think we should be reverting optimizations on lists for that reason.

@ghost locked as resolved and limited conversation to collaborators on Mar 16, 2022