Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How best to implement Linq's iterators. #15289

Closed
JonHanna opened this issue Sep 26, 2015 · 8 comments
Closed

How best to implement Linq's iterators. #15289

JonHanna opened this issue Sep 26, 2015 · 8 comments

Comments

@JonHanna
Copy link
Contributor

The current approach with System.Linq.Enumerator's enumerables and their enumerators copies the approach for compiler-generated iterators; the same instance that serves as the enumerable serves as the result for the first call to GetEnumerator(), so avoiding an extra allocation.

There is a race inherent to this approach; if two threads called GetEnumerator() at the same time they could end up with the same object when they should have separate instances. This is resolved by capturing Environment.CurrentManagedThreadId on construction and only using this as the result of GetEnumerator() if it matches a second call.

This relatively expensive call is of no value if:

  1. GetEnumerator() is never called (many optimisations skip it in various ways).
  2. GetEnumerator() is repeatedly called (after the first call).
  3. GetEnumerator() is only called once, on another thread (e.g. if a query is constructed before an await and enumerated after it).
    [WIP] Avoid race in GetEnumerator with Interlocked rather than ThreadID check corefx#3313 changes the approach to changing the state with Interlocked.CompareExchange() to resolve the race in a different way. This means no call in case 1 and 3 above, but is more expensive in case 2. It's about even in the case of one GetEnumerator() at least as some tests with Win64 DNX suggest (CompareExchange being about twice the case of CurrentManagedThreadId and being called half as often).

In the discussion on that pull request, @VSadov points out that the optimisation of avoid an allocation may not be as valuable as it once was in any case. Some very limited experimentation suggests that indeed just creating a new object on every call to GetEnumerator() is a clear win.

There are a few possible variants here. We can have the same classes that server as both enumerable and enumerator, or separate classes (some reduction in state held, some increase in jitted code). If separate classes are used, internal calls to GetEnumerator() can be moved to this object's construction, further reducing state held in some cases, but being an observable change if that call throws an exception.

A completely different possibility is to use the current approach, but avoid calling CurrentManagedThreadId is some cases.

Some experimentation is in order.

Some possible approaches:

  1. Leave things as they are.
  2. Use the basic approach currently used, but avoiding the CurrentManagedThreadId call, https://github.com/JonHanna/corefx/tree/fewer_threadid_calls
  3. Use Interlocked.CompareExchange(): https://github.com/JonHanna/corefx/tree/defer_obtaining_threadid
  4. Use the same classes as currently, but always allocating for GetEnumerator(): https://github.com/JonHanna/corefx/tree/fresh_enumerators
  5. Use separate enumerators https://github.com/JonHanna/corefx/tree/separate_enumerators_cautious
  6. Use separate enumerators, shifting GetEnumerator() calls (observable change) https://github.com/JonHanna/corefx/tree/separate_enumerators

I'm planning to do some comparisons with these variants, but I'm opening this issue ahead of that for suggestions as to situations the comparisons must cover.

In particular:

Is 6 (with the observable change) completely out of the question?

What tests would one expect avoiding an allocation to do better than not avoiding it? (First brief experiments suggest it's always a lose, but I'm wary of removing what was clearly intended as an optimisation if I can't see where it succeeded in optimising, though maybe it's just a matter of older CLR versions).

@JonHanna
Copy link
Contributor Author

Option 6 ruled out, as I can create too many cases where code that handles an exception would become code that fails to do so. It might be worth doing, but the impact is probably worth considering as a separate issue.

@joshfree
Copy link
Member

joshfree commented Oct 1, 2015

/cc @jaredpar

@VSadov
Copy link
Member

VSadov commented Oct 1, 2015

I think option 6 is not an option. It is an observable change for sideeffecting GetEnumerator implementations. That is not extremely common, but still possible for GetEnumerator to open a file or grab a lock. We would not want to change when that happens.

@JonHanna
Copy link
Contributor Author

JonHanna commented Oct 1, 2015

Yes, I already ruled that one out. I haven't had a chance to race the others yet, because it needs a time while my computer is neither turned off nor in such heavy use that could upset the measures, and it tends to be one of the other, but a holiday soon will be a fine time to have comparisons running away while I do other things.

@VSadov
Copy link
Member

VSadov commented Oct 1, 2015

I tried to make some synthetic benchmarks measuring just overhead of iterator strategies between 1) ThreadID 2)Interlocked 3)enumerator/enumerable not merged.

Time-wise the results are very close with ThreadID approach being faster than Interlocked and separate enumerator being yet slightly faster. That is under ideal conditions for combined enumerable/enumerator - when enumerable is iterated just once.

The main concern is that in a most trivial case (enumerable wraps a single array) separate iterator results in 14%-23% (32bit, 64bit) theoretical increase in allocated bytes compared to current implementation. That is because while we would have same fields and drop ThreadID, we would also have two objects allocated instead of one each having a header (2 native ptrs).

So basically, we will save on synchronization calls, but add some memory management costs (in amortized sense - including GC). These expenses are very hard to compare since they tend to scale differently between scenarios/hardware/platforms.

For example on a busy manycore box Interlocked would be more expensive, even more so on ARM.

So far I think the "separate enumerator" is the more promising of all because

  • Memory management in contemporary systems is very well tuned and is getting only better.
  • The life time of iterators would be short, making it easy case for GC. If this is not the case, then the scenarios would likely have other perf dominating factors, than the one we are looking at.
  • Non-optimal case where enumerable is iterated more than once is still possible, and in such case "separate enumerator" is definitely better.
  • In many cases Linq iterators are bypassed, so not allocating one would be a saving.

However I do not yet feel convinced enough, to make changes.

@VSadov VSadov assigned VSadov and unassigned jaredpar Oct 1, 2015
@stephentoub
Copy link
Member

cc: @KrzysztofCwalina

@VSadov
Copy link
Member

VSadov commented Nov 19, 2016

Ok, we have to get this to some conclusion.

Assuming that

  1. It was observed a while ago that most queries are enumerated only once. I think this observation still holds.
  2. If we separate Enumerable/Enumerator classes, we would need to pay for 2 object headers + duplicated data like "source", "predicate", that would not be compensated by the loss of ThreadID

I think the above rules out "just separate Enumerable and Enumerator classes".
The most attractive part about this solution is that it does not involve additional dependencies on threading stuff, but that does not seem to be bothering anyoine.

Now we have to decide between CurrentManagedThreadId and InterlockedExchange

On one hand we could save an int field, on the other hand CurrentManagedThreadId seems to be very fast since it is just a fetch from the TLS, while the cost of Interlocked is less clear. Some CPU may have pretty heavy implementations.

Considering that we do not have a very strong case against CurrentManagedThreadId, I will consider keeping existing implementation as acceptable and close this.

If we find other reasons, substantial enough to tip the scales, we can revisit.

@VSadov VSadov closed this as completed Nov 19, 2016
@VSadov VSadov removed their assignment Nov 19, 2016
@jcouv
Copy link
Member

jcouv commented Apr 26, 2018

Note: The "async streams" feature is running into a similar question.
When generating a type that implements IAsyncEnumerable (from an async IAsyncEnumerable M() {...} method), the compiler will produce a GetAsyncEnumerator() method.

Based on the discussion above, I will stick with the current pattern (if state machine is stopped and if it came from current thread, then we'll use this, otherwise we'll instantiate an enumerator).

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants