🐇 Use a segmented list to avoid LOH allocations in the formatter #43464
/// <summary>
/// Segmented list implementation, copied from Microsoft.Exchange.Collections.
Is this safe to license under the dotnet/MIT license?
I'm assuming yes since we put it in PerfView already.
/// </summary>
/// <typeparam name="T">The type of the list element.</typeparam>
/// <remarks>
/// This class implements a list which is allocated in segments, to avoid large lists to go into LOH.
"to avoid large lists to go into LOH" is clunky sounding :)
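To make "allocated in segments" concrete: the usual trick is a power-of-two segment length chosen so every segment array stays under the 85,000-byte large object heap threshold, with list indexes mapped to (segment, offset) by a shift and a mask. The sketch below is a hypothetical helper, not the PR's code; the `SegmentMath` name and the assumed 32-byte array header overhead are illustrative only.

```csharp
using System;

// Hypothetical helper (not the PR's API): picks a power-of-two segment length so each
// segment array stays below the 85,000-byte large object heap (LOH) threshold.
static class SegmentMath
{
    private const int LohThresholdBytes = 85_000;

    // Assumption: roughly 32 bytes of array header overhead on 64-bit runtimes.
    private const int AssumedArrayOverheadBytes = 32;

    public static int ChooseSegmentShift(int elementSizeBytes)
    {
        if (elementSizeBytes <= 0) throw new ArgumentOutOfRangeException(nameof(elementSizeBytes));
        int shift = 0;
        // Keep doubling the segment length (1 << shift) while the doubled size still fits.
        while (((long)elementSizeBytes << (shift + 1)) + AssumedArrayOverheadBytes < LohThresholdBytes)
        {
            shift++;
        }
        return shift;
    }

    // Maps a flat list index to (segment, offset) with a shift and a mask instead of division.
    public static (int Segment, int Offset) Locate(int index, int shift)
        => (index >> shift, index & ((1 << shift) - 1));
}
```

For 8-byte elements (object references on a 64-bit runtime) this yields a shift of 13, i.e. 8,192 elements or 65,536 bytes per segment, comfortably below the threshold.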
/// Copy to Array
/// </summary>
/// <returns>Array copy</returns>
public T[] UnderlyingArray => ToArray();
Why do we need both UnderlyingArray and ToArray? The name is also misleading: "Underlying" makes it sound like the actual internal array is being returned.
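A single, explicitly named copy method avoids the misleading property. The hypothetical sketch below (the `SegmentCopy` name and the raw `T[][]` layout are assumptions, not the PR's types) copies each segment into one fresh contiguous array; note that the result itself can still land on the LOH when it is large, which is inherent to asking for a contiguous copy.

```csharp
using System;

// Hypothetical sketch: one explicit ToArray() that copies every segment into a new
// contiguous array, rather than a property whose name implies internal storage.
static class SegmentCopy
{
    public static T[] ToArray<T>(T[][] segments, int count, int segmentLength)
    {
        var result = new T[count];
        int copied = 0;
        for (int s = 0; copied < count; s++)
        {
            // The last segment is usually only partly filled.
            int length = Math.Min(segmentLength, count - copied);
            Array.Copy(segments[s], 0, result, copied, length);
            copied += length;
        }
        return result;
    }
}
```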
if (newSegmentIndex != oldSegmentIndex)
{
    _items[oldSegmentIndex] = null;
Pity to lose the array to GC. Should it be pooled?
It's likely gen 0 by the time it's collected. Pooling may be more expensive with write barriers, plus we don't need to worry about LOH like before.
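The unpooled approach discussed here amounts to dropping segments that are no longer needed so the GC can reclaim them as ordinary small-object-heap arrays. Below is a hypothetical sketch of that idea applied to truncation (the `SegmentShrink.Truncate` name and signature are illustrative, not from the PR); it also clears stale references in the partially used last segment so they do not keep objects alive.

```csharp
using System;

// Hypothetical sketch: shrinking a segmented list releases whole segments past the new
// end (set to null, as the reviewed code does with _items[oldSegmentIndex]).
static class SegmentShrink
{
    public static void Truncate<T>(T[][] segments, ref int count, int newCount, int shift)
    {
        if (newCount < 0 || newCount > count) throw new ArgumentOutOfRangeException(nameof(newCount));
        int segmentLength = 1 << shift;
        int keptSegments = (newCount + segmentLength - 1) >> shift; // segments still needed
        int usedSegments = (count + segmentLength - 1) >> shift;    // segments in use before
        for (int s = keptSegments; s < usedSegments; s++)
        {
            segments[s] = null; // small array, typically reclaimed cheaply by the GC
        }

        // Clear stale slots in the partially used last kept segment.
        int tail = newCount & (segmentLength - 1);
        if (tail != 0)
        {
            T[] last = segments[keptSegments - 1];
            Array.Clear(last, tail, segmentLength - tail);
        }
        count = newCount;
    }
}
```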
/// <remarks>
/// This class implements a list which is allocated in segments, to avoid large lists to go into LOH.
/// </remarks>
internal sealed class SegmentedList<T> : ICollection<T>, IReadOnlyList<T>
Super surprising that this is a mutable list but doesn't implement IList.
I would also be OK with this not implementing any interfaces. It would be nice to ensure that things like iteration are alloc-free, and that sort of thing.
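Alloc-free iteration without any interfaces relies on C#'s pattern-based `foreach`: a public `GetEnumerator()` that returns a struct with `MoveNext()` and `Current` is used directly, with no boxing. The type below is a hypothetical, deliberately tiny stand-in (`SegmentedBuffer<T>` and its 16-element segments are assumptions for illustration).

```csharp
using System;

// Hypothetical minimal segmented list with no interfaces: GetEnumerator returns a
// struct, so foreach over the concrete type does not allocate an enumerator.
sealed class SegmentedBuffer<T>
{
    private const int Shift = 4;                 // assumed 16-element segments
    private const int SegmentLength = 1 << Shift;
    private T[][] _segments = new T[4][];
    private int _count;

    public int Count => _count;

    public void Add(T item)
    {
        int segment = _count >> Shift;
        if (segment == _segments.Length)
            Array.Resize(ref _segments, _segments.Length * 2); // only the small spine array grows
        _segments[segment] ??= new T[SegmentLength];
        _segments[segment][_count & (SegmentLength - 1)] = item;
        _count++;
    }

    public Enumerator GetEnumerator() => new Enumerator(this);

    public struct Enumerator
    {
        private readonly SegmentedBuffer<T> _list;
        private int _index;

        internal Enumerator(SegmentedBuffer<T> list) { _list = list; _index = -1; }

        public bool MoveNext() => ++_index < _list._count;

        public T Current => _list._segments[_index >> Shift][_index & (SegmentLength - 1)];
    }
}
```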
/// <param name="index"></param>
/// <param name="slot"></param>
/// <returns></returns>
public T[] GetSlot(int index, out int slot)
Would prefer to make this as private as possible and not expose internals like this unless we definitely have a consumer for it.
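One way to follow this suggestion is to keep the slot lookup private and expose only an indexer, so the segment layout never leaks into the public surface. The sketch below is hypothetical (`SegmentedArray<T>`, its 1,024-element segments, and the private `GetSlot` shape are assumptions, loosely mirroring the method under review).

```csharp
using System;

// Hypothetical sketch: slot lookup is private; callers only see Length and an indexer.
sealed class SegmentedArray<T>
{
    private const int Shift = 10;                  // assumed 1,024-element segments
    private const int OffsetMask = (1 << Shift) - 1;
    private readonly T[][] _segments;

    public SegmentedArray(int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        int segmentCount = (length + OffsetMask) >> Shift;
        _segments = new T[segmentCount][];
        for (int s = 0; s < segmentCount; s++)
        {
            // The last segment is trimmed to the remaining length.
            _segments[s] = new T[Math.Min(1 << Shift, length - (s << Shift))];
        }
    }

    public int Length { get; }

    public T this[int index]
    {
        get => GetSlot(index, out int offset)[offset];
        set => GetSlot(index, out int offset)[offset] = value;
    }

    // Private counterpart of the public GetSlot under review.
    private T[] GetSlot(int index, out int offset)
    {
        if ((uint)index >= (uint)Length) throw new ArgumentOutOfRangeException(nameof(index));
        offset = index & OffsetMask;
        return _segments[index >> Shift];
    }
}
```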
/// <summary>
/// Appends a range of elements from anothe list.
Suggested change:
- /// Appends a range of elements from anothe list.
+ /// Appends a range of elements from another list.
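Appending a range into segmented storage can be done in chunks that never cross a destination segment boundary, allocating each new segment only when it is reached. This is a hypothetical sketch (`SegmentAppend.AppendRange` and its parameters are illustrative, not the PR's method), using a plain source array for simplicity.

```csharp
using System;

// Hypothetical sketch: append source[start .. start + length) in per-segment chunks.
static class SegmentAppend
{
    public static void AppendRange<T>(ref T[][] segments, ref int count, int shift,
                                      T[] source, int start, int length)
    {
        int segmentLength = 1 << shift;
        while (length > 0)
        {
            int segment = count >> shift;
            int offset = count & (segmentLength - 1);
            if (segment == segments.Length)
                Array.Resize(ref segments, Math.Max(4, segments.Length * 2)); // grow the spine
            segments[segment] ??= new T[segmentLength];

            // Copy as much as fits in the current destination segment.
            int chunk = Math.Min(length, segmentLength - offset);
            Array.Copy(source, start, segments[segment], offset, chunk);
            start += chunk;
            length -= chunk;
            count += chunk;
        }
    }
}
```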
if (_capacity < minCapacity)
{
    EnsureCapacity(minCapacity);
Feels like we could just have callers call EnsureCapacity, and EnsureCapacity can bail out early if the capacity is already sufficient.
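The suggested shape moves the capacity check into `EnsureCapacity` itself so call sites do not repeat it. A hypothetical sketch (the `SegmentStore<T>` type and its 8-element segments are illustrative only; a real implementation would also apply a growth policy):

```csharp
using System;

// Hypothetical sketch: callers always call EnsureCapacity, which returns early
// when the current capacity is already large enough.
sealed class SegmentStore<T>
{
    private const int Shift = 3;                   // assumed 8-element segments
    private const int SegmentLength = 1 << Shift;
    private T[][] _segments = Array.Empty<T[]>();

    public int Capacity => _segments.Length << Shift;

    public void EnsureCapacity(int minCapacity)
    {
        if (minCapacity <= Capacity)
            return; // early bail: nothing to allocate

        int needed = (minCapacity + SegmentLength - 1) >> Shift;
        int oldLength = _segments.Length;
        Array.Resize(ref _segments, needed);
        for (int s = oldLength; s < needed; s++)
            _segments[s] = new T[SegmentLength]; // each segment stays small, off the LOH
    }
}
```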
/// <summary>
/// Returns the enumerator.
/// </summary>
IEnumerator<T> IEnumerable<T>.GetEnumerator()
Suggested change:
- IEnumerator<T> IEnumerable<T>.GetEnumerator()
+ IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
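The suggested one-liner works when the public `GetEnumerator()` returns a struct enumerator: `foreach` over the concrete type uses the struct directly, while the explicit interface implementations forward to it (boxing only when the list is consumed through `IEnumerable<T>`, as with `List<T>`). A hypothetical sketch, with `SegmentView<T>` as an illustrative name:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch: struct enumerator for the concrete type, explicit interface
// implementations that simply forward to it.
sealed class SegmentView<T> : IEnumerable<T>
{
    private readonly T[][] _segments;
    private readonly int _count;
    private readonly int _shift;

    public SegmentView(T[][] segments, int count, int shift)
    {
        _segments = segments;
        _count = count;
        _shift = shift;
    }

    public Enumerator GetEnumerator() => new Enumerator(this);

    IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator(); // boxes the struct
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct Enumerator : IEnumerator<T>
    {
        private readonly SegmentView<T> _view;
        private int _index;

        internal Enumerator(SegmentView<T> view) { _view = view; _index = -1; }

        public T Current => _view._segments[_index >> _view._shift][_index & ((1 << _view._shift) - 1)];
        object IEnumerator.Current => Current;

        public bool MoveNext() => ++_index < _view._count;
        public void Reset() => _index = -1;
        public void Dispose() { }
    }
}
```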
/// <param name="item">Element to check.</param>
bool ICollection<T>.Contains(T item)
{
    throw new NotImplementedException("This method of ICollection is not implemented");
why?
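For comparison, `Contains` does not need to throw: a linear scan over the used part of each segment matches `List<T>.Contains` semantics. This hypothetical sketch (`SegmentSearch.Contains` is an illustrative name) uses `Array.IndexOf`, which compares with the default equality comparer.

```csharp
using System;

// Hypothetical sketch: linear Contains over the used portion of each segment.
static class SegmentSearch
{
    public static bool Contains<T>(T[][] segments, int count, int shift, T item)
    {
        int segmentLength = 1 << shift;
        for (int s = 0, remaining = count; remaining > 0; s++, remaining -= segmentLength)
        {
            // Only search slots that hold live elements.
            int used = Math.Min(remaining, segmentLength);
            if (Array.IndexOf(segments[s], item, 0, used) >= 0)
                return true;
        }
        return false;
    }
}
```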
Array.Copy(_items[lastSegment], 0, _items[lastSegment], 1, lastOffset);
_items[lastSegment][0] = save;
So I'm basically glossing over these sections, but I assume they're correct :)
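The quoted lines are the cross-segment step of an insert: each segment shifts right by one with an overlap-safe `Array.Copy`, and the element that falls off the end of the previous segment is carried ("saved") into slot 0. The following is a hypothetical reconstruction of that technique, not the PR's exact code; `SegmentInsert.Insert` assumes the caller has already ensured room for one more element.

```csharp
using System;

// Hypothetical sketch of Insert across segments; assumes capacity for count + 1 elements.
static class SegmentInsert
{
    public static void Insert<T>(T[][] segments, ref int count, int shift, int index, T item)
    {
        if ((uint)index > (uint)count) throw new ArgumentOutOfRangeException(nameof(index));
        int mask = (1 << shift) - 1;
        int lastSegment = count >> shift;    // segment that receives the new last slot
        int targetSegment = index >> shift;
        int targetOffset = index & mask;

        // Walk backwards from the last segment to the one just after the target segment.
        for (int s = lastSegment; s > targetSegment; s--)
        {
            // Elements to move within segment s; in a full segment the final slot
            // was already carried into segment s + 1 on the previous iteration.
            int toMove = s == lastSegment ? (count & mask) : mask;
            Array.Copy(segments[s], 0, segments[s], 1, toMove); // overlap-safe shift right
            segments[s][0] = segments[s - 1][mask];             // carry ("save") from previous segment
        }

        // Open a gap at the target offset and store the item.
        int end = targetSegment == lastSegment ? (count & mask) : mask;
        Array.Copy(segments[targetSegment], targetOffset, segments[targetSegment], targetOffset + 1, end - targetOffset);
        segments[targetSegment][targetOffset] = item;
        count++;
    }
}
```

Walking backwards matters: each segment's last element is read before that segment is shifted, so nothing is overwritten prematurely.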
I like it, but I would prefer SegmentedList to be as minimal as possible. If we can ifdef out or remove as much of its public surface area as we don't need, that would make me happy :)
@danmosemsft It'd be great if corefx would just provide something like this (collections that don't allocate on the LOH, a reality of running on .NET). This code was copied from PerfView, which was copied from Exchange, which was copied from SharePoint, which was copied from Office shared ...
That's interesting. @stephentoub, has this been discussed before?
@danmosemsft I discussed this a few times casually with Stephen Toub. The thing that's most interesting to me is the variety of data types which can be ideal for different applications. The two of most interest to me currently are segmented arrays and B+-trees.
I feel like it's important to have this data type, but not because it's a particularly important one; it's primarily to deal with the LOH problem. That is, an array truly is the best representation for our needs here, and this is a workaround for very poor behavior of something out of our control. That seems like a pity, as there are many cases where "large" does not equate to "long-lived".
@Maoni0, do you have an opinion on such data structures working around the LOH vs anything we may be able to improve in the LOH?
If we're taking data structure requests, I'd like to +1 B+-trees, and also add the read-only NativeHashTable from R2R images (@jkotas may know more about that) and build-time generation of perfect hash functions for use in read-only hash tables.
@mjsabby One of my side projects is work on B+-trees: tunnelvisionlabs/dotnet-trees
Is the improvement caused by avoiding the LOH, or is it actually coming from reduced copying? I suspect that it is the latter. The threshold for LOH is configurable using
Note that copying large arrays that contain object references has several costs:
In certain situations, the cost of 3. can be more than the cost of 1. + 2. It is not easy to see the true cost of 3. because it is hidden in the GC pause time, which is hard to attribute back to lines of code. We may consider shifting some of the cost from 3. to 2. to make it easier to attribute, by having more precise bulk write barriers (e.g. under a config switch). It is probably not a good tradeoff for a typical app, but it may be useful for performance analysis or in environments that are willing to trade raw throughput for smaller GC pause times.
Also, it may be useful to do more than 2 iterations. Benchmarks that are sensitive to GC behavior tend to be very noisy.
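A quick way to observe the LOH cutoff discussed in this comment under default settings: arrays whose total size is 85,000 bytes or more are allocated on the large object heap, and `GC.GetGeneration` reports such objects as generation 2 immediately after allocation, while a small fresh array reports generation 0. The `LohDemo` helper is a hypothetical illustration; results can differ if the LOH threshold has been reconfigured.

```csharp
using System;

// Hypothetical illustration: generation reported for a freshly allocated byte array.
// Large arrays go straight to the LOH, which the GC reports as generation 2.
static class LohDemo
{
    public static int GenerationOfNewArray(int byteLength) => GC.GetGeneration(new byte[byteLength]);
}
```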
The LOH improvements tend to impact downstream scenarios more than localized testing. For example, LOH allocations inside Visual Studio (which tends to run with less than 500M free VM) often result in frequent Gen 2 GC with observable pauses, while SOH allocations are better able to avoid that.
Each iteration runs the formatter on ~35,000 files. The allocation numbers tend to be more predictable than the CPU numbers, though.
So why is the execution time for the second iteration significantly lower? I would expect the two numbers to be much closer to each other if they are averages. I assume that you are running this on .NET Framework. It may be interesting to run it on .NET Core 3.1 too.
Our primary environment is Visual Studio. This is a user-facing app where latency is far preferred to throughput to keep the experience responsive. GC pauses (especially LOH ones) are particularly devastating to the experience. This has been one of the main reasons we've been moving things out of proc over the years. The GC times in situ are just so problematic. By moving to another process, we can have independent GCs, so the ones in our OOP server don't cause pauses in the VS process.
Note: having a latency-tuned GC (like what Go has) would be absolutely fantastic for part of our domain.
LOH tuning is not always possible, so having these data structures standardized would be helpful.
Two passes prior to this change:
Two passes after this change: