
🐇 Use a segmented list to avoid LOH allocations in the formatter #43464

Closed
wants to merge 4 commits

Conversation

sharwell
Member

Two passes prior to this change and two passes after (each run found 0 diagnostics):

| Run | Diagnostics pass (ms) | Bytes allocated | Execution time (ms) |
|---|---|---|---|
| Before, pass 1 | 64232 | 78658653664 | 1023796.7926 |
| Before, pass 2 | 65474 | 79440019016 | 943722.7411 |
| After, pass 1 | 56274 | 83220563552 | 883581.9364 |
| After, pass 2 | 59726 | 83103917688 | 779339.9673 |

sharwell requested a review from a team as a code owner April 17, 2020 23:20
sharwell changed the title Use a segmented list to avoid LOH allocations in the formatter 🐇 Use a segmented list to avoid LOH allocations in the formatter Apr 17, 2020
sharwell marked this pull request as draft April 18, 2020 00:36
}

/// <summary>
/// Segmented list implementation, copied from Microsoft.Exchange.Collections.
Member

safe to license as per dotnet/MIT license?

Member Author
sharwell commented Apr 18, 2020

I'm assuming yes since we put it in PerfView already.

/// </summary>
/// <typeparam name="T">The type of the list element.</typeparam>
/// <remarks>
/// This class implement a list which is allocated in segments, to avoid large lists to go into LOH.
Member

"to avoid large lists to go into LOH" is clunky sounding :)

/// Copy to Array
/// </summary>
/// <returns>Array copy</returns>
public T[] UnderlyingArray => ToArray();
Member

why do we need both UnderlyingArray and ToArray? The name is also misleading. Underlying makes it sound like the actual internal array is being returned.
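For context, since the backing store is segmented, a property like this can only ever hand back a flattened copy, roughly along these lines (a minimal sketch; segment layout and names are assumptions, not the PR's actual code):

```csharp
using System;

internal static class SegmentedCopy
{
    // Sketch only: flattens fixed-size segments into a single contiguous array.
    // The segment layout is an assumption, not the actual SegmentedList<T> internals.
    public static T[] Flatten<T>(T[][] segments, int segmentSize, int count)
    {
        var result = new T[count];
        var copied = 0;
        for (var segment = 0; copied < count; segment++)
        {
            var toCopy = Math.Min(count - copied, segmentSize);
            Array.Copy(segments[segment], 0, result, copied, toCopy);
            copied += toCopy;
        }
        return result;
    }
}
```

Note that for a large list the flattened copy itself lands back on the LOH, which is another reason to keep this off the public surface unless a caller really needs it.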


if (newSegmentIndex != oldSegmentIndex)
{
_items[oldSegmentIndex] = null;
Member

pity to lose the array to GC. should it be pooled?

Member Author

It's likely gen 0 by the time it's collected. Pooling may be more expensive with write barriers, plus we don't need to worry about LOH like before.
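For reference, the pooling alternative being suggested would look roughly like the sketch below (using ArrayPool&lt;T&gt;.Shared; illustrative only, and not what this PR does):

```csharp
using System.Buffers;

internal static class SegmentPool
{
    // Sketch of the pooled-segment alternative discussed above; not the approach taken here.
    // Rent may return an array longer than requested, so the logical segment size would
    // need to be tracked separately.
    public static T[] Rent<T>(int segmentSize)
        => ArrayPool<T>.Shared.Rent(segmentSize);

    // Clearing on return matters for reference types so the pool does not keep objects alive,
    // which is part of the extra cost weighed against simply letting gen 0 collect the segment.
    public static void Return<T>(T[] segment)
        => ArrayPool<T>.Shared.Return(segment, clearArray: true);
}
```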

/// <remarks>
/// This class implement a list which is allocated in segments, to avoid large lists to go into LOH.
/// </remarks>
internal sealed class SegmentedList<T> : ICollection<T>, IReadOnlyList<T>
Member

super surprising that this is a mutable list, but doesn't implement IList

Member

i would also be ok with this not implementing any interfaces. it would be nice to ensure that things like iteration are alloc-free and that sort of thing.
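For what it's worth, the usual way to get alloc-free iteration is a public struct enumerator that foreach binds to directly, something like the sketch below (field names and segment layout are assumptions, not this PR's code):

```csharp
// Illustrative sketch only: a struct enumerator avoids the IEnumerator<T> allocation
// that the interface-based foreach path would incur.
internal struct SegmentedEnumerator<T>
{
    private readonly T[][] _segments;
    private readonly int _count;
    private readonly int _segmentSize;
    private int _index;

    public SegmentedEnumerator(T[][] segments, int count, int segmentSize)
    {
        _segments = segments;
        _count = count;
        _segmentSize = segmentSize;
        _index = -1;
    }

    public T Current => _segments[_index / _segmentSize][_index % _segmentSize];

    public bool MoveNext() => ++_index < _count;
}
```

If the list's own GetEnumerator returned a struct like this, foreach would never touch IEnumerable&lt;T&gt;; only the explicit interface implementations would allocate an enumerator object.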

/// <param name="index"></param>
/// <param name="slot"></param>
/// <returns></returns>
public T[] GetSlot(int index, out int slot)
Member

would prefer to make this as private as possible and not expose internals like this unless we def have a consumer for it.
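For readers skimming the diff: GetSlot presumably maps a flat index to a (segment, offset) pair, which with a power-of-two segment size reduces to a shift and a mask. A sketch under that assumption (the constants are illustrative, not the actual implementation):

```csharp
internal static class SegmentMath
{
    // 4096 elements per segment keeps a segment of object references at 32 KB on 64-bit,
    // comfortably under the ~85,000-byte LOH threshold. The exact size is an assumption.
    private const int SegmentShift = 12;
    private const int SegmentSize = 1 << SegmentShift;
    private const int OffsetMask = SegmentSize - 1;

    public static (int Segment, int Offset) Locate(int index)
        => (index >> SegmentShift, index & OffsetMask);
}
```

Keeping the mapping this cheap is what makes it tolerable on the indexer's hot path, but as noted above it doesn't need to be public for that.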

}

/// <summary>
/// Appends a range of elements from anothe list.
Member

Suggested change
- /// Appends a range of elements from anothe list.
+ /// Appends a range of elements from another list.


if (_capacity < minCapacity)
{
EnsureCapacity(minCapacity);
Member

feels like we could just have callers call EnsureCapacity, and EnsureCapacity can early bail if necessary.
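I.e. something along the lines of the sketch below, so the guard lives in one place and call sites just call EnsureCapacity unconditionally (field names and the doubling growth policy are assumptions, not the PR's code):

```csharp
using System;

internal sealed class CapacityExample
{
    private int _capacity;

    public void EnsureCapacity(int minCapacity)
    {
        if (_capacity >= minCapacity)
        {
            return; // early bail: already large enough, so callers need no guard of their own
        }

        // Grow by doubling (illustrative policy), never below the requested minimum.
        _capacity = Math.Max(minCapacity, Math.Max(16, _capacity * 2));
        // ...allocate the additional segments up to the new capacity here...
    }
}
```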

/// <summary>
/// Returns the enumerator.
/// </summary>
IEnumerator<T> IEnumerable<T>.GetEnumerator()
Member

Suggested change
- IEnumerator<T> IEnumerable<T>.GetEnumerator()
+ IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();

/// <param name="item">Element to check.</param>
bool ICollection<T>.Contains(T item)
{
throw new NotImplementedException("This method of ICollection is not implemented");
Member

why?

}

Array.Copy(_items[lastSegment], 0, _items[lastSegment], 1, lastOffset);
_items[lastSegment][0] = save;
Member

so i'm basically glossing over these sections. but i assume they're correct :)
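For anyone else skimming: the quoted lines look like the tail of an insert-style shift, where elements move right by one and the last element of each segment is carried into slot 0 of the next. Roughly (a sketch under an assumed layout, not the PR's code):

```csharp
using System;

internal static class SegmentShift
{
    // Sketch only: shifts elements right by one across fixed-size segments, working
    // backwards from the last occupied segment so nothing is overwritten before it is saved.
    public static void ShiftRight<T>(T[][] segments, int segmentSize, int firstSegment, int lastSegment, int lastOffset)
    {
        for (var segment = lastSegment; segment > firstSegment; segment--)
        {
            // The last element of the previous segment spills into this segment.
            var save = segments[segment - 1][segmentSize - 1];

            // Only the occupied prefix of the final segment needs to move.
            var length = segment == lastSegment ? lastOffset : segmentSize - 1;
            Array.Copy(segments[segment], 0, segments[segment], 1, length);

            segments[segment][0] = save;
        }

        // The first segment is then shifted from the insertion point onward (omitted here).
    }
}
```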

@CyrusNajmabadi
Member

CyrusNajmabadi left a comment

I like it. but i would prefer SegmentedList be as minimal as possible. if we can ifdef out or remove whatever parts of its public surface area we don't need, that would make me happy :)

@mjsabby
Contributor

mjsabby commented Apr 18, 2020

@danmosemsft It'd be great if corefx would just provide something like this (collections that don't allocate on the LOH; a reality of running on .net). This code was copied from perfview which was copied from exchange, which was copied from sharepoint, which was copied from office shared ...

@danmoseley
Member

That's interesting. @stephentoub has this been discussed before?

@sharwell
Member Author

@danmosemsft I discussed this a few times casually with Stephen Toub. The thing that's most interesting to me is the variety of data types which can be ideal for different applications. The two of most interest to me currently are segmented arrays and B+-trees.

@CyrusNajmabadi
Member

> @danmosemsft I discussed this a few times casually with Stephen Toub. The thing that's most interesting to me is the variety of data types which can be ideal for different applications. The two of most interest to me currently are segmented arrays and B+-trees.

I feel like it's important to have this data type, not because it's a particularly important one in its own right, but primarily to deal with the LOH problem. i.e. an array truly is the best representation here for our needs, but this is a workaround for very poor behavior of something out of our control. That seems like a pity, as there are many cases where "large" does not equate to "long lived".

@stephentoub
Member

stephentoub commented Apr 19, 2020

@Maoni0, do you have an opinion on such data structures working around the LOH vs anything we may be able to improve in the LOH?

@mjsabby
Contributor

mjsabby commented Apr 19, 2020

If we're taking data structure requests, I'd like to +1 the B+-trees, and also add the readonly NativeHashTable from R2R images (@jkotas may know more about that), and build-time generation of perfect hash functions for use in read-only hash tables.

@sharwell
Member Author

@mjsabby One of my side projects is work on B+-trees. tunnelvisionlabs/dotnet-trees

@jkotas
Member

jkotas commented Apr 20, 2020

Is the improvement caused by avoiding LOH, or is the improvement actually coming from reduced copying? I suspect that it is the latter.

The threshold for LOH is configurable using COMPlus_GCLOHThreshold. You can set this to above the maximum array byte size in your benchmark to see the impact of LOH policies alone. I am curious what it is going to tell us.
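(For anyone reproducing this, that means setting the variable before launching the benchmark; the value is in bytes, and the number below is only an illustration that should exceed the largest array the run allocates:)

```cmd
:: COMPlus_GCLOHThreshold is read at process start; the value is in bytes.
:: 100000000 is illustrative only; pick something above the largest array in the run.
set COMPlus_GCLOHThreshold=100000000
```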

@jkotas
Member

jkotas commented Apr 20, 2020

> reduced copying

Note that copying of large arrays that contain object references has several costs:

  1. Copying the memory alone
  2. Write barrier (setting the cards for the copied memory)
  3. Gen0/1 GC having to visit the cards.

In certain situations, the cost of 3. can be more than the cost of 1. + 2. It is not easy to see the true cost of 3. because it is hidden in the GC pause time, which is hard to attribute back to lines of code.

We may consider shifting some of the cost from 3. to 2. to make it easier to attribute, by having more precise bulk write barriers (e.g. under a config switch). It is probably not a good tradeoff for a typical app, but it may be useful for performance analysis or in environments that are willing to trade raw throughput for smaller GC pause times.

@jkotas
Member

jkotas commented Apr 20, 2020

Also, it may be useful to do more than 2 iterations. Benchmarks that are sensitive to GC behavior tend to be very noisy.

@sharwell
Member Author

sharwell commented Apr 20, 2020

> Is the improvement caused by avoiding LOH, or is the improvement actually coming from reduced copying? I suspect that it is the latter.

The LOH improvements tend to impact downstream scenarios more than localized testing. For example, LOH allocations inside Visual Studio (which tends to run with less than 500M free VM) often result in frequent Gen 2 GC with observable pauses, while SOH allocations are better able to avoid that.

> Also, it may be useful to do more than 2 iterations. Benchmarks that are sensitive to GC behavior tend to be very noisy.

Each iteration runs the formatter on ~35000 files. The allocation numbers tend to be more predictable than the CPU numbers though.

@jkotas
Member

jkotas commented Apr 20, 2020

> Each iteration runs the formatter on ~35000 files

So why is the execution time for the second iteration significantly lower? I would expect the two numbers to be much closer to each other if they are averages.

I assume that you are running this on .NET Framework. It may be interesting to run it on .NET Core 3.1 too.

@CyrusNajmabadi
Member

> but it may be useful for performance analysis or in environments that are willing to trade raw throughput for smaller GC pause times.

Our primary environment is Visual Studio. This is a user-facing app where low latency matters far more than throughput for keeping the experience responsive. GC pauses (esp LOH ones) are particularly devastating to the experience. This has been one of the main reasons we've been moving things out of proc over the years. The GC times in situ are just so problematic. By moving to another process, we can have independent GCs where the ones in our OOP server don't cause pauses in the VS proc.

@CyrusNajmabadi
Member

Note: having a latency-tuned GC (like what Go has) would be absolutely fantastic for part of our domain.

@mjsabby
Contributor

mjsabby commented Apr 20, 2020

LOH tuning is not always possible, so having these data structures standardized would be helpful.

sharwell closed this Feb 18, 2021