Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify Capacity growth rate to be linear instead of doubling on resize. #75708

Closed

Conversation

ToddGrun
Copy link
Contributor

@ToddGrun ToddGrun commented Nov 2, 2024

Modify algorithm in SegmentedList.Grow to grow by a single segment at a time instead of doubling in size. This is an extension from some work done in #75661.

Note that when allocating the outer array in the Capacity setter, we still utilize the double size allocation mechanism to avoid an n^2 allocation pattern for it. However, we don't allocate all the inner arrays, only what is needed. If a subsequent request comes in to Grow, we'll utilize those empty array entries.

*** Benchmark.NET data ***
Note: The AddExtraItem flag is used to demonstrate that the old algorithm had a large allocation occurrence at that count. The new algorithm doesn't have such a steep allocation cliff at those discrete points. Generally,

Allocation changes from PR1 => PR2 => this PR:
Purple went from n => n/2 => n/2
Blue went from n => n/2 => n/2 => n/4

CPU changes are a little harder to summarize, but generally went down anywhere from 20%-70%

Before #75661
image

Ater #75661, before this PR:
image

With both PRs:
image

@ToddGrun ToddGrun requested review from a team as code owners November 2, 2024 00:49
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Nov 2, 2024
// Grow the array of segments, if necessary. Note that this array may end up having null entries
// at it's end when this method completes. Minimally doubling the outer array size allows amortized
// linear cost growth properties.
Array.Resize(ref segments, Math.Max(newSegmentCount, segments.Length * 2));
Copy link
Member

@sharwell sharwell Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 I don't like this being on the general Capacity.set path, since it could penalize users who are setting a capacity specifically because they know the exact size of the resulting collection. It would be preferable to have this only occur on the Grow(int) path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, let me move this to Grow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to move this to Grow and not modify the Capacity setter, but it ended up needing to duplicate pretty much all the logic in the Capacity setter. So, I created a helper method that both places can call into that takes in a flag to specify whether the doubling is desired for the segment array.

if (newSegmentCount > segments.Length)
{
// Grow the array of segments, if necessary. If isExactCapacity is false, then this array may end up
// having null entries at it's end when this method completes. Minimally doubling the outer array size
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having null entries at it's end when this method completes

@sharwell

As you mentioned offline, this could be problematic as various code assumes that entries in the array aren't null. Still looking into this, but to me it seems like it may require changing SegmentedArray to indicate that it's inner arrays can be null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made changes to SegmentedArray _items array to explicitly acknowledge that it may now contain nulls. Commented the _items declaration to indicate which parts of it may be null. By making this nullable, it catches all the locations that I missed that could have null ref'ed with this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the benchmark numbers with the changes from commit 4. Not a big difference from the numbers taken without the nullability changes, still significantly better in the measured scenarios than without this PR.

ToddGrun added a commit to ToddGrun/roslyn that referenced this pull request Nov 5, 2024
A growth rate of 2x matches the current code behavior. A growth rate of just over 1 matches the growth rate (1 segment) in the other PR (dotnet#75708).

All growth rates benchmarked: 1.000001, 1.1, 1.25, 1.5, 2

Obviously, the single segment growth rate is a non-starter without allowing null segments (which is the approach the other PR took).

The 2x rate matches current behavior, and is really only measured as a baseline. From my reading of this chart, it looks like 1.1 is the best of these choices.
@ToddGrun
Copy link
Contributor Author

Closed out in favor of #75756, which was probably slightly less efficient, but didn't require adding support to SegmentedArray to support null arrays in the outer array.

@ToddGrun ToddGrun closed this Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants