Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the exact initial capacity of a List literal #99073

Closed
wants to merge 1 commit into from

Conversation

jbevain
Copy link
Contributor

@jbevain jbevain commented Feb 28, 2024

In .NET 7:

List<string> s = ["bonjour"];
Console.WriteLine(s.Capacity);

Will print 1, as it will compile to a call to the List(int) constructor which sets the exact initial capacity, then to a List.Add.

After .NET 8,

This code prints 4 as CollectionsMarshal.SetCount(List,int) calls List.Grow.

It seems more optimal to create a List with an exact initial capacity. This PR is not be ready as there's a hundred scenarios I didn't consider and no test at the moment :)

Alternatively, this PR could be scrubbed and the C# compiler could also call the List(int) ctor.

@ghost
Copy link

ghost commented Feb 28, 2024

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

Issue Details

In .NET 7:

List<string> s = ["bonjour"];
Console.WriteLine(s.Capacity);

Will print 1, as it will compile to a call to the List(int) constructor which sets the exact initial capacity, then to a List.Add.

After .NET 8,

This code prints 4 as CollectionsMarshal.SetCount(List,int) calls List.Grow.

This PR restores the previous behavior for this scenario.

Author: jbevain
Assignees: -
Labels:

area-System.Runtime.InteropServices

Milestone: -

@AaronRobinsonMSFT
Copy link
Member

Thanks @jbevain. This seems like a reasonable discussion point. Let me loop in @stephentoub who likely will have a more informed opinion about the intents of the langauge designers and is always looking for excess memory usage.

@AaronRobinsonMSFT AaronRobinsonMSFT modified the milestones: 9.0.0, Future Feb 28, 2024
@jbevain
Copy link
Contributor Author

jbevain commented Feb 28, 2024

Related: dotnet/roslyn#72318

@stephentoub
Copy link
Member

stephentoub commented Feb 28, 2024

For reference, here's what I wrote in an email earlier today when asked in a related conversation why SetCount behaves the way it does:

Because count != capacity, and for the same reason that new List() { item } yields a count of 1 and a capacity of 4. No one has specified a minimum capacity the list must be grow to initially, and in the absence of that, it follows a normal doubling scheme, starting at 4. It has no idea if you'll be adding more after this.

We discussed whether collection literals should set the capacity, and it comes down to how likely it will be that someone will add more after construction: if no, set capacity, if yes, don't. But the compiler can't always predict that (though it might be able to "see" it in some cases).

@AaronRobinsonMSFT
Copy link
Member

here's what I wrote in an email earlier today

@stephentoub Based on that snippet, I interepret your opinion to mean the compiler should emit the lower allocation pattern if it can be detected rather than changing the API. Is that fair?

@stephentoub
Copy link
Member

stephentoub commented Feb 29, 2024

here's what I wrote in an email earlier today

@stephentoub Based on that snippet, I interepret your opinion to mean the compiler should emit the lower allocation pattern if it can be detected rather than changing the API. Is that fair?

It's more that the behavior for both SetCount and collection literals is as was previously discussed and designed. We could of course choose to revisit.

I still think the behavior for SetCount is correct. The knobs exist on the API surface to separately control the count and capacity, so I don't love the idea of SetCount employing a different heuristic for growth than the rest of the API surface area. If you want the capacity to exactly match the count, you can do either:

var list = new List<T>(count);
CollectionsMarshal.SetCount(list, count);

or:

list.Capacity = count;
CollectionsMarshal.SetCount(list, count);

If you don't choose a specific capacity, then I think having SetCount employ the same growth scheme as the rest of the List<T> surface area is right from the perspective of consistency.

That's separate from how the language/compiler choose to lower collection literals. Because the compiler wouldn't always be able to predict whether a developer was going to add more to the list after it was initialized, we'd decided to just do the simple thing and use list's default growth scheme by not having the compiler set the capacity. If we now believe the 99% case is that someone using a collection literal to construct a List<T> won't ever append to the list after it's initialized, then it would likely make sense to change the lowering to set a capacity equal to the count. If not, what's currently there may make the most sense. And if the latter, and extra care wanted to be paid to analysis to determine whether the list was subsequently updated (e.g. the compiler can see it's a local being initialized, that local never escapes, and no methods are ever called on that local that could increase its size), then the compiler could also choose to special-case that. In any case, that's a question for Roslyn and dotnet/roslyn#72318.

Just for completeness why this matters, if you do:

var list = new List<T>(1) { item };
list.Add(item);
list.Add(item);

that's going to allocate a T[] of length 1, a T[] of length 2, and a T[] of length 4. If you instead do:

var list = new List<T>() { item };
list.Add(item);
list.Add(item);

that's only going to allocate a T[] of length 4. So from a microoptimization perspective, whether additional adds will be performed is relevant to whether setting the capacity is an optimization or deoptimization.

@AaronRobinsonMSFT
Copy link
Member

@jbevain Given Steve's commentary above and the discussion in dotnet/roslyn#72318, I'm going to convert this PR to Draft for now.

@AaronRobinsonMSFT AaronRobinsonMSFT marked this pull request as draft February 29, 2024 16:57
@jbevain jbevain closed this Mar 7, 2024
@jbevain jbevain deleted the list-literal-initial-capacity branch March 7, 2024 23:55
@github-actions github-actions bot locked and limited conversation to collaborators Apr 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants