Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List SetCount/AsSpan optimization should produce a list with same capacity as List made with new/Add #72318

Closed
RikkiGibson opened this issue Feb 28, 2024 · 4 comments · Fixed by #72427
Labels
Area-Compilers Bug Code Gen Quality Room for improvement in the quality of the compiler's generated code Feature - Collection Expressions untriaged Issues and PRs which have not yet been triaged by a lead

Comments

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Feb 28, 2024

List<string> s = ["s"];
Console.WriteLine(s.Capacity);

In absence of CollectionsMarshal.SetCount/AsSpan, the above program outputs 1, but in presence of those methods, it outputs 4.

It feels like the observable behavior of the optimized version should be the same as the non-optimized version, except if we have some really compelling reason to deviate.

Hopefully, we can simply fix by calling the 'List(int capacity)' constructor in the optimized case in addition to the non-optimized case (where we are already doing this.)

Relevant spec bits from https://github.com/dotnet/csharplang/blob/main/proposals/csharp-12.0/collection-expressions.md#known-length-translation:

  • If T supports collection initializers, then:

    • if the type T contains an accessible constructor with a single parameter int capacity, then the literal is translated as:

      T __result = new T(capacity: __len);
      __result.Add(__e1);
      foreach (var __t in __s1)
          __result.Add(__t);
      
      // further additions of the remaining elements

In other words, the default expectation of an optimization for List codegen would be that it behaves the same as the quoted codegen.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Feb 28, 2024
@RikkiGibson RikkiGibson added Feature - Collection Expressions Code Gen Quality Room for improvement in the quality of the compiler's generated code Area-Compilers Bug untriaged Issues and PRs which have not yet been triaged by a lead and removed Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Feb 28, 2024
@CyrusNajmabadi
Copy link
Member

In other words, the default expectation of an optimization for List codegen would be that it behaves the same as the quoted codegen.

Agree. Though note that we have permitted the compiler to do differently if it thinks it can do better. For example, like how we call ImmtuableCollectionMarshal.AsImmutableArray(T[]) instead of ImmutableArray.Create.

So it would be fine for the compiler to bypass the constructor if it thinks it is in a better state. That includes in ways that might be observable though things like 'Capacity' (since that's a hint, and not something we guarantee).

What's odd in this case is that the path we're taking produces what seems to be a worse result for the user (since it overallocates). So, in a sense, it's an odd-deoptimization. We def don't want those. But, to be clear, we should be willing to throw in optimizations when they going to be very good for the ecosystem, even if potentially minorly observable through things like capacity. The goal is to follow the intent and spirit of the APIs requirements here. Not to produce exactly identical semantics.

@CyrusNajmabadi
Copy link
Member

For context: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.capacity?view=net-8.0

Gets or sets the total number of elements the internal data structure can hold without resizing.

There is no requirement (or desire for requirement) at the language level that this always return the same value. The only requirement is that we produce an instance that legally follows these rules and contains the elements.

The only invariants we defined (for well behavedness) were things like: If the collection provides a Count/Length, then iterating hte elements in it shoudl produce the same number of elements, and things like that.

@RikkiGibson
Copy link
Contributor Author

RikkiGibson commented Feb 28, 2024

I agree that yes we have the right to deviate on the capacity when optimizing. But, if we do, let's make sure it's because our different behavior is part of a better result "overall", rather than being a difference which is arbitrary, occurring by accident, or, which perhaps yields a worse result "overall".

@CyrusNajmabadi
Copy link
Member

100% agree @RikkiGibson :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Bug Code Gen Quality Room for improvement in the quality of the compiler's generated code Feature - Collection Expressions untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants