Use the List(int) constructor if possible when lowering collection expressions #72427

Merged
merged 9 commits into dotnet:main from use-known-length-int-ctor
Mar 12, 2024

jbevain
Contributor

@jbevain jbevain commented Mar 6, 2024

Fixes #72318

by lowering a collection expression with a known length to:

var len = knownLength;
var list = new List<T>(len);
CollectionsMarshal.SetCount(list, len);

instead of

var list = new List<T>();
CollectionsMarshal.SetCount(list, knownLength);

This ensures that the List's capacity is exactly knownLength, rather than whatever capacity SetCount grows an empty list to.
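
To make the difference concrete, here is a minimal source-level sketch, assuming .NET 8 or later (where CollectionsMarshal.SetCount is available). The names CapacityDemo, WithoutCapacity and WithCapacity are made up for illustration. This is not the compiler's actual output; it only mirrors the two lowering shapes shown above so the resulting Capacity can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class CapacityDemo
{
    // Shape of the previous lowering: default constructor, then SetCount.
    // SetCount has to grow the empty backing array to fit knownLength.
    public static List<int> WithoutCapacity(int knownLength)
    {
        var list = new List<int>();
        CollectionsMarshal.SetCount(list, knownLength);
        return list;
    }

    // Shape of the new lowering: the length is stored once, passed to the
    // List(int) constructor, and then passed to SetCount.
    public static List<int> WithCapacity(int knownLength)
    {
        int len = knownLength;
        var list = new List<int>(len);
        CollectionsMarshal.SetCount(list, len);
        return list;
    }

    public static void Main()
    {
        foreach (int n in new[] { 0, 1, 2, 3, 4, 10 })
        {
            List<int> before = WithoutCapacity(n);
            List<int> after = WithCapacity(n);
            Console.WriteLine(
                $"N={n}: Count={after.Count}, " +
                $"Capacity without ctor arg={before.Capacity}, with ctor arg={after.Capacity}");
        }
    }
}
```

Both shapes produce a list with the same Count; only the Capacity of the backing array may differ.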

@jbevain jbevain requested a review from a team as a code owner March 6, 2024 22:57
@dotnet-issue-labeler dotnet-issue-labeler bot added the Area-Compilers and untriaged (Issues and PRs which have not yet been triaged by a lead) labels Mar 6, 2024
@jbevain jbevain changed the title from "Use the known length int constructor if available when lowering collection expressions" to "Use the List(int) constructor if available when lowering collection expressions" Mar 6, 2024
@jbevain jbevain changed the title from "Use the List(int) constructor if available when lowering collection expressions" to "Use the List(int) constructor if possible when lowering collection expressions" Mar 6, 2024
@RikkiGibson
Contributor

RikkiGibson commented Mar 6, 2024

As far as I know, SetCount is required for correctness: without it, the Span we get from AsSpan would not have the correct length, and so on. In other words, if you want to adjust the codegen here, you should use the List(int) constructor while leaving the call to SetCount unchanged.

Also, some microbenchmark to demonstrate the change in perf characteristics may be warranted here, a la #71195 (comment)

@jbevain
Contributor Author

jbevain commented Mar 7, 2024

@RikkiGibson ah yep, I get it now. On a second pass I've reintroduced the call to SetCount, but now the IL duplicates the computation of the count for the List constructor and for SetCount. I guess I'll need to store that in a variable. Is there an easy way at this stage to compute the count and duplicate it on the stack so that we can call List.ctor and SetCount?

@RikkiGibson
Contributor

> @RikkiGibson ah yep, I get it now. On a second pass I've reintroduced the call to SetCount, but now the IL duplicates the computation of the count for the List constructor and for SetCount. I guess I'll need to store that in a variable. Is there an easy way at this stage to compute the count and duplicate it on the stack so that we can call List.ctor and SetCount?

Yeah there should be some examples of this in nearby lowering code. Look for existing usages of StoreToTemp.
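
At the source level, the concern is the difference between evaluating the length expression twice and evaluating it once into a temporary. The following is a minimal sketch of that observable difference, assuming .NET 8 or later for CollectionsMarshal.SetCount. Source, LengthReads, TempDemo, EvaluatedTwice and EvaluatedOnce are hypothetical names; the sketch does not use Roslyn's StoreToTemp helper and says nothing about how the lowering code is actually written.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical source whose Length getter has an observable side effect:
// it counts how many times it has been read.
public sealed class Source
{
    private readonly int _length;

    public Source(int length) => _length = length;

    public int LengthReads { get; private set; }

    public int Length
    {
        get { LengthReads++; return _length; }
    }
}

public static class TempDemo
{
    // The length expression is evaluated once for the constructor
    // and a second time for SetCount.
    public static List<int> EvaluatedTwice(Source s)
    {
        var list = new List<int>(s.Length);
        CollectionsMarshal.SetCount(list, s.Length);
        return list;
    }

    // The length is stored in a temporary and reused for both calls.
    public static List<int> EvaluatedOnce(Source s)
    {
        int len = s.Length;
        var list = new List<int>(len);
        CollectionsMarshal.SetCount(list, len);
        return list;
    }

    public static void Main()
    {
        var a = new Source(2);
        EvaluatedTwice(a);
        var b = new Source(2);
        EvaluatedOnce(b);
        Console.WriteLine($"reads without temp: {a.LengthReads}, with temp: {b.LengthReads}");
    }
}
```

If the length getter has side effects, only the second shape keeps them from being observed more than once.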

@jbevain jbevain force-pushed the use-known-length-int-ctor branch from 05c5f8e to 96e6511 Compare March 7, 2024 01:39
@jbevain
Contributor Author

jbevain commented Mar 7, 2024

Microbenchmark:

| Method                 | N     | Mean       | Error     | StdDev    | Gen0   | Gen1   | Allocated |
|----------------------- |------ |-----------:|----------:|----------:|-------:|-------:|----------:|
| CreateListNoCapacity   | 0     |   3.101 ns | 0.0212 ns | 0.0188 ns | 0.0017 |      - |      32 B |
| CreateListWithCapacity | 0     |   3.186 ns | 0.0340 ns | 0.0318 ns | 0.0017 |      - |      32 B |
| CreateListNoCapacity   | 1     |   4.703 ns | 0.0547 ns | 0.0512 ns | 0.0038 |      - |      72 B |
| CreateListWithCapacity | 1     |   4.810 ns | 0.0513 ns | 0.0480 ns | 0.0034 |      - |      64 B |
| CreateListNoCapacity   | 2     |   4.670 ns | 0.0329 ns | 0.0308 ns | 0.0038 |      - |      72 B |
| CreateListWithCapacity | 2     |   4.812 ns | 0.0272 ns | 0.0241 ns | 0.0034 |      - |      64 B |
| CreateListNoCapacity   | 3     |   4.649 ns | 0.0347 ns | 0.0324 ns | 0.0038 |      - |      72 B |
| CreateListWithCapacity | 3     |   4.942 ns | 0.0181 ns | 0.0169 ns | 0.0038 |      - |      72 B |
| CreateListNoCapacity   | 4     |   5.072 ns | 0.0281 ns | 0.0263 ns | 0.0038 |      - |      72 B |
| CreateListWithCapacity | 4     |   4.929 ns | 0.0383 ns | 0.0358 ns | 0.0038 |      - |      72 B |
| CreateListNoCapacity   | 10    |   5.394 ns | 0.0272 ns | 0.0254 ns | 0.0051 |      - |      96 B |
| CreateListWithCapacity | 10    |   5.411 ns | 0.0162 ns | 0.0143 ns | 0.0051 |      - |      96 B |
| CreateListNoCapacity   | 10000 | 767.173 ns | 6.4844 ns | 6.0655 ns | 2.1229 | 0.2651 |   40056 B |
| CreateListWithCapacity | 10000 | 761.286 ns | 5.3485 ns | 5.0030 ns | 2.1229 | 0.2651 |   40056 B |
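
using System.Collections.Generic;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]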
public class ListInitBenchmark
{
    [Params(0, 1, 2, 3, 4, 10, 10000)]
    public int N;

    [Benchmark]
    public List<int> CreateListNoCapacity()
    {
        List<int> list = new();
        CollectionsMarshal.SetCount(list, N);
        return list;
    }

    [Benchmark]
    public List<int> CreateListWithCapacity()
    {
        var knownLength = N;
        List<int> list = new(capacity: knownLength);
        CollectionsMarshal.SetCount(list, knownLength);
        return list;
    }
}

@cston
Member

cston commented Mar 7, 2024

Let's include a test where the length calculation includes side-effects, so it's clear we're only calculating the length once. Perhaps:

[CombinatorialData]
[Theory]
public void LengthWithSideEffects(
    [CombinatorialValues(TargetFramework.Net70, TargetFramework.Net80)]
    TargetFramework targetFramework)
{
    string source = """
        using System;
        using System.Collections;
        using System.Collections.Generic;
        class MyCollection<T> : IEnumerable<T>
        {
            private List<T> _list = new();
            public int Length
            {
                get { Console.Write("Length: {0}, ", _list.Count); return _list.Count; }
            }
            public void Add(T t) { _list.Add(t); }
            IEnumerator<T> IEnumerable<T>.GetEnumerator() => _list.GetEnumerator();
            IEnumerator IEnumerable.GetEnumerator() => _list.GetEnumerator();
        }
        class Program
        {
            static void Main()
            {
                MyCollection<int> x = [1, 2];
                MyCollection<object> y = [3];
                List<object> z = [..x, ..y];
            }
        }
        """;
    CompileAndVerify(
        source,
        targetFramework: targetFramework,
        verify: Verification.Skipped,
        expectedOutput: IncludeExpectedOutput("Length: 2, Length: 1, "));
}

Refers to: src/Compilers/CSharp/Test/Emit2/Semantics/CollectionExpressionTests.cs:32933 in cdd8f75.

@cston
Member

cston commented Mar 7, 2024

We should test with and without the CollectionsMarshal optimizations. I've updated the test example above.


In reply to: 1984112547


Refers to: src/Compilers/CSharp/Test/Emit2/Semantics/CollectionExpressionTests.cs:32933 in cdd8f75.

@RikkiGibson
Contributor

RikkiGibson commented Mar 7, 2024

> Microbenchmark:

Allocs are only reduced in the N = 1 or N = 2 case? Is that expected?

Apologies to belabor things but perhaps we should also test a case where collection creation is followed up with some explicit Adds.

@cston
Member

cston commented Mar 7, 2024

    public void CreatingNewListFromLengthWithSideEffects([CombinatorialValues(TargetFramework.Net70, TargetFramework.Net80)], TargetFramework targetFramework)

Typo?


Refers to: src/Compilers/CSharp/Test/Emit2/Semantics/CollectionExpressionTests.cs:33652 in 18b3599.

@jbevain
Contributor Author

jbevain commented Mar 7, 2024

> Allocs are only reduced in the N = 1 or N = 2 case? Is that expected?

Looking at the interactions between SetCount and List.Grow, they should be reduced for n > 0 && n < List.DefaultCapacity, so yeah, 1, 2, 3. It's not obvious with the List<int> case probably because of alignment. For List<object>:

| Method                 | N     | Mean         | Error     | StdDev    | Gen0   | Gen1   | Allocated |
|----------------------- |------ |-------------:|----------:|----------:|-------:|-------:|----------:|
| CreateListNoCapacity   | 0     |     6.040 ns | 0.1560 ns | 0.4477 ns | 0.0017 |      - |      32 B |
| CreateListWithCapacity | 0     |     4.951 ns | 0.0807 ns | 0.1666 ns | 0.0017 |      - |      32 B |
| CreateListNoCapacity   | 1     |     6.003 ns | 0.0440 ns | 0.0390 ns | 0.0047 |      - |      88 B |
| CreateListWithCapacity | 1     |     5.632 ns | 0.1210 ns | 0.1132 ns | 0.0034 |      - |      64 B |
| CreateListNoCapacity   | 2     |     6.162 ns | 0.0543 ns | 0.0508 ns | 0.0047 |      - |      88 B |
| CreateListWithCapacity | 2     |     5.717 ns | 0.1423 ns | 0.1397 ns | 0.0038 |      - |      72 B |
| CreateListNoCapacity   | 3     |     6.079 ns | 0.0584 ns | 0.0518 ns | 0.0047 |      - |      88 B |
| CreateListWithCapacity | 3     |     5.996 ns | 0.0695 ns | 0.0650 ns | 0.0042 |      - |      80 B |
| CreateListNoCapacity   | 4     |     6.152 ns | 0.1421 ns | 0.1396 ns | 0.0047 |      - |      88 B |
| CreateListWithCapacity | 4     |     5.941 ns | 0.0761 ns | 0.0712 ns | 0.0047 |      - |      88 B |
| CreateListNoCapacity   | 10    |     6.869 ns | 0.1083 ns | 0.0904 ns | 0.0072 |      - |     136 B |
| CreateListWithCapacity | 10    |     6.772 ns | 0.0577 ns | 0.0540 ns | 0.0072 |      - |     136 B |
| CreateListNoCapacity   | 10000 | 1,574.879 ns | 7.2842 ns | 6.8136 ns | 4.2362 | 0.8469 |   80056 B |
| CreateListWithCapacity | 10000 | 1,582.583 ns | 8.5771 ns | 8.0230 ns | 4.2362 | 0.8469 |   80056 B |

This all started for us because we depend on a library that (over)uses List, and in many cases, we need List instances of only 1 element.
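
The n > 0 && n < List.DefaultCapacity range can be illustrated with a simplified model of the growth rule. This is a sketch under stated assumptions, not the runtime's code: it assumes a default capacity of 4 and a doubling policy that never ends up below the requested count. GrowthModel and CapacityAfterSetCount are hypothetical names, not runtime APIs.

```csharp
using System;

public static class GrowthModel
{
    // Assumption: the capacity an empty list grows to first.
    private const int DefaultCapacity = 4;

    // Models the capacity of a list that starts with initialCapacity
    // and is then resized to hold count elements.
    public static int CapacityAfterSetCount(int initialCapacity, int count)
    {
        // No growth is needed when the backing array is already large enough.
        if (count <= initialCapacity)
        {
            return initialCapacity;
        }

        // Assumed policy: start at DefaultCapacity or double the current
        // capacity, but never end up below the requested count.
        int grown = initialCapacity == 0 ? DefaultCapacity : 2 * initialCapacity;
        return Math.Max(grown, count);
    }

    public static void Main()
    {
        foreach (int n in new[] { 0, 1, 2, 3, 4, 10, 10000 })
        {
            int withoutCtorArg = CapacityAfterSetCount(0, n); // new List<T>() then SetCount
            int withCtorArg = CapacityAfterSetCount(n, n);    // new List<T>(n) then SetCount
            Console.WriteLine($"N={n}: {withoutCtorArg} vs {withCtorArg}");
        }
    }
}
```

Under this model the two capacities differ only for N = 1, 2 and 3, which is consistent with the Allocated column in the tables above.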

@RikkiGibson
Contributor

It makes sense to me that some use cases depend heavily on numerous small Lists, and it feels reasonable to me for us to make an adjustment to this codegen in order to save the memory for those cases and increase uniformity of the observable behavior with the non-optimized case.

I'll have to review the codegen changes in the tests in this PR to really feel 100% sure though.

Thanks for taking a stab at this work :)

@RikkiGibson RikkiGibson self-assigned this Mar 7, 2024
@jbevain
Contributor Author

jbevain commented Mar 11, 2024

@RikkiGibson I think the last 2 commits address both of your comments. There are tests in place that validate the different scenarios:

  • With or without the CollectionsMarshal methods
  • With or without List.ctor(int32)
  • Not using optimizations but there's a known length so we're using the List.ctor(int32) without the optimizations (and without the temporary variable).

@cston cston merged commit af15182 into dotnet:main Mar 12, 2024
24 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Mar 12, 2024
@jbevain jbevain deleted the use-known-length-int-ctor branch March 12, 2024 17:22
@cston
Member

cston commented Mar 12, 2024

Thanks @jbevain!

Development

Successfully merging this pull request may close these issues.

List SetCount/AsSpan optimization should produce a list with same capacity as List made with new/Add
3 participants