
[Perf] Windows/x64: 4 Regressions in System.Collections.CreateAddAndRemove<String> #109734

Open
performanceautofiler bot opened this issue Nov 12, 2024 · 5 comments
Assignees
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@performanceautofiler

Run Information

| Name | Value |
|---|---|
| Architecture | x64 |
| OS | Windows 10.0.22631 |
| Queue | ViperWindows |
| Baseline | 004d59ade00c9cdf929bc520ef6a950eb851578f |
| Compare | 302e0d4cf9d603fbc76e508b0b41e778c69f2186 |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Numerics.Tests.Perf_VectorConvert

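| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|---|---|---|---|---|---|---|---|---|
| `System.Numerics.Tests.Perf_VectorConvert.Convert_float_int` | 469.32 ns | 546.50 ns | 1.16 | 0.04 | False | | | |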

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_VectorConvert*'

System.Numerics.Tests.Perf_VectorConvert.Convert_float_int

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository



Regressions in System.Tests.Perf_Char

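| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|---|---|---|---|---|---|---|---|---|
| `System.Tests.Perf_Char.Char_ToLowerInvariant(input: "Hello World!")` | 5.45 ns | 7.14 ns | 1.31 | 0.10 | False | | | |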

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'

System.Tests.Perf_Char.Char_ToLowerInvariant(input: "Hello World!")

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository



Regressions in System.Memory.Span<Int32>

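| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|---|---|---|---|---|---|---|---|---|
| `System.Memory.Span<Int32>.BinarySearch(Size: 512)` | 12.56 ns | 15.31 ns | 1.22 | 0.14 | False | | | |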

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'

System.Memory.Span<Int32>.BinarySearch(Size: 512)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository



Regressions in System.Collections.CreateAddAndRemove<String>

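| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|---|---|---|---|---|---|---|---|---|
| `System.Collections.CreateAddAndRemove<String>.Queue(Size: 512)` | 3.15 μs | 3.37 μs | 1.07 | 0.02 | False | | | |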

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.CreateAddAndRemove<String>*'

System.Collections.CreateAddAndRemove<String>.Queue(Size: 512)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
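Each Repro block above runs `benchmarks_ci.py` with a single `--filter`. To rerun all four in one pass, a minimal sketch follows. It assumes the `performance` repository has been cloned into the current directory as shown, and it substitutes `sys.executable` for the Windows `py` launcher. `repro_commands` and `run_all` are hypothetical helpers, not part of the performance repo. The arguments are passed as lists, so no shell is involved and the `<`, `>` and `*` characters in the filters need no quoting.

```python
import os
import shlex
import subprocess
import sys

# Filters copied from the four Repro sections of this report.
FILTERS = [
    "System.Numerics.Tests.Perf_VectorConvert*",
    "System.Tests.Perf_Char*",
    "System.Memory.Span<Int32>*",
    "System.Collections.CreateAddAndRemove<String>*",
]


def repro_commands(framework: str = "net8.0") -> list[list[str]]:
    """Return one benchmarks_ci.py argument list per filter (hypothetical helper)."""
    script = os.path.join("performance", "scripts", "benchmarks_ci.py")
    return [[sys.executable, script, "-f", framework, "--filter", f] for f in FILTERS]


def run_all(framework: str = "net8.0") -> None:
    """Run each filter in turn from the directory that contains the clone."""
    for cmd in repro_commands(framework):
        print(shlex.join(cmd))          # show the equivalent command line
        subprocess.run(cmd, check=True)  # stop on the first failing run
```

Calling `run_all()` runs the four filters sequentially. Running them separately rather than in a single combined invocation keeps each result set matched to one Repro section.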

@performanceautofiler performanceautofiler bot added arch-x64 os-windows runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Nov 12, 2024
@LoopedBard3 LoopedBard3 transferred this issue from dotnet/perf-autofiling-issues Nov 12, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 12, 2024
@LoopedBard3 LoopedBard3 added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Nov 12, 2024
@LoopedBard3
Member

Primarily focused on the System.Collections.CreateAddAndRemove regression. Others look like noise. Likely due to #109258. FYI @saucecontrol and @AndyAyersMS.

Commit range: 1c10cee...30dabfd

@LoopedBard3 LoopedBard3 changed the title [Perf] Windows/x64: 4 Regressions on 11/5/2024 12:19:32 AM [Perf] Windows/x64: 4 Regressions in System.Collections.CreateAddAndRemove<String> Nov 12, 2024
@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 13, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@saucecontrol
Member

There are no codegen changes for that benchmark -- it's noisy in general. Testing the ends of the commit range, I see anything from +2% to -15%.

Results are bimodal a lot of the time. For example:

CreateAddAndRemove<String>.Queue: Job-MQUEJG(PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\core_root_408caa4e\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1) [Size=512]
Runtime = .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI; GC = Concurrent Workstation
Mean = 1.816 us, StdErr = 0.020 us (1.12%), N = 20, StdDev = 0.091 us
Min = 1.654 us, Q1 = 1.750 us, Median = 1.793 us, Q3 = 1.882 us, Max = 1.962 us
IQR = 0.132 us, LowerFence = 1.552 us, UpperFence = 2.080 us
ConfidenceInterval = [1.737 us; 1.895 us] (CI 99.9%), Margin = 0.079 us (4.33% of Mean)
Skewness = 0.13, Kurtosis = 1.71, MValue = 3.2
-------------------- Histogram --------------------
[1.610 us ; 1.708 us) | @
[1.708 us ; 1.796 us) | @@@@@@@@@@
[1.796 us ; 1.867 us) | @
[1.867 us ; 1.955 us) | @@@@@@@
[1.955 us ; 2.006 us) | @
---------------------------------------------------

CreateAddAndRemove<String>.Queue: Job-XSVGKI(PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\core_root_f7334fab\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1) [Size=512]
Runtime = .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI; GC = Concurrent Workstation
Mean = 1.578 us, StdErr = 0.004 us (0.25%), N = 13, StdDev = 0.014 us
Min = 1.563 us, Q1 = 1.569 us, Median = 1.573 us, Q3 = 1.580 us, Max = 1.613 us
IQR = 0.012 us, LowerFence = 1.551 us, UpperFence = 1.598 us
ConfidenceInterval = [1.561 us; 1.595 us] (CI 99.9%), Margin = 0.017 us (1.10% of Mean)
Skewness = 1.25, Kurtosis = 3.33, MValue = 2
-------------------- Histogram --------------------
[1.558 us ; 1.621 us) | @@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet v0.14.1-nightly.20240924.187, Windows 11 (10.0.26100.2314)
Unknown processor
.NET SDK 9.0.100
  [Host]     : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-MQUEJG : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-XSVGKI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-MQUEJG | \core_root_408caa4e\CoreRun.exe | 512  | 1.816 us | 0.0787 us | 0.0906 us | 1.793 us | 1.654 us | 1.962 us |  1.00 |    0.07 |   2,753 B | 1.0021 |    8.2 KB |        1.00 |
| Queue  | Job-XSVGKI | \core_root_f7334fab\CoreRun.exe | 512  | 1.578 us | 0.0173 us | 0.0145 us | 1.573 us | 1.563 us | 1.613 us |  0.87 |    0.04 |   2,714 B | 0.9973 |    8.2 KB |        1.00 |

// * Warnings *
MultimodalDistribution
  CreateAddAndRemove<String>.Queue: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\core_root_408caa4e\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1 -> It seems that the distribution can have several modes (mValue = 3.2)
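The "bimodal" observation can be checked against the numbers BenchmarkDotNet printed. A minimal sketch, assuming BenchmarkDotNet's MValue follows Brendan Gregg's mvalue definition: walk the histogram bins, with an implicit empty bin at each end, sum the absolute changes in bin height, and divide by the tallest bin. A single clean peak scores 2 (one rise and one fall), and each additional well-separated mode of similar height adds roughly 2. Applied to the `@` counts in the two histograms above, it reproduces the reported MValue = 3.2 and MValue = 2. BenchmarkDotNet's own calculator may evaluate several bin widths and keep the largest result, so treat this as an approximation, not its exact algorithm.

```python
def mvalue(bin_counts: list[int]) -> float:
    """Gregg-style mvalue: total absolute change in bin height across the
    histogram (padded with an empty bin on each side) divided by the tallest bin."""
    padded = [0, *bin_counts, 0]
    total_change = sum(abs(b - a) for a, b in zip(padded, padded[1:]))
    return total_change / max(bin_counts)


# '@' counts per bin, read off the two histograms above.
baseline_bins = [1, 10, 1, 7, 1]  # core_root_408caa4e
compare_bins = [13]               # core_root_f7334fab

print(mvalue(baseline_bins))  # 3.2
print(mvalue(compare_bins))   # 2.0
```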

@saucecontrol
Member

saucecontrol commented Nov 13, 2024

Oops, I missed that there are some small diffs, just nothing HWIntrinsics-related.

https://www.diffchecker.com/cZokjbLf/

In any case, the best times across multiple runs are the same before and after.

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-WCNCZS | \core_root_408caa4e\CoreRun.exe | 512  | 1.589 us | 0.0304 us | 0.0325 us | 1.581 us | 1.559 us | 1.665 us |  1.00 |    0.03 |   2,895 B | 0.9974 |    8.2 KB |        1.00 |
| Queue  | Job-GDAQWM | \core_root_f7334fab\CoreRun.exe | 512  | 1.571 us | 0.0111 us | 0.0093 us | 1.569 us | 1.560 us | 1.595 us |  0.99 |    0.02 |   2,856 B | 1.0021 |    8.2 KB |        1.00 |

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-UVLVSQ | \core_root_408caa4e\CoreRun.exe | 512  | 1.568 us | 0.0153 us | 0.0128 us | 1.568 us | 1.553 us | 1.590 us |  1.00 |    0.01 |   2,753 B | 1.0021 |    8.2 KB |        1.00 |
| Queue  | Job-EMTGJG | \core_root_f7334fab\CoreRun.exe | 512  | 1.607 us | 0.0486 us | 0.0541 us | 1.598 us | 1.556 us | 1.768 us |  1.02 |    0.03 |   2,714 B | 1.0022 |    8.2 KB |        1.00 |

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-BZHTXI | \core_root_408caa4e\CoreRun.exe | 512  | 1.864 us | 0.0469 us | 0.0540 us | 1.850 us | 1.745 us | 1.955 us |  1.00 |    0.04 |   2,753 B | 0.9973 |    8.2 KB |        1.00 |
| Queue  | Job-XXKNGE | \core_root_f7334fab\CoreRun.exe | 512  | 1.576 us | 0.0164 us | 0.0146 us | 1.571 us | 1.559 us | 1.605 us |  0.85 |    0.03 |   2,714 B | 1.0005 |    8.2 KB |        1.00 |
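The point that the best times match can be made concrete with the figures from the four runs posted in this thread: the first summary plus the three tables above. The sketch below uses Python only for the arithmetic. It recomputes the per-run ratios, which swing from 0.85 to 1.02 (the "+2% to -15%" range mentioned earlier). It then compares the fastest result each toolchain achieved across all runs. Taking the minimum is one common way to factor out interference on a noisy machine. It is an illustrative choice here, not a methodology stated in the thread, and the plain mean-over-mean ratio only approximates BenchmarkDotNet's Ratio column.

```python
# Mean and Min (us) for Queue(Size: 512) from the four BenchmarkDotNet runs
# posted in this thread, in the order they appear.
baseline_means = [1.816, 1.589, 1.568, 1.864]  # core_root_408caa4e
compare_means = [1.578, 1.571, 1.607, 1.576]   # core_root_f7334fab
baseline_mins = [1.654, 1.559, 1.553, 1.745]
compare_mins = [1.563, 1.560, 1.556, 1.559]

# Per-run ratio of compare to baseline means.
per_run = [round(c / b, 2) for b, c in zip(baseline_means, compare_means)]
print(per_run)  # [0.87, 0.99, 1.02, 0.85]

# Best-of-runs comparison: the fastest observation for each toolchain.
best_mean_ratio = min(compare_means) / min(baseline_means)
best_min_ratio = min(compare_mins) / min(baseline_mins)
print(f"{best_mean_ratio:.3f} {best_min_ratio:.3f}")  # 1.002 1.002
```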

@JulieLeeMSFT
Member

@AndyAyersMS, PTAL.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Nov 14, 2024
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Nov 14, 2024
@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 20, 2024
Projects
None yet
Development

No branches or pull requests

6 participants