Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regressions in System.Xml.Linq.Perf_XName (FullPGO) #64626

Closed
performanceautofiler bot opened this issue Feb 1, 2022 · 29 comments
Closed

Regressions in System.Xml.Linq.Perf_XName (FullPGO) #64626

performanceautofiler bot opened this issue Feb 1, 2022 · 29 comments
Assignees
Labels
arch-x64 area-GC-coreclr blocking-release runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@performanceautofiler
Copy link

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline 215c328e039b3a6b6bfdc6514253307105f25816
Compare ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation 19.80 ns 24.26 ns 1.22 0.18 False

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added CoreClr untriaged New issue has not been triaged by the area owner labels Feb 1, 2022
@EgorBo EgorBo changed the title [Perf] Changes at 1/31/2022 4:53:41 PM Regressions in System.Xml.Linq.Perf_XName (FullPGO) Feb 1, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Security untriaged New issue has not been triaged by the area owner labels Feb 1, 2022
@ghost
Copy link

ghost commented Feb 1, 2022

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline 215c328e039b3a6b6bfdc6514253307105f25816
Compare ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation 19.80 ns 24.26 ns 1.22 0.18 False

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: -
Labels:

area-System.Security, untriaged

Milestone: -

@EgorBo EgorBo transferred this issue from dotnet/perf-autofiling-issues Feb 1, 2022
@EgorBo EgorBo added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Feb 1, 2022
@ghost
Copy link

ghost commented Feb 1, 2022

Tagging subscribers to this area: @dotnet/area-system-xml
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline 215c328e039b3a6b6bfdc6514253307105f25816
Compare ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation 19.80 ns 24.26 ns 1.22 0.18 False

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: -
Labels:

area-System.Xml, tenet-performance, tenet-performance-benchmarks, untriaged, refs/heads/main, x64, Regression, RunKind=micro, CoreClr, Windows 10.0.18362, PGOType=fullpgo

Milestone: -

@krwq
Copy link
Member

krwq commented Feb 17, 2022

@adamsitnik this seems a bit like a noise - looking at graph my gut tells me this is well within stdev, any clues?

@adamsitnik
Copy link
Member

@krwq that was my initial impression as well, but then I've clicked on "Toggle annotations" (1), the blue line (2) and moving average (3) and it's rather an actual regression:

image

@jeffhandley jeffhandley added runtime-coreclr specific to the CoreCLR runtime and removed CoreClr labels Mar 3, 2022
@jkotas jkotas added arch-x64 and removed x64 labels Mar 14, 2022
@krwq krwq added this to the 7.0.0 milestone Mar 24, 2022
@krwq krwq removed the untriaged New issue has not been triaged by the area owner label Mar 24, 2022
@eiriktsarpalis
Copy link
Member

Performance for the particular benchmarks appears to have regressed substantially over the recent months:

newplot

I don't believe we've made any change to Xml over that time, codegen related perhaps?

@krwq krwq self-assigned this Aug 11, 2022
@JulieLeeMSFT
Copy link
Member

@AndyAyersMS PTAL.
Cc @dotnet/jit-contrib @mangod9.

@EgorBo
Copy link
Member

EgorBo commented Aug 23, 2022

Regression in FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

@JulieLeeMSFT
Copy link
Member

FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

cc @Maoni0.

@AndyAyersMS
Copy link
Member

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Aug 29, 2022

Also we haven't historically run triage on full PGO results; I'm a bit surprised we even opened this issue. If we look at perf of this test with the default settings we see

(default)
newplot - 2022-08-29T112657 971

For purposes of comparison:

(full pgo)

newplot - 2022-08-29T113211 040

Just for completeness here's the other bit of data we have:

(dynamic pgo)
newplot - 2022-08-29T113445 526

The regressions in the PGO modes on Jan 31 (which inspired this issue) seem to well correlated with #64521. But clearly since then all modes have regressed.

The most troubling of these is the recent regression in default perf:
newplot - 2022-08-29T114443 411

We have a reporting outage around the first and largest regression. Commit range there has 66 commits: bceefa9...e117479. I think @DrewScoggins is trying to backfill data for that gap.

Source of the second regression is unclear; implicated commit range is 9814b1b...e3b2af3 which is 16 commits.

For PGO there have been several regressions over the past months. It will take some time to sort through them all.

@AndyAyersMS
Copy link
Member

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

@krwq any info you can add here? If not, will start looking at this more closely tomorrow.

@krwq
Copy link
Member

krwq commented Sep 1, 2022

@AndyAyersMS yes, they measured exactly same amount of work. Unfortunately I don't have more info I could add here. I only observed newer code being consistently slower on my machine which is inline with what the issue description is showing and seems the only thing the XML code is doing is string concatenation and given PerfView also showed string related stuff as being slower it seems root cause is somewhere in coreclr but I don't know that code enough to pinpoint anything in particular.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Sep 7, 2022

Looking at default behavior for now (that is, product defaults, not Full PGO), I see 10% regressions vs .NET 6:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22425.9
[Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2
Job-LJOSDL : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2
Job-UKYGTY : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
Job-HTVXBH : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 InvocationCount=9983200 IterationCount=25
IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True
MinIterationCount=15 WarmupCount=1

Method Job Runtime Toolchain Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Allocated Alloc Ratio
NonEmptyNameSpaceToString Job-LJOSDL .NET 5.0 net5.0 20.14 ns 0.253 ns 0.338 ns 20.04 ns 19.51 ns 20.77 ns 1.07 0.03 0.0127 80 B 1.00
NonEmptyNameSpaceToString Job-UKYGTY .NET 6.0 net6.0 18.92 ns 0.336 ns 0.448 ns 18.81 ns 18.34 ns 20.02 ns 1.00 0.00 0.0127 80 B 1.00
NonEmptyNameSpaceToString Job-HTVXBH .NET 7.0 net7.0 21.05 ns 0.240 ns 0.320 ns 21.01 ns 20.74 ns 22.34 ns 1.11 0.03 0.0127 80 B 1.00

Comparing filtered BDN profiles hasn't been illuminating yet, though they are similar to the ones from @krwq above in that the native part of the allocation seems slower.

Per discussion in #74771 the .NET 7 regression seems greatly reduced if we don't use region-based GC:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22425.9
[Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2
Job-PYJGQF : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0
Toolchain=net7.0 InvocationCount=9983200 IterationCount=25
IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True
MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.93 ns 0.470 ns 0.627 ns 19.84 ns 19.10 ns 22.15 ns 0.0127 80 B

and a similar with DOTNET_GCWriteBarrier=3

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22425.9
[Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2
Job-HBMGZU : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCWriteBarrier=3 PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0
Toolchain=net7.0 InvocationCount=9983200 IterationCount=25
IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True
MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 20.04 ns 0.630 ns 0.842 ns 19.96 ns 18.94 ns 23.21 ns 0.0127 80 B

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Sep 9, 2022

Moving this to area-gc since it seems to be related to recent gc changes.

Again, this is mainly about the non-PGO behavior.

@ghost
Copy link

ghost commented Sep 9, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline 215c328e039b3a6b6bfdc6514253307105f25816
Compare ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation 19.80 ns 24.26 ns 1.22 0.18 False

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: AndyAyersMS
Labels:

blocking-release, tenet-performance, tenet-performance-benchmarks, arch-x64, area-CodeGen-coreclr, area-GC-coreclr, runtime-coreclr, refs/heads/main, RunKind=micro, Windows 10.0.18362, PGOType=fullpgo

Milestone: 7.0.0

@AndyAyersMS AndyAyersMS removed the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 9, 2022
@JulieLeeMSFT
Copy link
Member

@AndyAyersMS, are you still looking into this? Should we assign a different owner in GC team?

@mrsharm
Copy link
Member

mrsharm commented Sep 12, 2022

I'll take over from here since this is prospectively a GC issue. Investigation in Progress.

@mrsharm mrsharm assigned mrsharm and unassigned AndyAyersMS Sep 12, 2022
@mrsharm
Copy link
Member

mrsharm commented Sep 13, 2022

Data

Without GC Normalization

There seems to be a regression within a 5% margin:

Benchmark Name Baseline Comparand Baseline Mean Duration Comparand Mean Duration Δ Mean Duration Δ% Mean Duration Baseline number of iterations Comparand number of iterations Δ number of iterations Δ% number of iterations Baseline gc count Comparand gc count Δ gc count Δ% gc count Baseline median Comparand median Δ median Δ% median Baseline non induced gc count Comparand non induced gc count Δ non induced gc count Δ% non induced gc count Baseline total allocated (mb) Comparand total allocated (mb) Δ total allocated (mb) Δ% total allocated (mb) Baseline total pause time (msec) Comparand total pause time (msec) Δ total pause time (msec) Δ% total pause time (msec) Baseline gc pause time % Comparand gc pause time % Δ gc pause time % Δ% gc pause time % Baseline JIT_WriteBarrier Comparand JIT_WriteBarrier Δ JIT_WriteBarrier Δ% JIT_WriteBarrier Baseline JIT_ByRefWriteBarrier Comparand JIT_ByRefWriteBarrier Δ JIT_ByRefWriteBarrier Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments writebarrier_3 19.37 19.85 0.48 2.48 18 15 -3 -16.67 2612 2497 -115 -4.4 19.27 19.62 0.35 1.8 2376 2257 -119 -5.01 25090.88 23737.18 -1353.7 -5.4 11.29 11.26 -0.03 -0.3 2.4 2.5 0.11 4.42 0 344 344 0 0 0 NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions 19.37 19.46 0.09 0.44 18 15 -3 -16.67 2612 2530 -82 -3.14 19.27 19.27 -0.01 -0.03 2376 2290 -86 -3.62 25090.88 24207.49 -883.39 -3.52 11.29 11.41 0.12 1.07 2.4 2.58 0.19 7.74 0 0 0 NaN 0 0 0 NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-FQYTMY : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.37 ns 0.438 ns 0.469 ns 19.27 ns 18.87 ns 20.32 ns 0.0076 80 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-HLWNHM : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.46 ns 0.352 ns 0.329 ns 19.27 ns 19.09 ns 19.97 ns 0.0076 80 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-QBQSZP : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.85 ns 0.429 ns 0.401 ns 19.62 ns 19.45 ns 20.66 ns 0.0076 80 B

With GC Normalization

Benchmark Name Baseline Comparand Baseline Mean Duration Comparand Mean Duration Δ Mean Duration Δ% Mean Duration Baseline number of iterations Comparand number of iterations Δ number of iterations Δ% number of iterations Baseline gc count Comparand gc count Δ gc count Δ% gc count Baseline median Comparand median Δ median Δ% median Baseline non induced gc count Comparand non induced gc count Δ non induced gc count Δ% non induced gc count Baseline total allocated (mb) Comparand total allocated (mb) Δ total allocated (mb) Δ% total allocated (mb) Baseline total pause time (msec) Comparand total pause time (msec) Δ total pause time (msec) Δ% total pause time (msec) Baseline gc pause time % Comparand gc pause time % Δ gc pause time % Δ% gc pause time % Baseline JIT_WriteBarrier Comparand JIT_WriteBarrier Δ JIT_WriteBarrier Δ% JIT_WriteBarrier Baseline JIT_ByRefWriteBarrier Comparand JIT_ByRefWriteBarrier Δ JIT_ByRefWriteBarrier Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions 108.59 322.66 214.06 197.12 128 128 0 0 4 4 0 0 100 0 -100 -100 0 0 0 NaN 1.89 1.88 -0 -0.23 1.52 1.52 0.01 0.39 1.89 1.83 -0.06 -3.41 0 0 0 NaN 0 0 0 NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments writebarrier_3 108.59 178.52 69.92 64.39 128 128 0 0 4 4 0 0 100 50 -50 -50 0 0 0 NaN 1.89 1.88 -0.01 -0.34 1.52 1.52 0.01 0.41 1.89 2.02 0.13 7.11 0 0 0 NaN 0 0 0 NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-QRXOHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method Mean Error StdDev StdErr Median Min Max Q1 Q3 Op/s Allocated
NonEmptyNameSpaceToString 108.6 ns 45.77 ns 153.7 ns 13.59 ns 100.0 ns 0.0 ns 1,000.0 ns 100.0 ns 100.0 ns 9,208,633.1 488 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-VEMLHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove PowerPlanMode=00000000-0000-0000-0000-000000000000 Force=False
InvocationCount=1 IterationCount=128 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1

Method Mean Error StdDev StdErr Median Min Max Q1 Q3 Op/s Allocated
NonEmptyNameSpaceToString 322.7 ns 431.3 ns 1,448.5 ns 128.0 ns 0.0 ns 0.0 ns 14,300.0 ns 0.0 ns 0.0 ns 3,099,273.6 488 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-LKBXKI : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method Mean Error StdDev StdErr Median Min Max Q1 Q3 Op/s Allocated
NonEmptyNameSpaceToString 178.5 ns 155.5 ns 522.3 ns 46.17 ns 50.00 ns 0.0 ns 2,950.0 ns 50.00 ns 50.00 ns 5,601,750.5 488 B

@mrsharm
Copy link
Member

mrsharm commented Sep 23, 2022

We have been working through root-causing and fixing this issue and here are our conclusions:

  1. We root caused this issue to the introduction of a higher number and size of decommits for regions (enabled in 7.0) as opposed to segments (default in 6.0).
  2. We are working on a fix: Gradual decommit in wks #73620 that improves the numbers for these benchmarks, however, even with this fix we are not at the same level as segments in terms of mean execution time, number and size of virtual commits and decommits as before and this is something we plan to further address.
  3. We think the next steps here are to merge the fix that improves the regression and work on further improvements in 8.0. More specifically, we should merge the fix mentioned above and move the issue to 8.0 (cc: @mangod9)
  4. We found that running the benchmarks above using default parameters results in an inequitable for our GC evaluation comparison purposes where:
    • There are an unequal number of invocations per iteration resulting in a lob sided effect on the mean. Fixed by passing in a derived invocation count by doing a dry run and making use of that invocation count for all our runs.
    • Outliers are removed. Fixed by not removing outliers by setting the following in the command line args to run the microbenchmarks --outliers DontRemove.
    • There are a variable number of induced GCs resulting from the unequal number of invocations and evaluation overhead runs. Fixed by removing the evaluation overhead stage.
  5. Using a more equitable comparison repeated a number of times, we consistently observed the following:
    • The regressions for the tests for regions were never above 10%.
    • More consistent results that were less volatile.

Results

Benchmark Baseline Comparand ΔMean ΔMean %
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions 1.56 7.62
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments decommit_fix 0.97 4.75

Details

Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host]     : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
  Job-OERVVK : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  • The baseline run is based on Peter’s fix rebased on the latest main from yesterday and using the segments’ CLR GC.
  • The regions run is based on the latest main from yesterday.
  • The decommit_fix is based on Peter’s fix rebased on the latest main.
  • The commit hash for the main used: 82d76a4

Full comparison table with the virtual commit and decommit data: Compare.xlsx

Tracking these issues here: #73592

CC: @dotnet/gc

@mangod9
Copy link
Member

mangod9 commented Sep 23, 2022

Thanks @mrsharm. @adamsitnik checking that you agree with the analysis of ensuring the GC impact is equitable across runs and with that the regression is not too high?

@AndyAyersMS
Copy link
Member

I have seen this effect when comparing things too (say OSR on vs off) -- if the two variants behave differently BDN may change its measurement approach to the point where the measurement strategy becomes a contributing factor in the overall reported results.

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

@mangod9
Copy link
Member

mangod9 commented Sep 23, 2022

is this the .NET equivalent of the uncertainty principle :)

@mrsharm
Copy link
Member

mrsharm commented Sep 23, 2022

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

Definitely, that's why we kept an eye on standard error and standard error normalized by the mean as well to ensure the repeatability of these experiments:

Benchmark Name Baseline Comparand Δ standard error Δ standard error / mean
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions -0.2263 -0.0114
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments decommit_fix -0.2705 -0.0133

Getting to this low standard error was a matter of discerning a high enough invocation count described above. The result was a consistent and equitable comparison i.e., we ran this experiment for 20+ times and got similar results.

@mangod9
Copy link
Member

mangod9 commented Sep 30, 2022

Assume its ok to close this now since the normalized regression after the decommit fix is reasonable?

@mangod9 mangod9 closed this as completed Sep 30, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-x64 area-GC-coreclr blocking-release runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

No branches or pull requests