Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

Closed
performanceautofiler bot opened this issue Feb 27, 2024 · 7 comments
Closed

[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

performanceautofiler bot opened this issue Feb 27, 2024 · 7 comments
Labels
arch-x64 area-VM-coreclr os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime

Comments

@performanceautofiler
Copy link

performanceautofiler bot commented Feb 27, 2024

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline f32c428c86b4cc41e88e2e5a750c37dfb354e33a
Compare 5ef47c852ffd51aaeb52e34391fe4fed261c9f26
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Text.Tests.Perf_StringBuilder

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
87.46 μs 101.66 μs 1.16 0.00 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Tests.Perf_StringBuilder*'

Payloads

Baseline
Compare

System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arch-x64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Feb 27, 2024
@DrewScoggins DrewScoggins removed the untriaged New issue has not been triaged by the area owner label Feb 27, 2024
@DrewScoggins DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Feb 27, 2024
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 27, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 27, 2024
@DrewScoggins
Copy link
Member

DrewScoggins commented Feb 27, 2024

@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 28, 2024
@mangod9
Copy link
Member

mangod9 commented Jun 26, 2024

@EgorBo is this by design, so it could be closed?

@EgorBo
Copy link
Member

EgorBo commented Jun 26, 2024

@EgorBot -amd64 -intel -perf -commit 79dd9ba vs fab69ef --disasm

using BenchmarkDotNet.Attributes;
using System.Text;

public class MyProgram
{
    const int LOHAllocatedStringSize = 100_000;
    private string _stringLOH;
    private string _string100;
    private StringBuilder _builderSingleSegment100;
    private StringBuilder _builderSingleSegmentLOH;
    private StringBuilder _builderMultipleSegments100;
    private StringBuilder _builderMultipleSegmentsLOH;

    [GlobalSetup(Target = nameof(ToString_MultipleSegments))]
    public void Setup_ToString_MultipleSegments()
    {
        _builderMultipleSegments100 = Append_Char(100); // 16 + 32 + 48 + 96 char segments
        _builderMultipleSegmentsLOH = Append_Char(LOHAllocatedStringSize);
    }

    public StringBuilder Append_Char(int length)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < length; i++)
            builder.Append('a');
        return builder;
    }

    [Benchmark]
    [Arguments(LOHAllocatedStringSize)]
    public string ToString_MultipleSegments(int length) => (length == 100 ? _builderMultipleSegments100 : _builderMultipleSegmentsLOH).ToString();
}

@dotnet dotnet deleted a comment from EgorBot Jun 26, 2024
@dotnet dotnet deleted a comment from EgorBot Jun 26, 2024
@EgorBot
Copy link

EgorBot commented Jun 26, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-JXBPJF : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BMGTJF : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Method Toolchain length Mean Error Ratio Code Size
ToString_MultipleSegments Main 100000 72.87 μs 1.077 μs 1.00 262 B
ToString_MultipleSegments PR 100000 110.64 μs 1.693 μs 1.52 262 B

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot
Copy link

EgorBot commented Jun 26, 2024

Benchmark results on Amd
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  Job-CNQFTW : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-FMFWZL : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Method Toolchain length Mean Error Ratio Code Size
ToString_MultipleSegments Main 100000 110.0 μs 0.37 μs 1.00 262 B
ToString_MultipleSegments PR 100000 145.6 μs 0.31 μs 1.32 262 B

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@mangod9
Copy link
Member

mangod9 commented Jul 3, 2024

@EgorBo so the regression still exists?

@EgorBo
Copy link
Member

EgorBo commented Jul 3, 2024

@EgorBo so the regression still exists?

It does, but there is not much we can do here.
Previously we used to call native memmove directly (in coop mode), now we call managed SpanHelpers.Memmove instead that forwards back to native memmove (with gc transition) for large sizes, so we effectively pay for that wrapper. It only happens for large sizes.

Since there are more improvements (11) than regressions (4) and the fact that that PR:

  1. Fixed a bug on NAOT: NativeAOT: Memset/Memcpy helpers don't throw NRE #95517
  2. Made Memmove more suspension friendly: memcpy/memset helpers on CoreCLR may not be suspension-friendly #98620

we can close it.

@EgorBo EgorBo closed this as completed Jul 3, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Aug 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-x64 area-VM-coreclr os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime
Projects
None yet
Development

No branches or pull requests

6 participants