[Perf] Linux/arm64: 4 Regressions on 4/8/2024 7:16:22 PM #100922

Open
performanceautofiler bot opened this issue Apr 11, 2024 · 7 comments
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI in-pr There is an active PR which will close this issue when it is merged os-linux Linux OS (any supported distro) Priority:2 Work that is important, but not critical for the release runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments


performanceautofiler bot commented Apr 11, 2024

Run Information

| Name | Value |
| --- | --- |
| Architecture | arm64 |
| OS | ubuntu 22.04 |
| Queue | AmpereUbuntu |
| Baseline | 230dc86e9d92fbf191bf3b45b3f1b656f83d4426 |
| Compare | 404b286b23093cd93a985791934756f64a33483e |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Text.Tests.Perf_Encoding

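| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 109.71 ns | 128.17 ns | 1.17 | 0.03 | False |  |  |  |
|  | 112.29 ns | 121.25 ns | 1.08 | 0.04 | False |  |  |  |
|  | 156.63 ns | 170.52 ns | 1.09 | 0.01 | False |  |  |  |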


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Tests.Perf_Encoding*'

System.Text.Tests.Perf_Encoding.GetString(size: 512, encName: "ascii")
System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii")
System.Text.Tests.Perf_Encoding.GetString(size: 512, encName: "utf-8")

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Regressions in System.Text.Perf_Ascii

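| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| System.Text.Perf_Ascii.ToUtf16(Size: 128) | 15.27 ns | 17.98 ns | 1.18 | 0.32 | False |  |  |  |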


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Perf_Ascii*'

System.Text.Perf_Ascii.ToUtf16(Size: 128)



@performanceautofiler performanceautofiler bot added arch-arm64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Apr 11, 2024
@DrewScoggins DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Apr 11, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 11, 2024
@DrewScoggins DrewScoggins added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 11, 2024
@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 12, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Member

LoopedBard3 commented Apr 16, 2024

Related regressions:
Windows x64: dotnet/perf-autofiling-issues#32733
Linux x64: dotnet/perf-autofiling-issues#32721

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Apr 19, 2024
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 19, 2024
@dotnet dotnet deleted a comment from EgorBot May 25, 2024
@dotnet dotnet deleted a comment from EgorBot May 26, 2024
Member

EgorBo commented May 26, 2024

@EgorBot -arm64 -perf -commit bc2bd2b vs 8e2655b --disasm

using BenchmarkDotNet.Attributes;
using System.Buffers;
using System.Linq;
using System.Text;

BenchmarkDotNet.Running.BenchmarkRunner.Run<Perf_Ascii>(args: args);

public class Perf_Ascii
{
    [Params(
        128)] // vectorized code path
    public int Size;

    private byte[] _bytes, _sameBytes, _bytesDifferentCase;
    private char[] _characters, _sameCharacters, _charactersDifferentCase;

    [GlobalSetup]
    public void Setup()
    {
        _bytes = new byte[Size];
        _bytesDifferentCase = new byte[Size];

        for (int i = 0; i < Size; i++)
        {
            // let ToLower and ToUpper perform the same amount of work
            _bytes[i] = i % 2 == 0 ? (byte)'a' : (byte)'A';
            _bytesDifferentCase[i] = i % 2 == 0 ? (byte)'A' : (byte)'a';
        }
        _sameBytes = _bytes.ToArray();
        _characters = _bytes.Select(b => (char)b).ToArray();
        _sameCharacters = _characters.ToArray();
        _charactersDifferentCase = _bytesDifferentCase.Select(b => (char)b).ToArray();
    }

    [Benchmark]
    [MemoryRandomization]
    public OperationStatus ToUtf16() => Ascii.ToUtf16(_bytes, _characters, out _);
}


EgorBot commented May 26, 2024

Results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-TGKDHM : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-DZBMEA : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
OutlierMode=DontRemove  MemoryRandomization=True
| Method | Toolchain | Size | Mean | Error | Ratio | Code Size |
| --- | --- | --- | --- | --- | --- | --- |
| ToUtf16 | Main | 128 | 15.32 ns | 0.337 ns | 1.00 | 220 B |
| ToUtf16 | PR | 128 | 18.28 ns | 0.117 ns | 1.20 | 220 B |

See BDN_Artifacts.zip for details.

🔥Profiler

Flame graphs: Main vs PR (interactive!)
Hot asm: Main vs PR
Hot functions: Main vs PR

Notes

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo EgorBo added the Priority:2 Work that is important, but not critical for the release label Jul 22, 2024
Member

EgorBo commented Jul 29, 2024

Will be fixed by #102705

@EgorBo EgorBo added the in-pr There is an active PR which will close this issue when it is merged label Aug 1, 2024
@JulieLeeMSFT
Member

Pushing out to .NET10 since it is a minor perf regression.

@JulieLeeMSFT JulieLeeMSFT modified the milestones: 9.0.0, 10.0.0 Aug 5, 2024
Member

stephentoub commented Aug 5, 2024

> Pushing out to .NET10 since it is a minor perf regression.

Some of these are upwards of 20%, and in the linked issues that appear to be deduped against this, they're measured in µs rather than ns. Am I reading this incorrectly or is there other context I'm missing?

@JulieLeeMSFT ?
