
Optimize "(vec & cns) == zero" on arm64 #102705

Closed
wants to merge 3 commits into from


EgorBo (Member) commented May 26, 2024

Fixes the #100922 regression, which was introduced by #99982.

bool AllAscii(Vector128<byte> vector) => 
    (vector & Vector128.Create((byte)0x80)).Equals(Vector128<byte>.Zero);
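For reference, here is what `AllAscii` computes, written as a minimal scalar C++ model. This is an illustration, not JIT code, and the 16-byte `Bytes16` array is an assumed stand-in for `Vector128<byte>`. The predicate holds iff no byte has bit 7 set.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Vector128<byte>: 16 bytes.
using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of (vector & Vector128.Create((byte)0x80)).Equals(Vector128<byte>.Zero):
// returns true iff every byte is in the ASCII range 0x00..0x7F.
bool AllAsciiModel(const Bytes16& v) {
    for (std::uint8_t b : v) {
        if ((b & 0x80) != 0) {
            return false;  // found a byte with bit 7 set
        }
    }
    return true;
}
```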

Main:

; Method Proga:AllAscii
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            movi    v16.16b, #0x80
            and     v16.16b, v0.16b, v16.16b
            umaxp   v16.4s, v16.4s, v16.4s
            umov    x0, v16.d[0]
            cmp     x0, #0
            cset    x0, eq
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 40
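The main-branch sequence can be modeled step by step to show why it is a valid zero test. First, `and` applies the mask. Next, `umaxp v16.4s` takes pairwise unsigned maxima of the 32-bit lanes. Finally, `umov x0, v16.d[0]` extracts the low 64 bits, which hold max(lane0, lane1) and max(lane2, lane3). An unsigned maximum is zero only when both inputs are zero, so `x0 == 0` holds iff the whole masked vector is zero. The sketch below is hypothetical: `Bytes16` and `MainCodegenModel` are illustrative names. It also assumes a little-endian lane layout, although byte order does not change the result.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical scalar model of the main-branch codegen for AllAscii (not JIT code).
bool MainCodegenModel(const Bytes16& v) {
    // movi v16.16b, #0x80 ; and v16.16b, v0.16b, v16.16b
    Bytes16 masked{};
    for (std::size_t i = 0; i < masked.size(); ++i) {
        masked[i] = static_cast<std::uint8_t>(v[i] & 0x80);
    }

    // View the masked vector as four 32-bit lanes.
    std::uint32_t lane[4];
    std::memcpy(lane, masked.data(), sizeof(lane));

    // umaxp v16.4s, v16.4s, v16.4s: result lanes 0 and 1 are pairwise unsigned maxima.
    std::uint32_t r0 = std::max(lane[0], lane[1]);
    std::uint32_t r1 = std::max(lane[2], lane[3]);

    // umov x0, v16.d[0]: the low 64 bits hold r0 (low half) and r1 (high half).
    std::uint64_t x0 = static_cast<std::uint64_t>(r0) |
                       (static_cast<std::uint64_t>(r1) << 32);

    // cmp x0, #0 ; cset x0, eq
    return x0 == 0;
}
```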

PR:

; Method Proga:AllAscii
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            umaxp   v16.4s, v0.4s, v0.4s
            umov    x0, v16.d[0]
            tst     x0, #0x8080808080808080
            cset    x0, eq
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 32
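The `tst` immediate `0x8080808080808080` in the PR output is one 64-bit half of the constant `Vector128.Create((byte)0x80)`. A single scalar immediate can stand in for the whole constant only when both 64-bit halves are identical, which is the condition the PR's JIT comment describes for the `GT_CNS_VEC` operand. The sketch below shows that check. `Simd16Const`, `TryGetUniformU64` and `Broadcast` are hypothetical names, not JIT APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical stand-in for a 16-byte SIMD constant payload.
struct Simd16Const {
    std::uint8_t bytes[16];
};

// Returns the shared 64-bit half if both u64 components are equal, else std::nullopt.
std::optional<std::uint64_t> TryGetUniformU64(const Simd16Const& c) {
    std::uint64_t lo;
    std::uint64_t hi;
    std::memcpy(&lo, c.bytes, sizeof(lo));
    std::memcpy(&hi, c.bytes + 8, sizeof(hi));
    if (lo != hi) {
        return std::nullopt;  // halves differ: one scalar mask cannot represent the constant
    }
    return lo;
}

// Builds the equivalent of Vector128.Create(b): every byte equal to b.
Simd16Const Broadcast(std::uint8_t b) {
    Simd16Const c;
    std::memset(c.bytes, b, sizeof(c.bytes));
    return c;
}
```

For `Broadcast(0x80)`, the function yields `0x8080808080808080`, which is the immediate used by `tst` above.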

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 26, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo (Member, Author) commented May 26, 2024

@EgorBot -arm64 -profiler

using BenchmarkDotNet.Attributes;
using System.Buffers;
using System.Text;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Perf_Ascii>(args: args);

[DisassemblyDiagnoser(maxDepth: 5)]
public class Perf_Ascii
{
    byte[] _bytes = new byte[128];
    char[] _characters = new char[128];

    [Benchmark]
    public OperationStatus ToUtf16() => Ascii.ToUtf16(_bytes, _characters, out _);
}

EgorBot commented May 27, 2024

Results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-TOHLVW : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-IXPJBK : .NET 9.0.0 (), Arm64 RyuJIT AdvSIMD
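| Method  | Toolchain | Mean     | Error    | Ratio | Code Size |
|---------|-----------|----------|----------|-------|-----------|
| ToUtf16 | Main      | 17.24 ns | 0.004 ns | 1.00  | 516 B     |
| ToUtf16 | PR        | 15.67 ns | 0.001 ns | 0.91  | 500 B     |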

See BDN_Artifacts.zip for details.

🔥Profiler

Flame graphs: Main vs PR (interactive!)
Hot asm: Main vs PR
Hot functions: Main vs PR

Notes

For clean perf results, make sure you have just one [Benchmark] in your app.

// If op is "vec & cnsVec" where both u64 components in that cnsVec are the same (for both SIMD12 and
// SIMD16) then we'd better do this AND on top of TYP_LONG NI_AdvSimd_Extract in the end - it produces a
// more optimal codegen.
if (op->OperIsHWIntrinsic(NI_AdvSimd_And) && op->AsHWIntrinsic()->Op(2)->OperIs(GT_CNS_VEC))
Member

This doesn't have to be a constant, right?

Any (x & y) == zero or (x & y) != zero can be optimized down to a tst (on both xarch and arm64).
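The reviewer's point can be illustrated with a scalar model. For an arbitrary, non-constant `y`, a 128-bit `(x & y) == zero` reduces to ANDing the matching 64-bit halves and testing the combined result. This is the same predicate that x86 `PTEST` reports in ZF. The sketch models semantics only, not a JIT lowering, and `U128`, `TestZero` and `TestNonZero` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value split into two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Models (x & y) == Vector128.Zero for arbitrary x and y; y need not be a constant.
// Same predicate as the ZF result of x86 PTEST.
bool TestZero(U128 x, U128 y) {
    return ((x.lo & y.lo) | (x.hi & y.hi)) == 0;
}

// Models the (x & y) != zero form: the logical negation of TestZero.
bool TestNonZero(U128 x, U128 y) {
    return !TestZero(x, y);
}
```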

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

xtqqczze (Contributor)

Blocks #105047.


github-actions bot locked and limited conversation to collaborators Sep 17, 2024
Successfully merging this pull request may close these issues.

[Perf] Linux/arm64: 4 Regressions on 4/8/2024 7:16:22 PM
4 participants