Performance regression with SIMD in .NET 6 #51915

aalmada · 2021-04-27T09:25:08Z

Description

I've been periodically running benchmarks on multiple LINQ libraries. I recently upgraded these to .NET 6 and noticed a regression for some SIMD cases.

Configuration

The benchmarks use BenchmarkDotNet and the configuration can be found at https://github.com/NetFabric/LinqBenchmarks/blob/afdb508341242c94d525f6858addbba2d96bc132/LinqBenchmarks/Program.cs#L25

I'm using .NET 6.0.100-preview.3.21202.5

The regression can be reproduced with both LinqFaster and NetFabric.Hyperlinq.

Regression?

The benchmark repository contains the latest results of the benchmarks, comparing the results of .NET 5 against .NET 6.

Data

The benchmarks for the query Range().ToArray() show no major difference between .NET 5 and .NET 6: https://github.com/NetFabric/LinqBenchmarks/blob/afdb508341242c94d525f6858addbba2d96bc132/Results/Range.RangeToArray.md

However, for the query Range().Select().ToArray(), the SIMD-enabled versions of both libraries are much slower on .NET 6: https://github.com/NetFabric/LinqBenchmarks/blob/afdb508341242c94d525f6858addbba2d96bc132/Results/Range.RangeSelectToArray.md

Analysis

I'm very sorry, I tried, but I cannot pinpoint the issue. Still, I hope this will help.

Both libraries use System.Numerics.

I'm the developer of NetFabric.Hyperlinq and I can point you to the core source code used for both cases:

Range().ToArray() - https://github.com/NetFabric/NetFabric.Hyperlinq/blob/7c971368b925cb9c4e687bf94c8314c4178d4410/NetFabric.Hyperlinq/Utils/Copy/Copy.Range.cs#L12
Range().Select().ToArray() - https://github.com/NetFabric/NetFabric.Hyperlinq/blob/7c971368b925cb9c4e687bf94c8314c4178d4410/NetFabric.Hyperlinq/Utils/Copy/Copy.Range.cs#L69

In both cases, an array is allocated with the known size and passed to one of these methods as a Span<int>.
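
For readers who don't want to open the links, here is a minimal sketch of the general shape of such a method. It is not the actual Hyperlinq code: the class and method names are hypothetical, and the selector is hard-coded as a multiplication by a constant purely for illustration. It fills a Span<int> with a range using System.Numerics.Vector<int>, applies the multiplication to whole vectors, and finishes with a scalar loop.

```csharp
using System;
using System.Numerics;

static class RangeCopySketch
{
    // Hypothetical helper: writes (start + i) * factor into destination[i].
    public static void FillRangeTimes(Span<int> destination, int start, int factor)
    {
        var index = 0;

        if (Vector.IsHardwareAccelerated && destination.Length >= Vector<int>.Count)
        {
            // First block of the range: start, start + 1, ..., start + Count - 1
            Span<int> seed = stackalloc int[Vector<int>.Count];
            for (var i = 0; i < seed.Length; i++)
                seed[i] = start + i;

            var current = new Vector<int>(seed);
            var step = new Vector<int>(Vector<int>.Count); // advances every lane by Count
            var factorVector = new Vector<int>(factor);    // the "selector", broadcast to all lanes

            for (; index <= destination.Length - Vector<int>.Count; index += Vector<int>.Count)
            {
                (current * factorVector).CopyTo(destination.Slice(index));
                current += step;
            }
        }

        // Scalar loop for the remaining items (or for everything without SIMD support)
        for (; index < destination.Length; index++)
            destination[index] = (start + index) * factor;
    }
}
```

Calling `RangeCopySketch.FillRangeTimes(new int[1000], 0, 3)` on a preallocated array corresponds roughly to a Range(0, 1000).Select(item => item * 3).ToArray() query.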

I ran the benchmarks multiple times and always got the same results.

dotnet-issue-labeler · 2021-04-27T09:25:12Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

EgorBo · 2021-04-27T13:41:32Z

After some investigation, I think the cause is the same as in #49071
(it should be fixed in an upcoming Preview version, see #49503).
For example:

[Benchmark]
public Vector<int> Bench2()
{
    return new Vector<int>(Vector<int>.Count) * 3;
}

BDN:

.NET 5.0.4 (5.0.421.11614), X64 RyuJIT

; Prog.Bench2()
       vzeroupper
       mov       eax,8
       vmovd     xmm0,eax
       vpbroadcastd ymm0,xmm0
       mov       eax,3
       vmovd     xmm1,eax
       vpbroadcastd ymm1,xmm1
       vpmulld   ymm0,ymm1,ymm0
       vmovupd   [rdx],ymm0
       mov       rax,rdx
       vzeroupper
       ret
; Total bytes of code 47

.NET 6.0.0 (6.0.21.20104), X64 RyuJIT

; Prog.Bench2()
       push      rdi
       push      rsi
       sub       rsp,68
       vzeroupper
       mov       rsi,rdx
       mov       ecx,8
       vmovd     xmm0,ecx
       vpbroadcastd ymm0,xmm0
       vmovupd   [rsp+20],ymm0
       vxorps    ymm0,ymm0,ymm0
       vmovupd   [rsp+40],ymm0
       xor       edi,edi
M00_L00:
       lea       rcx,[rsp+20]
       mov       ecx,[rcx+rdi*4]
       mov       edx,3
       call      System.Numerics.Vector`1[[System.Int32, System.Private.CoreLib]].ScalarMultiply(Int32, Int32)
       lea       rdx,[rsp+40]
       mov       [rdx+rdi*4],eax
       inc       rdi
       cmp       rdi,8
       jl        short M00_L00
       vmovupd   ymm0,[rsp+40]
       vmovupd   [rsi],ymm0
       mov       rax,rsi
       vzeroupper
       add       rsp,68
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 102

tannergooding · 2021-04-27T15:04:46Z

For reference, the codegen as of the current nightly bits is now:

; Prog.Bench2()
    vzeroupper 
    mov      eax, 8
    vmovd    xmm0, eax
    vpbroadcastd ymm0, ymm0
    vpmulld  ymm0, ymm0, ymmword ptr[reloc @RWD00]
    vmovupd  ymmword ptr[rcx], ymm0
    mov      rax, rcx
    vzeroupper
    ret
; Total bytes of code 37

tannergooding · 2021-04-27T15:09:33Z

@aalmada, would you be willing to retest with the latest nightly SDK: https://github.com/dotnet/installer#installers-and-binaries?

Doing so would allow us to confirm there are no other regressions in the area and that the fix does indeed cover the regression you detected.

aalmada · 2021-04-27T22:38:53Z

@tannergooding
I reran the benchmark for the affected query, now with the nightly build, and I can confirm that the issue is gone. 👍
I'm now going to let all the benchmarks run through the night.
Thanks!

aalmada · 2021-04-28T07:56:49Z

@tannergooding
I reran all the benchmarks using .NET 6.0.100-preview.4.21227.6 and can confirm once again that the issue is gone.
In case you'd like to compare the performance of multiple LINQ implementations, both on .NET 5 and this version of .NET 6, you can find all the results at https://github.com/NetFabric/LinqBenchmarks/tree/444de2fe44b60fa86a2da02751551804dd834e61

aalmada added the tenet-performance label Apr 27, 2021

dotnet-issue-labeler bot added the untriaged label Apr 27, 2021

aalmada mentioned this issue in Dynamic PGO #43618 Apr 27, 2021

tannergooding removed the untriaged label Apr 27, 2021

tannergooding self-assigned this Apr 27, 2021

jeffschwMSFT added the area-CodeGen-coreclr label Apr 27, 2021

aalmada closed this as completed Apr 28, 2021

ghost locked as resolved and limited conversation to collaborators May 28, 2021
