Performance regression with SIMD in .NET 6 #51915
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
After some investigation, I think the cause is the same as in #49071:
[Benchmark]
public Vector<int> Bench2()
{
return new Vector<int>(Vector<int>.Count) * 3;
}
BDN:
; .NET 5.0.4 (5.0.421.11614), X64 RyuJIT
; Prog.Bench2()
vzeroupper
mov eax,8
vmovd xmm0,eax
vpbroadcastd ymm0,xmm0
mov eax,3
vmovd xmm1,eax
vpbroadcastd ymm1,xmm1
vpmulld ymm0,ymm1,ymm0
vmovupd [rdx],ymm0
mov rax,rdx
vzeroupper
ret
; Total bytes of code 47
; .NET 6.0.0 (6.0.21.20104), X64 RyuJIT
; Prog.Bench2()
push rdi
push rsi
sub rsp,68
vzeroupper
mov rsi,rdx
mov ecx,8
vmovd xmm0,ecx
vpbroadcastd ymm0,xmm0
vmovupd [rsp+20],ymm0
vxorps ymm0,ymm0,ymm0
vmovupd [rsp+40],ymm0
xor edi,edi
M00_L00:
lea rcx,[rsp+20]
mov ecx,[rcx+rdi*4]
mov edx,3
call System.Numerics.Vector`1[[System.Int32, System.Private.CoreLib]].ScalarMultiply(Int32, Int32)
lea rdx,[rsp+40]
mov [rdx+rdi*4],eax
inc rdi
cmp rdi,8
jl short M00_L00
vmovupd ymm0,[rsp+40]
vmovupd [rsi],ymm0
mov rax,rsi
vzeroupper
add rsp,68
pop rsi
pop rdi
ret
; Total bytes of code 102
For reference, the codegen as of the current nightly bits is now:
; Prog.Bench2()
vzeroupper
mov eax, 8
vmovd xmm0, eax
vpbroadcastd ymm0, ymm0
vpmulld ymm0, ymm0, ymmword ptr[reloc @RWD00]
vmovupd ymmword ptr[rcx], ymm0
mov rax, rcx
vzeroupper
ret
; Total bytes of code 37
@aalmada, would you be willing to retest with the latest nightly SDK (https://github.com/dotnet/installer#installers-and-binaries)? Doing so would allow us to confirm that there are no other regressions in this area and that the fix does indeed cover the regression you detected.
Description
I've been periodically running benchmarks on multiple LINQ libraries. I recently upgraded these to .NET 6 and noticed a regression for some SIMD cases.
Configuration
The benchmarks use BenchmarkDotNet and the configuration can be found at https://github.com/NetFabric/LinqBenchmarks/blob/afdb508341242c94d525f6858addbba2d96bc132/LinqBenchmarks/Program.cs#L25
I'm using .NET 6.0.100-preview.3.21202.5
The regression can be reproduced both using LinqFaster and NetFabric.Hyperlinq.
Regression?
The benchmark repository contains the latest results of the benchmarks, comparing the results of .NET 5 against .NET 6.
Data
The benchmarks for the query Range().ToArray() show no major difference between .NET 5 and .NET 6: https://github.com/NetFabric/LinqBenchmarks/blob/afdb508341242c94d525f6858addbba2d96bc132/Results/Range.RangeToArray.md
But for the query Range().Select().ToArray(), the SIMD-enabled .NET 6 version is much slower, for both libraries: https://github.com/NetFabric/LinqBenchmarks/blob/afdb508341242c94d525f6858addbba2d96bc132/Results/Range.RangeSelectToArray.md
Analysis
I'm very sorry, I tried, but I cannot pinpoint the issue. Still, I hope this will help.
Both libraries use System.Numerics.
I'm the developer of NetFabric.Hyperlinq and I can point you to the core source code used for both cases:
Range().ToArray() - https://github.com/NetFabric/NetFabric.Hyperlinq/blob/7c971368b925cb9c4e687bf94c8314c4178d4410/NetFabric.Hyperlinq/Utils/Copy/Copy.Range.cs#L12
Range().Select().ToArray() - https://github.com/NetFabric/NetFabric.Hyperlinq/blob/7c971368b925cb9c4e687bf94c8314c4178d4410/NetFabric.Hyperlinq/Utils/Copy/Copy.Range.cs#L69
In both cases, an array is allocated with the known size and passed to one of these methods as a Span<int>.
I have run the benchmarks multiple times and always get the same results.
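To make the shape of the affected code path easier to see without following the links, here is a minimal sketch of a vectorized "range, then select" fill over a Span<int>. It is an illustration only, not the actual NetFabric.Hyperlinq or LinqFaster source: the RangeFill class, the FillRangeTimes method, and the fixed multiply-by-factor selector are hypothetical names and simplifications. It does use the same Vector<int> * scalar operation that appears in the Bench2 repro in the comments.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only (not the library code).
static class RangeFill
{
    // Writes (start + i) * factor into destination[i] for every i.
    public static void FillRangeTimes(Span<int> destination, int start, int factor)
    {
        var index = 0;

        if (Vector.IsHardwareAccelerated && destination.Length >= Vector<int>.Count)
        {
            // Seed vector: [start, start + 1, ..., start + Count - 1].
            Span<int> seed = stackalloc int[Vector<int>.Count];
            for (var i = 0; i < seed.Length; i++)
                seed[i] = start + i;

            var values = new Vector<int>(seed);
            // Every lane advances by Count on each iteration.
            var step = new Vector<int>(Vector<int>.Count);

            for (; index <= destination.Length - Vector<int>.Count; index += Vector<int>.Count)
            {
                // Vector<int> * scalar: the operation exercised by Bench2.
                (values * factor).CopyTo(destination.Slice(index));
                values += step;
            }
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; index < destination.Length; index++)
            destination[index] = (start + index) * factor;
    }
}
```

A caller would allocate the array with the known size and pass it as a span, for example: var array = new int[count]; RangeFill.FillRangeTimes(array, 0, 3);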