Regressions in Burgers.Test3 #80129
Run Information
Regressions in PerfLabTests.DelegatePerf
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.DelegatePerf*'
PerfLabTests.DelegatePerf.DelegateInvoke
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Regressions in System.Collections.Tests.Perf_PriorityQueue<String, String>
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_PriorityQueue<String, String>*'
System.Collections.Tests.Perf_PriorityQueue<String, String>.Enumerate(Size: 100)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Tests.Perf_Single
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Single*'
System.Tests.Perf_Single.ToString(value: -3.4028235E+38)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Tests.Perf_Random
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Random*'
System.Tests.Perf_Random.NextBytes_span
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Xml.Tests.Perf_XmlConvert
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Xml.Tests.Perf_XmlConvert*'
System.Xml.Tests.Perf_XmlConvert.DateTime_ToString_Unspecified
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Reflection.Invoke
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Reflection.Invoke*'
System.Reflection.Invoke.StaticMethod4_arrayNotCached_int_string_struct_class
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Drawing.Tests.Perf_Color
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Drawing.Tests.Perf_Color*'
System.Drawing.Tests.Perf_Color.GetHue
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.MathBenchmarks.Single
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.MathBenchmarks.Single*'
System.MathBenchmarks.Single.ExpM1
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in PerfLabTests.CastingPerf
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.CastingPerf*'
PerfLabTests.CastingPerf.IntObj
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Regressions in System.Memory.Span<Int32>
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'
System.Memory.Span<Int32>.IndexOfAnyFourValues(Size: 33)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Diagnostics.Perf_Activity
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Diagnostics.Perf_Activity*'
System.Diagnostics.Perf_Activity.EnumerateActivityEventsSmall
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in PerfLabTests.LowLevelPerf
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.LowLevelPerf*'
PerfLabTests.LowLevelPerf.StaticDelegate
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.IO.Tests.Perf_StreamWriter
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.IO.Tests.Perf_StreamWriter*'
System.IO.Tests.Perf_StreamWriter.WriteString(writeLength: 100)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'
System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: Compiled)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Memory.Slice<String>
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Slice<String>*'
System.Memory.Slice<String>.ReadOnlyMemorySpanStartLength
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in Burgers
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Burgers*'
Burgers.Test3
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Tests.Perf_Char
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'
System.Tests.Perf_Char.Char_IsUpper(input: "Good afternoon, Constable!")
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Xml.Linq.Perf_XElementList
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Xml.Linq.Perf_XElementList*'
System.Xml.Linq.Perf_XElementList.Enumerator
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Text.Json.Document.Tests.Perf_EnumerateArray
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Json.Document.Tests.Perf_EnumerateArray*'
System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: ArrayOfStrings)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Most of these look like noise, but some, e.g. Burgers.Test3, were likely regressed by #79720.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details
Run Information
Regressions in System.Text.Encodings.Web.Tests.Perf_Encoders
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Encodings.Web.Tests.Perf_Encoders*'
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf16(arguments: JavaScript,no escaping required,512)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Regressions in PerfLabTests.CastingPerf2.CastingPerf
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.CastingPerf2.CastingPerf*'
PerfLabTests.CastingPerf2.CastingPerf.ScalarValueTypeObj
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Text.Json.Tests.Utf8JsonReaderCommentsTests
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Json.Tests.Utf8JsonReaderCommentsTests*'
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing(CommentHandling: Skip, SegmentSize: 100, TestCase: ShortSingleLine)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Linq.Tests.Perf_Enumerable
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Linq.Tests.Perf_Enumerable*'
System.Linq.Tests.Perf_Enumerable.WhereLast_LastElementMatches(input: IEnumerable)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Numerics.Tests.Perf_BigInteger
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_BigInteger*'
System.Numerics.Tests.Perf_BigInteger.Add(arguments: 1024,1024 bits)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Perf_Convert
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Perf_Convert*'
System.Perf_Convert.ChangeType
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository

Run Information
Regressions in System.Collections.IterateForEach<String>
Repro:
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<String>*'
System.Collections.IterateForEach<String>.List(Size: 512)
Description of detection logic
Docs: Profiling workflow for dotnet/runtime repository
Same on Arm (only Burgers.Test3 popped up).
Working on getting a disasm diff right now.
Before:
; Assembly listing for method Burgers:GetCalculated3(int,int,double,double,double,double[]):double[]
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T15] ( 4, 7 ) int -> rdi single-def
; V01 arg1 [V01,T10] ( 8, 11 ) int -> rsi single-def
; V02 arg2 [V02,T30] ( 6, 12 ) double -> mm6 single-def
; V03 arg3 [V03,T31] ( 6, 12 ) double -> mm7 single-def
; V04 arg4 [V04,T38] ( 1, 1 ) double -> [rsp+E0H] single-def
; V05 arg5 [V05,T18] ( 2, 2 ) ref -> rbx class-hnd single-def
; V06 loc0 [V06,T17] ( 4, 4 ) int -> rbp single-def
; V07 loc1 [V07,T04] ( 8, 53 ) ref -> registers class-hnd
; V08 loc2 [V08,T02] ( 16, 91 ) ref -> registers class-hnd
; V09 loc3 [V09,T34] ( 4, 10 ) double -> mm0 single-def
; V10 loc4 [V10,T11] ( 4, 13 ) int -> rdx
; V11 loc5 [V11,T16] ( 2, 8 ) ref -> r14 class-hnd
; V12 loc6 [V12,T00] ( 12,180 ) int -> r8
; V13 loc7 [V13,T19] ( 5, 80 ) simd32 -> mm4 ld-addr-op
; V14 loc8 [V14,T20] ( 3, 48 ) simd32 -> mm5 ld-addr-op
; V15 loc9 [V15,T21] ( 2, 32 ) simd32 -> mm9 ld-addr-op
; V16 loc10 [V16,T22] ( 2, 32 ) simd32 -> mm4 ld-addr-op
; V17 OutArgs [V17 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V18 tmp1 [V18,T28] ( 2, 16 ) double -> mm4 "Strict ordering of exceptions for Array store"
; V19 tmp2 [V19,T29] ( 2, 16 ) double -> mm4 "Strict ordering of exceptions for Array store"
; V20 cse0 [V20,T25] ( 2, 17 ) simd32 -> mm1 "CSE - moderate"
; V21 cse1 [V21,T23] ( 5, 20 ) double -> mm4 "CSE - moderate"
; V22 cse2 [V22,T32] ( 3, 12 ) double -> mm9 "CSE - moderate"
; V23 cse3 [V23,T33] ( 3, 12 ) double -> mm9 "CSE - moderate"
; V24 cse4 [V24,T24] ( 5, 20 ) double -> mm4 "CSE - moderate"
; V25 cse5 [V25,T12] ( 4, 13 ) long -> rcx "CSE - moderate"
; V26 cse6 [V26,T01] ( 12,120 ) int -> r9 "CSE - aggressive"
; V27 cse7 [V27,T14] ( 4, 10 ) int -> rax "CSE - moderate"
; V28 cse8 [V28,T09] ( 3, 21 ) int -> rbp "CSE - moderate"
; V29 cse9 [V29,T26] ( 2, 17 ) simd32 -> mm2 "CSE - moderate"
; V30 cse10 [V30,T03] ( 6, 60 ) int -> r10 "CSE - aggressive"
; V31 cse11 [V31,T05] ( 3, 48 ) int -> r10 "CSE - aggressive"
; V32 cse12 [V32,T06] ( 3, 48 ) int -> r10 "CSE - aggressive"
; V33 cse13 [V33,T07] ( 3, 48 ) int -> r11 "CSE - aggressive"
; V34 cse14 [V34,T08] ( 3, 48 ) int -> rbx "CSE - aggressive"
; V35 cse15 [V35,T13] ( 3, 12 ) int -> r8 "CSE - moderate"
; V36 cse16 [V36,T35] ( 4, 10 ) double -> mm8 "CSE - moderate"
; V37 cse17 [V37,T27] ( 2, 17 ) simd32 -> mm3 "CSE - moderate"
; V38 rat0 [V38,T36] ( 2, 4 ) double -> mm0 "argument with side effect"
; V39 rat1 [V39,T37] ( 2, 4 ) double -> mm1 "argument with side effect"
;
; Lcl frame size = 136
G_M52065_IG01:
push r15
push r14
push rdi
push rsi
push rbp
push rbx
sub rsp, 136
vzeroupper
vmovaps xmmword ptr [rsp+70H], xmm6
vmovaps xmmword ptr [rsp+60H], xmm7
vmovaps xmmword ptr [rsp+50H], xmm8
vmovaps xmmword ptr [rsp+40H], xmm9
vmovaps xmmword ptr [rsp+30H], xmm10
vmovaps xmmword ptr [rsp+20H], xmm11
mov edi, ecx
mov esi, edx
vmovaps xmm6, xmm2
vmovaps xmm7, xmm3
mov rbx, gword ptr [rsp+E8H]
;; size=74 bbWeight=1 PerfScore 21.25
G_M52065_IG02:
mov edx, esi
sar edx, 31
and edx, 3
add edx, esi
and edx, -4
mov ecx, esi
sub ecx, edx
mov edx, ecx
neg edx
lea ebp, [rdx+rsi+04H]
movsxd rdx, ebp
mov rcx, 0xD1FFAB1E ; double[]
call CORINFO_HELP_NEWARR_1_VC
mov r14, rax
movsxd rdx, ebp
mov rcx, 0xD1FFAB1E ; double[]
call CORINFO_HELP_NEWARR_1_VC
mov r15, rax
mov r8d, dword ptr [rbx+08H]
mov rcx, rbx
mov rdx, r15
call [System.Array:Copy(System.Array,System.Array,int)]
vmulsd xmm0, xmm7, qword ptr [rsp+E0H]
vdivsd xmm0, xmm0, xmm6
vmovsd xmm8, qword ptr [reloc @RWD00]
vmovaps xmm1, xmm8
call System.Math:Pow(double,double):double
xor edx, edx
test edi, edi
jle G_M52065_IG07
add ebp, -3
vdivsd xmm1, xmm7, xmm6
vbroadcastsd ymm1, ymm1
vmovupd ymm2, ymmword ptr[reloc @RWD32]
vbroadcastsd ymm3, ymm0
lea eax, [rsi-01H]
mov ecx, eax
;; size=154 bbWeight=1 PerfScore 56.25
G_M52065_IG03:
mov r8d, 1
cmp ebp, 1
jle G_M52065_IG05
mov r9d, dword ptr [r15+08H]
align [0 bytes for IG04]
;; size=19 bbWeight=4 PerfScore 14.00
G_M52065_IG04:
cmp r8d, r9d
jae G_M52065_IG10
lea r11d, [r8+03H]
cmp r11d, r9d
jae G_M52065_IG10
vmovupd ymm4, ymmword ptr[r15+8*r8+10H]
lea r10d, [r8-01H]
cmp r10d, r9d
jae G_M52065_IG10
lea ebx, [r8+02H]
cmp ebx, r9d
jae G_M52065_IG10
vmovupd ymm5, ymmword ptr[r15+8*r10+10H]
lea r10d, [r8+01H]
cmp r10d, r9d
jae G_M52065_IG10
lea ebx, [r8+04H]
cmp ebx, r9d
jae G_M52065_IG10
vmovupd ymm9, ymmword ptr[r15+8*r10+10H]
vmulpd ymm10, ymm4, ymm1
vsubpd ymm11, ymm4, ymm5
vmulpd ymm10, ymm10, ymm11
vsubpd ymm10, ymm4, ymm10
vmulpd ymm4, ymm4, ymm2
vsubpd ymm4, ymm9, ymm4
vaddpd ymm4, ymm4, ymm5
vmulpd ymm4, ymm3, ymm4
vaddpd ymm4, ymm10, ymm4
mov r10d, dword ptr [r14+08H]
cmp r8d, r10d
jae G_M52065_IG10
cmp r11d, r10d
jae G_M52065_IG11
mov r8d, r8d
vmovupd ymmword ptr[r14+8*r8+10H], ymm4
mov r8d, ebx
cmp r8d, ebp
jl G_M52065_IG04
;; size=177 bbWeight=16 PerfScore 964.00
G_M52065_IG05:
mov r9d, dword ptr [r15+08H]
test r9d, r9d
je G_M52065_IG12
vmovsd xmm4, qword ptr [r15+10H]
vmulsd xmm5, xmm4, xmm7
vdivsd xmm5, xmm5, xmm6
cmp eax, r9d
jae G_M52065_IG12
vmovsd xmm9, qword ptr [r15+8*rcx+10H]
vsubsd xmm10, xmm4, xmm9
vmulsd xmm5, xmm5, xmm10
vsubsd xmm5, xmm4, xmm5
cmp r9d, 1
jbe G_M52065_IG12
vmovsd xmm10, qword ptr [r15+18H]
vmulsd xmm4, xmm4, xmm8
vsubsd xmm4, xmm10, xmm4
vaddsd xmm4, xmm4, xmm9
vmulsd xmm4, xmm4, xmm0
vaddsd xmm4, xmm5, xmm4
mov r10d, dword ptr [r14+08H]
test r10d, r10d
je G_M52065_IG12
vmovsd qword ptr [r14+10H], xmm4
vmovsd xmm4, qword ptr [r15+8*rcx+10H]
vmulsd xmm5, xmm4, xmm7
vdivsd xmm5, xmm5, xmm6
lea r8d, [rsi-02H]
cmp r8d, r9d
jae G_M52065_IG12
mov r9d, r8d
vmovsd xmm9, qword ptr [r15+8*r9+10H]
vsubsd xmm10, xmm4, xmm9
vmulsd xmm5, xmm5, xmm10
vsubsd xmm5, xmm4, xmm5
vmovsd xmm10, qword ptr [r15+10H]
vmulsd xmm4, xmm4, xmm8
vsubsd xmm4, xmm10, xmm4
vaddsd xmm4, xmm4, xmm9
vmulsd xmm4, xmm4, xmm0
vaddsd xmm4, xmm5, xmm4
cmp eax, r10d
jae SHORT G_M52065_IG12
vmovsd qword ptr [r14+8*rcx+10H], xmm4
inc edx
cmp edx, edi
jl SHORT G_M52065_IG09
;; size=212 bbWeight=4 PerfScore 479.00
G_M52065_IG06:
mov r15, r14
;; size=3 bbWeight=2 PerfScore 0.50
G_M52065_IG07:
mov rax, r15
;; size=3 bbWeight=1 PerfScore 0.25
G_M52065_IG08:
vmovaps xmm6, xmmword ptr [rsp+70H]
vmovaps xmm7, xmmword ptr [rsp+60H]
vmovaps xmm8, xmmword ptr [rsp+50H]
vmovaps xmm9, xmmword ptr [rsp+40H]
vmovaps xmm10, xmmword ptr [rsp+30H]
vmovaps xmm11, xmmword ptr [rsp+20H]
vzeroupper
add rsp, 136
pop rbx
pop rbp
pop rsi
pop rdi
pop r14
pop r15
ret
;; size=55 bbWeight=1 PerfScore 29.25
G_M52065_IG09:
xchg r14, r15
jmp G_M52065_IG03
;; size=8 bbWeight=2 PerfScore 6.00
G_M52065_IG10:
call CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
int3
;; size=6 bbWeight=0 PerfScore 0.00
G_M52065_IG11:
call CORINFO_HELP_THROW_ARGUMENTEXCEPTION
int3
;; size=6 bbWeight=0 PerfScore 0.00
G_M52065_IG12:
call CORINFO_HELP_RNGCHKFAIL
int3
;; size=6 bbWeight=0 PerfScore 0.00
RWD00 dq 4000000000000000h ; 2
RWD08 dd 00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
RWD32 dq 4000000000000000h, 4000000000000000h, 4000000000000000h, 4000000000000000h
; Total bytes of code 723, prolog size 74, PerfScore 1642.80, instruction count 172, allocated bytes for code 723 (MethodHash=2f54349e) for method Burgers:GetCalculated3(int,int,double,double,double,double[]):double[]
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
SizeOfProlog : 0x36
CountOfUnwindCodes: 20
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
CodeOffset: 0x36 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM11 (11)
Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x30 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM10 (10)
Scaled Small Offset: 3 * 16 = 48 = 0x00030
CodeOffset: 0x2A UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM9 (9)
Scaled Small Offset: 4 * 16 = 64 = 0x00040
CodeOffset: 0x24 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM8 (8)
Scaled Small Offset: 5 * 16 = 80 = 0x00050
CodeOffset: 0x1E UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM7 (7)
Scaled Small Offset: 6 * 16 = 96 = 0x00060
CodeOffset: 0x18 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
Scaled Small Offset: 7 * 16 = 112 = 0x00070
CodeOffset: 0x0F UnwindOp: UWOP_ALLOC_LARGE (1) OpInfo: 0 - Scaled small
Size: 17 * 8 = 136 = 0x00088
CodeOffset: 0x08 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
CodeOffset: 0x07 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)
CodeOffset: 0x06 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)
CodeOffset: 0x05 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rdi (7)
CodeOffset: 0x04 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: r14 (14)
CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: r15 (15)
After:
; Assembly listing for method Burgers:GetCalculated3(int,int,double,double,double,double[]):double[]
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 4 single block inlinees; 4 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T13] ( 4, 7 ) int -> rdi single-def
; V01 arg1 [V01,T08] ( 8, 14 ) int -> rsi single-def
; V02 arg2 [V02,T20] ( 6, 27 ) double -> mm6 single-def
; V03 arg3 [V03,T21] ( 6, 27 ) double -> mm7 single-def
; V04 arg4 [V04,T32] ( 1, 1 ) double -> [rsp+B0H] single-def
; V05 arg5 [V05,T16] ( 2, 2 ) ref -> rbx class-hnd single-def
; V06 loc0 [V06,T15] ( 4, 4 ) int -> rbp single-def
; V07 loc1 [V07,T06] ( 8, 53 ) ref -> registers class-hnd
; V08 loc2 [V08,T04] ( 16,103 ) ref -> registers class-hnd
; V09 loc3 [V09,T22] ( 4, 25 ) double -> mm0 single-def
; V10 loc4 [V10,T11] ( 4, 13 ) int -> rdx
; V11 loc5 [V11,T14] ( 2, 8 ) ref -> r14 class-hnd
; V12 loc6 [V12,T00] ( 13,196 ) int -> rax
; V13 loc7 [V13,T17] ( 5, 80 ) simd32 -> mm1 ld-addr-op
; V14 loc8 [V14,T18] ( 3, 48 ) simd32 -> mm2 ld-addr-op
;* V15 loc9 [V15 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op
; V16 loc10 [V16,T19] ( 2, 32 ) simd32 -> mm1 ld-addr-op
; V17 OutArgs [V17 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V18 tmp1 [V18,T25] ( 2, 16 ) double -> mm1 "Strict ordering of exceptions for Array store"
; V19 tmp2 [V19,T26] ( 2, 16 ) double -> mm1 "Strict ordering of exceptions for Array store"
;* V20 tmp3 [V20 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V21 tmp4 [V21,T01] ( 5,160 ) int -> r8 "Inlining Arg"
;* V22 tmp5 [V22 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V23 tmp6 [V23,T02] ( 5,160 ) int -> r8 "Inlining Arg"
;* V24 tmp7 [V24 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V25 tmp8 [V25 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V26 tmp9 [V26 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
; V27 cse0 [V27,T23] ( 5, 20 ) double -> mm1 "CSE - moderate"
; V28 cse1 [V28,T27] ( 3, 12 ) double -> mm3 "CSE - moderate"
; V29 cse2 [V29,T28] ( 3, 12 ) double -> mm3 "CSE - moderate"
; V30 cse3 [V30,T24] ( 5, 20 ) double -> mm1 "CSE - moderate"
; V31 cse4 [V31,T09] ( 4, 16 ) long -> r8 "CSE - moderate"
; V32 cse5 [V32,T03] ( 12,132 ) int -> rcx "CSE - aggressive"
; V33 cse6 [V33,T10] ( 4, 16 ) int -> rax "CSE - moderate"
; V34 cse7 [V34,T05] ( 6, 60 ) int -> registers "CSE - aggressive"
; V35 cse8 [V35,T07] ( 3, 21 ) int -> rbp "CSE - moderate"
; V36 cse9 [V36,T12] ( 3, 12 ) int -> r10 "CSE - moderate"
; V37 cse10 [V37,T29] ( 4, 10 ) double -> mm8 "CSE - moderate"
; V38 rat0 [V38,T30] ( 2, 4 ) double -> mm0 "argument with side effect"
; V39 rat1 [V39,T31] ( 2, 4 ) double -> mm1 "argument with side effect"
;
; Lcl frame size = 88
G_M52065_IG01:
push r15
push r14
push rdi
push rsi
push rbp
push rbx
sub rsp, 88
vzeroupper
vmovaps xmmword ptr [rsp+40H], xmm6
vmovaps xmmword ptr [rsp+30H], xmm7
vmovaps xmmword ptr [rsp+20H], xmm8
mov edi, ecx
mov esi, edx
vmovaps xmm6, xmm2
vmovaps xmm7, xmm3
mov rbx, gword ptr [rsp+B8H]
;; size=53 bbWeight=1 PerfScore 15.25
G_M52065_IG02:
mov edx, esi
sar edx, 31
and edx, 3
add edx, esi
and edx, -4
mov ecx, esi
sub ecx, edx
mov edx, ecx
neg edx
lea ebp, [rdx+rsi+04H]
movsxd rdx, ebp
mov rcx, 0xD1FFAB1E ; double[]
call CORINFO_HELP_NEWARR_1_VC
mov r14, rax
movsxd rdx, ebp
mov rcx, 0xD1FFAB1E ; double[]
call CORINFO_HELP_NEWARR_1_VC
mov r15, rax
mov r8d, dword ptr [rbx+08H]
mov rcx, rbx
mov rdx, r15
call [System.Array:Copy(System.Array,System.Array,int)]
vmulsd xmm0, xmm7, qword ptr [rsp+B0H]
vdivsd xmm0, xmm0, xmm6
vmovsd xmm8, qword ptr [reloc @RWD00]
vmovaps xmm1, xmm8
call System.Math:Pow(double,double):double
xor edx, edx
test edi, edi
jle G_M52065_IG08
add ebp, -3
;; size=127 bbWeight=1 PerfScore 35.50
G_M52065_IG03:
mov eax, 1
cmp ebp, 1
jle G_M52065_IG06
align [0 bytes for IG04]
;; size=14 bbWeight=4 PerfScore 6.00
G_M52065_IG04:
test eax, eax
jl G_M52065_IG11
mov ecx, dword ptr [r15+08H]
mov r8d, ecx
sub r8d, eax
cmp r8d, 4
jl G_M52065_IG11
cmp eax, ecx
jae G_M52065_IG14
mov r8d, eax
vmovupd ymm1, ymmword ptr[r15+8*r8+10H]
lea r8d, [rax-01H]
test r8d, r8d
jl G_M52065_IG11
mov r9d, ecx
sub r9d, r8d
cmp r9d, 4
jl G_M52065_IG11
cmp r8d, ecx
jae G_M52065_IG14
mov r8d, r8d
vmovupd ymm2, ymmword ptr[r15+8*r8+10H]
lea r8d, [rax+01H]
test r8d, r8d
jl G_M52065_IG11
mov r9d, ecx
sub r9d, r8d
cmp r9d, 4
jl G_M52065_IG11
vdivsd xmm3, xmm7, xmm6
vbroadcastsd ymm3, ymm3
vmulpd ymm3, ymm1, ymm3
vsubpd ymm4, ymm1, ymm2
vmulpd ymm3, ymm3, ymm4
vsubpd ymm3, ymm1, ymm3
cmp r8d, ecx
jae G_M52065_IG14
mov ecx, r8d
vmovupd ymm4, ymmword ptr[r15+8*rcx+10H]
vmulpd ymm1, ymm1, ymmword ptr[reloc @RWD32]
vsubpd ymm1, ymm4, ymm1
vaddpd ymm1, ymm1, ymm2
vbroadcastsd ymm2, ymm0
vmulpd ymm1, ymm2, ymm1
vaddpd ymm1, ymm3, ymm1
mov ecx, dword ptr [r14+08H]
cmp ecx, eax
jbe G_M52065_IG12
sub ecx, eax
cmp ecx, 4
jl G_M52065_IG13
mov ecx, eax
;; size=221 bbWeight=16 PerfScore 1304.00
G_M52065_IG05:
vmovupd ymmword ptr[r14+8*rcx+10H], ymm1
add eax, 4
cmp eax, ebp
jl G_M52065_IG04
;; size=18 bbWeight=16 PerfScore 56.00
G_M52065_IG06:
mov ecx, dword ptr [r15+08H]
test ecx, ecx
je G_M52065_IG14
vmovsd xmm1, qword ptr [r15+10H]
vmulsd xmm2, xmm1, xmm7
vdivsd xmm2, xmm2, xmm6
lea eax, [rsi-01H]
cmp eax, ecx
jae G_M52065_IG14
mov r8d, eax
vmovsd xmm3, qword ptr [r15+8*r8+10H]
vsubsd xmm4, xmm1, xmm3
vmulsd xmm2, xmm2, xmm4
vsubsd xmm2, xmm1, xmm2
cmp ecx, 1
jbe G_M52065_IG14
vmovsd xmm4, qword ptr [r15+18H]
vmulsd xmm1, xmm1, xmm8
vsubsd xmm1, xmm4, xmm1
vaddsd xmm1, xmm1, xmm3
vmulsd xmm1, xmm1, xmm0
vaddsd xmm1, xmm2, xmm1
mov r9d, dword ptr [r14+08H]
test r9d, r9d
je G_M52065_IG14
vmovsd qword ptr [r14+10H], xmm1
vmovsd xmm1, qword ptr [r15+8*r8+10H]
vmulsd xmm2, xmm1, xmm7
vdivsd xmm2, xmm2, xmm6
lea r10d, [rsi-02H]
cmp r10d, ecx
jae G_M52065_IG14
mov ecx, r10d
vmovsd xmm3, qword ptr [r15+8*rcx+10H]
vsubsd xmm4, xmm1, xmm3
vmulsd xmm2, xmm2, xmm4
vsubsd xmm2, xmm1, xmm2
vmovsd xmm4, qword ptr [r15+10H]
vmulsd xmm1, xmm1, xmm8
vsubsd xmm1, xmm4, xmm1
vaddsd xmm1, xmm1, xmm3
vmulsd xmm1, xmm1, xmm0
vaddsd xmm1, xmm2, xmm1
cmp eax, r9d
jae SHORT G_M52065_IG14
vmovsd qword ptr [r14+8*r8+10H], xmm1
inc edx
cmp edx, edi
jl SHORT G_M52065_IG10
;; size=209 bbWeight=4 PerfScore 482.00
G_M52065_IG07:
mov r15, r14
;; size=3 bbWeight=2 PerfScore 0.50
G_M52065_IG08:
mov rax, r15
;; size=3 bbWeight=1 PerfScore 0.25
G_M52065_IG09:
vmovaps xmm6, xmmword ptr [rsp+40H]
vmovaps xmm7, xmmword ptr [rsp+30H]
vmovaps xmm8, xmmword ptr [rsp+20H]
vzeroupper
add rsp, 88
pop rbx
pop rbp
pop rsi
pop rdi
pop r14
pop r15
ret
;; size=34 bbWeight=1 PerfScore 17.25
G_M52065_IG10:
xchg r14, r15
jmp G_M52065_IG03
;; size=8 bbWeight=2 PerfScore 6.00
G_M52065_IG11:
call [System.ThrowHelper:ThrowArgumentOutOfRange_IndexMustBeLessOrEqualException()]
int3
;; size=7 bbWeight=0 PerfScore 0.00
G_M52065_IG12:
call [System.ThrowHelper:ThrowStartIndexArgumentOutOfRange_ArgumentOutOfRange_IndexMustBeLess()]
int3
;; size=7 bbWeight=0 PerfScore 0.00
G_M52065_IG13:
call [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
int3
;; size=7 bbWeight=0 PerfScore 0.00
G_M52065_IG14:
call CORINFO_HELP_RNGCHKFAIL
int3
;; size=6 bbWeight=0 PerfScore 0.00
RWD00 dq 4000000000000000h ; 2
RWD08 dd 00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
RWD32 dq 4000000000000000h, 4000000000000000h, 4000000000000000h, 4000000000000000h
; Total bytes of code 717, prolog size 53, PerfScore 1994.45, instruction count 180, allocated bytes for code 717 (MethodHash=2f54349e) for method Burgers:GetCalculated3(int,int,double,double,double,double[]):double[]
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
SizeOfProlog : 0x21
CountOfUnwindCodes: 13
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
CodeOffset: 0x21 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM8 (8)
Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x1B UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM7 (7)
Scaled Small Offset: 3 * 16 = 48 = 0x00030
CodeOffset: 0x15 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
Scaled Small Offset: 4 * 16 = 64 = 0x00040
CodeOffset: 0x0C UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 10 * 8 + 8 = 88 = 0x58
CodeOffset: 0x08 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
CodeOffset: 0x07 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)
CodeOffset: 0x06 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)
CodeOffset: 0x05 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rdi (7)
CodeOffset: 0x04 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: r14 (14)
CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: r15 (15)
We have some significant diffs that result in 6 fewer bytes of codegen (723 bytes before vs. 717 after). It looks like the main cost difference (PerfScore 1642.80 before vs. 1994.45 after) is that we have two hoisted broadcasts in the "before" case, whereas we don't in the new one. This is a hoist of We have the same number of
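To make "hoisted broadcast" concrete: in the "before" listing, the two `vbroadcastsd` instructions (and the load of the `RWD32` constant vector) sit ahead of the `G_M52065_IG04` loop, while in the "after" listing they, along with the `vdivsd` feeding one of them, are recomputed inside it. The C# sketch below shows the same idea done by hand at the source level with `System.Numerics.Vector<double>`. The class name `HoistedBroadcastSketch`, the method `Step`, and the parameters `c` and `k` are hypothetical, and the stencil only mirrors the arithmetic shape visible in the listings; it is not the benchmark's actual source, and it makes no claim about what the JIT does with broadcasts written inside a loop.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch, not the Burgers benchmark source: one explicit step of a
// 1-D stencil shaped like the vector loop in the listings, with the loop-invariant
// broadcasts created once, before the loop. Assumes u.Length >= 2.
public static class HoistedBroadcastSketch
{
    public static double[] Step(double[] u, double c, double k)
    {
        var result = new double[u.Length];
        int width = Vector<double>.Count;

        // Loop-invariant broadcasts: each scalar is splatted into every lane once.
        var cVec = new Vector<double>(c);
        var kVec = new Vector<double>(k);
        var two = new Vector<double>(2.0);

        int i = 1;
        // Vector body reads u[i - 1] .. u[i + width], so stop before the last element.
        for (; i + width < u.Length; i += width)
        {
            var previous = new Vector<double>(u, i - 1);
            var current = new Vector<double>(u, i);
            var next = new Vector<double>(u, i + 1);
            var updated = current - cVec * current * (current - previous)
                        + kVec * (next - two * current + previous);
            updated.CopyTo(result, i);
        }

        // Scalar tail for the interior points the vector loop did not cover.
        for (; i < u.Length - 1; i++)
        {
            result[i] = u[i] - c * u[i] * (u[i] - u[i - 1])
                      + k * (u[i + 1] - 2.0 * u[i] + u[i - 1]);
        }

        // Boundary points are simply copied in this sketch.
        result[0] = u[0];
        result[u.Length - 1] = u[u.Length - 1];
        return result;
    }
}
```

The `Vector<double>(double[], int)` constructor and `CopyTo(double[], int)` both validate their ranges and throw if fewer than `Vector<double>.Count` elements are available, so the `i + width < u.Length` condition is what keeps every load and store in range here.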
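This basically comes down to the following "minimal" repro:
using System;

static class Repro // wrapper added only so the snippet compiles on its own
{
    private static (int, int) Load(int[] array, int index)
    {
        // Manual bounds check: once it passes, index and index + 1 are both in range,
        // but the JIT does not use it to drop the range checks on the accesses below.
        if ((index < 0) || ((array.Length - index) < 2))
        {
            throw new ArgumentOutOfRangeException();
        }
        return (array[index + 0], array[index + 1]);
    }
}
When this was intrinsic, we created a Given such a pattern is fairly common/typical, it would likely be beneficial for us to either recognize/handle such sequences or even optimize them down to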
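The reason the later range checks in `Load` are redundant can be shown mechanically: once the guard passes, `0 <= index` and `index + 1 <= array.Length - 1`, so both accesses are in range. The sketch below (hypothetical names `GuardSketch`, `GuardRejects`, `CountUnsafeAcceptedPairs`) brute-forces small lengths and indices and tests each access with the same unsigned-compare form used by the `cmp`/`jae` range checks in the listings. It only illustrates the reasoning; it says nothing about how the JIT implements, or would implement, such an optimization.

```csharp
using System;

// Hypothetical check, not part of the runtime or the JIT: brute-forces small
// (length, index) pairs to confirm that whenever the guard from Load(...) lets
// execution continue, both array[index + 0] and array[index + 1] are in bounds.
public static class GuardSketch
{
    // Mirrors the condition from the repro; true means Load would throw.
    public static bool GuardRejects(int length, int index) =>
        (index < 0) || ((length - index) < 2);

    // Counts (length, index) pairs that pass the guard but would still index
    // out of bounds. The argument above says this should always be 0.
    public static int CountUnsafeAcceptedPairs(int maxLength)
    {
        int violations = 0;
        for (int length = 0; length <= maxLength; length++)
        {
            for (int index = -maxLength; index <= 2 * maxLength; index++)
            {
                if (GuardRejects(length, index))
                {
                    continue;
                }

                // Unsigned compare folds "i >= 0 && i < length" into one test.
                bool firstInBounds = (uint)index < (uint)length;
                bool secondInBounds = (uint)(index + 1) < (uint)length;
                if (!firstInBounds || !secondInBounds)
                {
                    violations++;
                }
            }
        }
        return violations;
    }
}
```

For example, `GuardRejects(2, 0)` is `false` (both elements of a two-element array are readable), while `GuardRejects(1, 0)` and `GuardRejects(5, -1)` are `true`.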
Going to close this issue as "by design" given the other improvements and the relatively simple workaround users have (they can use I've opened #80256 to track the broader issue around manual bounds checks not getting recognized/optimized.