JIT: Enable EVEX embedded broadcast in more places #109258

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

saucecontrol
Member

@saucecontrol saucecontrol commented Oct 26, 2024

Factored out the containment checks and included an attempt to use embedded broadcast where possible.
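
To make the effect concrete: with EVEX embedded broadcast, an instruction reads one scalar element from memory and replicates it across every lane, so a vector constant whose lanes are all equal can be stored as a single element. Below is a minimal Python sketch of that equivalence. The helper names (`broadcast_bytes`, `is_broadcastable`) are purely illustrative and are not JIT functions; the `7FFFFFFFh` constant is taken from the diffs in this PR.

```python
import struct

def broadcast_bytes(element: bytes, lanes: int) -> bytes:
    """Model `{1toN}`: replicate one scalar element across N lanes."""
    return element * lanes

def is_broadcastable(vector: bytes, element_size: int) -> bool:
    """True if every element-sized lane of the vector constant is identical."""
    if len(vector) % element_size != 0:
        return False
    first = vector[:element_size]
    return vector == first * (len(vector) // element_size)

# 128-bit constant as emitted before the change:
#   dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
full_vector = struct.pack("<QQ", 0x7FFFFFFF7FFFFFFF, 0x7FFFFFFF7FFFFFFF)
# Single dword emitted after the change (dd 7FFFFFFFh), read with {1to4}
scalar = struct.pack("<I", 0x7FFFFFFF)

print(is_broadcastable(full_vector, 4))            # True
print(broadcast_bytes(scalar, 4) == full_vector)   # True
print(len(full_vector), "->", len(scalar), "bytes of constant data")
```

The sketch only models the data layout; whether a given instruction can actually take a broadcast operand depends on the containment checks mentioned above.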

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 26, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 26, 2024
@saucecontrol saucecontrol changed the title Enable EVEX embedded broadcast in more places JIT: Enable EVEX embedded broadcast in more places Oct 27, 2024
@saucecontrol saucecontrol marked this pull request as ready for review October 27, 2024 01:19
@saucecontrol
Member Author

I expected there might be a couple of diffs from this one. My wild guess is that SPMI failed because the embedded-broadcast diffs would contain an operand like qword ptr [reloc @RWD00] {1to8} in the asm, which may have fouled up the JSON parsing.

@MichalPetryka
Contributor

@MihuBot

@saucecontrol
Member Author

@MihuBot -intel

@tannergooding
Member

CC. @dotnet/jit-contrib for secondary review

@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Oct 28, 2024
@JulieLeeMSFT
Member

@BruceForstall PTAL for code review.

@BruceForstall
Member

/azp run runtime-coreclr superpmi-diffs


Azure Pipelines successfully started running 1 pipeline(s).

@saucecontrol
Member Author

Ah, you fixed spmi 🎉

ASM diffs generated on windows x64

linux arm64

Diffs are based on 2,324,354 contexts (942,241 MinOpts, 1,382,113 FullOpts).

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| coreclr_tests.run.linux.arm64.checked.mch | 670,230 | 419,865 | 250,365 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 283,878 | 6 | 283,872 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.arm64.Release.mch | 783,707 | 522,351 | 261,356 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 283,878 | 6 | 283,872 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 283,878 | 6 | 283,872 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 18,783 | 7 | 18,776 | 0 (0.00%) | 0 (0.00%) |
| | 2,324,354 | 942,241 | 1,382,113 | 0 (0.00%) | 0 (0.00%) |

linux x64

Diffs are based on 2,352,092 contexts (925,689 MinOpts, 1,426,403 FullOpts).

Overall (+39 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.linux.x64.checked.mch | 400,125,258 | +3 | 0.00% |
| libraries.pmi.linux.x64.checked.mch | 59,499,926 | +0 | 0.00% |
| libraries_tests.run.linux.x64.Release.mch | 372,437,145 | +12 | 0.00% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 144,246,049 | +24 | 0.00% |
| libraries.pmi.linux.x64.checked.mch | 59,499,926 | +0 | 0.00% |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,141,216 | +0 | 0.00% |

MinOpts (+12 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.linux.x64.checked.mch | 278,559,249 | +0 | 0.00% |
| libraries_tests.run.linux.x64.Release.mch | 201,259,057 | +12 | 0.00% |

FullOpts (+27 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.linux.x64.checked.mch | 121,566,009 | +3 | 0.00% |
| libraries.pmi.linux.x64.checked.mch | 59,387,697 | +0 | 0.00% |
| libraries_tests.run.linux.x64.Release.mch | 171,178,088 | +0 | 0.00% |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 133,315,529 | +24 | 0.00% |
| libraries.pmi.linux.x64.checked.mch | 59,387,697 | +0 | 0.00% |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,140,186 | +0 | 0.00% |

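
As a quick sanity check on the linux x64 summary, the per-collection deltas should add up to the section totals, and MinOpts plus FullOpts should equal the overall figure. A small Python sketch with the numbers copied from the summary (the list names are just illustrative):

```python
# Size deltas in bytes, copied row by row from the linux x64 summary.
overall = [3, 0, 12, 24, 0, 0]
min_opts = [0, 12]
full_opts = [3, 0, 0, 24, 0, 0]

assert sum(min_opts) == 12
assert sum(full_opts) == 27
assert sum(overall) == sum(min_opts) + sum(full_opts)

print(sum(overall))  # 39
```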
Example diffs
coreclr_tests.run.linux.x64.checked.mch
+0 (0.00%) : 1629.dasm - System.Number+Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):System.Number+DiyFp (Instrumented Tier1)
@@ -96,7 +96,7 @@ G_M18819_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=8000 {r15}, byr
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si edi, xmm0
  vpbroadcastd xmm0, edi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
  vmovd r13d, xmm0
  add r13d, 347
  mov r12d, r13d
@@ -181,7 +181,7 @@ RWD00 dq 3FD34413509F79FFh ; 0.301029996
 RWD08 dd 00000000h, 00000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 338, prolog size 19, PerfScore 90.98, instruction count 74, allocated bytes for code 339 (MethodHash=763bb67c) for method System.Number+Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):System.Number+DiyFp (Instrumented Tier1)
+0 (0.00%) : 6145.dasm - System.Number:Dragon4(ulong,int,uint,ubyte,int,ubyte,System.Span`1[ubyte],byref):uint (Tier1)
@@ -264,7 +264,7 @@ G_M41408_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=2000 {r13}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13 vcvttsd2si edi, xmm0 vpbroadcastd xmm0, edi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
vmovd eax, xmm0 test eax, eax jg G_M41408_IG68 @@ -1084,7 +1084,7 @@ RWD00 dq 3FD34413509F79FFh ; 0.301029996 RWD08 dq 3FE6147AE147AE14h ; 0.69 RWD16 dq 0000000000000088h, 0000000000000000h RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 3168, prolog size 66, PerfScore 367.76, instruction count 625, allocated bytes for code 3169 (MethodHash=ef215e3f) for method System.Number:Dragon4(ulong,int,uint,ubyte,int,ubyte,System.Span`1[ubyte],byref):uint (Tier1)
+0 (0.00%) : 6305.dasm - testout1:Func_0_1_2(testout1+VT_0_1_2):long (MinOpts)
@@ -268,7 +268,7 @@ G_M28902_IG07: ; bbWeight=1, extend vmovsd xmm0, qword ptr [rbp-0x100] vcvttsd2si rax, xmm0 vpbroadcastq xmm0, rax
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD32] {1to2}
vmovd rax, xmm0 ;; size=344 bbWeight=1 PerfScore 132.00 G_M28902_IG08: ; bbWeight=1, extend @@ -317,7 +317,7 @@ G_M28902_IG11: ; bbWeight=1, epilog, nogc, extend ;; size=6 bbWeight=1 PerfScore 2.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 43E0000000000000h, 43E0000000000000h
-RWD32 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD32 dq 7FFFFFFFFFFFFFFFh
; Total bytes of code 1027, prolog size 88, PerfScore 297.75, instruction count 197, allocated bytes for code 1028 (MethodHash=f9ab8f19) for method testout1:Func_0_1_2(testout1+VT_0_1_2):long (MinOpts)
+0 (0.00%) : 556112.dasm - TestApp:test_3_5(double):double (FullOpts)
@@ -55,7 +55,7 @@ G_M29212_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si edx, xmm0 vpbroadcastd xmm0, edx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd edx, xmm0 lea r8d, [rdx-0x01] mov r9d, r8d @@ -144,7 +144,7 @@ G_M29212_IG05: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 315, prolog size 1, PerfScore 130.25, instruction count 85, allocated bytes for code 316 (MethodHash=82508de3) for method TestApp:test_3_5(double):double (FullOpts)
+2 (+0.16%) : 514325.dasm - Packet256Tracer:CreateDefaultScene():Scene (FullOpts)
@@ -419,7 +419,7 @@ G_M36222_IG04: ; bbWeight=1, extend vmovups ymm0, ymmword ptr [reloc @RWD224] vmovups ymm1, ymmword ptr [reloc @RWD256] vmovups ymm2, ymmword ptr [reloc @RWD288]
- vsqrtps ymm3, ymmword ptr [reloc @RWD320]
+ vsqrtps ymm3, dword ptr [reloc @RWD320] {1to8}
vdivps ymm0, ymm0, ymm3 vdivps ymm1, ymm1, ymm3 vdivps ymm2, ymm2, ymm3 @@ -446,7 +446,7 @@ G_M36222_IG04: ; bbWeight=1, extend vaddps ymm4, ymm8, ymm4 vsqrtps ymm4, ymm4 vdivps ymm7, ymm7, ymm4
- ;; size=348 bbWeight=1 PerfScore 196.42
+ ;; size=350 bbWeight=1 PerfScore 196.42
G_M36222_IG05: ; bbWeight=1, extend vdivps ymm5, ymm5, ymm4 vdivps ymm3, ymm3, ymm4 @@ -572,14 +572,16 @@ RWD208 dq 3F4CCCCD3F4CCCCDh, 3F4CCCCD3F4CCCCDh RWD224 dq C0566666C0566666h, C0566666C0566666h, C0566666C0566666h, C0566666C0566666h RWD256 dq BFC00000BFC00000h, BFC00000BFC00000h, BFC00000BFC00000h, BFC00000BFC00000h RWD288 dq C0700000C0700000h, C0700000C0700000h, C0700000C0700000h, C0700000C0700000h
-RWD320 dq 41DC47AE41DC47AEh, 41DC47AE41DC47AEh, 41DC47AE41DC47AEh, 41DC47AE41DC47AEh
+RWD320 dd 41DC47AEh ; 27.535
+RWD324 dd 00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
+ dd 00000000h
RWD352 dq BF800000BF800000h, BF800000BF800000h, BF800000BF800000h, BF800000BF800000h RWD384 dq 3FC000003FC00000h, 3FC000003FC00000h, 3FC000003FC00000h, 3FC000003FC00000h RWD416 dq 4030000040300000h, 4030000040300000h RWD432 dq 4070000040700000h, 4070000040700000h
-; Total bytes of code 1252, prolog size 23, PerfScore 507.50, instruction count 217, allocated bytes for code 1252 (MethodHash=a0627281) for method Packet256Tracer:CreateDefaultScene():Scene (FullOpts)
+; Total bytes of code 1254, prolog size 23, PerfScore 507.50, instruction count 217, allocated bytes for code 1254 (MethodHash=a0627281) for method Packet256Tracer:CreateDefaultScene():Scene (FullOpts)
; ============================================================ Unwind Info:
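
The `RWD324` filler in this diff follows from alignment: the broadcast constant at `RWD320` shrinks from 32 bytes to 4, but the next constant still starts at `RWD352`, so zero dwords fill the gap. A short Python sketch of the arithmetic, assuming the following ymmword constant is 32-byte aligned (inferred from the offsets shown here, not from the JIT source); `align_up` is an illustrative helper:

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

start = 320     # offset of RWD320
new_size = 4    # one dword after the change (was 32 bytes)

next_offset = align_up(start + new_size, 32)
padding = next_offset - (start + new_size)

print(next_offset, padding, padding // 4)  # 352 28 7
```

The seven padding dwords match the `RWD324` lines in the diff, which is why the data section does not get smaller in this particular method.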
+1 (+0.18%) : 505847.dasm - VectorMathTests.Program:TestEntryPoint():int (FullOpts)
@@ -86,7 +86,7 @@ G_M13424_IG04: ; bbWeight=1, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byr ;; size=23 bbWeight=1 PerfScore 6.50 G_M13424_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ; gcrRegs -[rax]
- vpabsd ymm0, ymmword ptr [reloc @RWD00]
+ vpabsd ymm0, dword ptr [reloc @RWD00] {1to8}
vpextrd edi, xmm0, 3 cmp edi, 1 jne G_M13424_IG19 @@ -98,19 +98,19 @@ G_M13424_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr vmovups ymmword ptr [rbp-0x30], ymm0 cmp qword ptr [rbp-0x28], 11 jne G_M13424_IG19
- vmovsd xmm0, qword ptr [reloc @RWD32]
+ vmovsd xmm0, qword ptr [reloc @RWD08]
vmovsd qword ptr [rbp-0x50], xmm0 vmovsd qword ptr [rbp-0x48], xmm0 vmovsd qword ptr [rbp-0x40], xmm0 vmovsd qword ptr [rbp-0x38], xmm0 vmovups ymm0, ymmword ptr [rbp-0x50]
- vandpd ymm0, ymm0, qword ptr [reloc @RWD40] {1to4}
+ vandpd ymm0, ymm0, qword ptr [reloc @RWD16] {1to4}
vmovups ymmword ptr [rbp-0x50], ymm0 vmovsd xmm0, qword ptr [rbp-0x50]
- vucomisd xmm0, qword ptr [reloc @RWD48]
+ vucomisd xmm0, qword ptr [reloc @RWD24]
jp G_M13424_IG19 jne G_M13424_IG19
- ;; size=155 bbWeight=0.50 PerfScore 19.62
+ ;; size=156 bbWeight=0.50 PerfScore 19.62
G_M13424_IG06: ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref xor ebx, ebx ;; size=2 bbWeight=0.25 PerfScore 0.06 @@ -144,10 +144,10 @@ G_M13424_IG10: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, ;; size=21 bbWeight=4 PerfScore 13.00 G_M13424_IG11: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70]
- vandps ymm0, ymm0, dword ptr [reloc @RWD56] {1to8}
+ vandps ymm0, ymm0, dword ptr [reloc @RWD32] {1to8}
vmovups ymmword ptr [rbp-0x70], ymm0 vmovss xmm0, dword ptr [rbp-0x64]
- vucomiss xmm0, dword ptr [reloc @RWD60]
+ vucomiss xmm0, dword ptr [reloc @RWD36]
jp G_M13424_IG19 jne G_M13424_IG19 ;; size=45 bbWeight=0.50 PerfScore 8.00 @@ -210,15 +210,16 @@ G_M13424_IG20: ; bbWeight=0.50, epilog, nogc, extend pop rbp ret ;; size=13 bbWeight=0.50 PerfScore 1.62
-RWD00 dq 0000000100000001h, 0000000100000001h, 0000000100000001h, 0000000100000001h
-RWD32 dq C059000000000000h ; -100
-RWD40 dq 7FFFFFFFFFFFFFFFh ; nan
-RWD48 dq 4059000000000000h ; 100
-RWD56 dd 7FFFFFFFh ; nan
-RWD60 dd 41B00000h ; 22
+RWD00 dd 00000001h
+RWD04 dd 00000000h
+RWD08 dq C059000000000000h ; -100
+RWD16 dq 7FFFFFFFFFFFFFFFh ; nan
+RWD24 dq 4059000000000000h ; 100
+RWD32 dd 7FFFFFFFh ; nan
+RWD36 dd 41B00000h ; 22
-; Total bytes of code 555, prolog size 53, PerfScore 117.48, instruction count 109, allocated bytes for code 555 (MethodHash=897bcb8f) for method VectorMathTests.Program:TestEntryPoint():int (FullOpts)
+; Total bytes of code 556, prolog size 53, PerfScore 117.48, instruction count 109, allocated bytes for code 556 (MethodHash=897bcb8f) for method VectorMathTests.Program:TestEntryPoint():int (FullOpts)
; ============================================================ Unwind Info:
libraries.pmi.linux.x64.checked.mch
+0 (0.00%) : 9181.dasm - Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[short](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],short[]):short (FullOpts)
@@ -51,7 +51,7 @@ G_M8967_IG02: ; bbWeight=1, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd eax, xmm0 cmp eax, r15d jae G_M8967_IG06 @@ -119,7 +119,7 @@ G_M8967_IG06: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 263, prolog size 12, PerfScore 50.33, instruction count 61, allocated bytes for code 264 (MethodHash=9b04dcf8) for method Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[short](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],short[]):short (FullOpts)
+0 (0.00%) : 9185.dasm - Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[long](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],long[]):long (FullOpts)
@@ -51,7 +51,7 @@ G_M831_IG02: ; bbWeight=1, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, b vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd eax, xmm0 cmp eax, r15d jae G_M831_IG06 @@ -119,7 +119,7 @@ G_M831_IG06: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 262, prolog size 12, PerfScore 48.33, instruction count 61, allocated bytes for code 263 (MethodHash=e694fcc0) for method Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[long](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],long[]):long (FullOpts)
+0 (0.00%) : 9201.dasm - Microsoft.FSharp.Collections.ArrayModule:RandomSampleBy[System.__Canon](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],int,System.__Canon[]):System.__Canon[] (FullOpts)
@@ -155,7 +155,7 @@ G_M49895_IG07: ; bbWeight=4, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si edx, xmm0 vpbroadcastd xmm0, edx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd ecx, xmm0 mov edx, ecx mov ecx, dword ptr [r15+0x08] @@ -245,7 +245,7 @@ G_M49895_IG13: ; bbWeight=4, gcrefRegs=A008 {rbx r13 r15}, byrefRegs=0000 vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si edx, xmm0 vpbroadcastd xmm0, edx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd eax, xmm0 lea rdx, [rbp-0x48] mov rdi, r13 @@ -459,7 +459,7 @@ G_M49895_IG19: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 1112, prolog size 33, PerfScore 470.78, instruction count 252, allocated bytes for code 1114 (MethodHash=1a363d18) for method Microsoft.FSharp.Collections.ArrayModule:RandomSampleBy[System.__Canon](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],int,System.__Canon[]):System.__Canon[] (FullOpts)
+0 (0.00%) : 275964.dasm - System.Threading.Barrier:SignalAndWait(System.TimeSpan,System.Threading.CancellationToken):ubyte:this (FullOpts)
@@ -59,7 +59,7 @@ G_M43881_IG04: ; bbWeight=1, gcrefRegs=0084 {rdx rdi}, byrefRegs=0000 {}, vcmppd k1, xmm1, xmmword ptr [reloc @RWD48], 13 vcvttsd2si rsi, xmm1 vpbroadcastq xmm1, rsi
- vpblendmq xmm1 {k1}, xmm1, xmmword ptr [reloc @RWD64]
+ vpblendmq xmm1 {k1}, xmm1, qword ptr [reloc @RWD64] {1to2}
vmovd rsi, xmm1 cmp rsi, -1 jl G_M43881_IG12 @@ -78,7 +78,7 @@ G_M43881_IG06: ; bbWeight=1, gcrefRegs=0084 {rdx rdi}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD80], 13 vcvttsd2si esi, xmm0 vpbroadcastd xmm0, esi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD96]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD96] {1to4}
vmovd esi, xmm0 call [<unknown method>] ; gcrRegs -[rdx rdi] @@ -149,9 +149,10 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dd 00000000h, 00000000h RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 43E0000000000000h, 43E0000000000000h
-RWD64 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD64 dq 7FFFFFFFFFFFFFFFh
+RWD72 dd 00000000h, 00000000h
RWD80 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD96 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD96 dd 7FFFFFFFh
; Total bytes of code 357, prolog size 12, PerfScore 88.33, instruction count 74, allocated bytes for code 359 (MethodHash=58165496) for method System.Threading.Barrier:SignalAndWait(System.TimeSpan,System.Threading.CancellationToken):ubyte:this (FullOpts)
+0 (0.00%) : 276656.dasm - System.Text.RegularExpressions.RegexRunner:<InitializeTimeout>g__ConfigureTimeout|24_0(System.TimeSpan):this (FullOpts)
@@ -46,7 +46,7 @@ G_M45711_IG04: ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD48], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD64]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
vmovd dword ptr [rbx+0x64], xmm0 call <unknown method> movsxd rcx, dword ptr [rbx+0x64] @@ -73,7 +73,7 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dq 3FE0000000000000h ; 0.5 RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD64 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD64 dd 7FFFFFFFh
; Total bytes of code 154, prolog size 8, PerfScore 65.58, instruction count 33, allocated bytes for code 155 (MethodHash=c23a4d70) for method System.Text.RegularExpressions.RegexRunner:<InitializeTimeout>g__ConfigureTimeout|24_0(System.TimeSpan):this (FullOpts)
+0 (0.00%) : 279404.dasm - Microsoft.CodeAnalysis.Collections.SegmentedList`1[ubyte]:TrimExcess():this (FullOpts)
@@ -30,7 +30,7 @@ G_M63820_IG02: ; bbWeight=1, gcrefRegs=0080 {rdi}, byrefRegs=0000 {}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
vmovd eax, xmm0 cmp esi, eax jl SHORT G_M63820_IG04 @@ -47,7 +47,7 @@ RWD00 dq 3FECCCCCCCCCCCCDh ; 0.9 RWD08 dd 00000000h, 00000000h RWD16 dq 0000000000000088h, 0000000000000000h RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 77, prolog size 0, PerfScore 38.08, instruction count 14, allocated bytes for code 78 (MethodHash=1c0206b3) for method Microsoft.CodeAnalysis.Collections.SegmentedList`1[ubyte]:TrimExcess():this (FullOpts)
libraries_tests.run.linux.x64.Release.mch
+0 (0.00%) : 27377.dasm - System.Threading.Tasks.Task:Delay(System.TimeSpan,System.Threading.CancellationToken):System.Threading.Tasks.Task (Instrumented Tier1)
@@ -117,7 +117,7 @@ G_M56119_IG07: ; bbWeight=1, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD48], 13 vcvttsd2si rdi, xmm0 vpbroadcastq xmm0, rdi
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD64]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD64] {1to2}
vmovd r14, xmm0 cmp r14, -1 jl G_M56119_IG32 @@ -378,7 +378,7 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dd 00000000h, 00000000h RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 43E0000000000000h, 43E0000000000000h
-RWD64 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD64 dq 7FFFFFFFFFFFFFFFh
; Total bytes of code 715, prolog size 23, PerfScore 111.65, instruction count 163, allocated bytes for code 716 (MethodHash=29b424c8) for method System.Threading.Tasks.Task:Delay(System.TimeSpan,System.Threading.CancellationToken):System.Threading.Tasks.Task (Instrumented Tier1)
+0 (0.00%) : 27389.dasm - System.Threading.Tasks.Task:Delay(System.TimeSpan,System.TimeProvider,System.Threading.CancellationToken):System.Threading.Tasks.Task (Instrumented Tier1)
@@ -61,7 +61,7 @@ G_M63748_IG07: ; bbWeight=1, gcrefRegs=0044 {rdx rsi}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD48], 13 vcvttsd2si rdi, xmm0 vpbroadcastq xmm0, rdi
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD64]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD64] {1to2}
vmovd rdi, xmm0 cmp rdi, -1 jl SHORT G_M63748_IG10 @@ -100,7 +100,7 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dd 00000000h, 00000000h RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 43E0000000000000h, 43E0000000000000h
-RWD64 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD64 dq 7FFFFFFFFFFFFFFFh
; Total bytes of code 194, prolog size 4, PerfScore 60.21, instruction count 42, allocated bytes for code 195 (MethodHash=d1bc06fb) for method System.Threading.Tasks.Task:Delay(System.TimeSpan,System.TimeProvider,System.Threading.CancellationToken):System.Threading.Tasks.Task (Instrumented Tier1)
+0 (0.00%) : 40657.dasm - Microsoft.CodeAnalysis.CSharp.Binder:DoUncheckedConversion(byte,Microsoft.CodeAnalysis.ConstantValue):System.Object (Instrumented Tier0)
@@ -3355,7 +3355,7 @@ G_M30586_IG202: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si eax, xmm0 ; gcrRegs -[rax] vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD816]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD816] {1to4}
vmovd eax, xmm0 mov rcx, gword ptr [rbp-0x178] ; gcrRegs +[rcx] @@ -3383,7 +3383,7 @@ G_M30586_IG203: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si eax, xmm0 ; gcrRegs -[rax] vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD816]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD816] {1to4}
vmovd eax, xmm0 mov rcx, gword ptr [rbp-0x188] ; gcrRegs +[rcx] @@ -3411,7 +3411,7 @@ G_M30586_IG204: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si eax, xmm0 ; gcrRegs -[rax] vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD816]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD816] {1to4}
vmovd eax, xmm0 mov rcx, gword ptr [rbp-0x168] ; gcrRegs +[rcx] @@ -3481,7 +3481,7 @@ G_M30586_IG207: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si eax, xmm0 ; gcrRegs -[rax] vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD816]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD816] {1to4}
vmovd eax, xmm0 mov rcx, gword ptr [rbp-0x180] ; gcrRegs +[rcx] @@ -3509,7 +3509,7 @@ G_M30586_IG208: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si eax, xmm0 ; gcrRegs -[rax] vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD816]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD816] {1to4}
vmovd eax, xmm0 mov rcx, gword ptr [rbp-0x170] ; gcrRegs +[rcx] @@ -3537,7 +3537,7 @@ G_M30586_IG209: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si eax, xmm0 ; gcrRegs -[rax] vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD816]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD816] {1to4}
vmovd eax, xmm0 mov rcx, gword ptr [rbp-0x160] ; gcrRegs +[rcx] @@ -3565,7 +3565,7 @@ G_M30586_IG210: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si rax, xmm0 ; gcrRegs -[rax] vpbroadcastq xmm0, rax
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD864]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD864] {1to2}
vmovd rax, xmm0 mov rcx, gword ptr [rbp-0x150] ; gcrRegs +[rcx] @@ -3593,7 +3593,7 @@ G_M30586_IG211: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttsd2si eax, xmm0 ; gcrRegs -[rax] vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD816]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD816] {1to4}
vmovd eax, xmm0 mov rcx, gword ptr [rbp-0x110] ; gcrRegs +[rcx] @@ -3794,7 +3794,7 @@ G_M30586_IG221: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M30586_IG222: ; bbWeight=0.94, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref mov eax, dword ptr [rbp-0x764] mov eax, eax
- lea rcx, [reloc @RWD880]
+ lea rcx, [reloc @RWD872]
mov ecx, dword ptr [rcx+4*rax] lea rdx, G_M30586_IG02 add rcx, rdx @@ -4334,11 +4334,12 @@ RWD720 dd G_M30586_IG203 - G_M30586_IG02 RWD780 dd 00000000h RWD784 dq 0000000000000088h, 0000000000000000h RWD800 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD816 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD816 dd 7FFFFFFFh
+RWD820 dd 00000000h, 00000000h, 00000000h
RWD832 dq 0000000008080088h, 0000000000000000h RWD848 dq 43E0000000000000h, 43E0000000000000h
-RWD864 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
-RWD880 dd G_M30586_IG225 - G_M30586_IG02
+RWD864 dq 7FFFFFFFFFFFFFFFh
+RWD872 dd G_M30586_IG225 - G_M30586_IG02
dd G_M30586_IG229 - G_M30586_IG02 dd G_M30586_IG224 - G_M30586_IG02 dd G_M30586_IG230 - G_M30586_IG02
+0 (0.00%) : 817256.dasm - System.Threading.PortableThreadPool:AdjustMaxWorkersActive():this (Tier1)
@@ -264,11 +264,11 @@ G_M11351_IG15: ; bbWeight=0.88, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13 vcvttsd2si rax, xmm0 vpbroadcastq xmm0, rax
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD48] {1to2}
vmovd rax, xmm0 vxorps xmm0, xmm0, xmm0 vcvtsi2sd xmm0, xmm0, rax
- vdivsd xmm0, xmm0, qword ptr [reloc @RWD64]
+ vdivsd xmm0, xmm0, qword ptr [reloc @RWD56]
vmovsd qword ptr [rbp-0x30], xmm0 cmp dword ptr [(reloc)], 0 jne G_M11351_IG19 @@ -283,7 +283,7 @@ G_M11351_IG16: ; bbWeight=0.88, gcVars=0000000000000402 {V01 V39}, gcrefR vxorps xmm1, xmm1, xmm1 vcvtsi2sd xmm1, xmm1, eax vmovsd xmm0, qword ptr [rbp-0x30]
- vmulsd xmm2, xmm0, qword ptr [reloc @RWD72]
+ vmulsd xmm2, xmm0, qword ptr [reloc @RWD64]
vucomisd xmm2, xmm1 jb G_M11351_IG20 call <unknown method> @@ -567,9 +567,9 @@ RWD00 dq 3F847AE147AE147Bh ; 0.01 RWD08 dd 00000000h, 00000000h RWD16 dq 0000000000000088h, 0000000000000000h RWD32 dq 43E0000000000000h, 43E0000000000000h
-RWD48 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
-RWD64 dq 416312D000000000h ; 10000000
-RWD72 dq 408F400000000000h ; 1000
+RWD48 dq 7FFFFFFFFFFFFFFFh
+RWD56 dq 416312D000000000h ; 10000000
+RWD64 dq 408F400000000000h ; 1000
; Total bytes of code 1167, prolog size 32, PerfScore 258.65, instruction count 283, allocated bytes for code 1168 (MethodHash=ea5cd3a8) for method System.Threading.PortableThreadPool:AdjustMaxWorkersActive():this (Tier1)
+6 (+4.26%) : 557002.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:ConvertToInt32NativeTest():this (Tier0)
@@ -21,23 +21,23 @@ G_M48973_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vcvttss2si rax, dword ptr [reloc @RWD00]
  vpbroadcastd ymm0, eax
  vmovups ymmword ptr [rsp], ymm0
- vcvttps2dq ymm0, ymmword ptr [reloc @RWD32]
+ vcvttps2dq ymm0, dword ptr [reloc @RWD00] {1to8}
  vmovups ymmword ptr [rsp+0x20], ymm0
  call [<unknown method>]
- vcvttss2si rax, dword ptr [reloc @RWD64]
+ vcvttss2si rax, dword ptr [reloc @RWD04]
  vpbroadcastd ymm0, eax
  vmovups ymmword ptr [rsp], ymm0
- vcvttps2dq ymm0, ymmword ptr [reloc @RWD96]
+ vcvttps2dq ymm0, dword ptr [reloc @RWD04] {1to8}
  vmovups ymmword ptr [rsp+0x20], ymm0
  call [<unknown method>]
- vcvttss2si rax, dword ptr [reloc @RWD128]
+ vcvttss2si rax, dword ptr [reloc @RWD08]
  vpbroadcastd ymm0, eax
  vmovups ymmword ptr [rsp], ymm0
- vcvttps2dq ymm0, ymmword ptr [reloc @RWD160]
+ vcvttps2dq ymm0, dword ptr [reloc @RWD08] {1to8}
  vmovups ymmword ptr [rsp+0x20], ymm0
  call [<unknown method>]
  nop
- ;; size=118 bbWeight=1 PerfScore 60.25
+ ;; size=124 bbWeight=1 PerfScore 60.25
 G_M48973_IG03: ; bbWeight=1, epilog, nogc, extend
  vzeroupper
  add rsp, 80
@@ -45,20 +45,11 @@ G_M48973_IG03: ; bbWeight=1, epilog, nogc, extend
  ret
  ;; size=9 bbWeight=1 PerfScore 2.75
 RWD00 dd FF7FFFFFh ; -3.40282e+38
-RWD04 dd 00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
- dd 00000000h
-RWD32 dq FF7FFFFFFF7FFFFFh, FF7FFFFFFF7FFFFFh, FF7FFFFFFF7FFFFFh, FF7FFFFFFF7FFFFFh
-RWD64 dd 40266666h ; 2.6
-RWD68 dd 00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
- dd 00000000h
-RWD96 dq 4026666640266666h, 4026666640266666h, 4026666640266666h, 4026666640266666h
-RWD128 dd 7F7FFFFFh ; 3.40282e+38
-RWD132 dd 00000000h, 00000000h, 00000000h, 00000000h, 00000000h, 00000000h
- dd 00000000h
-RWD160 dq 7F7FFFFF7F7FFFFFh, 7F7FFFFF7F7FFFFFh, 7F7FFFFF7F7FFFFFh, 7F7FFFFF7F7FFFFFh
+RWD04 dd 40266666h ; 2.6
+RWD08 dd 7F7FFFFFh ; 3.40282e+38
-; Total bytes of code 141, prolog size 10, PerfScore 65.75, instruction count 27, allocated bytes for code 141 (MethodHash=f33940b2) for method System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:ConvertToInt32NativeTest():this (Tier0)
+; Total bytes of code 147, prolog size 10, PerfScore 65.75, instruction count 27, allocated bytes for code 147 (MethodHash=f33940b2) for method System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:ConvertToInt32NativeTest():this (Tier0)
; ============================================================ Unwind Info:
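
The code-size regressions in these diffs (+6 here, +2 and +1 in the earlier examples) are consistent with the prefix-length difference between VEX and EVEX encodings: an EVEX prefix is always 4 bytes, while a VEX prefix is 2 or 3 bytes. The sketch below is a rough Python model, assuming the baseline instructions used the shortest available VEX prefix and that nothing else in the encoding changes. The function names are illustrative and do not model the JIT emitter.

```python
EVEX_PREFIX_LEN = 4  # EVEX prefixes are always 4 bytes

def vex_prefix_len(opcode_map: str) -> int:
    """The 2-byte VEX form only covers the 0F opcode map (and only when the
    X, B and W bits are not needed, as with a RIP-relative operand);
    the 0F38 and 0F3A maps require the 3-byte form."""
    return 2 if opcode_map == "0F" else 3

def size_delta(opcode_map: str, count: int = 1) -> int:
    """Extra bytes when `count` VEX instructions are re-encoded with EVEX."""
    return (EVEX_PREFIX_LEN - vex_prefix_len(opcode_map)) * count

print(size_delta("0F", 3))  # three vcvttps2dq -> 6
print(size_delta("0F"))     # one vsqrtps      -> 2
print(size_delta("0F38"))   # one vpabsd       -> 1
```

The `vpblendmd`/`vpblendmq` cases show +0 because those instructions already use an opmask register and therefore were EVEX-encoded before the change. In this method the trade is 6 bytes of code for a constant area that shrinks from 192 bytes to 12.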
+6 (+4.35%) : 557115.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (Tier0)
@@ -21,40 +21,34 @@ G_M999_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttss2si rax, dword ptr [reloc @RWD00] vpbroadcastd xmm0, eax vmovups xmmword ptr [rsp], xmm0
- vcvttps2dq xmm0, xmmword ptr [reloc @RWD16]
+ vcvttps2dq xmm0, dword ptr [reloc @RWD00] {1to4}
vmovups xmmword ptr [rsp+0x10], xmm0 call [<unknown method>]
- vcvttss2si rax, dword ptr [reloc @RWD32]
+ vcvttss2si rax, dword ptr [reloc @RWD04]
vpbroadcastd xmm0, eax vmovups xmmword ptr [rsp], xmm0
- vcvttps2dq xmm0, xmmword ptr [reloc @RWD48]
+ vcvttps2dq xmm0, dword ptr [reloc @RWD04] {1to4}
vmovups xmmword ptr [rsp+0x10], xmm0 call [<unknown method>]
- vcvttss2si rax, dword ptr [reloc @RWD64]
+ vcvttss2si rax, dword ptr [reloc @RWD08]
vpbroadcastd xmm0, eax vmovups xmmword ptr [rsp], xmm0
- vcvttps2dq xmm0, xmmword ptr [reloc @RWD80]
+ vcvttps2dq xmm0, dword ptr [reloc @RWD08] {1to4}
vmovups xmmword ptr [rsp+0x10], xmm0 call [<unknown method>] nop
- ;; size=118 bbWeight=1 PerfScore 57.25
+ ;; size=124 bbWeight=1 PerfScore 57.25
G_M999_IG03: ; bbWeight=1, epilog, nogc, extend add rsp, 48 pop rbp ret ;; size=6 bbWeight=1 PerfScore 1.75 RWD00 dd FF7FFFFFh ; -3.40282e+38
-RWD04 dd 00000000h, 00000000h, 00000000h
-RWD16 dq FF7FFFFFFF7FFFFFh, FF7FFFFFFF7FFFFFh
-RWD32 dd 40266666h ; 2.6
-RWD36 dd 00000000h, 00000000h, 00000000h
-RWD48 dq 4026666640266666h, 4026666640266666h
-RWD64 dd 7F7FFFFFh ; 3.40282e+38
-RWD68 dd 00000000h, 00000000h, 00000000h
-RWD80 dq 7F7FFFFF7F7FFFFFh, 7F7FFFFF7F7FFFFFh
+RWD04 dd 40266666h ; 2.6
+RWD08 dd 7F7FFFFFh ; 3.40282e+38
-; Total bytes of code 138, prolog size 10, PerfScore 61.75, instruction count 26, allocated bytes for code 138 (MethodHash=36e8fc18) for method System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (Tier0)
+; Total bytes of code 144, prolog size 10, PerfScore 61.75, instruction count 26, allocated bytes for code 144 (MethodHash=36e8fc18) for method System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (Tier0)
; ============================================================ Unwind Info:
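To make the data-section change in the Vector128 diff above concrete, here is a small Python sketch. It is not JIT code, and the helper name `broadcast` is made up for illustration. It checks that repeating the 4-byte constant at RWD00 four times reproduces the 16-byte constant the base code loaded from RWD16, which is the effect a `{1to4}` memory operand is expected to have. It then tallies the read-only data bytes shown in the listing, assuming the listing is complete and ignoring any alignment padding that is not shown.

```python
import struct


def broadcast(element: bytes, count: int) -> bytes:
    """Repeat one element `count` times, mimicking a {1toN} memory operand."""
    return element * count


# Constants copied from the Vector128 ConvertToInt32NativeTest listing.
old_rwd16 = struct.pack("<2Q", 0xFF7FFFFFFF7FFFFF, 0xFF7FFFFFFF7FFFFF)
new_rwd00 = struct.pack("<I", 0xFF7FFFFF)

# The broadcast of the scalar must match the old full-width vector constant.
expanded = broadcast(new_rwd00, 4)
assert expanded == old_rwd16

# Read-only data as listed: base ends at RWD80 + 16 bytes, diff ends at RWD08 + 4 bytes.
old_data_bytes = 80 + 16
new_data_bytes = 8 + 4

# Code size from the "Total bytes of code" lines: 138 before, 144 after.
code_growth = 144 - 138

net_saving = (old_data_bytes - new_data_bytes) - code_growth
print(f"data: {old_data_bytes} -> {new_data_bytes} bytes, code: +{code_growth} bytes, net: -{net_saving} bytes")
```

Under those assumptions the +6 bytes of code reported for this method is outweighed by the smaller constant pool, which the code-size metric does not count.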
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch
+0 (0.00%) : 805.dasm - Xunit.DelegatingLongRunningTestDetectionSink:ThreadWorker():this (FullOpts)
@@ -69,7 +69,7 @@ G_M23588_IG04: ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD64], 13 vcvttsd2si edi, xmm0 vpbroadcastd xmm0, edi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD80]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD80] {1to4}
vmovd r15d, xmm0 mov r14, qword ptr [rbx] mov r13, qword ptr [r14+0x50] @@ -136,7 +136,7 @@ RWD24 dq 4024000000000000h ; 10 RWD32 dq 408F400000000000h, 0000000000000000h RWD48 dq 0000000000000088h, 0000000000000000h RWD64 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD80 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD80 dd 7FFFFFFFh
; Total bytes of code 273, prolog size 16, PerfScore 198.33, instruction count 64, allocated bytes for code 274 (MethodHash=fe77a3db) for method Xunit.DelegatingLongRunningTestDetectionSink:ThreadWorker():this (FullOpts)
+0 (0.00%) : 4377.dasm - Microsoft.CodeAnalysis.Shared.TestHooks.AsynchronousOperationListenerProvider+NullOperationListener:Delay(System.TimeSpan,System.Threading.CancellationToken):System.Threading.Tasks.Task`1[ubyte]:this (FullOpts)
@@ -93,7 +93,7 @@ G_M49748_IG04: ; bbWeight=0.50, gcrefRegs=000C {rdx rbx}, byrefRegs=0000 vcmppd k1, xmm0, xmmword ptr [reloc @RWD48], 13 vcvttsd2si rdi, xmm0 vpbroadcastq xmm0, rdi
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD64]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD64] {1to2}
vmovd rdi, xmm0 cmp rdi, -1 jl G_M49748_IG21 @@ -312,7 +312,7 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dd 00000000h, 00000000h RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 43E0000000000000h, 43E0000000000000h
-RWD64 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD64 dq 7FFFFFFFFFFFFFFFh
; Total bytes of code 698, prolog size 17, PerfScore 74.85, instruction count 153, allocated bytes for code 699 (MethodHash=ba3a3dab) for method Microsoft.CodeAnalysis.Shared.TestHooks.AsynchronousOperationListenerProvider+NullOperationListener:Delay(System.TimeSpan,System.Threading.CancellationToken):System.Threading.Tasks.Task`1[ubyte]:this (FullOpts)
+0 (0.00%) : 23653.dasm - Tests.System.TimeProviderTests+TestExtensionsTaskFactory:WaitAsync(System.Threading.Tasks.Task,System.TimeSpan,System.TimeProvider,System.Threading.CancellationToken):System.Threading.Tasks.Task:this (FullOpts)
@@ -50,7 +50,7 @@ G_M15219_IG04: ; bbWeight=1, gcrefRegs=0142 {rcx rsi r8}, byrefRegs=0000 vcmppd k1, xmm0, xmmword ptr [reloc @RWD48], 13 vcvttsd2si rdi, xmm0 vpbroadcastq xmm0, rdi
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD64]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD64] {1to2}
vmovd rdx, xmm0 cmp rdx, -1 jl SHORT G_M15219_IG08 @@ -105,7 +105,7 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dd 00000000h, 00000000h RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 43E0000000000000h, 43E0000000000000h
-RWD64 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD64 dq 7FFFFFFFFFFFFFFFh
; Total bytes of code 208, prolog size 4, PerfScore 61.33, instruction count 46, allocated bytes for code 209 (MethodHash=1265c48c) for method Tests.System.TimeProviderTests+TestExtensionsTaskFactory:WaitAsync(System.Threading.Tasks.Task,System.TimeSpan,System.TimeProvider,System.Threading.CancellationToken):System.Threading.Tasks.Task:this (FullOpts)
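The `{1to4}` and `{1to2}` suffixes in these diffs follow from the vector width divided by the element width. The sketch below is only an illustration of that arithmetic; `broadcast_suffix` is a hypothetical helper, not a JIT or emitter function.

```python
def broadcast_suffix(vector_bits: int, element_bits: int) -> str:
    """Return the disassembly suffix for broadcasting one element across a vector."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("vector width must be a positive multiple of the element width")
    return "{1to%d}" % (vector_bits // element_bits)


# xmm (128-bit) with dword elements, as in the vpblendmd diffs.
print(broadcast_suffix(128, 32))
# xmm (128-bit) with qword elements, as in the vpblendmq diffs.
print(broadcast_suffix(128, 64))
```

By the same arithmetic, a 512-bit operand with qword elements would be written `{1to8}`.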
+6 (+0.85%) : 205695.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (FullOpts)
@@ -52,7 +52,7 @@ G_M999_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vcvttss2si rdi, dword ptr [reloc @RWD00] vpbroadcastd xmm0, edi vmovaps xmmword ptr [rbp-0x40], xmm0
- vcvttps2dq xmm1, xmmword ptr [reloc @RWD16]
+ vcvttps2dq xmm1, dword ptr [reloc @RWD00] {1to4}
vmovaps xmmword ptr [rbp-0x50], xmm1 mov rbx, 0xD1FFAB1E ; Xunit.Sdk.AssertEqualityComparer`1[System.Runtime.Intrinsics.Vector128`1[int]] mov rdi, rbx @@ -129,12 +129,12 @@ G_M999_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ; gcrRegs +[rdi] call [<unknown method>] ; gcrRegs -[rdi r15]
- vcvttss2si rdi, dword ptr [reloc @RWD32]
+ vcvttss2si rdi, dword ptr [reloc @RWD04]
vpbroadcastd xmm0, edi vmovaps xmmword ptr [rbp-0x60], xmm0
- vcvttps2dq xmm1, xmmword ptr [reloc @RWD48]
+ vcvttps2dq xmm1, dword ptr [reloc @RWD04] {1to4}
vmovaps xmmword ptr [rbp-0x70], xmm1
- ;; size=288 bbWeight=1 PerfScore 67.75
+ ;; size=292 bbWeight=1 PerfScore 67.75
G_M999_IG03: ; bbWeight=1, extend mov rdi, rbx call CORINFO_HELP_NEWSFAST @@ -206,10 +206,10 @@ G_M999_IG03: ; bbWeight=1, extend ; gcrRegs +[rdi] call [<unknown method>] ; gcrRegs -[rdi r15]
- vcvttss2si rdi, dword ptr [reloc @RWD64]
+ vcvttss2si rdi, dword ptr [reloc @RWD08]
vpbroadcastd xmm0, edi vmovaps xmmword ptr [rbp-0x80], xmm0
- vcvttps2dq xmm1, xmmword ptr [reloc @RWD80]
+ vcvttps2dq xmm1, dword ptr [reloc @RWD08] {1to4}
vmovaps xmmword ptr [rbp-0x90], xmm1 mov rdi, rbx call CORINFO_HELP_NEWSFAST @@ -232,7 +232,7 @@ G_M999_IG03: ; bbWeight=1, extend ; byrRegs +[rdi] mov rsi, r15 ; gcrRegs +[rsi]
- ;; size=262 bbWeight=1 PerfScore 57.00
+ ;; size=264 bbWeight=1 PerfScore 57.00
G_M999_IG04: ; bbWeight=1, extend call CORINFO_HELP_ASSIGN_REF ; gcrRegs -[rax rsi r15] @@ -290,17 +290,11 @@ G_M999_IG05: ; bbWeight=1, epilog, nogc, extend ret ;; size=18 bbWeight=1 PerfScore 4.25 RWD00 dd FF7FFFFFh ; -3.40282e+38
-RWD04 dd 00000000h, 00000000h, 00000000h
-RWD16 dq FF7FFFFFFF7FFFFFh, FF7FFFFFFF7FFFFFh
-RWD32 dd 40266666h ; 2.6
-RWD36 dd 00000000h, 00000000h, 00000000h
-RWD48 dq 4026666640266666h, 4026666640266666h
-RWD64 dd 7F7FFFFFh ; 3.40282e+38
-RWD68 dd 00000000h, 00000000h, 00000000h
-RWD80 dq 7F7FFFFF7F7FFFFFh, 7F7FFFFF7F7FFFFFh
+RWD04 dd 40266666h ; 2.6
+RWD08 dd 7F7FFFFFh ; 3.40282e+38
-; Total bytes of code 710, prolog size 25, PerfScore 160.50, instruction count 152, allocated bytes for code 710 (MethodHash=36e8fc18) for method System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (FullOpts)
+; Total bytes of code 716, prolog size 25, PerfScore 160.50, instruction count 152, allocated bytes for code 716 (MethodHash=36e8fc18) for method System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (FullOpts)
; ============================================================ Unwind Info:
+4 (+1.17%) : 205320.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SingleSumTest():this (FullOpts)
@@ -40,23 +40,23 @@ G_M17931_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, ;; size=12 bbWeight=1 PerfScore 5.50 G_M17931_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups xmm0, xmmword ptr [reloc @RWD00]
- vpermilps xmm1, xmmword ptr [reloc @RWD00], -79
+ vpermilps xmm1, dword ptr [reloc @RWD00] {1to4}, -79
vaddps xmm0, xmm1, xmm0 vpermilps xmm1, xmm0, 78 vaddps xmm0, xmm1, xmm0 vmovups xmm1, xmmword ptr [reloc @RWD00]
- vpermilps xmm2, xmmword ptr [reloc @RWD00], -79
+ vpermilps xmm2, dword ptr [reloc @RWD00] {1to4}, -79
vaddps xmm1, xmm2, xmm1 vpermilps xmm2, xmm1, 78 vaddps xmm1, xmm2, xmm1 vaddss xmm0, xmm0, xmm1 vmovups xmm1, xmmword ptr [reloc @RWD00]
- vpermilps xmm2, xmmword ptr [reloc @RWD00], -79
+ vpermilps xmm2, dword ptr [reloc @RWD00] {1to4}, -79
vaddps xmm1, xmm2, xmm1 vpermilps xmm2, xmm1, 78 vaddps xmm1, xmm2, xmm1 vmovups xmm2, xmmword ptr [reloc @RWD00]
- vpermilps xmm3, xmmword ptr [reloc @RWD00], -79
+ vpermilps xmm3, dword ptr [reloc @RWD00] {1to4}, -79
vaddps xmm2, xmm3, xmm2 vpermilps xmm3, xmm2, 78 vaddps xmm2, xmm3, xmm2 @@ -123,7 +123,7 @@ G_M17931_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovss xmm1, dword ptr [rbp-0x1C] mov rdi, rbx ; gcrRegs +[rdi]
- ;; size=314 bbWeight=1 PerfScore 83.25
+ ;; size=318 bbWeight=1 PerfScore 83.25
G_M17931_IG03: ; bbWeight=1, epilog, nogc, extend add rsp, 8 pop rbx @@ -136,7 +136,7 @@ RWD00 dq 3F8000003F800000h, 3F8000003F800000h RWD16 dd 41800000h ; 16
-; Total bytes of code 342, prolog size 12, PerfScore 93.00, instruction count 70, allocated bytes for code 342 (MethodHash=0324b9f4) for method System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SingleSumTest():this (FullOpts)
+; Total bytes of code 346, prolog size 12, PerfScore 93.00, instruction count 70, allocated bytes for code 346 (MethodHash=0324b9f4) for method System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SingleSumTest():this (FullOpts)
; ============================================================ Unwind Info:
+4 (+1.32%) : 205212.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512DoubleSumTest():this (FullOpts)
@@ -36,17 +36,17 @@ G_M47044_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, ;; size=12 bbWeight=1 PerfScore 5.50 G_M47044_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups xmm0, xmmword ptr [reloc @RWD00]
- vpermilpd xmm1, xmmword ptr [reloc @RWD00], 1
+ vpermilpd xmm1, qword ptr [reloc @RWD00] {1to2}, 1
vaddpd xmm0, xmm1, xmm0 vmovups xmm1, xmmword ptr [reloc @RWD00]
- vpermilpd xmm2, xmmword ptr [reloc @RWD00], 1
+ vpermilpd xmm2, qword ptr [reloc @RWD00] {1to2}, 1
vaddpd xmm1, xmm2, xmm1 vaddsd xmm0, xmm0, xmm1 vmovups xmm1, xmmword ptr [reloc @RWD00]
- vpermilpd xmm2, xmmword ptr [reloc @RWD00], 1
+ vpermilpd xmm2, qword ptr [reloc @RWD00] {1to2}, 1
vaddpd xmm1, xmm2, xmm1 vmovups xmm2, xmmword ptr [reloc @RWD00]
- vpermilpd xmm3, xmmword ptr [reloc @RWD00], 1
+ vpermilpd xmm3, qword ptr [reloc @RWD00] {1to2}, 1
vaddpd xmm2, xmm3, xmm2 vaddsd xmm1, xmm1, xmm2 vaddsd xmm1, xmm0, xmm1 @@ -111,7 +111,7 @@ G_M47044_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovsd xmm1, qword ptr [rbp-0x20] mov rdi, rbx ; gcrRegs +[rdi]
- ;; size=274 bbWeight=1 PerfScore 67.25
+ ;; size=278 bbWeight=1 PerfScore 67.25
G_M47044_IG03: ; bbWeight=1, epilog, nogc, extend add rsp, 8 pop rbx @@ -124,7 +124,7 @@ RWD00 dq 3FF0000000000000h, 3FF0000000000000h RWD16 dq 4020000000000000h ; 8
-; Total bytes of code 302, prolog size 12, PerfScore 77.00, instruction count 62, allocated bytes for code 302 (MethodHash=8bf4483b) for method System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512DoubleSumTest():this (FullOpts)
+; Total bytes of code 306, prolog size 12, PerfScore 77.00, instruction count 62, allocated bytes for code 306 (MethodHash=8bf4483b) for method System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512DoubleSumTest():this (FullOpts)
; ============================================================ Unwind Info:
libraries.pmi.linux.x64.checked.mch
+0 (0.00%) : 9181.dasm - Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[short](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],short[]):short (FullOpts)
@@ -51,7 +51,7 @@ G_M8967_IG02: ; bbWeight=1, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd eax, xmm0 cmp eax, r15d jae G_M8967_IG06 @@ -119,7 +119,7 @@ G_M8967_IG06: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 263, prolog size 12, PerfScore 50.33, instruction count 61, allocated bytes for code 264 (MethodHash=9b04dcf8) for method Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[short](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],short[]):short (FullOpts)
+0 (0.00%) : 9185.dasm - Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[long](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],long[]):long (FullOpts)
@@ -51,7 +51,7 @@ G_M831_IG02: ; bbWeight=1, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, b vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd eax, xmm0 cmp eax, r15d jae G_M831_IG06 @@ -119,7 +119,7 @@ G_M831_IG06: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 262, prolog size 12, PerfScore 48.33, instruction count 61, allocated bytes for code 263 (MethodHash=e694fcc0) for method Microsoft.FSharp.Collections.ArrayModule:RandomChoiceBy[long](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],long[]):long (FullOpts)
+0 (0.00%) : 9201.dasm - Microsoft.FSharp.Collections.ArrayModule:RandomSampleBy[System.__Canon](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],int,System.__Canon[]):System.__Canon[] (FullOpts)
@@ -155,7 +155,7 @@ G_M49895_IG07: ; bbWeight=4, gcrefRegs=8008 {rbx r15}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si edx, xmm0 vpbroadcastd xmm0, edx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd ecx, xmm0 mov edx, ecx mov ecx, dword ptr [r15+0x08] @@ -245,7 +245,7 @@ G_M49895_IG13: ; bbWeight=4, gcrefRegs=A008 {rbx r13 r15}, byrefRegs=0000 vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si edx, xmm0 vpbroadcastd xmm0, edx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd eax, xmm0 lea rdx, [rbp-0x48] mov rdi, r13 @@ -459,7 +459,7 @@ G_M49895_IG19: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 1112, prolog size 33, PerfScore 470.78, instruction count 252, allocated bytes for code 1114 (MethodHash=1a363d18) for method Microsoft.FSharp.Collections.ArrayModule:RandomSampleBy[System.__Canon](Microsoft.FSharp.Core.FSharpFunc`2[Microsoft.FSharp.Core.Unit,double],int,System.__Canon[]):System.__Canon[] (FullOpts)
+0 (0.00%) : 275964.dasm - System.Threading.Barrier:SignalAndWait(System.TimeSpan,System.Threading.CancellationToken):ubyte:this (FullOpts)
@@ -59,7 +59,7 @@ G_M43881_IG04: ; bbWeight=1, gcrefRegs=0084 {rdx rdi}, byrefRegs=0000 {}, vcmppd k1, xmm1, xmmword ptr [reloc @RWD48], 13 vcvttsd2si rsi, xmm1 vpbroadcastq xmm1, rsi
- vpblendmq xmm1 {k1}, xmm1, xmmword ptr [reloc @RWD64]
+ vpblendmq xmm1 {k1}, xmm1, qword ptr [reloc @RWD64] {1to2}
vmovd rsi, xmm1 cmp rsi, -1 jl G_M43881_IG12 @@ -78,7 +78,7 @@ G_M43881_IG06: ; bbWeight=1, gcrefRegs=0084 {rdx rdi}, byrefRegs=0000 {}, vcmppd k1, xmm0, xmmword ptr [reloc @RWD80], 13 vcvttsd2si esi, xmm0 vpbroadcastd xmm0, esi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD96]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD96] {1to4}
vmovd esi, xmm0 call [<unknown method>] ; gcrRegs -[rdx rdi] @@ -149,9 +149,10 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dd 00000000h, 00000000h RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 43E0000000000000h, 43E0000000000000h
-RWD64 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD64 dq 7FFFFFFFFFFFFFFFh
+RWD72 dd 00000000h, 00000000h
RWD80 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD96 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD96 dd 7FFFFFFFh
; Total bytes of code 357, prolog size 12, PerfScore 88.33, instruction count 74, allocated bytes for code 359 (MethodHash=58165496) for method System.Threading.Barrier:SignalAndWait(System.TimeSpan,System.Threading.CancellationToken):ubyte:this (FullOpts)
+0 (0.00%) : 276656.dasm - System.Text.RegularExpressions.RegexRunner:<InitializeTimeout>g__ConfigureTimeout|24_0(System.TimeSpan):this (FullOpts)
@@ -46,7 +46,7 @@ G_M45711_IG04: ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD48], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD64]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
vmovd dword ptr [rbx+0x64], xmm0 call <unknown method> movsxd rcx, dword ptr [rbx+0x64] @@ -73,7 +73,7 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14 RWD24 dq 3FE0000000000000h ; 0.5 RWD32 dq 0000000000000088h, 0000000000000000h RWD48 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD64 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD64 dd 7FFFFFFFh
; Total bytes of code 154, prolog size 8, PerfScore 65.58, instruction count 33, allocated bytes for code 155 (MethodHash=c23a4d70) for method System.Text.RegularExpressions.RegexRunner:<InitializeTimeout>g__ConfigureTimeout|24_0(System.TimeSpan):this (FullOpts)
+0 (0.00%) : 279404.dasm - Microsoft.CodeAnalysis.Collections.SegmentedList`1[ubyte]:TrimExcess():this (FullOpts)
@@ -30,7 +30,7 @@ G_M63820_IG02: ; bbWeight=1, gcrefRegs=0080 {rdi}, byrefRegs=0000 {}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
vmovd eax, xmm0 cmp esi, eax jl SHORT G_M63820_IG04 @@ -47,7 +47,7 @@ RWD00 dq 3FECCCCCCCCCCCCDh ; 0.9 RWD08 dd 00000000h, 00000000h RWD16 dq 0000000000000088h, 0000000000000000h RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 77, prolog size 0, PerfScore 38.08, instruction count 14, allocated bytes for code 78 (MethodHash=1c0206b3) for method Microsoft.CodeAnalysis.Collections.SegmentedList`1[ubyte]:TrimExcess():this (FullOpts)
smoke_tests.nativeaot.linux.x64.checked.mch
+0 (0.00%) : 20750.dasm - System.Convert:ToInt32(double):int (FullOpts)
@@ -40,12 +40,12 @@ G_M1064_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre vcmppd k1, xmm1, xmmword ptr [reloc @RWD32], 13 vcvttsd2si eax, xmm1 vpbroadcastd xmm1, eax
- vpblendmd xmm1 {k1}, xmm1, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm1 {k1}, xmm1, dword ptr [reloc @RWD48] {1to4}
vmovd eax, xmm1 vxorps xmm1, xmm1, xmm1 vcvtsi2sd xmm1, xmm1, eax vsubsd xmm0, xmm0, xmm1
- vmovsd xmm1, qword ptr [reloc @RWD64]
+ vmovsd xmm1, qword ptr [reloc @RWD56]
vucomisd xmm1, xmm0 ja SHORT G_M1064_IG04 vucomisd xmm0, xmm1 @@ -64,7 +64,7 @@ G_M1064_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre ret ;; size=7 bbWeight=0.50 PerfScore 1.12 G_M1064_IG06: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
- vmovsd xmm1, qword ptr [reloc @RWD72]
+ vmovsd xmm1, qword ptr [reloc @RWD64]
vucomisd xmm1, xmm0 jbe SHORT G_M1064_IG09 vmovaps xmm1, xmm0 @@ -72,12 +72,12 @@ G_M1064_IG06: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 vcmppd k1, xmm1, xmmword ptr [reloc @RWD32], 13 vcvttsd2si eax, xmm1 vpbroadcastd xmm1, eax
- vpblendmd xmm1 {k1}, xmm1, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm1 {k1}, xmm1, dword ptr [reloc @RWD48] {1to4}
vmovd eax, xmm1 vxorps xmm1, xmm1, xmm1 vcvtsi2sd xmm1, xmm1, eax vsubsd xmm0, xmm0, xmm1
- vmovsd xmm1, qword ptr [reloc @RWD80]
+ vmovsd xmm1, qword ptr [reloc @RWD72]
vucomisd xmm0, xmm1 ja SHORT G_M1064_IG07 vucomisd xmm0, xmm1 @@ -117,10 +117,11 @@ RWD00 dq C1E0000000100000h ; -2.14748365e+09 RWD08 dd 00000000h, 00000000h RWD16 dq 0000000000000088h, 0000000000000000h RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
-RWD64 dq BFE0000000000000h ; -0.5
-RWD72 dq 41DFFFFFFFE00000h ; 2.14748365e+09
-RWD80 dq 3FE0000000000000h ; 0.5
+RWD48 dd 7FFFFFFFh
+RWD52 dd 00000000h
+RWD56 dq BFE0000000000000h ; -0.5
+RWD64 dq 41DFFFFFFFE00000h ; 2.14748365e+09
+RWD72 dq 3FE0000000000000h ; 0.5
; Total bytes of code 279, prolog size 8, PerfScore 54.67, instruction count 67, allocated bytes for code 281 (MethodHash=f4f2fbd7) for method System.Convert:ToInt32(double):int (FullOpts)
+0 (0.00%) : 20778.dasm - System.Collections.Hashtable:rehash(int):this (FullOpts)
@@ -113,7 +113,7 @@ G_M22183_IG06: ; bbWeight=1, gcrefRegs=4008 {rbx r14}, byrefRegs=0000 {}, vcmpps k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttss2si edi, xmm0 vpbroadcastd xmm0, edi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd dword ptr [rbx+0x30], xmm0 inc dword ptr [rbx+0x38] mov byte ptr [rbx+0x3C], 0 @@ -151,7 +151,7 @@ G_M22183_IG10: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { ;; size=6 bbWeight=0 PerfScore 0.00 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 4F0000004F000000h, 4F0000004F000000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 269, prolog size 19, PerfScore 159.58, instruction count 73, allocated bytes for code 270 (MethodHash=1a4da958) for method System.Collections.Hashtable:rehash(int):this (FullOpts)
+0 (0.00%) : 19151.dasm - System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
@@ -184,7 +184,7 @@ G_M1452_IG20: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre vcmppd k1, xmm0, xmmword ptr [reloc @RWD64], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD80]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD80] {1to4}
vmovd eax, xmm0 mov ecx, 0x1388 cmp eax, 0x1388 @@ -223,7 +223,7 @@ RWD32 dq 4014000000000000h ; 5 RWD40 dd 00000000h, 00000000h RWD48 dq 0000000000000088h, 0000000000000000h RWD64 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD80 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD80 dd 7FFFFFFFh
; Total bytes of code 563, prolog size 22, PerfScore 1311.08, instruction count 134, allocated bytes for code 564 (MethodHash=45e1fa53) for method System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
+0 (0.00%) : 19711.dasm - System.Collections.HashHelpers:IsPrime(int):ubyte (FullOpts)
@@ -41,7 +41,7 @@ G_M62345_IG05: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000 vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13 vcvttsd2si eax, xmm0 vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
vmovd ecx, xmm0 mov esi, 3 cmp ecx, 3 @@ -73,7 +73,7 @@ G_M62345_IG10: ; bbWeight=0.50, epilog, nogc, extend ;; size=2 bbWeight=0.50 PerfScore 0.75 RWD00 dq 0000000000000088h, 0000000000000000h RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 116, prolog size 4, PerfScore 139.17, instruction count 35, allocated bytes for code 117 (MethodHash=632a0c76) for method System.Collections.HashHelpers:IsPrime(int):ubyte (FullOpts)
+0 (0.00%) : 21060.dasm - System.Number:Dragon4(ulong,int,uint,ubyte,int,ubyte,System.Span`1[ubyte],byref):uint (FullOpts)
@@ -251,7 +251,7 @@ G_M41408_IG15: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=2000 {r13}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13 vcvttsd2si edi, xmm0 vpbroadcastd xmm0, edi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
vmovd r14d, xmm0 test r14d, r14d jg G_M41408_IG23 @@ -977,7 +977,7 @@ RWD00 dq 3FD34413509F79FFh ; 0.301029996 RWD08 dq 3FE6147AE147AE14h ; 0.69 RWD16 dq 0000000000000088h, 0000000000000000h RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 2655, prolog size 63, PerfScore 530.08, instruction count 567, allocated bytes for code 2656 (MethodHash=ef215e3f) for method System.Number:Dragon4(ulong,int,uint,ubyte,int,ubyte,System.Span`1[ubyte],byref):uint (FullOpts)
+0 (0.00%) : 21156.dasm - System.Number+Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):System.Number+DiyFp (FullOpts)
@@ -92,7 +92,7 @@ G_M18819_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=8000 {r15}, byr vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13 vcvttsd2si edi, xmm0 vpbroadcastd xmm0, edi
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
vmovd r13d, xmm0 add r13d, 347 mov r12d, r13d @@ -181,7 +181,7 @@ RWD00 dq 3FD34413509F79FFh ; 0.301029996 RWD08 dd 00000000h, 00000000h RWD16 dq 0000000000000088h, 0000000000000000h RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 298, prolog size 16, PerfScore 82.83, instruction count 70, allocated bytes for code 299 (MethodHash=763bb67c) for method System.Number+Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):System.Number+DiyFp (FullOpts)
Details

Size improvements/regressions per collection

|Collection|Contexts with diffs|Improvements|Regressions|Same size|Improvements (bytes)|Regressions (bytes)|
|---|--:|--:|--:|--:|--:|--:|
|coreclr_tests.run.linux.x64.checked.mch|1,678|0|2|1,676|-0|+3|
|libraries.pmi.linux.x64.checked.mch|386|0|0|386|-0|+0|
|libraries_tests.run.linux.x64.Release.mch|1,193|0|2|1,191|-0|+12|
|libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch|402|0|6|396|-0|+24|
|libraries.pmi.linux.x64.checked.mch|386|0|0|386|-0|+0|
|smoke_tests.nativeaot.linux.x64.checked.mch|6|0|0|6|-0|+0|
||4,051|0|10|4,041|-0|+39|
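As a sanity check on the totals row, the per-collection figures can be summed directly. This is a throwaway sketch with the numbers transcribed by hand from the size table above; the variable names are illustrative only. A list of tuples is used rather than a dict because one collection name appears twice in the report.

```python
# (collection, contexts with diffs, regressions, regression bytes), copied from the table.
rows = [
    ("coreclr_tests.run.linux.x64.checked.mch", 1678, 2, 3),
    ("libraries.pmi.linux.x64.checked.mch", 386, 0, 0),
    ("libraries_tests.run.linux.x64.Release.mch", 1193, 2, 12),
    ("libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch", 402, 6, 24),
    ("libraries.pmi.linux.x64.checked.mch", 386, 0, 0),
    ("smoke_tests.nativeaot.linux.x64.checked.mch", 6, 0, 0),
]

total_contexts = sum(r[1] for r in rows)
total_regressions = sum(r[2] for r in rows)
total_bytes = sum(r[3] for r in rows)

# These match the totals row: 4,051 contexts, 10 regressions, +39 bytes.
assert (total_contexts, total_regressions, total_bytes) == (4051, 10, 39)
print(total_contexts, total_regressions, total_bytes)
```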

PerfScore improvements/regressions per collection

|Collection|Contexts with diffs|Improvements|Regressions|Same PerfScore|Improvements (PerfScore)|Regressions (PerfScore)|PerfScore Overall in FullOpts|
|---|--:|--:|--:|--:|--:|--:|--:|
|coreclr_tests.run.linux.x64.checked.mch|1,678|0|0|1,678|0.00%|0.00%|0.0000%|
|libraries.pmi.linux.x64.checked.mch|386|0|0|386|0.00%|0.00%|0.0000%|
|libraries_tests.run.linux.x64.Release.mch|1,193|0|0|1,193|0.00%|0.00%|0.0000%|
|libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch|402|0|0|402|0.00%|0.00%|0.0000%|
|libraries.pmi.linux.x64.checked.mch|386|0|0|386|0.00%|0.00%|0.0000%|
|smoke_tests.nativeaot.linux.x64.checked.mch|6|0|0|6|0.00%|0.00%|0.0000%|

Context information

|Collection|Diffed contexts|MinOpts|FullOpts|Missed, base|Missed, diff|
|---|--:|--:|--:|--:|--:|
|coreclr_tests.run.linux.x64.checked.mch|607,366|364,865|242,501|0 (0.00%)|0 (0.00%)|
|libraries.pmi.linux.x64.checked.mch|284,123|6|284,117|0 (0.00%)|0 (0.00%)|
|libraries_tests.run.linux.x64.Release.mch|824,857|538,771|286,086|0 (0.00%)|0 (0.00%)|
|libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch|325,178|22,031|303,147|0 (0.00%)|0 (0.00%)|
|libraries.pmi.linux.x64.checked.mch|284,123|6|284,117|0 (0.00%)|0 (0.00%)|
|smoke_tests.nativeaot.linux.x64.checked.mch|26,445|10|26,435|0 (0.00%)|0 (0.00%)|
||2,352,092|925,689|1,426,403|0 (0.00%)|0 (0.00%)|

jit-analyze output

coreclr_tests.run.linux.x64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 400125258 (overridden on cmd)
Total bytes of diff: 400125261 (overridden on cmd)
Total bytes of delta: 3 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           2 : 514325.dasm (0.16 % of base)
           1 : 505847.dasm (0.18 % of base)

2 total files with Code Size differences (0 improved, 2 regressed), 58 unchanged.

Top method regressions (bytes):
           2 (0.16 % of base) : 514325.dasm - Packet256Tracer:CreateDefaultScene():Scene (FullOpts)
           1 (0.18 % of base) : 505847.dasm - VectorMathTests.Program:TestEntryPoint():int (FullOpts)

Top method regressions (percentages):
           1 (0.18 % of base) : 505847.dasm - VectorMathTests.Program:TestEntryPoint():int (FullOpts)
           2 (0.16 % of base) : 514325.dasm - Packet256Tracer:CreateDefaultScene():Scene (FullOpts)


libraries.pmi.linux.x64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 59499926 (overridden on cmd)
Total bytes of diff: 59499926 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
Detail diffs


0 total files with Code Size differences (0 improved, 0 regressed), 59 unchanged.


libraries_tests.run.linux.x64.Release.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 372437145 (overridden on cmd)
Total bytes of diff: 372437157 (overridden on cmd)
Total bytes of delta: 12 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           6 : 557002.dasm (4.26 % of base)
           6 : 557115.dasm (4.35 % of base)

2 total files with Code Size differences (0 improved, 2 regressed), 58 unchanged.

Top method regressions (bytes):
           6 (4.35 % of base) : 557115.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (Tier0)
           6 (4.26 % of base) : 557002.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:ConvertToInt32NativeTest():this (Tier0)

Top method regressions (percentages):
           6 (4.35 % of base) : 557115.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (Tier0)
           6 (4.26 % of base) : 557002.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:ConvertToInt32NativeTest():this (Tier0)
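The "% of base" figures can be reproduced from the "Total bytes of code" lines in the corresponding diffs. The helper below is a hypothetical illustration of that calculation, not part of jit-analyze.

```python
def percent_of_base(base_bytes: int, diff_bytes: int) -> float:
    """Code-size delta as a percentage of the base size, rounded to two decimals."""
    return round((diff_bytes - base_bytes) / base_bytes * 100, 2)


# Vector128Tests:ConvertToInt32NativeTest (Tier0): 138 -> 144 bytes.
v128 = percent_of_base(138, 144)
# Vector256Tests:ConvertToInt32NativeTest (Tier0): 141 -> 147 bytes.
v256 = percent_of_base(141, 147)

print(v128, v256)
```

The same +6 bytes therefore shows up as a slightly larger percentage for the Vector128 test only because its base method is smaller.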


libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 144246049 (overridden on cmd)
Total bytes of diff: 144246073 (overridden on cmd)
Total bytes of delta: 24 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           6 : 205695.dasm (0.85 % of base)
           6 : 205588.dasm (0.82 % of base)
           4 : 205212.dasm (1.32 % of base)
           4 : 205320.dasm (1.17 % of base)
           2 : 205395.dasm (0.80 % of base)
           2 : 205175.dasm (0.74 % of base)

6 total files with Code Size differences (0 improved, 6 regressed), 54 unchanged.

Top method regressions (bytes):
           6 (0.85 % of base) : 205695.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (FullOpts)
           6 (0.82 % of base) : 205588.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:ConvertToInt32NativeTest():this (FullOpts)
           4 (1.32 % of base) : 205212.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512DoubleSumTest():this (FullOpts)
           4 (1.17 % of base) : 205320.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SingleSumTest():this (FullOpts)
           2 (0.80 % of base) : 205395.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:Vector256DoubleSumTest():this (FullOpts)
           2 (0.74 % of base) : 205175.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:Vector256SingleSumTest():this (FullOpts)

Top method regressions (percentages):
           4 (1.32 % of base) : 205212.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512DoubleSumTest():this (FullOpts)
           4 (1.17 % of base) : 205320.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SingleSumTest():this (FullOpts)
           6 (0.85 % of base) : 205695.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector128Tests:ConvertToInt32NativeTest():this (FullOpts)
           6 (0.82 % of base) : 205588.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:ConvertToInt32NativeTest():this (FullOpts)
           2 (0.80 % of base) : 205395.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:Vector256DoubleSumTest():this (FullOpts)
           2 (0.74 % of base) : 205175.dasm - System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests:Vector256SingleSumTest():this (FullOpts)


libraries.pmi.linux.x64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 59499926 (overridden on cmd)
Total bytes of diff: 59499926 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
Detail diffs


0 total files with Code Size differences (0 improved, 0 regressed), 59 unchanged.


smoke_tests.nativeaot.linux.x64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 4141216 (overridden on cmd)
Total bytes of diff: 4141216 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
Detail diffs


0 total files with Code Size differences (0 improved, 0 regressed), 6 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).



osx arm64

Diffs are based on 933,703 contexts (390,234 MinOpts, 543,469 FullOpts).

MISSED contexts: 1 (0.00%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| coreclr_tests.run.osx.arm64.checked.mch | 627,408 | 390,226 | 237,182 | 1 (0.00%) | 1 (0.00%) |
| libraries.pmi.osx.arm64.checked.mch | 283,362 | 6 | 283,356 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.osx.arm64.checked.mch | 22,933 | 2 | 22,931 | 0 (0.00%) | 0 (0.00%) |
| | 933,703 | 390,234 | 543,469 | 1 (0.00%) | 1 (0.00%) |

windows arm64

Diffs are based on 383,129 contexts (21,722 MinOpts, 361,407 FullOpts).

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch | 335,602 | 21,711 | 313,891 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.arm64.checked.mch | 24,736 | 2 | 24,734 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 22,791 | 9 | 22,782 | 0 (0.00%) | 0 (0.00%) |
| | 383,129 | 21,722 | 361,407 | 0 (0.00%) | 0 (0.00%) |

windows x64

Diffs are based on 54,760 contexts (14 MinOpts, 54,746 FullOpts).

Overall (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| realworld.run.windows.x64.checked.mch | 10,436,128 | +0 | 0.00% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,723,203 | +0 | 0.00% |

FullOpts (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| realworld.run.windows.x64.checked.mch | 10,211,294 | +0 | 0.00% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,722,114 | +0 | 0.00% |

Example diffs
realworld.run.windows.x64.checked.mch
+0 (0.00%) : 1325.dasm - BepuPhysics.BatchCompressor:Compress(BepuUtilities.Memory.BufferPool,BepuUtilities.IThreadDispatcher,ubyte):this (FullOpts)
@@ -211,7 +211,7 @@ G_M17847_IG06: ; bbWeight=0.50, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=0
  vcmppd k1, xmm1, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si ecx, xmm1
  vpbroadcastd xmm1, ecx
- vpblendmd xmm1 {k1}, xmm1, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm1 {k1}, xmm1, dword ptr [reloc @RWD48] {1to4}
  vmovd r15d, xmm1
  vmulss xmm0, xmm0, dword ptr [rbx+0x28]
  vcvtss2sd xmm0, xmm0, xmm0
@@ -221,7 +221,7 @@ G_M17847_IG06: ; bbWeight=0.50, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=0
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si ecx, xmm0
  vpbroadcastd xmm0, ecx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
  vmovd r13d, xmm0
  mov dword ptr [rsp+0xC4], r13d
  lea r12, bword ptr [rbx+0x40]
@@ -858,7 +858,7 @@ G_M17847_IG56: ; bbWeight=0, gcVars=00000000000000000000000000000000 {},
 RWD00 dq 3FF0000000000000h, 0000000000000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 1790, prolog size 36, PerfScore 1633.96, instruction count 454, allocated bytes for code 1792 (MethodHash=f3e4ba48) for method BepuPhysics.BatchCompressor:Compress(BepuUtilities.Memory.BufferPool,BepuUtilities.IThreadDispatcher,ubyte):this (FullOpts)
+0 (0.00%) : 2601.dasm - SixLabors.ImageSharp.Formats.Png.PngEncoderOptionsHelpers:CalculateBitDepth[SixLabors.ImageSharp.PixelFormats.Rgba32](SixLabors.ImageSharp.Formats.Png.PngEncoderOptions,SixLabors.ImageSharp.IndexedImageFrame`1[SixLabors.ImageSharp.PixelFormats.Rgba32]):ubyte (FullOpts)
@@ -93,7 +93,7 @@ G_M10627_IG05: ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0000 {},
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si ecx, xmm0
  vpbroadcastd xmm0, ecx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
  vmovd ecx, xmm0
  cmp ecx, 1
  jg SHORT G_M10627_IG07
@@ -207,7 +207,7 @@ RWD00 dq 3FE62E42FEFA39EFh ; 0.693147181
 RWD08 dd 00000000h, 00000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 391, prolog size 10, PerfScore 70.54, instruction count 104, allocated bytes for code 392 (MethodHash=9c54d67c) for method SixLabors.ImageSharp.Formats.Png.PngEncoderOptionsHelpers:CalculateBitDepth[SixLabors.ImageSharp.PixelFormats.Rgba32](SixLabors.ImageSharp.Formats.Png.PngEncoderOptions,SixLabors.ImageSharp.IndexedImageFrame`1[SixLabors.ImageSharp.PixelFormats.Rgba32]):ubyte (FullOpts)
+0 (0.00%) : 2725.dasm - SixLabors.ImageSharp.Formats.Gif.GifEncoderCore:Encode[SixLabors.ImageSharp.PixelFormats.Rgba32](SixLabors.ImageSharp.Image`1[SixLabors.ImageSharp.PixelFormats.Rgba32],System.IO.Stream,System.Threading.CancellationToken):this (FullOpts)
@@ -333,7 +333,7 @@ G_M17066_IG20: ; bbWeight=1, gcrefRegs=D0C8 {rbx rsi rdi r12 r14 r15}, by
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si edx, xmm0
  vpbroadcastd xmm0, edx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
  vmovd edx, xmm0
  mov ecx, 1
  cmp edx, 1
@@ -587,7 +587,7 @@ RWD00 dq 3FE62E42FEFA39EFh ; 0.693147181
 RWD08 dd 00000000h, 00000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 1055, prolog size 53, PerfScore 245.08, instruction count 268, allocated bytes for code 1056 (MethodHash=2577bd55) for method SixLabors.ImageSharp.Formats.Gif.GifEncoderCore:Encode[SixLabors.ImageSharp.PixelFormats.Rgba32](SixLabors.ImageSharp.Image`1[SixLabors.ImageSharp.PixelFormats.Rgba32],System.IO.Stream,System.Threading.CancellationToken):this (FullOpts)
+0 (0.00%) : 6012.dasm - System.Security.Cryptography.X509Certificates.ChainPal:BuildChain(ubyte,System.Security.Cryptography.X509Certificates.ICertificatePal,System.Security.Cryptography.X509Certificates.X509Certificate2Collection,System.Security.Cryptography.OidCollection,System.Security.Cryptography.OidCollection,int,int,System.Security.Cryptography.X509Certificates.X509Certificate2Collection,int,System.DateTime,System.TimeSpan,ubyte):System.Security.Cryptography.X509Certificates.IChainPal (FullOpts)
@@ -243,7 +243,7 @@ G_M12200_IG17: ; bbWeight=1, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {},
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD48], 13
  vcvttsd2si eax, xmm0
  vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD64]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD64] {1to4}
  vmovd eax, xmm0
  mov dword ptr [rbp-0x38], eax
  mov rcx, qword ptr [rbp+0x58]
@@ -578,7 +578,7 @@ RWD16 dq C30A36E2EB1C4328h ; -9.22337204e+14
 RWD24 dd 00000000h, 00000000h
 RWD32 dq 0000000000000088h, 0000000000000000h
 RWD48 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD64 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD64 dd 7FFFFFFFh
; Total bytes of code 1144, prolog size 61, PerfScore 271.50, instruction count 285, allocated bytes for code 1145 (MethodHash=bb45d057) for method System.Security.Cryptography.X509Certificates.ChainPal:BuildChain(ubyte,System.Security.Cryptography.X509Certificates.ICertificatePal,System.Security.Cryptography.X509Certificates.X509Certificate2Collection,System.Security.Cryptography.OidCollection,System.Security.Cryptography.OidCollection,int,int,System.Security.Cryptography.X509Certificates.X509Certificate2Collection,int,System.DateTime,System.TimeSpan,ubyte):System.Security.Cryptography.X509Certificates.IChainPal (FullOpts)
+0 (0.00%) : 6308.dasm - System.DateTimeParse:DoStrictParse(System.ReadOnlySpan`1[ushort],System.ReadOnlySpan`1[ushort],int,System.Globalization.DateTimeFormatInfo,byref):ubyte (FullOpts)
@@ -747,7 +747,7 @@ G_M56315_IG48: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0008 {rbx}, byr
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si rdx, xmm0
  vpbroadcastq xmm0, rdx
- vpblendmq xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmq xmm0 {k1}, xmm0, qword ptr [reloc @RWD48] {1to2}
  vmovd rdx, xmm0
  lea rcx, bword ptr [rbx+0x40]
  ; byrRegs +[rcx]
@@ -876,7 +876,7 @@ RWD00 dq 416312D000000000h ; 10000000
 RWD08 dd 00000000h, 00000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 43E0000000000000h, 43E0000000000000h
-RWD48 dq 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
+RWD48 dq 7FFFFFFFFFFFFFFFh
; Total bytes of code 1737, prolog size 74, PerfScore 4900.67, instruction count 463, allocated bytes for code 1738 (MethodHash=cd172404) for method System.DateTimeParse:DoStrictParse(System.ReadOnlySpan`1[ushort],System.ReadOnlySpan`1[ushort],int,System.Globalization.DateTimeFormatInfo,byref):ubyte (FullOpts)
+0 (0.00%) : 19380.dasm - Microsoft.Cci.FullMetadataWriter:Create(Microsoft.CodeAnalysis.Emit.EmitContext,Microsoft.CodeAnalysis.CommonMessageProvider,ubyte,ubyte,ubyte,ubyte,System.Threading.CancellationToken):Microsoft.Cci.MetadataWriter (FullOpts)
@@ -177,7 +177,7 @@ G_M25085_IG11: ; bbWeight=0.50, gcrefRegs=9080 {rdi r12 r15}, byrefRegs=0
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si r8d, xmm0
  vpbroadcastd xmm0, r8d
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
  vmovd r8d, xmm0
  mov edx, dword ptr [rbp-0x3C]
  mov rcx, gword ptr [rbp-0x48]
@@ -249,7 +249,7 @@ RWD00 dq 3FF8000000000000h ; 1.5
 RWD08 dd 00000000h, 00000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 477, prolog size 24, PerfScore 93.79, instruction count 124, allocated bytes for code 478 (MethodHash=74b69e02) for method Microsoft.Cci.FullMetadataWriter:Create(Microsoft.CodeAnalysis.Emit.EmitContext,Microsoft.CodeAnalysis.CommonMessageProvider,ubyte,ubyte,ubyte,ubyte,System.Threading.CancellationToken):Microsoft.Cci.MetadataWriter (FullOpts)
smoke_tests.nativeaot.windows.x64.checked.mch
+0 (0.00%) : 17661.dasm - System.Convert:ToInt32(double):int (FullOpts)
@@ -38,12 +38,12 @@ G_M1064_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
  vcmppd k1, xmm1, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si eax, xmm1
  vpbroadcastd xmm1, eax
- vpblendmd xmm1 {k1}, xmm1, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm1 {k1}, xmm1, dword ptr [reloc @RWD48] {1to4}
  vmovd eax, xmm1
  vxorps xmm1, xmm1, xmm1
  vcvtsi2sd xmm1, xmm1, eax
  vsubsd xmm0, xmm0, xmm1
- vmovsd xmm1, qword ptr [reloc @RWD64]
+ vmovsd xmm1, qword ptr [reloc @RWD56]
  vucomisd xmm1, xmm0
  ja SHORT G_M1064_IG04
  vucomisd xmm0, xmm1
@@ -61,7 +61,7 @@ G_M1064_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
  ret
  ;; size=6 bbWeight=0.50 PerfScore 0.88
 G_M1064_IG06: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
- vmovsd xmm1, qword ptr [reloc @RWD72]
+ vmovsd xmm1, qword ptr [reloc @RWD64]
  vucomisd xmm1, xmm0
  jbe SHORT G_M1064_IG09
  vmovaps xmm1, xmm0
@@ -69,12 +69,12 @@ G_M1064_IG06: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000
  vcmppd k1, xmm1, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si eax, xmm1
  vpbroadcastd xmm1, eax
- vpblendmd xmm1 {k1}, xmm1, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm1 {k1}, xmm1, dword ptr [reloc @RWD48] {1to4}
  vmovd eax, xmm1
  vxorps xmm1, xmm1, xmm1
  vcvtsi2sd xmm1, xmm1, eax
  vsubsd xmm0, xmm0, xmm1
- vmovsd xmm1, qword ptr [reloc @RWD80]
+ vmovsd xmm1, qword ptr [reloc @RWD72]
  vucomisd xmm0, xmm1
  ja SHORT G_M1064_IG07
  vucomisd xmm0, xmm1
@@ -116,10 +116,11 @@ RWD00 dq C1E0000000100000h ; -2.14748365e+09
 RWD08 dd 00000000h, 00000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
-RWD64 dq BFE0000000000000h ; -0.5
-RWD72 dq 41DFFFFFFFE00000h ; 2.14748365e+09
-RWD80 dq 3FE0000000000000h ; 0.5
+RWD48 dd 7FFFFFFFh
+RWD52 dd 00000000h
+RWD56 dq BFE0000000000000h ; -0.5
+RWD64 dq 41DFFFFFFFE00000h ; 2.14748365e+09
+RWD72 dq 3FE0000000000000h ; 0.5
; Total bytes of code 274, prolog size 5, PerfScore 51.92, instruction count 63, allocated bytes for code 276 (MethodHash=f4f2fbd7) for method System.Convert:ToInt32(double):int (FullOpts)
+0 (0.00%) : 17869.dasm - System.Number:Dragon4(ulong,int,uint,ubyte,int,ubyte,System.Span`1[ubyte],byref):uint (FullOpts)
@@ -240,7 +240,7 @@ G_M41408_IG13: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0020 {rbp}, byr
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si ecx, xmm0
  vpbroadcastd xmm0, ecx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
  vmovd r15d, xmm0
  test r15d, r15d
  jg G_M41408_IG19
@@ -929,7 +929,7 @@ RWD00 dq 3FD34413509F79FFh ; 0.301029996
 RWD08 dq 3FE6147AE147AE14h ; 0.69
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 2498, prolog size 64, PerfScore 457.33, instruction count 534, allocated bytes for code 2499 (MethodHash=ef215e3f) for method System.Number:Dragon4(ulong,int,uint,ubyte,int,ubyte,System.Span`1[ubyte],byref):uint (FullOpts)
+0 (0.00%) : 16106.dasm - System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
@@ -161,7 +161,7 @@ G_M1452_IG16: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD64], 13
  vcvttsd2si eax, xmm0
  vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD80]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD80] {1to4}
  vmovd eax, xmm0
  mov ecx, 0x1388
  cmp eax, 0x1388
@@ -211,7 +211,7 @@ RWD32 dq 4014000000000000h ; 5
 RWD40 dd 00000000h, 00000000h
 RWD48 dq 0000000000000088h, 0000000000000000h
 RWD64 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD80 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD80 dd 7FFFFFFFh
; Total bytes of code 497, prolog size 28, PerfScore 986.58, instruction count 121, allocated bytes for code 498 (MethodHash=45e1fa53) for method System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
+0 (0.00%) : 17930.dasm - System.Number+Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):System.Number+DiyFp (FullOpts)
@@ -94,7 +94,7 @@ G_M18819_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0088 {rbx rdi},
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD32], 13
  vcvttsd2si ecx, xmm0
  vpbroadcastd xmm0, ecx
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD48]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD48] {1to4}
  vmovd r14d, xmm0
  add r14d, 347
  mov r15d, r14d
@@ -191,7 +191,7 @@ RWD00 dq 3FD34413509F79FFh ; 0.301029996
 RWD08 dd 00000000h, 00000000h
 RWD16 dq 0000000000000088h, 0000000000000000h
 RWD32 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD48 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD48 dd 7FFFFFFFh
; Total bytes of code 302, prolog size 12, PerfScore 84.08, instruction count 73, allocated bytes for code 303 (MethodHash=763bb67c) for method System.Number+Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):System.Number+DiyFp (FullOpts)
+0 (0.00%) : 16747.dasm - System.Collections.HashHelpers:IsPrime(int):ubyte (FullOpts)
@@ -38,7 +38,7 @@ G_M62345_IG05: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
  vcmppd k1, xmm0, xmmword ptr [reloc @RWD16], 13
  vcvttsd2si eax, xmm0
  vpbroadcastd xmm0, eax
- vpblendmd xmm0 {k1}, xmm0, xmmword ptr [reloc @RWD32]
+ vpblendmd xmm0 {k1}, xmm0, dword ptr [reloc @RWD32] {1to4}
  vmovd r8d, xmm0
  mov r10d, 3
  cmp r8d, 3
@@ -68,7 +68,7 @@ G_M62345_IG10: ; bbWeight=0.50, epilog, nogc, extend
  ;; size=1 bbWeight=0.50 PerfScore 0.50
 RWD00 dq 0000000000000088h, 0000000000000000h
 RWD16 dq 41DFFFFFFFC00000h, 41DFFFFFFFC00000h
-RWD32 dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
+RWD32 dd 7FFFFFFFh
; Total bytes of code 114, prolog size 0, PerfScore 137.17, instruction count 30, allocated bytes for code 115 (MethodHash=632a0c76) for method System.Collections.HashHelpers:IsPrime(int):ubyte (FullOpts)
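
The offset shifts in the System.Convert:ToInt32 diff above (RWD64 to RWD56, RWD72 to RWD64, RWD80 to RWD72) follow from the mask constant shrinking from 16 bytes to 4. A minimal sketch of that arithmetic, assuming each constant is placed at the next multiple of its own size and that the `dd 00000000h` entries are alignment padding; this is a model that happens to reproduce the listing, not a description of the JIT's actual data-section allocator, and `DataSectionLayout` is a hypothetical name:

```csharp
using System.Collections.Generic;

public static class DataSectionLayout
{
    // Hypothetical model: each constant starts at the next offset that is a
    // multiple of its own size (sizes are assumed to be powers of two).
    public static int[] ComputeOffsets(IReadOnlyList<int> sizes)
    {
        var offsets = new int[sizes.Count];
        int cursor = 0;
        for (int i = 0; i < sizes.Count; i++)
        {
            int align = sizes[i];
            // Round the cursor up to the required alignment.
            cursor = (cursor + align - 1) & ~(align - 1);
            offsets[i] = cursor;
            cursor += sizes[i];
        }
        return offsets;
    }
}
```

With the base sizes `{ 8, 16, 16, 16, 8, 8, 8 }` the model yields offsets 0, 16, 32, 48, 64, 72, 80; replacing the fourth entry with 4 (the broadcast scalar) yields 0, 16, 32, 48, 56, 64, 72, matching the RWD labels on both sides of the diff.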
Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| realworld.run.windows.x64.checked.mch | 66 | 0 | 0 | 66 | -0 | +0 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5 | 0 | 0 | 5 | -0 | +0 |
| | 71 | 0 | 0 | 71 | -0 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| realworld.run.windows.x64.checked.mch | 66 | 0 | 0 | 66 | 0.00% | 0.00% | 0.0000% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5 | 0 | 0 | 5 | 0.00% | 0.00% | 0.0000% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| realworld.run.windows.x64.checked.mch | 24,842 | 2 | 24,840 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 29,918 | 12 | 29,906 | 0 (0.00%) | 0 (0.00%) |
| | 54,760 | 14 | 54,746 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

realworld.run.windows.x64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 10436128 (overridden on cmd)
Total bytes of diff: 10436128 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
Detail diffs


0 total files with Code Size differences (0 improved, 0 regressed), 48 unchanged.


smoke_tests.nativeaot.windows.x64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 4723203 (overridden on cmd)
Total bytes of diff: 4723203 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
Detail diffs


0 total files with Code Size differences (0 improved, 0 regressed), 5 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).



@saucecontrol
Member Author

FWIW, I think this is a net negative change code quality wise since the current broadcast logic only handles constant vectors. An aligned full-width vector load is cheaper than a broadcast, so I don't think you'd normally do it unless optimizing for code size.

This does, however, make it possible to test embedded broadcast encoding of more instructions (I added it while testing the GFNI implementation for correctness) and makes it easier to improve centrally later. I'd like to look at containing normal data broadcasts, where containment actually saves a broadcast instruction -- unless someone else has that planned.

@tannergooding
Member

> FWIW, I think this is a net negative change code quality wise since the current broadcast logic only handles constant vectors. An aligned full-width vector load is cheaper than a broadcast, so I don't think you'd normally do it unless optimizing for code size.

That's not the case for embedded broadcasts. Both Intel and AMD document full-width memory loads and embedded broadcast memory loads as having the same cost, and sites such as uops.info similarly show identical latency and throughput measurements (for almost all instructions).

Embedded broadcast also helps cache locality, among other factors, so it is typically viewed as a net win and is the preferred default.
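
To make the cache-locality point concrete, here is a minimal sketch (not taken from the PR) showing that a constant whose lanes are all equal carries the same information as a single scalar, which is what lets the diffs above replace a 16-byte `dq 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh` with a 4-byte `dd 7FFFFFFFh`. The class and member names are hypothetical, and the code only demonstrates value equivalence and data sizes; it makes no claim about what the JIT emits for it.

```csharp
using System.Runtime.Intrinsics;

public static class BroadcastConstantDemo
{
    // Full-width form: all four 32-bit lanes are spelled out.
    public static Vector128<int> FullWidthMask()
        => Vector128.Create(0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF);

    // Broadcast form: a single scalar replicated to every lane.
    public static Vector128<int> BroadcastMask()
        => Vector128.Create(0x7FFFFFFF);

    // Bytes needed to store the full vector versus the single scalar.
    public static int FullWidthBytes => Vector128<int>.Count * sizeof(int);
    public static int BroadcastBytes => sizeof(int);
}
```

`FullWidthMask().Equals(BroadcastMask())` holds, while the stored data shrinks from `FullWidthBytes` to `BroadcastBytes`.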

@tannergooding
Member

> I'd like to look at containing normal data broadcasts, where containment actually saves a broadcast instruction

We already contain some of these as well, when they come from memory:
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAJI8ovLrl5hcAbhr0mrTj36CRGMRKm4WADQAcSObQDMDbrmwAzGE1IMAwgwDeNBu6ZmAajDAZo5KT6ADxWADYQ2BgAfAwAsgAU4ZEYAFQMACZR2ACUbh4FALyxPn4BQaERUdEsAFowUBAMANQMAIIAbggsAEKN2BlgeBgAykNh2FAAKhCl/lCB+glZGLnGAL5AA

For example:

    public Vector128<float> M(float* data)
        => Vector128<float>.Zero + Avx.BroadcastScalarToVector128(data);

generates:

    L0000: vzeroupper
    L0003: vxorps xmm0, xmm0, xmm0
    L0007: vaddps xmm0, xmm0, [r8]{1to4}
    L000d: vmovups [rdx], xmm0
    L0011: mov rax, rdx
    L0014: ret

There are likely additional patterns that need to be recognized. For example, Vector128<float>.Zero + Vector128.Create(*data) doesn't currently trigger this, but it could.
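
The two source patterns mentioned above can be put side by side. This is a sketch under stated assumptions: the class and method names are hypothetical, it needs unsafe code enabled, and it only checks that both forms compute the same value. Which of them the JIT contains as an embedded broadcast depends on the runtime version and hardware and is not something this code can show.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static unsafe class BroadcastPatterns
{
    // Pattern from the comment above: explicit AVX broadcast intrinsic.
    // Callers must check Avx.IsSupported before using this.
    public static Vector128<float> ExplicitBroadcast(float* data)
        => Vector128<float>.Zero + Avx.BroadcastScalarToVector128(data);

    // Alternative pattern: portable Vector128.Create from a dereferenced pointer.
    public static Vector128<float> CreateBroadcast(float* data)
        => Vector128<float>.Zero + Vector128.Create(*data);

    // Compares the two patterns, falling back to a spelled-out vector
    // when the AVX intrinsic is unavailable on the current machine.
    public static bool SameResult(float value)
    {
        float* p = &value;
        Vector128<float> expected = Avx.IsSupported
            ? ExplicitBroadcast(p)
            : Vector128.Create(value, value, value, value);
        return CreateBroadcast(p).Equals(expected);
    }
}
```

Calling `BroadcastPatterns.SameResult(1.5f)` returns true on either path; inspecting the generated assembly (for example with a disassembly tool) is still needed to see whether `{1to4}` appears.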

@saucecontrol
Member Author

Good to know, thanks!

@BruceForstall BruceForstall merged commit 30dabfd into dotnet:main Nov 5, 2024
108 checks passed
@saucecontrol saucecontrol deleted the evex-broadcast branch November 5, 2024 00:30
@github-actions github-actions bot locked and limited conversation to collaborators Dec 5, 2024