Implement GetIndexOfFirstNonAsciiChar intrinsic on AArch64 #71637

SwapnilGaikwad
Contributor

Fixes #41292

ghost added the community-contribution label ("Indicates that the PR has been added by a community member") on Jul 5, 2022
@ghost

ghost commented Jul 5, 2022

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #41292

Author: SwapnilGaikwad
Assignees: -
Labels: area-System.Text.Encoding

Milestone: -

@SwapnilGaikwad
Contributor Author

I will post the benchmarking numbers and initial performance analysis shortly.

@SwapnilGaikwad
Contributor Author

The performance improvements from the patch are small. However, the patch refactors the code so that the x86 and AArch64 paths share one implementation based on the cross-platform Vector128 API.
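For readers unfamiliar with the Vector API, here is a minimal sketch of what the shared 32-byte step can look like, assuming .NET 7 or later. It mirrors the loop visible in the disassembly later in this thread: load two 16-byte vectors, OR them, and test the result against the 0xFF80 mask, since any set bit in 0xFF80 means a UTF-16 unit is at least 0x80. `BlockHasNonAscii` is a hypothetical helper, not the code in this PR, and it takes a span instead of the raw pointer the real method uses.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper (not the PR's code): reports whether the 32-byte block
// (16 UTF-16 code units) starting at elementOffset contains a non-ASCII unit.
static bool BlockHasNonAscii(ReadOnlySpan<ushort> buffer, nuint elementOffset)
{
    nuint lanes = (nuint)Vector128<ushort>.Count; // 8 ushorts per 16-byte vector
    if ((nuint)buffer.Length < elementOffset + lanes + lanes)
        throw new ArgumentOutOfRangeException(nameof(elementOffset));

    ref ushort start = ref MemoryMarshal.GetReference(buffer);
    // Two consecutive 16-byte loads; LoadUnsafe does no bounds checks, hence the guard above.
    Vector128<ushort> first = Vector128.LoadUnsafe(ref start, elementOffset);
    Vector128<ushort> second = Vector128.LoadUnsafe(ref start, elementOffset + lanes);

    // A unit is ASCII iff it is <= 0x7F, i.e. none of the bits in 0xFF80 are set.
    // OR-ing both vectors lets a single test cover all 16 units (like vpor + vptest).
    Vector128<ushort> asciiMask = Vector128.Create((ushort)0xFF80);
    return !Vector128.EqualsAll((first | second) & asciiMask, Vector128<ushort>.Zero);
}

ReadOnlySpan<ushort> text = MemoryMarshal.Cast<char, ushort>("0123456789abcdef\u00e9".AsSpan());
Console.WriteLine(BlockHasNonAscii(text, 0)); // False: units 0..15 are ASCII
Console.WriteLine(BlockHasNonAscii(text, 1)); // True: unit 16 is U+00E9
```

OR-ing before testing trades a precise lane position for one branch per 32 bytes; the exact index is only computed on the rare exit path.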

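The benchmark results for the patch are as follows. In each pair of rows, the row marked Base (built from /base_src) is HEAD and the other row (built from /runtime_src) is the patch.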
Arm64

|       Method |        Job |                                                                                                 Toolchain | size | encName |         Mean |     Error |    StdDev |       Median |          Min |          Max | Ratio | MannWhitney(2%) |  Gen 0 | Allocated | Alloc Ratio |
|------------- |----------- |---------------------------------------------------------------------------------------------------------- |----- |-------- |-------------:|----------:|----------:|-------------:|-------------:|-------------:|------:|---------------- |-------:|----------:|------------:|
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   ascii |    47.263 ns | 0.1143 ns | 0.0892 ns |    47.245 ns |    47.150 ns |    47.421 ns |  1.00 |            Base | 0.0023 |      40 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   ascii |    46.514 ns | 0.2236 ns | 0.1867 ns |    46.442 ns |    46.371 ns |    46.936 ns |  0.98 |            Same | 0.0023 |      40 B |        1.00 |
|              |            |                                                                                                           |      |         |              |           |           |              |              |              |       |                 |        |           |             |
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   utf-8 |    58.582 ns | 0.0531 ns | 0.0415 ns |    58.573 ns |    58.532 ns |    58.698 ns |  1.00 |            Base | 0.0023 |      40 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   utf-8 |    55.934 ns | 0.2763 ns | 0.2449 ns |    55.827 ns |    55.728 ns |    56.478 ns |  0.95 |          Faster | 0.0022 |      40 B |        1.00 |
|              |            |                                                                                                           |      |         |              |           |           |              |              |              |       |                 |        |           |             |
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   ascii |   203.600 ns | 1.0755 ns | 0.9534 ns |   203.162 ns |   202.350 ns |   205.320 ns |  1.00 |            Base | 0.0315 |     536 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   ascii |   205.021 ns | 1.2562 ns | 1.1136 ns |   204.471 ns |   204.049 ns |   207.002 ns |  1.01 |            Same | 0.0318 |     536 B |        1.00 |
|              |            |                                                                                                           |      |         |              |           |           |              |              |              |       |                 |        |           |             |
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   utf-8 |   322.225 ns | 1.4594 ns | 1.2937 ns |   321.827 ns |   320.871 ns |   325.137 ns |  1.00 |            Base | 0.0313 |     536 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   utf-8 |   315.684 ns | 1.7892 ns | 1.4940 ns |   314.997 ns |   314.407 ns |   318.729 ns |  0.98 |            Same | 0.0319 |     536 B |        1.00 |
|              |            |                                                                                                           |      |         |              |           |           |              |              |              |       |                 |        |           |             |
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   ascii |   347.003 ns | 1.7417 ns | 1.5440 ns |   346.311 ns |   345.508 ns |   349.566 ns |  1.00 |            Base | 0.0616 |    1048 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   ascii |   339.395 ns | 1.6222 ns | 1.4380 ns |   338.871 ns |   337.981 ns |   341.735 ns |  0.98 |            Same | 0.0617 |    1048 B |        1.00 |
|              |            |                                                                                                           |      |         |              |           |           |              |              |              |       |                 |        |           |             |
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   utf-8 |   555.654 ns | 3.0607 ns | 2.7132 ns |   554.221 ns |   553.483 ns |   560.610 ns |  1.00 |            Base | 0.0606 |    1048 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   utf-8 |   563.409 ns | 2.3790 ns | 1.9866 ns |   562.619 ns |   561.971 ns |   568.817 ns |  1.01 |            Same | 0.0611 |    1048 B |        1.00 |
|              |            |                                                                                                           |      |         |              |           |           |              |              |              |       |                 |        |           |             |
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   ascii |   637.088 ns | 2.6175 ns | 2.1858 ns |   636.165 ns |   634.775 ns |   642.616 ns |  1.00 |            Base | 0.1225 |    2072 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   ascii |   625.226 ns | 3.0954 ns | 2.7439 ns |   623.931 ns |   622.933 ns |   631.333 ns |  0.98 |            Same | 0.1227 |    2072 B |        1.00 |
|              |            |                                                                                                           |      |         |              |           |           |              |              |              |       |                 |        |           |             |
|     GetBytes | Job-LFVTIV |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   utf-8 | 1,047.044 ns | 9.9021 ns | 8.2687 ns | 1,046.708 ns | 1,037.919 ns | 1,063.421 ns |  1.00 |            Base | 0.1209 |    2072 B |        1.00 |
|     GetBytes | Job-SGBIFH | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   utf-8 | 1,030.213 ns | 4.2178 ns | 3.5221 ns | 1,028.903 ns | 1,027.614 ns | 1,039.445 ns |  0.98 |            Same | 0.1236 |    2072 B |        1.00 |

x86

|       Method |        Job |                                                                                               Toolchain | size | encName |      Mean |    Error |   StdDev |    Median |       Min |       Max | Ratio | MannWhitney(2%) |  Gen 0 | Allocated | Alloc Ratio |
|------------- |----------- |-------------------------------------------------------------------------------------------------------- |----- |-------- |----------:|---------:|---------:|----------:|----------:|----------:|------:|---------------- |-------:|----------:|------------:|
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   ascii |  57.36 ns | 0.148 ns | 0.116 ns |  57.36 ns |  57.23 ns |  57.56 ns |  1.00 |            Base | 0.0023 |      40 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   ascii |  58.97 ns | 0.025 ns | 0.021 ns |  58.96 ns |  58.95 ns |  59.02 ns |  1.03 |          Slower | 0.0023 |      40 B |        1.00 |
|              |            |                                                                                                         |      |         |           |          |          |           |           |           |       |                 |        |           |             |
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   utf-8 |  70.77 ns | 0.259 ns | 0.230 ns |  70.84 ns |  70.20 ns |  70.91 ns |  1.00 |            Base | 0.0024 |      40 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   utf-8 |  71.47 ns | 0.038 ns | 0.032 ns |  71.48 ns |  71.41 ns |  71.53 ns |  1.01 |            Same | 0.0025 |      40 B |        1.00 |
|              |            |                                                                                                         |      |         |           |          |          |           |           |           |       |                 |        |           |             |
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   ascii | 235.84 ns | 0.039 ns | 0.032 ns | 235.83 ns | 235.79 ns | 235.91 ns |  1.00 |            Base | 0.0339 |     536 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   ascii | 234.51 ns | 0.032 ns | 0.027 ns | 234.51 ns | 234.48 ns | 234.56 ns |  0.99 |            Same | 0.0336 |     536 B |        1.00 |
|              |            |                                                                                                         |      |         |           |          |          |           |           |           |       |                 |        |           |             |
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   utf-8 | 299.05 ns | 0.303 ns | 0.253 ns | 298.96 ns | 298.76 ns | 299.52 ns |  1.00 |            Base | 0.0335 |     536 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   utf-8 | 311.11 ns | 0.399 ns | 0.354 ns | 310.99 ns | 310.67 ns | 311.89 ns |  1.04 |          Slower | 0.0339 |     536 B |        1.00 |
|              |            |                                                                                                         |      |         |           |          |          |           |           |           |       |                 |        |           |             |
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   ascii | 437.89 ns | 0.130 ns | 0.109 ns | 437.90 ns | 437.75 ns | 438.08 ns |  1.00 |            Base | 0.0656 |    1048 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   ascii | 437.00 ns | 0.124 ns | 0.103 ns | 437.00 ns | 436.84 ns | 437.19 ns |  1.00 |            Same | 0.0656 |    1048 B |        1.00 |
|              |            |                                                                                                         |      |         |           |          |          |           |           |           |       |                 |        |           |             |
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   utf-8 | 547.17 ns | 0.414 ns | 0.346 ns | 547.07 ns | 546.43 ns | 547.58 ns |  1.00 |            Base | 0.0642 |    1048 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   utf-8 | 540.51 ns | 0.269 ns | 0.225 ns | 540.60 ns | 540.19 ns | 540.88 ns |  0.99 |            Same | 0.0656 |    1048 B |        1.00 |
|              |            |                                                                                                         |      |         |           |          |          |           |           |           |       |                 |        |           |             |
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   ascii | 798.65 ns | 0.297 ns | 0.248 ns | 798.63 ns | 798.19 ns | 799.15 ns |  1.00 |            Base | 0.1291 |    2072 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   ascii | 800.34 ns | 0.767 ns | 0.641 ns | 800.44 ns | 799.12 ns | 801.48 ns |  1.00 |            Same | 0.1287 |    2072 B |        1.00 |
|              |            |                                                                                                         |      |         |           |          |          |           |           |           |       |                 |        |           |             |
|     GetBytes | Job-FLTCZI |    /base_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   utf-8 | 978.33 ns | 0.203 ns | 0.159 ns | 978.32 ns | 978.11 ns | 978.62 ns |  1.00 |            Base | 0.1299 |    2072 B |        1.00 |
|     GetBytes | Job-MHOLRA | /runtime_src/artifacts/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   utf-8 | 976.44 ns | 1.731 ns | 1.352 ns | 975.86 ns | 975.03 ns | 978.60 ns |  1.00 |            Same | 0.1299 |    2072 B |        1.00 |
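As a sanity check on the rows that MannWhitney(2%) flags as Slower or Faster, dividing the patch mean by the base mean and rounding to two decimals reproduces the Ratio column in both tables. The sketch below does that arithmetic with the means copied from the tables; `Ratio` is just an illustrative helper, not BenchmarkDotNet's own implementation.

```csharp
using System;

// Patch mean divided by baseline mean, rounded to two decimals.
static double Ratio(double baseMeanNs, double patchMeanNs) =>
    Math.Round(patchMeanNs / baseMeanNs, 2);

// Rows flagged by MannWhitney(2%) in the tables above (means in ns).
double x64Size16Ascii = Ratio(57.36, 58.97);    // x64, size 16, ascii: Slower
double x64Size512Utf8 = Ratio(299.05, 311.11);  // x64, size 512, utf-8: Slower
double arm64Size16Utf8 = Ratio(58.582, 55.934); // Arm64, size 16, utf-8: Faster

Console.WriteLine($"{x64Size16Ascii} {x64Size512Utf8} {arm64Size16Utf8}");
```

In relative terms, the two x64 regressions are about 2.8% and 4.0%, and the Arm64 16-char UTF-8 case is about 4.5% faster; all other rows are within the noise band.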
  • Performance on x86
    The slight slowdown on x86 can be attributed to the loop loading asciiMaskForTestZ from memory on every iteration instead of keeping it in a register.
    The patch also uses unaligned loads (vmovdqu) instead of aligned ones (vmovdqa) when loading the data chunks.

HEAD

G_M26287_IG06:
       vmovdqa  xmm1, xmmword ptr [rbx]
       vmovdqa  xmm2, xmmword ptr [rbx+16]
       vpor     xmm3, xmm1, xmm2
       vptest   xmm3, xmm0
       jne      SHORT G_M26287_IG12
       add      rbx, 32
       cmp      rbx, rax
       jbe      SHORT G_M26287_IG06
						            ;; size=29 bbWeight=4    PerfScore 55.33
...
RWD00  	dq	FF80FF80FF80FF80h, FF80FF80FF80FF80h

Patch

G_M21315_IG06:
       vmovdqu  xmm0, xmmword ptr [rbx]
       vmovdqu  xmm2, xmmword ptr [rbx+16]
       vpor     xmm3, xmm0, xmm2
       vptest   xmm3, xmmword ptr [reloc @RWD00]
       jne      SHORT G_M21315_IG12
       add      rbx, 32
       cmp      rbx, rax
       jbe      SHORT G_M21315_IG06
						            ;; size=33 bbWeight=4    PerfScore 63.33
...
RWD00  	dq	FF80FF80FF80FF80h, FF80FF80FF80FF80h

Full assembly dump for HEAD and the patch.

In the patch, the mask has already been loaded into xmm1 in the preceding basic block, yet the loop reloads it from memory. I'm not sure whether the JIT is missing an optimization here.
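On the aligned-versus-unaligned point: in the x64 disassembly posted later in this thread (block G_M56429_IG06), the first 16 bytes are checked with an unaligned load, and the pointer is then advanced to the next 16-byte boundary with `lea rsi, [rbx+16]` / `and rsi, -16`. Because the main loop then steps by 32 bytes, every loop load hits a 16-byte-aligned address even though it is emitted as vmovdqu. The sketch below reproduces that pointer arithmetic; `AlignAfterFirstVector` is a hypothetical name, and the code assumes the buffer start is 2-byte (char) aligned.

```csharp
using System;

// Hypothetical helper mirroring the arithmetic in G_M56429_IG06:
//   lea rsi, [rbx+16] ; and rsi, -16 ; rbp = rsi - rbx ; r14 = rbp / 2
static (nuint AlignedAddress, nuint CharsSkipped) AlignAfterFirstVector(nuint bufferAddress)
{
    // Round (start + 16) down to a multiple of 16: the first 16-byte boundary
    // strictly after the start, since the first 16 bytes were already checked.
    nuint aligned = (bufferAddress + 16) & ~(nuint)15;
    nuint bytesSkipped = aligned - bufferAddress; // always in [1, 16]
    return (aligned, bytesSkipped / 2);           // UTF-16: 2 bytes per char
}

var (alignedAddr, charsSkipped) = AlignAfterFirstVector(0x1006);
Console.WriteLine($"0x{(ulong)alignedAddr:X}, {charsSkipped} chars"); // 0x1010, 5 chars
```

The skipped range (1 to 8 chars) matches the JIT's debug assertion that the code "should've made forward progress of at least one char".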

  • Performance on Arm64
    Because HEAD already has a SIMD implementation for Arm64, the performance improvements there are small.
    Full assembly dump for HEAD and the patch.

We could improve Arm64 performance further by using pairwise loads with post-index increment, if the Vector API supported them. When I emit a pairwise load explicitly with the following code
if (Sse2.IsSupported)
{
    // x86: two separate 16-byte loads of consecutive chunks
    firstVector = Vector128.LoadUnsafe(ref *(ushort*)pBuffer);
    secondVector = Vector128.LoadUnsafe(ref *(ushort*)pBuffer, SizeOfVector128InChars);
}
else
{
    // Arm64: a single LDP loads both 16-byte chunks
    (firstVector, secondVector) = AdvSimd.Arm64.LoadPairVector128((ushort*)pBuffer);
}

the JIT produces a suboptimal sequence with unnecessary stores to and loads from the stack:

ldp     q16, q17, [x19]
str     q16, [fp,#16]
str     q17, [fp,#32]
ldr     q16, [fp,#16]
ldr     q17, [fp,#32]

Full assembly here.

@kunalspathak
Member

@tannergooding

@kunalspathak
Member

> the mask is already loaded in the previous basic block and available in xmm1 but explicitly loaded in the loop

I see it being reused.

x64 Assembly code
; Assembly listing for method System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 15 single block inlinees; 16 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] ( 34, 35.50)    long  ->  rsi        
;  V01 arg1         [V01,T01] ( 19, 11   )    long  ->  rdi        
;* V02 loc0         [V02,T19] (  0,  0   )     int  ->  zero-ref   
;* V03 loc1         [V03,T20] (  0,  0   )     int  ->  zero-ref   
;  V04 loc2         [V04,T21] ( 11, 12.50)  simd16  ->  mm6        
;  V05 loc3         [V05,T22] (  3,  8.50)  simd16  ->  mm8        
;  V06 loc4         [V06,T06] (  5,  2.50)     int  ->  rdi        
;  V07 loc5         [V07,T05] (  8,  4   )    long  ->  rbx        
;  V08 loc6         [V08,T07] (  4,  2   )    long  ->  r14        
;* V09 loc7         [V09    ] (  0,  0   )    long  ->  zero-ref   
;  V10 loc8         [V10,T10] (  3,  1.50)     int  ->  rcx        
;  V11 loc9         [V11,T04] (  2,  4.50)    long  ->  rcx        
;* V12 loc10        [V12    ] (  0,  0   )  simd16  ->  zero-ref   
;* V13 loc11        [V13    ] (  0,  0   )  simd16  ->  zero-ref   
;* V14 loc12        [V14    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;* V15 loc13        [V15    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;* V16 loc14        [V16    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;  V17 loc15        [V17,T08] (  4,  2   )    long  ->  rcx        
;  V18 OutArgs      [V18    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V19 tmp1         [V19    ] (  0,  0   )     int  ->  zero-ref   
;  V20 tmp2         [V20,T11] (  2,  1   )     int  ->  rcx        
;* V21 tmp3         [V21    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V22 tmp4         [V22    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V23 tmp5         [V23    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V24 tmp6         [V24,T15] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V25 tmp7         [V25    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V26 tmp8         [V26    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V27 tmp9         [V27    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V28 tmp10        [V28    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V29 tmp11        [V29    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V30 tmp12        [V30    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V31 tmp13        [V31,T02] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V32 tmp14        [V32    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V33 tmp15        [V33    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V34 tmp16        [V34,T16] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V35 tmp17        [V35    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V36 tmp18        [V36    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V37 tmp19        [V37,T17] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V38 tmp20        [V38    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V39 tmp21        [V39    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V40 tmp22        [V40    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V41 tmp23        [V41    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V42 tmp24        [V42,T18] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V43 tmp25        [V43    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V44 tmp26        [V44    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V45 tmp27        [V45    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V46 tmp28        [V46    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V47 tmp29        [V47    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
;* V48 tmp30        [V48    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V49 tmp31        [V49    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V50 tmp32        [V50    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V51 tmp33        [V51    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
;  V52 cse0         [V52,T12] (  3,  1.50)    long  ->  rcx         "CSE - moderate"
;  V53 cse1         [V53,T03] ( 11,  5.50)     ref  ->  rdx         "CSE - aggressive"
;  V54 cse2         [V54,T09] (  4,  2   )    long  ->  rbp         "CSE - moderate"
;  V55 cse3         [V55,T13] (  3,  1.50)    long  ->  rsi         "CSE - moderate"
;  V56 cse4         [V56,T14] (  2,  1   )     ref  ->  rcx         "CSE - moderate"
;  V57 cse5         [V57,T23] (  6,  6.50)  simd16  ->  mm7         "CSE - aggressive"
;
; Lcl frame size = 80

G_M56429_IG01:
       push     r14
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 80
       vzeroupper 
       vmovaps  qword ptr [rsp+40H], xmm6
       vmovaps  qword ptr [rsp+30H], xmm7
       vmovaps  qword ptr [rsp+20H], xmm8
       mov      rsi, rcx
       mov      rdi, rdx
						;; size=37 bbWeight=1    PerfScore 12.75
G_M56429_IG02:
       test     rdi, rdi
       jne      SHORT G_M56429_IG05
						;; size=5 bbWeight=1    PerfScore 1.25
G_M56429_IG03:
       xor      eax, eax
						;; size=2 bbWeight=0.50 PerfScore 0.12
G_M56429_IG04:
       vmovaps  xmm6, qword ptr [rsp+40H]
       vmovaps  xmm7, qword ptr [rsp+30H]
       vmovaps  xmm8, qword ptr [rsp+20H]
       add      rsp, 80
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r14
       ret      
						;; size=29 bbWeight=0.50 PerfScore 7.88
G_M56429_IG05:
       mov      rbx, rsi
       cmp      rdi, 8
       jb       G_M56429_IG21
       test     rdi, rdi
       jge      SHORT G_M56429_IG06
       mov      rcx, 0xD1FFAB1E      ; ""
       mov      rdx, gword ptr [rcx]
       mov      rcx, rdx
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=40 bbWeight=0.50 PerfScore 4.12
G_M56429_IG06:
       vmovdqu  xmm6, xmmword ptr [rbx]
       vmovupd  xmm7, xmmword ptr [reloc @RWD00]
       vptest   xmm6, xmm7
       jne      G_M56429_IG18
       add      rdi, rdi
       cmp      rdi, 32
       jb       G_M56429_IG12
       lea      rsi, [rbx+16]
       and      rsi, -16
       mov      rbp, rsi
       sub      rbp, rbx
       mov      r14, rbp
       shr      r14, 63
       add      r14, rbp
       sar      r14, 1
       test     r14, r14
       jle      SHORT G_M56429_IG07
       xor      ecx, ecx
       cmp      r14, 8
       setle    cl
       test     cl, cl
       jne      SHORT G_M56429_IG08
						;; size=81 bbWeight=0.50 PerfScore 9.62
G_M56429_IG07:
       mov      rcx, 0xD1FFAB1E      ; "We should've made forward progress of at least one char."
       mov      rcx, gword ptr [rcx]
       mov      rdx, 0xD1FFAB1E      ; ""
       mov      rdx, gword ptr [rdx]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=32 bbWeight=0.50 PerfScore 3.75
G_M56429_IG08:
       cmp      r14, rdi
       jbe      SHORT G_M56429_IG09
       mov      rcx, 0xD1FFAB1E      ; "We shouldn't have read past the end of the input buffer."
       mov      rcx, gword ptr [rcx]
       mov      rdx, 0xD1FFAB1E      ; ""
       mov      rdx, gword ptr [rdx]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=37 bbWeight=0.50 PerfScore 4.38
G_M56429_IG09:
       sub      rdi, rbp
       cmp      rdi, 32
       jb       SHORT G_M56429_IG11
       lea      rcx, [rsi+rdi]
       sub      rcx, 32
       align    [0 bytes for IG10]
						;; size=17 bbWeight=0.50 PerfScore 1.12
G_M56429_IG10:
       vmovdqu  xmm6, xmmword ptr [rsi]
       vmovdqu  xmm8, xmmword ptr [rsi+16]
       vpor     xmm0, xmm6, xmm8
       vptest   xmm0, xmm7
       jne      G_M56429_IG17
       add      rsi, 32
       cmp      rsi, rcx
       jbe      SHORT G_M56429_IG10
						;; size=34 bbWeight=4    PerfScore 55.33
G_M56429_IG11:
       test     dil, 16
       je       SHORT G_M56429_IG13
       vmovdqu  xmm6, xmmword ptr [rsi]
       vptest   xmm6, xmm7
       jne      G_M56429_IG18
						;; size=21 bbWeight=0.50 PerfScore 4.62
G_M56429_IG12:
       add      rsi, 16
						;; size=4 bbWeight=0.50 PerfScore 0.12
G_M56429_IG13:
       movzx    rcx, dil
       test     cl, 15
       je       SHORT G_M56429_IG14
       mov      rcx, rdi
       and      rcx, 15
       add      rcx, rsi
       mov      rsi, rcx
       sub      rsi, 16
       vmovdqu  xmm6, xmmword ptr [rsi]
       vptest   xmm6, xmm7
       jne      SHORT G_M56429_IG18
       add      rsi, 16
						;; size=41 bbWeight=0.50 PerfScore 5.50
G_M56429_IG14:
       sub      rsi, rbx
       test     sil, 1
       je       SHORT G_M56429_IG15
       mov      rcx, 0xD1FFAB1E      ; "Shouldn't have incremented any pointer by an odd byte count."
       mov      rcx, gword ptr [rcx]
       mov      rdx, 0xD1FFAB1E      ; ""
       mov      rdx, gword ptr [rdx]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=41 bbWeight=0.50 PerfScore 4.50
G_M56429_IG15:
       mov      rax, rsi
       shr      rax, 1
						;; size=6 bbWeight=0.50 PerfScore 0.38
G_M56429_IG16:
       vmovaps  xmm6, qword ptr [rsp+40H]
       vmovaps  xmm7, qword ptr [rsp+30H]
       vmovaps  xmm8, qword ptr [rsp+20H]
       add      rsp, 80
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r14
       ret      
						;; size=29 bbWeight=0.50 PerfScore 7.88
G_M56429_IG17:
       vptest   xmm6, xmm7
       jne      SHORT G_M56429_IG18
       add      rsi, 16
       vmovaps  xmm6, xmm8
						;; size=16 bbWeight=0.50 PerfScore 2.25
G_M56429_IG18:
       vpaddusw xmm0, xmm6, xmmword ptr [reloc @RWD16]
       vpmovmskb edi, xmm0
       and      edi, 0xAAAA
       jne      SHORT G_M56429_IG19
       mov      rcx, 0xD1FFAB1E      ; "Shouldn't be here unless we see non-ASCII data."
       mov      rcx, gword ptr [rcx]
       mov      rdx, 0xD1FFAB1E      ; ""
       mov      rdx, gword ptr [rdx]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=52 bbWeight=0.50 PerfScore 6.38
G_M56429_IG19:
       xor      ecx, ecx
       tzcnt    ecx, edi
       lea      rsi, [rsi+rcx-1]
       jmp      G_M56429_IG14
						;; size=16 bbWeight=0.50 PerfScore 2.62
G_M56429_IG20:
       call     [System.Text.ASCIIUtility:FirstCharInUInt32IsAscii(int):bool]
       test     eax, eax
       je       G_M56429_IG14
       add      rsi, 2
       jmp      G_M56429_IG14
						;; size=23 bbWeight=0.50 PerfScore 3.25
G_M56429_IG21:
       test     dil, 4
       je       SHORT G_M56429_IG23
       mov      rcx, qword ptr [rbx]
       mov      rax, 0xD1FFAB1E
       and      rcx, rax
       je       SHORT G_M56429_IG22
       tzcnt    rcx, rcx
       sar      ecx, 3
       movsxd   rsi, ecx
       and      rsi, -2
       add      rsi, rbx
       jmp      G_M56429_IG14
						;; size=47 bbWeight=0.50 PerfScore 5.00
G_M56429_IG22:
       lea      rsi, [rbx+8]
						;; size=4 bbWeight=0.50 PerfScore 0.25
G_M56429_IG23:
       test     dil, 2
       je       SHORT G_M56429_IG24
       mov      ecx, dword ptr [rsi]
       test     ecx, 0xD1FFAB1E
       jne      SHORT G_M56429_IG20
       add      rsi, 4
						;; size=20 bbWeight=0.50 PerfScore 2.38
G_M56429_IG24:
       test     dil, 1
       je       G_M56429_IG14
       cmp      word  ptr [rsi], 127
       ja       G_M56429_IG14
       add      rsi, 2
       jmp      G_M56429_IG14
						;; size=29 bbWeight=0.50 PerfScore 3.75
RWD00  	dq	FF80FF80FF80FF80h, FF80FF80FF80FF80h
RWD16  	dq	7F807F807F807F80h, 7F807F807F807F80h


; Total bytes of code 663, prolog size 37, PerfScore 217.21, instruction count 167, allocated bytes for code 680 (MethodHash=9f6a2392) for method System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
; ============================================================
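The non-ASCII exit path above (G_M56429_IG18/IG19) finds the first offending char without a per-lane loop. `vpaddusw` adds 0x7F80 with unsigned saturation, so bit 15 of a lane ends up set exactly when the lane is at least 0x80; `vpmovmskb` gathers the top bit of every byte; `and edi, 0xAAAA` keeps only the high-byte bits (one per lane); and `tzcnt` returns 2k+1 for the first non-ASCII lane k, which is why the code adds `rcx-1` to reach byte offset 2k. The method later halves the byte distance (`shr rax, 1`) to get a char index. The scalar model below is a hypothetical illustration of that sequence, not code from the PR.

```csharp
using System;
using System.Numerics;

// Hypothetical scalar model of G_M56429_IG18/IG19:
//   vpaddusw (0x7F80 x 8) ; vpmovmskb ; and 0xAAAA ; tzcnt ; lea rsi, [rsi+rcx-1]
// Returns the byte offset of the first non-ASCII UTF-16 unit in an 8-lane vector, or -1.
static int FirstNonAsciiByteOffset(ReadOnlySpan<ushort> lanes)
{
    if (lanes.Length < 8) throw new ArgumentException("Need 8 lanes (one 16-byte vector).", nameof(lanes));

    uint moveMask = 0;
    for (int k = 0; k < 8; k++)
    {
        // vpaddusw: unsigned saturating add. 0x7F80 + 0x80 == 0x8000, so bit 15
        // of the sum is set exactly when the lane is >= 0x80.
        int sum = Math.Min(lanes[k] + 0x7F80, 0xFFFF);

        // vpmovmskb: bit i is the MSB of byte i (little-endian lane layout).
        if ((sum & 0x0080) != 0) moveMask |= 1u << (2 * k);     // low byte of lane k
        if ((sum & 0x8000) != 0) moveMask |= 1u << (2 * k + 1); // high byte of lane k
    }

    moveMask &= 0xAAAA; // keep only the high-byte bits: one bit per lane
    if (moveMask == 0) return -1;

    // tzcnt yields 2k + 1 for the first non-ASCII lane k; subtract 1 for byte offset 2k.
    return BitOperations.TrailingZeroCount(moveMask) - 1;
}

ushort[] vector = { 'H', 'e', 'l', '\u00E9', 'l', 'o', '!', '?' };
Console.WriteLine(FirstNonAsciiByteOffset(vector)); // 6 (char index 3)
```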

@SwapnilGaikwad
Contributor Author

> I see it being reused.

I see it too when I explicitly disable the asserts. However, I see a slightly different sequence depending on how I execute it:

vmovdqu  xmm0, xmmword ptr [rbx]
vmovdqu  xmm2, xmmword ptr [rbx+16]
vpor     xmm3, xmm0, xmm2
vptest   xmm3, xmm1
jne ...
x64 with explicitly removed asserts
; Assembly listing for method System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
  ; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
  ; Tier-1 compilation
  ; optimized code
  ; rbp based frame
  ; fully interruptible
  ; No PGO data
  ; 2 inlinees with PGO data; 4 single block inlinees; 5 inlinees without PGO data
  ; Final local variable assignments
  ;
  ;  V00 arg0         [V00,T00] ( 34, 35.50)    long  ->  rbx
  ;  V01 arg1         [V01,T01] ( 17, 10   )    long  ->  rsi
  ;* V02 loc0         [V02,T13] (  0,  0   )     int  ->  zero-ref
  ;* V03 loc1         [V03,T14] (  0,  0   )     int  ->  zero-ref
  ;  V04 loc2         [V04,T15] ( 11, 12.50)  simd16  ->  mm0
  ;  V05 loc3         [V05,T16] (  3,  8.50)  simd16  ->  mm2
  ;  V06 loc4         [V06,T05] (  4,  2   )     int  ->  rsi
  ;  V07 loc5         [V07,T04] (  8,  4   )    long  ->  r14
  ;* V08 loc6         [V08    ] (  0,  0   )    long  ->  zero-ref
  ;* V09 loc7         [V09    ] (  0,  0   )    long  ->  zero-ref
  ;  V10 loc8         [V10,T07] (  3,  1.50)     int  ->  rdi
  ;  V11 loc9         [V11,T03] (  2,  4.50)    long  ->  rax
  ;* V12 loc10        [V12    ] (  0,  0   )  simd16  ->  zero-ref
  ;* V13 loc11        [V13    ] (  0,  0   )  simd16  ->  zero-ref
  ;* V14 loc12        [V14    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
  ;* V15 loc13        [V15    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
  ;* V16 loc14        [V16    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
  ;  V17 loc15        [V17,T06] (  4,  2   )    long  ->  rdi
  ;# V18 OutArgs      [V18    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
  ;* V19 tmp1         [V19,T09] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
  ;* V20 tmp2         [V20    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
  ;* V21 tmp3         [V21    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
  ;* V22 tmp4         [V22,T02] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
  ;* V23 tmp5         [V23    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
  ;* V24 tmp6         [V24    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
  ;* V25 tmp7         [V25,T10] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
  ;* V26 tmp8         [V26    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
  ;* V27 tmp9         [V27    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
  ;* V28 tmp10        [V28,T11] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
  ;* V29 tmp11        [V29    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
  ;* V30 tmp12        [V30    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
  ;* V31 tmp13        [V31,T12] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
  ;* V32 tmp14        [V32    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
  ;* V33 tmp15        [V33    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
  ;* V34 tmp16        [V34    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
  ;* V35 tmp17        [V35    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
  ;  V36 cse0         [V36,T08] (  3,  1.50)    long  ->  rdi         "CSE - moderate"
  ;  V37 cse1         [V37,T17] (  6,  6.50)  simd16  ->  mm1         "CSE - aggressive"
  ;
  ; Lcl frame size = 0

  G_M56429_IG01:
         push     rbp
         push     r14
         push     rbx
         vzeroupper
         lea      rbp, [rsp+10H]
         mov      rbx, rdi
  						;; size=15 bbWeight=1    PerfScore 4.75
  G_M56429_IG02:
         test     rsi, rsi
         jne      SHORT G_M56429_IG05
  						;; size=5 bbWeight=1    PerfScore 1.25
  G_M56429_IG03:
         xor      eax, eax
  						;; size=2 bbWeight=0.50 PerfScore 0.12
  G_M56429_IG04:
         pop      rbx
         pop      r14
         pop      rbp
         ret
  						;; size=5 bbWeight=0.50 PerfScore 1.25
  G_M56429_IG05:
         mov      r14, rbx
         cmp      rsi, 8
         jb       G_M56429_IG15
         vmovdqu  xmm0, xmmword ptr [r14]
         vmovupd  xmm1, xmmword ptr [reloc @RWD00]
         vptest   xmm0, xmm1
         jne      G_M56429_IG13
         add      rsi, rsi
         cmp      rsi, 32
         jb       SHORT G_M56429_IG08
         lea      rbx, [r14+16]
         and      rbx, -16
         mov      rax, rbx
         sub      rax, r14
         sub      rsi, rax
         cmp      rsi, 32
         jb       SHORT G_M56429_IG07
         lea      rax, [rbx+rsi]
         sub      rax, 32
         align    [7 bytes for IG06]
  						;; size=85 bbWeight=0.50 PerfScore 8.88
  G_M56429_IG06:
         vmovdqu  xmm0, xmmword ptr [rbx]
         vmovdqu  xmm2, xmmword ptr [rbx+16]
         vpor     xmm3, xmm0, xmm2
         vptest   xmm3, xmm1
         jne      SHORT G_M56429_IG12
         add      rbx, 32
         cmp      rbx, rax
         jbe      SHORT G_M56429_IG06
  						;; size=29 bbWeight=4    PerfScore 55.33
  G_M56429_IG07:
         test     sil, 16
         je       SHORT G_M56429_IG09
         vmovdqu  xmm0, xmmword ptr [rbx]
         vptest   xmm0, xmm1
         jne      SHORT G_M56429_IG13
  						;; size=17 bbWeight=0.50 PerfScore 4.62
  G_M56429_IG08:
         add      rbx, 16
  						;; size=4 bbWeight=0.50 PerfScore 0.12
  G_M56429_IG09:
         movzx    rax, sil
         test     al, 15
         je       SHORT G_M56429_IG10
         mov      rax, rsi
         and      rax, 15
         add      rax, rbx
         mov      rbx, rax
         sub      rbx, 16
         vmovdqu  xmm0, xmmword ptr [rbx]
         vptest   xmm0, xmm1
         jne      SHORT G_M56429_IG13
         add      rbx, 16
  						;; size=40 bbWeight=0.50 PerfScore 5.50
  G_M56429_IG10:
         mov      rax, rbx
         sub      rax, r14
         shr      rax, 1
  						;; size=9 bbWeight=0.50 PerfScore 0.50
  G_M56429_IG11:
         pop      rbx
         pop      r14
         pop      rbp
         ret
  						;; size=5 bbWeight=0.50 PerfScore 1.25
  G_M56429_IG12:
         vptest   xmm0, xmm1
         jne      SHORT G_M56429_IG13
         add      rbx, 16
         vmovaps  xmm0, xmm2
  						;; size=15 bbWeight=0.50 PerfScore 2.25
  G_M56429_IG13:
         vpaddusw xmm0, xmm0, xmmword ptr [reloc @RWD16]
         vpmovmskb esi, xmm0
         and      esi, 0xAAAA
         xor      edi, edi
         tzcnt    edi, esi
         lea      rbx, [rbx+rdi-1]
         jmp      SHORT G_M56429_IG10
  						;; size=31 bbWeight=0.50 PerfScore 4.75
  G_M56429_IG14:
         call     [System.Text.ASCIIUtility:FirstCharInUInt32IsAscii(int):bool]
         test     eax, eax
         je       SHORT G_M56429_IG10
         add      rbx, 2
         jmp      SHORT G_M56429_IG10
  						;; size=16 bbWeight=0.50 PerfScore 3.25
  G_M56429_IG15:
         test     sil, 4
         je       SHORT G_M56429_IG18
         mov      rdi, qword ptr [r14]
         mov      rax, 0xD1FFAB1E
         and      rdi, rax
         je       SHORT G_M56429_IG17
  						;; size=24 bbWeight=0.50 PerfScore 2.38
  G_M56429_IG16:
         xor      esi, esi
         tzcnt    rsi, rdi
         mov      edi, esi
         sar      edi, 3
         movsxd   rbx, edi
         and      rbx, -2
         add      rbx, r14
         jmp      SHORT G_M56429_IG10
  						;; size=24 bbWeight=0.50 PerfScore 2.88
  G_M56429_IG17:
         lea      rbx, [r14+8]
  						;; size=4 bbWeight=0.50 PerfScore 0.25
  G_M56429_IG18:
         test     sil, 2
         je       SHORT G_M56429_IG19
         mov      edi, dword ptr [rbx]
         test     edi, 0xD1FFAB1E
         jne      SHORT G_M56429_IG14
         add      rbx, 4
  						;; size=20 bbWeight=0.50 PerfScore 2.38
  G_M56429_IG19:
         test     sil, 1
         je       G_M56429_IG10
         cmp      word  ptr [rbx], 127
         ja       G_M56429_IG10
         add      rbx, 2
         jmp      G_M56429_IG10
  						;; size=29 bbWeight=0.50 PerfScore 3.75
  RWD00  	dq	FF80FF80FF80FF80h, FF80FF80FF80FF80h
  RWD16  	dq	7F807F807F807F80h, 7F807F807F807F80h


  ; Total bytes of code 379, prolog size 15, PerfScore 144.16, instruction count 109, allocated bytes for code 387 (MethodHash=9f6a2392) for method System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
  ; ============================================================
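Once a vector containing a non-ASCII char is found, IG13 locates it without a loop: `vpaddusw` with 0x7F80 (RWD16) sets bit 15 of a lane exactly when the lane is >= 0x80, `vpmovmskb` gathers the top bit of every byte, `and esi, 0xAAAA` keeps only the high-byte bits, and `tzcnt` yields `2*lane + 1`, from which `lea rbx, [rbx+rdi-1]` recovers the char's byte offset. The C below is a scalar emulation of that sequence; the function name is hypothetical, and returning 16 for an all-ASCII vector is an addition for safety (in the checked-build listing, an assert guards that case instead).

```c
#include <stdint.h>

/* Hypothetical scalar emulation of IG13 for one 8-lane vector `v`.
 * Returns the byte offset (0, 2, ..., 14) of the first non-ASCII char,
 * or 16 if there is none. */
static unsigned first_non_ascii_byte_offset(const uint16_t v[8])
{
    uint32_t movemask = 0;
    for (unsigned lane = 0; lane < 8; lane++) {
        /* vpaddusw with 0x7F80: unsigned 16-bit add that saturates at 0xFFFF.
           Bit 15 of the result is set exactly when v[lane] >= 0x80. */
        uint32_t sum = (uint32_t)v[lane] + 0x7F80u;
        uint16_t sat = (uint16_t)(sum > 0xFFFFu ? 0xFFFFu : sum);
        /* vpmovmskb: one bit per byte, taken from that byte's MSB.
           Little-endian lanes: low byte -> bit 2*lane, high byte -> bit 2*lane+1. */
        movemask |= (uint32_t)((sat >> 7) & 1u) << (2 * lane);
        movemask |= (uint32_t)((sat >> 15) & 1u) << (2 * lane + 1);
    }
    /* and esi, 0xAAAA: keep only the high-byte bits (bit 15 of each lane). */
    movemask &= 0xAAAAu;
    if (movemask == 0)
        return 16;
    /* tzcnt: the lowest set bit is 2*lane + 1 for the first non-ASCII lane. */
    unsigned tz = 0;
    while (!((movemask >> tz) & 1u))
        tz++;
    /* lea rbx, [rbx+rdi-1]: step back one byte to the start of that char. */
    return tz - 1;
}
```

The byte offset is then turned into a char index by a right shift of one, as IG10 does with `shr rax, 1`.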

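With asserts enabled, the mask is spilled to the stack: the CSE'd mask (V57) lives at [rbp-40H] and is reloaded inside the loop (IG10), presumably because it is live across the `Debug.Fail` calls and all XMM registers are caller-saved under the System V x64 ABI.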

vmovdqu  xmm0, xmmword ptr [rbx]
vmovdqu  xmm2, xmmword ptr [rbx+16]
vpor     xmm3, xmm0, xmm2
vmovapd  xmm1, xmmword ptr [rbp-40H]
vptest   xmm3, xmm1
jne ...
x64 with asserts
; Assembly listing for method System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; Tier-1 compilation
; optimized code
; rbp based frame
; fully interruptible
; No PGO data
; 2 inlinees with PGO data; 15 single block inlinees; 14 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] ( 34, 35.50)    long  ->  rbx
;  V01 arg1         [V01,T01] ( 19, 11   )    long  ->  r14
;* V02 loc0         [V02,T19] (  0,  0   )     int  ->  zero-ref
;* V03 loc1         [V03,T20] (  0,  0   )     int  ->  zero-ref
;  V04 loc2         [V04,T21] ( 11, 12.50)  simd16  ->  mm0
;  V05 loc3         [V05,T22] (  3,  8.50)  simd16  ->  mm2
;  V06 loc4         [V06,T06] (  5,  2.50)     int  ->  r14
;  V07 loc5         [V07,T05] (  8,  4   )    long  ->  r15
;  V08 loc6         [V08,T07] (  4,  2   )    long  ->  r13
;* V09 loc7         [V09    ] (  0,  0   )    long  ->  zero-ref
;  V10 loc8         [V10,T10] (  3,  1.50)     int  ->  rdi
;  V11 loc9         [V11,T04] (  2,  4.50)    long  ->  rdi
;* V12 loc10        [V12    ] (  0,  0   )  simd16  ->  zero-ref
;* V13 loc11        [V13    ] (  0,  0   )  simd16  ->  zero-ref
;* V14 loc12        [V14    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;* V15 loc13        [V15    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;* V16 loc14        [V16    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;  V17 loc15        [V17,T08] (  4,  2   )    long  ->  rdi
;# V18 OutArgs      [V18    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;* V19 tmp1         [V19    ] (  0,  0   )     int  ->  zero-ref
;  V20 tmp2         [V20,T11] (  2,  1   )     int  ->  rdi
;* V21 tmp3         [V21    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V22 tmp4         [V22    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V23 tmp5         [V23    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V24 tmp6         [V24,T15] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V25 tmp7         [V25    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V26 tmp8         [V26    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V27 tmp9         [V27    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V28 tmp10        [V28    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V29 tmp11        [V29    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V30 tmp12        [V30    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V31 tmp13        [V31,T02] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V32 tmp14        [V32    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V33 tmp15        [V33    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V34 tmp16        [V34,T16] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V35 tmp17        [V35    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V36 tmp18        [V36    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V37 tmp19        [V37,T17] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V38 tmp20        [V38    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V39 tmp21        [V39    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V40 tmp22        [V40    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V41 tmp23        [V41    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V42 tmp24        [V42,T18] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V43 tmp25        [V43    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V44 tmp26        [V44    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V45 tmp27        [V45    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V46 tmp28        [V46    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V47 tmp29        [V47    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
;* V48 tmp30        [V48    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V49 tmp31        [V49    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V50 tmp32        [V50    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V51 tmp33        [V51    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
;  V52 cse0         [V52,T12] (  3,  1.50)    long  ->  rdi         "CSE - moderate"
;  V53 cse1         [V53,T03] ( 11,  5.50)     ref  ->  rsi         "CSE - aggressive"
;  V54 cse2         [V54,T09] (  4,  2   )    long  ->  r12         "CSE - moderate"
;  V55 cse3         [V55,T13] (  3,  1.50)    long  ->  rbx         "CSE - moderate"
;  V56 cse4         [V56,T14] (  2,  1   )     ref  ->  rdi         "CSE - moderate"
;  V57 cse5         [V57,T23] (  6,  6.50)  simd16  ->  [rbp-40H]   spill-single-def "CSE - aggressive"
;
; Lcl frame size = 24

G_M56429_IG01:
       push     rbp
       push     r15
       push     r14
       push     r13
       push     r12
       push     rbx
       sub      rsp, 24
       vzeroupper
       lea      rbp, [rsp+40H]
       mov      rbx, rdi
       mov      r14, rsi
						;; size=28 bbWeight=1    PerfScore 8.25
G_M56429_IG02:
       test     r14, r14
       jne      SHORT G_M56429_IG05
						;; size=5 bbWeight=1    PerfScore 1.25
G_M56429_IG03:
       xor      eax, eax
						;; size=2 bbWeight=0.50 PerfScore 0.12
G_M56429_IG04:
       add      rsp, 24
       pop      rbx
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       pop      rbp
       ret
						;; size=15 bbWeight=0.50 PerfScore 2.12
G_M56429_IG05:
       mov      r15, rbx
       cmp      r14, 8
       jb       G_M56429_IG23
       test     r14, r14
       jge      SHORT G_M56429_IG06
       mov      rdi, 0xD1FFAB1E      ; string handle
       mov      rsi, gword ptr [rdi]
       mov      rdi, rsi
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=40 bbWeight=0.50 PerfScore 4.12
G_M56429_IG06:
       vmovdqu  xmm0, xmmword ptr [r15]
       vmovupd  xmm1, xmmword ptr [reloc @RWD00]
       vmovapd  xmmword ptr [rbp-40H], xmm1
       vptest   xmm0, xmm1
       jne      G_M56429_IG20
       add      r14, r14
       cmp      r14, 32
       jb       G_M56429_IG12
       lea      rbx, [r15+16]
       and      rbx, -16
       mov      r12, rbx
       sub      r12, r15
       mov      r13, r12
       shr      r13, 63
       add      r13, r12
       sar      r13, 1
       test     r13, r13
       jle      SHORT G_M56429_IG07
       xor      edi, edi
       cmp      r13, 8
       setle    dil
       test     dil, dil
       jne      SHORT G_M56429_IG08
						;; size=89 bbWeight=0.50 PerfScore 10.12
G_M56429_IG07:
       mov      rdi, 0xD1FFAB1E      ; string handle
       mov      rdi, gword ptr [rdi]
       mov      rsi, 0xD1FFAB1E      ; string handle
       mov      rsi, gword ptr [rsi]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=32 bbWeight=0.50 PerfScore 3.75
G_M56429_IG08:
       cmp      r13, r14
       jbe      SHORT G_M56429_IG09
       mov      rdi, 0xD1FFAB1E      ; string handle
       mov      rdi, gword ptr [rdi]
       mov      rsi, 0xD1FFAB1E      ; string handle
       mov      rsi, gword ptr [rsi]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=37 bbWeight=0.50 PerfScore 4.38
G_M56429_IG09:
       sub      r14, r12
       cmp      r14, 32
       jb       G_M56429_IG18
       lea      rdi, [rbx+r14]
       sub      rdi, 32
       align    [0 bytes for IG10]
						;; size=21 bbWeight=0.50 PerfScore 1.12
G_M56429_IG10:
       vmovdqu  xmm0, xmmword ptr [rbx]
       vmovdqu  xmm2, xmmword ptr [rbx+16]
       vpor     xmm3, xmm0, xmm2
       vmovapd  xmm1, xmmword ptr [rbp-40H]
       vptest   xmm3, xmm1
       jne      G_M56429_IG19
       add      rbx, 32
       cmp      rbx, rdi
       jbe      G_M56429_IG17
						;; size=42 bbWeight=4    PerfScore 67.33
G_M56429_IG11:
       test     r14b, 16
       je       SHORT G_M56429_IG13
       vmovdqu  xmm0, xmmword ptr [rbx]
       vptest   xmm0, xmm1
       jne      G_M56429_IG20
						;; size=21 bbWeight=0.50 PerfScore 4.62
G_M56429_IG12:
       add      rbx, 16
       vmovapd  xmm1, xmmword ptr [rbp-40H]
						;; size=9 bbWeight=0.50 PerfScore 1.62
G_M56429_IG13:
       movzx    rdi, r14b
       test     dil, 15
       je       SHORT G_M56429_IG14
       mov      rdi, r14
       and      rdi, 15
       add      rdi, rbx
       mov      rbx, rdi
       sub      rbx, 16
       vmovdqu  xmm0, xmmword ptr [rbx]
       vptest   xmm0, xmm1
       jne      SHORT G_M56429_IG20
       add      rbx, 16
						;; size=42 bbWeight=0.50 PerfScore 5.50
G_M56429_IG14:
       sub      rbx, r15
       test     bl, 1
       je       SHORT G_M56429_IG15
       mov      rdi, 0xD1FFAB1E      ; string handle
       mov      rdi, gword ptr [rdi]
       mov      rsi, 0xD1FFAB1E      ; string handle
       mov      rsi, gword ptr [rsi]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=40 bbWeight=0.50 PerfScore 4.50
G_M56429_IG15:
       mov      rax, rbx
       shr      rax, 1
						;; size=6 bbWeight=0.50 PerfScore 0.38
G_M56429_IG16:
       add      rsp, 24
       pop      rbx
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       pop      rbp
       ret
						;; size=15 bbWeight=0.50 PerfScore 2.12
G_M56429_IG17:
       jmp      G_M56429_IG10
						;; size=5 bbWeight=2    PerfScore 4.00
G_M56429_IG18:
       vmovapd  xmm1, xmmword ptr [rbp-40H]
       jmp      G_M56429_IG11
						;; size=10 bbWeight=0.25 PerfScore 1.25
G_M56429_IG19:
       vptest   xmm0, xmm1
       jne      SHORT G_M56429_IG20
       add      rbx, 16
       vmovaps  xmm0, xmm2
						;; size=15 bbWeight=0.50 PerfScore 2.25
G_M56429_IG20:
       vpaddusw xmm0, xmm0, xmmword ptr [reloc @RWD16]
       vpmovmskb r14d, xmm0
       and      r14d, 0xAAAA
       jne      SHORT G_M56429_IG21
       mov      rdi, 0xD1FFAB1E      ; string handle
       mov      rdi, gword ptr [rdi]
       mov      rsi, 0xD1FFAB1E      ; string handle
       mov      rsi, gword ptr [rsi]
       call     [System.Diagnostics.Debug:Fail(System.String,System.String)]
						;; size=53 bbWeight=0.50 PerfScore 6.38
G_M56429_IG21:
       xor      edi, edi
       tzcnt    edi, r14d
       lea      rbx, [rbx+rdi-1]
       jmp      G_M56429_IG14
						;; size=17 bbWeight=0.50 PerfScore 2.62
G_M56429_IG22:
       call     [System.Text.ASCIIUtility:FirstCharInUInt32IsAscii(int):bool]
       test     eax, eax
       je       G_M56429_IG14
       add      rbx, 2
       jmp      G_M56429_IG14
						;; size=23 bbWeight=0.50 PerfScore 3.25
G_M56429_IG23:
       test     r14b, 4
       je       SHORT G_M56429_IG26
       mov      rdi, qword ptr [r15]
       mov      rax, 0xD1FFAB1E
       and      rdi, rax
       je       SHORT G_M56429_IG25
						;; size=24 bbWeight=0.50 PerfScore 2.38
G_M56429_IG24:
       tzcnt    rdi, rdi
       sar      edi, 3
       movsxd   rbx, edi
       and      rbx, -2
       add      rbx, r15
       jmp      G_M56429_IG14
						;; size=23 bbWeight=0.50 PerfScore 2.63
G_M56429_IG25:
       lea      rbx, [r15+8]
						;; size=4 bbWeight=0.50 PerfScore 0.25
G_M56429_IG26:
       test     r14b, 2
       je       SHORT G_M56429_IG27
       mov      edi, dword ptr [rbx]
       test     edi, 0xD1FFAB1E
       jne      SHORT G_M56429_IG22
       add      rbx, 4
						;; size=20 bbWeight=0.50 PerfScore 2.38
G_M56429_IG27:
       test     r14b, 1
       je       G_M56429_IG14
       cmp      word  ptr [rbx], 127
       ja       G_M56429_IG14
       add      rbx, 2
       jmp      G_M56429_IG14
						;; size=29 bbWeight=0.50 PerfScore 3.75
RWD00  	dq	FF80FF80FF80FF80h, FF80FF80FF80FF80h
RWD16  	dq	7F807F807F807F80h, 7F807F807F807F80h


; Total bytes of code 667, prolog size 28, PerfScore 220.46, instruction count 168, allocated bytes for code 680 (MethodHash=9f6a2392) for method System.Text.ASCIIUtility:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
; ============================================================
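For inputs shorter than 8 chars, IG23/IG24 test four chars with a single 64-bit load. The mask constant appears only as the diffable-disasm placeholder `0xD1FFAB1E`; the sketch below assumes it is 0xFF80 in each 16-bit lane, consistent with the vector mask at RWD00. `tzcnt` finds the lowest offending bit, `sar edi, 3` converts it to a byte index, and `and rbx, -2` rounds down to the first byte of that char. This C emulation assumes a little-endian host (true for x64 and typical AArch64 targets), and its name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the 64-bit constant behind the 0xD1FFAB1E placeholder is the
 * per-char mask 0xFF80 in each of the four 16-bit lanes. */
#define NON_ASCII_MASK_U64 0xFF80FF80FF80FF80ull

/* Hypothetical scalar emulation of IG23/IG24 (little-endian host assumed).
 * Returns the byte offset of the first non-ASCII char among p[0..3],
 * or 8 if all four are ASCII (IG25 then advances the pointer by 8 bytes). */
static unsigned first_non_ascii_in_4_chars(const uint16_t *p)
{
    uint64_t bits;
    memcpy(&bits, p, sizeof bits);        /* mov rdi, qword ptr [r15] */
    bits &= NON_ASCII_MASK_U64;           /* and rdi, rax */
    if (bits == 0)
        return 8;                         /* je IG25: lea rbx, [r15+8] */
    unsigned tz = 0;                      /* tzcnt rdi, rdi */
    while (!((bits >> tz) & 1u))
        tz++;
    unsigned byte_index = tz >> 3;        /* sar edi, 3: bit index -> byte index */
    return byte_index & ~1u;              /* and rbx, -2: round down to the char's first byte */
}
```

The rounding step matters when the offending bit lies in a char's high byte: for a first char of 0x0100 the lowest masked bit is bit 8, i.e. byte 1, which `& ~1` maps back to byte 0.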

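When the intrinsic is executed as a regular method (here, a copy in System.Text.Tests.AsciiUtilityTests), the mask is folded into `vptest` as a memory operand inside the loop: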

vmovdqu  xmm0, xmmword ptr [rbx]
vmovdqu  xmm2, xmmword ptr [rbx+16]
vpor     xmm3, xmm0, xmm2
vptest   xmm3, xmmword ptr [reloc @RWD00]
jne ...
x64 with intrinsic executed as a regular method
; Assembly listing for method System.Text.Tests.AsciiUtilityTests:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rbp based frame
; fully interruptible
; No PGO data
; 2 inlinees with PGO data; 9 single block inlinees; 5 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] ( 34, 35.50)    long  ->  rbx
;  V01 arg1         [V01,T01] ( 17, 10   )    long  ->  rsi
;* V02 loc0         [V02,T13] (  0,  0   )     int  ->  zero-ref    single-def
;* V03 loc1         [V03,T14] (  0,  0   )     int  ->  zero-ref    single-def
;  V04 loc2         [V04,T16] ( 11, 12.50)  simd16  ->  mm0
;  V05 loc3         [V05,T17] (  3,  8.50)  simd16  ->  mm2
;  V06 loc4         [V06,T05] (  4,  2   )     int  ->  rsi
;  V07 loc5         [V07,T04] (  8,  4   )    long  ->  r14         single-def
;* V08 loc6         [V08    ] (  0,  0   )    long  ->  zero-ref
;  V09 loc7         [V09,T07] (  3,  1.50)     int  ->  rdi
;  V10 loc8         [V10,T03] (  2,  4.50)    long  ->  rax         single-def
;* V11 loc9         [V11    ] (  0,  0   )  simd16  ->  zero-ref
;* V12 loc10        [V12    ] (  0,  0   )  simd16  ->  zero-ref
;* V13 loc11        [V13    ] (  0,  0   )    long  ->  zero-ref
;  V14 loc12        [V14,T06] (  4,  2   )    long  ->  rdi
;* V15 loc13        [V15    ] (  0,  0   )     int  ->  zero-ref
;# V16 OutArgs      [V16    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;* V17 tmp1         [V17,T09] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V18 tmp2         [V18    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V19 tmp3         [V19    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V20 tmp4         [V20,T02] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V21 tmp5         [V21,T19] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;  V22 tmp6         [V22,T15] (  2, 16   )  simd16  ->  mm3         "Inlining Arg"
;* V23 tmp7         [V23,T10] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V24 tmp8         [V24    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V25 tmp9         [V25    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V26 tmp10        [V26,T11] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V27 tmp11        [V27    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V28 tmp12        [V28    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V29 tmp13        [V29,T12] (  0,  0   )    bool  ->  zero-ref    "Inline return value spill temp"
;* V30 tmp14        [V30,T20] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
;* V31 tmp15        [V31    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg"
;* V32 tmp16        [V32    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
;* V33 tmp17        [V33    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
;* V34 tmp18        [V34    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V35 tmp19        [V35    ] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
;  V36 cse0         [V36,T08] (  3,  1.50)    long  ->  rdi         "CSE - moderate"
;  V37 cse1         [V37,T18] (  4,  2   )  simd16  ->  mm1         "CSE - aggressive"
;
; Lcl frame size = 0

G_M21315_IG01:
       push     rbp
       push     r14
       push     rbx
       vzeroupper
       lea      rbp, [rsp+10H]
       mov      rbx, rdi
						;; size=15 bbWeight=1    PerfScore 4.75
G_M21315_IG02:
       test     rsi, rsi
       jne      SHORT G_M21315_IG05
						;; size=5 bbWeight=1    PerfScore 1.25
G_M21315_IG03:
       xor      eax, eax
						;; size=2 bbWeight=0.50 PerfScore 0.12
G_M21315_IG04:
       pop      rbx
       pop      r14
       pop      rbp
       ret
						;; size=5 bbWeight=0.50 PerfScore 1.25
G_M21315_IG05:
       mov      r14, rbx
       cmp      rsi, 8
       jb       G_M21315_IG15
       vmovdqu  xmm0, xmmword ptr [r14]
       vmovupd  xmm1, xmmword ptr [reloc @RWD00]
       vptest   xmm0, xmm1
       jne      G_M21315_IG13
       add      rsi, rsi
       cmp      rsi, 32
       jb       SHORT G_M21315_IG08
       lea      rbx, [r14+16]
       and      rbx, -16
       mov      rax, rbx
       sub      rax, r14
       sub      rsi, rax
       cmp      rsi, 32
       jb       SHORT G_M21315_IG07
       lea      rax, [rbx+rsi]
       sub      rax, 32
       align    [0 bytes for IG06]
						;; size=77 bbWeight=0.50 PerfScore 8.75
G_M21315_IG06:
       vmovdqu  xmm0, xmmword ptr [rbx]
       vmovdqu  xmm2, xmmword ptr [rbx+16]
       vpor     xmm3, xmm0, xmm2
       vptest   xmm3, xmmword ptr [reloc @RWD00]
       jne      SHORT G_M21315_IG12
       add      rbx, 32
       cmp      rbx, rax
       jbe      SHORT G_M21315_IG06
						;; size=33 bbWeight=4    PerfScore 63.33
G_M21315_IG07:
       test     sil, 16
       je       SHORT G_M21315_IG09
       vmovdqu  xmm0, xmmword ptr [rbx]
       vptest   xmm0, xmm1
       jne      SHORT G_M21315_IG13
						;; size=17 bbWeight=0.50 PerfScore 4.62
G_M21315_IG08:
       add      rbx, 16
						;; size=4 bbWeight=0.50 PerfScore 0.12
G_M21315_IG09:
       movzx    rax, sil
       test     al, 15
       je       SHORT G_M21315_IG10
       mov      rax, rsi
       and      rax, 15
       add      rax, rbx
       mov      rbx, rax
       sub      rbx, 16
       vmovdqu  xmm0, xmmword ptr [rbx]
       vptest   xmm0, xmm1
       jne      SHORT G_M21315_IG13
       add      rbx, 16
						;; size=40 bbWeight=0.50 PerfScore 5.50
G_M21315_IG10:
       mov      rax, rbx
       sub      rax, r14
       shr      rax, 1
						;; size=9 bbWeight=0.50 PerfScore 0.50
G_M21315_IG11:
       pop      rbx
       pop      r14
       pop      rbp
       ret
						;; size=5 bbWeight=0.50 PerfScore 1.25
G_M21315_IG12:
       vptest   xmm0, xmmword ptr [reloc @RWD00]
       jne      SHORT G_M21315_IG13
       add      rbx, 16
       vmovaps  xmm0, xmm2
						;; size=19 bbWeight=0.50 PerfScore 3.25
G_M21315_IG13:
       vpaddusw xmm0, xmm0, xmmword ptr [reloc @RWD16]
       vpmovmskb esi, xmm0
       and      esi, 0xAAAA
       xor      edi, edi
       tzcnt    edi, esi
       lea      rbx, [rbx+rdi-1]
       jmp      SHORT G_M21315_IG10
						;; size=31 bbWeight=0.50 PerfScore 4.75
G_M21315_IG14:
       call     [System.Text.Tests.AsciiUtilityTests:FirstCharInUInt32IsAscii(int):bool]
       test     eax, eax
       je       SHORT G_M21315_IG10
       add      rbx, 2
       jmp      SHORT G_M21315_IG10
						;; size=16 bbWeight=0.50 PerfScore 3.25
G_M21315_IG15:
       test     sil, 4
       je       SHORT G_M21315_IG18
       mov      rdi, qword ptr [r14]
       mov      rax, 0xD1FFAB1E
       and      rdi, rax
       je       SHORT G_M21315_IG17
						;; size=24 bbWeight=0.50 PerfScore 2.38
G_M21315_IG16:
       xor      esi, esi
       tzcnt    rsi, rdi
       mov      edi, esi
       sar      edi, 3
       movsxd   rbx, edi
       and      rbx, -2
       add      rbx, r14
       jmp      SHORT G_M21315_IG10
						;; size=24 bbWeight=0.50 PerfScore 2.88
G_M21315_IG17:
       lea      rbx, [r14+8]
						;; size=4 bbWeight=0.50 PerfScore 0.25
G_M21315_IG18:
       test     sil, 2
       je       SHORT G_M21315_IG19
       mov      edi, dword ptr [rbx]
       test     edi, 0xD1FFAB1E
       jne      SHORT G_M21315_IG14
       add      rbx, 4
						;; size=20 bbWeight=0.50 PerfScore 2.38
G_M21315_IG19:
       test     sil, 1
       je       G_M21315_IG10
       cmp      word  ptr [rbx], 127
       ja       G_M21315_IG10
       add      rbx, 2
       jmp      G_M21315_IG10
						;; size=29 bbWeight=0.50 PerfScore 3.75
RWD00  	dq	FF80FF80FF80FF80h, FF80FF80FF80FF80h
RWD16  	dq	7F807F807F807F80h, 7F807F807F807F80h


; Total bytes of code 379, prolog size 15, PerfScore 153.43, instruction count 109, allocated bytes for code 391 (MethodHash=6043acbc) for method System.Text.Tests.AsciiUtilityTests:GetIndexOfFirstNonAsciiChar_Intrinsifed(long,long):long
; ============================================================

The previous results were from a Gold-6152 machine. On a Gold-5120 machine, there is no observable performance difference:

|   Method |        Job |                                                                                     Toolchain | size | encName |      Mean |    Error |   StdDev |    Median |       Min |       Max | Ratio | MannWhitney(2%) | RatioSD |  Gen 0 |  Gen 1 | Allocated | Alloc Ratio |
|--------- |----------- |-----------------------------------------------------------------------------------------------|----- |-------- |----------:|---------:|---------:|----------:|----------:|----------:|------:|---------------- |--------:|-------:|-------:|----------:|------------:|
| GetBytes | Job-USYTJH |    /base_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   ascii |  26.29 ns | 0.186 ns | 0.155 ns |  26.28 ns |  26.01 ns |  26.55 ns |  1.00 |            Base |    0.00 | 0.0040 |      - |      40 B |        1.00 |
| GetBytes | Job-HOIJPE | /runtime_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |   16 |   ascii |  25.87 ns | 0.285 ns | 0.238 ns |  25.84 ns |  25.56 ns |  26.36 ns |  0.98 |            Same |    0.01 | 0.0039 |      - |      40 B |        1.00 |
|          |            |                                                                                               |      |         |           |          |          |           |           |           |       |                 |         |        |        |           |             |
| GetBytes | Job-USYTJH |    /base_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   ascii |  98.00 ns | 0.845 ns | 0.790 ns |  98.00 ns |  96.80 ns |  99.28 ns |  1.00 |            Base |    0.00 | 0.0532 |      - |     536 B |        1.00 |
| GetBytes | Job-HOIJPE | /runtime_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun |  512 |   ascii |  98.39 ns | 0.978 ns | 0.915 ns |  98.29 ns |  96.92 ns | 100.15 ns |  1.00 |            Same |    0.01 | 0.0532 |      - |     536 B |        1.00 |
|          |            |                                                                                               |      |         |           |          |          |           |           |           |       |                 |         |        |        |           |             |
| GetBytes | Job-USYTJH |    /base_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   ascii | 174.19 ns | 1.710 ns | 1.600 ns | 173.95 ns | 172.18 ns | 177.00 ns |  1.00 |            Base |    0.00 | 0.1039 |      - |    1048 B |        1.00 |
| GetBytes | Job-HOIJPE | /runtime_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 1024 |   ascii | 174.88 ns | 2.475 ns | 2.194 ns | 174.16 ns | 173.08 ns | 181.16 ns |  1.00 |            Same |    0.02 | 0.1039 |      - |    1048 B |        1.00 |
|          |            |                                                                                               |      |         |           |          |          |           |           |           |       |                 |         |        |        |           |             |
| GetBytes | Job-USYTJH |    /base_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   ascii | 331.69 ns | 3.808 ns | 3.562 ns | 330.93 ns | 326.70 ns | 339.67 ns |  1.00 |            Base |    0.00 | 0.2051 | 0.0013 |    2072 B |        1.00 |
| GetBytes | Job-HOIJPE | /runtime_src/bin/testhost/net7.0-Linux-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun | 2048 |   ascii | 336.35 ns | 2.095 ns | 1.858 ns | 336.07 ns | 333.80 ns | 340.32 ns |  1.01 |            Same |    0.01 | 0.2048 | 0.0013 |    2072 B |        1.00 |
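As a rough cross-check of the Ratio column (not a replacement for BenchmarkDotNet's Mann-Whitney test), the ratio of the two means stays within 2% of 1.0 for every size, consistent with "no observable performance difference". The means are copied from the table above; the helper names are illustrative.

```c
#include <stddef.h>

/* Mean times (ns) from the Gold-5120 table, for size = 16, 512, 1024, 2048. */
static const double base_mean_ns[]    = {26.29, 98.00, 174.19, 331.69};
static const double patched_mean_ns[] = {25.87, 98.39, 174.88, 336.35};

/* Ratio of means for row i (patched / base); this approximately
   reproduces the table's Ratio column once rounded to two decimals. */
static double ratio_of_means(size_t i)
{
    return patched_mean_ns[i] / base_mean_ns[i];
}

/* Nonzero if the two means differ by less than `pct` percent. */
static int within_percent(size_t i, double pct)
{
    double r = ratio_of_means(i);
    double dev = r > 1.0 ? r - 1.0 : 1.0 - r;
    return dev * 100.0 < pct;
}
```

For example, row 0 gives 25.87 / 26.29 ≈ 0.984 and row 3 gives 336.35 / 331.69 ≈ 1.014, matching the 0.98 and 1.01 shown.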

Member

@kunalspathak kunalspathak left a comment

LGTM. I will let @tannergooding review and merge this PR.

@tannergooding tannergooding merged commit 85d638b into dotnet:main Jul 7, 2022
@EgorBo
Member

EgorBo commented Jul 14, 2022

@SwapnilGaikwad SwapnilGaikwad deleted the github-GetIndexOfFirstNonAsciiChar-intrinsic branch July 15, 2022 08:46
@ghost ghost locked as resolved and limited conversation to collaborators Aug 19, 2022
Labels
area-System.Text.Encoding community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize System.Text.ASCIIUtility for arm64 using cross-platform intrinsics
4 participants