Porting additional SIMD Intrinsics to use SimdAsHWIntrinsic #37882

tannergooding · 2020-06-15T03:55:36Z

This ports additional intrinsics such as SIMDIntrinsicInit, SIMDIntrinsicGetOne, and SIMDIntrinsicDot

tannergooding · 2020-06-19T22:10:56Z

CC. @CarolEidt, @echesakovMSFT

tannergooding · 2020-06-19T22:11:21Z

Benchmarks x64

Total bytes of diff: -146 (-0.029% of base)
    diff is an improvement.

Top file improvements (bytes):
         -63 : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm (-0.214% of base)
         -51 : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm (-0.210% of base)
         -32 : diff\BenchmarksGame\mandelbrot\mandelbrot-7\mandelbrot-7.dasm (-1.812% of base)

3 total files with Code Size differences (3 improved, 0 regressed), 79 unchanged.

Top method improvements (bytes):
         -32 (-5.024% of base) : diff\BenchmarksGame\mandelbrot\mandelbrot-7\mandelbrot-7.dasm - MandelBrot_7:DoBench(int,int):ref
         -12 (-1.469% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Camera:Create(Vector,Vector):Camera
         -12 (-3.183% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Sphere:Intersect(Ray):ISect:this
          -9 (-0.532% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedNoADT>b__1(int):this (3 methods)
          -7 (-0.657% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
          -7 (-0.657% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
          -7 (-1.220% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass6_0:<RenderMultiThreadedWithADT>b__1(int):this
          -6 (-0.560% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
          -6 (-0.546% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
          -6 (-1.056% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedNoADT>b__1(int):this
          -6 (-1.047% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedWithADT>b__1(int):this
          -4 (-1.408% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - RayTracer:GetPoint(double,double,Camera):Vector:this
          -4 (-2.899% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Sphere:Normal(Vector):Vector:this
          -4 (-9.524% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Dot(Vector,Vector):float
          -4 (-10.000% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Mag(Vector):float
          -4 (-3.509% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Norm(Vector):Vector
          -4 (-7.018% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Equals(Vector,Vector):bool
          -3 (-0.275% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - RayTracer:GetNaturalColor(SceneObject,Vector,Vector,Vector,Scene):Color:this
          -2 (-0.176% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedWithADT>b__1(int):this (2 methods)
          -1 (-2.564% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:.cctor()

Top method improvements (percentages):
          -4 (-10.000% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Mag(Vector):float
          -4 (-9.524% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Dot(Vector,Vector):float
          -4 (-7.018% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Equals(Vector,Vector):bool
         -32 (-5.024% of base) : diff\BenchmarksGame\mandelbrot\mandelbrot-7\mandelbrot-7.dasm - MandelBrot_7:DoBench(int,int):ref
          -4 (-3.509% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Vector:Norm(Vector):Vector
         -12 (-3.183% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Sphere:Intersect(Ray):ISect:this
          -4 (-2.899% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Sphere:Normal(Vector):Vector:this
          -1 (-2.564% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:.cctor()
          -1 (-2.564% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleStrictRenderer:.cctor()
          -1 (-2.564% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatStrictRenderer:.cctor()
         -12 (-1.469% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - Camera:Create(Vector,Vector):Camera
          -4 (-1.408% of base) : diff\SIMD\RayTracer\RayTracer\RayTracer.dasm - RayTracer:GetPoint(double,double,Camera):Vector:this
          -7 (-1.220% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass6_0:<RenderMultiThreadedWithADT>b__1(int):this
          -6 (-1.056% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedNoADT>b__1(int):this
          -6 (-1.047% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedWithADT>b__1(int):this
          -7 (-0.657% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
          -7 (-0.657% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorDoubleRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
          -6 (-0.560% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
          -6 (-0.546% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - VectorFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
          -9 (-0.532% of base) : diff\SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedNoADT>b__1(int):this (3 methods)

Frameworks x64

Found 266 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -616 (-0.001% of base)
    diff is an improvement.

Top file improvements (bytes):
        -547 : diff\System.Private.CoreLib.dasm (-0.012% of base)
         -63 : diff\System.Collections.dasm (-0.014% of base)
          -6 : diff\System.Text.Json.dasm (-0.001% of base)

3 total files with Code Size differences (3 improved, 0 regressed), 260 unchanged.

Top method regressions (bytes):
          43 (860.000% of base) : diff\System.Private.CoreLib.dasm - Vector:Dot(Vector`1,Vector`1):short
          29 (580.000% of base) : diff\System.Private.CoreLib.dasm - Vector:Dot(Vector`1,Vector`1):Vector`1
           2 (0.699% of base) : diff\System.Private.CoreLib.dasm - ASCIIUtility:GetIndexOfFirstNonAsciiChar_Default(long,long):long
           2 (0.697% of base) : diff\System.Private.CoreLib.dasm - Latin1Utility:GetIndexOfFirstNonLatin1Char_Default(long,long):long

Top method improvements (bytes):
        -106 (-23.297% of base) : diff\System.Private.CoreLib.dasm - Vector:Multiply(Vector`1,Vector`1):Vector`1 (8 methods)
         -89 (-9.966% of base) : diff\System.Private.CoreLib.dasm - Vector`1:.cctor() (6 methods)
         -52 (-5.640% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_Multiply(Vector`1,Vector`1):Vector`1 (8 methods)
         -47 (-24.227% of base) : diff\System.Collections.dasm - BitArray:.cctor()
         -24 (-2.067% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4
         -24 (-1.489% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:Decompose(Matrix4x4,byref,byref,byref):bool
         -20 (-18.182% of base) : diff\System.Private.CoreLib.dasm - Vector128`1:get_AllBitsSet():Vector128`1 (6 methods)
         -20 (-16.000% of base) : diff\System.Private.CoreLib.dasm - Vector256`1:get_AllBitsSet():Vector256`1 (6 methods)
         -12 (-0.787% of base) : diff\System.Collections.dasm - BitArray:CopyTo(Array,int):this
          -8 (-1.292% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateBillboard(Vector3,Vector3,Vector3,Vector3):Matrix4x4
          -8 (-1.235% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateLookAt(Vector3,Vector3,Vector3):Matrix4x4
          -8 (-1.569% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateWorld(Vector3,Vector3,Vector3):Matrix4x4
          -8 (-2.817% of base) : diff\System.Private.CoreLib.dasm - Plane:CreateFromVertices(Vector3,Vector3,Vector3):Plane
          -8 (-30.769% of base) : diff\System.Private.CoreLib.dasm - Vector2:Length():float:this
          -8 (-36.364% of base) : diff\System.Private.CoreLib.dasm - Vector2:LengthSquared():float:this
          -8 (-22.222% of base) : diff\System.Private.CoreLib.dasm - Vector2:Distance(Vector2,Vector2):float
          -8 (-25.000% of base) : diff\System.Private.CoreLib.dasm - Vector2:DistanceSquared(Vector2,Vector2):float
          -8 (-14.815% of base) : diff\System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
          -8 (-16.000% of base) : diff\System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float
          -8 (-30.769% of base) : diff\System.Private.CoreLib.dasm - Vector4:Length():float:this

Top method regressions (percentages):
          43 (860.000% of base) : diff\System.Private.CoreLib.dasm - Vector:Dot(Vector`1,Vector`1):short
          29 (580.000% of base) : diff\System.Private.CoreLib.dasm - Vector:Dot(Vector`1,Vector`1):Vector`1
           2 (0.699% of base) : diff\System.Private.CoreLib.dasm - ASCIIUtility:GetIndexOfFirstNonAsciiChar_Default(long,long):long
           2 (0.697% of base) : diff\System.Private.CoreLib.dasm - Latin1Utility:GetIndexOfFirstNonLatin1Char_Default(long,long):long

Top method improvements (percentages):
          -8 (-36.364% of base) : diff\System.Private.CoreLib.dasm - Vector2:LengthSquared():float:this
          -8 (-36.364% of base) : diff\System.Private.CoreLib.dasm - Vector4:LengthSquared():float:this
          -8 (-30.769% of base) : diff\System.Private.CoreLib.dasm - Vector2:Length():float:this
          -8 (-30.769% of base) : diff\System.Private.CoreLib.dasm - Vector4:Length():float:this
          -8 (-30.769% of base) : diff\System.Private.CoreLib.dasm - Vector4:DistanceSquared(Vector4,Vector4):float
          -8 (-26.667% of base) : diff\System.Private.CoreLib.dasm - Vector4:Distance(Vector4,Vector4):float
          -8 (-25.000% of base) : diff\System.Private.CoreLib.dasm - Vector2:DistanceSquared(Vector2,Vector2):float
         -47 (-24.227% of base) : diff\System.Collections.dasm - BitArray:.cctor()
        -106 (-23.297% of base) : diff\System.Private.CoreLib.dasm - Vector:Multiply(Vector`1,Vector`1):Vector`1 (8 methods)
          -8 (-22.222% of base) : diff\System.Private.CoreLib.dasm - Vector2:Distance(Vector2,Vector2):float
         -20 (-18.182% of base) : diff\System.Private.CoreLib.dasm - Vector128`1:get_AllBitsSet():Vector128`1 (6 methods)
          -8 (-16.000% of base) : diff\System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float
         -20 (-16.000% of base) : diff\System.Private.CoreLib.dasm - Vector256`1:get_AllBitsSet():Vector256`1 (6 methods)
          -8 (-14.815% of base) : diff\System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
          -4 (-14.286% of base) : diff\System.Private.CoreLib.dasm - Vector3:LengthSquared():float:this
          -4 (-12.500% of base) : diff\System.Private.CoreLib.dasm - Vector3:Length():float:this
          -4 (-12.121% of base) : diff\System.Private.CoreLib.dasm - Vector:Dot(Vector`1,Vector`1):double
          -4 (-10.526% of base) : diff\System.Private.CoreLib.dasm - Vector2:Equals(Vector2):bool:this
          -4 (-10.526% of base) : diff\System.Private.CoreLib.dasm - Vector4:Normalize(Vector4):Vector4
          -4 (-10.256% of base) : diff\System.Private.CoreLib.dasm - Vector2:op_Inequality(Vector2,Vector2):bool

tannergooding · 2020-06-19T22:11:46Z

Working on getting perf numbers but this should resolve the following perf regression #37425

tannergooding · 2020-06-19T22:15:21Z

Most of the diffs are essentially the following:

-       vmovaps  xmm1, xmm0
-       vdpps    xmm1, xmm0, 113
-       vmovaps  xmm0, xmm1
+       vdpps    xmm0, xmm0, xmm0, 113

There are also a few similar too:

        vmovmskps xrax, xmm0
-       mov      edx, 7
-       and      eax, edx
+       and      eax, 7

Likewise a few that go from:
```diff
-       mov      eax, 0xD1FFAB1E
-       vmovd    xmm0, eax
-       vpbroadcastd ymm0, ymm0
+       vmovupd  ymm0, ymmword ptr[reloc @RWD32]

A more extreme example:

       vxorps   xmm0, xmm0
       vmovdqu  xmmword ptr [rsp+48H], xmm0
       vmovdqu  xmmword ptr [rsp+58H], xmm0
       lea      rcx, bword ptr [rsp+48H]
       call     Vector`1:.ctor(Vector`1):this
       mov      rcx, rdi
       vmovdqu  xmm0, xmmword ptr [rsp+48H]
       vmovdqu  xmmword ptr [rsp+28H], xmm0
       vmovdqu  xmm0, xmmword ptr [rsp+58H]
       vmovdqu  xmmword ptr [rsp+38H], xmm0
       lea      rdx, bword ptr [rsp+28H]
       mov      r8, rsi
       call     Vector`1:op_Multiply(Vector`1,Vector`1):Vector`1
       mov      rax, rdi

To:

       vmovupd  ymm0, ymmword ptr[rdx]
       vbroadcastss ymm0, ymm0
       vmovupd  ymmword ptr[rsp+50H], ymm0
       mov      rcx, rsi
       vmovupd  ymm0, ymmword ptr[rsp+50H]
       vmovupd  ymmword ptr[rsp+20H], ymm0
       lea      rdx, bword ptr [rsp+20H]
       call     Vector`1:op_Multiply(Vector`1,Vector`1):Vector`1
       mov      rax, rsi

The Vector:Dot regressions are because ushort is recognized as an intrinsic on x86, where is wasn't previously.

tannergooding · 2020-06-19T23:04:19Z

src/coreclr/src/jit/simdintrinsiclist.h

-SIMD_INTRINSIC("get_One",                   false,       GetOne,                   "one",                    TYP_STRUCT,     0,      {TYP_VOID, TYP_UNDEF, TYP_UNDEF},      {TYP_INT, TYP_FLOAT, TYP_DOUBLE, TYP_LONG, TYP_USHORT, TYP_UBYTE, TYP_BYTE, TYP_SHORT, TYP_UINT, TYP_ULONG})
-SIMD_INTRINSIC("get_Zero",                  false,       GetZero,                  "zero",                   TYP_STRUCT,     0,      {TYP_VOID, TYP_UNDEF, TYP_UNDEF},      {TYP_INT, TYP_FLOAT, TYP_DOUBLE, TYP_LONG, TYP_USHORT, TYP_UBYTE, TYP_BYTE, TYP_SHORT, TYP_UINT, TYP_ULONG})
-SIMD_INTRINSIC("get_AllOnes",               false,       GetAllOnes,               "allOnes",                TYP_STRUCT,     0,      {TYP_VOID, TYP_UNDEF, TYP_UNDEF},      {TYP_INT, TYP_FLOAT, TYP_DOUBLE, TYP_LONG, TYP_USHORT, TYP_UBYTE, TYP_BYTE, TYP_SHORT, TYP_UINT, TYP_ULONG})
-
 // .ctor call or newobj - there are four forms.
 // This form takes the object plus a value of the base (element) type:
 SIMD_INTRINSIC(".ctor",                     true,        Init,                     "init",                   TYP_VOID,       2,      {TYP_BYREF, TYP_UNKNOWN, TYP_UNDEF},   {TYP_INT, TYP_FLOAT, TYP_DOUBLE, TYP_LONG, TYP_USHORT, TYP_UBYTE, TYP_BYTE, TYP_SHORT, TYP_UINT, TYP_ULONG})


Fully removing SIMDIntrinsicInit and removing gtGetSIMDZero requires a bit more work. I logged #37043 as the more general issue.

I think that, beyond this PR, any other improvements should likely hold off for .NET 6.

tannergooding · 2020-06-23T16:47:59Z

CC. @dotnet/jit-contrib

This should be ready for review.

CarolEidt

Mostly minor comment stuff.

CarolEidt · 2020-07-02T00:03:53Z

src/coreclr/src/jit/hwintrinsicxarch.cpp

+                }
+            }
+
+            if (isSupported && (intrinsic == NI_Vector256_ToScalar))


It seems like it would make more sense to just split out this case (as it was previously), even with a separate check for the long types.

Maybe the confusing part here is that the AVX check isn't checking for some instruction support, it's just a "we support Vector256<T>" check.

The instruction emitted is the same for 128 or 256-bit, it's just that we support Vector256<T> if AVX is supported, so the code really is identical (we even emit the register access as a 128-bit access)

My point is that (AFAICT) the previous calls to compExactlyDependsOn are unnecessary for the NI_Vector256_ToScalar case. So perhaps just checking for that first would make more sense.

That's generally true except for the SSE2_X64 case, as we won't have AVX_X64 until #38460 goes in (although you also can't disable SSE2.X64, you can only disable SSE2 itself).

If duplicated, the compExactlyDependsOn(SSE2) checks would become compExactlyDependsOn(AVX) and the compExactlyDependsOn(SSE2_X64) check would become compExactlyDependsOn(AVX) && compExactlyDependsOn(SSE2_X64). Once #38460 is merged, it could be simplified to just compExactlyDependsOn(AVX_X64)

So is it correct that only the SSE2_X64 check is non-redundant for the AVX case? It still seems confusing to me, and a lot of unnecessary checks for that case.

Right, only the SSE2_X64 case is non redundant. I just pushed a fix that breaks it apart and which should make that distinction clearer.

src/coreclr/src/jit/importer.cpp

src/coreclr/src/jit/lowerarmarch.cpp

src/coreclr/src/jit/lowerxarch.cpp

…get_AllBitsSet

…MDIntrinsicDiv

…AllOnes, and SIMDIntrinsicGetZero

…AsHWIntrinsic

…UBLE and fixing VectorT128_get_One

…tType

Co-authored-by: Carol Eidt <carol.eidt@microsoft.com>

CarolEidt

LGTM - thanks for the comments and code shufflings!

Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 15, 2020

tannergooding force-pushed the simd-as-hwintrinsic branch 3 times, most recently from 4044add to e98213d Compare June 17, 2020 19:16

tannergooding closed this Jun 17, 2020

tannergooding reopened this Jun 17, 2020

tannergooding force-pushed the simd-as-hwintrinsic branch from e98213d to defea13 Compare June 17, 2020 20:08

tannergooding closed this Jun 17, 2020

tannergooding reopened this Jun 17, 2020

tannergooding force-pushed the simd-as-hwintrinsic branch from defea13 to 4105ab7 Compare June 17, 2020 20:56

jaredpar mentioned this pull request Jun 18, 2020

OSX machines are de-provisioned during CI / PR runs leading to failures #34472

Closed

tannergooding force-pushed the simd-as-hwintrinsic branch from 8ac8ead to 63ec8f0 Compare June 18, 2020 19:51

tannergooding marked this pull request as ready for review June 19, 2020 22:08

tannergooding force-pushed the simd-as-hwintrinsic branch from bfb7472 to d080102 Compare June 19, 2020 22:10

tannergooding commented Jun 19, 2020

View reviewed changes

tannergooding mentioned this pull request Jun 23, 2020

Code generation of HW intrinsics / loading struct into vector register #31692

Open

jaredpar mentioned this pull request Jun 23, 2020

PayloadGroup0 is timing out #38284

Closed

BruceForstall requested review from CarolEidt and echesakov June 27, 2020 04:10

CarolEidt suggested changes Jul 2, 2020

View reviewed changes

tannergooding added 5 commits July 2, 2020 09:48

Moving SIMDIntrinsicInit to use SimdAsHWIntrinsic

542d9a8

Optimize the simple case of Create(0) and Create(-1) to get_Zero and …

4aa6464

…get_AllBitsSet

Adding a new gtNewSimdCreateBroadcastNode method

6f923ea

Updating SIMDIntrinsicGetOne to use SimdAsHWIntrinsic

38824c6

Fixing build errors

7c00186

tannergooding and others added 18 commits July 2, 2020 09:48

Ensure all forms of ToScalar are intrinsic on x86

2b1e9a7

Moving SIMDIntrinsicDot to use SimdAsHWIntrinsic

e5ec98a

Removing SIMDIntrinsicDot, SIMDIntrinsicAdd, SIMDIntrinsicMul, and SI…

d96174e

…MDIntrinsicDiv

Removing SIMDIntrinsicGetCount, SIMDIntrinsicGetOne, SIMDIntrinsicGet…

987b77e

…AllOnes, and SIMDIntrinsicGetZero

Default TYP_SIMD12 constants to be 16 bytes

7bc2c10

Get the simdType from the size for LowerHWIntrinsicDot

6897bc4

Applying formatting patch

ba35606

Use AddPairwise for floating-point dot product

636b31a

Applying formatting patch

92fc030

Correctly handle decomposed long constants on x86

305d00d

Check JitConfig.EnableHWIntrinsic when in impSIMDIntrinsic or impSimd…

786f61d

…AsHWIntrinsic

Ensure AdvSimd.Arm64.Multiply is used for TYP_DOUBLE

08460ef

Applying formatting patch

290bd56

Updating LowerHWIntrinsicCreate to handle TYP_SIMD8 retyped as TYP_DO…

73700d4

…UBLE and fixing VectorT128_get_One

Ensure the CreateBroadcast SimdAsHWIntrinsic nodes have the correct g…

95ce9c9

…tType

Applying formatting patch

3bdcdf9

Apply suggestions from code review

d3afbdf

Co-authored-by: Carol Eidt <carol.eidt@microsoft.com>

Adding some additional clarifying comments

3ac2b4a

tannergooding force-pushed the simd-as-hwintrinsic branch from 6db4ec3 to 3ac2b4a Compare July 2, 2020 20:00

tannergooding added 2 commits July 2, 2020 14:15

Breaking NI_Vector256_ToScalar importation logic into its own checks

5560f44

Applying formatting patch

a954af2

CarolEidt approved these changes Jul 3, 2020

View reviewed changes

tannergooding merged commit 9df0247 into dotnet:master Jul 3, 2020

saucecontrol mentioned this pull request Jul 20, 2020

AdvSimd support for System.Text.Unicode.Utf8Utility.TranscodeToUtf8 #39041

Merged

BruceForstall mentioned this pull request Jul 29, 2020

[Perf -50%] System.Numerics.Tests.Perf_Vector2.LengthSquaredBenchmark #37425

Closed

tannergooding mentioned this pull request Aug 10, 2020

COMPlus_EnableHWIntrinsic=0 no longer disables SSE+ #35605

Closed

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Porting additional SIMD Intrinsics to use SimdAsHWIntrinsic #37882

Porting additional SIMD Intrinsics to use SimdAsHWIntrinsic #37882

tannergooding commented Jun 15, 2020

tannergooding commented Jun 19, 2020

tannergooding commented Jun 19, 2020

tannergooding commented Jun 19, 2020 •

edited

Loading

tannergooding commented Jun 19, 2020 •

edited

Loading

tannergooding Jun 19, 2020

tannergooding Jun 19, 2020

tannergooding commented Jun 23, 2020

CarolEidt left a comment

CarolEidt Jul 2, 2020

tannergooding Jul 2, 2020

CarolEidt Jul 2, 2020

tannergooding Jul 2, 2020 •

edited

Loading

CarolEidt Jul 2, 2020

tannergooding Jul 2, 2020

CarolEidt left a comment

Porting additional SIMD Intrinsics to use SimdAsHWIntrinsic #37882

Porting additional SIMD Intrinsics to use SimdAsHWIntrinsic #37882

Conversation

tannergooding commented Jun 15, 2020

tannergooding commented Jun 19, 2020

tannergooding commented Jun 19, 2020

tannergooding commented Jun 19, 2020 • edited Loading

tannergooding commented Jun 19, 2020 • edited Loading

tannergooding Jun 19, 2020

Choose a reason for hiding this comment

tannergooding Jun 19, 2020

Choose a reason for hiding this comment

tannergooding commented Jun 23, 2020

CarolEidt left a comment

Choose a reason for hiding this comment

CarolEidt Jul 2, 2020

Choose a reason for hiding this comment

tannergooding Jul 2, 2020

Choose a reason for hiding this comment

CarolEidt Jul 2, 2020

Choose a reason for hiding this comment

tannergooding Jul 2, 2020 • edited Loading

Choose a reason for hiding this comment

CarolEidt Jul 2, 2020

Choose a reason for hiding this comment

tannergooding Jul 2, 2020

Choose a reason for hiding this comment

CarolEidt left a comment

Choose a reason for hiding this comment

tannergooding commented Jun 19, 2020 •

edited

Loading

tannergooding commented Jun 19, 2020 •

edited

Loading

tannergooding Jul 2, 2020 •

edited

Loading