Optimize Vector128/256.Equals via TestZ #55875

EgorBo · 2021-07-17T18:35:48Z

This PR optimizes Vector128/256.Equals with the following improvements:

- Substitutes Vector_.Zero during inlining
- Optimizes Xor(x, zero) to just x in morph
- Adds an Avx2/Sse41 TestZ path in Vector_.Equals
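The TestZ idea can be modeled in plain C++. This is a scalar sketch, not the CoreLib or JIT code: a 128- or 256-bit register is represented as an array of 64-bit words, `TestZ` mimics the ZF result of PTEST/VPTEST (what `Sse41.TestZ`/`Avx.TestZ` return, and what the C/C++ intrinsic `_mm_testz_si128` exposes), and `EqualsViaTestZ` is a hypothetical helper name. It relies on two vectors being bitwise equal exactly when their XOR is all zeros. That matches element-wise equality for integer vectors, but not for float/double (+0.0 vs -0.0, NaN).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a SIMD register as N 64-bit words
// (N = 2 stands in for Vector128, N = 4 for Vector256).
template <std::size_t N>
using Vec = std::array<std::uint64_t, N>;

// Models the ZF flag of PTEST/VPTEST: true iff (a AND b) has no bits set.
template <std::size_t N>
bool TestZ(const Vec<N>& a, const Vec<N>& b) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < N; ++i) {
        acc |= a[i] & b[i];
    }
    return acc == 0;
}

// Lane-wise XOR, standing in for Sse2.Xor / Avx2.Xor.
template <std::size_t N>
Vec<N> Xor(const Vec<N>& a, const Vec<N>& b) {
    Vec<N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        r[i] = a[i] ^ b[i];
    }
    return r;
}

// a and b are bitwise equal exactly when a ^ b is all zeros,
// and TestZ(x, x) checks that in a single test.
template <std::size_t N>
bool EqualsViaTestZ(const Vec<N>& a, const Vec<N>& b) {
    const Vec<N> x = Xor(a, b);
    return TestZ(x, x);
}
```

The same function covers both widths, since only the word count differs between the 128-bit and 256-bit cases.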

// Compare two vectors
bool Test128(Vector128<int> v1, Vector128<int> v2) => v1.Equals(v2);
bool Test256(Vector256<int> v1, Vector256<int> v2) => v1.Equals(v2);

// Compare against Zero:
bool Test128(Vector128<int> v1) => v1.Equals(Vector128<int>.Zero);
bool Test256(Vector256<int> v1) => v1.Equals(Vector256<int>.Zero);
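The comparison against Zero is where the other two changes help. Assuming Equals computes TestZ(Xor(v1, v2), Xor(v1, v2)) on the integer path, substituting Zero during inlining turns the XOR into Xor(v, zero), and folding that to v leaves just TestZ(v, v). The sketch below shows only that fold, on a hypothetical toy expression tree (not RyuJIT's GenTree API); the real morph.cpp change also checks that the nodes are not active CSE candidates, as the review snippet in this thread shows.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical toy IR for illustration only; this is not RyuJIT's GenTree.
enum class Kind { Var, ZeroVector, Xor };

struct Node {
    Kind kind;
    std::shared_ptr<Node> op1;
    std::shared_ptr<Node> op2;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr MakeVar() { return std::make_shared<Node>(Node{Kind::Var, nullptr, nullptr}); }
NodePtr MakeZero() { return std::make_shared<Node>(Node{Kind::ZeroVector, nullptr, nullptr}); }
NodePtr MakeXor(NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{Kind::Xor, std::move(a), std::move(b)});
}

// Folds Xor(x, Zero) and Xor(Zero, x) to x, since XOR with all-zero bits
// is the identity; any other node is returned unchanged.
NodePtr FoldXorWithZero(const NodePtr& n) {
    if (n->kind != Kind::Xor) {
        return n;
    }
    if (n->op2->kind == Kind::ZeroVector) {
        return n->op1;
    }
    if (n->op1->kind == Kind::ZeroVector) {
        return n->op2;
    }
    return n;
}
```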

Codegen diff: https://www.diffchecker.com/zJhfk9yq
According to benchmarks, this makes Equals 10-15% faster.

jit-diff is quite small:

Total bytes of base: 61261912
Total bytes of diff: 61261864
Total bytes of delta: -48 (-0.00% of base)
Total relative delta: -0.19
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
         -48 : System.Private.CoreLib.dasm (-0.00% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 272 unchanged.

Top method improvements (bytes):
         -16 (-10.74% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Vector128`1):bool:this (6 methods)
         -16 (-2.48% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Object):bool:this (6 methods)
          -8 (-5.06% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Vector256`1):bool:this (6 methods)
          -8 (-1.18% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Object):bool:this (6 methods)

Top method improvements (percentages):
         -16 (-10.74% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Vector128`1):bool:this (6 methods)
          -8 (-5.06% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Vector256`1):bool:this (6 methods)
         -16 (-2.48% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Object):bool:this (6 methods)
          -8 (-1.18% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Object):bool:this (6 methods)

SuperPMI:

asm.coreclr_tests.pmi.windows.x64.checked

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 4987
Total bytes of diff: 4976
Total bytes of delta: -11 (-0.22% of base)
Total relative delta: -0.01
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
         -11 : 251366.dasm (-0.76% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 4 unchanged.

Top method improvements (bytes):
         -11 (-0.76% of base) : 251366.dasm - GitHub_18144:Main(System.String[]):int

Top method improvements (percentages):
         -11 (-0.76% of base) : 251366.dasm - GitHub_18144:Main(System.String[]):int

1 total methods with Code Size differences (1 improved, 0 regressed), 4 unchanged.



asm.libraries_tests.pmi.windows.x64.checked

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 1558
Total bytes of diff: 1522
Total bytes of delta: -36 (-2.31% of base)
Total relative delta: -0.20
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
         -12 : 309432.dasm (-7.02% of base)
         -12 : 309515.dasm (-6.56% of base)
         -12 : 309609.dasm (-6.56% of base)

3 total files with Code Size differences (3 improved, 0 regressed), 1 unchanged.

Top method improvements (bytes):
         -12 (-7.02% of base) : 309432.dasm - System.Numerics.Tests.Vector2Tests:Vector2ZeroTest():this
         -12 (-6.56% of base) : 309515.dasm - System.Numerics.Tests.Vector3Tests:Vector3ZeroTest():this
         -12 (-6.56% of base) : 309609.dasm - System.Numerics.Tests.Vector4Tests:Vector4ZeroTest():this

Top method improvements (percentages):
         -12 (-7.02% of base) : 309432.dasm - System.Numerics.Tests.Vector2Tests:Vector2ZeroTest():this
         -12 (-6.56% of base) : 309515.dasm - System.Numerics.Tests.Vector3Tests:Vector3ZeroTest():this
         -12 (-6.56% of base) : 309609.dasm - System.Numerics.Tests.Vector4Tests:Vector4ZeroTest():this

MichalPetryka · 2021-07-17T22:51:11Z

Partially fixes #55343, since that issue also covers optimizing AllBitsSet comparisons to TestC.
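AllBitsSet comparisons map to PTEST's other flag. As a scalar sketch (`TestC` and `IsAllBitsSet` are hypothetical helpers, not .NET or JIT APIs), `TestC` models CF, which is set when (NOT a) AND b is all zeros; that is the semantics of `Sse41.TestC` and the C/C++ intrinsic `_mm_testc_si128`. With b = AllBitsSet, the condition reduces to "every bit of a is set", so no XOR is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a SIMD register as N 64-bit words.
template <std::size_t N>
using Vec = std::array<std::uint64_t, N>;

// Models the CF flag of PTEST: true iff (NOT a) AND b has no bits set,
// i.e. every bit set in b is also set in a.
template <std::size_t N>
bool TestC(const Vec<N>& a, const Vec<N>& b) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < N; ++i) {
        acc |= ~a[i] & b[i];
    }
    return acc == 0;
}

// v equals AllBitsSet exactly when TestC(v, AllBitsSet) holds.
template <std::size_t N>
bool IsAllBitsSet(const Vec<N>& v) {
    Vec<N> ones{};
    ones.fill(UINT64_MAX);
    return TestC(v, ones);
}
```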

EgorBo · 2021-07-23T16:59:01Z

@tannergooding @dotnet/jit-contrib does it look good?

EgorBo · 2021-09-07T20:50:58Z

@dotnet/jit-contrib PTAL, a small improvement for Vector128/256.Create

Failed CI job is unrelated (broken gcc build)

MichalPetryka · 2021-09-07T22:36:12Z

src/coreclr/jit/morph.cpp

+                    GenTree* op2 = hw->gtGetOp2();
+                    if (!gtIsActiveCSE_Candidate(tree))
+                    {
+                        if (op1->IsIntegralConstVector(0) && !gtIsActiveCSE_Candidate(op1))


I think that IsIntegralConstVector will only work for integer vectors, so all floating-point vectors will be rejected here.

Xor is not used in Equals for floats and doubles, so it's not a big deal.

But it currently handles this (Xor for floats) just fine:

static Vector128<float> Foo(Vector128<float> v) => Sse.Xor(v, Vector128.Create(0).AsSingle());

For a floating-point zero it might be tricky, since e.g. -0.0 can't be used in this optimization, so it's not worth the effort.
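The -0.0 problem is visible in a single lane. XOR with +0.0 (all bits clear) leaves a float unchanged, which is why folding the `Vector128.Create(0).AsSingle()` case is safe, but -0.0 has the sign bit set, so XOR with it flips the sign instead. A minimal sketch, assuming IEEE 754 binary32 floats; `XorFloat` is a hypothetical one-lane stand-in for `Sse.Xor`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Reinterpret a float's bits (memcpy avoids strict-aliasing UB).
std::uint32_t Bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float FromBits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical one-lane stand-in for Sse.Xor on floats.
float XorFloat(float a, float b) {
    return FromBits(Bits(a) ^ Bits(b));
}
```

Note that `-0.0f == 0.0f` still compares true, so a floating-point equality check cannot tell these two constants apart.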

Why would it be tricky? You can account for -0.0 by checking the bitwise (rather than floating-point) value is 0.

Speaking of 0.0 vs -0.0, it looks like there might be existing bugs in IsFPZero and IsSIMDZero since they don't take this into account.
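The bitwise check suggested here can be contrasted with a floating-point comparison. Because -0.0 == 0.0 is true under IEEE 754, a zero test written as a float compare accepts -0.0, which would make an Xor(x, zero) fold incorrect. The helpers below are hypothetical and only illustrate the difference; they are not the JIT's `IsFPZero`/`IsSIMDZero`, which the thread only suspects may have this issue.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; not the JIT's IsFPZero/IsSIMDZero.

// Floating-point test: also accepts -0.0f, since -0.0f == 0.0f compares true.
bool IsZeroByFloatCompare(const std::array<float, 4>& v) {
    for (float f : v) {
        if (f != 0.0f) {
            return false;
        }
    }
    return true;
}

// Bitwise test: accepts only all-bits-zero lanes, so -0.0f (sign bit set) is rejected.
bool IsZeroBitwise(const std::array<float, 4>& v) {
    for (float f : v) {
        std::uint32_t bits;
        static_assert(sizeof bits == sizeof f, "expects 32-bit float");
        std::memcpy(&bits, &f, sizeof bits);
        if (bits != 0) {
            return false;
        }
    }
    return true;
}
```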

BruceForstall

LGTM. But would like @tannergooding to also approve.

EgorBo · 2021-09-23T10:32:13Z

@tannergooding PTAL

tannergooding · 2021-09-23T15:26:18Z

src/coreclr/jit/importer.cpp

+                }
+#endif
+
+                // TODO: Enable substitution for CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE (typeof(T))


Do we have an existing issue for this TODO?

That's why I want to merge it: to start experimenting with it 🙂 The issue is #40381.

Thanks. I just wanted to make sure we had some GitHub issue corresponding to the TODO, so it's not just a comment we'll forget about.

tannergooding

Changes LGTM. Would be good to ensure floating-point is equally well handled.

EgorBo added 3 commits July 17, 2021 20:43

Optimize VectorX.Create via TestZ

4558287

Clean up

c129424

Clean up

16cc234

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 17, 2021

Add missing break

b846016

EgorBo mentioned this pull request Jul 18, 2021

Optimize Vector128<T> equality against Zero/AllBitsSet #55343

Closed

SingleAccretion reviewed Jul 18, 2021

src/coreclr/jit/morph.cpp (outdated)

tannergooding reviewed Jul 19, 2021

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128_1.cs (outdated)

tannergooding reviewed Jul 19, 2021

src/coreclr/jit/morph.cpp (outdated)

EgorBo added 2 commits July 20, 2021 15:11

Merge branch 'main' of https://github.com/dotnet/runtime into opt-vec…

65fbda2

…tor-create

Address feedback

cd32850

karelz mentioned this pull request Jul 28, 2021

MsQuic tests hang / long running #56487

Closed

tannergooding reviewed Aug 2, 2021

src/coreclr/jit/importer.cpp (outdated)

tannergooding reviewed Aug 2, 2021

src/coreclr/jit/morph.cpp

EgorBo and others added 4 commits September 7, 2021 11:24

Merge branch 'main' of https://github.com/dotnet/runtime into opt-vec…

661a31c

…tor-create

Update src/coreclr/jit/importer.cpp

b892666

Co-authored-by: Tanner Gooding <tagoo@outlook.com>

Address feedback

33e304a

Merge branch 'opt-vector-create' of https://github.com/EgorBo/runtime-1…

6bc3a50

… into opt-vector-create

JulieLeeMSFT assigned EgorBo Sep 7, 2021

MichalPetryka reviewed Sep 7, 2021

EgorBo mentioned this pull request Sep 12, 2021

Inliner: next steps #59002

Open

Merge branch 'main' of https://github.com/dotnet/runtime into opt-vec…

eff49e1

…tor-create

BruceForstall approved these changes Sep 18, 2021

tannergooding reviewed Sep 23, 2021

tannergooding approved these changes Sep 23, 2021

EgorBo merged commit fcee44a into dotnet:main Oct 4, 2021

ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
