Lower SSE compare scalar and test nodes #22043

mikedn · 2019-01-17T20:28:46Z

This prevents the materialization of a bool value when the intrinsic node is used by a JTRUE node:

        C5F91001             vmovupd  xmm0, xmmword ptr [rcx]
        C4E2791702           vptest   xmm0, xmmword ptr [rdx]
-       0F97C0               seta     al
-       0FB6C0               movzx    rax, al
-       85C0                 test     eax, eax
-       750C                 jne      SHORT G_M18062_IG04
-       E8E1E2FFFF           call     Program:False():bool
+       770C                 ja       SHORT G_M18062_IG04
+       E8E9E5FFFF           call     Program:False():bool
        90                   nop
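
To illustrate why the removed `seta`/`movzx`/`test`/`jne` sequence is redundant, here is a minimal sketch that models the two PTEST flags in plain C++. The flag definitions follow the instruction's documented behaviour; `Vec128`, `Flags`, and the function names are hypothetical and are not JIT code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value modeled as two 64-bit halves.
struct Vec128
{
    uint64_t lo;
    uint64_t hi;
};

// The two flags that PTEST dest, src defines (other flags omitted).
struct Flags
{
    bool zf; // set when (src & dest) is all zeros
    bool cf; // set when (src & ~dest) is all zeros
};

Flags ptest(Vec128 dest, Vec128 src)
{
    Flags f;
    f.zf = ((src.lo & dest.lo) | (src.hi & dest.hi)) == 0;
    f.cf = ((src.lo & ~dest.lo) | (src.hi & ~dest.hi)) == 0;
    return f;
}

// The "above" condition used by both SETA and JA: CF == 0 and ZF == 0.
bool above(Flags f)
{
    return !f.cf && !f.zf;
}

// Old shape: seta al / movzx / test eax, eax / jne.
bool branchTakenOld(Vec128 a, Vec128 b)
{
    uint32_t materialized = above(ptest(a, b)) ? 1u : 0u; // seta + movzx
    return materialized != 0;                             // test + jne
}

// New shape: ja consumes the flags directly.
bool branchTakenNew(Vec128 a, Vec128 b)
{
    return above(ptest(a, b)); // ja
}

// Usage: both shapes agree, so the bool never needs to be materialized.
void example()
{
    Vec128 a{0x0F, 0};
    Vec128 b{0xFF, 0};
    assert(branchTakenOld(a, b) == branchTakenNew(a, b));
}
```

The model only shows that branching on the materialized 0/1 value and branching on the flags give the same outcome; it says nothing about how the JIT represents either form internally.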

Diff from the autogenerated test: https://gist.github.com/mikedn/35f8e21be5c9da22ade98241598b5243

Fixes #17073
Fixes #21247

src/jit/hwintrinsiclistxarch.h

mikedn · 2019-01-25T19:21:01Z

src/jit/gentree.h

@@ -5626,8 +5626,8 @@ struct GenCondition
        C    = Unsigned | S,    // = 14
        NC   = Unsigned | NS,   // = 15

-        FEQ  = Float | EQ,      // = 16
-        FNE  = Float | NE,      // = 17
+        FEQ  = Float | 0,       // = 16


This was a mistake that slipped in with my JCC PR a while ago. Since nothing used these constants before, this went unnoticed.
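
A reduced sketch of why `Float | EQ` does not produce the value its comment claims. The numeric values of `Unsigned`, `Float`, `S`, `NS`, and `EQ` below are assumptions chosen to agree with the `// = N` comments visible in the hunk; the enum is a hypothetical stand-in, not a copy of `GenCondition`.

```cpp
#include <cassert>

// Hypothetical, reduced encoding. Values are assumptions that match the
// "// = 14", "// = 15" and "// = 16" comments shown in the diff.
enum Code : unsigned char
{
    Unsigned = 8,
    Float    = 16,

    S  = 6,
    NS = 7,

    EQ = Unsigned | 0,  // = 8 (assumed; EQ itself is not shown in the hunk)
    C  = Unsigned | S,  // = 14
    NC = Unsigned | NS, // = 15

    FEQ_Old = Float | EQ, // = 24, because EQ already carries the Unsigned bit
    FEQ_New = Float | 0,  // = 16, matching the comment
};

static_assert(FEQ_Old == 24, "old definition picks up the Unsigned bit");
static_assert(FEQ_New == 16, "new definition matches the documented value");

// Usage: the two definitions differ by exactly the Unsigned bit.
void example()
{
    assert((FEQ_Old ^ FEQ_New) == Unsigned);
}
```

Under these assumed values, the old definition lands on a different code than the one documented next to it, which is consistent with the remark that the error went unnoticed only because nothing used the constants.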

src/jit/lowerxarch.cpp

mikedn · 2019-01-26T18:40:08Z

src/jit/lower.cpp

+//
+GenTreeCC* Lowering::LowerNodeCC(GenTree* node, GenCondition condition)
+{
+    // Skip over a chain of EQ/NE(x, 0) relops. This may be present either


The code in LowerSIMD that this replaces also checked for EQ/NE(x, 1). I added that code in #14027 but now I can't find any use for that extra complexity. Even if both the JIT and the C# compiler sometimes assume 0/1 bools when performing some optimizations, I've never seen a case that involves testing bools and assuming 0/1, testing is always 0/non-zero.

> I've never seen a case that involves testing bools and assuming 0/1, testing is always 0/non-zero.

Not sure what you would consider "testing", but for C# code there are a number of places where the generated IL doesn't strictly do 0 or not-zero checks. For example, C# returns false for bool == bool where the underlying bits are not identical and a switch statement can hit the default case for a bool which is not 0/1. I believe the csharplang repo had a few more oddities commented at one time.

By testing I mean something like

    if (ptest(x, y)) ...
    if (!ptest(x, y)) ...
    bool b = ptest(x, y)...
    bool b = !ptest(x, y)...

I've never seen the C# compiler generating anything other than 0 compares in all these cases. In particular, logical negation (the subject of #21247) is implemented using ceq(x, 0) or brfalse.
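
A minimal sketch of the idea of skipping a chain of `EQ/NE(x, 0)` relops: every `EQ(x, 0)` acts as a logical negation and flips the condition, while `NE(x, 0)` leaves it unchanged. `Oper`, `Node`, `Folded`, and `skipZeroCompares` are hypothetical names for illustration only; the real `LowerNodeCC` works on `GenTree` nodes and is not reproduced here.

```cpp
#include <cassert>

// Hypothetical, minimal stand-in for JIT nodes; not the real GenTree.
enum class Oper
{
    CondSource, // the node that produces the condition (e.g. a ptest intrinsic)
    EqZero,     // EQ(x, 0)
    NeZero      // NE(x, 0)
};

struct Node
{
    Oper  oper;
    Node* user; // single user, or nullptr at the end of the chain
};

struct Folded
{
    Node* lastRelop; // last EQ/NE(x, 0) in the chain, or nullptr if there is none
    bool  reversed;  // true when the original condition must be inverted
};

// Walks the users of `source` while they are EQ/NE(x, 0) compares.
Folded skipZeroCompares(Node* source)
{
    Folded result{nullptr, false};
    for (Node* n = source->user; n != nullptr; n = n->user)
    {
        if (n->oper == Oper::EqZero)
        {
            result.reversed = !result.reversed; // EQ(x, 0) is a logical NOT
        }
        else if (n->oper != Oper::NeZero)
        {
            break; // anything else ends the chain
        }
        result.lastRelop = n;
    }
    return result;
}

// Usage: NE(EQ(src, 0), 0) folds to "src with the condition reversed".
void example()
{
    Node outer{Oper::NeZero, nullptr};
    Node inner{Oper::EqZero, &outer};
    Node src{Oper::CondSource, &inner};
    assert(skipZeroCompares(&src).reversed);
}
```

Because only compares against 0 are recognized, a pattern such as `EQ(x, 1)` would simply end the chain in this sketch, which matches the simplification described in the comment above.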

> I've never seen the C# compiler generating anything other than 0 compares in all these cases

Ah, I think the one case where it does an explicit ceq(x, 1) (rather than ceq(x, 0), brfalse, or brtrue) is for switch statements. I doubt such a code pattern is that common, however.

In this case:

    switch (condition)
    {
        case false:
            Console.WriteLine("0");
            break;
        case true:
            Console.WriteLine("1");
            break;
        default:
            Console.WriteLine("?");
            break;
    }

becomes:

    IL_0000: ldarg.1
    IL_0001: brfalse.s IL_0009
    IL_0003: ldarg.1
    IL_0004: ldc.i4.1
    IL_0005: beq.s IL_0014
    IL_0007: br.s IL_001f
    IL_0009: ldstr "0"
    IL_000e: call void [mscorlib]System.Console::WriteLine(string)
    IL_0013: ret
    IL_0014: ldstr "1"
    IL_0019: call void [mscorlib]System.Console::WriteLine(string)
    IL_001e: ret
    IL_001f: ldstr "?"
    IL_0024: call void [mscorlib]System.Console::WriteLine(string)
    IL_0029: ret
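
The control flow of that IL can be mirrored in a few lines of C++ to show how a bool whose underlying byte is neither 0 nor 1 reaches the default case while still counting as true under a zero test. This is only a model of the branches shown above; `switchOnBool` and `isTrueByZeroTest` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the IL above: brfalse.s selects "0", beq.s against 1 selects "1",
// and everything else falls through to the default case.
char switchOnBool(uint8_t rawBool)
{
    if (rawBool == 0)
    {
        return '0'; // brfalse.s IL_0009
    }
    if (rawBool == 1)
    {
        return '1'; // ldc.i4.1 + beq.s IL_0014
    }
    return '?'; // br.s IL_001f
}

// The kind of test produced for if (b) or !b: zero versus non-zero.
bool isTrueByZeroTest(uint8_t rawBool)
{
    return rawBool != 0;
}

// Usage: a raw value of 2 is "true" for a zero test but hits the default case.
void example()
{
    assert(isTrueByZeroTest(2));
    assert(switchOnBool(2) == '?');
}
```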

mikedn · 2019-01-26T18:41:51Z

src/jit/lowerxarch.cpp

-            // node.
-            //
+        simdNode->gtType = TYP_VOID;
+        simdNode->ClearUnusedValue();


The old code didn't do this. Assuming one can produce such an unused SIMD node, that could have led to an assert.

tannergooding · 2019-03-13T21:05:57Z

Is this currently pending anything or just review/sign-off? It would be nice to have for 3.0

fiigii · 2019-03-13T21:08:34Z

> It would be nice to have for 3.0

Second. I really want this opt in 3.0.
@CarolEidt Could you please take a look at this PR?

mikedn · 2019-03-13T22:07:15Z

Hrm, this already picked up conflicts. I'm not sure what happened with the compare/test intrinsics, there were some API issues related to those.

BruceForstall · 2019-09-21T00:54:27Z

/azp run coreclr-ci

azure-pipelines · 2019-09-21T00:54:41Z

Azure Pipelines successfully started running 1 pipeline(s).

BruceForstall · 2019-09-21T00:55:04Z

@mikedn Looks like this is ready to go now?

@tannergooding @CarolEidt @sandreenko How about another code review?

mikedn · 2019-09-21T06:11:18Z

@BruceForstall Yes, this is done.

CarolEidt

Sorry we weren't able to get this in for 3.0.
I have one question and one request for a clarifying comment or two.

CarolEidt · 2019-09-23T17:14:01Z

src/jit/lowerxarch.cpp

+        case NI_SSE2_COMISD:
+        case NI_SSE2_UCOMISD:
+            // Using the preferred condition saves a branch so it's probably better
+            // to always use it, even if that means loosing containment.


I am missing where this handles the case where it needs to lose containment

If we have [mem], reg we may be able to contain the mem operand by swapping the operands to get reg, [mem]. But that also requires changing the comparison condition, and unfortunately some FP conditions have complicated codegen on xarch. So the code below gives priority to "preferable" FP conditions by blocking operand swapping, and thus indirectly blocks containment in some cases.

I'll try to expand this comment a bit tomorrow, together with the other comment improvement suggestion you've made.

I think I now see my point of confusion. When we get here, we haven't yet done containment analysis, correct? That's why we don't need to "undo" anything.

Yes. Hopefully the new comment is clearer.
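
As a rough illustration of why some FP conditions are cheaper than others on xarch, the sketch below models the ZF/PF/CF results of UCOMISD and compares an ordered "less than" evaluated with the original operand order against the same test done by swapping operands and using "greater than". The flag table is the instruction's documented behaviour; the function names are hypothetical, and this is not the PR's actual selection logic.

```cpp
#include <cassert>
#include <cmath>

struct FpFlags
{
    bool zf;
    bool pf;
    bool cf;
};

// Flags set by UCOMISD a, b.
FpFlags ucomisd(double a, double b)
{
    if (std::isnan(a) || std::isnan(b))
    {
        return {true, true, true}; // unordered
    }
    if (a > b)
    {
        return {false, false, false};
    }
    if (a < b)
    {
        return {false, false, true};
    }
    return {true, false, false}; // equal
}

// a > b (ordered): a single JA (CF == 0 and ZF == 0) is enough, because the
// unordered result sets CF and ZF and therefore fails the test.
bool orderedGreater(double a, double b)
{
    FpFlags f = ucomisd(a, b);
    return !f.cf && !f.zf;
}

// a < b (ordered) with the same operand order: JB alone (CF == 1) is also
// true for the unordered result, so a parity check is needed as well.
bool orderedLessSameOrder(double a, double b)
{
    FpFlags f = ucomisd(a, b);
    return !f.pf && f.cf;
}

// a < b rewritten as b > a: operands swapped, condition swapped, single JA.
bool orderedLessSwapped(double a, double b)
{
    return orderedGreater(b, a);
}

// Usage: both forms agree, including when an operand is NaN.
void example()
{
    assert(orderedLessSameOrder(1.0, 2.0) == orderedLessSwapped(1.0, 2.0));
    assert(orderedLessSameOrder(NAN, 1.0) == orderedLessSwapped(NAN, 1.0));
}
```

In this model the swapped form needs one flag test where the unswapped form needs two, which is the kind of trade-off that can make keeping a particular operand order worth more than containing a memory operand.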

CarolEidt · 2019-09-23T17:50:23Z

src/jit/lower.cpp

+        {
+            // This should always be true, otherwise it means that JTRUE's relop is somewhere
+            // before `node` and that shouldn't happen. Still, if it happens it's not our problem,
+            // it simply means that `node` isn't used and we don't have to do anything.


This comment is a bit confusing, especially with respect to what "This" is referring to (I mistakenly read it as referring to the condition above at first). I would reword as:

    // If the instruction immediately following 'relop', i.e. 'next' is a conditional branch, it should always have 'relop' as its 'op1'.
    // If it doesn't, then we have improperly constructed IL (the setting of a condition code should always immediately
    // precede its use, since the JIT doesn't track dataflow for condition codes). Still, if it happens ...

Or something to that effect. It might even be worth mentioning that restriction (i.e. that the setting and use of condition
codes must always be contiguous) in the header.
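
The contiguity restriction described here can be pictured as a simple check over a linear instruction sequence: every consumer of condition codes must sit directly after a producer. The sketch below is a hypothetical model of that invariant; `Kind` and `flagsUsesAreContiguous` are made-up names and do not correspond to any JIT data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of instructions in a linear sequence.
enum class Kind
{
    SetsFlags, // e.g. a compare or test
    UsesFlags, // e.g. a conditional branch or set-on-condition
    Other
};

// Returns true when every flag consumer is immediately preceded by a flag
// producer. Since no dataflow is tracked for condition codes in this model,
// any instruction in between is treated as a violation.
bool flagsUsesAreContiguous(const std::vector<Kind>& code)
{
    for (std::size_t i = 0; i < code.size(); i++)
    {
        if (code[i] != Kind::UsesFlags)
        {
            continue;
        }
        if (i == 0 || code[i - 1] != Kind::SetsFlags)
        {
            return false;
        }
    }
    return true;
}

// Usage: an unrelated instruction between the producer and the consumer
// breaks the invariant.
void example()
{
    assert(flagsUsesAreContiguous({Kind::Other, Kind::SetsFlags, Kind::UsesFlags}));
    assert(!flagsUsesAreContiguous({Kind::SetsFlags, Kind::Other, Kind::UsesFlags}));
}
```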

CarolEidt

LGTM - thanks!

CarolEidt · 2019-09-24T20:02:49Z

The x86 failures are the same as those from #26551, and are, in addition, in tests that have no HW intrinsics.

GrabYourPitchforks · 2019-09-27T19:57:34Z

This is awesome to see! Thanks @mikedn for driving this - much appreciated. :)

fiigii reviewed Jan 18, 2019

mikedn force-pushed the simd-cc branch 2 times, most recently from 0d47cda to cea13af Compare January 19, 2019 11:57

mikedn closed this Jan 21, 2019

mikedn reopened this Jan 21, 2019

mikedn force-pushed the simd-cc branch 3 times, most recently from afd3ae4 to c67eaa1 Compare January 25, 2019 19:04

mikedn commented Jan 25, 2019

mikedn commented Jan 25, 2019

mikedn force-pushed the simd-cc branch from c67eaa1 to f4f0b98 Compare January 26, 2019 12:18

mikedn commented Jan 26, 2019

mikedn changed the title ~~[WIP] Lower SSE compare scalar and test nodes~~ Lower SSE compare scalar and test nodes Jan 26, 2019

mikedn mentioned this pull request Feb 28, 2019

Allow containing ExtractVector128 into Store #22896

Merged

fiigii mentioned this pull request Mar 13, 2019

Replace slow implementations in ASCIIUtility with fast implementations #22516

Merged

mikedn changed the title ~~Lower SSE compare scalar and test nodes~~ [WIP] Lower SSE compare scalar and test nodes Mar 27, 2019

mikedn force-pushed the simd-cc branch 2 times, most recently from ee0c65c to 0ab1254 Compare April 17, 2019 05:09

mikedn force-pushed the simd-cc branch 2 times, most recently from 12224ab to 7fd1a25 Compare May 18, 2019 15:36

jkotas added the area-CodeGen label May 23, 2019

mikedn force-pushed the simd-cc branch from 7fd1a25 to e5921b0 Compare June 3, 2019 20:26

mikedn changed the title ~~[WIP] Lower SSE compare scalar and test nodes~~ Lower SSE compare scalar and test nodes Jun 4, 2019

mikedn force-pushed the simd-cc branch from e5921b0 to c12c1ac Compare August 27, 2019 04:54

CarolEidt reviewed Sep 23, 2019

mikedn added 5 commits September 24, 2019 20:34

Lower SSE compare scalar and test nodes

1ddd6a3

Remove bogus instructions from intrinsic table

05e488b

Cleanup genHWIntrinsic_R_RM

f282fd9

Add tests

b8c7626

Adjust comments

8f01691

mikedn force-pushed the simd-cc branch from 243f8d9 to 8f01691 Compare September 24, 2019 17:37

CarolEidt approved these changes Sep 24, 2019

CarolEidt merged commit 013e941 into dotnet:master Sep 24, 2019

mikedn deleted the simd-cc branch September 28, 2019 19:07

tannergooding mentioned this pull request Feb 1, 2020

Take another look at the COMISS and UCOMISS hardware intrinisics dotnet/runtime#28533

Closed
