This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Lower SSE compare scalar and test nodes #22043

Merged: 5 commits, Sep 24, 2019


@mikedn mikedn commented Jan 17, 2019

This prevents the materialization of a bool value when the intrinsic node is used by a JTRUE node:

        C5F91001             vmovupd  xmm0, xmmword ptr [rcx]
        C4E2791702           vptest   xmm0, xmmword ptr [rdx]
-       0F97C0               seta     al
-       0FB6C0               movzx    rax, al
-       85C0                 test     eax, eax
-       750C                 jne      SHORT G_M18062_IG04
-       E8E1E2FFFF           call     Program:False():bool
+       770C                 ja       SHORT G_M18062_IG04
+       E8E9E5FFFF           call     Program:False():bool
        90                   nop      

Diff from the autogenerated test: https://gist.github.com/mikedn/35f8e21be5c9da22ade98241598b5243

Fixes #17073
Fixes #21247

@mikedn mikedn force-pushed the simd-cc branch 2 times, most recently from 0d47cda to cea13af Compare January 19, 2019 11:57
@mikedn mikedn closed this Jan 21, 2019
@mikedn mikedn reopened this Jan 21, 2019
@mikedn mikedn force-pushed the simd-cc branch 3 times, most recently from afd3ae4 to c67eaa1 Compare January 25, 2019 19:04
@@ -5626,8 +5626,8 @@ struct GenCondition
C = Unsigned | S, // = 14
NC = Unsigned | NS, // = 15

-FEQ = Float | EQ, // = 16
-FNE = Float | NE, // = 17
+FEQ = Float | 0, // = 16
Author

This was a mistake that slipped in with my JCC PR a while ago. Since nothing used these constants before this went unnoticed.
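
To see why the old definitions did not match their `// = 16` and `// = 17` comments, here is a minimal sketch of the encoding. Only `C`, `NC`, `FEQ` and the commented values appear in the hunk above. The individual values of `Unsigned`, `Float`, `S`, `NS`, `EQ` and `NE`, and the fixed `FNE` line, are assumptions chosen to be consistent with those comments (8 | 6 = 14, 8 | 7 = 15, 16 | 0 = 16).

```cpp
#include <cassert>

// Simplified stand-in for GenCondition's code enum.
// Every value not shown in the hunk is an assumption.
enum Code : unsigned char
{
    Unsigned = 8,  // assumed flag bit
    Float    = 16, // assumed flag bit

    S  = 6, // assumed
    NS = 7, // assumed

    EQ = Unsigned | 0, // assumed: integer EQ already carries the Unsigned bit
    NE = Unsigned | 1, // assumed

    C  = Unsigned | S,  // = 14
    NC = Unsigned | NS, // = 15

    OldFEQ = Float | EQ, // what the old line evaluates to under these assumptions
    OldFNE = Float | NE,
    FEQ    = Float | 0, // = 16
    FNE    = Float | 1, // = 17 (assumed to follow the same pattern as FEQ)
};

static_assert(C == 14 && NC == 15, "matches the comments in the hunk");
static_assert(FEQ == 16 && FNE == 17, "fixed values match their comments");
static_assert(OldFEQ == 24 && OldFNE == 25, "old values pick up the Unsigned bit");
```

Under these assumptions the old `Float | EQ` silently included the `Unsigned` bit, so the constant no longer had the value its comment claimed.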

src/jit/lowerxarch.cpp (outdated)
//
GenTreeCC* Lowering::LowerNodeCC(GenTree* node, GenCondition condition)
{
// Skip over a chain of EQ/NE(x, 0) relops. This may be present either
Author

The code in LowerSIMD that this replaces also checked for EQ/NE(x, 1). I added that code in #14027 but now I can't find any use for that extra complexity. Even if both the JIT and the C# compiler sometimes assume 0/1 bools when performing some optimizations, I've never seen a case that involves testing bools and assuming 0/1, testing is always 0/non-zero.

Member

> I've never seen a case that involves testing bools and assuming 0/1, testing is always 0/non-zero.

Not sure what you would consider "testing", but for C# code there are a number of places where the generated IL doesn't strictly do 0 or not-zero checks. For example, C# returns false for bool == bool where the underlying bits are not identical and a switch statement can hit the default case for a bool which is not 0/1. I believe the csharplang repo had a few more oddities commented at one time.

Author

By testing I mean something like

if (ptest(x, y)) ...
if (!ptest(x, y)) ...
bool b = ptest(x, y)...
bool b = !ptest(x, y)...

I've never seen the C# compiler generating anything other than 0 compares in all these cases. In particular, logical negation (the subject of #21247) is implemented using ceq(x, 0) or brfalse.
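
The effect of skipping a chain of EQ/NE(x, 0) relops can be pictured with a small model. This is only a sketch: `Oper`, `Folded` and `foldZeroCompares` are hypothetical names, not JIT types, and the model assumes that each EQ(v, 0) reverses the condition under test while each NE(v, 0) leaves it unchanged.

```cpp
#include <cassert>
#include <vector>

// Hypothetical relops that compare the previous value against zero.
enum class Oper { EqZero, NeZero };

// Hypothetical result: should the branch use the intrinsic's own
// condition, or the reverse of it?
struct Folded { bool reversed; };

// Walk a chain such as NE(EQ(ptest(x, y), 0), 0), innermost relop first.
// EQ(v, 0) is logical negation of v; NE(v, 0) just means "v is true".
Folded foldZeroCompares(const std::vector<Oper>& chain)
{
    bool reversed = false;
    for (Oper op : chain)
    {
        if (op == Oper::EqZero)
        {
            reversed = !reversed;
        }
    }
    return Folded{reversed};
}
```

In this model `if (!ptest(x, y))` corresponds to the chain `{EqZero}`, so the branch would use the reversed condition and no bool needs to be materialized.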

Member

> I've never seen the C# compiler generating anything other than 0 compares in all these cases

Ah, I think the one case where it does an explicit ceq(x, 1) (rather than ceq(x, 0), brfalse, or brtrue) is for switch statements. I doubt such a code pattern is that common, however.

In this case:

switch (condition)
{
    case false:
        Console.WriteLine("0");
        break;

    case true:
        Console.WriteLine("1");
        break;

    default:
        Console.WriteLine("?");
        break;
}

becomes:

        IL_0000: ldarg.1
        IL_0001: brfalse.s IL_0009

        IL_0003: ldarg.1
        IL_0004: ldc.i4.1
        IL_0005: beq.s IL_0014

        IL_0007: br.s IL_001f

        IL_0009: ldstr "0"
        IL_000e: call void [mscorlib]System.Console::WriteLine(string)
        IL_0013: ret

        IL_0014: ldstr "1"
        IL_0019: call void [mscorlib]System.Console::WriteLine(string)
        IL_001e: ret

        IL_001f: ldstr "?"
        IL_0024: call void [mscorlib]System.Console::WriteLine(string)
        IL_0029: ret
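
Read as straight-line code, the IL above does three things: branch on zero, branch on equality with 1, and otherwise fall through to the default case. A minimal C++ sketch of the same control flow follows, using a raw byte to stand in for a `bool` whose bits are neither 0 nor 1. The function name `classify` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mirrors the branch structure of the IL, returning the string
// that the corresponding case would print.
std::string classify(std::uint8_t condition)
{
    if (condition == 0) return "0"; // brfalse.s IL_0009
    if (condition == 1) return "1"; // ldc.i4.1; beq.s IL_0014
    return "?";                     // br.s IL_001f
}
```

A byte value of 2 is "true" for a zero/non-zero test but reaches the default case here, which is the situation the explicit compare against 1 creates.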

// node.
//
simdNode->gtType = TYP_VOID;
simdNode->ClearUnusedValue();
Author

The old code didn't do this. Assuming one can produce such an unused SIMD node, that could have led to an assert.

@mikedn mikedn changed the title [WIP] Lower SSE compare scalar and test nodes Lower SSE compare scalar and test nodes Jan 26, 2019
@tannergooding
Member

Is this currently pending anything or just review/sign-off? It would be nice to have for 3.0

@fiigii
fiigii commented Mar 13, 2019

> It would be nice to have for 3.0

Seconded. I really want this optimization in 3.0.
@CarolEidt Could you please take a look at this PR?

@mikedn
Author

mikedn commented Mar 13, 2019

Hrm, this already picked up conflicts. I'm not sure what happened with the compare/test intrinsics, there were some API issues related to those.

@mikedn mikedn changed the title Lower SSE compare scalar and test nodes [WIP] Lower SSE compare scalar and test nodes Mar 27, 2019
@mikedn mikedn force-pushed the simd-cc branch 2 times, most recently from ee0c65c to 0ab1254 Compare April 17, 2019 05:09
@mikedn mikedn force-pushed the simd-cc branch 2 times, most recently from 12224ab to 7fd1a25 Compare May 18, 2019 15:36
@mikedn mikedn changed the title [WIP] Lower SSE compare scalar and test nodes Lower SSE compare scalar and test nodes Jun 4, 2019
@BruceForstall
Member

/azp run coreclr-ci

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Member

@mikedn Looks like this is ready to go now?

@tannergooding @CarolEidt @sandreenko How about another code review?

@mikedn
Author

mikedn commented Sep 21, 2019

@BruceForstall Yes, this is done.

@CarolEidt CarolEidt left a comment

Sorry we weren't able to get this in for 3.0.
I have one question and one request for a clarifying comment or two.

case NI_SSE2_COMISD:
case NI_SSE2_UCOMISD:
// Using the preferred condition saves a branch so it's probably better
// to always use it, even if that means losing containment.

I am missing where this handles the case where it needs to lose containment

Author

If we have [mem], reg we may be able to contain the mem operand by swapping the operands to have reg, [mem]. But that also requires changing the comparison condition, and unfortunately some FP conditions have complicated codegen on xarch. So the code below gives priority to "preferable" FP conditions by blocking operand swapping and thus indirectly blocking containment in some cases.

I'll try to expand this comment a bit tomorrow, together with the other comment improvement suggestion you've made.
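
The cost difference between FP conditions can be illustrated by modelling the flags that `ucomisd` sets on x86: an unordered result (either operand is NaN) sets ZF, PF and CF all to 1. The sketch below is a plain C++ model of those flags, not JIT code, and all function names in it are hypothetical. It shows that an ordered "greater than" needs a single `ja`, while an ordered "less than" with the same operand order also needs a parity check, unless the operands are swapped.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct Flags { bool zf; bool pf; bool cf; };

// Models the flags produced by `ucomisd a, b`.
Flags ucomisd(double a, double b)
{
    if (std::isnan(a) || std::isnan(b)) return Flags{true, true, true}; // unordered
    if (a > b) return Flags{false, false, false};
    if (a < b) return Flags{false, false, true};
    return Flags{true, false, false}; // equal
}

bool ja(const Flags& f) { return !f.cf && !f.zf; }
bool jb(const Flags& f) { return f.cf; }
bool jp(const Flags& f) { return f.pf; }

// a > b: one branch is enough, `ja` is not taken for NaN.
bool orderedGreater(double a, double b) { return ja(ucomisd(a, b)); }

// a < b with `jb` alone is wrong: CF is also set for NaN.
bool orderedLessNaive(double a, double b) { return jb(ucomisd(a, b)); }

// a < b done correctly without swapping: needs the extra parity check.
bool orderedLess(double a, double b)
{
    Flags f = ucomisd(a, b);
    return !jp(f) && jb(f);
}

// a < b done by swapping the operands and using `ja` instead.
bool orderedLessSwapped(double a, double b) { return ja(ucomisd(b, a)); }

const double kNaN = std::numeric_limits<double>::quiet_NaN();
```

Swapping operands therefore trades one kind of condition for the other, which is presumably why the choice of condition and the choice of which operand can be contained interact.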

I think I now see my point of confusion. When we get here, we haven't yet done containment analysis, correct? That's why we don't need to "undo" anything.

Author

Yes. Hopefully the new comment is clearer.

{
// This should always be true, otherwise it means that JTRUE's relop is somewhere
// before `node` and that shouldn't happen. Still, if it happens it's not our problem,
// it simply means that `node` isn't used and we don't have to do anything.

This comment is a bit confusing, especially with respect to what "This" is referring to (I mistakenly read it as referring to the condition above at first). I would reword as:

// If the instruction immediately following 'relop', i.e. 'next', is a conditional branch, it should always have 'relop' as its 'op1'.
// If it doesn't, then we have improperly constructed IL (the setting of a condition code should always immediately
// precede its use, since the JIT doesn't track dataflow for condition codes). Still, if it happens ...

Or something to that effect. It might even be worth mentioning that restriction (i.e. that the setting and use of condition
codes must always be contiguous) in the header.
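
The contiguity restriction described here can be stated as a simple check over a linear sequence of nodes. The sketch below is a hypothetical model only: `Kind` and `flagsUsesAreContiguous` are not JIT names, and real nodes carry far more state than a single tag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of nodes in linear order.
enum class Kind { SetsFlags, UsesFlags, Other };

// Returns true if every node that consumes condition codes is
// immediately preceded by a node that sets them.
bool flagsUsesAreContiguous(const std::vector<Kind>& nodes)
{
    for (std::size_t i = 0; i < nodes.size(); i++)
    {
        if (nodes[i] == Kind::UsesFlags &&
            (i == 0 || nodes[i - 1] != Kind::SetsFlags))
        {
            return false;
        }
    }
    return true;
}
```

Because nothing tracks dataflow for the flags in this model, any node placed between the setter and the user makes the sequence invalid, even if that node would not actually modify the flags.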

@CarolEidt CarolEidt left a comment

LGTM - thanks!

@CarolEidt

The x86 failures are the same as those from #26551, and are, in addition, in tests that have no HW intrinsics.

@CarolEidt CarolEidt merged commit 013e941 into dotnet:master Sep 24, 2019
@GrabYourPitchforks
Member

This is awesome to see! Thanks @mikedn for driving this - much appreciated. :)

Development

Successfully merging this pull request may close these issues.

!Avx.Test{Z,C} suboptimal codegen
CQ issues with HW intrinsics that return bool
7 participants