-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Lower SSE compare scalar and test nodes #22043
Conversation
0d47cda
to
cea13af
Compare
afd3ae4
to
c67eaa1
Compare
@@ -5626,8 +5626,8 @@ struct GenCondition | |||
C = Unsigned | S, // = 14 | |||
NC = Unsigned | NS, // = 15 | |||
|
|||
FEQ = Float | EQ, // = 16 | |||
FNE = Float | NE, // = 17 | |||
FEQ = Float | 0, // = 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a mistake that slipped in with my JCC PR a while ago. Since nothing used these constants before this went unnoticed.
// | ||
GenTreeCC* Lowering::LowerNodeCC(GenTree* node, GenCondition condition) | ||
{ | ||
// Skip over a chain of EQ/NE(x, 0) relops. This may be present either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in LowerSIMD
that this replaces also checked for EQ/NE(x, 1)
. I added that code in #14027 but now I can't find any use for that extra complexity. Even if both the JIT and the C# compiler sometimes assume 0/1 bools when performing some optimizations, I've never seen a case that involves testing bools and assuming 0/1, testing is always 0/non-zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never seen a case that involves testing bools and assuming 0/1, testing is always 0/non-zero.
Not sure what you would consider "testing", but for C# code there are a number of places where the generated IL doesn't strictly do 0
or not-zero
checks. For example, C# returns false
for bool == bool
where the underlying bits are not identical and a switch statement can hit the default case for a bool which is not 0/1. I believe the csharplang repo had a few more oddities commented at one time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By testing I mean something like
if (ptest(x, y)) ...
if (!ptest(x, y)) ...
bool b = ptest(x, y)...
bool b = !ptest(x, y)...
I've never seen the C# compiler generating anything other than 0 compares in all these cases. In particular, logical negation (the subject of #21247) is implemented using ceq(x, 0)
or brfalse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never seen the C# compiler generating anything other than 0 compares in all these cases
Ah, I think the one case where it does an explicit ceq(x, 1)
(rather than ceq(x, 0
, brfalse
, or brtrue
) is for switch statements. I doubt such a code-pattern is that common, however.
In this case:
switch (condition)
{
case false:
Console.WriteLine("0");
break;
case true:
Console.WriteLine("1");
break;
default:
Console.WriteLine("?");
break;
}
becomes:
IL_0000: ldarg.1
IL_0001: brfalse.s IL_0009
IL_0003: ldarg.1
IL_0004: ldc.i4.1
IL_0005: beq.s IL_0014
IL_0007: br.s IL_001f
IL_0009: ldstr "0"
IL_000e: call void [mscorlib]System.Console::WriteLine(string)
IL_0013: ret
IL_0014: ldstr "1"
IL_0019: call void [mscorlib]System.Console::WriteLine(string)
IL_001e: ret
IL_001f: ldstr "?"
IL_0024: call void [mscorlib]System.Console::WriteLine(string)
IL_0029: ret
// node. | ||
// | ||
simdNode->gtType = TYP_VOID; | ||
simdNode->ClearUnusedValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code didn't do this. Assuming one can produce such an unused SIMD node, that could have lead to an assert.
Is this currently pending anything or just review/sign-off? It would be nice to have for 3.0 |
Second. I really want this opt in 3.0. |
Hrm, this already picked up conflicts. I'm not sure what happened with the compare/test intrinsics, there were some API issues related to those. |
ee0c65c
to
0ab1254
Compare
12224ab
to
7fd1a25
Compare
/azp run coreclr-ci |
Azure Pipelines successfully started running 1 pipeline(s). |
@mikedn Looks like this is ready to go now? @tannergooding @CarolEidt @sandreenko How about another code review? |
@BruceForstall Yes, this is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry we weren't able to get this in for 3.0.
I have one question and one request for a clarifying comment or two.
src/jit/lowerxarch.cpp
Outdated
case NI_SSE2_COMISD: | ||
case NI_SSE2_UCOMISD: | ||
// Using the preferred condition saves a branch so it's probably better | ||
// to always use it, even if that means loosing containment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am missing where this handles the case where it needs to lose containment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have [mem], reg
we may be able to contain the mem operand by swapping the operands to have reg, [mem]
. But that also requires changing the comparison condition and unfortunately some FP conditions have complicated codegen on xarch. So the code below gives priority to "preferrable" FP conditions by blocking operands swapping and thus indirectly blocking containment in some cases.
I'll try to expand this comment a bit tomorrow, together with the other comment improvement suggestion you've made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I now see my point of confusion. When we get here, we haven't yet done containment analysis, correct? That's why we don't need to "undo" anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Hopefully the new comments is more clear.
src/jit/lower.cpp
Outdated
{ | ||
// This should always be true, otherwise it means that JTRUE's relop is somewhere | ||
// before `node` and that shouldn't happen. Still, if it happens it's not our problem, | ||
// it simply means that `node` isn't used and we don't have to do anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a bit confusing, especially wrt to what "This" is referring to (I mistakenly read it as refering to the condition above at first). I would reword as:
\\ If the instruction immediately following 'relop', i.e. 'next' is a conditional branch, it should always have 'relop' as its 'op1'.
\\ If it doesn't, then we have improperly constructed IL (the setting of a condition code should always immediately
\\ precede its use, since the JIT doesn't track dataflow for condition codes). Still, if it happens ...
Or something to that effect. It might even be worth mentioning that restriction (i.e. that the setting and use of condition
codes must always be contiguous) in the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
The x86 failures are the same as those from #26551, and are, in addition, in tests that have no HW intrinsics. |
This is awesome to see! Thanks @mikedn for driving this - much appreciated. :) |
This prevents the materialization of a bool value when the intrinsic node is used by a JTRUE node:
Diff from the autogenerated test: https://gist.github.com/mikedn/35f8e21be5c9da22ade98241598b5243
Fixes #17073
Fixes #21247