Chained comparison of two integers against a constant is not coalesced #102103
Seems to have regressions on arm64 too 😢 I suppose Clang/LLVM reasons about these in a more generalized way...
612735.dasm - Program:InstanceMethodTest():ubyte (FullOpts)
@@ -174,11 +174,14 @@ G_M34320_IG08: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
ldr x3, [x3]
blr x3
; gcrRegs -[x1-x2 x22]
- cmn w20, #1
+ movn w0, #41
+ eor w0, w21, w0
+ movn w1, #0
+ eor w1, w20, w1
+ orr w0, w0, w1
+ cmp w0, #0
cset x0, eq
- cmn w21, #42
- csel w0, wzr, w0, ne
- ;; size=88 bbWeight=1 PerfScore 16.00
+ ;; size=100 bbWeight=1 PerfScore 17.50
The regressions seem to happen on ARM because it already supports conditional select in assembly, which avoids creating branches. Our optimization seems to prevent the JIT from recognizing the pattern and using csel in certain methods.
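For readers following the diff, here is a minimal C++ sketch of the two shapes being compared. It assumes the constants -1 and -42, read off the `cmn w20, #1` / `cmn w21, #42` pair and the `movn`/`eor` sequence above; the function names are hypothetical and this is not the JIT's implementation, only an illustration of why the xor form computes the same result.

```cpp
#include <cassert>
#include <cstdint>

// Chained form: two equality tests against constants, as in the original
// codegen (cmn w20, #1 and cmn w21, #42). Names here are hypothetical.
bool chained(int32_t x, int32_t y)
{
    return (x == -1) && (y == -42);
}

// Coalesced form: a ^ c is zero exactly when a == c, so OR-ing the two
// xor results and testing for zero checks both equalities at once.
// This mirrors the movn/eor/orr/cmp sequence in the regressed codegen.
bool coalesced(int32_t x, int32_t y)
{
    uint32_t dx = static_cast<uint32_t>(x) ^ static_cast<uint32_t>(-1);
    uint32_t dy = static_cast<uint32_t>(y) ^ static_cast<uint32_t>(-42);
    return (dx | dy) == 0;
}

// Spot-check that the two forms agree on a handful of inputs.
void self_check()
{
    const int32_t samples[] = {-42, -1, 0, 1, 41, 42};
    for (int32_t x : samples)
        for (int32_t y : samples)
            assert(chained(x, y) == coalesced(x, y));
}
```

The two functions are equivalent; the arm64 regression is about instruction count (two xors, an or, and a compare versus two compares and a select), not about correctness.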
@TIHan, please review this community PR.
/azp run runtime-coreclr superpmi-diffs
Azure Pipelines successfully started running 1 pipeline(s).
src/coreclr/jit/optimizebools.cpp
#if defined(TARGET_ARM) || defined(TARGET_ARM64)
    return false;
#endif
Why this check? There should be a comment why the transformation isn't profitable on these platforms.
@pedrobsaila as well..
The JIT already supports creating conditional chains. It was implemented by @a74nh in #79283. It is only enabled on arm64 because only arm64 has conditional compare.
Conditional compare is going to be better than the xor pattern from the example, that's probably why this is a regression on arm64. Also, x64/x86 is getting conditional compare as part of Intel APX, so it is expected that we will enable the same logic for x86/x64 in the future. Given this and the small diffs of this PR I'm not sure the complexity is warranted.
If we want to have this transformation then it should be done by enabling optOptimizeCompareChainCondBlock for x86/x64. Then the backend should be taught how to translate this pattern to something more profitable on x86/x64. Currently it only knows how to do that for arm64. The transformation is done by TryLowerAndOrToCCMP. x86/x64 should have a similar version that is translated by using xor instead. Most likely optOptimizeCompareChainCondBlock will have to be restricted on the patterns it allows to combine on x86/x64 since there is no true conditional compare yet.
(I removed my comment because I misread this feedback as asking the author rather than a nudge to provide context on disabling on ARM64, though I was hoping to see much larger amount of diffs - this pattern is pretty common after all)
Indeed ccmp is emitted by both Clang and GCC: https://godbolt.org/z/srhqPrK4v
> Why this check?
Yes, the ARM support for branchless conditional select is what motivated the #if; the xor pattern ends up being more expensive in this case.
> If we want to have this transformation then it should be done by enabling optOptimizeCompareChainCondBlock for x86/x64
If I understand the code correctly, this would optimize comparison checks for conditional blocks but not for return blocks. Do I need to remove the optimization for the latter?
I think it's ok to add some handling for return blocks if the existing handling doesn't cover it. But I think you should transform it into bitwise ops (i.e. return (x == 0x80) && (y == 0x80) => return (x == 0x80) & (y == 0x80)) so that the remaining handling falls out naturally from TryLowerAndOrToCCMP (and the x64 specific variant being introduced).
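As an illustration of the suggested rewrite, the sketch below puts the short-circuit and bitwise forms side by side. The function names are hypothetical, and the equivalence relies on the assumption that both operands are plain comparisons with no side effects, so evaluating the second one unconditionally changes nothing observable.

```cpp
#include <cassert>
#include <cstdint>

// Short-circuit form, as it would appear in a return block.
bool both_logical(int32_t x, int32_t y)
{
    return (x == 0x80) && (y == 0x80);
}

// Bitwise form: each comparison yields 0 or 1, so AND-ing the two
// values gives the same answer without needing a branch between them.
bool both_bitwise(int32_t x, int32_t y)
{
    int lhs = static_cast<int>(x == 0x80);
    int rhs = static_cast<int>(y == 0x80);
    return (lhs & rhs) != 0;
}

// Spot-check that the two forms agree on a handful of inputs.
void self_check()
{
    const int32_t samples[] = {0, 1, 0x7F, 0x80, 0x81, -0x80};
    for (int32_t x : samples)
        for (int32_t y : samples)
            assert(both_logical(x, y) == both_bitwise(x, y));
}
```

Once the expression is a single bitwise AND of two compares, it is the same tree shape regardless of whether it feeds a return or a jump.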
Sorry for the delay @jakobbotsch, it took me some time to figure it out. Can you reopen my PR?
@@ -421,7 +424,10 @@ bool OptBoolsDsc::FindCompareChain(GenTree* condition, bool* isTestCondition)
        *isTestCondition = true;
    }
    else if (condOp1->OperIs(GT_AND) && isPow2(static_cast<target_size_t>(condOp2Value)) &&
             condOp1->gtGetOp2()->IsIntegralConst(condOp2Value))
Why can't IsIntegralConst() be used here? It looks the same to me.
I'm assuming this PR has no code diffs on Arm64?
if (!lastNode->OperIs(GT_JTRUE, GT_RETURN))
{
    return false;
}
This does not make sense to me. This transformation should be beneficial to do regardless of how the and/or is used.
if (!op1->OperIs(GT_NE, GT_EQ) || !op2->OperIs(GT_NE, GT_EQ) ||
    (lastNode->OperIs(GT_JTRUE) && ((tree->OperIs(GT_OR) && (!op1->OperIs(GT_NE) || !op2->OperIs(GT_NE))) ||
                                    (tree->OperIs(GT_AND) && (!op1->OperIs(GT_EQ) || !op2->OperIs(GT_EQ))))) ||
    (lastNode->OperIs(GT_RETURN) && ((tree->OperIs(GT_OR) && (!op1->OperIs(GT_EQ) || !op2->OperIs(GT_EQ))) ||
                                     (tree->OperIs(GT_AND) && (!op1->OperIs(GT_NE) || !op2->OperIs(GT_NE))))))
{
    return false;
}
Here too the JTRUE/RETURN checks do not make sense to me. Imagine having
bool foo = (x == 0) & (y == 0);
bool bar = (x != 0) | (y != 0);
in any basic block. The transformation should be able to kick in for this pattern regardless of what type of basic block it appears in.
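To make the point concrete, here is a small C++ sketch of the foo/bar pattern sitting in straight-line code, with no jump or return consuming the result directly. The struct and function names are hypothetical, and the folded form is only one way such a pair could be combined (with a zero constant the xor step is the identity, so a single OR of the operands suffices).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two locals from the example.
struct Flags
{
    bool foo;
    bool bar;
};

// The pattern as written: two compares combined with a bitwise op.
Flags unfolded(int32_t x, int32_t y)
{
    bool foo = static_cast<int>(x == 0) & static_cast<int>(y == 0);
    bool bar = static_cast<int>(x != 0) | static_cast<int>(y != 0);
    return {foo, bar};
}

// Folded form: x | y is zero exactly when both x and y are zero,
// so one OR plus one compare against zero replaces each pair.
Flags folded(int32_t x, int32_t y)
{
    uint32_t bits = static_cast<uint32_t>(x) | static_cast<uint32_t>(y);
    return {bits == 0, bits != 0};
}

// Spot-check that the two forms agree on a handful of inputs.
void self_check()
{
    const int32_t samples[] = {-1, 0, 1, 0x80, INT32_MIN};
    for (int32_t x : samples)
        for (int32_t y : samples)
        {
            Flags a = unfolded(x, y);
            Flags b = folded(x, y);
            assert(a.foo == b.foo && a.bar == b.bar);
        }
}
```

Nothing in the folded form depends on how foo or bar is used afterwards, which is the argument for not keying the transformation on the block's last node.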
This pull request has been automatically marked
This pull request will now be closed since it had been marked
Fixes #101347