ARM64-SVE: Avoid containing non-embedded conditional select #105719 #105812
@dotnet/arm64-contrib @kunalspathak
Added testing.
cc @dotnet/jit-contrib @TIHan @amanasifkhalid @tannergooding Can one of you review?
src/tests/JIT/Regression/JitBlue/Runtime_105719/Runtime_105719.csproj
LGTM, thanks!
@a74nh Build Analysis is blocked by "experimental feature" warnings for the test you added; could you please add |
(You may want to make it |
src/coreclr/jit/lowerarmarch.cpp
if (op3->IsVectorZero() && op1->IsMaskAllBitsSet() && op2->IsEmbMaskOp())
{
    // When we are merging with zero, we can specialize
    // and avoid instantiating the vector constant.
    // Do this only if op1 was AllTrueMask
    // Do this only if op1 was AllTrueMask and op2 is embedded.
    MakeSrcContained(node, op3);
}
I don't think this is the right fix. The zero (and in fact any constant or other value here) is still containable. op1 is AllBitsSet so therefore nothing from op3 can ever be selected, so it is "unused". This means it is valid to drop the sel entirely and just emit op2 directly.
The only time we should be hitting this path is when op2 is an operation that requires a mask to be specified (even if unused) or some manually written user code in minopts. For the latter, it's still fine to contain the zero constant and just use the op2 register for both inputs (containment is a basic operation that happens at all codegen levels).
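The reasoning above can be illustrated with a lane-wise model of a conditional select. This is only a sketch in plain C++, not JIT code: `ConditionalSelect` here is a hypothetical helper over `std::array`, and it assumes the usual semantics where each result lane takes the "true" operand when its mask lane is set and the "false" operand otherwise.

```cpp
#include <array>
#include <cstddef>

// Hypothetical lane-wise model of a conditional select (not the JIT's API).
// Each result lane comes from ifTrue when the mask lane is set,
// and from ifFalse otherwise.
template <typename T, std::size_t N>
std::array<T, N> ConditionalSelect(const std::array<bool, N>& mask,
                                   const std::array<T, N>& ifTrue,
                                   const std::array<T, N>& ifFalse)
{
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i)
    {
        result[i] = mask[i] ? ifTrue[i] : ifFalse[i];
    }
    return result;
}
```

With an all-true mask, the result equals the second operand whatever the third operand holds, which is why the select can be dropped in that case.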
> op1 is AllBitsSet so therefore nothing from op3 can ever be selected, so it is "unused". This means it is valid to drop the sel entirely and just emit op2 directly.
Updated to do this instead.
Required the embedded HWIntrinsic check, otherwise lowering goes into an infinite loop adding and removing conditional selects.
@@ -15,6 +15,7 @@ pr:
- src/coreclr/jit/emitarm64sve.cpp
- src/coreclr/jit/emitfmtsarm64sve.h
- src/coreclr/jit/lsraarm64.cpp
- src/coreclr/jit/lowerarmarch.cpp
Can you also add entries for the following?
- lsraarmarch.cpp
- codegenarmarch.cpp
Done.
Also alphabetically ordered the list.
LGTM
Fixes #105719
CONDSELECT(TRUE, EMBEDDEDMASKOP(), 0)
For this scenario, code generation does not emit the SELECT and instead emits just the embedded mask operation. To allow this, op3 can be contained.
CONDSELECT(TRUE, VECTOR, 0)
For this scenario, the SELECT operation is generated (and emit then produces a MOV in its place). To allow this, op3 cannot be contained and must be generated into a register.
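The two scenarios can be summarized as a small decision function. This is a standalone sketch, not the code in lowerarmarch.cpp: `Op2Kind`, `SelectPlan` and `PlanAllTrueSelectWithZero` are hypothetical names, and the function assumes op1 is already known to be an all-true mask and op3 a zero vector.

```cpp
// Hypothetical classification of op2 for illustration only.
enum class Op2Kind
{
    EmbeddedMaskOp, // CONDSELECT(TRUE, EMBEDDEDMASKOP(), 0)
    PlainVector     // CONDSELECT(TRUE, VECTOR, 0)
};

// Hypothetical summary of what the description says happens to the node.
struct SelectPlan
{
    bool containOp3; // op3 (the zero vector) is contained, so no register is used
    bool emitSelect; // codegen still produces the select (emit turns it into a MOV)
};

// Assumes op1 is an all-true mask and op3 is a zero vector.
SelectPlan PlanAllTrueSelectWithZero(Op2Kind op2)
{
    if (op2 == Op2Kind::EmbeddedMaskOp)
    {
        // Only the embedded mask operation is generated; op3 can be contained.
        return SelectPlan{true, false};
    }

    // The select is generated, so op3 must live in a register.
    return SelectPlan{false, true};
}
```

The sketch follows the description as written here; the review discussion may have changed what the final lowering does.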