Arm64/SVE: JitStress/JitStressRegs fixes #102543
@dotnet/arm64-contrib
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
If `falseReg == embMaskOp2Reg`, we simply generate:
```
sel z16.s, p7, z9.s, z10.s
mla z16.s, p7/m, z10.s, z11.s
```
Here `z10` holds both `falseReg` and `embMaskOp2Reg`.
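To see why the shared register is harmless in this sequence, the two instructions can be modeled lane by lane. The sketch below is a plain-Python model, not JIT code. It assumes the usual SVE semantics: `sel` picks per lane, and merging-predicated `mla` accumulates in active lanes while leaving inactive lanes of the destination untouched. The helper names `sel` and `mla_merging` and the lane values are made up for illustration.

```python
# Lane-wise model of the sel + mla sequence; helper names are hypothetical.

def sel(pred, true_vec, false_vec):
    """sel zd, pg, zn, zm: take zn in active lanes, zm in inactive lanes."""
    return [t if p else f for p, t, f in zip(pred, true_vec, false_vec)]

def mla_merging(dest, pred, op_n, op_m):
    """mla zda, pg/m, zn, zm: active lanes get zda + zn * zm, inactive lanes keep zda."""
    return [d + n * m if p else d for d, p, n, m in zip(dest, pred, op_n, op_m)]

# Illustrative lane values (assumed, not taken from the PR).
p7 = [True, False, True, False]
z9 = [1, 2, 3, 4]        # addend
z10 = [10, 20, 30, 40]   # multiplicand, and also the false value (falseReg)
z11 = [2, 2, 2, 2]       # multiplier

z16 = sel(p7, z9, z10)                # inactive lanes already hold falseReg
z16 = mla_merging(z16, p7, z10, z11)  # z10 is only read here, so the aliasing is safe

# Reference result: select(mask, z9 + z10 * z11, z10) computed directly.
expected = [a + b * c if p else b for p, a, b, c in zip(p7, z9, z10, z11)]
print(z16 == expected)
```

Because `sel` writes only `z16` and `mla` only reads `z10`, the false value survives intact in the inactive lanes even though the same register feeds the multiply.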
LGTM now
LGTM. Glad we are catching these.
* handle case for FMA where falseReg == embMaskOp1Reg
* workaround because predicateRegister/vectorRegister are same
* When intrinsic is wrapped in ConditionalSelect, make sure to check its LOW_PREDICATE flag
* Mark AddAcross with HW_Flag_LowMaskedOperation
* double check if ConditionalSelect's op2 is hwintrinsic
* Mark Max with HW_Flag_LowMaskedOperation
* Mark MaxAcross with HW_Flag_LowMaskedOperation
* Mark MinNumber/MaxNumber/MinNumberAcross/MaxNumberAcross with HW_Flag_LowMaskedOperation
* Mark Min/MinAcross with HW_Flag_LowMaskedOperation
* fix the workaround for predicate vs. vector register
* fix the predicate mask preference
* Introduce INS_SCALABLE_OPTS_PREDICATE_MERGE_MOV
* jit format
* revert change to csproj
* remove assert that can not happen for FMA

  if falseReg == embMaskOp2Reg, we simply generate:
  ```
  sel z16.s, p7, z9.s, z10.s
  mla z16.s, p7/m, z10.s, z11.s
  ```
  Here `z10` holds `falseReg` and `embMaskOp2Reg`.
* revert a condition added for workaround of predicate == vector register
* remove the extra comment
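* Handle the FMA case where `falseReg == embMaskOp1Reg`.
* Work around `mov` when used as an alias for `sel`, because predicateRegister/vectorRegister are the same.
* For an intrinsic wrapped in `ConditionalSelect`, check the intrinsic flag to see whether it needs a low register mask.
* Mark the affected intrinsics with `HW_Flag_LowMaskedOperation`.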
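The `ConditionalSelect` flag check can be pictured with a small model. Many SVE predicated instructions encode their governing predicate in a 3-bit field, so only `p0` to `p7` are usable; the sketch assumes that is what the low register mask refers to, and that `ConditionalSelect` on its own can take any predicate. It is written in Python rather than the JIT's C++, and every name in it (`Intrinsic`, `Node`, `predicate_candidates`, `low_masked`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

LOW_PREDICATES = [f"p{i}" for i in range(8)]    # p0 to p7
ALL_PREDICATES = [f"p{i}" for i in range(16)]   # p0 to p15

@dataclass
class Intrinsic:
    name: str
    low_masked: bool = False   # stands in for HW_Flag_LowMaskedOperation

@dataclass
class Node:
    intrinsic: Optional[Intrinsic]   # None models an operand that is not an intrinsic
    op2: Optional["Node"] = None     # for ConditionalSelect: the wrapped operation

CONDITIONAL_SELECT = Intrinsic("ConditionalSelect")

def predicate_candidates(node: Node) -> list:
    """Return the predicate registers a mask feeding `node` may be allocated to."""
    intrinsic = node.intrinsic
    if intrinsic is CONDITIONAL_SELECT and node.op2 is not None:
        # The wrapped operation consumes the same mask, so its flag decides.
        inner = node.op2.intrinsic
        if inner is not None and inner.low_masked:
            return LOW_PREDICATES
        return ALL_PREDICATES
    if intrinsic is not None and intrinsic.low_masked:
        return LOW_PREDICATES
    return ALL_PREDICATES

# Usage: a flagged intrinsic restricts the mask even when wrapped.
add_across = Node(Intrinsic("AddAcross", low_masked=True))
wrapped = Node(CONDITIONAL_SELECT, op2=add_across)
plain = Node(CONDITIONAL_SELECT, op2=Node(None))
print(len(predicate_candidates(wrapped)), len(predicate_candidates(plain)))
```

The point of looking through the wrapper is that the mask register is shared: if only the outer node were consulted, the mask could land in `p8` to `p15` and the inner instruction would have no way to encode it.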
There are still some functional failures with JitStress/JitStressRegs, but I wanted to send this out.