-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable simd_extmul_* for AArch64 #3070
Enable simd_extmul_* for AArch64 #3070
Conversation
00b264c
to
ff2cd0f
Compare
@sparker-arm @akirilov-arm I'm currently out on vacation until Mon Jul 26; I can review this when I'm back if you prefer, or perhaps one of @abrown or @alexcrichton could review? |
@cfallin I pinged you because historically you had done most of the AArch64-related reviews (and Sam was actually unable to request a review from anybody), but another reviewer would be absolutely fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty reasonable to me!
Umull32, | ||
/// Unsigned multiply add long | ||
Umlal8, | ||
Umlal16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this and Umlal8
aren't actually generated in any lowerings (unless I'm missing something), but do you want to leave these in for completeness with possible future lowerings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly - thought it would be weird to just port the 32-bit version.
ra: zero_reg(), | ||
}); | ||
} else if ty.is_vector() { | ||
for ext_op in &[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible way to structure this is to perhaps check for is_vector
and i64x2 early on at the top of this block with early-returns if they match, and if that fails all the remaining cases share the logic of
let rn = put_input_in_reg(ctx, inputs[0], NarrowValueMode::None);
let rm = put_input_in_reg(ctx, inputs[1], NarrowValueMode::None);
let rd = get_output_reg(ctx, outputs[0]).only_reg().unwrap();
as a prefix.
Just a thought though, happy to defer to you who work on this much more than I!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So even though I don't like the duplication, or how much conditional stuff is going on here, we would only be able to factor out one collection of these calls as the I128 case uses, ever so slightly, different calls to handle multiple registers. So I think I prefer the clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! Just one request for a doc-comment below.
@@ -1243,6 +1243,150 @@ pub(crate) fn maybe_input_insn_via_conv<C: LowerCtx<I = Inst>>( | |||
None | |||
} | |||
|
|||
pub(crate) fn match_vec_long_mul<C: LowerCtx<I = Inst>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a doc-comment here describing the return tuple? E.g. it's not clear to me reading at this point what the bool
denotes, until I notice the low/high pattern below in the implementation.
Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen inputs to an imul node. Copyright (c) 2021, Arm Limited.
Copyright (c) 2021, Arm Limited.
And removed an accidental code move. Copyright (c) 2021, Arm Limited.
ff2cd0f
to
5eb2dca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen inputs to
an imul node.
This also reorganises the aarch64 'long mul' operations and their assembly.
Copyright (c) 2021, Arm Limited.