Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable simd_extmul_* for AArch64 #3070

Merged
merged 3 commits into from
Jul 28, 2021

Conversation

sparker-arm
Copy link
Contributor

Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen inputs to
an imul node.

This also reorganises the aarch64 'long mul' operations and their assembly.

Copyright (c) 2021, Arm Limited.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:wasm labels Jul 8, 2021
@sparker-arm sparker-arm force-pushed the simd-extmul-aarch64 branch from 00b264c to ff2cd0f Compare July 9, 2021 09:28
@akirilov-arm akirilov-arm requested a review from cfallin July 14, 2021 17:44
@cfallin
Copy link
Member

cfallin commented Jul 14, 2021

@sparker-arm @akirilov-arm I'm currently out on vacation until Mon Jul 26; I can review this when I'm back if you prefer, or perhaps one of @abrown or @alexcrichton could review?

@akirilov-arm
Copy link
Contributor

@cfallin I pinged you because historically you had done most of the AArch64-related reviews (and Sam was actually unable to request a review from anybody), but another reviewer would be absolutely fine.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty reasonable to me!

Umull32,
/// Unsigned multiply add long
Umlal8,
Umlal16,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this and Umlal8 aren't actually generated in any lowerings (unless I'm missing something), but do you want to leave these in for completeness with possible future lowerings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly - thought it would be weird to just port the 32-bit version.

ra: zero_reg(),
});
} else if ty.is_vector() {
for ext_op in &[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible way to structure this is to perhaps check for is_vector and i64x2 early on at the top of this block with early-returns if they match, and if that fails all the remaining cases share the logic of

                    let rn = put_input_in_reg(ctx, inputs[0], NarrowValueMode::None);
                    let rm = put_input_in_reg(ctx, inputs[1], NarrowValueMode::None);
                    let rd = get_output_reg(ctx, outputs[0]).only_reg().unwrap();

as a prefix.

Just a thought though, happy to defer to you who work on this much more than I!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So even though I don't like the duplication, or how much conditional stuff is going on here, we would only be able to factor out one collection of these calls as the I128 case uses, ever so slightly, different calls to handle multiple registers. So I think I prefer the clarity.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Just one request for a doc-comment below.

@@ -1243,6 +1243,150 @@ pub(crate) fn maybe_input_insn_via_conv<C: LowerCtx<I = Inst>>(
None
}

pub(crate) fn match_vec_long_mul<C: LowerCtx<I = Inst>>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a doc-comment here describing the return tuple? E.g. it's not clear to me reading at this point what the bool denotes, until I notice the low/high pattern below in the implementation.

Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen inputs to
an imul node.

Copyright (c) 2021, Arm Limited.
Copyright (c) 2021, Arm Limited.
And removed an accidental code move.

Copyright (c) 2021, Arm Limited.
@sparker-arm sparker-arm force-pushed the simd-extmul-aarch64 branch from ff2cd0f to 5eb2dca Compare July 28, 2021 12:18
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cfallin cfallin merged commit 323197e into bytecodealliance:main Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:wasm cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants