p64 load/store intrinsics not properly inlined on arm #1236

Open
hkratz opened this issue Oct 22, 2021 · 1 comment

hkratz commented Oct 22, 2021

The following tests fail the inlining check on arm (but pass on aarch64):

    core_arch::arm_shared::neon::generated::assert_vld1_p64_x2_vld1
    core_arch::arm_shared::neon::generated::assert_vld1_p64_x3_nop
    core_arch::arm_shared::neon::generated::assert_vld1_p64_x4_nop
    core_arch::arm_shared::neon::generated::assert_vld1q_p64_x2_nop
    core_arch::arm_shared::neon::generated::assert_vld1q_p64_x3_nop
    core_arch::arm_shared::neon::generated::assert_vld1q_p64_x4_nop
    core_arch::arm_shared::neon::generated::assert_vld2_dup_p64_nop
    core_arch::arm_shared::neon::generated::assert_vld2_p64_nop
    core_arch::arm_shared::neon::generated::assert_vld3_dup_p64_nop
    core_arch::arm_shared::neon::generated::assert_vld3_p64_nop
    core_arch::arm_shared::neon::generated::assert_vld4_dup_p64_nop
    core_arch::arm_shared::neon::generated::assert_vld4_p64_nop
    core_arch::arm_shared::neon::generated::assert_vst1_p64_x2_vst1
    core_arch::arm_shared::neon::generated::assert_vst1_p64_x3_nop
    core_arch::arm_shared::neon::generated::assert_vst1_p64_x4_nop
    core_arch::arm_shared::neon::generated::assert_vst1q_p64_x2_nop
    core_arch::arm_shared::neon::generated::assert_vst1q_p64_x3_nop
    core_arch::arm_shared::neon::generated::assert_vst1q_p64_x4_nop
    core_arch::arm_shared::neon::generated::assert_vst2_p64_nop
    core_arch::arm_shared::neon::generated::assert_vst3_p64_nop
    core_arch::arm_shared::neon::generated::assert_vst4_p64_nop

Apparently the load/store intrinsics that these p64 intrinsics delegate to need an extra version with matching feature flags; see https://godbolt.org/z/WxozeEPav

hkratz changed the title from "p64 intrinsics not properly inlined on arm" to "p64 load/store intrinsics not properly inlined on arm" on Oct 25, 2021

gendx commented Sep 8, 2023

It looks like this is because the Rust compiler doesn't support inlining a function labeled with some target_feature(enable = "foo") into another function labeled with another set of enabled features (unless all the features in foo are enabled in compiler flags).

I've investigated this problem in this blog post, in particular this section on target_feature and this section on what it means for recursion. See also similar issues in rust-lang/rust#54353 and rust-lang/rust#53069.

Concretely, the problem is that you're trying to inline a function labeled neon,v7 within a function labeled neon,aes,v8. If I compile the first code with -C target-feature=+aes,+v7,+v8, then the intrinsic is inlined.

    #[inline]  // This doesn't apply to a function labeled with other features.
    #[target_feature(enable = "neon,v7")]
    pub unsafe fn vld2_s64(a: *const i64) -> int64x1x2_t {
        ...
    }

    #[inline]
    #[target_feature(enable = "neon,aes,v8")]
    pub unsafe fn vld2_p64(a: *const p64) -> poly64x1x2_t {
        transmute(vld2_s64(transmute(a)))
    }

    #[target_feature(enable = "neon,aes,v8")]
    #[inline(never)]
    pub unsafe fn vld2_p64_testshim(ptr: *const p64) -> poly64x1x2_t {
        vld2_p64(ptr)
    }
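
The behavior described above can be summarized with a small toy model. This is only a sketch of the rule as I observed it, not the compiler's actual logic: the real compatibility check is target-specific and may be stricter. The helper names `effective` and `may_inline` are hypothetical, and the model assumes that features passed via `-C target-feature` are simply added to every function's feature set.

```rust
use std::collections::BTreeSet;

/// Features in effect for a function in this toy model: those named in its
/// `#[target_feature]` attribute plus those enabled globally on the command line.
fn effective<'a>(attr: &[&'a str], global: &[&'a str]) -> BTreeSet<&'a str> {
    attr.iter().chain(global.iter()).copied().collect()
}

/// Assumed rule (a simplification): the callee may be inlined only if every
/// feature in effect for it is also in effect for the caller.
fn may_inline(callee: &[&str], caller: &[&str], global: &[&str]) -> bool {
    effective(callee, global).is_subset(&effective(caller, global))
}

fn main() {
    let callee = ["neon", "v7"]; // feature set of vld2_s64
    let caller = ["neon", "aes", "v8"]; // feature set of vld2_p64

    // No global flags: "v7" is missing from the caller's set.
    println!("{}", may_inline(&callee, &caller, &[])); // prints "false"

    // With -C target-feature=+aes,+v7,+v8 both sets become identical.
    println!("{}", may_inline(&callee, &caller, &["aes", "v7", "v8"])); // prints "true"
}
```

Under this model, the inlining check fails for `vld2_p64` only because `v7` is not in its attribute, which matches the observation that passing the features as compiler flags makes the problem disappear.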
