Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illegal instructions generated for Arm Cortex-R5 processor when compiling code using floating-point operations #128448

Open
rikyborg opened this issue Jul 31, 2024 · 11 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state P-medium Medium priority

Comments

@rikyborg
Copy link

When targeting the Arm Cortex-R5 processor, rustc and llvm generate assembly containing floating-point instructions. These are not available on the Cortex-R5 (only on the Cortex-R5F) and cause the processor to halt.

Details

For example, the following code:

fn add_s(a: f32, b: f32) -> f32 {
    a + b
}

fn add_d(a: f64, b: f64) -> f64 {
    a + b
}

compiled with flags --target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 generates the assembly (compiler explorer link):

add_s:
        vmov    s0, r1
        vmov    s2, r0
        vadd.f32        s0, s2, s0
        vmov    r0, s0
        bx      lr

add_d:
        vmov    d0, r2, r3
        vmov    d1, r0, r1
        vadd.f64        d0, d1, d0
        vmov    r0, r1, d0
        bx      lr

Note the vadd.f32 and vadd.f64 instructions, that are available on the Cortex-R5F which has an FPU, but that are illegal instructions on the Cortex-R5 without an FPU.

My expectation is that, using the target armv7r-none-eabi (rather than armv7r-none-eabihf) and the target CPU cortex-r5 (rather than cortex-r5f), rustc and llvm would generate legal instructions for the processor, i.e. use software floating-point features rather than hard-float instructions.

Relevant information

Original thread on URLO: Unexpected codegen for Cortex-R5 without FPU.

The upstream LLVM CPU model for cortex-r5 includes the flag FeatureVFP3_D16, which seems OK for the R5F but wrong for the R5. However, manually disabling that feature doesn't seem to help (see below). Link to llvm repo at tag 18.1.7.

Information from the ARM Cortex-R Series Programmer's Guide: 6.1.6. VFP in the Cortex-R processors.

Things that do work

Skipping the target-cpu flag

Compiling the snippet above with flags --target armv7r-none-eabi -C opt-level=3 generates:

add_s:
        push    {r11, lr}
        bl      __aeabi_fadd
        pop     {r11, pc}

add_d:
        push    {r11, lr}
        bl      __aeabi_dadd
        pop     {r11, pc}
Adding the soft-float target feature

Compiling the snippet above with flags --target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=+soft-float generates:

add_s:
        push    {r11, lr}
        bl      __aeabi_fadd
        pop     {r11, pc}

add_d:
        push    {r11, lr}
        bl      __aeabi_dadd
        pop     {r11, pc}

However, rustc generates the warning:

warning: unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`
  |
  = note: it is still passed through to the codegen backend, but use of this feature might be unsound and the behavior of this feature can change in the future
  = help: consider filing a feature request

warning: 1 warning emitted

Things that don't work

Removing the vfp3d16 target feature

Compiling the snippet above with flags --target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=-vfp3d16 generates:

add_s:
        vmov    s0, r1
        vmov    s2, r0
        vadd.f32        s0, s2, s0
        vmov    r0, s0
        bx      lr

add_d:
        vmov    d0, r2, r3
        vmov    d1, r0, r1
        vadd.f64        d0, d1, d0
        vmov    r0, r1, d0
        bx      lr

Meta

I've tested both stable and nightly toolchains, with same results.

rustc --version --verbose:

rustc 1.80.0 (051478957 2024-07-21)
binary: rustc
commit-hash: 051478957371ee0084a7c0913941d2a8c4757bb9
commit-date: 2024-07-21
host: x86_64-unknown-linux-gnu
release: 1.80.0
LLVM version: 18.1.7
@rikyborg rikyborg added the C-bug Category: This is a bug. label Jul 31, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 31, 2024
@tgross35
Copy link
Contributor

tgross35 commented Jul 31, 2024

As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there? Here is an llc reproduction that shows cortex-r5 is the cause https://llvm.godbolt.org/z/zzK7KGv95

@tgross35 tgross35 added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Jul 31, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 31, 2024
@tgross35 tgross35 added I-prioritize Issue: Indicates that prioritization has been requested for this issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 31, 2024
@tgross35
Copy link
Contributor

Whoops, didn't mean to remove prioritize but I think the labels raced.

@workingjubilee
Copy link
Member

cc @chrisnc as the maintainer for this target.

We updated the float support in #123159 to be more-correct for this target but it looks like we didn't quite stick the landing I guess?

Processors in this family include the Arm Cortex-R4, 5, 7, and 8.

We most definitely intentionally included the R5 here, though apparently we assume an R5F.

@chrisnc
Copy link
Contributor

chrisnc commented Aug 1, 2024

LLVM's policy for Arm target-cpu is that it will enable the maximal set of features that the chosen CPU might support, and expects users to disable ones that they don't have/want, so it is correct and expected that the compiler assumes vfp3d16 when you specify cortex-r5, even when using the non-hf target. The term "R5F" is just a shorthand for "a Cortex-R5 with FPU support", and for LLVM this is the intended default. Arm doesn't define it as a separate CPU, nor does LLVM. (Edit: looks like R4(F) is an exception to this in LLVM.)
Note also that the examples still take the f32 arguments in the integer registers, which doesn't happen on the hf target, so overall the code generated is correct. The only thing I don't understand is why target-feature=-vfp3d16 did not work as expected, so I will investigate that , and see if this is new as of #123159 or was always this way. Edit: no, this can't be affected by that change, because we only modified the hf targets to use slightly different default features so they would compose better with target-cpu, but this case is about the non-hf targets. Still checking on the behavior there...

@chrisnc
Copy link
Contributor

chrisnc commented Aug 1, 2024

--target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=-vfp2sp,-fp64

https://godbolt.org/z/cj38s8Ev6

This will also disable the floating-point support. I think the issue is that the implied +vfp3d16 from cortex-r5 causes multiple other features to be enabled, but -vfp3d16 doesn't disable those dependencies. Just using +soft-float in your project is still probably the right approach here, rather than referring to the VFP feature tree internal to LLVM.

@rikyborg
Copy link
Author

rikyborg commented Aug 1, 2024

Thanks a lot @chrisnc for investigating this issue!

Just using +soft-float in your project is still probably the right approach here, rather than referring to the VFP feature tree internal to LLVM.

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`. Is there any way to suppress that?

Or should the non-hf target already include the +soft-float feature? If the target does not have hard-float support, is there any reason to generate code using floating-point instructions? (I might be missing something here...)

As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there?

Should I still go ahead and file an issue with LLVM as suggested initially by @tgross35?

One source of confusion for me is that LLVM seems to distinguish between Cortex-R4 and Cortex-R4F, but not between Cortex-R5 and Cortex-R5F.

@apiraino
Copy link
Contributor

apiraino commented Aug 1, 2024

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 1, 2024
@chrisnc
Copy link
Contributor

chrisnc commented Aug 1, 2024

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for -Ctarget-feature: soft-float. Is there any way to suppress that?

I don't think there's a way to disable this right now except to use nightly. I can't find an open issue for stabilizing these right now, but that is the path. This warning is relatively recent, and this issue gives some explanation of why it's there: #117616.

Or should the non-hf target already include the +soft-float feature? If the target does not have hard-float support, is there any reason to generate code using floating-point instructions? (I might be missing something here...)

The hf-ness is about the ABI, not what the target could support if the user asks for it (which was the case here). It's perfectly valid to use the soft-float ABI on a core that has an FPU, and the compiler can use the VFP instructions for things other than computation on f32/f64 (or you might legitimately need to do floating-point computation while calling/being called by code that uses the soft-float ABI because you can't re-compile it). This is consistent with how clang/llvm handle this; the expectation is that you use a combination of target-cpu+target-feature to specify what you want, rather than having the ABI choice forcibly disable some features.

As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there?
Should I still go ahead and file an issue with LLVM as suggested initially by @tgross35?

One source of confusion for me is that LLVM seems to distinguish between Cortex-R4 and Cortex-R4F, but not between Cortex-R5 and Cortex-R5F.

I think that would be worthwhile, at least to understand why there is a separate cortex-r4f but not for the others. I will speculate that this is for historical reasons and that they would remove it if not for backward compatibility, because it is inconsistent. Anecdotally, I think nofp cortex-r4 are much more common than nofp cortex-r5, which might also explain the exception, but I don't know the provenance of the one you are using. When you are using clang, you would use -mcpu=cortex-r5+nofp for your case, which obviates the need for a separate CPU name or explicitly disabling exactly the FPU features that are included with cortex-r5.

@workingjubilee
Copy link
Member

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for -Ctarget-feature: soft-float. Is there any way to suppress that?

We should probably just add this feature.

@workingjubilee
Copy link
Member

@chrisnc For note, LLVM seems to have flipflopped on the maximal vs. minimal thing for Arm CPUs at least twice now, and recent Arm CPUs (aarch64, mostly) are more likely to use a minimal featureset, though that may only be true from Armv9 onwards.

@chrisnc
Copy link
Contributor

chrisnc commented Aug 2, 2024

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for -Ctarget-feature: soft-float. Is there any way to suppress that?

We should probably just add this feature.

Whoops, I missed that this was also unknown, not just unstable, when I wrote my comment. I'm not sure this will be possible though, because it affects the ABI, and the reasoning here applies to Arm also.

Should rustc consider something like clang's -mcpu which allows things like cortex-r5+nofp? I'm not sure if this is exposed in a re-usable way by llvm, but it could go a long way in solving these types of problems.

For note, LLVM seems to have flipflopped on the maximal vs. minimal thing for Arm CPUs at least twice now, and recent Arm CPUs (aarch64, mostly) are more likely to use a minimal featureset, though that may only be true from Armv9 onwards.

Interesting... At least for the v7m, v7r, v8m, and v8r cores, the default feature sets have all been maximal when each core was added and then not changed except for refactoring and bug fixes from what I can see, but A-profile and Aarch64 are quite a bit more varied and may have different considerations. It seems that R4 as distinct from R4F was done originally in GCC in 2008 and then LLVM followed it, but no other exceptions exist, even though "M4F" and "R5F" and others are commonly mentioned when a non-trivial population of them are nofp.

@workingjubilee workingjubilee added I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state P-medium Medium priority
Projects
None yet
Development

No branches or pull requests

6 participants