Arm Fused Multiply-Add fixes #1219

Merged: 2 commits, Sep 20, 2021

Conversation

hkratz
Contributor

@hkratz hkratz commented Sep 19, 2021

This fixes multiple issues with 32-bit ARM Fused Multiply-Add intrinsics:

@rust-highfive

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

@hkratz hkratz changed the title vfp4 fma fixes Arm FMA fixes Sep 19, 2021
@hkratz hkratz changed the title Arm FMA fixes Arm Fused Multiply-Add fixes Sep 19, 2021
@hkratz hkratz force-pushed the require_vfp4_instead_of_v8_for_fma branch 2 times, most recently from 6b91a5f to 12d2c24 Compare September 19, 2021 17:40
@hkratz
Contributor Author

hkratz commented Sep 19, 2021

Inlining check disabled again. The new vld1{q}_p64_x* intrinsics contain function calls.

cc @SparrowLii

Some VFMA functions are declared with `target_feature(enable = "vfp4")`, while the functions they call, `vdup_n_f32` and `vdupq_n_f32`, are declared with `target_feature(enable = "v7")`. Because the feature flags differ, LLVM does not inline the calls. Using private `_vfp4` variants of those functions allows them to be inlined.
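
The pattern can be sketched roughly as follows. This is only an illustration, not the stdarch code: it uses x86_64 and the `fma` feature so that it builds on a common host, and `splat_fma`, `imp::fma_n` and the safe wrapper `fma_n` are hypothetical names. LLVM's inlining rules also differ between targets, so the sketch shows the shape of the fix (a private helper carrying the same `target_feature` as its caller) rather than reproducing the ARM behaviour.

```rust
// Hypothetical illustration of "private helper with the caller's target feature".
// None of these names come from stdarch.
#[cfg(target_arch = "x86_64")]
mod imp {
    use std::arch::x86_64::*;

    // Private splat helper declared with the SAME feature as its caller,
    // so the feature sets match and the call is eligible for inlining.
    #[inline]
    #[target_feature(enable = "fma")]
    unsafe fn splat_fma(x: f32) -> __m128 {
        unsafe { _mm_set1_ps(x) }
    }

    // Computes a + b * c lane-wise, with the scalar c broadcast to all lanes
    // (the same shape as a "multiply-add by scalar" intrinsic).
    #[inline]
    #[target_feature(enable = "fma")]
    pub unsafe fn fma_n(a: [f32; 4], b: [f32; 4], c: f32) -> [f32; 4] {
        unsafe {
            let va = _mm_loadu_ps(a.as_ptr());
            let vb = _mm_loadu_ps(b.as_ptr());
            let vc = splat_fma(c);
            let r = _mm_fmadd_ps(vb, vc, va); // b * c + a
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), r);
            out
        }
    }
}

// Safe wrapper: uses the feature-gated path when the CPU supports it,
// otherwise falls back to scalar `mul_add`.
pub fn fma_n(a: [f32; 4], b: [f32; 4], c: f32) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("fma") {
            // Sound because the required feature was just detected at runtime.
            return unsafe { imp::fma_n(a, b, c) };
        }
    }
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = b[i].mul_add(c, a[i]);
    }
    out
}
```

Calling `fma_n([1.0, 2.0, 3.0, 4.0], [2.0; 4], 3.0)` computes `a + b * c` for each lane. Whether the helper is actually inlined can only be confirmed by inspecting the generated assembly for the target in question.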
@hkratz hkratz force-pushed the require_vfp4_instead_of_v8_for_fma branch from 12d2c24 to 9e33ce7 Compare September 19, 2021 18:03
@Amanieu Amanieu merged commit f75b8b7 into rust-lang:master Sep 20, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 27, 2021
Update stdarch submodule

This is mainly to fix the critical issue of aarch64 store intrinsics overwriting additional memory, see rust-lang/stdarch#1220

Changes:
* aarch64/armv7: additional vld1/vst1 intrinsics + perf fixes for existing ones
  * rust-lang/stdarch#1205
  * rust-lang/stdarch#1207
  * rust-lang/stdarch#1216
* armv7: Make FMA work with vfpv4 and optimize
  * rust-lang/stdarch#1219
* Non-visible changes to the testing framework
  * rust-lang/stdarch#1208
  * rust-lang/stdarch#1211
  * rust-lang/stdarch#1213
  * rust-lang/stdarch#1215
  * rust-lang/stdarch#1218
Successfully merging this pull request may close these issues.

The assembly emitted for some arm neon fma intrinsics contains function calls