Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change aarch64 vld1* instructions to not cause individual loads #1207

Merged
merged 4 commits into from
Sep 8, 2021

Conversation

hkratz
Copy link
Contributor

@hkratz hkratz commented Sep 7, 2021

vld1* instructions are required to always compile to a single load instruction (see ARM developer documentation). The current implementation causes individual loads to be emitted in LLVM-IR which are not always combined to a single load instruction during LLVM optimization passes. This change causes a single load to be emitted in all cases.

cc @SparrowLii

TODO:

  • Add test
  • Open question: Really change all impls for consistency or just where it matters?

Fixes #1148.

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon.

Please see the contribution instructions for more information.

@SparrowLii
Copy link
Member

SparrowLii commented Sep 7, 2021

Really change all impls for consistency or just where it matters?

I think at least the vst1* instruction should also use write_unaligned for consistency.

@Amanieu
Copy link
Member

Amanieu commented Sep 8, 2021

@hkratz Thanks! Could you update the vst1 intrinsics as well like @SparrowLii suggested.

@hkratz hkratz force-pushed the aarch64_vld1_always_single_load branch from f6c48fb to a6d925e Compare September 8, 2021 19:54
@Amanieu
Copy link
Member

Amanieu commented Sep 8, 2021

LGTM. Are you planning on implementing something else (the PR is still a draft).

I don't think a test is needed for this.

@hkratz
Copy link
Contributor Author

hkratz commented Sep 8, 2021

I was thinking of a regression test for #1148, but it is not strictly necessary and afaics there is no easy way to do it.

@hkratz hkratz marked this pull request as ready for review September 8, 2021 22:03
@Amanieu Amanieu merged commit 68cde98 into rust-lang:master Sep 8, 2021
@hkratz hkratz deleted the aarch64_vld1_always_single_load branch September 9, 2021 03:50
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 27, 2021
Update stdarch submodule

This is mainly to fix the critical issue of aarch64 store intrinsics overwriting additional memory, see rust-lang/stdarch#1220

Changes:
* aarch64/armv7: additional vld1/vst1 intrinsics + perf fixes for existing ones
  * rust-lang/stdarch#1205
  * rust-lang/stdarch#1207
  * rust-lang/stdarch#1216
* armv7: Make FMA work with vfpv4 and optimize
  * rust-lang/stdarch#1219
* Non-visible changes to the testing framework
  * rust-lang/stdarch#1208
  * rust-lang/stdarch#1211
  * rust-lang/stdarch#1213
  * rust-lang/stdarch#1215
  * rust-lang/stdarch#1218
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aarch64 performance: vld1q_u8 intrinsic can cause single-byte loads
4 participants