Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-C linker-plugin-lto prevents vectorization of mem::swap at -C opt-level=3 #124234

Closed
DaniPopes opened this issue Apr 21, 2024 · 6 comments
Closed

Comments

@DaniPopes
Copy link
Contributor

DaniPopes commented Apr 21, 2024

I tried this code (godbolt):

pub type T = [u64; 4];

#[no_mangle]
pub fn swap_32(a: &mut T, b: &mut T) {
    std::mem::swap(a, b);
    unsafe {
        // std::ptr::swap(a, b);
        // std::ptr::swap_nonoverlapping(a, b, 1);
    }
}

I expected to see this happen: code optimizes to a few <_ x i64> (target-cpu-dependent) loads and stores on -C opt-level=3.

Instead, this happened: the code is not vectorized when -C linker-plugin-lto is also passed to the compiler (this is passed together with -Clto=... but that has no effect here)

Note that this only happens with mem::swap/ptr::swap_nonoverlapping, not ptr::swap.

Meta

rustc --version --verbose:

rustc 1.79.0-nightly (f9b161492 2024-04-19)
binary: rustc
commit-hash: f9b16149208c8a8a349c32813312716f6603eb6f
commit-date: 2024-04-19
host: x86_64-unknown-linux-gnu
release: 1.79.0-nightly
LLVM version: 18.1.4

This happens since at least 1.61, but I could not find a way to bisect this, so I opted not to mark this a regression.

@rustbot modify labels: +A-LLVM +A-LTO +I-slow

@DaniPopes DaniPopes added the C-bug Category: This is a bug. label Apr 21, 2024
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) I-slow Issue: Problems and improvements with respect to performance of generated code. labels Apr 21, 2024
@DaniPopes DaniPopes changed the title -C linker-plugin-lto prevents vectorization of ptr::swap_nonoverlapping at -C opt-level=3 -C linker-plugin-lto prevents vectorization of mem::swap at -C opt-level=3 Apr 21, 2024
@DaniPopes
Copy link
Contributor Author

searched toolchains nightly-2020-12-25 through nightly-2024-04-21
Regression in nightly-2022-02-26
get_commits_between returning commits, len: 11
  commit[0] 2022-02-24: Auto merge of #94131 - Mark-Simulacrum:fmt-string, r=oli-obk
  commit[1] 2022-02-24: Auto merge of #94333 - Dylan-DPC:rollup-7yxtywp, r=Dylan-DPC
  commit[2] 2022-02-25: Auto merge of #93368 - eddyb:diagbld-guarantee, r=estebank
  commit[3] 2022-02-25: Auto merge of #93878 - Aaron1011:newtype-macro, r=cjgillot
  commit[4] 2022-02-25: Auto merge of #94130 - erikdesjardins:partially, r=nikic
  commit[5] 2022-02-25: Auto merge of #94350 - matthiaskrgr:rollup-eesfiyr, r=matthiaskrgr
  commit[6] 2022-02-25: Auto merge of #93644 - michaelwoerister:simpler-debuginfo-typemap, r=wesleywiser
  commit[7] 2022-02-25: Auto merge of #94357 - matthiaskrgr:rollup-xrjaof3, r=matthiaskrgr
  commit[8] 2022-02-25: Auto merge of #94279 - tmiasko:write-print, r=Mark-Simulacrum
  commit[9] 2022-02-25: Auto merge of #94290 - Mark-Simulacrum:bump-bootstrap, r=pietroalbini
  commit[10] 2022-02-25: Auto merge of #94369 - matthiaskrgr:rollup-qtripm2, r=matthiaskrgr
ERROR: no CI builds available between 4b043faba34ccc053a4d0110634c323f6c03765e and d3ad51b48f83329fac0cd8a9f1253f3146613c1c within last 167 days

4b043fa...d3ad51b

Probably #94212 cc @scottmcm

@scottmcm
Copy link
Member

We used to basically be forcing this, but stopped doing that in #123185

I know nothing about what linker-plugin-lto does or why it would be relevant.

But yeah, in assembly it swaps via ymms in v3 https://rust.godbolt.org/z/Mqc64zYxW without the flag, but does individual qwords with it 🤷

@nikic
Copy link
Contributor

nikic commented Apr 23, 2024

I'm not sure whether -C linker-plugin-lto is compatible with --emit=llvm-ir (or --emit=asm). I'd guess it prints the pre-LTO IR in that case?

@DaniPopes
Copy link
Contributor Author

My workflow is getting the --release assembly with cargo-show-asm. LTO is enabled in the release profile overrides in the manifest.

It's unfortunate that this does not return the fully optimized assembly, but it also makes sense since linking doesn't actually happen. I thought this issue initially was a regression because couldn't find any info on this.

I confirmed that it gets vectorized in the final assembly with objdump + grep in a new crate, so I'll close this issue.

@scottmcm
Copy link
Member

Glad to hear it's working! (and that I didn't break everything *phew*)

TBH, this has been confusing me too as I work on stuff. Thus tests like https://github.com/rust-lang/rust/blob/master/tests/assembly/x86_64-typed-swap.rs as it seems that looking in LLVM IR keeps not telling the full story :(

@saethlin saethlin removed A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. C-bug Category: This is a bug. A-LTO Area: Link-time optimization (LTO) needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 23, 2024
@pacak
Copy link
Contributor

pacak commented Apr 24, 2024

It's unfortunate that this does not return the fully optimized assembly, but it also makes sense since linking doesn't actually happen. I thought this issue initially was a regression because couldn't find any info on this.

FWIW I just released cargo-show-asm 0.2.33 and now it can actually disassemble your binary (pass --disasm in addition to usual flags and make sure symbols are not stripped by cargo) so output should be similar to what you can get from objdump, but with less manual interactions. For now it's locked behind "disasm" feature (cargo install cargo-show-asm --features disasm), if something doesn't work - please make a ticket :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants