Default to -Z plt=yes #106380
r? @oli-obk (rustbot has picked a reviewer for you, use r? to override)
These commits modify compiler targets.
@GabrielMajeri FYI
For the record, I'll leave a link to the original pull request which enabled As for this:
I'm afraid I'm not familiar enough with CPU (micro)architectures to counter this claim. On x86-64, at the very least, the |
@bors try @rust-timer queue
⌛ Trying commit 5a3791f76dde2490094ebeb8977614a55a72fc25 with merge 5f4f27a0a98c60c9218675e35d2cc60c47546394...
ad22866 to 043b307
@rust-timer build 5f4f27a0a98c60c9218675e35d2cc60c47546394
Finished benchmarking commit (5f4f27a0a98c60c9218675e35d2cc60c47546394): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
@rustbot label: +llvm-main
@krasimirgg Is that because llvm/llvm-project@2679e8b broke the Rust build? If so, consider this PR blocked on that change getting reverted. We can evaluate changing our defaults, but it's not okay to break LLVM and force us to change defaults because of that.
6009da0 defaulted to `-Z plt=no` (like `clang -fno-plt`), which is not a useful default[1].

On x86-64, if the target symbol is preemptible, there is an `R_X86_64_GLOB_DAT` relocation, and the (very minor) optimization works as intended. However, if the target is non-preemptible, i.e. the target is resolved to the same component, this is actually a pessimization due to the longer instruction.

On RISC architectures, there is typically no single instruction which can load a GOT entry and perform an indirect call, so `-fno-plt` has a longer code sequence. For example, AArch64 needs 3 instructions:

    adrp x0, _GLOBAL_OFFSET_TABLE_
    ldr x0, [x0, #:gotpage_lo15:bar]
    br x0

This does not end up with a serious code size issue, because LLVM "RtLibUseGOT" is not implemented for non-x86 targets.

On x86-32, very new lld[2] (2022-12-31) is needed to support general-dynamic/local-dynamic TLS models.

`-Z plt=no` is not an appropriate default, so just default to true for all targets.

[1] https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table#fno-plt
[2] llvm/llvm-project@8dc7366
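To put the "longer instruction" point in rough numbers, here is a minimal sketch of the per-call-site arithmetic. It assumes the usual encoding sizes: a 5-byte `call rel32` versus a 6-byte GOT-indirect call on x86-64, and one 4-byte `bl` versus the three 4-byte instructions shown above on AArch64. The `CallEncoding` type, the constants, and the call-site count are hypothetical names and values chosen for illustration. They are not taken from rustc or LLVM, and the result is an upper bound that assumes every call site uses the GOT-indirect form.

```rust
/// Bytes needed per call site for a direct call versus a GOT-indirect call.
/// (Hypothetical helper type, used only for this illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CallEncoding {
    direct_bytes: u64,
    got_indirect_bytes: u64,
}

impl CallEncoding {
    /// Extra code bytes if all `call_sites` calls use the GOT-indirect form.
    fn extra_bytes(&self, call_sites: u64) -> u64 {
        (self.got_indirect_bytes - self.direct_bytes) * call_sites
    }
}

// x86-64: `call rel32` is 5 bytes; `call *sym@GOTPCREL(%rip)` is 6 bytes.
const X86_64: CallEncoding = CallEncoding { direct_bytes: 5, got_indirect_bytes: 6 };

// AArch64: `bl` is one 4-byte instruction; adrp + ldr + br is three of them.
const AARCH64: CallEncoding = CallEncoding { direct_bytes: 4, got_indirect_bytes: 12 };

fn main() {
    // Hypothetical number of call sites that resolve within the same component.
    let call_sites = 10_000;
    println!("x86-64:  +{} bytes", X86_64.extra_bytes(call_sites));
    println!("AArch64: +{} bytes", AARCH64.extra_bytes(call_sites));
}
```

The one-byte penalty on x86-64 is small per call, while the eight-byte penalty on AArch64 is why the fixed-width RISC case is treated separately in the discussion.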
Rebase
Nominating for discussion at the next compiler meeting. The proposal here is to change the current In short, the basic idea is that if you call a function from a different shared object (say from libc.so), what actually gets called is a PLT stub, which then looks up the function to call in the GOT and calls it. With If the called function is in the same shared object / executable, it will get resolved to a direct call. In this case, the longer On non-x86 targets, the encoding for a GOT call rather than PLT call may be substantially larger. Rather than going from 5 to 6 bytes, it may go from 4 to 12 bytes (I guess that would be the case for the AArch64 example). Relevant results from the perf run above are:
I believe these are the relevant facts. My personal recommendation based on my understanding of the tradeoffs here would be to default to I'd like some broader input from T-compiler on what to do here. Also cc @nagisa who reviewed the original PR. Finally, I think whatever we do here, it probably makes sense to stabilize |
I have filed an MCP for the plan regarding the change in defaults outlined by @nikic; that is rust-lang/compiler-team#581 (It probably would be good to expose control of the PLT via a proper |
☔ The latest upstream changes (presumably #108301) made this pull request unmergeable. Please resolve the merge conflicts.
@rustbot label: -llvm-main
MCP has been approved: compiler-team#581
@MaskRay Can you revise this PR to reflect the refinement described in rust-lang/compiler-team#581 (namely, that the default should be PLT=yes for everything except x86_64)? @rustbot label: -S-waiting-on-review +S-waiting-on-author
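The refinement requested here can be sketched as a small decision function. This is only an illustration of the rule as stated in the MCP summary (PLT by default everywhere except x86_64, with an explicit `-Z plt` setting taking precedence). The function name `use_plt` and its parameters are hypothetical and do not reflect rustc's actual session code, which may take other factors into account.

```rust
/// Hypothetical helper: decide whether calls should go through the PLT.
///
/// `user_choice` models an explicit `-Z plt=yes|no` on the command line;
/// `None` means the flag was not given and the per-target default applies.
fn use_plt(target_arch: &str, user_choice: Option<bool>) -> bool {
    // An explicit user setting always wins. Otherwise default to the PLT
    // on every architecture except x86_64, which keeps `plt=no`.
    user_choice.unwrap_or(target_arch != "x86_64")
}

fn main() {
    for arch in ["x86_64", "x86", "aarch64", "riscv64"] {
        println!("{arch}: plt={}", use_plt(arch, None));
    }
}
```

Modelling the flag as an `Option<bool>` keeps "not specified" distinct from an explicit `no`, so the per-target default only applies when the user expressed no preference.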
I disagree with keeping |
for the sake of nonperfection, could you make the change anyway? i fear otherwise this change would just get lost in a cyclic argument, because it is correct from both arguments sides in some capacity (on x86 specifically, in the most common case of no special user-optimised static linking overrides (few people are doing this in rust to my knowledge), plt=no is generally a small gain (the rustc-itself benchmarks), by everything established above). i feel like everyone here has already discussed this to the full capacity possible, and there is no new information to really take in on the subject. but it would be nice to at least indeed change it everywhere else (as acked via MCP above), and people then would get a real gain. |
@MaskRay any updates on this?
Per the discussion in rust-lang#106380 plt=no isn't a great default, and rust-lang/compiler-team#581 decided that the default should be PLT=yes for everything except x86_64. Not everyone agrees about the x86_64 part of this change, but this at least is an improvement in the state of things without changing the x86_64 situation, so I've attempted making this change in the name of not letting the perfect be the enemy of the good.
rustc_session: default to -Z plt=yes on non-x86_64 Per the discussion in rust-lang#106380 plt=no isn't a great default, and rust-lang/compiler-team#581 decided that the default should be PLT=yes for everything except x86_64. Not everyone agrees about the x86_64 part of this change, but this at least is an improvement in the state of things without changing the x86_64 situation, so I've attempted making this change in the name of not letting the perfect be the enemy of the good. Please let me know if I've messed this up somehow - I'm not wholly confident I got this right. r? `@nikic`
Superseded by #109982.