rustc_codegen_ssa: don't use LLVM struct types for field offsets. #98615
Some changes occurred to rustc_codegen_gcc cc @antoyo |
Some changes occurred in compiler/rustc_codegen_gcc cc @antoyo |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 67d2c9f with merge 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f... |
r? @nikic |
SROA does make use of the alloca type, though only as a fallback. I'm not sure how important it is. The relevant code is here: https://github.com/llvm/llvm-project/blob/04dac2ca7c06d0ce173e53527e3b90a07e3b325d/llvm/lib/Transforms/Scalar/SROA.cpp#L4259
If the offset is constant, this is okay in theory. LLVM is pretty good about not caring about GEP types when it comes to constant offsets (the same is not true if variable offsets are involved). Though the last time I actually tried to do this as an InstCombine canonicalization, I observed significant codegen impact, and didn't analyze it further at the time.
Your intuition is correct. I think generally this change makes sense, but I'm not sure whether now is the right time to make it -- we're currently still on LLVM 14 without opaque pointers, where this would have higher impact. |
FWIW changing GEP representation to use offsets is on my long term roadmap, but it would be another major effort, and needs the typed pointer removal to be finalized first anyway. |
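To make the constant-offset case above concrete, here is a minimal Rust sketch. It is not code from this PR, and the `Header` struct and the helper names are hypothetical. It shows that reading a field through a typed field projection and through "base pointer plus constant byte offset" yields the same value, which is the equivalence a constant-offset GEP over `i8` relies on.

```rust
use std::ptr::addr_of;

// Hypothetical example struct; `repr(C)` keeps the field order fixed.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
    id: u64,
}

// Byte offset of `len` inside `Header`, measured on a real instance.
fn len_offset(h: &Header) -> usize {
    addr_of!(h.len) as usize - h as *const Header as usize
}

// Read `len` using only a base pointer and a byte offset,
// without naming the struct type in the address computation.
fn len_via_byte_offset(h: &Header) -> u32 {
    let offset = len_offset(h);
    let base = (h as *const Header).cast::<u8>();
    // SAFETY: `offset` was derived from `h` itself, so the resulting pointer
    // stays inside `*h` and is properly aligned for `u32`.
    unsafe { base.add(offset).cast::<u32>().read() }
}

fn main() {
    let h = Header { tag: 1, len: 42, id: 7 };
    // Both access paths observe the same field.
    assert_eq!(len_via_byte_offset(&h), h.len);
    println!("len read through a byte offset: {}", len_via_byte_offset(&h));
}
```

The sketch only covers constant offsets; as noted above, variable offsets are a different matter for LLVM.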
☀️ Try build successful - checks-actions |
Queued 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f with parent 64eb9ab, future comparison URL. |
Finished benchmarking commit (2f9a464f3f179bba982ccfa5b3d354d32b6dc88f): comparison url.
[Result tables omitted: instruction count, max RSS (memory usage), cycles]
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
@Mark-Simulacrum ^^ is it normal that there's nothing under the "Bootstrap timings" heading? Either way, pretty mixed results, though it seems like Maybe we need opaque pointers before this can work well enough? The extra casts might be a thorn in the side of debug mode. Sadly, looking at the opaque pointers perf run, seems like it would be a bit of a pain to do a fair comparison (since it would have to be opaque-pointers vs opaque-pointers+this PR). |
It seems rustc is currently broken on the perfbot https://perf.rust-lang.org/status.html |
I'm assuming the API to things accessing pointers like |
Oh, right, just accounting for opaque pointer types would probably make what I'm doing here work. I suppose I should ask if GCC will treat IOW, are types just used as an impractical proxy for the underlying untyped memory semantics or are they actually special-cased? (for LLVM it can vary per pass but generally it leans towards the former) |
Okay at least |
I think the numbers here are good enough to land this. I expect this will have mixed improvements and regressions in codegen. We can address reported regressions in the next LLVM release. See #101210 for an example where this should be an improvement, and I've been seeing patterns like this quite often recently. r=me with that one fixme dropped. |
Can we add a codegen test to check if this helps with #101210? Or should it be in a follow-up PR? |
As it's been quite a while, rerunning perf to sanity check. @bors try @rust-timer queue |
⌛ Trying commit 4d803a0 with merge 8456a3d8aa534b17f06fbd8ec8caa743630805e5... |
☀️ Try build successful - checks-actions |
Finished benchmarking commit (8456a3d8aa534b17f06fbd8ec8caa743630805e5): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
@nikic the fixme comment has been removed and I think perf still looks reasonable, are there any other changes you want here? |
For the record, https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699?u=nikic is the LLVM proposal to move to purely offset-based GEPs. |
What's the status here? Are merge conflicts the last thing that is stopping this from being merged? Are the next steps [rebase, re-check perf, r=nikic]? |
@WaffleLapkin this is marked as blocked on rust-lang/rustc-perf#1345 as per this comment though i'm not sure if that's still a concern |
Stop using LLVM struct types for alloca, byval, sret, and many GEPs This is an extension of rust-lang#98615, extending the removal from field offsets to most places that it's feasible right now. (It might make sense to split this PR up, but I want to test perf with everything.) For `alloca`, `byval`, and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's layout algorithm. Particularly for `alloca`, it is likely that a future LLVM will change to a representation where you only specify the size. For GEPs, upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which is the same thing we do here. \*: Since we always explicitly specify the alignment. For `byval`, this wasn't the case until rust-lang#112157. †: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue here. r? `@ghost`
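As a rough Rust-level analogy for the `alloca` point in the commit message above (only the size and the explicitly supplied alignment matter), the following sketch stores a value in a plain byte buffer with a requested alignment and reads it back. The `Pair` and `Slot` types and the `roundtrip` function are hypothetical illustrations, not part of rustc.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

// Hypothetical payload type used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pair {
    a: u32,
    b: u64,
}

const SIZE: usize = size_of::<Pair>();

// An untyped stack slot: at least `SIZE` bytes with an explicitly requested
// alignment, loosely analogous to an `[N x i8]` alloca with `align 8`.
#[repr(C, align(8))]
struct Slot([MaybeUninit<u8>; SIZE]);

fn roundtrip(value: Pair) -> Pair {
    // The explicit alignment must cover the payload's requirement.
    assert!(align_of::<Slot>() >= align_of::<Pair>());
    let mut slot = Slot([MaybeUninit::uninit(); SIZE]);
    let ptr = slot.0.as_mut_ptr().cast::<Pair>();
    // SAFETY: the slot is large enough and sufficiently aligned for `Pair`,
    // and it is written before it is read.
    unsafe {
        ptr.write(value);
        ptr.read()
    }
}

fn main() {
    let p = Pair { a: 1, b: 2 };
    // The value survives a trip through storage that has no struct type.
    assert_eq!(roundtrip(p), p);
    println!("{:?}", roundtrip(p));
}
```

Nothing about the slot's declared type describes the fields of `Pair`; the buffer only has to be big enough and aligned enough.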
Always generate GEP i8 / ptradd for struct offsets This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely. Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change. Split out from rust-lang#121577. r? `@nikic`
Always generate GEP i8 / ptradd for struct offsets This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely. Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change. Fixes rust-lang#121719. Split out from rust-lang#121577. r? `@nikic`
This change was landed in #121665, so this PR can be closed. |
As opaque pointers in LLVM become inevitable, let's review our usage of LLVM "struct" types:
- constants (gone since miri, we just flatly serialize the mix of raw bytes and relocations)
- pointee types (will be gone with opaque pointers)
- `alloca`: only for size (we already always supply the alignment explicitly)
  - `[i8 x sizeof(T)]`
  - still needs to be measured, I have a sneaking suspicion that (even post-opaque-pointers) some LLVM passes use the type to make decisions
- `getelementptr`: only for offset computation
  - `alloca`), but I doubt LLVM would have a hard time rewriting the GEP to fit the type (i.e. this might push back all the pessimizations to a future `alloca` de-typing)

1: can't be easily cached IIRC because types are interned in the `Context` but only a `Module` can specify a `datalayout`
2: you want offsets as any decision based on "fields" would likely miss some optimization opportunities
`rustc_codegen_ssa`'s abstractions should work with e.g. `SmallVec<[Bx::Value; 2]>` (or have some sort of `Bx::ValueBundle` for handling trickier situations like `invoke`, where the `invoke` and the `extract`s are in different BBs)

I never got LLVM devs to agree with me that aggregate types in LLVM were a mistake, but if we can get away with not using them (outside of "multiple return values"), maybe that will make a difference.
(a commonly cited reason was debugging the IR, but wouldn't you want more readable DWARF metadata in that case? LLVM types are already quite lossy. also, the representation of LLVM constants I was floating back then in the context of removing LLVM aggregates, turned into a miri "abstract allocation" representation proposal, so it wasn't entirely a waste of time)
A couple notes for context:
- `rustc` ABI/layout system (thankfully we didn't end up needing cycle detection for "mutual recursion" groups)

Some additional concerns come from backends that are even higher-level than LLVM:
- `rustc_codegen_gcc` (cc @antoyo)
- `rustc_codegen_spirv`
  - `struct` types (and a limited selection of `enum`s, based on layout) as SPIR-V structs
  - `alloca` and all the call ABI handling
  - `Value`s would be tracking a richer SPIR-V type than `rustc_codegen_ssa` needs or wants for the backend `Type`, in an opaque-pointer/aggregate-less world

I'm not sure how well we can test the LLVM optimization impact of this PR, but to some extent a perf run might be able to reflect at least the impact on `rustc`'s code.

cc @rust-lang/wg-llvm @bjorn3 @tmiasko
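Since the description above treats `getelementptr` on struct types as nothing more than an offset computation, a small sketch of that computation may help. The `align_up` and `c_layout` helpers below are hypothetical and only implement the simple C-style rule (fields in declaration order, each rounded up to its alignment); rustc's real layout code is more involved and may reorder fields for the default representation.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

// Compute C-style field offsets and the total size from (size, align) pairs.
fn c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut offset = 0;
    let mut max_align = 1;
    for &(size, align) in fields {
        offset = align_up(offset, align);
        offsets.push(offset);
        offset += size;
        max_align = max_align.max(align);
    }
    // The total size is padded to the largest field alignment.
    (offsets, align_up(offset, max_align))
}

// Hypothetical struct used to cross-check the computed offsets.
#[repr(C)]
struct Sample {
    a: u8,
    b: u32,
    c: u16,
}

fn main() {
    let fields = [
        (size_of::<u8>(), align_of::<u8>()),
        (size_of::<u32>(), align_of::<u32>()),
        (size_of::<u16>(), align_of::<u16>()),
    ];
    let (offsets, size) = c_layout(&fields);

    // Compare against what the compiler actually chose for `repr(C)`.
    let s = Sample { a: 0, b: 0, c: 0 };
    let base = addr_of!(s) as usize;
    let actual = vec![
        addr_of!(s.a) as usize - base,
        addr_of!(s.b) as usize - base,
        addr_of!(s.c) as usize - base,
    ];
    assert_eq!(offsets, actual);
    assert_eq!(size, size_of::<Sample>());
    println!("offsets = {:?}, size = {}", offsets, size);
}
```

Once the frontend has these numbers, each field access needs only a base pointer and a constant byte offset, so the backend does not need a matching struct type to reproduce the layout.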