Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc_codegen_ssa: don't use LLVM struct types for field offsets. #98615

Closed
wants to merge 2 commits into from

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Jun 28, 2022

As opaque pointers in LLVM become inevitable, let's review our usage of LLVM "struct" types:

  • representing Rust types
    • constants (gone since miri, we just flatly serialize the mix of raw bytes and relocations)
    • pointee types (will be gone with opaque pointers)
    • alloca: only for size (we already always supply the alignment explicitly)
      • impact of replacing the type with [i8 x sizeof(T)] still needs to be measured, I have a sneaking suspicion that (even post-opaque-pointers) some LLVM passes use the type to make decisions
    • getelementptr: only for offset computation
      • this PR replaces them with byte offsets, which might interfere with some optimizations (as I worry about alloca), but I doubt LLVM would have a hard time rewriting the GEP to fit the type (i.e. this might push back all the pessimizations to a future alloca de-typing)
      • my intuition back when I tried to get involved with the opaque pointers work, was that LLVM was wasting a lot of time (re)computing1 offsets2 when a more pragmatic IR would just put the constant offset in the instruction (at most you'd end up with what I call "integer dot product", for the full form that indexes nested arrays)
        1can't be easily cached IIRC because types are interned in the Context but only a Module can specify a datalayout
        2you want offsets as any decision based on "fields" would likely miss some optimization opportunities
  • not representing Rust types (but automatically generated for e.g. ABI reasons)
    • the common one here is "returning more than one value" (as LLVM doesn't support the feature itself, but forces emitters to go through the tuple-returning idiom)
    • wherever possible, rustc_codegen_ssa's abstractions should work with e.g. SmallVec<[Bx::Value; 2]> (or have some sort of Bx::ValueBundle for handling trickier situations like invoke, where the invoke and the extracts are in different BBs)

I never got LLVM devs to agree with me that aggregate types in LLVM were a mistake, but if we can get away with not using them (outside of "multiple return values"), maybe that will make a difference.
(a commonly cited reason was debugging the IR, but wouldn't you want more readable DWARF metadata in that case? LLVM types are already quite lossy. also, the representation of LLVM constants I was floating back then in the context of removing LLVM aggregates, turned into a miri "abstract allocation" representation proposal, so it wasn't entirely a waste of time)


A couple notes for context:

  • historically the lack of opaque pointers, and the (perceived?) need to represent Rust types (or rather, their layouts) using LLVM aggregates, had delayed (and complicated) what we now call the rustc ABI/layout system (thankfully we didn't end up needing cycle detection for "mutual recursion" groups)
  • I was looking into generalizing "scalar pair" layouts the other day, and the interaction with LLVM struct types is getting in the way a lot (both coming up with appropriate LLVM types for "scalar components", and subsequent GEPs)

Some additional concerns come from backends that are even higher-level than LLVM:

  • rustc_codegen_gcc (cc @antoyo)
    • not super familiar with it, but I suspect it could also switch fully to untyped pointers
  • rustc_codegen_spirv
    • currently takes advantage of being able to represent Rust struct types (and a limited selection of enums, based on layout) as SPIR-V structs
    • already handles pointer-casting GEPs (i.e. it tries to recover a castless GEP based on the offset), so all it really needs is to see the layout in alloca and all the call ABI handling
      • that is, its backend Values would be tracking a richer SPIR-V type than rustc_codegen_ssa needs or wants for the backend Type, in an opaque-pointer/aggreggate-less world
    • long-term the plan is to introduce a SPIR-V derivative IR that would more readily allow both representing the "untyped memory" semantics Rust wants, and legalizing it away (mostly into SSA values, but inferring "typed memory" is also an option for some things)

I'm not sure how well we can test the LLVM optimization impact of this PR, but to some extent a perf run might be able to reflect at least the impact on rustc's code.

cc @rust-lang/wg-llvm @bjorn3 @tmiasko

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 28, 2022
@rust-highfive
Copy link
Collaborator

Some changes occured to rustc_codegen_gcc

cc @antoyo

@rustbot
Copy link
Collaborator

rustbot commented Jun 28, 2022

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

@rust-highfive

This comment was marked as outdated.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 28, 2022
@eddyb
Copy link
Member Author

eddyb commented Jun 28, 2022

@bors try @rust-timer queue

@lcnr

This comment was marked as resolved.

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 28, 2022
@bors
Copy link
Contributor

bors commented Jun 28, 2022

⌛ Trying commit 67d2c9f with merge 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f...

@lcnr

This comment was marked as outdated.

@rust-highfive rust-highfive assigned jackh726 and unassigned lcnr Jun 28, 2022
@eddyb
Copy link
Member Author

eddyb commented Jun 28, 2022

r? @nikic

@nikic
Copy link
Contributor

nikic commented Jun 28, 2022

impact of replacing the type with [i8 x sizeof T] still needs to be measured, I have a sneaking suspicion that (even post-opaque-pointers) some LLVM passes use the type to make decisions

SROA does make use of the alloca type, though only a fallback. I'm not sure how important it is. The relevant code is here: https://github.com/llvm/llvm-project/blob/04dac2ca7c06d0ce173e53527e3b90a07e3b325d/llvm/lib/Transforms/Scalar/SROA.cpp#L4259

this PR replaces them with byte offsets, which might interfere with some optimizations (as I worry about alloca), but I doubt LLVM would have a hard time rewriting the GEP to fit the type (i.e. this might push back all the pessimizations to a future alloca de-typing)

If the offset is constant, this is okay in theory. LLVM is pretty good about not caring about GEP types when it comes to constant offsets (the same is not true if variable offsets are involved). Though the last time I actually tried to do this as an InstCombine canonicalization, I observed significant codegen impact, and didn't analyze it further at the time.

my intuition back when I tried to get involved with the opaque pointers work, was that LLVM was wasting a lot of time (re)computing1 offsets2 when a more pragmatic IR would just put the constant offset in the instruction (at most you'd end up with what I call "integer dot product", for the full form that indexes nested arrays)

Your intuition is correct.


I think generally this change makes sense, but I'm not sure whether now is the right time to make it -- we're currently still on LLVM 14 without opaque pointers, where this would have higher impact.

@nikic
Copy link
Contributor

nikic commented Jun 28, 2022

FWIW changing GEP representation to use offsets is on my long term roadmap, but it would be another major effort, and needs the typed pointer removal to be finalized first anyway.

@bors
Copy link
Contributor

bors commented Jun 28, 2022

☀️ Try build successful - checks-actions
Build commit: 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f (2f9a464f3f179bba982ccfa5b3d354d32b6dc88f)

@rust-timer
Copy link
Collaborator

Queued 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f with parent 64eb9ab, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (2f9a464f3f179bba982ccfa5b3d354d32b6dc88f): comparison url.

Instruction count

  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
0.8% 1.5% 8
Regressions 😿
(secondary)
3.3% 8.9% 11
Improvements 🎉
(primary)
-0.7% -3.8% 14
Improvements 🎉
(secondary)
-1.5% -2.5% 7
All 😿🎉 (primary) -0.2% -3.8% 22

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
10.2% 14.1% 5
Improvements 🎉
(primary)
-3.9% -3.9% 1
Improvements 🎉
(secondary)
-5.2% -6.3% 4
All 😿🎉 (primary) -3.9% -3.9% 1

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
6.0% 12.1% 8
Improvements 🎉
(primary)
-3.2% -3.2% 1
Improvements 🎉
(secondary)
-2.6% -2.6% 1
All 😿🎉 (primary) -3.2% -3.2% 1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 28, 2022
@eddyb
Copy link
Member Author

eddyb commented Jun 28, 2022

@Mark-Simulacrum ^^ is it normal that there's nothing under the "Bootstrap timings" heading?


Either way, pretty mixed results, though it seems like opt benefits and debug loses (at least when the amplitude is high enough, there's probably second-order effects in there as well).

Maybe we need opaque pointers before this can work well enough? The extra casts might be a thorn in the side of debug mode.

Sadly, looking at the opaque pointers perf run, seems like it would be a bit of a pain to do a fair comparison (since it would have to be opaque-pointers vs opaque-pointers+this PR).

@lqd
Copy link
Member

lqd commented Jun 28, 2022

@Mark-Simulacrum ^^ is it normal that there's nothing under the "Bootstrap timings" heading?

It seems rustc is currently broken on the perfbot https://perf.rust-lang.org/status.html

@antoyo
Copy link
Contributor

antoyo commented Jun 28, 2022

rustc_codegen_gcc (cc @antoyo)

  • not super familiar with it, but I suspect it could also switch fully to untyped pointers

I'm assuming the API to things accessing pointers like load() will still carry the type information, so that the GCC codegen will be able to cast to the right pointer type. Is this correct?

@eddyb
Copy link
Member Author

eddyb commented Jun 28, 2022

I'm assuming the API to things accessing pointers like load() will still carry the type information, so that the GCC codegen will be able to cast to the right pointer type. Is this correct?

Oh, right, just accounting for opaque pointer types would probably make what I'm doing here work.
(With opaque pointers, pointee types are provided to alloca/GEP/load but not tracked in the pointer type)

I suppose I should ask if GCC will treat ((char*)base) + offset much worse than &((Foo*)base)->field when they're equivalent (i.e. offset == offsetof(Foo, field)).
And separately/on top of that, whether declaring stack locals (LLVM alloca) as char[sizeof(Foo)] is worse than just Foo (not relevant to this PR, just the logical next step).

IOW, are types just used as a impractical proxy for the underlying untyped memory semantics or are they actually special-cased? (for LLVM it can vary per pass but generally it leans towards the former)

@eddyb
Copy link
Member Author

eddyb commented Jun 29, 2022

Okay at least deep-vector seems to be one small file with no dependencies, so at least profiling LLVM and whatnot should be maximally doable (but from some quick testing I will need release-debuginfo = true, haven't done a local LLVM build in a while, this is going to be fun).

@nikic
Copy link
Contributor

nikic commented Sep 9, 2022

I think the numbers here are good enough to land this. I expect this will have mixed improvements and regressions in codegen. We can address reported regressions in the next LLVM release. See #101210 for an example where this should be an improvement, and I've been seeing patterns like this quite often recently.

r=me with that one fixme dropped.

@Kobzol
Copy link
Contributor

Kobzol commented Sep 27, 2022

Can we add a codegen test to check if this helps with #101210? Or should it be in a follow-up PR?

@nikic
Copy link
Contributor

nikic commented Dec 17, 2022

As it's been quite a while, rerunning perf to sanity check.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 17, 2022
@bors
Copy link
Contributor

bors commented Dec 17, 2022

⌛ Trying commit 4d803a0 with merge 8456a3d8aa534b17f06fbd8ec8caa743630805e5...

@bors
Copy link
Contributor

bors commented Dec 17, 2022

☀️ Try build successful - checks-actions
Build commit: 8456a3d8aa534b17f06fbd8ec8caa743630805e5 (8456a3d8aa534b17f06fbd8ec8caa743630805e5)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8456a3d8aa534b17f06fbd8ec8caa743630805e5): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.3%, 0.4%] 5
Regressions ❌
(secondary)
0.4% [0.2%, 0.9%] 16
Improvements ✅
(primary)
-0.8% [-1.5%, -0.4%] 13
Improvements ✅
(secondary)
-1.5% [-3.6%, -0.5%] 7
All ❌✅ (primary) -0.5% [-1.5%, 0.4%] 18

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.3% [2.3%, 2.3%] 1
Regressions ❌
(secondary)
3.6% [2.5%, 5.6%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.3% [2.3%, 2.3%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.3% [-1.8%, -1.0%] 9
Improvements ✅
(secondary)
-2.1% [-3.6%, -1.1%] 5
All ❌✅ (primary) -1.3% [-1.8%, -1.0%] 9

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 17, 2022
@erikdesjardins
Copy link
Contributor

erikdesjardins commented Jan 7, 2023

@nikic the fixme comment has been removed and I think perf still looks reasonable, are there any other changes you want here?

@Dylan-DPC Dylan-DPC removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 21, 2023
@nikic
Copy link
Contributor

nikic commented Mar 11, 2023

For the record, https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699?u=nikic is the LLVM proposal to move to purely offset-based GEPs.

@WaffleLapkin
Copy link
Member

What's the status here? Are merge conflicts the last thing that is stopping this from being merged? Are the next steps [rebase, re-check perf, r=nikic]?

@Dylan-DPC
Copy link
Member

Dylan-DPC commented May 14, 2023

What's the status here? Are merge conflicts the last thing that is stopping this from being merged? Are the next steps [rebase, re-check perf, r=nikic]?

@WaffleLapkin this is marked as blocked on rust-lang/rustc-perf#1345 as per this comment though i'm not sure if that's still a concern

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 25, 2024
Stop using LLVM struct types for alloca, byval, sret, and many GEPs

This is an extension of rust-lang#98615, extending the removal from field offsets to most places that it's feasible right now. (It might make sense to split this PR up, but I want to test perf with everything.)

For `alloca`, `byval`, and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's layout algorithm. Particularly for `alloca`, it is likely that a future LLVM will change to a representation where you only specify the size.

For GEPs, upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which is the same thing we do here.

\*: Since we always explicitly specify the alignment. For `byval`, this wasn't the case until rust-lang#112157.

†: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue here.

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 27, 2024
Always generate GEP i8 / ptradd for struct offsets

This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely.

Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change.

Split out from rust-lang#121577.

r? `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 2, 2024
Always generate GEP i8 / ptradd for struct offsets

This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely.

Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change.

Fixes rust-lang#121719.

Split out from rust-lang#121577.

r? `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 3, 2024
Always generate GEP i8 / ptradd for struct offsets

This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely.

Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change.

Fixes rust-lang#121719.

Split out from rust-lang#121577.

r? `@nikic`
GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this pull request Mar 5, 2024
Always generate GEP i8 / ptradd for struct offsets

This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely.

Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change.

Fixes rust-lang#121719.

Split out from rust-lang#121577.

r? `@nikic`
@erikdesjardins
Copy link
Contributor

This change was landed in #121665, so this PR can be closed.

@nikic nikic closed this Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-blocked Status: Blocked on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.