Resurrect #70477: "Use the niche optimization if other variant are small enough" #75866
This needs a perf run, since results in the original PR were negative. |
Last time I had a look at this issue, the benchmarks showed performance regressions. |
☔ The latest upstream changes (presumably #74862) made this pull request unmergeable. Please resolve the merge conflicts. |
let's do a perf run first: @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit ef282bb1cd28a6a4ce0b8dda66dc4a3250930a77 with merge d85bea00451112d262ee8d42afe2819e1153be8e... |
☀️ Try build successful - checks-actions, checks-azure |
Queued d85bea00451112d262ee8d42afe2819e1153be8e with parent 10ef7f9, future comparison URL. |
Finished benchmarking try commit (d85bea00451112d262ee8d42afe2819e1153be8e): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
* Put the largest niche first
* Add test from issue 63866
* Add test for enum of sized/unsized
* Prefer fields that are already earlier in the struct
ef282bb to ad758f8
Perf is about the same as the original PR, regressions up to 4-5% on a lot of benchmarks. |
Pushed another commit to disable the optimization and just reorder fields (tests will fail but the build should pass), let's do another perf run to see if that's the problem. |
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit d5ef4b6 with merge 4cdfe129f8f8acf6532f8252997ece91030dd135... |
☀️ Try build successful - checks-actions, checks-azure |
Queued 4cdfe129f8f8acf6532f8252997ece91030dd135 with parent ef663a8, future comparison URL. |
Finished benchmarking try commit (4cdfe129f8f8acf6532f8252997ece91030dd135): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
The results seem to indicate that the problem is not caused by the reordering of the fields and the change in padding. Either way, we could verify this hypothesis by completely disabling the niche optimization (or disabling it only for types bigger than 8 bytes or so) and seeing whether that improves compile-time performance. I guess the fix would be to improve the code generation for the match. In the rust/compiler/rustc_mir_build/src/build/matches/test.rs Lines 202 to 213 in be38081
Instead, there should probably be a new

Currently, codegen generates something that looks like this (pseudocode):

    // This condition produces quite a few instructions that LLVM then needs to try to optimize.
    let discriminant = if (niche in [begin...end]) { niche - offset } else { data_variant };
    match (discriminant) {
        data_variant => ...,
        other_variant1 => ...,
        other_variant2 => ...,
    }

Instead, the codegen should do this:

    match (niche) {
        other_variant1 + offset => ...,
        other_variant2 + offset => ...,
        _ => ..., // (for the data_variant)
        // begin..end => ... // alternative if the switch is not exhaustive
    }
|
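To make the two decoding strategies from the comment above concrete, here is a hand-written Rust model of them. It is only a sketch: the `Variant` enum, the niche range 2..=3, and both function names are hypothetical, and this is ordinary user code, not what rustc emits. It shows that matching on the raw niche value gives the same answer as first computing a discriminant; whether LLVM actually optimizes the second form better is the open question of this thread.

```rust
/// Toy model of a niche-encoded enum: one data variant whose payload is a
/// `bool` (valid byte values 0 and 1) and two field-less variants stored in
/// the otherwise unused byte values 2 and 3. All names are hypothetical.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Variant {
    Data,
    Other1,
    Other2,
}

const NICHE_START: u8 = 2; // first byte value used as a niche (assumed)
const NICHE_END: u8 = 3; // last byte value used as a niche (assumed)

/// Two-step decoding: compute a discriminant from the niche, then match on it.
fn decode_two_step(niche: u8) -> Variant {
    // Variant indices: 0 = Data, 1 = Other1, 2 = Other2.
    let discriminant = if (NICHE_START..=NICHE_END).contains(&niche) {
        niche - NICHE_START + 1
    } else {
        0
    };
    match discriminant {
        0 => Variant::Data,
        1 => Variant::Other1,
        _ => Variant::Other2,
    }
}

/// Direct decoding: match on the raw niche value, with the data variant
/// as the fallback arm.
fn decode_direct(niche: u8) -> Variant {
    match niche {
        NICHE_START => Variant::Other1,
        NICHE_END => Variant::Other2,
        _ => Variant::Data,
    }
}

fn main() {
    // Both strategies agree on every byte value that is valid in this model.
    for niche in 0u8..=3 {
        assert_eq!(decode_two_step(niche), decode_direct(niche));
        println!("{}: {:?}", niche, decode_direct(niche));
    }
}
```

The direct form needs no range check and no subtraction before the switch, which is the simplification the comment argues for.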
#77816 ran a performance test with the niche optimization disabled, which showed small performance improvements. So I would conclude that the problem is indeed that the code generated for niche enums is not optimal, and we should improve it before enabling the niche optimization in more cases. |
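For readers new to the thread, the niche optimization itself can be observed from ordinary code with `std::mem::size_of`. The sketch below only checks the long-standing single-data-variant cases; the `TwoPayloads` enum is a hypothetical example of the multi-variant shape this PR targets, and its size is printed rather than asserted because it depends on the compiler version.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical enum with two data-carrying variants. `Big` contains a
/// `bool`, which has unused bit patterns, and `Small` is small enough to
/// fit beside it. Whether the compiler uses the niche here depends on the
/// compiler version, so the size is only printed below.
#[allow(dead_code)]
enum TwoPayloads {
    Big(bool, [u8; 7]),
    Small(u8),
}

fn main() {
    // `None` is stored in the null pointer value, so no extra tag is needed.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    // `None` is stored in the forbidden value 0.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    // `bool` only uses the byte values 0 and 1, leaving room for `None`.
    assert_eq!(size_of::<Option<bool>>(), 1);

    println!("TwoPayloads: {} bytes", size_of::<TwoPayloads>());
}
```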
@erikdesjardins any updates? (on the CI failure) |
This can't be merged in its current state anyways, it's too much of a perf regression. It's likely that a codegen change (as described in #75866 (comment)) will be necessary in order to land this. I won't have time to look into that personally in the near future, so I'll close this for now. |
Fixes #46213, fixes #66029.
r? @eddyb