MIR-OPT: Pass to deduplicate blocks #77551
Conversation
r? @oli-obk (rust_highfive has picked a reviewer for you, use r? to override)
Force-pushed from bca4c75 to 6d10adc
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 6d10adca99e598a96de8dc438de2b2bbfe974fc7 with merge e8bab90e8bb7eaee4e142e99267ec4dacdfd3985...
☀️ Try build successful - checks-actions, checks-azure
Queued e8bab90e8bb7eaee4e142e99267ec4dacdfd3985 with parent 4ccf5f7, future comparison URL.
Finished benchmarking try commit (e8bab90e8bb7eaee4e142e99267ec4dacdfd3985): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Performance does not look good. I'll set up a profiler in the coming days to see if it can be improved.
fn rvalue_eq(lhs: &Rvalue<'tcx>, rhs: &Rvalue<'tcx>) -> bool {
    let res = match (lhs, rhs) {
        (
            Rvalue::Use(Operand::Constant(box Constant { user_ty: _, literal, span: _ })),
The only reason for not using `==` is to avoid comparing spans. See #77549 (comment)
Should we have a `compare_without_user_info`? Not comparing spans is an issue that most optimization passes would like to solve, I guess.
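For readers skimming the thread, here is a minimal standalone sketch of what such a span-insensitive comparison looks like. The types below (`Span`, `Literal`, `Constant`, `Rvalue`) are hypothetical simplified stand-ins, not the real `rustc_middle::mir` types; the only point is that the comparison deliberately skips `span` and `user_ty` and compares `literal`:

```rust
// Simplified stand-ins for illustration only (not the rustc API).
#[derive(Debug)]
struct Span(u32, u32);

#[derive(Debug, PartialEq)]
enum Literal {
    Int(i64),
}

#[derive(Debug)]
struct Constant {
    span: Span,           // deliberately ignored by the comparison below
    user_ty: Option<u32>, // deliberately ignored as well
    literal: Literal,
}

#[derive(Debug)]
enum Rvalue {
    Use(Constant),
}

/// Semantic equality that skips `span` and `user_ty`, so two constants
/// written at different source locations still compare equal.
fn rvalue_eq(lhs: &Rvalue, rhs: &Rvalue) -> bool {
    match (lhs, rhs) {
        (Rvalue::Use(a), Rvalue::Use(b)) => a.literal == b.literal,
    }
}

fn main() {
    let a = Rvalue::Use(Constant { span: Span(0, 4), user_ty: None, literal: Literal::Int(1) });
    let b = Rvalue::Use(Constant { span: Span(9, 13), user_ty: None, literal: Literal::Int(1) });
    assert!(rvalue_eq(&a, &b)); // equal despite different spans
}
```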
This is a ridiculous footgun. I'll open an issue to discuss this.
Performance should be quite a bit better with the latest commit. Unnecessary tuple combinations are still being created though. I'll see if this can be removed.
Unnecessary tuple combinations should be gone now. Can I get a perf run again?
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 44b87ce06d5582a00b5b21688834e8d03f7b5600 with merge 50438f458aff5610389abb267c5b5b89fbb68b04...
☀️ Try build successful - checks-actions, checks-azure
Queued 50438f458aff5610389abb267c5b5b89fbb68b04 with parent a1dfd24, future comparison URL.
Finished benchmarking try commit (50438f458aff5610389abb267c5b5b89fbb68b04): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Based on the perf results, there is absolutely still room for improvement. I'll see what can be done after some profiling. The regressions in optimized_mir make sense, as we are doing more work (and maybe too much), but it's not intuitive to me why the LLVM passes seem to regress. Intuitively, when we hand fewer basic blocks to LLVM, it should have less work to do. Any ideas? I guess we can verify that we are indeed handing simpler IR to LLVM using cargo-llvm-lines. If it turns out we are not, the problem could be that deduplicating blocks in MIR confuses some of the existing MIR opt passes.
You can basically ignore these LLVM regressions, as they are very small and only during incremental compilation afaict. They happen because the number of codegen units changes when a function gets optimized too well in MIR. One thing that I found curious is that the CTFE stress tests are regressing during MIR interpretation. Shouldn't we get less code and thus do less work? About the
debug!("tuple_iter: {:?}", tuple_iter);

let statementkinds1 = body.basic_blocks()[item1].statements.iter().map(|x| &x.kind);
let statementkinds2 = body.basic_blocks()[item2].statements.iter().map(|x| &x.kind);
Perhaps you can gain some performance by moving these two lines out of the while loop, so we're not evaluating them many times?
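A rough sketch of that suggestion, using hypothetical simplified types rather than the real `Body`/`BasicBlockData` API: collect each candidate block's statement kinds once, before the pairwise loop, and compare the precollected slices inside it.

```rust
// Simplified stand-in types for illustration only.
#[derive(Clone, PartialEq, Debug)]
enum StatementKind {
    Assign(u32, i64),
    Nop,
}

struct BasicBlock {
    statements: Vec<StatementKind>,
}

// `group` holds the indices of blocks that ended up in the same
// (statement count, terminator kind) group.
fn find_duplicate_pairs(blocks: &[BasicBlock], group: &[usize]) -> Vec<(usize, usize)> {
    // Hoisted out of the pairwise loop: one slice of statement kinds per block.
    let kinds: Vec<&[StatementKind]> =
        group.iter().map(|&b| blocks[b].statements.as_slice()).collect();

    let mut duplicates = Vec::new();
    for i in 0..group.len() {
        for j in (i + 1)..group.len() {
            // Slice comparison instead of rebuilding iterators on every iteration.
            if kinds[i] == kinds[j] {
                duplicates.push((group[i], group[j]));
            }
        }
    }
    duplicates
}

fn main() {
    let blocks = vec![
        BasicBlock { statements: vec![StatementKind::Assign(0, 1), StatementKind::Nop] },
        BasicBlock { statements: vec![StatementKind::Assign(0, 1), StatementKind::Nop] },
        BasicBlock { statements: vec![StatementKind::Assign(1, 2)] },
    ];
    println!("{:?}", find_duplicate_pairs(&blocks, &[0, 1, 2])); // [(0, 1)]
}
```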
//    - This is needed for itertools::group_by which only assigns consecutive elements to the same group
// 2. Group by (length of statements, TerminatorKind)
// 3. Compare StatementKinds pairwise inside the group
//    - This is technically O(n²), but `n` should be small in most cases
If the reason why the `unicode_normalization` benchmark is slow is that `n` is actually large, I wonder if we know why that is? Maybe a large table of constants or something?
let mut tuple_iter = TupleCombinationsWithSkip::new(group);
debug!("tuple_iter: {:?}", tuple_iter);
while let Some((item1, item2)) = tuple_iter.next() {
Surely the most common case is `(len statements, terminator kind)` groups of size one, right? Could it be beneficial to filter them out early, since there obviously can't be any duplicates?
Yep, I have added filtering for group size > 1 in the latest commits.
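To make the scheme in the quoted comments concrete (sort so equal keys are consecutive, group by (statement count, terminator kind), skip groups of size one, then compare pairwise), here is a standalone sketch with hypothetical simplified types. The real pass works on MIR `BasicBlockData` and uses `itertools::group_by`; this sketch does the consecutive grouping by hand to stay dependency-free:

```rust
// Simplified stand-in types for illustration only.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum TerminatorKind {
    Return,
    Goto(usize),
}

struct BasicBlock {
    statements: Vec<String>, // stand-in for StatementKind
    terminator: TerminatorKind,
}

fn duplicate_pairs(blocks: &[BasicBlock]) -> Vec<(usize, usize)> {
    // 1. Sort block indices by (statement count, terminator kind) so equal
    //    keys are consecutive (the reason the pass sorts before group_by).
    let mut indices: Vec<usize> = (0..blocks.len()).collect();
    let key = |i: usize| (blocks[i].statements.len(), blocks[i].terminator.clone());
    indices.sort_by_key(|&i| key(i));

    let mut pairs = Vec::new();
    let mut start = 0;
    while start < indices.len() {
        // 2. Find the run of indices sharing the same key.
        let mut end = start + 1;
        while end < indices.len() && key(indices[end]) == key(indices[start]) {
            end += 1;
        }
        let group = &indices[start..end];
        // 3. Skip singleton groups: they cannot contain duplicates.
        if group.len() > 1 {
            // 4. O(n^2) pairwise comparison inside the (usually small) group.
            for a in 0..group.len() {
                for b in (a + 1)..group.len() {
                    if blocks[group[a]].statements == blocks[group[b]].statements {
                        pairs.push((group[a], group[b]));
                    }
                }
            }
        }
        start = end;
    }
    pairs
}

fn main() {
    let blocks = vec![
        BasicBlock { statements: vec!["_1 = const 1".into()], terminator: TerminatorKind::Return },
        BasicBlock { statements: vec!["_1 = const 1".into()], terminator: TerminatorKind::Return },
        BasicBlock { statements: vec!["_2 = const 2".into()], terminator: TerminatorKind::Goto(0) },
    ];
    println!("{:?}", duplicate_pairs(&blocks)); // [(0, 1)]
}
```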
The pass should be a lot faster with the latest changes. Can I get another perf run?
We discussed it at the compiler team meeting. A few points came up:
Yeah, I agree.
This would be interesting to check, yeah. The runtime benchmarks would have to be cherry-picked, though, to actually have duplicated blocks. The rustc-perf Webrender benchmark triggered the deduplication (it removed ~2% of statements last I tried), but I don't see Webrender having any benchmarks that can easily be run. Any ideas?
Ignoring spans and debug info would be a great improvement to finding duplicates for sure. I'm not sure how to do deduplication without throwing some span away. How would the pass know which of multiple spans to preserve?
Until clear compile-time (or runtime) improvements are seen, yeah, it should. I guess the next step would be to figure out how to ignore spans while not throwing information away. What is a Span used for if a local is not used for debuginfo? Constants, for instance, seem to get different spans quite often. See https://godbolt.org/z/YK6j8W .
I had a similar issue with MIR inlining. All the spans were pointing into the inlined MIR body, and all I could do was rewrite them to point at the callsite where it was inlined. Neither solution was good, so I implemented another expansion variant. Look at the ui test diff (all the way at the bottom, if the GitHub link doesn't work) to see how such a span could be rendered. I think in this situation, since there is no hierarchy of spans, I would create a span that points to whatever the two merged spans have in common, and if they have nothing in common, make an arbitrary choice for the root span. Then the two spans can be put into an
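A conceptual sketch of that merging idea with a toy byte-range span (the real `Span`/expansion machinery in rustc is considerably more involved, so treat this purely as an illustration of "keep the common part if there is one, otherwise pick arbitrarily, and remember both originals"):

```rust
// Toy span: a byte range in some source file. Not the rustc Span type.
#[derive(Clone, Copy, Debug)]
struct Span {
    lo: u32,
    hi: u32,
}

#[derive(Debug)]
struct MergedSpan {
    root: Span,           // what diagnostics would point at by default
    originals: [Span; 2], // the spans of the two deduplicated blocks
}

fn merge_spans(a: Span, b: Span) -> MergedSpan {
    let lo = a.lo.max(b.lo);
    let hi = a.hi.min(b.hi);
    let root = if lo < hi {
        Span { lo, hi } // the part the two spans have in common
    } else {
        a // no overlap: make an arbitrary choice for the root
    };
    MergedSpan { root, originals: [a, b] }
}

fn main() {
    let overlapping = merge_spans(Span { lo: 10, hi: 30 }, Span { lo: 20, hi: 40 });
    let disjoint = merge_spans(Span { lo: 0, hi: 5 }, Span { lo: 50, hi: 60 });
    println!("{:?}", overlapping); // root covers 20..30
    println!("{:?}", disjoint);    // root is 0..5 (arbitrary choice)
}
```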
So, for the quickest way forward, let's put it behind mir-opt-level 3 and merge it, then figure out the span and source_info handling (which we should probably do for more optimizations anyway).
Ping from triage: @simonvandel can you post your status on this PR and resolve the merge conflicts? Thank you |
Force-pushed from 97a4b4b to 149badd
Force-pushed from 149badd to bd989b9
Force-pushed from bd989b9 to 2d1e0ad
@oli-obk rebased on master, and gated the deduplication behind mir-opt-level >= 3
@bors r+ rollup=never
📌 Commit 2d1e0ad has been approved by |
☀️ Test successful - checks-actions
This pass finds basic blocks that are completely equal, and replaces all uses with just one of them.
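For context, a standalone sketch of the overall shape of such a pass, with a hypothetical simplified CFG rather than the real MIR types: identical blocks are mapped to one representative, and every edge that targeted a duplicate is redirected to it (the now-unreachable duplicates would then be cleaned up by a later SimplifyCfg-style pass):

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Simplified stand-in for a basic block; in real MIR the successor
// indices live in the terminator rather than in a plain Vec.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Block {
    statements: Vec<String>,
    targets: Vec<usize>, // successor block indices
}

fn deduplicate(blocks: &mut [Block]) {
    // First occurrence of each distinct block body becomes the representative.
    let mut canonical: HashMap<Block, usize> = HashMap::new();
    // replacement[i] = index of the block to use instead of block i.
    let mut replacement: Vec<usize> = (0..blocks.len()).collect();

    for idx in 0..blocks.len() {
        match canonical.entry(blocks[idx].clone()) {
            Entry::Occupied(entry) => replacement[idx] = *entry.get(),
            Entry::Vacant(entry) => {
                entry.insert(idx);
            }
        }
    }

    // Redirect every edge that pointed at a duplicate to the representative.
    for block in blocks.iter_mut() {
        for target in &mut block.targets {
            *target = replacement[*target];
        }
    }
}

fn main() {
    let mut blocks = vec![
        Block { statements: vec!["_0 = const 1".into()], targets: vec![] },
        Block { statements: vec!["_0 = const 1".into()], targets: vec![] }, // duplicate of block 0
        Block { statements: vec!["switch".into()], targets: vec![0, 1] },
    ];
    deduplicate(&mut blocks);
    assert_eq!(blocks[2].targets, vec![0, 0]); // both edges now point at block 0
    println!("{:?}", blocks[2].targets);
}
```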