-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match checking has quadratic average complexity #7462
Comments
Currently, scopes are tied to LLVM basic blocks. For each scope, there are two new basic blocks, which means two extra jumps in the unoptimized IR. These blocks aren't actually required, but only used to act as the boundary for cleanups. By keeping track of the current scope within a single basic block, we can avoid those extra blocks and jumps, shrinking the pre-optimization IR quite considerably. For example, the IR for trans_intrinsic goes from ~22k lines to ~16k lines, almost 30% less. The impact on the build times of optimized builds is rather small (about 1%), but unoptimized builds are about 11% faster. The testsuite for unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on my i7) faster. Also, in some situations this helps LLVM to generate better code by inlining functions that it previously considered to be too large. Likely because of the pointless blocks/jumps that were still present at the time the inlining pass runs. Refs rust-lang#7462
Currently, scopes are tied to LLVM basic blocks. For each scope, there are two new basic blocks, which means two extra jumps in the unoptimized IR. These blocks aren't actually required, but only used to act as the boundary for cleanups. By keeping track of the current scope within a single basic block, we can avoid those extra blocks and jumps, shrinking the pre-optimization IR quite considerably. For example, the IR for trans_intrinsic goes from ~22k lines to ~16k lines, almost 30% less. The impact on the build times of optimized builds is rather small (about 1%), but unoptimized builds are about 11% faster. The testsuite for unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on my i7) faster. Also, in some situations this helps LLVM to generate better code by inlining functions that it previously considered to be too large. Likely because of the pointless blocks/jumps that were still present at the time the inlining pass runs. Refs #7462
/cc @toddaaro |
/cc @msullivan |
I have whined a lot about match implementations that suffer from combinatorial explosion, so maybe that is why you thought of me. This Scala issue is probably unrelated, but maybe reading about the experience will prove useful to a Rust developer someday. |
Triage: did anyone determine if |
@steveklabnik I don't know of any changes to |
Yes, |
@cmr if you have a moment, can you explain how you checked this out? Future triage-rs will be thankful 😄 |
|
trans-ing a
(The python invocation prints code like |
Results of @huonw's script above with
With
MIR trans looks to be a lot better at this case. |
MIR trans's algorithm is indeed very careful to generate a linear number of basic blocks. |
Match checking currently takes the most time, which is rather unsurprising since I think it's O(n^2) code. So I'm going to leave this open for the time being, though realistically matches with this many arms are... excessive, but I plan on taking a look at the code and seeing if it's possible to optimize it a little soon, so I'll leave it open till then.
|
This comment has been minimized.
This comment has been minimized.
I'd like to make it clear that match checking is a NP-Complete problem, and quadratic average time is not very bad for such complex algorithm. However, it's true that it's taking up significant portion of compile time, and we probably want to special case trivial (C/switch-like) matches so that they use a loglinear algorithm instead. |
Triage:
|
triage: P-medium. This is not believed to be a high priority work-item; in practice these match checking times, even for large matches, are minuscule compared to other passes in the compiler. Still, it might be a fun thing for a volunteer to attack. |
(Even this simpler step would probably be both easy and beneficial!) |
It's 2020 and $ for K in {8..16}; do
N=$((1 << K))
echo -n "$((N)): "
python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, $N)))" |
rustc - -Z time-passes |
rg --color never 'match'
done
256: time: 0.001; rss: 88MB match_checking
512: time: 0.004; rss: 89MB match_checking
1024: time: 0.013; rss: 89MB match_checking # huh, rss is almost constant up to here
2048: time: 0.046; rss: 93MB match_checking
4096: time: 0.179; rss: 96MB match_checking
8192: time: 0.715; rss: 103MB match_checking
16384: time: 3.152; rss: 118MB match_checking
32768: time: 11.660; rss: 148MB match_checking
65536: time: 55.046; rss: 206MB match_checking |
Add fast path for match checking This adds a fast path that would reduce the complexity to linear on matches consisting of only variant patterns (i.e. enum matches). (Also see: rust-lang#7462) Unfortunately, I was too lazy to add a similar fast path for constants (mostly for integer matches), ideally that could be added another day. TBH, I'm not confident with the performance claims due to the fact that enums tends to be small and FxHashMap could add a lot of overhead. r? `@Mark-Simulacrum` needs perf
The match exhaustiveness algorithm is indeed completely quadratic: for every branch, we check if it is redundant with any of the branches above. We could hack workarounds for some special cases like what #76918 did, but those usually stop being useful for non-trivial patterns, and I'm not sure they're worth it since most matches have only a few branches. (I would love some data to confirm that, if you want to help see rust-lang/rustc-perf#792) |
…se-expr-fp, r=camsteffen FP fix and documentation for `branches_sharing_code` lint Closes rust-lang/rust-clippy#7369 Related rust-lang/rust-clippy#7452 I'm still thinking about the best way to fix this. I could simply add another visitor to ensure that the moved expressions don't modify values being used in the condition, but I'm not totally happy with this due to the complexity. I therefore only documented it for now changelog: [`branches_sharing_code`] fixed false positive where block expressions would sometimes be ignored.
It's 2023 and for K in {8..16}; do
N=$((1 << K))
echo -n "$((N)): "
python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |
rustc - -Z time-passes 2>&1 |
rg --color never 'MIR_borrow_checking'
done with
which is still quadratic. I'd be curious to know if it's still match checking that drives this issue, how could I measure that? If it is indeed match checking, then I'd like to close this as wontfix. I've done what I could to reduce this quadraticness (most notably #117611) and I don't believe tackling this further would be worth it for typical matches (I have ideas but they would require a lot of allocations and increased code complexity). |
I think it's fine to close this as wontfix. Also, thanks for your work on rewriting the check passes! It's a quite complicated problem and I was surprised you were able to make it quite a bit faster. |
Thank you! I'll close it then, and someone can reopen if they feel it's relevant |
If you look at the trans-stats for some of our modules, some "big functions" really stand out:
If you look at the code generating them, they don't immediately look like they should be taking 10x as long as other functions; but they do all seem to use
match
somewhat heavily. This suggests to me that we're still compiling matches in some way that causes a combinatorial explosion of basic blocks, or something.Investigate!
The text was updated successfully, but these errors were encountered: