
Match checking has quadratic average complexity #7462

Closed
graydon opened this issue Jun 28, 2013 · 22 comments
Labels
A-patterns Relating to patterns and pattern matching C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@graydon (Contributor)

graydon commented Jun 28, 2013

If you look at the trans-stats for some of our modules, some "big functions" really stand out:

rustc:

16838 insns, 67 ms, middle::trans::foreign::trans_intrinsic
15988 insns, 145 ms, middle::typeck::check::check_expr_with_unifier
11759 insns, 87 ms, middle::privacy::check_crate

libextra: 

5244 insns, 31 ms, terminfo::parm::expand
5073 insns, 21 ms, time::do_strptime::parse_type
4068 insns, 23 ms, time::do_strftime::parse_type
3610 insns, 73 ms, terminfo::parser::compiled::parse
2169 insns, 89 ms, net_tcp::listen_common
1952 insns, 28 ms, getopts::getopts
1937 insns, 193 ms, net_tcp::connect

If you look at the code generating them, they don't immediately look like they should be taking 10x as long as other functions; but they do all seem to use match somewhat heavily. This suggests to me that we're still compiling matches in some way that causes a combinatorial explosion of basic blocks, or something.

Investigate!
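As an illustration only (a hypothetical reduction with invented names, not code from rustc or libextra), the hot functions above share a match-heavy shape — many arms, tuples, or-patterns — that the old trans lowered into a large number of basic blocks:

```rust
// Hypothetical reduction of the match-heavy style in functions like
// terminfo::parm::expand. Each arm, or-pattern alternative, and nesting
// level gave the old trans another chain of basic blocks to emit.
fn classify(op: char, arg: i64) -> &'static str {
    match (op, arg) {
        ('p', 1..=9) => "push param",
        ('d', _) => "print decimal",
        ('+', _) | ('-', _) | ('*', _) | ('/', _) => "arith",
        ('%', _) => "literal percent",
        _ => "unknown",
    }
}

fn main() {
    assert_eq!(classify('p', 3), "push param");
    assert_eq!(classify('/', 2), "arith");
    println!("ok");
}
```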

dotdash added a commit to dotdash/rust that referenced this issue Jul 7, 2013
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.

By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.

The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.

Also, in some situations this helps LLVM generate better code by inlining
functions that it previously considered too large, likely because of the
pointless blocks/jumps that were still present at the time the inlining
pass runs.

Refs rust-lang#7462
bors added a commit that referenced this issue Jul 7, 2013
@emberian (Member)

/cc @toddaaro

@emberian (Member)

/cc @msullivan
I don't think it was actually toddaaro I was talking to about this...

@toddaaro (Contributor)

I have whined a lot about match implementations that suffer from combinatorial explosion, so maybe that is why you thought of me.

This Scala issue is probably unrelated, but maybe reading about the experience will prove useful to a Rust developer someday.

https://issues.scala-lang.org/browse/SI-1133

@steveklabnik (Member)

Triage: did anyone determine if match was still the issue here? I'd imagine codegen has gotten better since 2013...

@emberian (Member)

@steveklabnik I don't know of any changes to match codegen that would have alleviated this, re-running the measurements...

@emberian (Member)

Yes, match is still generating more code than anything else, by a significant margin.

@steveklabnik (Member)

@cmr if you have a moment, can you explain how you checked this out? Future triage-rs will be thankful 😄

@emberian (Member)

make RUSTFLAGS="-Z count-llvm-insns" | tee output, followed by sort -n < output > output2, then look at the bottom of output2 (ctrl-f for "trans_match")

@huonw (Member)

huonw commented Jan 5, 2016

trans-ing a match appears to be quadratic in the number of arms (at least for integers and strings):

$ for N in 500 1000 2000 4000; do
    echo $N
    python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | 
        rustc - -Z time-passes |
        grep translation
  done
500
time: 0.030; rss: 65MB  translation
1000
time: 0.107; rss: 67MB  translation
2000
time: 0.427; rss: 70MB  translation
4000
time: 1.620; rss: 75MB  translation

(The python invocation prints code like fn main() { match 0 { 0 => {} 1 => {} 2 => {} 3 => {} 4 => {} _ => {} } } with $N + 1 arms.)
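For reference, the same generator can be written as a standalone program (a sketch equivalent to the Python one-liner above):

```rust
// Builds the n-arm test case that the one-liner above pipes into rustc:
// fn main() { match 0 { 0 => {} 1 => {} ... _ => {} } }
fn gen_match(n: usize) -> String {
    let mut arms = String::new();
    for i in 0..n {
        arms.push_str(&format!(" {} => {{}}", i));
    }
    format!("fn main() {{ match 0 {{{} _ => {{}} }} }}", arms)
}

fn main() {
    assert_eq!(
        gen_match(2),
        "fn main() { match 0 { 0 => {} 1 => {} _ => {} } }"
    );
    println!("{}", gen_match(4));
}
```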

@brson (Contributor)

brson commented Jun 10, 2016

Results of @huonw's script above with rustc 1.11.0-nightly (34505e222 2016-06-08):

500
time: 0.088; rss: 97MB  translation
1000
time: 0.308; rss: 100MB translation
2000
time: 1.247; rss: 101MB translation
4000
time: 4.723; rss: 106MB translation

With -Z orbit:

500
time: 0.009; rss: 95MB  translation
1000
time: 0.015; rss: 97MB  translation
2000
time: 0.017; rss: 101MB translation
4000
time: 0.030; rss: 104MB translation

MIR trans looks to be a lot better at this case.

@arielb1 (Contributor)

arielb1 commented Jun 10, 2016

@brson

MIR trans's algorithm is indeed very careful to generate a linear number of basic blocks.

@Mark-Simulacrum (Member)

Match checking currently takes the most time, which is unsurprising since I think it's O(n^2) code. Realistically, matches with this many arms are excessive, so I'm going to leave this open for the time being; I plan to take a look at the code soon and see whether it can be optimized a little.

$ for N in {10..100}; do echo -n "$((2**$N)): "; python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, 2 ** $N)))" | rustc - -Z time-passes | rg --color never 'match'; done
1024: time: 0.021; rss: 93MB	match checking
2048: time: 0.083; rss: 96MB	match checking
4096: time: 0.335; rss: 99MB	match checking
8192: time: 1.340; rss: 107MB	match checking
16384: time: 5.360; rss: 117MB	match checking
32768: time: 21.446; rss: 142MB	match checking
65536: time: 88.756; rss: 193MB	match checking
131072: time: 380.206; rss: 300MB	match checking
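The shape of the O(n^2) cost can be modeled with a sketch (a simplification for constant patterns only, not the real rustc code): each arm is checked for reachability against every arm above it.

```rust
#[derive(Clone, Copy)]
enum Pat {
    Const(i64),
    Wild,
}

// Simplified reachability check: arm i is dead if some earlier arm already
// covers it. The inner scan over arms[..i] is what makes this O(n^2).
fn unreachable_arms(arms: &[Pat]) -> Vec<usize> {
    let mut dead = Vec::new();
    for (i, arm) in arms.iter().enumerate() {
        let covered = arms[..i].iter().any(|prev| match (prev, arm) {
            (Pat::Wild, _) => true,
            (Pat::Const(a), Pat::Const(b)) => a == b,
            _ => false,
        });
        if covered {
            dead.push(i);
        }
    }
    dead
}

fn main() {
    let arms = [Pat::Const(0), Pat::Const(1), Pat::Const(0), Pat::Wild];
    assert_eq!(unreachable_arms(&arms), vec![2]);
    println!("ok");
}
```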

@Mark-Simulacrum Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 19, 2017
@ishitatsuyuki ishitatsuyuki changed the title Code generated for match appears to have combinatorial problems Match checking has quadratic average complexity Apr 22, 2018

@ishitatsuyuki (Contributor)

I'd like to make it clear that match checking is an NP-complete problem, and quadratic average time is not very bad for such a complex algorithm.

However, it's true that it takes up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.
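The suggested fast path could look something like this sketch (an assumed shape, not a patch to rustc): when every pattern is a bare integer constant, redundancy checking only needs a set, so the pairwise scan drops to O(n log n).

```rust
use std::collections::BTreeSet;

// Fast path for C/switch-like matches: every pattern is an integer constant,
// so a later arm is unreachable exactly when its constant was seen before.
fn unreachable_const_arms(consts: &[i64]) -> Vec<usize> {
    let mut seen = BTreeSet::new();
    let mut dead = Vec::new();
    for (i, c) in consts.iter().enumerate() {
        if !seen.insert(*c) {
            dead.push(i); // duplicate constant: this arm can never match
        }
    }
    dead
}

fn main() {
    assert_eq!(unreachable_const_arms(&[0, 1, 2, 1]), vec![3]);
    println!("ok");
}
```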

@estebank (Contributor)

estebank commented May 9, 2019

Triage:

~$ for N in 500 1000 2000 4000; do     echo $N;     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |          rustc +devel - -Z time-passes | grep "checking 2"; done
500
  time: 0.028	misc checking 2
1000
  time: 0.087	misc checking 2
2000
  time: 0.312	misc checking 2
4000
  time: 1.220	misc checking 2
~$ for N in 500 1000 2000 4000; do     echo $N;     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |          rustc +devel - -Z time-passes | grep "match"; done
500
    time: 0.029	rvalue promotion + match checking
1000
    time: 0.087	rvalue promotion + match checking
2000
    time: 0.313	rvalue promotion + match checking
4000
    time: 1.233	rvalue promotion + match checking

@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 31, 2020
@pnkfelix (Member)

triage: P-medium.

This is not believed to be a high priority work-item; in practice these match checking times, even for large matches, are minuscule compared to other passes in the compiler.

Still, it might be a fun thing for a volunteer to attack.

@pnkfelix pnkfelix added the P-medium Medium priority label Feb 27, 2020
@pnkfelix (Member)

However, it's true that it takes up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.

(Even this simpler step would probably be both easy and beneficial!)

@workingjubilee (Member)

It's 2020 and

$ for K in {8..16}; do
  N=$((1 << K))
  echo -n "$((N)): "
  python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, $N)))" | 
    rustc - -Z time-passes | 
    rg --color never 'match'
done
256: time: 0.001; rss: 88MB	match_checking
512: time: 0.004; rss: 89MB	match_checking
1024: time: 0.013; rss: 89MB	match_checking # huh, rss is almost constant up to here
2048: time: 0.046; rss: 93MB	match_checking
4096: time: 0.179; rss: 96MB	match_checking
8192: time: 0.715; rss: 103MB	match_checking
16384: time: 3.152; rss: 118MB	match_checking
32768: time: 11.660; rss: 148MB	match_checking
65536: time: 55.046; rss: 206MB	match_checking

@jonas-schievink jonas-schievink added the A-patterns Relating to patterns and pattern matching label Sep 5, 2020
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 24, 2020
Add fast path for match checking

This adds a fast path that would reduce the complexity to linear on matches consisting of only variant patterns (i.e. enum matches). (Also see: rust-lang#7462) Unfortunately, I was too lazy to add a similar fast path for constants (mostly for integer matches); ideally that could be added another day.

TBH, I'm not confident about the performance claims, given that enums tend to be small and FxHashMap could add a lot of overhead.

r? `@Mark-Simulacrum`

needs perf
@Nadrieril (Member)

Nadrieril commented Nov 6, 2020

The match exhaustiveness algorithm is indeed completely quadratic: for every branch, we check whether it is redundant with any of the branches above. We could hack in workarounds for some special cases, like #76918 did, but those usually stop being useful for non-trivial patterns, and I'm not sure they're worth it, since most matches have only a few branches. (I would love some data to confirm that; if you want to help, see rust-lang/rustc-perf#792)
A real solution would involve a pretty drastic redesign of the algorithm. I have been thinking about that in my spare time, but as @ishitatsuyuki mentioned, it's not easy.

flip1995 pushed a commit to flip1995/rust that referenced this issue Jul 15, 2021
…se-expr-fp, r=camsteffen

FP fix and documentation for `branches_sharing_code` lint

Closes rust-lang/rust-clippy#7369

Related: rust-lang/rust-clippy#7452. I'm still thinking about the best way to fix this. I could simply add another visitor to ensure that the moved expressions don't modify values used in the condition, but I'm not totally happy with that due to the complexity, so I've only documented it for now.

changelog: [`branches_sharing_code`] fixed false positive where block expressions would sometimes be ignored.
@Nadrieril (Member)

It's 2023 and -Z time-passes no longer has translation or match-checking. The only thing that's affected by huge matches is MIR_borrow_checking, which I assume includes match checking and lowering. If I run:

for K in {8..16}; do
  N=$((1 << K))
  echo -n "$((N)): "
  python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |
    rustc - -Z time-passes 2>&1 |
    rg --color never 'MIR_borrow_checking'
done

with rustc 1.76.0-nightly (87e1447aa 2023-11-30) I get

256: time:   0.002; rss:   58MB ->   62MB (   +3MB)     MIR_borrow_checking
512: time:   0.005; rss:   59MB ->   64MB (   +5MB)     MIR_borrow_checking
1024: time:   0.014; rss:   60MB ->   65MB (   +5MB)    MIR_borrow_checking
2048: time:   0.049; rss:   62MB ->   69MB (   +7MB)    MIR_borrow_checking
4096: time:   0.161; rss:   65MB ->   76MB (  +12MB)    MIR_borrow_checking
8192: time:   0.599; rss:   72MB ->   92MB (  +20MB)    MIR_borrow_checking
16384: time:   2.309; rss:   85MB ->  110MB (  +25MB)   MIR_borrow_checking
32768: time:   9.270; rss:  114MB ->  161MB (  +47MB)   MIR_borrow_checking
65536: time:  48.641; rss:  154MB ->  242MB (  +88MB)   MIR_borrow_checking

which is still quadratic. I'd be curious to know whether it's still match checking that drives this issue; how could I measure that?

If it is indeed match checking, then I'd like to close this as wontfix. I've done what I could to reduce this quadratic behavior (most notably #117611), and I don't believe tackling it further would be worth it for typical matches (I have ideas, but they would require a lot of allocations and increased code complexity).

@ishitatsuyuki (Contributor)

I think it's fine to close this as wontfix. Also, thanks for your work on rewriting the check passes! It's quite a complicated problem, and I was surprised you were able to make it quite a bit faster.

@Nadrieril (Member)

Thank you! I'll close it then, and someone can reopen if they feel it's relevant

@Nadrieril Nadrieril closed this as not planned Won't fix, can't repro, duplicate, stale Dec 1, 2023