Match checking has quadratic average complexity #7462

graydon · 2013-06-28T20:42:55Z

If you look at the trans-stats for some of our modules, some "big functions" really stand out:

rustc:

16838 insns, 67 ms, middle::trans::foreign::trans_intrinsic
15988 insns, 145 ms, middle::typeck::check::check_expr_with_unifier
11759 insns, 87 ms, middle::privacy::check_crate

libextra: 

5244 insns, 31 ms, terminfo::parm::expand
5073 insns, 21 ms, time::do_strptime::parse_type
4068 insns, 23 ms, time::do_strftime::parse_type
3610 insns, 73 ms, terminfo::parser::compiled::parse
2169 insns, 89 ms, net_tcp::listen_common
1952 insns, 28 ms, getopts::getopts
1937 insns, 193 ms, net_tcp::connect

If you look at the code generating them, they don't immediately look like they should be taking 10x as long as other functions; but they do all seem to use match somewhat heavily. This suggests to me that we're still compiling matches in some way that causes a combinatorial explosion of basic blocks, or something.

Investigate!

The text was updated successfully, but these errors were encountered:

Currently, scopes are tied to LLVM basic blocks. For each scope, there are two new basic blocks, which means two extra jumps in the unoptimized IR. These blocks aren't actually required, but only used to act as the boundary for cleanups. By keeping track of the current scope within a single basic block, we can avoid those extra blocks and jumps, shrinking the pre-optimization IR quite considerably. For example, the IR for trans_intrinsic goes from ~22k lines to ~16k lines, almost 30% less. The impact on the build times of optimized builds is rather small (about 1%), but unoptimized builds are about 11% faster. The testsuite for unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on my i7) faster. Also, in some situations this helps LLVM to generate better code by inlining functions that it previously considered to be too large. Likely because of the pointless blocks/jumps that were still present at the time the inlining pass runs. Refs rust-lang#7462

Currently, scopes are tied to LLVM basic blocks. For each scope, there are two new basic blocks, which means two extra jumps in the unoptimized IR. These blocks aren't actually required, but only used to act as the boundary for cleanups. By keeping track of the current scope within a single basic block, we can avoid those extra blocks and jumps, shrinking the pre-optimization IR quite considerably. For example, the IR for trans_intrinsic goes from ~22k lines to ~16k lines, almost 30% less. The impact on the build times of optimized builds is rather small (about 1%), but unoptimized builds are about 11% faster. The testsuite for unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on my i7) faster. Also, in some situations this helps LLVM to generate better code by inlining functions that it previously considered to be too large. Likely because of the pointless blocks/jumps that were still present at the time the inlining pass runs. Refs #7462

emberian · 2013-08-13T06:04:42Z

/cc @toddaaro

emberian · 2013-08-13T06:05:10Z

/cc @msullivan
I don't think it was actually toddaaro I was talking to about this...

toddaaro · 2013-08-21T17:16:08Z

I have whined a lot about match implementations that suffer from combinatorial explosion, so maybe that is why you thought of me.

This Scala issue is probably unrelated, but maybe reading about the experience will prove useful to a Rust developer someday.

https://issues.scala-lang.org/browse/SI-1133

steveklabnik · 2015-01-20T19:59:02Z

Triage: did anyone determine if match was still the issue here? I'd imagine codegen has gotten better since 2013...

emberian · 2015-01-20T20:11:53Z

@steveklabnik I don't know of any changes to match codegen that would have alleviated this, re-running the measurements...

emberian · 2015-01-20T21:10:42Z

Yes, match is still generating more code than anything else, by a significant margin.

steveklabnik · 2015-01-20T22:07:58Z

@cmr if you have a moment, can you explain how you checked this out? Future triage-rs will be thankful 😄

emberian · 2015-01-20T22:35:40Z

make RUSTFLAGS="-Z count-llvm-insns" | tee output, followed by sort -n < output > output2, then look at the bottom of output2 (ctrl-f for "trans_match")

huonw · 2016-01-05T05:41:09Z

trans-ing a match appears to be quadratic in the number of arms (at least for integers and strings):

$ for N in 500 1000 2000 4000; do
    echo $N
    python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | 
        rustc - -Z time-passes |
        grep translation
  done
500
time: 0.030; rss: 65MB  translation
1000
time: 0.107; rss: 67MB  translation
2000
time: 0.427; rss: 70MB  translation
4000
time: 1.620; rss: 75MB  translation

(The python invocation prints code like fn main() { match 0 { 0 => {} 1 => {} 2 => {} 3 => {} 4 => {} _ => {} } } with $N + 1 arms.)

brson · 2016-06-10T06:36:59Z

Results of @huonw's script above with rustc 1.11.0-nightly (34505e222 2016-06-08):

500
time: 0.088; rss: 97MB  translation
1000
time: 0.308; rss: 100MB translation
2000
time: 1.247; rss: 101MB translation
4000
time: 4.723; rss: 106MB translation

With -Z orbit:

500
time: 0.009; rss: 95MB  translation
1000
time: 0.015; rss: 97MB  translation
2000
time: 0.017; rss: 101MB translation
4000
time: 0.030; rss: 104MB translation

MIR trans looks to be a lot better at this case.

arielb1 · 2016-06-10T22:55:50Z

@brson

MIR trans's algorithm is indeed very careful to generate a linear number of basic blocks.

Mark-Simulacrum · 2017-05-04T23:18:18Z

Match checking currently takes the most time, which is rather unsurprising since I think it's O(n^2) code. So I'm going to leave this open for the time being, though realistically matches with this many arms are... excessive, but I plan on taking a look at the code and seeing if it's possible to optimize it a little soon, so I'll leave it open till then.

$ for N in {10..100}; do echo -n "$((2**$N)): "; python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, 2 ** $N)))" | rustc - -Z time-passes | rg --color never 'match'; done
1024: time: 0.021; rss: 93MB	match checking
2048: time: 0.083; rss: 96MB	match checking
4096: time: 0.335; rss: 99MB	match checking
8192: time: 1.340; rss: 107MB	match checking
16384: time: 5.360; rss: 117MB	match checking
32768: time: 21.446; rss: 142MB	match checking
65536: time: 88.756; rss: 193MB	match checking
131072: time: 380.206; rss: 300MB	match checking

ishitatsuyuki · 2018-11-05T12:17:47Z

I'd like to make it clear that match checking is a NP-Complete problem, and quadratic average time is not very bad for such complex algorithm.

However, it's true that it's taking up significant portion of compile time, and we probably want to special case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.

estebank · 2019-05-09T17:49:54Z

Triage:

~$ for N in 500 1000 2000 4000; do     echo $N;     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |          rustc +devel - -Z time-passes | grep "checking 2"; done
500
  time: 0.028	misc checking 2
1000
  time: 0.087	misc checking 2
2000
  time: 0.312	misc checking 2
4000
  time: 1.220	misc checking 2

~$ for N in 500 1000 2000 4000; do     echo $N;     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |          rustc +devel - -Z time-passes | grep "match"; done
500
    time: 0.029	rvalue promotion + match checking
1000
    time: 0.087	rvalue promotion + match checking
2000
    time: 0.313	rvalue promotion + match checking
4000
    time: 1.233	rvalue promotion + match checking

pnkfelix · 2020-02-27T21:27:46Z

triage: P-medium.

This is not believed to be a high priority work-item; in practice these match checking times, even for large matches, are minuscule compared to other passes in the compiler.

Still, it might be a fun thing for a volunteer to attack.

pnkfelix · 2020-02-27T21:28:53Z

However, it's true that it's taking up significant portion of compile time, and we probably want to special case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.

(Even this simpler step would probably be both easy and beneficial!)

workingjubilee · 2020-09-05T21:44:52Z

It's 2020 and

$ for K in {8..16}; do
  N=$((1 << K))
  echo -n "$((N)): "
  python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, $N)))" | 
    rustc - -Z time-passes | 
    rg --color never 'match'
done
256: time: 0.001; rss: 88MB	match_checking
512: time: 0.004; rss: 89MB	match_checking
1024: time: 0.013; rss: 89MB	match_checking # huh, rss is almost constant up to here
2048: time: 0.046; rss: 93MB	match_checking
4096: time: 0.179; rss: 96MB	match_checking
8192: time: 0.715; rss: 103MB	match_checking
16384: time: 3.152; rss: 118MB	match_checking
32768: time: 11.660; rss: 148MB	match_checking
65536: time: 55.046; rss: 206MB	match_checking

Add fast path for match checking This adds a fast path that would reduce the complexity to linear on matches consisting of only variant patterns (i.e. enum matches). (Also see: rust-lang#7462) Unfortunately, I was too lazy to add a similar fast path for constants (mostly for integer matches), ideally that could be added another day. TBH, I'm not confident with the performance claims due to the fact that enums tends to be small and FxHashMap could add a lot of overhead. r? `@Mark-Simulacrum` needs perf

Nadrieril · 2020-11-06T20:46:27Z

The match exhaustiveness algorithm is indeed completely quadratic: for every branch, we check if it is redundant with any of the branches above. We could hack workarounds for some special cases like what #76918 did, but those usually stop being useful for non-trivial patterns, and I'm not sure they're worth it since most matches have only a few branches. (I would love some data to confirm that, if you want to help see rust-lang/rustc-perf#792)
A real solution would involve a pretty drastic redesign of the algorithm. I have been thinking about that in my spare time, but as @ishitatsuyuki mentioned, it's not easy.

…se-expr-fp, r=camsteffen FP fix and documentation for `branches_sharing_code` lint Closes rust-lang/rust-clippy#7369 Related rust-lang/rust-clippy#7452 I'm still thinking about the best way to fix this. I could simply add another visitor to ensure that the moved expressions don't modify values being used in the condition, but I'm not totally happy with this due to the complexity. I therefore only documented it for now changelog: [`branches_sharing_code`] fixed false positive where block expressions would sometimes be ignored.

Nadrieril · 2023-12-01T03:13:20Z

It's 2023 and -Z time-passes no longer has translation or match-checking. The only thing that's affected by huge matches is MIR_borrow_checking, which assume includes match checking and lowering. If I run:

for K in {8..16}; do
  N=$((1 << K))
  echo -n "$((N)): "
  python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |
    rustc - -Z time-passes 2>&1 |
    rg --color never 'MIR_borrow_checking'
done

with rustc 1.76.0-nightly (87e1447aa 2023-11-30) I get

256: time:   0.002; rss:   58MB ->   62MB (   +3MB)     MIR_borrow_checking
512: time:   0.005; rss:   59MB ->   64MB (   +5MB)     MIR_borrow_checking
1024: time:   0.014; rss:   60MB ->   65MB (   +5MB)    MIR_borrow_checking
2048: time:   0.049; rss:   62MB ->   69MB (   +7MB)    MIR_borrow_checking
4096: time:   0.161; rss:   65MB ->   76MB (  +12MB)    MIR_borrow_checking
8192: time:   0.599; rss:   72MB ->   92MB (  +20MB)    MIR_borrow_checking
16384: time:   2.309; rss:   85MB ->  110MB (  +25MB)   MIR_borrow_checking
32768: time:   9.270; rss:  114MB ->  161MB (  +47MB)   MIR_borrow_checking
65536: time:  48.641; rss:  154MB ->  242MB (  +88MB)   MIR_borrow_checking

which is still quadratic. I'd be curious to know if it's still match checking that drives this issue, how could I measure that?

If it is indeed match checking, then I'd like to close this as wontfix. I've done what I could to reduce this quadraticness (most notably #117611) and I don't believe tackling this further would be worth it for typical matches (I have ideas but they would require a lot of allocations and increased code complexity).

ishitatsuyuki · 2023-12-01T03:16:31Z

I think it's fine to close this as wontfix. Also, thanks for your work on rewriting the check passes! It's a quite complicated problem and I was surprised you were able to make it quite a bit faster.

Nadrieril · 2023-12-01T03:47:41Z

Thank you! I'll close it then, and someone can reopen if they feel it's relevant

Aatch mentioned this issue Jun 28, 2013

Improve LLVM IR Code generation #6819

Closed

dotdash mentioned this issue Jul 7, 2013

Implement scopes independent of LLVM basic blocks #7636

Closed

Mark-Simulacrum mentioned this issue May 20, 2017

Suggestion: make string/slice more efficient with match #39525

Open

Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 19, 2017

ishitatsuyuki changed the title ~~Code generated for match appears to have combinatorial problems~~ Match checking has quadratic average complexity Apr 22, 2018

This comment has been minimized.

Sign in to view

estebank mentioned this issue May 10, 2019

MIR building for match $c { true => $i, _ => $e } is slow #60571

Closed

jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 31, 2020

pnkfelix added the P-medium Medium priority label Feb 27, 2020

jonas-schievink added the A-patterns Relating to patterns and pattern matching label Sep 5, 2020

ishitatsuyuki mentioned this issue Sep 19, 2020

Add fast path for match checking #76918

Merged

Nadrieril closed this as not planned Won't fix, can't repro, duplicate, stale Dec 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Match checking has quadratic average complexity #7462

Match checking has quadratic average complexity #7462

graydon commented Jun 28, 2013

emberian commented Aug 13, 2013

emberian commented Aug 13, 2013

toddaaro commented Aug 21, 2013

steveklabnik commented Jan 20, 2015

emberian commented Jan 20, 2015

emberian commented Jan 20, 2015

steveklabnik commented Jan 20, 2015

emberian commented Jan 20, 2015

huonw commented Jan 5, 2016

brson commented Jun 10, 2016

arielb1 commented Jun 10, 2016

Mark-Simulacrum commented May 4, 2017

This comment has been minimized.

ishitatsuyuki commented Nov 5, 2018

estebank commented May 9, 2019 •

edited

Loading

pnkfelix commented Feb 27, 2020

pnkfelix commented Feb 27, 2020

workingjubilee commented Sep 5, 2020

Nadrieril commented Nov 6, 2020 •

edited

Loading

Nadrieril commented Dec 1, 2023

ishitatsuyuki commented Dec 1, 2023

Nadrieril commented Dec 1, 2023

Match checking has quadratic average complexity #7462

Match checking has quadratic average complexity #7462

Comments

graydon commented Jun 28, 2013

emberian commented Aug 13, 2013

emberian commented Aug 13, 2013

toddaaro commented Aug 21, 2013

steveklabnik commented Jan 20, 2015

emberian commented Jan 20, 2015

emberian commented Jan 20, 2015

steveklabnik commented Jan 20, 2015

emberian commented Jan 20, 2015

huonw commented Jan 5, 2016

brson commented Jun 10, 2016

arielb1 commented Jun 10, 2016

Mark-Simulacrum commented May 4, 2017

This comment has been minimized.

ishitatsuyuki commented Nov 5, 2018

estebank commented May 9, 2019 • edited Loading

pnkfelix commented Feb 27, 2020

pnkfelix commented Feb 27, 2020

workingjubilee commented Sep 5, 2020

Nadrieril commented Nov 6, 2020 • edited Loading

Nadrieril commented Dec 1, 2023

ishitatsuyuki commented Dec 1, 2023

Nadrieril commented Dec 1, 2023

estebank commented May 9, 2019 •

edited

Loading

Nadrieril commented Nov 6, 2020 •

edited

Loading