6x slower compilation for "simple" file than C #43578

alexcrichton · 2017-08-01T00:51:35Z

While trying to find some local hot spots in rustc I've been playing around with various benchmarks. If you use this script to generate a Rust and a C file (which should be equivalent):

N=200000

echo > lots-of-used.rs
for i in `seq 1 $N`; do
  echo "pub fn foo$i() {}" >> lots-of-used.rs
done
echo 'fn main() {' >> lots-of-used.rs
for i in `seq 1 $N`; do
  echo "foo$i();" >> lots-of-used.rs
done
echo '}' >> lots-of-used.rs

echo > lots-of-used.c
for i in `seq 1 $N`; do
  echo "static void foo$i() {}" >> lots-of-used.c
done
echo 'void foo() {' >> lots-of-used.c
for i in `seq 1 $N`; do
  echo "foo$i();" >> lots-of-used.c
done
echo '}' >> lots-of-used.c

I've benchmarked with:

$ rustc +beta -V
rustc 1.20.0-beta.1 (e93aa3aa8 2017-07-18)
$ sh foo.sh
$ time clang -c lots-of-used.c
clang -c lots-of-used.c  14.90s user 0.36s system 99% cpu 15.259 total
$ time rustc +beta lots-of-used.rs --emit obj --crate-type rlib
rustc +beta lots-of-used.rs --emit obj --crate-type rlib  49.62s user 37.03s system 99% cpu 1:26.96 total
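
For reference, the wall-clock totals above work out to roughly 5.7x, which is presumably where the "6x" in the title comes from. A minimal sketch of that arithmetic follows; `parse_elapsed` and `slowdown` are hypothetical helper names introduced for illustration, and the only format assumed is the `M:SS.ss` / `SS.sss` style that `time` printed in the transcript.

```rust
/// Parses an elapsed time written as "SS.sss" or "M:SS.ss" (the two shapes
/// seen in the `time` output above) into seconds.
/// Hypothetical helper for illustration only.
pub fn parse_elapsed(s: &str) -> Option<f64> {
    let mut total = 0.0;
    for part in s.split(':') {
        // Each ':'-separated field is worth 60x the one to its right.
        total = total * 60.0 + part.trim().parse::<f64>().ok()?;
    }
    Some(total)
}

/// Returns how many times slower `candidate` is than `baseline`.
pub fn slowdown(baseline: &str, candidate: &str) -> Option<f64> {
    Some(parse_elapsed(candidate)? / parse_elapsed(baseline)?)
}
```

Calling `slowdown("15.259", "1:26.96")` with the clang and rustc totals from the transcript gives a ratio of about 5.7.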

The -Z time-passes output is particularly illuminating. The most expensive passes are:

time: 57.439; rss: 2890MB translation
  time: 51.813; rss: 2497MB translation item collection
time: 10.094; rss: 534MB LLVM passes
  time: 9.238; rss: 536MB       codegen passes [1]
time: 5.100; rss: 2405MB borrow checking
time: 4.029; rss: 1656MB        item-bodies checking
time: 2.166; rss: 1781MB const checking
time: 1.543; rss: 1069MB wf checking

It looks like the main slowdown of the translation item collection is related to the ElaborateDrops pass?

alexcrichton · 2017-08-01T01:57:09Z

I fixed an "obvious omission" in #43581 but I'm not sure why we're creating such massive bit sets in the first place

alexcrichton · 2017-08-01T02:07:04Z

This also peaks at using 32GB (!) of memory, a massif dump here, the peak of which looks like this

michaelwoerister · 2017-08-01T08:34:09Z

cc @arielb1 because of ElaborateDrops.

Oh interesting, optimizing MIR now shows up as part of item collection because of on-demand.

rustc: Inline bitwise modification operators

These need to be inlined across crates to avoid showing up as one-instruction functions in profiles! In the benchmark from rust-lang#43578 this decreased the translation item collection step from 30s to 23s, and looks like it also allowed vectorization elsewhere of the operations!
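
To illustrate the kind of change that commit message describes, here is a minimal sketch of a word-backed bit set whose tiny accessors carry `#[inline]`. This is not rustc's actual bit-vector code; the `BitSet` type and its method names are hypothetical. The relevant fact is that `#[inline]` makes a non-generic function's body available to downstream crates, so calls like these can be folded into the caller's loop instead of appearing as separate one-instruction functions.

```rust
/// A minimal fixed-size bit set backed by 64-bit words.
/// Hypothetical illustration only; not rustc's real data structure.
pub struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Creates a set able to hold bits `0..num_bits`, all initially clear.
    pub fn new(num_bits: usize) -> Self {
        BitSet { words: vec![0; (num_bits + 63) / 64] }
    }

    /// Sets `bit`; returns true if it was previously clear.
    #[inline]
    pub fn insert(&mut self, bit: usize) -> bool {
        let (word, mask) = (bit / 64, 1u64 << (bit % 64));
        let old = self.words[word];
        self.words[word] = old | mask;
        old & mask == 0
    }

    /// Returns true if `bit` is set.
    #[inline]
    pub fn contains(&self, bit: usize) -> bool {
        self.words[bit / 64] & (1u64 << (bit % 64)) != 0
    }

    /// ORs `other` into `self` word by word; returns true if anything changed.
    /// A simple loop like this is the sort of code a backend may vectorize
    /// once it is visible at the call site.
    #[inline]
    pub fn union_with(&mut self, other: &BitSet) -> bool {
        assert_eq!(self.words.len(), other.words.len());
        let mut changed = false;
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            let new = *a | *b;
            changed |= new != *a;
            *a = new;
        }
        changed
    }
}
```

Whether the attribute pays off in a given case depends on the compiler version and build settings, so a profile before and after (as in the commit message) is the only reliable check.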

Avoid exhausting stack space in dominator compression

Doesn't add a test case -- I ended up running into this while playing with the generated example from rust-lang#43578, which we could do with a run-make test (to avoid checking a large code snippet into tree), but I suspect we don't want to wait for it to compile (locally it takes ~14s -- not terrible, but doesn't seem worth it to me). In practice stack space exhaustion is difficult to test for, too, since if we set the bound too low a different call structure above us (e.g., a nearer ensure_sufficient_stack call) would let the test pass even with the old impl, most likely.

Locally it seems like this manages to perform approximately equivalently to the recursion, but will run perf to confirm.
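
The general technique named in that commit title, replacing recursive path compression with an iterative one, can be sketched as follows. This is an illustration under stated assumptions, not the code from #94306: the `Forest` type, its fields, and the `key` array (standing in for something like semidominator numbers in a Lengauer-Tarjan style algorithm) are all hypothetical. A very long chain of nodes, like the one the generated benchmark produces, would recurse once per node in the classic formulation; here the path is held in a heap-allocated `Vec` instead.

```rust
/// Hypothetical link-eval forest with path compression; names are illustrative.
pub struct Forest {
    /// Parent link in the forest, `None` for roots.
    pub ancestor: Vec<Option<usize>>,
    /// Per-node value to minimise along a path.
    pub key: Vec<usize>,
    /// For each node, the node with the smallest key found on its compressed path.
    pub label: Vec<usize>,
}

impl Forest {
    /// Creates a forest of isolated nodes, each labelled with itself.
    pub fn new(key: Vec<usize>) -> Self {
        let n = key.len();
        Forest { ancestor: vec![None; n], key, label: (0..n).collect() }
    }

    /// Makes `parent` the ancestor of `child`.
    pub fn link(&mut self, parent: usize, child: usize) {
        self.ancestor[child] = Some(parent);
    }

    /// Iterative path compression: no recursion, so depth is bounded by heap, not stack.
    pub fn compress(&mut self, v: usize) {
        // Phase 1: walk upward, recording every node whose ancestor is not a root.
        let mut path = Vec::new();
        let mut cur = v;
        while let Some(a) = self.ancestor[cur] {
            if self.ancestor[a].is_none() {
                break;
            }
            path.push(cur);
            cur = a;
        }
        // Phase 2: process from the top of the path downward, which is the
        // order in which the recursive version would unwind.
        while let Some(u) = path.pop() {
            let a = self.ancestor[u].expect("nodes on the path have an ancestor");
            if self.key[self.label[a]] < self.key[self.label[u]] {
                self.label[u] = self.label[a];
            }
            self.ancestor[u] = self.ancestor[a];
        }
    }

    /// Returns the minimum-key node on the path from `v` up to (excluding) its root.
    pub fn eval(&mut self, v: usize) -> usize {
        if self.ancestor[v].is_none() {
            return v;
        }
        self.compress(v);
        self.label[v]
    }
}
```

After `eval(v)`, every node on the walked path points directly at the child-of-root's parent, so later queries on the same path are short. The trade-off is one temporary `Vec` per call in place of stack frames.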

Mark-Simulacrum · 2024-01-14T18:51:22Z

Nightly rustc takes 16 seconds, vs. 12 seconds with clang-14. Looking over a profile I don't see any clear hot spots where we're doing obviously wasted/unnecessary work, so I think we can probably close this as no longer an interesting case study.

alexcrichton added the I-compiletime (Issue: Problems and improvements with respect to compile times.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Aug 1, 2017

alexcrichton changed the title ~~3x slower compilation for "simple" file than C~~ 6x slower compilation for "simple" file than C Aug 1, 2017

alexcrichton mentioned this issue Aug 1, 2017

rustc: Inline bitwise modification operators #43581

Merged

Mark-Simulacrum added the C-enhancement (Category: An issue proposing an enhancement or a PR with one.) label Aug 3, 2017

pnkfelix added the I-compilemem (Issue: Problems and improvements with respect to memory usage during compilation.) label Apr 12, 2019

Mark-Simulacrum mentioned this issue Feb 23, 2022

Avoid exhausting stack space in dominator compression #94306

Merged

Mark-Simulacrum closed this as completed Jan 14, 2024
