6x slower compilation for "simple" file than C #43578

Closed
alexcrichton opened this issue Aug 1, 2017 · 4 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Member

In attempting to find some local hot spots in rustc I've been playing around with various benchmarks. You can use this script to generate a Rust and a C file (which should be equivalent):

N=200000

echo > lots-of-used.rs
for i in `seq 1 $N`; do
  echo "pub fn foo$i() {}" >> lots-of-used.rs
done
echo 'fn main() {' >> lots-of-used.rs
for i in `seq 1 $N`; do
  echo "foo$i();" >> lots-of-used.rs
done
echo '}' >> lots-of-used.rs

echo > lots-of-used.c
for i in `seq 1 $N`; do
  echo "static void foo$i() {}" >> lots-of-used.c
done
echo 'void foo() {' >> lots-of-used.c
for i in `seq 1 $N`; do
  echo "foo$i();" >> lots-of-used.c
done
echo '}' >> lots-of-used.c

I've benchmarked with:

$ rustc +beta -V
rustc 1.20.0-beta.1 (e93aa3aa8 2017-07-18)
$ sh foo.sh
$ time clang -c lots-of-used.c
clang -c lots-of-used.c  14.90s user 0.36s system 99% cpu 15.259 total
$ time rustc +beta lots-of-used.rs --emit obj --crate-type rlib
rustc +beta lots-of-used.rs --emit obj --crate-type rlib  49.62s user 37.03s system 99% cpu 1:26.96 total

The -Z time-passes output is particularly illuminating. The most expensive passes are:

time: 57.439; rss: 2890MB translation
  time: 51.813; rss: 2497MB translation item collection
time: 10.094; rss: 534MB LLVM passes
  time: 9.238; rss: 536MB       codegen passes [1]
time: 5.100; rss: 2405MB borrow checking
time: 4.029; rss: 1656MB        item-bodies checking
time: 2.166; rss: 1781MB const checking
time: 1.543; rss: 1069MB wf checking

https://i.imgur.com/IRBwT69.png

It looks like the main slowdown of the translation item collection is related to the ElaborateDrops pass?

@alexcrichton alexcrichton added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 1, 2017
@alexcrichton alexcrichton changed the title 3x slower compilation for "simple" file than C 6x slower compilation for "simple" file than C Aug 1, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this issue Aug 1, 2017
These need to be inlined across crates to avoid showing up as one-instruction
functions in profiles! In the benchmark from rust-lang#43578 this decreased the
translation item collection step from 30s to 23s, and looks like it also allowed
vectorization elsewhere of the operations!
@alexcrichton
Member Author

I fixed an "obvious omission" in #43581, but I'm not sure why we're creating such massive bit sets in the first place.
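
For readers unfamiliar with the kind of fix described in the commit message above, here is a minimal sketch of a bit set whose bitwise modification methods are marked `#[inline]`. The type `BitSet` and its methods are hypothetical names chosen for illustration; this is not rustc's actual bit set code. The point it illustrates is that a small non-generic method in one crate is generally not inlined into another crate unless it carries `#[inline]` (or LTO is used), so each call can show up as a tiny function in a profile.

```rust
// Hypothetical, simplified bit set; names are illustrative assumptions,
// not the types used inside rustc.
pub struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Creates a set able to hold `bits` bits, all initially clear.
    pub fn new(bits: usize) -> BitSet {
        BitSet { words: vec![0; (bits + 63) / 64] }
    }

    /// Sets bit `i`; returns true if the bit was previously clear.
    /// `#[inline]` makes the body available for inlining in other crates.
    #[inline]
    pub fn insert(&mut self, i: usize) -> bool {
        let (word, mask) = (i / 64, 1u64 << (i % 64));
        let old = self.words[word];
        self.words[word] = old | mask;
        old & mask == 0
    }

    /// Clears bit `i`; returns true if the bit was previously set.
    #[inline]
    pub fn remove(&mut self, i: usize) -> bool {
        let (word, mask) = (i / 64, 1u64 << (i % 64));
        let old = self.words[word];
        self.words[word] = old & !mask;
        old & mask != 0
    }

    /// Returns true if bit `i` is set.
    #[inline]
    pub fn contains(&self, i: usize) -> bool {
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }
}

fn main() {
    let mut set = BitSet::new(200_000);
    assert!(set.insert(123_456));
    assert!(!set.insert(123_456)); // already set, so nothing changed
    assert!(set.contains(123_456));
    assert!(set.remove(123_456));
    assert!(!set.contains(123_456));
}
```

Note that a set sized for 200,000 bits already needs 3,125 words, so allocating one such set per function body adds up quickly in a file with 200,000 functions.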

@alexcrichton
Member Author

This also peaks at 32GB (!) of memory. There's a massif dump here, the peak of which looks like this

@michaelwoerister
Member

cc @arielb1 because of ElaborateDrops.

Oh interesting, optimizing MIR now shows up as part of item collection because of on-demand.

frewsxcv added a commit to frewsxcv/rust that referenced this issue Aug 2, 2017
…woerister

rustc: Inline bitwise modification operators

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Aug 3, 2017
@pnkfelix pnkfelix added the I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. label Apr 12, 2019
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Feb 26, 2022
Avoid exhausting stack space in dominator compression

Doesn't add a test case -- I ended up running into this while playing with the generated example from rust-lang#43578, which we could do with a run-make test (to avoid checking a large code snippet into tree), but I suspect we don't want to wait for it to compile (locally it takes ~14s -- not terrible, but doesn't seem worth it to me). In practice stack space exhaustion is difficult to test for, too, since if we set the bound too low a different call structure above us (e.g., a nearer ensure_sufficient_stack call) would let the test pass even with the old impl, most likely.

Locally it seems like this manages to perform approximately equivalently to the recursion, but will run perf to confirm.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Feb 26, 2022
Avoid exhausting stack space in dominator compression

@Mark-Simulacrum
Member

Nightly rustc takes 16 seconds, vs. 12 seconds with clang-14. Looking over a profile I don't see any clear hot spots where we're doing obviously wasted/unnecessary work, so I think we can probably close this as no longer an interesting case study.
