6x slower compilation for "simple" file than C #43578
These need to be inlined across crates to avoid showing up as one-instruction functions in profiles! In the benchmark from rust-lang#43578 this decreased the translation item collection step from 30s to 23s, and looks like it also allowed vectorization elsewhere of the operations!
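To illustrate the kind of change described above, here is a minimal sketch of a bit set whose tiny modification methods carry `#[inline]`. The `BitSet` type and its method names are hypothetical and are not rustc's actual data structure. The assumption is that, at the time of this issue, small non-generic functions were generally not inlined across crate boundaries unless marked this way.

```rust
/// A hypothetical fixed-size bit set, loosely modeled on the kind of
/// helper a compiler might use in dataflow passes. Not rustc's real type.
pub struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Creates a set able to hold `num_bits` bits, all initially unset.
    pub fn new(num_bits: usize) -> Self {
        BitSet { words: vec![0; (num_bits + 63) / 64] }
    }

    /// Sets `bit` and returns true if it was previously unset.
    /// `#[inline]` makes the body available for inlining in other crates.
    #[inline]
    pub fn insert(&mut self, bit: usize) -> bool {
        let (word, mask) = (bit / 64, 1u64 << (bit % 64));
        let old = self.words[word];
        self.words[word] = old | mask;
        old & mask == 0
    }

    /// Returns true if `bit` is set.
    #[inline]
    pub fn contains(&self, bit: usize) -> bool {
        (self.words[bit / 64] >> (bit % 64)) & 1 == 1
    }

    /// ORs `other` into `self` word by word; returns true if `self` changed.
    /// A simple loop like this is a candidate for vectorization once inlined.
    #[inline]
    pub fn union_with(&mut self, other: &BitSet) -> bool {
        let mut changed = false;
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            let new = *a | *b;
            changed |= new != *a;
            *a = new;
        }
        changed
    }
}
```

Without the attribute, each call from another crate would likely show up in a profile as a separate function containing only one or two instructions of real work.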
I fixed an "obvious omission" in #43581 but I'm not sure why we're creating such massive bit sets in the first place.
This also peaks at 32GB (!) of memory. There is a massif dump here, the peak of which looks like this:
cc @arielb1 because of ElaborateDrops. Oh interesting, optimizing MIR now shows up as part of item collection because of on-demand.
…woerister rustc: Inline bitwise modification operators
Avoid exhausting stack space in dominator compression Doesn't add a test case -- I ended up running into this while playing with the generated example from rust-lang#43578, which we could do with a run-make test (to avoid checking a large code snippet into tree), but I suspect we don't want to wait for it to compile (locally it takes ~14s -- not terrible, but doesn't seem worth it to me). In practice stack space exhaustion is difficult to test for, too, since if we set the bound too low a different call structure above us (e.g., a nearer ensure_sufficient_stack call) would let the test pass even with the old impl, most likely. Locally it seems like this manages to perform approximately equivalently to the recursion, but will run perf to confirm.
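The general technique of replacing deep recursion with an explicit, heap-allocated stack can be sketched as follows. This is not the code from the change itself: `find_root` and the `parent` array are hypothetical, and the example shows only plain path compression over parent links, without the extra bookkeeping a real dominator computation would carry.

```rust
/// Follows `parent` links from `node` up to its root, then points every
/// node visited on the way directly at that root (path compression).
///
/// A recursive version would use one call frame per link and could
/// overflow the stack on a very long chain; here the visited nodes are
/// recorded in a `Vec`, so the depth is limited only by heap memory.
pub fn find_root(parent: &mut [usize], node: usize) -> usize {
    let mut path = Vec::new();
    let mut cur = node;

    // Walk upward until reaching a node that is its own parent (the root).
    while parent[cur] != cur {
        path.push(cur);
        cur = parent[cur];
    }
    let root = cur;

    // Second phase: rewrite every node on the recorded path.
    for n in path {
        parent[n] = root;
    }
    root
}
```

A chain of a million nodes, each pointing at its predecessor, is the sort of input that a large generated function can produce and that the loop above handles without growing the call stack.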
Nightly rustc takes 16 seconds, vs. 12 seconds with clang-14. Looking over a profile I don't see any clear hot spots where we're doing obviously wasted/unnecessary work, so I think we can probably close this as no longer an interesting case study.
In attempting to find some local hot spots in rustc I've been playing around with various benchmarks. If you use this script to generate a Rust and a C file (which should be equivalent)
I've benchmarked with:
The -Z time-passes output is particularly illuminating, the highest portions being:
It looks like the main slowdown of the translation item collection is related to the ElaborateDrops pass?