
Massive compiler perf regression #31157

Closed
Gankra opened this issue Jan 24, 2016 · 27 comments
Assignees
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Gankra
Contributor

Gankra commented Jan 24, 2016

First reported on Reddit here. See that post for more details and examples.

This crate has gone from a build time of 17 seconds with the compiler consuming 168MB of RAM at peak to 320 seconds and 1670MB. Apparently this regression occurred somewhere between Jan 15 and Jan 18, and is present in the current beta. The author hypothesizes that it may have to do with complex usage of traits.

@Gankra Gankra added I-slow Issue: Problems and improvements with respect to performance of generated code. A-compiler labels Jan 24, 2016
@dirk
Contributor

dirk commented Jan 24, 2016

Quoting a comment from Reddit (see here) so it doesn't get lost:

Paging /u/arielby and /u/nikomatsakis, they always know what's going on in the type checker

EDIT: Does the Jan 18 nightly contain the obligation forest (#30533), by any chance?

@aturon
Member

aturon commented Jan 24, 2016

cc @nikomatsakis @arielb1

@White-Oak
Contributor

+1 on this (for reference, output of `cargo rustc -- -Z time-passes`). As for complex trait usage (example, once again, here): it has always been slow, but at an acceptable rate. The author of the library even mentions this in the README.md:

Be aware, due to an ongoing issue with rustc, the compile time of your code will exponentially increase with the complexity of your parsers. In practice I've found things get bad after about 10 combinations or so. You can get around this by boxing a parser...

The issue he mentions is #22204. The current issue may be considered a duplicate of that one, though, again, the problem was never of this magnitude.

The author suggests boxing as a workaround:

This "flattens" the type signature of the parser into a trait object, which will improve compile time at the cost of runtime performance due to dynamic dispatch.

But, once again, it no longer helps much.
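The boxing workaround quoted above can be seen in a minimal sketch (the combinator names here are illustrative, not the actual `peruse` API): every composition layer such as `Map` adds a level of generic nesting to the parser's type, and boxing erases that nesting behind a single trait-object type.

```rust
// Minimal combinator sketch; `Parser`, `Map`, and `Digit` are illustrative
// stand-ins, not the real library types.
trait Parser {
    type Input;
    type Output;
    fn parse(&self, input: Self::Input) -> Self::Output;
}

// A concrete combinator whose type grows with each composition step:
// composing n times yields Map<Map<...Map<Digit, _>...>, _>.
struct Map<P, F>(P, F);

impl<P, F, T> Parser for Map<P, F>
where
    P: Parser,
    F: Fn(P::Output) -> T,
{
    type Input = P::Input;
    type Output = T;
    fn parse(&self, input: Self::Input) -> T {
        (self.1)(self.0.parse(input))
    }
}

struct Digit;
impl Parser for Digit {
    type Input = char;
    type Output = u32;
    fn parse(&self, c: char) -> u32 {
        c.to_digit(10).unwrap_or(0)
    }
}

fn main() {
    // Unboxed: the full nested type is spelled out in the signature
    // the compiler has to work with.
    let nested = Map(Map(Digit, |n: u32| n + 1), |n: u32| n * 2);

    // Boxed: the nesting is erased behind one trait-object type, trading
    // a virtual call at runtime for a much smaller type for rustc to track.
    let flat: Box<dyn Parser<Input = char, Output = u32>> = Box::new(nested);
    println!("{}", flat.parse('4')); // (4 + 1) * 2 = 10
}
```

The point of this issue is that even this flattened form stopped being enough to keep compile times acceptable after the regression.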

@White-Oak
Contributor

Also, this may be considered a regression from stable to beta:
Code compiles rather quickly on stable 1.6 but slowly on beta 1.7. I went further and found the exact nightlies whose compilation times differ.

@sfackler sfackler added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 24, 2016
@gereeter
Contributor

Note that the obligation forest PR was merged on January 16, so it did get merged in the window in which the regression happened.

@nikomatsakis
Contributor

Seems pretty likely to have to do with the obligation forest and also with the change in caching behavior that accompanied it. I can investigate some.

@MagaTailor

Is the syntex_syntax issue related? It was always bad, but now it's definitely worse.

@nikomatsakis
Contributor

I can certainly confirm that this is traits-related; at least, the crate in question spends about 85% of its compilation time in middle::traits. I'm looking more in depth now.

@nikomatsakis
Contributor

@petevine unlikely to be related to that, but I don't know

@nikomatsakis
Contributor

Seems pretty clear that the caching of sized results is no longer kicking in like it once did. This probably affects the script crate from servo too.

@nikomatsakis
Contributor

e.g., we repeatedly prove that T: Sized for the following types T. Here, the first number is the number of times we have proved it in my current logs. Note that compilation did not yet complete. :)

  79297 <*const alloc::rc::RcBox<fn() -> Box<peruse::parsers::Parser<I=[grammar_lexer::Token], O=grammar::Expr>> {parser::program::expression}>
  81467 <core::marker::PhantomData<alloc::rc::RcBox<fn() -> Box<peruse::parsers::Parser<I=[grammar_lexer::Token], O=grammar::Expr>> {parser::program::expression}>>
  81467 <core::nonzero::NonZero<*const alloc::rc::RcBox<fn() -> Box<peruse::parsers::Parser<I=[grammar_lexer::Token], O=grammar::Expr>> {parser::program::expression}>>
  82807 <core::ptr::Shared<alloc::rc::RcBox<fn() -> Box<peruse::parsers::Parser<I=[grammar_lexer::Token], O=grammar::Expr>> {parser::program::expression}>>
  82984 <peruse::parsers::RecursiveParser<[grammar_lexer::Token], grammar::Expr, fn() -> Box<peruse::parsers::Parser<I=[grammar_lexer::Token], O=grammar::Expr>> {parser::program::expression}>
  82999 <alloc::rc::Rc<fn() -> Box<peruse::parsers::Parser<I=[grammar_lexer::Token], O=grammar::Expr>> {parser::program::expression}>
 132273 <fn() -> Box<peruse::parsers::Parser<I=[grammar_lexer::Token], O=grammar::Expr>> {parser::program::expression}
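Every type in that log wraps the same function-item type. A minimal reproduction of the shape (illustrative names, not the original crate) shows where such a type comes from: a recursive grammar is tied together with an `Rc`'d factory function, so the factory's unique zero-sized fn-item type ends up nested inside `Rc`, `RcBox`, `NonZero`, `PhantomData`, and so on, with each layer needing its own `T: Sized` proof.

```rust
use std::rc::Rc;

// Illustrative stand-ins for the parser trait and a leaf parser.
trait Parser {
    type O;
    fn run(&self) -> Self::O;
}

struct Lit(i64);
impl Parser for Lit {
    type O = i64;
    fn run(&self) -> i64 { self.0 }
}

// The factory function for a recursive rule. Its fn-item type,
// `fn() -> Box<dyn Parser<O = i64>> {expression}`, is unique to this
// function, just like `{parser::program::expression}` in the log above.
fn expression() -> Box<dyn Parser<O = i64>> {
    Box::new(Lit(42))
}

fn main() {
    // Wrapping the factory in Rc buries its fn-item type under all of
    // Rc's internal layers, each of which generates Sized obligations.
    let factory: Rc<fn() -> Box<dyn Parser<O = i64>>> = Rc::new(expression);
    println!("{}", (*factory)().run()); // 42
}
```

Without caching, the same `T: Sized` proof is redone for each occurrence of each layer, which matches the tens of thousands of repetitions in the log.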

@nikomatsakis
Contributor

triage: P-high

@rust-highfive rust-highfive added P-high High priority and removed I-nominated labels Jan 28, 2016
bors added a commit that referenced this issue Feb 5, 2016
…he, r=aturon

Have the `ObligationForest` keep some per-tree state (of type `T`) and have it give out a mutable reference for use when processing obligations. In this case, it will be a hashmap. This obviously affects the work that @soltanmm has been doing on snapshotting. I partly want to toss this out there for discussion.

Fixes #31157. (The test in question goes to approx. 30s instead of 5 minutes for me.)
cc #30977.
cc @aturon @arielb1 @soltanmm

r? @aturon who reviewed original `ObligationForest`
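The mechanism described in that commit message, threading per-tree state of type `T` through obligation processing, can be sketched generically (a hypothetical simplification, not rustc's actual `ObligationForest` API):

```rust
use std::collections::HashMap;

// Hypothetical simplification: each tree of obligations carries its own
// state `T` (here used as a cache), handed out mutably while that
// tree's obligations are processed.
struct ObligationTree<O, T> {
    pending: Vec<O>,
    cache: T,
}

struct Forest<O, T> {
    trees: Vec<ObligationTree<O, T>>,
}

impl<O, T> Forest<O, T> {
    fn process_obligations<F>(&mut self, mut op: F)
    where
        F: FnMut(&O, &mut T) -> bool, // true = proved, drop the obligation
    {
        for tree in &mut self.trees {
            let cache = &mut tree.cache;
            tree.pending.retain(|ob| !op(ob, cache));
        }
    }
}

fn main() {
    // Memoize "is this type Sized?" per tree, so the same query is not
    // re-proved tens of thousands of times as in the logs above.
    let mut forest = Forest {
        trees: vec![ObligationTree {
            pending: vec!["Rc<T>: Sized", "Rc<T>: Sized", "fn(): Sized"],
            cache: HashMap::<&str, bool>::new(),
        }],
    };
    let mut proofs_done = 0;
    forest.process_obligations(|ob, cache| {
        *cache.entry(*ob).or_insert_with(|| {
            proofs_done += 1; // expensive trait selection happens only once
            true
        })
    });
    println!("distinct proofs: {}", proofs_done); // 2, not 3
}
```

Scoping the cache to a tree (rather than sharing one global cache) is what interacts with the snapshotting work mentioned in the message: per-tree state can be discarded along with its tree.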
@White-Oak
Contributor

So, since this was fixed: is the fix going to be in the next nightly?

@nikomatsakis
Contributor

Ought to be, yes.


@White-Oak
Contributor

@nikomatsakis excuse me for bothering again.

  1. What about beta? Should this fix be backported to it? As stated in the first comment, this regression is in beta as well.
  2. Well, for me the time went from 40 secs (2016-01-15) to 56 secs (current nightly: 2016-02-07), which is almost a 30% regression, but I guess that's the best that could be done.

@nikomatsakis
Contributor

@White-Oak

What about beta? Should this fix be backported to it? As stated in the first comment, this regression is in beta as well.

Good point. I've nominated the PR for backporting.

Well, for me the time went from 40 secs (2016-01-15) to 56 secs (current nightly: 2016-02-07), which is almost a 30% regression, but I guess that's the best that could be done.

It may be possible to improve further.

@Marwes
Contributor

Marwes commented Feb 17, 2016

Is there any work being done towards improving this further? combine still suffers badly from this change, to the point that Travis times out when building it (see https://travis-ci.org/Marwes/combine).

@White-Oak
Contributor

@Marwes it has not been backported to beta yet.

@bluss
Member

bluss commented Feb 17, 2016

That Travis link shows a regression in cargo build time from 96 seconds on stable to 374 seconds on nightly. Can we keep an issue open for that, or is there no way to recover the compilation time?

Edit: Ok, that's lopsided since nightly runs benchmarking and things. I guess those timings are not usable.

@Marwes
Contributor

Marwes commented Feb 17, 2016

Here is a link to a build from before the regression; there is still a significant slowdown. And even on nightly it fails occasionally: https://travis-ci.org/Marwes/combine/jobs/109742755.

@nikomatsakis nikomatsakis reopened this Feb 18, 2016
@nikomatsakis
Contributor

I'm willing to re-open the issue on the basis that there is still ground to get back, even if the worst part is solved.

@nikomatsakis
Contributor

triage: P-medium

@rust-highfive rust-highfive added P-medium Medium priority and removed P-high High priority labels Feb 18, 2016
@nikomatsakis
Contributor

However, I'm lowering the priority to medium.

@sanxiyn sanxiyn added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels Mar 23, 2016
@brson
Contributor

brson commented Oct 20, 2016

Needs to be retested, and a test case added to perf.rlo. cc @nrc @Mark-Simulacrum.

@Mark-Simulacrum
Member

I'll add a test case to perf.rlo later, but currently `time -v` yields:

$ /usr/bin/time -v cargo build
   Compiling regex-syntax v0.2.2
   Compiling libc v0.2.5
   Compiling memchr v0.1.7
   Compiling aho-corasick v0.4.0
   Compiling regex v0.1.48
   Compiling peruse v0.3.0 (https://github.com/DanSimon/peruse.git#dbfc0054)
   Compiling parser v0.1.0 (file:///home/mark/Edit/rust-compilation-time-ram-regression)
    Finished debug [unoptimized + debuginfo] target(s) in 9.63 secs
    Command being timed: "cargo build"
    User time (seconds): 10.44
    System time (seconds): 0.31
    Percent of CPU this job got: 108%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.91
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 150980
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 122458
    Voluntary context switches: 101
    Involuntary context switches: 42
    Swaps: 0
    File system inputs: 0
    File system outputs: 66112
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

So I think this issue can be considered fixed. Can someone clarify exactly what part of the compiler was being stressed? I can add this crate to perf.rlo no problem, but perhaps something better can be designed?

@White-Oak
Contributor

White-Oak commented Oct 21, 2016

I reported this originally, and now I see the same numbers as @Mark-Simulacrum. I've tested it with several nightlies over the year, and it kept getting better and better (from the original 30 secs to 20, 15, 12, and finally 10 as of today). Feels great. Thanks to the Rust team for the effort put into compile-speed optimizations!

This exact issue with the example crate may be closed as solved; however, the original drop in performance was caused by the introduction of the obligation forest, as explained above by @nikomatsakis.
The last I heard, it was reverted on beta in #31851. I hope someone can clarify further, though I'm afraid the trail may be lost.

Edit: Actually, the crate itself takes only 5 seconds to build, which is awesome.

@Mark-Simulacrum
Member

Closing in favor of rust-lang-deprecated/rustc-perf-collector#2 which tracks adding a test for this.

kennytm added a commit to kennytm/rust that referenced this issue Aug 10, 2018
Consider changing assert! to debug_assert! when it calls visit_with

The perf run from rust-lang#52956 revealed that there were 3 benchmarks that benefited most from changing `assert!`s to `debug_assert!`s:

- issue rust-lang#46449: avg -4.7% for -check
- deeply-nested (AKA rust-lang#38528): avg -3.4% for -check
- regression rust-lang#31157: avg -3.2% for -check

I analyzed their fixing PRs and decided to look for potentially heavy assertions in the files they modified. I noticed that all of the non-trivial ones contained indirect calls to `visit_with()`.

It might be a good idea to consider changing `assert!` to `debug_assert!` in those places in order to get the performance wins shown by the benchmarks.
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Aug 12, 2018
@pnkfelix pnkfelix mentioned this issue Nov 2, 2021