Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in 1.72 related to normalization of opaques with late-bound vars #115283

Open
kornelski opened this issue Aug 27, 2023 · 27 comments
Labels
C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance

Comments

@kornelski
Copy link
Contributor

kornelski commented Aug 27, 2023

Upgrade from 1.71 to 1.72 has made compilation time of my async-heavy actix server 350 times slower (from under 5s to 30 minutes, on a 32GB M1 Max CPU).

I've bisected it to commit a20a04e (#113108).

The slowdown is still in the latest nightly (1.74.0 69e97df 2023-08-26). 1.72 and the nightly spend 30% of all compilation time in Interners::intern_ty, mainly called by <rustc_middle[b15c2eca62e7285b]::ty::Ty as rustc_type_ir[f43ddb3fb00a443c]::fold::TypeSuperFoldable<rustc_middle[b15c2eca62e7285b]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle[b15c2eca62e7285b]::ty::generic_args::ArgFolder> and <rustc_middle[b15c2eca62e7285b]::ty::Ty as rustc_type_ir[f43ddb3fb00a443c]::fold::TypeSuperFoldable<rustc_middle[b15c2eca62e7285b]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_trait_selection[961e83e42176765a]::traits::project::AssocTypeNormalizer>.

The reduced code is in: https://gitlab.com/lib.rs/main/-/commits/reproducer

outdated info Apologies for a big spaghetti reproducer. Let me know if it'd be useful to reduce it.
git clone --recursive https://gitlab.com/lib.rs/main
cd main
git reset --hard 81c612b
cargo build -p crates-server

I suspect the culprit is this function, which is used in several places in server/src/main.rs:

async fn with_file_cache<F: Send>(state: &AServerState, cache_file_name: &str, cache_time: u32, allow_stale: bool, generate: F) -> Result<Rendered, anyhow::Error>
    where F: Future<Output=Result<Rendered, anyhow::Error>> + 'static {

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged

@kornelski kornelski added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Aug 27, 2023
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-untriaged Untriaged performance or correctness regression. labels Aug 27, 2023
@compiler-errors
Copy link
Member

Yeah, I'd appreciate if you could minimize it. @Dirbaio helped me with a similar regression that was fixed in #114948, so I wonder if there's some other missing normalization that needs to be added somewhere...

@compiler-errors
Copy link
Member

In the mean time, I'll try to look at the flamegraphs for the code you shared...

@compiler-errors
Copy link
Member

@kornelski: Is there another step I need to do in this build? 😅

error: couldn't read front_end/src/../../style/public/critical.css: No such file or directory (os error 2)
   --> front_end/src/front_end.rs:101:53
    |
101 |         let data = self.critical_css_data.unwrap_or(include_str!("../../style/public/critical.css"));
    |                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@kornelski
Copy link
Contributor Author

kornelski commented Aug 27, 2023

Thank you for checking it out — sorry about the build error. I though I removed all of these. CSS is irrelevant here, can be an empty string.

https://gitlab.com/lib.rs/main/-/commits/reproducer

@compiler-errors
Copy link
Member

I'm finding several main sources of slowness here:

  1. Monomorphization causing us to repeatedly normalize opaques that show up in MIR bodies -- enable RevealAll pass always #115299 causes the compile time to go from 28 minutes to 11 minutes, for example.
  2. We're repeatedly checking FnOnce obligations for the FnDef1 types of async fns. This leads us to repeatedly compute the signatures of these async fns and then normalize their output futures over and over again.

Footnotes

  1. The ZST corresponding to each function item.

@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 29, 2023
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 29, 2023
@pnkfelix pnkfelix added the I-compiletime Issue: Problems and improvements with respect to compile times. label Aug 31, 2023
@darkprokoba
Copy link

I'm probably not helping much, but I have a warp app with a decent number of warp filters and I'm typically seeing about 16s for incremental builds with 1.71 and around 10 minutes for the same incremental build with 1.72.
I can't share my source code, but I'll be happy to test a potential fix in nightly if/when available...

@apiraino
Copy link
Contributor

apiraino commented Sep 1, 2023

@darkprokoba Warp (the web framework) .filter() mechanism has a history of being pretty sensible to perf. changes, especially when you start nesting them deeply - that's the way the framework is intended to be used.

Last time I checked, one workaround to dramatically speed up compile times was to add .boxed() to the filters. My memory could be failing me, so (if you haven't already) try having a look at the Warp issue tracker, you might find some useful tips.

@darkprokoba
Copy link

I'm already using .boxed() for my debug builds...

@lqd
Copy link
Member

lqd commented Sep 1, 2023

For any issues with 1.72.0, it would be good to check with beta and nightly as well. #114948 has fixed some 1.72.0 regressions already, for example. If the issues still happen on nightly, then some details on how to reproduce the regressions will also be very useful.

@zachs18
Copy link
Contributor

zachs18 commented Sep 1, 2023

At https://github.com/zachs18/zbus-repro/tree/zachs18-minimized I have a similar perf regression using zbus's async API (somewhat minimized from @rgwood ) that I think is related (I bisected to the same commit; -Zself-profile shows that the extra time is spent in evaluate_obligation). Each additional set of functions of the following form makes compliation take ~1s longer on my machine.

// this `async fn` is important, inlining the body into an `async` block in its use below does not cause perf issue
async fn asyncfn() {
   let _ = zbus::Connection::session().await;
}
pub fn box_pin_asyncfn() {
  // this binding is important, inlining the variable into its use below does not cause perf issue
  let future = asyncfn();
  let _ = Box::pin(future);
}

This is reproducible on the latest nightly (rustc 1.74.0-nightly (2f5df8a94 2023-08-31))

@lqd
Copy link
Member

lqd commented Sep 2, 2023

Note for people doing triage:

  • enable RevealAll pass always #115299 doesn't help with the zbus-repro above.
  • (I'm not sure the lib.rs/main reproducer in the OP builds successfully anymore, there are now errors in front_end)

@darkprokoba
Copy link

@lqd
I tried with rustc 1.74.0-nightly (9f5fc1b 2023-09-02)
There was no improvement compared to 1.72, compile times remain much worse than 1.71

@lsunsi
Copy link

lsunsi commented Sep 3, 2023

I can't provide a reproducible example, but I have a medium-to-large project and I'm seeing the exact same issue. I held off upgrading the project to 1.72 due to 5x compile time overall. For what it's worth, it seems the issue is more connected to the 'diesel' part of our project, which has a lot of proc-macros. I can't say for sure it's related, but it seems like it.

@kornelski
Copy link
Contributor Author

kornelski commented Sep 3, 2023

I've found that removing of #[tracing::instrument] macros gives a huge speed improvement for my server. Expansion of the macro is interesting. It includes type inference for return type of an async block.

    let __tracing_instrument_future = async move {
        if false {
            let __tracing_attr_fake_return: Result<HttpResponse, ServerError> = {
                ::core::panicking::panic_fmt(
                    format_args!(
                        "internal error: entered unreachable code: {0}",
                        format_args!(
                            "this is just for type inference, and is unreachable code",
                        ),
                    ),
                );
            };
            return __tracing_attr_fake_return;
        }
        // …
    }

Here's a branch with a lot of the code cut out:

https://gitlab.com/lib.rs/main/-/commits/reproducer

@kornelski
Copy link
Contributor Author

kornelski commented Sep 5, 2023

As of rustc 1.74.0-nightly (ec08a03 2023-09-04) and https://gitlab.com/lib.rs/main/-/commit/422f6790dd7b6c0798a8dd9cc10ca6f22fbe3de6

the heaviest stack trace is:

  • 128087 <rustc_middle[f5a94f44b73603e9]::ty::context::CtxtInterners>::intern_ty
  • 83592 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle[f5a94f44b73603e9]::ty::generic_args::ArgFolder>
  • 28390 rustc_middle[f5a94f44b73603e9]::ty::util::fold_list::<rustc_middle[f5a94f44b73603e9]::ty::generic_args::ArgFolder, rustc_middle[f5a94f44b73603e9]::ty::generic_args::GenericArg, <&rustc_middle[f5a94f44b73603e9]::ty::list::List<rustc_middle[f5a94f44b73603e9]::ty::generic_args::GenericArg> as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_fold_with<rustc_middle[f5a94f44b73603e9]::ty::generic_args::ArgFolder>::{closure#0}>
  • 28390 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle[f5a94f44b73603e9]::ty::generic_args::ArgFolder>
  • 26034 <rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFolder<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::fold_ty
  • 20243 <&rustc_middle[f5a94f44b73603e9]::ty::list::List<rustc_middle[f5a94f44b73603e9]::ty::Ty> as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 20243 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 20243 <&rustc_middle[f5a94f44b73603e9]::ty::list::List<rustc_middle[f5a94f44b73603e9]::ty::generic_args::GenericArg> as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 20243 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 20134 <rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFolder<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::fold_ty
  • 10835 <&rustc_middle[f5a94f44b73603e9]::ty::list::List<rustc_middle[f5a94f44b73603e9]::ty::generic_args::GenericArg> as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 10835 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 7311 <&rustc_middle[f5a94f44b73603e9]::ty::list::List<rustc_middle[f5a94f44b73603e9]::ty::Ty> as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 7311 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 7311 <&rustc_middle[f5a94f44b73603e9]::ty::list::List<rustc_middle[f5a94f44b73603e9]::ty::generic_args::GenericArg> as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 7311 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 7302 <rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFolder<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::fold_ty
  • 3751 <&rustc_middle[f5a94f44b73603e9]::ty::list::List<rustc_middle[f5a94f44b73603e9]::ty::Ty> as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>
  • 3751 <rustc_middle[f5a94f44b73603e9]::ty::Ty as rustc_type_ir[e05eb7d4e822ea18]::fold::TypeSuperFoldable<rustc_middle[f5a94f44b73603e9]::ty::context::TyCtxt>>::try_super_fold_with::<rustc_trait_selection[c684520e355ad753]::traits::project::AssocTypeNormalizer>

@kornelski
Copy link
Contributor Author

kornelski commented Sep 5, 2023

I think I have another clue: async calls across crates seem to be particularly expensive. I've been deleting thousands of lines of code from other parts the project without noticeable speed improvement, but when I removed all calls from server to front_end crate, compilation speed dropped to the normal 2 seconds! Adding back just a couple of calls makes it jump to 1 minute:

https://gitlab.com/lib.rs/main/-/commits/reproducer

@compiler-errors
Copy link
Member

Cross-crate versus local async calls shouldn't matter at this point in the compilation. The problem here has to do with the depth of futures (futures that contain many futures inside of them are problematic). Probably by removing a cross-crate future, you're just pruning it and the many many successive futures that it contains.

zacharyhamm added a commit to systeminit/si that referenced this issue Sep 8, 2023
Rust 1.72.0 introduced compile time regressions for async heavy code:

rust-lang/rust#115283

Reverting to 1.71.0 gets us back to reasonable compile times. We should
stay there until a fix is released.
si-bors-ng bot added a commit to systeminit/si that referenced this issue Sep 8, 2023
2734: fix(*): pin rust to 1.71.0 until compile time regression fixed r=zacharyhamm a=zacharyhamm

Rust 1.72.0 introduced compile time regressions for async heavy code:

rust-lang/rust#115283

Reverting to 1.71.0 gets us back to reasonable compile times for sdf. We should stay there until a fix is released.

Co-authored-by: Zachary Hamm <zack@systeminit.com>
@darkprokoba
Copy link

I don't see any improvement in my compile times with neither 1.72.1 nor 1.74.0-nightly (bdb0fa3 2023-09-19)

@apiraino
Copy link
Contributor

@darkprokoba as per your previous comment you seem to be using the Warp web framework. As mentioned before, would it be possible to support your case with a minimal reproducible, even a small Warp application showing the regression? It could help understand if the slowdown you experience is connected to this issue or perhaps to something else.

@darkprokoba
Copy link

darkprokoba commented Sep 27, 2023

would it be possible to support your case with a minimal reproducible

So I tried to do this with a dummy warp application with 600 filters, each with its own async handler function. I see no difference in compilation times between 1.71.1 and 1.72.2 :-(
Now my real app has around 100 filters, but unlike my dummy app my actual handler functions typically invoke tons of additional async functions and this goes many levels deep and across a dozen crates.

I have no idea how to arrive at a minimal reproducible case for this problem. Any pointers would be much appreciated. Can I run the compiler thru a profiler and post the gathered data somewhere, like explained here: https://rustc-dev-guide.rust-lang.org/profiling/with_perf.html

Anything else I should try?

@darkprokoba
Copy link

Great news, 1.74.0-nightly (5ae769f 2023-09-26) sees to be slightly faster than 1.71.1 :-)

Not sure who fixed this or when, but kudos to them!

@kornelski
Copy link
Contributor Author

I'm also seeing big improvement in the latest nightly: compilation now takes 6s-16s, instead of minutes. In 1.71.1 it takes 3s-5s, so strictly speaking it's still a couple of times slower, but it's usable now.

It might be thanks to one of these commits (I'm unsure if cargo-bisect-rustc with an inverted condition was accurate):
3050938 13e6f24

@compiler-errors
Copy link
Member

I wonder what fixed it -- maybe drop_tracking_mir getting stabilized?

@kornelski
Copy link
Contributor Author

kornelski commented Sep 28, 2023

I've created a minimal example (maybe this could be added to Rust's perf tests?)

It requires interleaving async and closures, and a lifetime in opaque types.

use core::future::Future;

fn main() {
    // let a = String::new(); // fast
    let a = &String::new(); // slow

    let b = async_wrap2(async_wrap2(async_wrap2(async_wrap2(async_wrap2(async_wrap2(async {
        let _d = vec![a];
    }))))));
    let c = closure_wrap(|| b);
    _ = async { c().await }
}

fn closure_wrap<R>(f: impl FnOnce() -> R) -> impl FnOnce() -> R {
    || f()
}

async fn async_wrap2<F, R>(f: F) -> R
where
    F: Future<Output = R>,
{
    (|| async { f.await })().await
}

@lsunsi
Copy link

lsunsi commented Oct 6, 2023

On my side, on 1.72, 1.73 and current nightly it's still the same 4x from 1.71.
Is there any profiling command I could run in order to help out the investigation?

I ran self-profile, and summarize it (as per https://fasterthanli.me/articles/why-is-my-rust-build-so-slow), and I could see that on 1.71, the top 2 are evaluate_obligation (19.31s) and typeck (17.86s). On nightly, the top ones are layout_of (170.49s!), mir_built (94.55s!), check_liveness (93.45s), then evaluate_obligation (14.89s) and typeck (14.61s).

So it seems like previous slowest steps got faster, but these 3 steps that were previously on the milliseconds, now take over 7x the time of the total previous build.

@wesleywiser wesleywiser added the WG-compiler-performance Working group: Compiler Performance label Oct 6, 2023
@lsunsi
Copy link

lsunsi commented Oct 14, 2023

Upon further investigation, I see that the issue is related in some way with: diesel-rs/diesel#3223 on my project. Some bad interaction from diesel with some change in the compiler from 1.72-onwards until current nightly

rgwood added a commit to rgwood/systemctl-tui that referenced this issue Jan 26, 2024
rgwood added a commit to rgwood/systemctl-tui that referenced this issue Jan 27, 2024
@wesleywiser wesleywiser added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. P-medium Medium priority and removed P-high High priority labels Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance
Projects
None yet
Development

No branches or pull requests