Always format to internal String in FmtPrinter #94131
Conversation
Some changes occurred to the CTFE / Miri engine. cc @rust-lang/miri
r? @oli-obk (rust-highfive has picked a reviewer for you, use r? to override)
☀️ Try build successful - checks-actions
Queued e19edf03ef30826c4928f4a631b37e4871e43ab4 with parent b8c56fa, future comparison URL.
Finished benchmarking commit (e19edf03ef30826c4928f4a631b37e4871e43ab4): comparison URL. Summary: This benchmark run shows 32 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next steps: if you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.
While we reduce bootstrap time by 9s, these regressions seem fairly bad. I see three other solutions that could be tried:
I plan to dig into the regressions; my guess is that they're fixable along one of those lines. It may also help to preallocate some space in the buffer, since we expect to print something.
I am pretty confident that the regressions here come from inlining decisions changing slightly. Looking at cachegrind diffs, the differences don't seem to belong to formatting code (they're mostly in ty::relate and similar generic code). I'm going to try to confirm that more closely with some disassembly, and see whether there are a few optimizations that can be applied regardless (e.g., by making use of the new information that our destination buffer is infallible), but my initial sense is that these regressions are largely outside our direct control here. We're just moving a bunch of code to being codegen'd only once, which naturally means there's a little less information for LLVM to make use of in some places -- things like which flags were set on FmtPrinter, for example -- which leads to slightly worse performance.
-impl<'a, 'tcx, F> Deref for FmtPrinter<'a, 'tcx, F> {
-    type Target = FmtPrinterData<'a, 'tcx, F>;
+impl<'a, 'tcx> Deref for FmtPrinter<'a, 'tcx> {
+    type Target = FmtPrinterData<'a, 'tcx>;
     fn deref(&self) -> &Self::Target {
It may be enough to slap #[inline] on all the functions in this file that appear in cachegrind for the first time after this PR.
None of these functions appear in cachegrind -- they're not the direct problem. For example on stm32f4-check-full, we see a diff like this:
245,045,449 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
372,484,116 ???:<core::iter::adapters::map::Map<core::iter::adapters::enumerate::Enumerate<core::iter::adapters::zip::Zip<core::iter::adapters::copied::Copied<core::slice::iter::Iter<rustc_middle::ty::subst::>
-326,700,323 ???:rustc_middle::ty::relate::relate_substs::<rustc_infer::infer::equate::Equate>
123,186,012 ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_ty
86,186,243 ???:rustc_middle::ty::relate::super_relate_tys::<rustc_infer::infer::equate::Equate>
-55,808,143 ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::super_fold_with::<rustc_middle::ty::subst::SubstFolder>
30,561,468 ???:<alloc::collections::btree::map::entry::OccupiedEntry<rustc_infer::infer::region_constraints::Constraint, rustc_infer::infer::SubregionOrigin>>::remove_entry
-27,764,057 ???:<alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut, rustc_infer::infer::region_constraints::Constraint, rustc_infer:>
20,749,456 ???:<&mut alloc::vec::Vec<ena::unify::VarValue<rustc_type_ir::TyVid>> as core::convert::AsRef<[ena::unify::VarValue<rustc_type_ir::TyVid>]>>::as_ref
18,135,210 ???:<rustc_middle::mir::Rvalue>::ty::<rustc_middle::mir::Body>
15,490,475 ???:<rustc_middle::ty::fold::RegionFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_ty
-12,892,781 ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::super_fold_with::<rustc_middle::ty::fold::RegionFolder>
12,520,606 ???:<rustc_infer::infer::combine::CombineFields>::instantiate
12,024,623 ???:<rustc_data_structures::stable_hasher::StableHasher>::finish::<u128>
-11,945,001 ???:<core::iter::adapters::map::Map<std::collections::hash::map::Iter<rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId>, rustc_data_structures::stable_hasher::stable_hash_reduce<rust>
-10,922,924 ???:<rustc_middle::ty::subst::GenericArg as rustc_middle::ty::relate::Relate>::relate::<rustc_infer::infer::nll_relate::TypeRelating<rustc_borrowck::type_check::relate_tys::NllTypeRelatingDelegate>
9,837,670 ???:<alloc::vec::Vec<rustc_type_ir::TyVid> as alloc::vec::spec_from_iter::SpecFromIter<rustc_type_ir::TyVid, core::iter::adapters::filter_map::FilterMap<core::ops::range::Range<usize>, <rustc_infe>
-8,795,699 ???:<rustc_infer::infer::sub::Sub as rustc_middle::ty::relate::TypeRelation>::relate_with_variance::<rustc_middle::ty::Ty>
8,712,904 ???:<&mut rustc_middle::ty::relate::relate_substs<rustc_infer::infer::nll_relate::TypeRelating<rustc_borrowck::type_check::relate_tys::NllTypeRelatingDelegate>>::{closure#0} as core::ops::function>
8,332,971 ???:<rustc_infer::infer::equate::Equate as rustc_middle::ty::relate::TypeRelation>::relate_with_variance::<rustc_middle::ty::Ty>
8,262,191 ???:<rustc_infer::infer::sub::Sub as rustc_middle::ty::relate::TypeRelation>::tys
-8,099,008 ???:<rustc_infer::infer::InferCtxt>::unsolved_variables
Ah lol, yeah, inlining cost boundaries are fun. Yeah, just slap the attribute on all previously generic methods; that should make callers big enough again to stop them from being inlined. Alternatively, add inline(never) with a comment to the functions that disappeared from cachegrind.
That's not practical -- almost every single method became non-generic after removing the parameter; that's the whole reason for the big win on compile times. IMO, trying to second-guess LLVM here on inlining (in either direction) is not the right approach; we're not likely to find a magic bullet.
The diff above illustrates typical inlining decisions shifting slightly, but trying to recover from that by adding attributes isn't likely to help much -- the changes are too scattered, and I suspect they ultimately arise from things we can't easily influence.
I thought the gains came from not instantiating the generics in 5 different crates, generating LLVM IR for them, and optimizing them? Having them compiled only once is a different aspect from removing the implicit #[inline] that generic functions have.
#[inline] on all of these methods will -- AFAIK -- force codegen in every crate that uses them; it's equivalent to generics in that respect. It also adds an inline hint, but that's a separate effect and unlikely to matter as much given the large size of many of the functions here. We do save a little by generating just one copy per crate instead of 2-3, but that's still a higher cost than a single copy overall.
Force-pushed from 81611d8 to 87bbb05.
@bors try @rust-timer queue
FWIW, I tried adding some eprintln! calls to the FmtPrinter::new and FmtPrinter::into_buffer functions, and neither is used at all for deeply-nested-async check builds. I think that further confirms that this change is not itself likely causing the regressions; rather, other functions are being shuffled between CGUs or similar because these functions are omitted. I've added a commit to preallocate a 64-byte buffer, which might help with some benchmarks, and rebased; let's see if we still see the same impacts.
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 87bbb05815e1b44c940809f9ab261411bf0068fc with merge c65d716943343dccd7e2f9a1ad2455e7ebf9ae49...
☀️ Try build successful - checks-actions
Queued c65d716943343dccd7e2f9a1ad2455e7ebf9ae49 with parent e08d569, future comparison URL.
Preliminary local results suggest the 64-byte preallocation is not likely to have a significant impact here, so it probably doesn't matter much. My suggested approach is still to call these relatively minor regressions justified: based on the investigation I've done locally (cachegrind diffs, plus the fact that the modified code doesn't execute at all in some of the regressed benchmarks), they appear not to be directly related to the changes in this PR. I'm inclined to take the win on bootstrap time and leave the runtime performance aside in this case. The folding code may have a few orthogonal wins that buy back some of this regression, but those wins will never relate to this PR in a direct fashion.
This avoids monomorphizing for different parameters, decreasing generic code instantiated downstream from rustc_middle.
Force-pushed from 87bbb05 to 2ee6d55.
@bors r+
📌 Commit 2ee6d55 has been approved by |
⌛ Testing commit 2ee6d55 with merge 8b9e41cd50112f711aef7f5d28d8458430bee486... |
💔 Test failed - checks-actions |
@bors treeclosed=100 retry |
@bors treeclosed- |
⌛ Testing commit 2ee6d55 with merge a26f75a3c88c9ff5b5bcc90d944e60a37ef8a0c3... |
@bors retry |
☀️ Test successful - checks-actions |
Finished benchmarking commit (4b043fa): comparison URL. Summary: This benchmark run did not return any relevant results. 30 results were found to be statistically significant but too small to be relevant. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
This avoids monomorphizing for different parameters, decreasing generic code
instantiated downstream from rustc_middle -- locally seeing 7% unoptimized LLVM IR
line wins on rustc_borrowck, for example.
We likely can't (or shouldn't) get rid of the Result-ness on most functions. Some further cleanup to avoid fmt::Error where we now know it can't occur may be possible, though somewhat painful -- fmt::Write is a pretty annoying API to work with in practice when you're trying to use it infallibly.