Benchmark performance doesn't only depend on rustc #239

Closed
nnethercote opened this issue May 29, 2018 · 12 comments

@nnethercote
Contributor

I've been looking at the compile times for cargo, where procedural macro handling takes a significant fraction of the time in some of the runs. For example, see dtolnay/proc-macro2#94, which has info on two changes that speed things up.

Those changes are not to rustc itself, but to external crates (quote and proc-macro2) involved in procedural macro generation. If/when these changes land, rustc-perf's results won't change unless we update the versions of those crates used within rustc-perf (and possibly the crates that depend on them; the dependencies can be a couple of levels deep).

I'm not sure how to handle this in general. We might be able to muddle through with significant crate version updates when we know about them. But there is a gap in our coverage. For example, if a crate used by procedural macros (such as quote) introduced a change that caused a large compile-time regression, users would experience that regression but rustc-perf would not.

@Mark-Simulacrum
Member

In theory, though, optimizations in this area (proc macro crates) are orthogonal to rustc optimizations -- that is, we don't actually have to change anything to measure rustc wins.

Having said that, it may be worth talking to people who maintain proc macro crates (@dtolnay and @alexcrichton are two I know of) who may have ideas about how to better integrate those benchmarks into rustc-perf. It's possible this is a case where we're ultimately measuring "runtime" performance instead of compiler performance, something that's not really supported by perf.r-l.o today. Long-term, though, I could see us using derive(Serialize)'s underlying calls to quote as a benchmark for quote that we'd then track over time. Once again, that should stay steady unless we change MIR optimizations or update LLVM/codegen.
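
For concreteness, a standalone quote benchmark along those lines could look roughly like the sketch below. This is hypothetical, not something that exists in rustc-perf today; it assumes the criterion and quote crates as dev-dependencies and hand-writes the kind of impl tokens serde_derive emits:

```rust
// Hypothetical sketch: measure quote's token generation in isolation,
// roughly mimicking what serde_derive does for #[derive(Serialize)].
// Assumes `criterion` and `quote` as dev-dependencies.
use criterion::{criterion_group, criterion_main, Criterion};
use quote::quote;

fn bench_quote(c: &mut Criterion) {
    c.bench_function("quote_serialize_impl", |b| {
        b.iter(|| {
            // The shape of tokens a derive expansion produces for a small struct.
            quote! {
                impl serde::Serialize for Point {
                    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
                    where
                        S: serde::Serializer,
                    {
                        let mut state = serializer.serialize_struct("Point", 2)?;
                        state.serialize_field("x", &self.x)?;
                        state.serialize_field("y", &self.y)?;
                        state.end()
                    }
                }
            }
        })
    });
}

criterion_group!(benches, bench_quote);
criterion_main!(benches);
```

A benchmark like that lives entirely outside the compiler, which is part of the point: its numbers would only move when quote/proc-macro2 change, or when rustc's codegen of those crates changes.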

I guess the summary here is that this is true -- but it also seems sort of "irrelevant" to the current suite: optimizations in this area can probably be made without knowledge of, or changes to, the compiler itself.

@nnethercote
Contributor Author

AIUI, rustc-perf is measuring the performance of (a) whatever version of rustc you run it with, combined with (b) whatever procedural macro crate versions are specified by the Cargo.lock files, which will gradually become out of date.

Perhaps I'm misunderstanding something, but it doesn't seem "irrelevant" to me. Users care about compile times; non-rustc components contribute to those times; rustc-perf has shortcomings in its measurement of those components.

@Mark-Simulacrum
Member

Well, perhaps "irrelevant" was the wrong wording: what I mean is that we have no good way of measuring changes to quote and other derive infrastructure, and maybe we shouldn't be measuring them here anyway: it seems like quote could add benchmarks to itself and then optimize/iterate on those. Similarly, none of these "optimizations" are going to be easily representable on our existing graphs, as they are likely rustc-version agnostic.

@alexcrichton
Member

I see perf.r-l.o as basically exclusively for the compiler, in the sense that the performance of crates.io crates, while important, isn't necessarily important for our own perf tracking. We can't land changes to rustc to speed up quote or proc-macro2 all the time, and we get more signal from keeping those versions constant over time.

To be clear, I think it's very important to actually optimize things like quote and proc-macro2. I just don't think it's critical that we track and reflect those improvements on perf.r-l.o specifically. We'd probably need a different mechanism or graph for tracking this, one that would account not only for changes to rustc but also for changes to Cargo.lock.

@nnethercote
Contributor Author

I'm worried about the potential for wasting developers' time. Imagine somebody new comes along and starts profiling rustc via the rustc-perf benchmarks. They might do what I did, and discover that cargo compilation spends a bunch of time in proc-macro2, and investigate. After some time, they may eventually discover that I already optimized that in dtolnay/proc-macro2#94... or they may just end up getting confused. Whatever happens, it'll be a frustrating experience for them.

(Similarly, every time I look at a profile for cargo, I have to remember which bits of the profile I should ignore. I'm in a better position than a newbie, but it's still annoying.)

@Mark-Simulacrum
Member

We could periodically run cargo update across benchmarks, but it seems to me that keeping benchmarks unchanging and focused solely on rustc performance is better for regression tracking, to an extent. That is, currently I can look at a graph of rustc performance over time and "trust" it, whereas if we updated crate versions I'd have to rely somewhat on intuition -- is that change because the ecosystem updated, or because rustc is faster?

I think the distinction here is that perf.r-l.o today is largely focused on tracking compiler performance over time, not ecosystem performance. However, maybe that's the wrong approach to take. I'm not sure which is more valuable.

As a corollary, we have regex at 0.1.80 in the project, which is a huge diff from today's regex (4678 insertions(+), 2564 deletions(-)). With the ecosystem performance mindset, we should be tracking the latest release of regex more closely.


I think the question here is whether you'd find it more helpful if we tracked the ecosystem or just the compiler. AIUI, you're suggesting that tracking the ecosystem would be better: is that an accurate assessment of your beliefs?

@nnethercote
Contributor Author

I want to track whatever best matches the experience of users.

The tricky thing with ecosystem performance is that there's a small number of crates (proc macro ones, basically) that have a significant effect. They're probably the only ones that matter. Doing a full cargo update feels wrong... but maybe it's an accurate reflection of a user's experience? If a user freshly compiles a revision of Cargo that is two years old, they will get current versions of external crates rather than two-year-old crates, right? (Assuming crate versions haven't been pinned.)
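
As a concrete (hypothetical) illustration of what "not pinned" means here: with a dependency section like the one below and no committed Cargo.lock, a fresh build resolves each requirement to the newest semver-compatible release at build time, so even an old Cargo revision would pull in current proc macro crates. The crate names and versions are just examples:

```toml
# Hypothetical Cargo.toml excerpt: these are semver requirements, not exact pins.
[dependencies]
proc-macro2 = "0.4"   # resolves to the newest 0.4.x when no Cargo.lock is present
quote = "0.6"
syn = "0.14"
```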

But for the benchmarks that are live projects, if we do a cargo update, the benchmark is effectively changing, so it seems like we might as well update the benchmark code itself to the latest version too. Hmm.

It's a weird situation: for each benchmark that uses proc macros, part of the benchmark itself effectively constitutes part of the compiler; freezing the benchmark effectively freezes part of the compiler.

@Mark-Simulacrum
Member

We could just update proc macro dependencies... that might be the best of both worlds, and shouldn't be too hard. What do you think about that approach?
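
Concretely, that would be something along the lines of the commands below, run in each benchmark directory. The crate list here is just the obvious proc macro infrastructure mentioned in this thread; the real list would come from inspecting the dependency graph:

```sh
# Hypothetical example: bump only proc-macro-related dependencies in a
# benchmark's Cargo.lock, leaving every other pinned version untouched.
cargo update -p proc-macro2 -p quote -p syn
```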

@nnethercote
Contributor Author

If there is a clear definition of "proc macro dependencies", then that seems reasonable?

But consider this comment:

I see that the strnom.rs file in proc-macro2 is also hot, presumably doing parsing related to custom derive. strnom.rs is a custom version of nom.

Similarly, when compiling style-servo from rustc-perf the synom crate is hot. It is a different customized version of nom, though with similar characteristics to strnom.rs. It's used by the syn crate, which is used by the cssparser-macros crate.

Sounds like there is a lot of ad hoc customization going on, which might make it hard to judge which dependencies should be updated?

@Mark-Simulacrum
Member

Well, we'd decide by looking at the crates whose type is proc-macro and then recursively examining their dependencies. This shouldn't be too hard, and I'd presume the overlap would be fairly minimal (and not in the crate we're measuring, anyway).
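
A rough sketch of that selection, assuming serde_json is available for parsing cargo metadata's JSON output (this is illustrative, not rustc-perf code):

```rust
// Hypothetical sketch: list the proc-macro crates in a benchmark's dependency
// graph together with their transitive dependencies, i.e. the set of packages
// that a targeted `cargo update -p` would need to touch.
// Assumes the `serde_json` crate for parsing `cargo metadata` output.
use std::collections::HashSet;
use std::process::Command;

fn main() {
    // `cargo metadata` emits the package list and the resolved dependency graph as JSON.
    let output = Command::new("cargo")
        .args(["metadata", "--format-version", "1"])
        .output()
        .expect("failed to run cargo metadata");
    let metadata: serde_json::Value =
        serde_json::from_slice(&output.stdout).expect("cargo metadata output was not valid JSON");

    // A package is a proc macro crate if one of its targets has kind "proc-macro".
    let proc_macro_ids: HashSet<&str> = metadata["packages"]
        .as_array()
        .unwrap()
        .iter()
        .filter(|pkg| {
            pkg["targets"].as_array().unwrap().iter().any(|target| {
                target["kind"]
                    .as_array()
                    .unwrap()
                    .iter()
                    .any(|kind| kind == "proc-macro")
            })
        })
        .map(|pkg| pkg["id"].as_str().unwrap())
        .collect();

    // Walk the resolve graph from those crates to collect everything they pull in.
    let nodes = metadata["resolve"]["nodes"].as_array().unwrap();
    let mut to_update: HashSet<&str> = proc_macro_ids.clone();
    let mut worklist: Vec<&str> = proc_macro_ids.into_iter().collect();
    while let Some(id) = worklist.pop() {
        if let Some(node) = nodes.iter().find(|node| node["id"] == id) {
            for dep in node["dependencies"].as_array().unwrap() {
                let dep_id = dep.as_str().unwrap();
                if to_update.insert(dep_id) {
                    worklist.push(dep_id);
                }
            }
        }
    }

    for id in &to_update {
        println!("{}", id);
    }
}
```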

@nnethercote
Contributor Author

Ok, that sounds reasonable.

@nnethercote
Contributor Author

I'm less concerned about this now that we have a policy to regularly update benchmarks.
