Benchmark performance doesn't only depend on rustc #239

Closed
nnethercote opened this issue May 29, 2018 · 12 comments

@nnethercote
Contributor

I've been looking at the compile times for cargo, where procedural macro handling takes a significant fraction of the time in some of the runs. For example, see dtolnay/proc-macro2#94, which has info on two changes that speed things up.

Those changes are not to rustc itself, but to external crates (quote and proc-macro2) involved in procedural macro generation. If/when these changes land, rustc-perf's results won't change unless we update the versions of those crates used within rustc-perf (and possibly the crates that depend on them; the dependencies can be a couple of levels deep).

I'm not sure how to handle this in general. We might be able to muddle through with significant crate version updates when we know about them. But there is a gap in our coverage. For example, if a crate used by procedural macros (such as quote) introduced a change that caused a large compile-time regression, users would experience that regression but rustc-perf would not.

@Mark-Simulacrum
Member

In theory, though, optimizations in this area (proc macro crates) are orthogonal to rustc optimizations -- that is, we don't actually have to change anything to measure rustc wins.

Having said that, it may be worth talking to people who maintain proc macro crates (@dtolnay and @alexcrichton are two I know of) who may have ideas about how to better integrate those benchmarks into rustc-perf. It's possible this is a case where we're ultimately measuring "runtime" performance instead of compiler performance, something that's not really supported by perf.r-l.o today. Long-term, though, I could see us using derive(Serialize)'s underlying calls to quote as a benchmark for quote that we'd then track over time. Once again, that should stay steady unless we change MIR optimizations or update LLVM/codegen.
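
For concreteness, a standalone quote benchmark along those lines could look roughly like the sketch below. This is hypothetical, not something that exists in rustc-perf today; it assumes the criterion and quote crates as dev-dependencies and hand-writes the kind of impl tokens serde_derive emits:

```rust
// Hypothetical sketch: measure quote's token generation in isolation,
// roughly mimicking what serde_derive does for #[derive(Serialize)].
// Assumes `criterion` and `quote` as dev-dependencies.
use criterion::{criterion_group, criterion_main, Criterion};
use quote::quote;

fn bench_quote(c: &mut Criterion) {
    c.bench_function("quote_serialize_impl", |b| {
        b.iter(|| {
            // The shape of tokens a derive expansion produces for a small struct.
            quote! {
                impl serde::Serialize for Point {
                    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
                    where
                        S: serde::Serializer,
                    {
                        let mut state = serializer.serialize_struct("Point", 2)?;
                        state.serialize_field("x", &self.x)?;
                        state.serialize_field("y", &self.y)?;
                        state.end()
                    }
                }
            }
        })
    });
}

criterion_group!(benches, bench_quote);
criterion_main!(benches);
```

A benchmark like that lives entirely outside the compiler, which is part of the point: its numbers would only move when quote/proc-macro2 change, or when rustc's codegen of those crates changes.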

I guess the summary here is that this is true -- but it also seems sort of "irrelevant" to the current suite: optimizations in this area can probably be made without knowledge of, or changes to, the compiler itself.

@nnethercote
Contributor Author

AIUI, rustc-perf is measuring the performance of (a) whatever version of rustc you run it with, combined with (b) whatever procedural macro crate versions are specified by the Cargo.lock files, which will gradually become out of date.

Perhaps I'm misunderstanding something, but it doesn't seem "irrelevant" to me. Users care about compile times; non-rustc components contribute to those times; rustc-perf has shortcomings in its measurement of those components.

@Mark-Simulacrum
Member

Well, perhaps "irrelevant" was the wrong wording: what I mean is that we have no good way of measuring changes to quote and other derive infrastructure, and maybe we shouldn't be measuring them here anyway: it seems like quote could add benchmarks to itself and then optimize/iterate on those. Similarly, none of these "optimizations" are going to be easily representable on our existing graphs, as they are likely rustc-version agnostic.

@alexcrichton
Member

I see perf.r-l.o as basically exclusively for the compiler, in the sense that the performance of crates.io crates, while important, isn't necessarily important for our own perf tracking. We can't land changes to rustc to speed up quote or proc-macro2 all the time, and we get more signal from keeping those versions constant over time.

To be clear, I think it's very important to actually optimize things like quote and proc-macro2. I just don't think it's critical that we track and reflect those improvements on perf.r-l.o specifically. We'd probably need a different mechanism or graph for tracking this, one that would account not only for changes to rustc but also for changes to Cargo.lock.

@nnethercote
Contributor Author

I'm worried about the potential for wasting developers' time. Imagine somebody new comes along and starts profiling rustc via the rustc-perf benchmarks. They might do what I did, and discover that cargo compilation spends a bunch of time in proc-macro2, and investigate. After some time, they may eventually discover that I already optimized that in dtolnay/proc-macro2#94... or they may just end up getting confused. Whatever happens, it'll be a frustrating experience for them.

(Similarly, every time I look at a profile for cargo, I have to remember which bits of the profile I should ignore. I'm in a better position than a newbie, but it's still annoying.)

@Mark-Simulacrum
Member

We could periodically run cargo update across benchmarks, but it seems to me that keeping benchmarks unchanging and focused solely on rustc performance is better for regression tracking, to an extent. That is, currently I can look at a graph of rustc performance over time and "trust" it, whereas if we updated crate versions I'd have to rely somewhat on intuition -- is that change because the ecosystem updated, or because rustc is faster?

I think the distinction here is that perf.r-l.o today is largely focused on tracking compiler performance over time, not ecosystem performance. However, maybe that's the wrong approach to take. I'm not sure which is more valuable.

As a corollary, we have regex at 0.1.80 in the project, which is a huge diff from today's regex (4678 insertions(+), 2564 deletions(-)). With the ecosystem performance mindset, we should be tracking the latest release of regex more closely.


I think the question here is whether you'd find it more helpful if we tracked the ecosystem or just the compiler. AIUI, you're suggesting that tracking the ecosystem would be better: is that an accurate assessment of your beliefs?

@nnethercote
Contributor Author

I want to track whatever best matches the experience of users.

The tricky thing with ecosystem performance is that there's a small number of crates (proc macro ones, basically) that have a significant effect. They're probably the only ones that matter. Doing a full cargo update feels wrong... but maybe it's an accurate reflection of a user's experience? If a user freshly compiles a revision of Cargo that is two years old, they will get current versions of external crates rather than two-year-old crates, right? (Assuming crate versions haven't been pinned.)
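
As a concrete (hypothetical) illustration of what "not pinned" means here: with a dependency section like the one below and no committed Cargo.lock, a fresh build resolves each requirement to the newest semver-compatible release at build time, so even an old Cargo revision would pull in current proc macro crates. The crate names and versions are just examples:

```toml
# Hypothetical Cargo.toml excerpt: these are semver requirements, not exact pins.
[dependencies]
proc-macro2 = "0.4"   # resolves to the newest 0.4.x when no Cargo.lock is present
quote = "0.6"
syn = "0.14"
```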

But for the benchmarks that are live projects, if we do a cargo update, the benchmark is effectively changing, so it seems like we might as well update the benchmark code itself to the latest version too. Hmm.

It's a weird situation: for each benchmark that uses proc macros, part of the benchmark itself effectively constitutes part of the compiler; freezing the benchmark effectively freezes part of the compiler.

@Mark-Simulacrum
Member

We could just update proc macro dependencies... that might be the best of both worlds, and shouldn't be too hard. What do you think about that approach?
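
Concretely, that would be something along the lines of the commands below, run in each benchmark directory. The crate list here is just the obvious proc macro infrastructure mentioned in this thread; the real list would come from inspecting the dependency graph:

```sh
# Hypothetical example: bump only proc-macro-related dependencies in a
# benchmark's Cargo.lock, leaving every other pinned version untouched.
cargo update -p proc-macro2 -p quote -p syn
```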

@nnethercote
Contributor Author

If there is a clear definition of "proc macro dependencies", then that seems reasonable?

But consider this comment:

I see that the strnom.rs file in proc-macro2 is also hot, presumably doing parsing related to custom derive. strnom.rs is a custom version of nom.

Similarly, when compiling style-servo from rustc-perf the synom crate is hot. It is a different customized version of nom, though with similar characteristics to strnom.rs. It's used by the syn crate, which is used by the cssparser-macros crate.

Sounds like there is a lot of ad hoc customization going on, which might make it hard to judge which dependencies should be updated?

@Mark-Simulacrum
Member

Well, we'd decide by looking at the crates whose type is proc-macro and then recursively examining their dependencies. This shouldn't be too hard, and I'd presume the overlap would be fairly minimal (and not in the crate we're measuring, anyway).
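
A rough sketch of that selection, assuming serde_json is available for parsing cargo metadata's JSON output (this is illustrative, not rustc-perf code):

```rust
// Hypothetical sketch: list the proc-macro crates in a benchmark's dependency
// graph together with their transitive dependencies, i.e. the set of packages
// that a targeted `cargo update -p` would need to touch.
// Assumes the `serde_json` crate for parsing `cargo metadata` output.
use std::collections::HashSet;
use std::process::Command;

fn main() {
    // `cargo metadata` emits the package list and the resolved dependency graph as JSON.
    let output = Command::new("cargo")
        .args(["metadata", "--format-version", "1"])
        .output()
        .expect("failed to run cargo metadata");
    let metadata: serde_json::Value =
        serde_json::from_slice(&output.stdout).expect("cargo metadata output was not valid JSON");

    // A package is a proc macro crate if one of its targets has kind "proc-macro".
    let proc_macro_ids: HashSet<&str> = metadata["packages"]
        .as_array()
        .unwrap()
        .iter()
        .filter(|pkg| {
            pkg["targets"].as_array().unwrap().iter().any(|target| {
                target["kind"]
                    .as_array()
                    .unwrap()
                    .iter()
                    .any(|kind| kind == "proc-macro")
            })
        })
        .map(|pkg| pkg["id"].as_str().unwrap())
        .collect();

    // Walk the resolve graph from those crates to collect everything they pull in.
    let nodes = metadata["resolve"]["nodes"].as_array().unwrap();
    let mut to_update: HashSet<&str> = proc_macro_ids.clone();
    let mut worklist: Vec<&str> = proc_macro_ids.into_iter().collect();
    while let Some(id) = worklist.pop() {
        if let Some(node) = nodes.iter().find(|node| node["id"] == id) {
            for dep in node["dependencies"].as_array().unwrap() {
                let dep_id = dep.as_str().unwrap();
                if to_update.insert(dep_id) {
                    worklist.push(dep_id);
                }
            }
        }
    }

    for id in &to_update {
        println!("{}", id);
    }
}
```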

@nnethercote
Contributor Author

Ok, that sounds reasonable.

@nnethercote
Contributor Author

I'm less concerned about this now that we have a policy to regularly update benchmarks.
