Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up TokenCursor #96210

Merged
merged 20 commits into from
Apr 21, 2022
Merged

Speed up TokenCursor #96210

merged 20 commits into from
Apr 21, 2022

Conversation

nnethercote
Copy link
Contributor

Plus a few related clean-ups.

r? @petrochenkov

This makes it more like `CursorRef::next_with_spacing`. There is no
performance effect, just a consistency improvement.
They both have a single call site.
It has a single call site.
And likewise for the inlined variants.

I did this for simplicity, but interesting it was a performance win as
well.
In particular, avoid wrapping a token within `TokenTree::Token` and then
immediately matching it and returning the token within. Just return the
token immediately.
This will facilitate the change in the next commit.

`boolean` arguments aren't great, but the function is only used in three
places within this one file.
Instead of letting the next iteration of the loop handle it.
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 19, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 19, 2022
@nnethercote
Copy link
Contributor Author

Some local instruction count measurements:

Benchmark Profile Scenario % Change Significance Factor?
time-macros-0.2.3 check full -4.31% 21.54x
async-std-1.10.0 check full -4.17% 20.85x
tt-muncher check full -3.34% 16.68x
yansi-0.5.0 check full -2.16% 10.80x
quote-stress check full -2.11% 10.57x
web-sys-0.3.56 check full -1.09% 5.46x
num-derive-0.3.3 check full -1.09% 5.45x
ctor-0.1.21 check full -1.05% 5.27x
futures-macro-0.3.19 check full -0.95% 4.76x
mockall_derive-0.11.0 check full -0.89% 4.47x
wasm-bindgen-backend-0.2.79 check full -0.88% 4.42x
tonic-build-0.7.0 check full -0.87% 4.33x
vsdb_derive-0.21.1 check full -0.84% 4.20x
pest_generator-2.1.3 check full -0.83% 4.13x
enum-as-inner-0.4.0 check full -0.81% 4.06x
stdweb-derive-0.5.3 check full -0.81% 4.03x
scroll_derive-0.11.0 check full -0.80% 4.02x
serde_derive-1.0.136 check full -0.75% 3.73x
clap_derive-3.1.7 check full -0.73% 3.67x
ref-cast-impl-1.0.6 check full -0.73% 3.63x
wayland-scanner-0.29.4 check full -0.71% 3.55x
prost-derive-0.10.0 check full -0.69% 3.47x
pyo3-macros-backend-0.16.3 check full -0.69% 3.46x
diesel_derives-1.4.1 check full -0.67% 3.36x
inotify-0.10.0 check full -0.66% 3.32x
funty-2.0.0 check full -0.63% 3.13x
enumflags2_derive-0.7.4 check full -0.61% 3.06x
deep-vector check full -0.48% 2.40x
nix-0.23.1 check full -0.47% 2.33x
futures-lite-1.12.0 check full -0.33% 1.66x
externs check full -0.31% 1.55x

The cycles counts are even better, e.g.:

Benchmark Profile Scenario % Change Significance Factor?
tt-muncher check full -9.67% 48.36x
async-std-1.10.0 check full -8.13% 40.67x
quote-stress check full -7.13% 35.65x
time-macros-0.2.3 check full -6.77% 33.85x
token-stream-stress check full -3.95% 19.75x
yansi-0.5.0 check full -3.74% 18.72x
web-sys-0.3.56 check full -3.50% 17.48x
tokio-1.17.0 check full -3.22% 16.11x
log-0.4.16 check full -3.20% 15.98x
futures-macro-0.3.19 check full -3.18% 15.92x
inotify-0.10.0 check full -3.14% 15.69x
issue-88862 check full 2.98% 14.89x
externs check full -2.62% 13.08x
unify-linearly check full -2.47% 12.36x
enumflags2_derive-0.7.4 check full -2.14% 10.69x
mockall_derive-0.11.0 check full -2.09% 10.43x
ref-cast-impl-1.0.6 check full -1.98% 9.88x
tonic-build-0.7.0 check full -1.92% 9.59x
scroll_derive-0.11.0 check full -1.86% 9.31x
stdweb-derive-0.5.3 check full -1.76% 8.79x

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2022
@bors
Copy link
Contributor

bors commented Apr 19, 2022

⌛ Trying commit 36c7c829e99b3785de94e6113b4abb4ff0410e9c with merge efe204aef96605ab9df88a21434b2b571a770ede...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Apr 19, 2022

☀️ Try build successful - checks-actions
Build commit: efe204aef96605ab9df88a21434b2b571a770ede (efe204aef96605ab9df88a21434b2b571a770ede)

@rust-timer
Copy link
Collaborator

Queued efe204aef96605ab9df88a21434b2b571a770ede with parent e2661ba, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (efe204aef96605ab9df88a21434b2b571a770ede): comparison url.

Summary:

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 0 22 20 22
mean2 N/A N/A -0.3% -1.5% -0.3%
max N/A N/A -0.6% -4.1% -0.6%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2022
@nnethercote
Copy link
Contributor Author

tt-muncher is most affected. The instruction count results for it are good, with 1-4% reductions. But the wall-time results are amazing, with 6-16% reductions. My guess would be this is due to fewer branch mispredictions.

It's not hot, so shouldn't be within the always inlined part.
@nnethercote nnethercote force-pushed the speed-up-TokenCursor branch from 36c7c82 to e0b2272 Compare April 19, 2022 22:53
@nnethercote
Copy link
Contributor Author

I have fixed the tidy error.

Cachegrind indicates that the outsized wall-time speedup for tt-muncher is actually due to fewer loads and stores in and around Parser::bump(), though it was hard to tell exactly which part of the change was responsible.

The `DelimToken` here is `NoDelim`, which means the returned delim
tokens will just be ignored by `Parser::bump()`. This commit changes
things so the delim tokens won't be returned.
Because it's now always true.
The loop is there to handle a `NoDelim` open/close token. This commit
changes `TokenCursor::inlined_next` so it never returns such a token.
This is a performance win because the conditional test in `bump()` is
removed.

If the parser needs changing in the future to handle `NoDelim` tokens,
then `inlined_next()` can easily be changed to return them.
compiler/rustc_parse/src/parser/mod.rs Outdated Show resolved Hide resolved
if self.prev_token.kind == TokenKind::Eof {
let msg = "attempted to bump the parser past EOF (may be stuck in a loop)";
self.span_bug(self.token.span, msg);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We 100% have seen this error, and relatively commonly, until relevant bugs in the parser were fixed.
But it was several years ago already.

compiler/rustc_parse/src/parser/mod.rs Show resolved Hide resolved
compiler/rustc_parse/src/parser/mod.rs Outdated Show resolved Hide resolved
@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 20, 2022
Surprisingly, this is a non-trivial performance win.
This makes `CloseDelim` handling more like `OpenDelim` handling, which
produces `OpenDelim` and pushes the stack at the same time. It requires
some adjustment to `parse_token_tree` now that we don't remain within
the frame after getting the `CloseDelim`.
This lets us clone just the parts within a `TokenTree` that need
cloning, rather than the entire thing. This is a surprisingly large
performance win, up to 4% on `async-std-1.10.0`.
@nnethercote nnethercote force-pushed the speed-up-TokenCursor branch from e5d0a36 to 643e9f7 Compare April 21, 2022 05:25
@nnethercote
Copy link
Contributor Author

I have added two new commits. The first changes how CloseDelim is handled, making things neater, and also putting all the NoDelim handling within inlined_next. The second avoids some cloning for more performance improvements. Updated instruction counts:

Benchmark Profile Scenario % Change Significance Factor?
async-std-1.10.0 check full -18.07% 90.33x
time-macros-0.2.3 check full -16.77% 83.87x
tt-muncher check full -15.11% 75.53x
yansi-0.5.0 check full -9.16% 45.82x
ctor-0.1.21 check full -4.48% 22.42x
num-derive-0.3.3 check full -4.47% 22.33x
web-sys-0.3.56 check full -4.43% 22.14x
futures-macro-0.3.19 check full -3.90% 19.48x
tonic-build-0.7.0 check full -3.70% 18.51x
mockall_derive-0.11.0 check full -3.69% 18.46x
wasm-bindgen-backend-0.2.79 check full -3.68% 18.42x
vsdb_derive-0.21.1 check full -3.61% 18.03x
scroll_derive-0.11.0 check full -3.60% 18.02x
pest_generator-2.1.3 check full -3.53% 17.63x
stdweb-derive-0.5.3 check full -3.43% 17.16x
enum-as-inner-0.4.0 check full -3.41% 17.05x
serde_derive-1.0.136 check full -3.23% 16.17x
clap_derive-3.1.7 check full -3.22% 16.10x
ref-cast-impl-1.0.6 check full -3.12% 15.61x
prost-derive-0.10.0 check full -3.07% 15.36x
wayland-scanner-0.29.4 check full -3.05% 15.23x
pyo3-macros-backend-0.16.3 check full -3.02% 15.09x

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 21, 2022
@petrochenkov
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Apr 21, 2022

📌 Commit 643e9f7 has been approved by petrochenkov

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 21, 2022
@bors
Copy link
Contributor

bors commented Apr 21, 2022

⌛ Testing commit 643e9f7 with merge e35a533fb398e98b318502c2920f82e110aad604...

@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@bors
Copy link
Contributor

bors commented Apr 21, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 21, 2022
@lqd
Copy link
Member

lqd commented Apr 21, 2022

@bors retry "The hosted runner: GitHub Actions 5 lost communication with the server" on dist-aarch64-apple

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 21, 2022
@bors
Copy link
Contributor

bors commented Apr 21, 2022

⌛ Testing commit 643e9f7 with merge b04c532...

@bors
Copy link
Contributor

bors commented Apr 21, 2022

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing b04c532 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 21, 2022
@bors bors merged commit b04c532 into rust-lang:master Apr 21, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 21, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (b04c532): comparison url.

Summary:

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 0 94 83 94
mean2 N/A N/A -0.9% -2.4% -0.9%
max N/A N/A -2.4% -17.2% -2.4%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@nnethercote nnethercote deleted the speed-up-TokenCursor branch April 21, 2022 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants