Rewrite `collect_tokens` implementations to use a flattened buffer #77250
Conversation
r? @oli-obk (rust_highfive has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 653edbfd18abfb56203811db0d974b9d5dc89d30 with merge f1e10e3e4f98fcb126962244ad3a9dc5b82e9dfa... |
☀️ Try build successful - checks-actions, checks-azure |
Queued f1e10e3e4f98fcb126962244ad3a9dc5b82e9dfa with parent 623fb90, future comparison URL. |
Finished benchmarking try commit (f1e10e3e4f98fcb126962244ad3a9dc5b82e9dfa): comparison URL. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
From doing some local profiling, the slowdown appears to be due to the fact that there |
Hmm, why didn't this show up in #76130? |
@petrochenkov: I added logic to skip token collection for attribute targets in more cases (for example, https://github.com/Aaron1011/rust/blob/feature/new-preexp-cfg-tmp/compiler/rustc_parse/src/parser/attr.rs#L541-L555). It looks like this may have masked the fact that token collection was more expensive when we actually performed it. |
Tokens that need to be replaced in #76130 should all be at the same level, right? |
Yes - however, the implementation ended up being significantly more complicated when I tried to do it that way. The issue is that we can have code like this:

```rust
#[derive(MyDerive)]
struct Bar {
    val: [u8; {
        struct Inner {
            #[cfg(FALSE)] field: u8
        }
        0
    }]
}
```

We want the tokens for … The current implementation of … |
I think either of these implementations would be much more complicated than the current approach. I'm going to see if it's possible to eliminate the performance impact of this PR before exploring the alternatives. |
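A minimal sketch of the flattened-buffer idea being discussed, using toy types: `FlatToken` and `replace_range` are illustrative stand-ins, not the actual `rustc_parse` types. The point is that once open/close delimiters are ordinary entries in a flat `Vec`, replacing a nested range (such as a `#[cfg(FALSE)]` field buried inside an anonymous-const block like the one above) is a single `Vec::splice`, regardless of depth.

```rust
#[derive(Clone, Debug, PartialEq)]
enum FlatToken {
    Token(String),   // an ordinary token, represented as text for the sketch
    OpenDelim(char), // e.g. '{', '[', '('
    CloseDelim(char),
}

/// Replace the tokens at `range` (positions in the *flattened* stream) with
/// `new_tokens`, no matter how deeply nested they were in the original tree.
fn replace_range(
    buf: &mut Vec<FlatToken>,
    range: std::ops::Range<usize>,
    new_tokens: Vec<FlatToken>,
) {
    // `Vec::splice` removes `range` and inserts `new_tokens` in its place.
    buf.splice(range, new_tokens);
}

fn main() {
    use FlatToken::*;
    // Flattened form of `{ #[cfg(FALSE)] field: u8 }`, heavily simplified:
    let mut buf = vec![
        OpenDelim('{'),
        Token("#".into()),
        OpenDelim('['),
        Token("cfg".into()),
        OpenDelim('('),
        Token("FALSE".into()),
        CloseDelim(')'),
        CloseDelim(']'),
        Token("field".into()),
        Token(":".into()),
        Token("u8".into()),
        CloseDelim('}'),
    ];
    // Remove the cfg'd-out field (positions 1..11) even though it sits
    // several delimiter levels deep in the original token tree.
    replace_range(&mut buf, 1..11, vec![]);
    assert_eq!(buf, vec![OpenDelim('{'), CloseDelim('}')]);
}
```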
Force-pushed from 653edbf to 5a048c4 (Compare)
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 5a048c438839606ab9e3db8d1055aa7b9c78b7d9 with merge f561c24c1fe21075a1a50561923515524aa1d33c... |
☀️ Try build successful - checks-actions, checks-azure |
Queued f561c24c1fe21075a1a50561923515524aa1d33c with parent 381b445, future comparison URL. |
Instead of trying to collect tokens at each depth, we 'flatten' the stream as we go along, pushing open/close delimiters to our buffer just like regular tokens. Once capturing is complete, we reconstruct a nested `TokenTree::Delimited` structure, producing a normal `TokenStream`.

The reconstructed `TokenStream` is not created immediately - instead, it is produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This closure stores a clone of the original `TokenCursor`, plus a record of the number of calls to `next()`/`next_desugared()`. This is sufficient to reconstruct the tokenstream seen by the callback without storing any additional state. If the tokenstream is never used (e.g. when a captured `macro_rules!` argument is never passed to a proc macro), we never actually create a `TokenStream`.

This implementation has a number of advantages over the previous one:

* It is significantly simpler, with no edge cases around capturing the start/end of a delimited group.
* It can be easily extended to allow replacing tokens at an arbitrary 'depth' by just using `Vec::splice` at the proper position. This is important for PR rust-lang#76130, which requires us to track information about attributes along with tokens.
* The lazy approach to `TokenStream` construction allows us to easily parse an AST struct, and then decide after the fact whether we need a `TokenStream`. This will be useful when we start collecting tokens for `Attribute` - we can discard the `LazyTokenStream` if the parsed attribute doesn't need tokens (e.g. is a builtin attribute).

The performance impact seems to be negligible (see rust-lang#77250 (comment)). There is a small slowdown on a few benchmarks, but it only rises above 1% for incremental builds, where it represents a larger fraction of the much smaller instruction count. There is a ~1% speedup on a few other incremental benchmarks - my guess is that the speedups and slowdowns will usually cancel out in practice.
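A rough sketch, with stand-in types, of the lazy capture described above: `Token`, `TokenCursor`, and `LazyTokenStream` here are simplified placeholders, not the real `rustc_ast`/`rustc_parse` definitions. The key idea is that capturing only records a clone of the cursor and a count of `next()` calls; the actual stream is rebuilt on demand, or never, if nothing asks for it.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Token(String);

#[derive(Clone)]
struct TokenCursor {
    tokens: Vec<Token>,
    pos: usize,
}

impl TokenCursor {
    fn next(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }
}

/// Everything needed to re-derive the captured tokens later: a clone of the
/// cursor as it was when capturing started, plus the number of `next()` calls
/// the parser made while parsing the AST node.
struct LazyTokenStream {
    start_cursor: TokenCursor,
    num_calls: usize,
}

impl LazyTokenStream {
    /// Replay the cursor only if someone actually asks for the tokens
    /// (e.g. the item is passed to a proc macro).
    fn create_token_stream(&self) -> Vec<Token> {
        let mut cursor = self.start_cursor.clone();
        (0..self.num_calls).filter_map(|_| cursor.next()).collect()
    }
}

fn main() {
    let cursor = TokenCursor {
        tokens: ["struct", "Foo", ";"]
            .iter()
            .map(|s| Token(s.to_string()))
            .collect(),
        pos: 0,
    };
    // Pretend the parser consumed 3 tokens while parsing `struct Foo;`.
    let lazy = LazyTokenStream { start_cursor: cursor, num_calls: 3 };
    // No token stream exists yet; it is only built on demand:
    println!("{:?}", lazy.create_token_stream());
}
```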
Force-pushed from f33fc5b to 593fdd3 (Compare)
@bors r=petrochenkov |
📌 Commit 593fdd3 has been approved by `petrochenkov` |
⌛ Testing commit 593fdd3 with merge b996a41c46a27fac7c931f52cbf04266cb213d12... |
💔 Test failed - checks-actions |
@bors retry |
☀️ Test successful - checks-actions, checks-azure |
It seems like this ended up being a regression on a scattering of real world crates. @Aaron1011 -- do you think that's basically inevitable? I don't think we should revert this regardless. |
That's odd - an earlier perf run appeared net neutral. Right now, we're collecting tokens in more cases than is strictly necessary (builtin attributes other than |
@Aaron1011
I don't think we need a list here, only a single number - index of the position in DFS traversal, or something similar to the number of
|
…, r=petrochenkov Unconditionally capture tokens for attributes. This allows us to avoid synthesizing tokens in `prepend_attr`, since we have the original tokens available. We still need to synthesize tokens when expanding `cfg_attr`, but this is an unavoidable consequence of the syntax of `cfg_attr` - the user does not supply the `#` and `[]` tokens that a `cfg_attr` expands to. This is based on PR rust-lang#77250 - this PR exposes a bug in the current `collect_tokens` implementation, which is fixed by the rewrite.
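A small illustrative example (ordinary user code, not compiler internals) of the `cfg_attr` point above: the `#`, `[`, and `]` tokens of the expanded attribute never appear in the source, so they have to be synthesized during expansion, whereas a plain outer attribute now has its original tokens captured and can reuse them.

```rust
// The user writes this:
#[cfg_attr(test, derive(Debug))]
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    println!("{} {}", p.x, p.y);
}

// When `test` is enabled, the item behaves as if it had been written as
// `#[derive(Debug)] struct Point { .. }`. The `#`, `[`, and `]` of that
// attribute were never typed by the user, so the compiler must create those
// tokens itself; it cannot simply reuse captured source tokens.
```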
For the `MiddleDot` case, current behaviour:

- For a case like `1.2`, `sym1` is `1` and `sym2` is `2`, and `self.token` holds `1.2`.
- It creates a new ident token from `sym1` that it puts into `self.token`.
- Then it does `bump_with` with a new dot token, which moves the `sym1` token into `prev_token`.
- Then it does `bump_with` with a new ident token from `sym2`, which moves the `dot` token into `prev_token` and discards the `sym1` token.
- Then it does `bump`, which puts whatever is next into `self.token`, moves the `sym2` token into `prev_token`, and discards the `dot` token altogether.

New behaviour:

- Skips creating and inserting the `sym1` and dot tokens, because they are unnecessary.
- This also demonstrates that the comment about `Spacing::Alone` is wrong -- that value is never used. That comment was added in rust-lang#77250, and AFAICT it has always been incorrect.

The commit also expands comments. I found this code hard to read previously; the examples in comments make it easier.
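For context, a plain-Rust example (not the parser code itself) of the situation the `MiddleDot` case handles: in `t.1.2`, the lexer emits the single float-literal token `1.2`, which the parser must break apart into the two tuple-field accesses `.1` and `.2`.

```rust
fn main() {
    let t = (0u8, (10u8, 20u8, 30u8));
    // To the lexer, `1.2` below is one float-literal token; the parser has to
    // reinterpret it as the field access `.1` followed by `.2`.
    let inner = t.1.2;
    assert_eq!(inner, 30u8);
}
```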