WIP: Move jointness info from TokenStream to Token #75528
Conversation
r? @varkor (rust_highfive has picked a reviewer for you, use r? to override)
☔ The latest upstream changes (presumably #73851) made this pull request unmergeable. Please resolve the merge conflicts.
So, I recently tried to do something like this too (to be able to address pretty-printing regressions in #73345).
So, I think, the conclusion is that we should: …
This PR moves things closer to that state, but enters a pretty weird intermediate state where …
On jointness-from-the-left vs jointness-from-the-right: the proc macro API currently uses jointness-from-the-right because it's more convenient for parsers (less need for lookahead). It's less convenient for lexers, because to obtain jointness for a token we need to lex an arbitrary number of the following (whitespace) tokens. However, if we are producing jointness flags in a pre-parser (token stream parser) rather than in the lexer, as mentioned in the comment above, then keeping jointness-from-the-right everywhere should be OK, I think.
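As a concrete illustration of jointness-from-the-right, here is a small standalone sketch using the `proc_macro2` crate, which mirrors the `proc_macro` API and can run outside a macro; the crate dependency is an assumption of the sketch, not something this PR touches:

```rust
// Sketch: `>>` surfaces as two `>` puncts; the first is `Spacing::Joint`,
// i.e. it records that it is joined to the token on its *right*.
// Requires `proc_macro2 = "1"` in Cargo.toml.
use proc_macro2::{TokenStream, TokenTree};

fn main() {
    let ts: TokenStream = "a >> b".parse().unwrap();
    for tt in ts {
        if let TokenTree::Punct(p) = tt {
            // Prints: `> Joint`, then `> Alone`
            println!("{} {:?}", p.as_char(), p.spacing());
        }
    }
}
```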
Yup, this all makes sense, especially "avoiding unstable intermediate states". I do think I now have capacity to pour into this work (20 hours per week for months), but it makes sense to take more sure steps, especially in the beginning. So, I'll look into removing … first; I am less sure about merging ….

Longer explanation: I think the core idea we have here is that "there are many tokens in rust". proc macro tokens != mbe tokens, for example, so some amount of bridging would be required somewhere. Here, I'd like to distill the notion of "parser tokens" -- what the input to the parser looks like. This is an unorthodox viewpoint, but I am not sure that the rust parser really operates on token trees ( :D ). Today, it uses …, and it should be able to reasonably parse input like:

```
fn main() {
    let x = foo(
    let y = ();
}
```

Where "reasonably parse" means "the syntax tree should have …".

So, long-term, I think the parser should work just with ….

Though, even if the above is a good idea, it might make sense to collapse Token & TokenTree as an intermediate state.
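To make the "parser tokens" idea above a bit more concrete, here is a minimal sketch with hypothetical types (not rustc's actual interface), contrasting tree-shaped input with a flat token sequence in which delimiters are ordinary tokens:

```rust
// Hypothetical illustration of the "parser tokens" idea above; these are not
// rustc's real types. With a flat token sequence, an open delimiter is just
// another token, so unbalanced input like the snippet above can still be fed
// to the parser, which can report the missing `)` and keep going.
#![allow(dead_code)]

// Tree-shaped input: delimiters must nest, so `foo(` with no matching `)`
// can only be represented after some recovery while building the tree.
enum TokenTree {
    Token(String),
    Delimited(char, Vec<TokenTree>),
}

// Flat input: open/close delimiters are ordinary tokens.
#[derive(Debug)]
enum FlatToken {
    Token(String),
    OpenDelim(char),
    CloseDelim(char),
}

fn main() {
    // The broken snippet above as a flat token sequence; note the unmatched '('.
    let flat = [
        FlatToken::Token("fn".into()),
        FlatToken::Token("main".into()),
        FlatToken::OpenDelim('('),
        FlatToken::CloseDelim(')'),
        FlatToken::OpenDelim('{'),
        FlatToken::Token("let".into()),
        FlatToken::Token("x".into()),
        FlatToken::Token("=".into()),
        FlatToken::Token("foo".into()),
        FlatToken::OpenDelim('('), // never closed
        FlatToken::Token("let".into()),
        FlatToken::Token("y".into()),
        FlatToken::Token("=".into()),
        FlatToken::OpenDelim('('),
        FlatToken::CloseDelim(')'),
        FlatToken::Token(";".into()),
        FlatToken::CloseDelim('}'),
    ];
    for t in &flat {
        println!("{:?}", t);
    }
}
```

With the flat shape, recovering from the unmatched `(` becomes a local decision inside the parser rather than something a token-tree builder has to guess at up front.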
@petrochenkov could you clarify whether "Make one more step towards fully token-based expansion" refers only to proc macros, or whether we want to make mbe work without non-terminals as well? That is, I am reading this as "we can bend backwards compat gently enough to completely remove Nt from everywhere", but I wonder if I am overly optimistic here :)
FWIW, …
AFAIK, rustc now recovers unbalanced delimiters during pre-parsing (a.k.a. token stream parsing), not during regular parsing (but I'm not sure).
It refers to everything, including MBEs.
* With rare exceptions like …
Let's close this for now, to keep the set of open PRs smaller. There are several baby yaks to be shaved before this one.
Part of #63689.
The TL;DR of that issue is that rustc represents `>>` as a single token at the moment, while proc macros represent it as a pair of tokens: `(>, Joint), (>, _)`. And we want to move the parser to the proc_macro representation of tokens, with two main motivations: a) not having two token representations in the compiler, b) making the parser's interface more obviously right, to help with extracting the parser into a separate library.

Moreover, at the moment `rustc` actually tracks jointness in two ways in a single data structure. `TokenStream`, before this PR, stores `(TokenTree, IsJoint)` pairs, while the `TokenTree` itself can be a composite or decomposed token. This jointness info is pretty easily lost with this setup. This PR by itself doesn't solve the two-representations problem, but it does help with not accidentally losing jointness. In particular, macros by example now preserve jointness, while previously they erased it.

Rebase of #64782.
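To visualize the data-structure change described above, here is a minimal sketch with simplified, hypothetical type names (not the actual rustc definitions):

```rust
// Illustrative sketch only: simplified, hypothetical definitions, not the
// actual rustc types.
#![allow(dead_code)]

#[derive(Clone, Copy, Debug)]
enum IsJoint { Joint, NonJoint }

#[derive(Clone, Copy, Debug)]
enum TokenKind { Gt /* , ... */ }

// Before this PR: jointness is stored in the stream, next to each tree,
// while a tree can itself still be a composite token such as `>>`.
struct TokenBefore { kind: TokenKind }
enum TokenTreeBefore { Token(TokenBefore) /* , Delimited(..) */ }
struct TokenStreamBefore(Vec<(TokenTreeBefore, IsJoint)>);

// Direction of this PR: jointness lives on the token itself, so the stream
// is just a sequence of trees and the flag is harder to drop by accident.
struct TokenAfter { kind: TokenKind, joint: IsJoint }
enum TokenTreeAfter { Token(TokenAfter) /* , Delimited(..) */ }
struct TokenStreamAfter(Vec<TokenTreeAfter>);

fn main() {
    // `>>` as two `>` tokens, the first marked joint with its right neighbour:
    let _gt_gt = TokenStreamAfter(vec![
        TokenTreeAfter::Token(TokenAfter { kind: TokenKind::Gt, joint: IsJoint::Joint }),
        TokenTreeAfter::Token(TokenAfter { kind: TokenKind::Gt, joint: IsJoint::NonJoint }),
    ]);
}
```

With the flag on the token, code that maps or rebuilds token trees (as macros by example do) carries jointness along automatically instead of having to thread a per-stream flag through.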