Remove special code-path for handing unknown tokens #63017
☔ The latest upstream changes (presumably #63015) made this pull request unmergeable. Please resolve the merge conflicts.
It seems like unknown tokens are usually some weird unicode punctuation (at least from tests). #62963 provided the best solution to this, IMO - treating unknown punctuation as some similar known punctuation.
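To make the "similar known punctuation" idea concrete, here is a minimal sketch. The `CONFUSABLES` table, its entries, and the function name `similar_known_punctuation` are hypothetical and chosen only for illustration; they are not taken from #62963 or from the compiler.

```rust
/// Hypothetical table of look-alike characters (illustrative entries only).
const CONFUSABLES: &[(char, char)] = &[
    ('\u{FF1B}', ';'), // fullwidth semicolon
    ('\u{201C}', '"'), // left double quotation mark
    ('\u{201D}', '"'), // right double quotation mark
    ('\u{2212}', '-'), // minus sign
];

/// Returns the ASCII character a lexer could substitute for `c`, if any.
fn similar_known_punctuation(c: char) -> Option<char> {
    CONFUSABLES
        .iter()
        .find(|(from, _)| *from == c)
        .map(|(_, to)| *to)
}

fn main() {
    for c in ['\u{FF1B}', '🦀'] {
        match similar_known_punctuation(c) {
            // A look-alike exists: lex it as the known token and suggest the fix.
            Some(ascii) => println!("{:?}: treat as {:?}", c, ascii),
            // Nothing similar: this is the case an unknown token is needed for.
            None => println!("{:?}: no similar known token", c),
        }
    }
}
```

Characters with no look-alike, such as `🦀`, still need some other recovery path, which is what the rest of the discussion is about.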
Yeah, just skipping it on the parser level seems the right behavior. I am not sure that using |
Force-pushed from d43949e to e6ce091.
self.lexer.try_next_token().map_err(|()| HighlightError::LexError)
let token = self.lexer.next_token();
if let token::Unknown(..) = &token.kind {
    return Err(HighlightError::LexError);
Why is `HighlightError::LexError` necessary?
Can't the highlighter treat the token as whitespace and continue, similarly to the parser?
Mainly to minimize the diff in code and tests. I think the rustdoc side needs a different approach altogether, to avoid duplicating lexer errors in two passes. But rustdoc can be improved separately.
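The two recovery strategies discussed in this review thread can be contrasted with a small sketch. Only the name `HighlightError::LexError` comes from the diff; `TokenKind`, `highlight_strict`, and `highlight_lenient` are hypothetical stand-ins, not rustdoc's actual types or functions.

```rust
#[derive(Debug, Clone, PartialEq)]
enum TokenKind {
    Ident(String),
    Whitespace,
    Unknown(char),
}

#[derive(Debug, PartialEq)]
enum HighlightError {
    LexError,
}

/// Strategy kept in this PR: bail out on the first unknown token.
fn highlight_strict(tokens: &[TokenKind]) -> Result<Vec<String>, HighlightError> {
    let mut out = Vec::new();
    for token in tokens {
        match token {
            TokenKind::Unknown(_) => return Err(HighlightError::LexError),
            TokenKind::Ident(name) => out.push(format!("<ident>{}</ident>", name)),
            TokenKind::Whitespace => out.push(" ".to_string()),
        }
    }
    Ok(out)
}

/// Alternative raised in review: render unknown tokens like whitespace and continue.
fn highlight_lenient(tokens: &[TokenKind]) -> Vec<String> {
    tokens
        .iter()
        .map(|token| match token {
            TokenKind::Ident(name) => format!("<ident>{}</ident>", name),
            TokenKind::Whitespace | TokenKind::Unknown(_) => " ".to_string(),
        })
        .collect()
}

fn main() {
    let tokens = vec![
        TokenKind::Ident("a".to_string()),
        TokenKind::Unknown('🦀'),
        TokenKind::Ident("b".to_string()),
    ];
    println!("strict:  {:?}", highlight_strict(&tokens));
    println!("lenient: {:?}", highlight_lenient(&tokens));
}
```

The strict variant keeps the existing behavior and test output stable, which matches the stated goal of minimizing the diff; the lenient variant is the one that would mirror the parser.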
Well, since their recovery strategy is different ("skip" vs "parse as I'm mildly skeptical about introducing |
☔ The latest upstream changes (presumably #63194) made this pull request unmergeable. Please resolve the merge conflicts.
Filed #63284 for rustdoc fix (which we probably should do regardless of the current PR) and updated the tests to squash irrelevant errors.
Yeah, I get roughly the same feelings. However it seems that merging whitespace and unknown later would be easier than splitting them.
@bors r+
📌 Commit b3e8c8b has been approved by |
Remove special code-path for handing unknown tokens

In `StringReader`, we have a buffer of fatal errors, which is used only in a single case: when we see something which is not a reasonable token at all, like `🦀`. I think a more straightforward thing to do here is to produce an explicit error token in this case, and let the next layer (the parser) deal with it.

However, currently this leads to duplicated error messages. What should we do with this? Naively, I would think that emitting (just emitting, not raising) `FatalError` should stop other errors, but it looks like this is not the case? We can also probably tweak the parser on a case-by-case basis, to avoid emitting "expected" errors if the current token is an `Err`.

I am personally also fine with cascading errors in this case: it's quite unlikely that you actually type a fully invalid token.

@petrochenkov, which approach should we take to fight cascading errors?
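The design proposed in the description (the lexer never fails and emits an explicit unknown token, and the next layer reports and skips it) can be sketched roughly as follows. This is a toy model under stated assumptions: `Token`, `lex`, and `parse` are hypothetical names, and the error text is illustrative rather than the compiler's actual diagnostic.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Unknown(char),
}

/// Toy lexer: it never fails. Characters that cannot start a token
/// become `Token::Unknown` instead of going into a side buffer of errors.
fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    ident.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            chars.next();
            tokens.push(Token::Unknown(c));
        }
    }
    tokens
}

/// Toy parser layer: reports each unknown token once, then skips it
/// as if it were whitespace, so no follow-up "expected" error is produced.
fn parse(tokens: Vec<Token>) -> (Vec<String>, Vec<String>) {
    let mut idents = Vec::new();
    let mut errors = Vec::new();
    for token in tokens {
        match token {
            Token::Ident(name) => idents.push(name),
            Token::Unknown(c) => errors.push(format!("unknown start of token: {}", c)),
        }
    }
    (idents, errors)
}

fn main() {
    let (idents, errors) = parse(lex("foo 🦀 bar"));
    println!("idents: {:?}", idents);
    println!("errors: {:?}", errors);
}
```

Reporting the error at the point where the token is skipped is one way to avoid the duplicated messages mentioned above, since each unknown token is seen exactly once by the layer that emits the diagnostic.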
Rollup of 14 pull requests

Successful merges:
- #61457 (Implement DoubleEndedIterator for iter::{StepBy, Peekable, Take})
- #63017 (Remove special code-path for handing unknown tokens)
- #63184 (Explaining the reason why validation is performed in to_str of path.rs)
- #63230 (Make use of possibly uninitialized data [E0381] a hard error)
- #63260 (fix UB in a test)
- #63264 (Revert "Rollup merge of #62696 - chocol4te:fix_#62194, r=estebank")
- #63272 (Some more libsyntax::attr cleanup)
- #63285 (Remove leftover AwaitOrigin)
- #63287 (Don't store &Span)
- #63293 (Clarify align_to's requirements and obligations)
- #63295 (improve align_offset docs)
- #63299 (Make qualify consts in_projection use PlaceRef)
- #63312 (doc: fix broken sentence)
- #63315 (Fix #63313)

Failed merges:

r? @ghost
buffer lexer errors in rustdoc syntax checking

The code isn't ideal (I really would like to display the errors inline), but this at least gets us to where we were before #63017.