-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use speculative parsing for with-items #11770
Conversation
|
992da43
to
cbd6962
Compare
| | ||
9 | with (item := 10 as f): ... | ||
10 | with (item1, item2 := 10 as f): ... | ||
11 | with (x for x in range(10), item): ... | ||
| ^ Syntax Error: Expected ',', found ')' | ||
12 | with (item, x for x in range(10)): ... | ||
| ^^^^^^^^^^^^^^^^^^^^ Syntax Error: Unparenthesized generator expression cannot be used here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that the speculative parsing failed so we fallback to considering parenthesized expression. The terminator token for with-items parsing with the kind being parenthesized expression is :
, so it'll expect a ,
after a with item (item
) but not at the end (:
).
I think I want to improve the logic of parsing comma-separated list from (element comma) (element comma) ...
to (element) (comma element) (comma element) ...
.
| | ||
1 | # `)` followed by a newline | ||
2 | with (item1, item2) | ||
| ^ Syntax Error: Expected ':', found newline | ||
3 | pass | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an improvement: https://play.ruff.rs/3c1fd581-e5df-4aa4-8528-7550e6d63e4f
3 | | ||
4 | with (item1, item2),: ... | ||
| ^ Syntax Error: Expected an expression | ||
| ^ Syntax Error: Trailing comma not allowed | ||
5 | with (item1, item2), as f: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't included this logic (
ruff/crates/ruff_python_parser/src/parser/statement.rs
Lines 1910 to 1940 in 9b2cf56
if with_item_kind.is_parenthesized_expression() { | |
// The trailing comma is optional because (1) they aren't allowed in parenthesized | |
// expression context and, (2) We need to raise the correct error if they're present. | |
// | |
// Consider the following three examples: | |
// | |
// ```python | |
// with (item1, item2): ... # (1) | |
// with (item1, item2),: ... # (2) | |
// with (item1, item2), item3,: ... # (3) | |
// ``` | |
// | |
// Here, (1) is valid and represents a parenthesized with items while (2) and (3) | |
// are invalid as they are parenthesized expression. Example (3) will raise an error | |
// stating that a trailing comma isn't allowed, while (2) will raise an "expected an | |
// expression" error. | |
// | |
// The reason that (2) expects an expression is because if it raised an error | |
// similar to (3), we would be suggesting to remove the trailing comma, which would | |
// make it a parenthesized with items. This would contradict our original assumption | |
// that it's a parenthesized expression. | |
// | |
// However, for (3), the error is being raised by the list parsing logic and if the | |
// trailing comma is removed, it still remains a parenthesized expression, so it's | |
// fine to raise the error. | |
if self.eat(TokenKind::Comma) && !self.at_expr() { | |
self.add_error( | |
ParseErrorType::ExpectedExpression, | |
self.current_token_range(), | |
); | |
} |
with (item1, item2), item3,: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a simple way to do it and I'm not sure if it's worth it. The "trailing comma" error is raised by list parsing while the "expect an expression" is raised by the with-item parsing logic. Even if I'm able to add the logic, both errors will be displayed because they're raised at separate location but they both contradict each other. So, we should only raise one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both errors are equally good and ask the same fo the user. Remove the comma or add an expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I find the new code much easier to reason about!
The only thought I have is if we should restrict error recovery while we're doing speculative parsing to avoid that the error recovery breaks our validation if the with items is what we think it is... I do think it isn't necessary because the error recovery never eats over a character that we use to determine whether our assumption was correct ()
or :
both exit the recovery logic).
Edit: I recommend you to run the fuzzer for a while in the background. It proved useful last time ;)
I should read the summary more carefully. You already did that.
], | ||
body: [], | ||
body: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems better :)
@@ -15,7 +15,7 @@ Module( | |||
is_async: false, | |||
items: [ | |||
WithItem { | |||
range: 6..6, | |||
range: 5..6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this PR changed whether the with item includes the range of the parentheses. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is for the following code:
with (
Here, the speculative parsing fails and thus it becomes a parenthesized expression.
ExprName { | ||
range: 149..154, | ||
id: "item2", | ||
range: 141..154, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old parse tree here was probably slightly better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, missing closing parenthesis is difficult to recover from because it's occurring in list parsing which skips over the :
token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the recovery skip over the :
? Shouldn't it stop when seeing a :
because it is part of the with items recovery set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the initial assumption of parenthesized with-items, the :
is not part of the recovery set (maybe it should?) and so we don't stop at :
which means we can't reliably detect whether our assumption was correct. This is why it falls back to parenthesized expression and parsing it as a tuple expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, i think it probably should. Or we risk running over the end of the case header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in #11775
## Summary This PR is a follow-up to this discussion (#11770 (comment)) which adds the `:` token in the terminator set for parenthesized with items. The main motivation is to avoid parsing too much in speculative mode. This is evident with the following _before_ and _after_ parsed with items list for the following code: ```py with (item1, item2: foo ``` <table> <tr> <th>Before (3 items)</th> <th>After (2 items)</th> </tr> <tr> <td> <pre> parsed_with_items: [ ParsedWithItem { item: WithItem { range: 6..11, context_expr: Name( ExprName { range: 6..11, id: "item1", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 13..18, context_expr: Name( ExprName { range: 13..18, id: "item2", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 24..27, context_expr: Name( ExprName { range: 24..27, id: "foo", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ] </pre> </td> <td> <pre> parsed_with_items: [ ParsedWithItem { item: WithItem { range: 6..11, context_expr: Name( ExprName { range: 6..11, id: "item1", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 13..18, context_expr: Name( ExprName { range: 13..18, id: "item2", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ] </pre> </td> </tr> </table> ## Test Plan `cargo insta test`
Summary
This PR updates the with-items parsing logic to use speculative parsing instead.
Existing logic
First, let's understand the previous logic:
(
, it doesn't know whether it's part of a parenthesized with items or a parenthesized expressionHere, in (3) there are lots of edge cases which we've to deal with:
if
expressions after (3)Consider other cases like
And, this is all possible only if we allow parsing these expressions in the with item parsing logic.
Speculative parsing
With #11457 merged, we can simplify this logic by changing the step (3) from above to just rewind the parser back to the
(
if our assumption (parenthesized with-items) was incorrect and then continue parsing it considering parenthesized expression.This also behaves a lot similar to what a PEG parser does which is to consider the first grammar rule and if it fails consider the second grammar rule and so on.
resolves: #11639
Test Plan