Use speculative parsing for with-items #11770

dhruvmanila · 2024-06-06T04:48:18Z

Summary

This PR updates the with-items parsing logic to use speculative parsing.

Existing logic

First, let's understand the previous logic:

1. The parser sees `(` and doesn't know whether it's part of parenthesized with-items or a parenthesized expression
2. It assumes parenthesized with-items and performs hand-rolled speculative parsing
3. It then verifies the assumption and, if it's incorrect, converts the parsed with-items into an appropriate expression, which becomes part of the first with item

Here, step (3) has lots of edge cases that we have to deal with:

1. A trailing comma with a single element should be converted to the expression as is
2. A trailing comma with multiple elements should be converted to a tuple expression
3. Limit the allowed expression based on whether it's (1) or (2)
4. Consider postfix expressions after (3)
5. Consider if expressions after (3)
6. Consider binary expressions after (3)

Consider other cases like

And, this is all possible only if we allow parsing these expressions in the with item parsing logic.
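The ambiguity described above can be observed with CPython's own parser. The following is a minimal sketch, assuming CPython 3.9 or newer (where parenthesized with-items are accepted); `describe_with_items` is a hypothetical helper written for this illustration and is not part of ruff.

```python
import ast


def describe_with_items(source):
    """Return the AST node type of each with-item's context expression."""
    with_stmt = ast.parse(source).body[0]
    return [type(item.context_expr).__name__ for item in with_stmt.items]


# The same `(a, b)` prefix ends up as different trees depending on what follows `)`.
print(describe_with_items("with (a, b): ..."))       # parenthesized with-items
print(describe_with_items("with (a, b) as f: ..."))  # a tuple expression as one item
print(describe_with_items("with (a, b), c: ..."))    # a tuple expression plus another item
print(describe_with_items("with (a) + b: ..."))      # a binary expression as one item
```

The parser cannot tell these apart when it sees the opening parenthesis; it only knows once it has looked past the matching `)`.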

Speculative parsing

With #11457 merged, we can simplify this logic by changing step (3) from above: if our assumption (parenthesized with-items) was incorrect, just rewind the parser back to the `(` and then continue parsing it as a parenthesized expression.

This also behaves much like a PEG parser, which tries the first grammar rule and, if it fails, tries the second grammar rule, and so on.
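The rewind approach can be sketched on a toy grammar. This is an illustration only, not ruff's implementation: `Parser`, `checkpoint`, `rewind`, `parse_with` and `parse_with_header` are hypothetical names, the grammar covers only names, `+`, tuples and `as`, and results are plain tuples rather than AST nodes.

```python
import re


class ParseError(Exception):
    pass


def tokenize(source):
    # Names plus the few punctuation tokens the toy grammar needs.
    return re.findall(r"[A-Za-z_]\w*|[(),:+]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def checkpoint(self):
        return self.pos

    def rewind(self, checkpoint):
        self.pos = checkpoint

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "<eof>"

    def eat(self, token):
        if self.peek() == token:
            self.pos += 1
            return True
        return False

    def expect(self, token):
        if not self.eat(token):
            raise ParseError(f"expected {token!r}, found {self.peek()!r}")

    def parse_atom(self):
        if self.eat("("):
            elements = [self.parse_expr()]
            while self.eat(","):
                elements.append(self.parse_expr())
            self.expect(")")
            return elements[0] if len(elements) == 1 else ("tuple", *elements)
        token = self.peek()
        if not re.fullmatch(r"[A-Za-z_]\w*", token) or token in ("with", "as"):
            raise ParseError(f"expected an expression, found {token!r}")
        self.pos += 1
        return token

    def parse_expr(self):
        left = self.parse_atom()
        while self.eat("+"):
            left = ("+", left, self.parse_atom())
        return left

    def parse_item(self):
        expr = self.parse_expr()
        target = self.parse_atom() if self.eat("as") else None
        return (expr, target)

    def parse_items(self, closing=None):
        # `closing` is the token that may directly follow a trailing comma.
        items = [self.parse_item()]
        while self.eat(",") and self.peek() != closing:
            items.append(self.parse_item())
        return items

    def parse_with(self):
        self.expect("with")
        if self.peek() == "(":
            checkpoint = self.checkpoint()
            try:
                # Assumption: these are parenthesized with-items.
                self.expect("(")
                items = self.parse_items(closing=")")
                self.expect(")")
                self.expect(":")
                return ("parenthesized", items)
            except ParseError:
                # Assumption was wrong: go back to `(` and parse an expression.
                self.rewind(checkpoint)
        items = self.parse_items()
        self.expect(":")
        return ("unparenthesized", items)


def parse_with_header(source):
    return Parser(tokenize(source)).parse_with()
```

With this sketch, `parse_with_header("with (a, b):")` takes the first branch, while `parse_with_header("with (a, b) as f:")` fails the `:` check after `)`, rewinds, and reparses `(a, b)` as a tuple expression. No conversion of already-parsed items is needed, which is the simplification the PR describes.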

resolves: #11639

Test Plan

Verify the updated snapshots
Run the fuzzer on around 3000 valid source code inputs (locally)

github-actions · 2024-06-06T05:07:45Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

dhruvmanila · 2024-06-06T06:38:14Z

...on_parser/tests/snapshots/invalid_syntax@statements__with__ambiguous_lpar_with_items.py.snap

   |
+ 9 | with (item := 10 as f): ...
 10 | with (item1, item2 := 10 as f): ...
 11 | with (x for x in range(10), item): ...
+   |                                 ^ Syntax Error: Expected ',', found ')'
 12 | with (item, x for x in range(10)): ...
-   |             ^^^^^^^^^^^^^^^^^^^^ Syntax Error: Unparenthesized generator expression cannot be used here


The problem here is that the speculative parsing failed, so we fall back to treating it as a parenthesized expression. When the with-item kind is parenthesized expression, the terminator token for with-items parsing is `:`, so it expects a `,` after a with item (`item`) unless the list is at its end (`:`).

I think I want to improve the logic of parsing a comma-separated list from (element comma) (element comma) ... to (element) (comma element) (comma element) ....
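The two loop shapes can be compared on a flat token list. This is a rough sketch under simplifying assumptions (every element is a single token, and both function names are made up for the illustration); it is not how ruff's list parsing is written.

```python
def parse_element_then_comma(tokens, terminator):
    """Loop shape: (element comma) (element comma) ... until `terminator`."""
    elements, errors, pos = [], [], 0
    while pos < len(tokens) and tokens[pos] != terminator:
        elements.append(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1  # separator consumed, go around for the next element
        elif pos < len(tokens) and tokens[pos] != terminator:
            # Neither a separator nor the terminator: the list itself complains.
            errors.append(f"Expected ',', found {tokens[pos]!r}")
            break
    return elements, errors, pos


def parse_element_then_comma_element(tokens, terminator):
    """Loop shape: (element) (comma element) (comma element) ..."""
    elements, errors, pos = [], [], 0
    if pos < len(tokens) and tokens[pos] != terminator:
        elements.append(tokens[pos])
        pos += 1
        # Continue only while a comma is followed by another element.
        while (
            pos + 1 < len(tokens)
            and tokens[pos] == ","
            and tokens[pos + 1] != terminator
        ):
            elements.append(tokens[pos + 1])
            pos += 2
    return elements, errors, pos


# `item ) :` with `:` as the terminator, as in the fallback described above.
tokens = ["item", ")", ":"]
print(parse_element_then_comma(tokens, ":"))
print(parse_element_then_comma_element(tokens, ":"))
```

In the first shape the list loop reports the missing comma itself; in the second it simply stops at the unexpected `)` and leaves the decision about what to report to the caller.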

dhruvmanila · 2024-06-06T06:38:57Z

..._python_parser/tests/snapshots/invalid_syntax@with_items_parenthesized_missing_colon.py.snap

+  |
+1 | # `)` followed by a newline
+2 | with (item1, item2)
+  |                    ^ Syntax Error: Expected ':', found newline
+3 |     pass
+  |


This is an improvement: https://play.ruff.rs/3c1fd581-e5df-4aa4-8528-7550e6d63e4f

dhruvmanila · 2024-06-06T06:41:24Z

...on_parser/tests/snapshots/invalid_syntax@statements__with__ambiguous_lpar_with_items.py.snap

 3 | 
 4 | with (item1, item2),: ...
-  |                     ^ Syntax Error: Expected an expression
+  |                    ^ Syntax Error: Trailing comma not allowed
 5 | with (item1, item2), as f: ...


I haven't included this logic (ruff/crates/ruff_python_parser/src/parser/statement.rs, lines 1910 to 1940 in 9b2cf56):

    if with_item_kind.is_parenthesized_expression() {
        // The trailing comma is optional because (1) they aren't allowed in parenthesized
        // expression context and, (2) We need to raise the correct error if they're present.
        //
        // Consider the following three examples:
        //
        // ```python
        // with (item1, item2): ... # (1)
        // with (item1, item2),: ... # (2)
        // with (item1, item2), item3,: ... # (3)
        // ```
        //
        // Here, (1) is valid and represents a parenthesized with items while (2) and (3)
        // are invalid as they are parenthesized expression. Example (3) will raise an error
        // stating that a trailing comma isn't allowed, while (2) will raise an "expected an
        // expression" error.
        //
        // The reason that (2) expects an expression is because if it raised an error
        // similar to (3), we would be suggesting to remove the trailing comma, which would
        // make it a parenthesized with items. This would contradict our original assumption
        // that it's a parenthesized expression.
        //
        // However, for (3), the error is being raised by the list parsing logic and if the
        // trailing comma is removed, it still remains a parenthesized expression, so it's
        // fine to raise the error.
        if self.eat(TokenKind::Comma) && !self.at_expr() {
            self.add_error(
                ParseErrorType::ExpectedExpression,
                self.current_token_range(),
            );
        }

which took care of raising the "expected an expression" error. It's an edge case where we would raise a different error, while for the following we'd raise "trailing comma not allowed":

with (item1, item2), item3,: ...

I couldn't find a simple way to do it and I'm not sure if it's worth it. The "trailing comma" error is raised by list parsing, while the "expected an expression" error is raised by the with-item parsing logic. Even if I were able to add the logic, both errors would be displayed because they're raised at separate locations, yet they contradict each other. So, we should only raise one of them.

I think both errors are equally good and ask the same of the user: remove the comma or add an expression.

MichaReiser

Nice! I find the new code much easier to reason about!

The only thought I have is whether we should restrict error recovery while we're doing speculative parsing, so that error recovery doesn't break our validation of whether the with items are what we think they are... I don't think it's necessary, because error recovery never eats past a character that we use to determine whether our assumption was correct (`)` or `:` both exit the recovery logic).

Edit: ~~I recommend you to run the fuzzer for a while in the background. It proved useful last time ;)~~
I should read the summary more carefully. You already did that.

MichaReiser · 2024-06-06T07:29:08Z

...thon_parser/tests/snapshots/invalid_syntax@statements__with__unclosed_ambiguous_lpar.py.snap

                    ],
-                    body: [],
+                    body: [


This seems better :)

MichaReiser · 2024-06-06T07:30:01Z

..._parser/tests/snapshots/invalid_syntax@statements__with__unclosed_ambiguous_lpar_eof.py.snap

@@ -15,7 +15,7 @@ Module(
                    is_async: false,
                    items: [
                        WithItem {
-                            range: 6..6,
+                            range: 5..6,


It seems that this PR changed whether the with item includes the range of the parentheses. Is this expected?

dhruvmanila

Yes. This is for the following code:

with (

Here, the speculative parsing fails and thus it becomes a parenthesized expression.

MichaReiser · 2024-06-06T07:31:24Z

..._python_parser/tests/snapshots/invalid_syntax@with_items_parenthesized_missing_comma.py.snap

-                                ExprName {
-                                    range: 149..154,
-                                    id: "item2",
+                            range: 141..154,


The old parse tree here was probably slightly better.

dhruvmanila

Yeah, a missing closing parenthesis is difficult to recover from because it occurs in list parsing, which skips over the `:` token.

MichaReiser

Should the recovery skip over the `:`? Shouldn't it stop when seeing a `:` because it is part of the with items recovery set?

dhruvmanila

For the initial assumption of parenthesized with-items, the `:` is not part of the recovery set (maybe it should be?) and so we don't stop at `:`, which means we can't reliably detect whether our assumption was correct. This is why it falls back to a parenthesized expression and parses it as a tuple expression.

MichaReiser

Yeah, I think it probably should. Otherwise we risk running over the end of the case header.

dhruvmanila

Done in #11775

## Summary

This PR is a follow-up to this discussion (#11770 (comment)) which adds the `:` token in the terminator set for parenthesized with items.

The main motivation is to avoid parsing too much in speculative mode. This is evident with the following _before_ and _after_ parsed with items list for the following code:

```py
with (item1, item2: foo
```

<table>
<tr>
<th>Before (3 items)</th>
<th>After (2 items)</th>
</tr>
<tr>
<td>
<pre>
parsed_with_items: [
    ParsedWithItem {
        item: WithItem {
            range: 6..11,
            context_expr: Name(
                ExprName {
                    range: 6..11,
                    id: "item1",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
    ParsedWithItem {
        item: WithItem {
            range: 13..18,
            context_expr: Name(
                ExprName {
                    range: 13..18,
                    id: "item2",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
    ParsedWithItem {
        item: WithItem {
            range: 24..27,
            context_expr: Name(
                ExprName {
                    range: 24..27,
                    id: "foo",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
]
</pre>
</td>
<td>
<pre>
parsed_with_items: [
    ParsedWithItem {
        item: WithItem {
            range: 6..11,
            context_expr: Name(
                ExprName {
                    range: 6..11,
                    id: "item1",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
    ParsedWithItem {
        item: WithItem {
            range: 13..18,
            context_expr: Name(
                ExprName {
                    range: 13..18,
                    id: "item2",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
]
</pre>
</td>
</tr>
</table>

## Test Plan

`cargo insta test`
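The effect of adding `:` to the terminator set can be imitated with a few lines. This is a deliberately crude sketch, assuming a pre-tokenized input and a hypothetical `collect_items` helper in which every name token is an item and anything else is skipped as error recovery; ruff's actual recovery logic is more involved.

```python
def collect_items(tokens, terminators):
    """Collect name tokens as items until a terminator token is reached."""
    items = []
    for token in tokens:
        if token in terminators:
            break  # a terminator ends the list, even in the middle of recovery
        if token.isidentifier():
            items.append(token)
        # Anything else (such as a stray `:`) is skipped as error recovery.
    return items


# Tokens following `with (` in: with (item1, item2: foo
tokens = ["item1", ",", "item2", ":", "foo"]

print(collect_items(tokens, {")"}))       # `:` is skipped, so `foo` is collected too
print(collect_items(tokens, {")", ":"}))  # `:` stops the list after `item2`
```

The first call mirrors the "before" column (three items) and the second the "after" column (two items).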

dhruvmanila added the internal and parser labels Jun 6, 2024

Use speculative parsing for with-items

cbd6962

dhruvmanila force-pushed the dhruv/speculative-with-items-parsing branch from 992da43 to cbd6962 June 6, 2024 05:48

dhruvmanila marked this pull request as ready for review June 6, 2024 06:27

dhruvmanila requested a review from MichaReiser as a code owner June 6, 2024 06:27

dhruvmanila commented Jun 6, 2024


MichaReiser approved these changes Jun 6, 2024


Update docs for parse_generator_expression

e89bcc3

dhruvmanila enabled auto-merge (squash) June 6, 2024 08:56

dhruvmanila merged commit 6c1fa1d into main Jun 6, 2024
18 checks passed

dhruvmanila deleted the dhruv/speculative-with-items-parsing branch June 6, 2024 08:59

dhruvmanila mentioned this pull request Jun 6, 2024

Consider : to terminate parenthesized with items #11775

Merged
