Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use speculative parsing for with-items #11770

Merged
merged 2 commits into from
Jun 6, 2024

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Jun 6, 2024

Summary

This PR updates the with-items parsing logic to use speculative parsing instead.

Existing logic

First, let's understand the previous logic:

  1. The parser sees (, it doesn't know whether it's part of a parenthesized with items or a parenthesized expression
  2. Consider it a parenthesized with items and perform a hand-rolled speculative parsing
  3. Then, verify the assumption and if it's incorrect convert the parsed with items into an appropriate expression which becomes part of the first with item

Here, in (3) there are lots of edge cases which we've to deal with:

  1. Trailing comma with a single element should be converted to the expression as is
  2. Trailing comma with multiple elements should be converted to a tuple expression
  3. Limit the allowed expression based on whether it's (1) or (2)
  4. Consider postfix expressions after (3)
  5. Consider if expressions after (3)
  6. Consider binary expressions after (3)

Consider other cases like

And, this is all possible only if we allow parsing these expressions in the with item parsing logic.

Speculative parsing

With #11457 merged, we can simplify this logic by changing the step (3) from above to just rewind the parser back to the ( if our assumption (parenthesized with-items) was incorrect and then continue parsing it considering parenthesized expression.

This also behaves a lot similar to what a PEG parser does which is to consider the first grammar rule and if it fails consider the second grammar rule and so on.

resolves: #11639

Test Plan

  • Verify the updated snapshots
  • Run the fuzzer on around 3000 valid source code (locally)

@dhruvmanila dhruvmanila added internal An internal refactor or improvement parser Related to the parser labels Jun 6, 2024
Copy link
Contributor

github-actions bot commented Jun 6, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dhruvmanila dhruvmanila force-pushed the dhruv/speculative-with-items-parsing branch from 992da43 to cbd6962 Compare June 6, 2024 05:48
@dhruvmanila dhruvmanila marked this pull request as ready for review June 6, 2024 06:27
Comment on lines 1440 to -1441
|
9 | with (item := 10 as f): ...
10 | with (item1, item2 := 10 as f): ...
11 | with (x for x in range(10), item): ...
| ^ Syntax Error: Expected ',', found ')'
12 | with (item, x for x in range(10)): ...
| ^^^^^^^^^^^^^^^^^^^^ Syntax Error: Unparenthesized generator expression cannot be used here
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that the speculative parsing failed so we fallback to considering parenthesized expression. The terminator token for with-items parsing with the kind being parenthesized expression is :, so it'll expect a , after a with item (item) but not at the end (:).

I think I want to improve the logic of parsing comma-separated list from (element comma) (element comma) ... to (element) (comma element) (comma element) ....

Comment on lines +55 to +60
|
1 | # `)` followed by a newline
2 | with (item1, item2)
| ^ Syntax Error: Expected ':', found newline
3 | pass
|
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 1364 to 1367
3 |
4 | with (item1, item2),: ...
| ^ Syntax Error: Expected an expression
| ^ Syntax Error: Trailing comma not allowed
5 | with (item1, item2), as f: ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't included this logic (

if with_item_kind.is_parenthesized_expression() {
// The trailing comma is optional because (1) they aren't allowed in parenthesized
// expression context and, (2) We need to raise the correct error if they're present.
//
// Consider the following three examples:
//
// ```python
// with (item1, item2): ... # (1)
// with (item1, item2),: ... # (2)
// with (item1, item2), item3,: ... # (3)
// ```
//
// Here, (1) is valid and represents a parenthesized with items while (2) and (3)
// are invalid as they are parenthesized expression. Example (3) will raise an error
// stating that a trailing comma isn't allowed, while (2) will raise an "expected an
// expression" error.
//
// The reason that (2) expects an expression is because if it raised an error
// similar to (3), we would be suggesting to remove the trailing comma, which would
// make it a parenthesized with items. This would contradict our original assumption
// that it's a parenthesized expression.
//
// However, for (3), the error is being raised by the list parsing logic and if the
// trailing comma is removed, it still remains a parenthesized expression, so it's
// fine to raise the error.
if self.eat(TokenKind::Comma) && !self.at_expr() {
self.add_error(
ParseErrorType::ExpectedExpression,
self.current_token_range(),
);
}
) which took care of raising "expected an expression" error. It's an edge case where we would raise a different error while in the following we'd raise "trailing comma not allowed".

with (item1, item2), item3,: ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a simple way to do it and I'm not sure if it's worth it. The "trailing comma" error is raised by list parsing while the "expect an expression" is raised by the with-item parsing logic. Even if I'm able to add the logic, both errors will be displayed because they're raised at separate location but they both contradict each other. So, we should only raise one of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both errors are equally good and ask the same fo the user. Remove the comma or add an expression.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I find the new code much easier to reason about!

The only thought I have is if we should restrict error recovery while we're doing speculative parsing to avoid that the error recovery breaks our validation if the with items is what we think it is... I do think it isn't necessary because the error recovery never eats over a character that we use to determine whether our assumption was correct () or : both exit the recovery logic).

Edit: I recommend you to run the fuzzer for a while in the background. It proved useful last time ;)
I should read the summary more carefully. You already did that.

],
body: [],
body: [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems better :)

@@ -15,7 +15,7 @@ Module(
is_async: false,
items: [
WithItem {
range: 6..6,
range: 5..6,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this PR changed whether the with item includes the range of the parentheses. Is this expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is for the following code:

with (

Here, the speculative parsing fails and thus it becomes a parenthesized expression.

ExprName {
range: 149..154,
id: "item2",
range: 141..154,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old parse tree here was probably slightly better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, missing closing parenthesis is difficult to recover from because it's occurring in list parsing which skips over the : token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the recovery skip over the :? Shouldn't it stop when seeing a : because it is part of the with items recovery set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the initial assumption of parenthesized with-items, the : is not part of the recovery set (maybe it should?) and so we don't stop at : which means we can't reliably detect whether our assumption was correct. This is why it falls back to parenthesized expression and parsing it as a tuple expression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i think it probably should. Or we risk running over the end of the case header

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #11775

@dhruvmanila dhruvmanila enabled auto-merge (squash) June 6, 2024 08:56
@dhruvmanila dhruvmanila merged commit 6c1fa1d into main Jun 6, 2024
18 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/speculative-with-items-parsing branch June 6, 2024 08:59
dhruvmanila added a commit that referenced this pull request Jun 6, 2024
## Summary

This PR is a follow-up to this discussion
(#11770 (comment))
which adds the `:` token in the terminator set for parenthesized with
items.

The main motivation is to avoid parsing too much in speculative mode.
This is evident with the following _before_ and _after_ parsed with
items list for the following code:

```py
with (item1, item2:
    foo
```

<table>
  <tr>
    <th>Before (3 items)</th>
    <th>After (2 items)</th>
  </tr>
  <tr>
    <td>
<pre>
parsed_with_items: [
    ParsedWithItem {
        item: WithItem {
            range: 6..11,
            context_expr: Name(
                ExprName {
                    range: 6..11,
                    id: "item1",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
    ParsedWithItem {
        item: WithItem {
            range: 13..18,
            context_expr: Name(
                ExprName {
                    range: 13..18,
                    id: "item2",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
    ParsedWithItem {
        item: WithItem {
            range: 24..27,
            context_expr: Name(
                ExprName {
                    range: 24..27,
                    id: "foo",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
]
</pre>
	</td>
    <td>
<pre>
parsed_with_items: [
    ParsedWithItem {
        item: WithItem {
            range: 6..11,
            context_expr: Name(
                ExprName {
                    range: 6..11,
                    id: "item1",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
    ParsedWithItem {
        item: WithItem {
            range: 13..18,
            context_expr: Name(
                ExprName {
                    range: 13..18,
                    id: "item2",
                    ctx: Load,
                },
            ),
            optional_vars: None,
        },
        is_parenthesized: false,
    },
]
</pre>
	</td>
  </tr>
</table>

## Test Plan

`cargo insta test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update with-items parsing to use speculative parsing
2 participants