Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update with-items parsing to use speculative parsing #11639

Closed
dhruvmanila opened this issue May 31, 2024 · 0 comments · Fixed by #11770
Closed

Update with-items parsing to use speculative parsing #11639

dhruvmanila opened this issue May 31, 2024 · 0 comments · Fixed by #11770
Assignees
Labels
parser Related to the parser

Comments

@dhruvmanila
Copy link
Member

#11457 added the ability to do speculative parsing via checkpoint - rewind infrastructure and is currently being used for match statement via #11443.

To give a brief background, speculative parsing means that the parser tries to parse a syntax node as one kind and determines at the end if the assumption was right by testing if the parser is at a specific token (or has no errors). This is useful if a syntax is ambiguous and no amount of lookahead (except parsing the whole syntax) is sufficient to determine what syntax it is.

The idea here is to update the current parsing logic for with-items to use the checkpoint - rewind infrastructure. The current logic performs a manual and a bit complex speculative parsing to disambiguate between a parenthesized with-items and parenthesized with-item expression i.e., whether the parenthesis belongs to with-items or the expression of the first with-item. For example,

with (item1, item2): ...       # Parenthesized with-items
with (item1, item2) as f: ...  # Parenthesized expression
@dhruvmanila dhruvmanila added the parser Related to the parser label May 31, 2024
@dhruvmanila dhruvmanila self-assigned this May 31, 2024
@dhruvmanila dhruvmanila changed the title Update with-items parsing to use checkpoint - rewind infrastructure Update with-items parsing to use speculative parsing Jun 4, 2024
dhruvmanila added a commit that referenced this issue Jun 6, 2024
## Summary

This PR updates the with-items parsing logic to use speculative parsing
instead.

### Existing logic

First, let's understand the previous logic:
1. The parser sees `(`, it doesn't know whether it's part of a
parenthesized with items or a parenthesized expression
2. Consider it a parenthesized with items and perform a hand-rolled
speculative parsing
3. Then, verify the assumption and if it's incorrect convert the parsed
with items into an appropriate expression which becomes part of the
first with item

Here, in (3) there are lots of edge cases which we've to deal with:
1. Trailing comma with a single element should be [converted to the
expression as
is](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2140-L2153)
2. Trailing comma with multiple elements should be [converted to a tuple
expression](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2155-L2178)
3. Limit the allowed expression based on whether it's
[(1)](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2144-L2152)
or
[(2)](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2157-L2171)
4. [Consider postfix
expressions](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2181-L2200)
after (3)
5. [Consider `if`
expressions](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2203-L2208)
after (3)
6. [Consider binary
expressions](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2210-L2228)
after (3)

Consider other cases like
* [Single generator
expression](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2020-L2035)
* [Expecting a
comma](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2122-L2130)

And, this is all possible only if we allow parsing these expressions in
the [with item parsing
logic](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2287-L2334).

### Speculative parsing

With #11457 merged, we can simplify this logic by changing the step (3)
from above to just rewind the parser back to the `(` if our assumption
(parenthesized with-items) was incorrect and then continue parsing it
considering parenthesized expression.

This also behaves a lot similar to what a PEG parser does which is to
consider the first grammar rule and if it fails consider the second
grammar rule and so on.

resolves: #11639 

## Test Plan

- [x] Verify the updated snapshots
- [x] Run the fuzzer on around 3000 valid source code (locally)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant