Use speculative parsing for with-items (#11770)
## Summary

This PR updates the with-items parsing logic to use speculative parsing
instead.

### Existing logic

First, let's understand the previous logic:
1. The parser sees `(` but doesn't know whether it starts parenthesized with
items or a parenthesized expression
2. Assume it's parenthesized with items and perform hand-rolled
speculative parsing
3. Then verify the assumption and, if it's incorrect, convert the parsed
with items into an appropriate expression, which becomes part of the
first with item

Step (3) has many edge cases that we have to deal with:
1. Trailing comma with a single element should be [converted to the
expression as
is](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2140-L2153)
2. Trailing comma with multiple elements should be [converted to a tuple
expression](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2155-L2178)
3. Limit the allowed expression based on whether it's
[(1)](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2144-L2152)
or
[(2)](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2157-L2171)
4. [Consider postfix
expressions](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2181-L2200)
after (3)
5. [Consider `if`
expressions](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2203-L2208)
after (3)
6. [Consider binary
expressions](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2210-L2228)
after (3)

Other cases to consider include:
* [Single generator
expression](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2020-L2035)
* [Expecting a
comma](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2122-L2130)

All of this is possible only if the [with item parsing
logic](https://github.com/astral-sh/ruff/blob/9b2cf569b22855439fa916be6fc417b372074f42/crates/ruff_python_parser/src/parser/statement.rs#L2287-L2334)
allows parsing these expressions.

### Speculative parsing

With #11457 merged, we can simplify this logic by changing step (3)
above to rewind the parser back to the `(` if our assumption
(parenthesized with items) was incorrect, and then continue parsing it
as a parenthesized expression.
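The following is a minimal sketch of the rewind step. It uses a toy token stream and a deliberately tiny grammar: names and `+` only, with no tuples. `Tok`, `Checkpoint`, `Parser`, `checkpoint`/`rewind` and `WithHeader` are hypothetical names for illustration, not the parser's actual API. A real parser would likely also need to restore state such as buffered errors when it rewinds.

```rust
/// Toy token kinds (hypothetical; far smaller than the parser's `TokenKind`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    LPar,
    RPar,
    Comma,
    Colon,
    Plus,
    As,
    Name(&'static str),
}

/// A saved parser position (hypothetical; only the cursor is restored here).
#[derive(Debug, Clone, Copy)]
pub struct Checkpoint(usize);

/// How the part of a `with` statement between `with` and `:` was parsed.
#[derive(Debug, PartialEq)]
pub enum WithHeader {
    /// `(a as x, b)`: the parentheses belong to the `with` statement.
    ParenthesizedItems(Vec<&'static str>),
    /// `(a) + b`: the parentheses belong to an expression (rendered as text).
    Expression(String),
}

pub struct Parser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Tok>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<Tok> {
        self.tokens.get(self.pos).copied()
    }

    fn eat(&mut self, tok: Tok) -> bool {
        if self.peek() == Some(tok) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    fn name(&mut self) -> Option<&'static str> {
        match self.peek() {
            Some(Tok::Name(name)) => {
                self.pos += 1;
                Some(name)
            }
            _ => None,
        }
    }

    fn checkpoint(&self) -> Checkpoint {
        Checkpoint(self.pos)
    }

    fn rewind(&mut self, checkpoint: Checkpoint) {
        self.pos = checkpoint.0;
    }

    /// The assumption: `'(' name ['as' name] (',' name ['as' name])* [','] ')' ':'`.
    fn try_parenthesized_items(&mut self) -> Option<Vec<&'static str>> {
        if !self.eat(Tok::LPar) {
            return None;
        }
        let mut items = Vec::new();
        loop {
            items.push(self.name()?);
            if self.eat(Tok::As) {
                self.name()?; // the `as` target
            }
            // Stop unless a comma follows; a comma right before `)` is a trailing comma.
            if !self.eat(Tok::Comma) || self.peek() == Some(Tok::RPar) {
                break;
            }
        }
        // Only valid if the closing `)` is immediately followed by `:`.
        (self.eat(Tok::RPar) && self.peek() == Some(Tok::Colon)).then_some(items)
    }

    /// The fallback: `atom ('+' atom)*`, where `atom` is a name or `'(' expr ')'`.
    fn expr(&mut self) -> Option<String> {
        let mut lhs = self.atom()?;
        while self.eat(Tok::Plus) {
            let rhs = self.atom()?;
            lhs = format!("{lhs} + {rhs}");
        }
        Some(lhs)
    }

    fn atom(&mut self) -> Option<String> {
        if self.eat(Tok::LPar) {
            let inner = self.expr()?;
            self.eat(Tok::RPar).then(|| format!("({inner})"))
        } else {
            self.name().map(String::from)
        }
    }

    /// Try parenthesized with items first; on failure, rewind to the `(`
    /// and parse the same tokens as an expression instead.
    pub fn parse_with_header(&mut self) -> Option<WithHeader> {
        let checkpoint = self.checkpoint();
        if let Some(items) = self.try_parenthesized_items() {
            return Some(WithHeader::ParenthesizedItems(items));
        }
        self.rewind(checkpoint);
        self.expr().map(WithHeader::Expression)
    }
}
```

For `with (a) + b:`, the first attempt reads `(a)` but finds `+` where it expects `:`. The parser therefore rewinds to the `(` and reparses the same tokens as the expression `(a) + b`. For `with (a as x, b,):`, the first attempt succeeds and returns the items `a` and `b`.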

This approach closely mirrors what a PEG parser does: try the first
grammar rule and, if it fails, try the second grammar rule, and so on.

resolves: #11639 

## Test Plan

- [x] Verify the updated snapshots
- [x] Run the fuzzer locally on around 3000 valid source code samples
dhruvmanila authored Jun 6, 2024
1 parent 5a5a588 commit 6c1fa1d
Showing 9 changed files with 544 additions and 719 deletions.
@@ -0,0 +1,3 @@
+# `)` followed by a newline
+with (item1, item2)
+pass
61 changes: 12 additions & 49 deletions crates/ruff_python_parser/src/parser/expression.rs
@@ -689,7 +689,8 @@ impl<'src> Parser<'src> {

 parsed_expr = Expr::Generator(parser.parse_generator_expression(
 parsed_expr.expr,
-GeneratorExpressionInParentheses::No(start),
+start,
+Parenthesized::No,
 ))
 .into();
 }
@@ -1705,7 +1706,8 @@ impl<'src> Parser<'src> {

 let generator = Expr::Generator(self.parse_generator_expression(
 parsed_expr.expr,
-GeneratorExpressionInParentheses::Yes(start),
+start,
+Parenthesized::Yes,
 ));

 ParsedExpr {
@@ -1929,46 +1931,27 @@ impl<'src> Parser<'src> {

 /// Parses a generator expression.
 ///
-/// The given `in_parentheses` parameter is used to determine whether the generator
-/// expression is enclosed in parentheses or not:
-/// - `Yes`, expect the `)` token after the generator expression.
-/// - `No`, no parentheses are expected.
-/// - `Maybe`, consume the `)` token if it's present.
-///
-/// The contained start position in each variant is used to determine the range
-/// of the generator expression.
+/// The given `start` offset is the start of either the opening parenthesis if the generator is
+/// parenthesized or the first token of the expression.
 ///
 /// See: <https://docs.python.org/3/reference/expressions.html#generator-expressions>
 pub(super) fn parse_generator_expression(
 &mut self,
 element: Expr,
-in_parentheses: GeneratorExpressionInParentheses,
+start: TextSize,
+parenthesized: Parenthesized,
 ) -> ast::ExprGenerator {
 let generators = self.parse_generators();

-let (parenthesized, start) = match in_parentheses {
-GeneratorExpressionInParentheses::Yes(lpar_start) => {
-self.expect(TokenKind::Rpar);
-(true, lpar_start)
-}
-GeneratorExpressionInParentheses::No(expr_start) => (false, expr_start),
-GeneratorExpressionInParentheses::Maybe {
-lpar_start,
-expr_start,
-} => {
-if self.eat(TokenKind::Rpar) {
-(true, lpar_start)
-} else {
-(false, expr_start)
-}
-}
-};
+if parenthesized.is_yes() {
+self.expect(TokenKind::Rpar);
+}

 ast::ExprGenerator {
 elt: Box::new(element),
 generators,
 range: self.node_range(start),
-parenthesized,
+parenthesized: parenthesized.is_yes(),
 }
 }
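The new signature moves the "is it parenthesized?" decision to the caller. The sketch below contrasts the two calling conventions. `TextSize` is aliased to `u32` and `Parenthesized` is defined as a plain yes/no enum; both are assumptions, since only `is_yes()` appears in the diff. `resolve_old` and `resolve_new` are hypothetical helpers. Only `Maybe` ever had to look ahead for `)`. That is presumably why it could be dropped once the with-items parser can rewind instead.

```rust
/// Stand-in for `TextSize` (an assumption for this sketch).
type TextSize = u32;

/// The enum removed by this commit, reproduced from the diff.
enum GeneratorExpressionInParentheses {
    Yes(TextSize),
    No(TextSize),
    Maybe { lpar_start: TextSize, expr_start: TextSize },
}

/// Assumed shape of the yes/no flag the new signature takes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Parenthesized {
    Yes,
    No,
}

impl Parenthesized {
    const fn is_yes(self) -> bool {
        matches!(self, Parenthesized::Yes)
    }
}

/// Old logic, returning `(parenthesized, start)`. `next_is_rpar` stands in for
/// the `self.eat(TokenKind::Rpar)` lookahead that only `Maybe` needed.
fn resolve_old(in_parentheses: GeneratorExpressionInParentheses, next_is_rpar: bool) -> (bool, TextSize) {
    match in_parentheses {
        GeneratorExpressionInParentheses::Yes(lpar_start) => (true, lpar_start),
        GeneratorExpressionInParentheses::No(expr_start) => (false, expr_start),
        GeneratorExpressionInParentheses::Maybe { lpar_start, expr_start } => {
            if next_is_rpar {
                (true, lpar_start)
            } else {
                (false, expr_start)
            }
        }
    }
}

/// New logic: the caller already knows both facts, so nothing is decided here.
fn resolve_new(start: TextSize, parenthesized: Parenthesized) -> (bool, TextSize) {
    (parenthesized.is_yes(), start)
}
```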

@@ -2472,26 +2455,6 @@ impl From<Operator> for OperatorPrecedence {
 }
 }

-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub(super) enum GeneratorExpressionInParentheses {
-/// The generator expression is in parentheses. The given [`TextSize`] is the
-/// start of the left parenthesis. E.g., `(x for x in range(10))`.
-Yes(TextSize),
-
-/// The generator expression is not in parentheses. The given [`TextSize`] is the
-/// start of the expression. E.g., `x for x in range(10)`.
-No(TextSize),
-
-/// The generator expression may or may not be in parentheses. The given [`TextSize`]s
-/// are the start of the left parenthesis and the start of the expression, respectively.
-Maybe {
-/// The start of the left parenthesis.
-lpar_start: TextSize,
-/// The start of the expression.
-expr_start: TextSize,
-},
-}
-
 /// Represents the precedence used for parsing the value part of a starred expression.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub(super) enum StarredExpressionPrecedence {
34 changes: 6 additions & 28 deletions crates/ruff_python_parser/src/parser/mod.rs
@@ -705,16 +705,6 @@ enum WithItemKind {
 /// The parentheses belongs to the context expression.
 ParenthesizedExpression,

-/// A list of `with` items that has only one item which is a parenthesized
-/// generator expression.
-///
-/// ```python
-/// with (x for x in range(10)): ...
-/// ```
-///
-/// The parentheses belongs to the generator expression.
-SingleParenthesizedGeneratorExpression,
-
 /// The `with` items aren't parenthesized in any way.
 ///
 /// ```python
@@ -732,20 +722,15 @@ impl WithItemKind {
 const fn list_terminator(self) -> TokenKind {
 match self {
 WithItemKind::Parenthesized => TokenKind::Rpar,
-WithItemKind::Unparenthesized
-| WithItemKind::ParenthesizedExpression
-| WithItemKind::SingleParenthesizedGeneratorExpression => TokenKind::Colon,
+WithItemKind::Unparenthesized | WithItemKind::ParenthesizedExpression => {
+TokenKind::Colon
+}
 }
 }

-/// Returns `true` if the `with` item is a parenthesized expression i.e., the
-/// parentheses belong to the context expression.
-const fn is_parenthesized_expression(self) -> bool {
-matches!(
-self,
-WithItemKind::ParenthesizedExpression
-| WithItemKind::SingleParenthesizedGeneratorExpression
-)
+/// Returns `true` if the with items are parenthesized.
+const fn is_parenthesized(self) -> bool {
+matches!(self, WithItemKind::Parenthesized)
 }
 }
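For reference, here is a compilable sketch of `WithItemKind` after this change. `TokenKind` is trimmed to the two variants used here (the real enum is much larger), and the Python examples in the doc comments are illustrative. A parenthesized generator such as `with (x for x in range(10)): ...` no longer gets its own variant. Presumably it is now parsed as a `ParenthesizedExpression` after the parser rewinds.

```rust
/// Minimal stand-in for `TokenKind` (only the variants used here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Rpar,
    Colon,
}

/// The three `WithItemKind` variants that remain after this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WithItemKind {
    /// `with (a, b as c): ...`: the parentheses belong to the `with` statement.
    Parenthesized,
    /// `with (a) as b: ...`: the parentheses belong to the context expression.
    ParenthesizedExpression,
    /// `with a, b as c: ...`: the items aren't parenthesized at all.
    Unparenthesized,
}

impl WithItemKind {
    /// The token that ends the comma-separated list of with items.
    const fn list_terminator(self) -> TokenKind {
        match self {
            WithItemKind::Parenthesized => TokenKind::Rpar,
            WithItemKind::Unparenthesized | WithItemKind::ParenthesizedExpression => {
                TokenKind::Colon
            }
        }
    }

    /// Returns `true` if the with items themselves are parenthesized.
    const fn is_parenthesized(self) -> bool {
        matches!(self, WithItemKind::Parenthesized)
    }
}
```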

@@ -1172,7 +1157,6 @@ bitflags! {
 const LAMBDA_PARAMETERS = 1 << 24;
 const WITH_ITEMS_PARENTHESIZED = 1 << 25;
 const WITH_ITEMS_PARENTHESIZED_EXPRESSION = 1 << 26;
-const WITH_ITEMS_SINGLE_PARENTHESIZED_GENERATOR_EXPRESSION = 1 << 27;
 const WITH_ITEMS_UNPARENTHESIZED = 1 << 28;
 const F_STRING_ELEMENTS = 1 << 29;
 }
@@ -1225,9 +1209,6 @@ impl RecoveryContext {
 WithItemKind::ParenthesizedExpression => {
 RecoveryContext::WITH_ITEMS_PARENTHESIZED_EXPRESSION
 }
-WithItemKind::SingleParenthesizedGeneratorExpression => {
-RecoveryContext::WITH_ITEMS_SINGLE_PARENTHESIZED_GENERATOR_EXPRESSION
-}
 WithItemKind::Unparenthesized => RecoveryContext::WITH_ITEMS_UNPARENTHESIZED,
 },
 RecoveryContextKind::FStringElements => RecoveryContext::F_STRING_ELEMENTS,
@@ -1294,9 +1275,6 @@ impl RecoveryContext {
 RecoveryContext::WITH_ITEMS_PARENTHESIZED_EXPRESSION => {
 RecoveryContextKind::WithItems(WithItemKind::ParenthesizedExpression)
 }
-RecoveryContext::WITH_ITEMS_SINGLE_PARENTHESIZED_GENERATOR_EXPRESSION => {
-RecoveryContextKind::WithItems(WithItemKind::SingleParenthesizedGeneratorExpression)
-}
 RecoveryContext::WITH_ITEMS_UNPARENTHESIZED => {
 RecoveryContextKind::WithItems(WithItemKind::Unparenthesized)
 }
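The two `RecoveryContext` hunks above are inverse mappings between `WithItemKind` and a flag bit. The sketch below shows that round trip. It uses a plain `u32` newtype as a stand-in for the `bitflags!`-generated type, an assumption made to keep the example dependency-free. Only the with-item bits from the diff are reproduced.

```rust
/// The `WithItemKind` variants that remain after this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WithItemKind {
    Parenthesized,
    ParenthesizedExpression,
    Unparenthesized,
}

/// Plain-`u32` stand-in for the `bitflags!`-generated `RecoveryContext`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RecoveryContext(u32);

impl RecoveryContext {
    const WITH_ITEMS_PARENTHESIZED: RecoveryContext = RecoveryContext(1 << 25);
    const WITH_ITEMS_PARENTHESIZED_EXPRESSION: RecoveryContext = RecoveryContext(1 << 26);
    // Bit 27 belonged to the removed generator-expression variant; nothing reuses it.
    const WITH_ITEMS_UNPARENTHESIZED: RecoveryContext = RecoveryContext(1 << 28);

    /// Kind -> flag.
    const fn from_with_item_kind(kind: WithItemKind) -> RecoveryContext {
        match kind {
            WithItemKind::Parenthesized => RecoveryContext::WITH_ITEMS_PARENTHESIZED,
            WithItemKind::ParenthesizedExpression => {
                RecoveryContext::WITH_ITEMS_PARENTHESIZED_EXPRESSION
            }
            WithItemKind::Unparenthesized => RecoveryContext::WITH_ITEMS_UNPARENTHESIZED,
        }
    }

    /// Flag -> kind; `None` for bits that don't name a with-item context.
    fn to_with_item_kind(self) -> Option<WithItemKind> {
        match self {
            RecoveryContext::WITH_ITEMS_PARENTHESIZED => Some(WithItemKind::Parenthesized),
            RecoveryContext::WITH_ITEMS_PARENTHESIZED_EXPRESSION => {
                Some(WithItemKind::ParenthesizedExpression)
            }
            RecoveryContext::WITH_ITEMS_UNPARENTHESIZED => Some(WithItemKind::Unparenthesized),
            _ => None,
        }
    }
}
```

`WITH_ITEMS_UNPARENTHESIZED` keeps `1 << 28`, so the commit leaves bit 27 unused rather than renumbering the remaining flags.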