Implement re-lexing logic for better error recovery (#11845)
## Summary

This PR implements the re-lexing logic in the parser. This logic is only applied when recovering from an error during list parsing. The logic is as follows:

1. During list parsing, if the parser encounters an unexpected token and detects that an outer context can understand it and thus recover from it, it invokes the re-lexing logic in the lexer.
2. This logic first checks whether the lexer is in a parenthesized context and returns if it's not. Thus, the logic is a no-op if the lexer isn't in a parenthesized context.
3. It then reduces the nesting level by 1. It shouldn't reset it to 0 because otherwise the recovery from nested list parsing will be incorrect.
4. Then, it tries to find the last newline character going backwards from the current position of the lexer. This skips over any whitespace, but if it encounters any character other than a newline or whitespace, it aborts.
5. Now, if there's a newline character, then it needs to be re-lexed in a logical context, which means that the lexer needs to emit it as a `Newline` token instead of `NonLogicalNewline`.
6. If the re-lexing gives a different token than the current one, the token source needs to update its token collection to remove all the tokens which come after the new current position.

It turns out that list parsing isn't that happy with the results, so it requires some rearranging such that the following two errors are raised correctly:

1. Expected comma
2. Recovery context error

For (1), the following scenarios need to be considered:

* Missing comma between two elements
* Half-parsed element because the grammar doesn't allow it (for example, named expressions)

For (2), the following scenarios need to be considered:

1. The parser is at a comma, which means that there's a missing element; otherwise the comma would've been consumed by the first `eat` call above. Also, the parser doesn't take the re-lexing route on a comma token.
2. It's the first element and the current token is not a comma, which means that it's an invalid element.

resolves: #11640

## Test Plan

- [x] Update existing test snapshots and validate them
- [x] Add additional test cases specific to the re-lexing logic and validate the snapshots
- [x] Run the fuzzer on 3000+ valid inputs
- [x] Run the fuzzer on invalid inputs
- [x] Run the parser on various open source projects
- [x] Make sure the ecosystem changes are none
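The re-lexing steps 2 to 6 from the summary can be sketched roughly as follows. This is a simplified Python model, not the Rust code from this commit: `LexerState`, `re_lex_logical_token`, and `truncate_tokens` are hypothetical names, tokens are reduced to `(kind, start offset)` pairs, and the rule that the newline only becomes logical once the nesting level reaches 0 is an assumption drawn from step 3.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WHITESPACE = " \t\x0c"  # characters skipped while scanning backwards
NEWLINES = "\n\r"       # either one counts as a newline character


@dataclass
class LexerState:
    """Hypothetical stand-in for the lexer state used by this sketch."""
    source: str
    position: int  # offset where the current (unexpected) token starts
    nesting: int   # number of unclosed parentheses, brackets, and braces


def re_lex_logical_token(state: LexerState) -> Optional[Tuple[int, str]]:
    """Return (newline offset, token kind to emit there), or None for a no-op."""
    # Step 2: nothing to do outside a parenthesized context.
    if state.nesting == 0:
        return None

    # Step 3: leave exactly one nesting level, never reset to 0.
    state.nesting -= 1

    # Step 4: scan backwards over whitespace and newlines, remembering the
    # newline closest to the previous token; stop at any other character.
    newline_position = None
    cursor = state.position
    while cursor > 0:
        ch = state.source[cursor - 1]
        if ch in WHITESPACE:
            cursor -= 1
        elif ch in NEWLINES:
            cursor -= 1
            newline_position = cursor
        else:
            break

    if newline_position is None:
        return None

    # Step 5 (assumption): the newline is logical only at nesting level 0.
    kind = "Newline" if state.nesting == 0 else "NonLogicalNewline"
    return newline_position, kind


def truncate_tokens(tokens: List[Tuple[str, int]], position: int) -> List[Tuple[str, int]]:
    """Step 6: drop tokens that start at or after the re-lex position."""
    return [(kind, start) for kind, start in tokens if start < position]


source = "if call(foo, [a, b\ndef bar():\n    pass\n"
state = LexerState(source, position=source.index("def"), nesting=2)
first = re_lex_logical_token(state)   # leaves `[`, still inside `(`
second = re_lex_logical_token(state)  # leaves `(`, newline becomes logical
```

In this model the first call still yields a non-logical newline because the lexer remains inside `call(`, and only the second call turns the same newline into a logical one. That is one way to read why the nesting level is reduced by 1 rather than reset to 0.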
1 parent 1f654ee, commit 8499abf
Showing 43 changed files with 1,585 additions and 204 deletions.
1 change: 1 addition & 0 deletions
crates/ruff_python_parser/resources/inline/err/comma_separated_missing_comma.py
@@ -0,0 +1 @@
call(**x := 1)
2 changes: 2 additions & 0 deletions
...ruff_python_parser/resources/inline/err/comma_separated_missing_comma_between_elements.py
@@ -0,0 +1,2 @@
# The comma between the first two elements is expected in `parse_list_expression`.
[0, 1 2]
1 change: 1 addition & 0 deletions
...ruff_python_parser/resources/inline/err/comma_separated_missing_element_between_commas.py
@@ -0,0 +1 @@
[0, 1, , 2]
1 change: 1 addition & 0 deletions
crates/ruff_python_parser/resources/inline/err/comma_separated_missing_first_element.py
@@ -0,0 +1 @@
call(= 1)
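The two bracketed-list fixtures above (`[0, 1 2]` and `[0, 1, , 2]`) exercise the two error kinds described in the summary. A minimal sketch of how a comma-separated list loop might tell them apart is shown below. It is an illustration only: `parse_comma_separated_sketch` is a hypothetical name, tokens are plain strings, every non-comma token counts as an element, and the error messages are made up rather than taken from the parser.

```python
from typing import List, Tuple


def parse_comma_separated_sketch(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Parse the tokens between two brackets; return (elements, errors)."""
    elements: List[str] = []
    errors: List[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == ",":
            # A comma where an element should start: the element is missing,
            # otherwise this comma would have been consumed after the
            # previous element.
            errors.append(f"Expected an element at token {i}")
            i += 1
            continue

        # Any other token is treated as a complete element in this sketch.
        elements.append(tokens[i])
        i += 1

        if i < len(tokens):
            if tokens[i] == ",":
                i += 1  # consume the separator
            else:
                # Another element follows without a separator.
                errors.append(f"Expected ',' at token {i}")
    return elements, errors


missing_comma = parse_comma_separated_sketch(["0", ",", "1", "2"])
missing_element = parse_comma_separated_sketch(["0", ",", "1", ",", ",", "2"])
trailing_comma = parse_comma_separated_sketch(["0", ",", "1", ","])
```

Both invalid inputs still produce the elements `0`, `1`, and `2`, so recovery keeps as much of the list as possible while reporting one error each. The trailing-comma input reports no error.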
7 changes: 7 additions & 0 deletions
crates/ruff_python_parser/resources/inline/ok/comma_separated_regular_list_terminator.py
@@ -0,0 +1,7 @@
# The first element is parsed by `parse_list_like_expression` and the comma after
# the first element is expected by `parse_list_expression`
[0]
[0, 1]
[0, 1,]
[0, 1, 2]
[0, 1, 2,]
46 changes: 46 additions & 0 deletions
crates/ruff_python_parser/resources/invalid/re_lex_logical_token.py
@@ -0,0 +1,46 @@
# No indentation before the function definition
if call(foo
def bar():
pass


# Indented function definition
if call(foo
def bar():
pass


# There are multiple non-logical newlines (blank lines) in the `if` body
if call(foo


def bar():
pass


# There are trailing whitespaces in the blank line inside the `if` body
if call(foo

def bar():
pass


# The lexer is nested with multiple levels of parentheses
if call(foo, [a, b
def bar():
pass


# The outer parenthesis is closed but the inner bracket isn't
if call(foo, [a, b)
def bar():
pass


# The parser tries to recover from an unclosed `]` when the current token is `)`. This
# test is to make sure it emits a `NonLogicalNewline` token after `b`.
if call(foo, [a,
b
)
def bar():
pass
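These fixtures depend on the difference between a logical newline and a non-logical one inside brackets. CPython's standard `tokenize` module makes the same distinction with its `NEWLINE` and `NL` token types, which gives a quick way to see it on valid input. The snippet below is only an analogy for the `Newline` and `NonLogicalNewline` tokens named in the summary; the helper `newline_kinds` is a hypothetical name, and `tokenize` performs no re-lexing or error recovery.

```python
import io
import tokenize
from typing import List


def newline_kinds(source: str) -> List[str]:
    """Return the names of the newline tokens in `source`, in order."""
    kinds = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NL is a newline that does not end a logical line (for example,
        # inside parentheses); NEWLINE ends a logical line.
        if tok.type in (tokenize.NL, tokenize.NEWLINE):
            kinds.append(tokenize.tok_name[tok.type])
    return kinds


# The first newline sits inside `call(...)`, the other two end statements.
kinds = newline_kinds("call(foo,\n     bar)\nx = 1\n")
```

Here `kinds` is `['NL', 'NEWLINE', 'NEWLINE']`: only the newline inside the open parenthesis is non-logical. In the invalid fixtures above the parenthesis is never closed, so a lexer would keep treating every following newline as non-logical unless it is re-lexed.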
1 change: 1 addition & 0 deletions
crates/ruff_python_parser/resources/invalid/re_lex_logical_token_mac_eol.py
@@ -0,0 +1 @@
if call(foo, [a, b def bar(): pass
3 changes: 3 additions & 0 deletions
crates/ruff_python_parser/resources/invalid/re_lex_logical_token_windows_eol.py
@@ -0,0 +1,3 @@
if call(foo, [a, b
def bar():
pass