Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use correct range to highlight line continuation error #12016

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

dhruvmanila
Copy link
Member

Summary

This PR fixes the range highlighted for the line continuation error.

Previously, it would highlight an incorrect range:

1 | call(a, b, \\\
  |           ^^ Syntax Error: unexpected character after line continuation character
2 | 
3 | def bar():
  |

And now:

  |
1 | call(a, b, \\\
  |             ^ Syntax Error: unexpected character after line continuation character
2 | 
3 | def bar():
  |

This is implemented by avoiding to update the token range for the Unknown token which is emitted when there's a lexical error. Instead, the push_error helper method will be responsible to update the range to the error location.

This actually becomes a requirement which can be seen in follow-up PRs.

Test Plan

Update and validate the snapshot.

Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Comment on lines +150 to +153
// For `Unknown` token, the `push_error` method updates the current range.
if !matches!(self.current_kind, TokenKind::Unknown) {
self.current_range = self.token_range();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unfortunate that we now have this branch in the very hot next_token function for the very rare case of an unknown token. But I don't see a better way of solving this that doesn't require a lot of repetition on the token range.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. I thought of doing something like what you've suggested (computing the token range at the place where it's emitted) but wanted to see if this actually impacts the performance. It doesn't seem to be which is why I'm fine moving ahead with this.

@dhruvmanila dhruvmanila merged commit 9c1b6ec into main Jun 25, 2024
20 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/unknown-range branch June 25, 2024 08:05
dhruvmanila added a commit that referenced this pull request Jun 25, 2024
## Summary

This PR updates the unterminated string error range to not include the
final newline character.

This is a follow-up to #12016 and required for #12019

This is not done for when the unterminated string goes till the end of
file (not a newline character). The unterminated f-string range is
correct.

### Why is this required for #12019 ?

Because otherwise the token ranges will overlap. For example:
```py
f"{"
f"{foo!r"
```

Here, the re-lexing logic recovers from an unterminated f-string and
thus emitting a `Newline` token for the one at the end of the first
line. But, currently the `Unknown` and the `Newline` token would overlap
because the `Unknown` token (unterminated string literal) range would
include the newline character.

## Test Plan

Update and validate the snapshot.
dhruvmanila added a commit that referenced this pull request Jun 25, 2024
## Summary

This PR updates the parser test infrastructure to validate the token
ranges.

From the code documentation:
```
/// Verifies that:
/// * the ranges are strictly increasing when loop the tokens in insertion order
/// * all ranges are within the length of the source code
```

Follow-up from #12016 and #12017
resolves: #11938

## Test Plan

Make sure that there are no failures.
dhruvmanila added a commit that referenced this pull request Jun 28, 2024
This PR reverts #12016 with a
small change where the error location points to the continuation
character only. Earlier, it would also highlight the whitespace that
came before it.

The motivation for this change is to avoid panic in
#11950. For example:

```py
\)
```

Playground: https://play.ruff.rs/87711071-1b54-45a3-b45a-81a336a1ea61

The range of `Unknown` token and `Rpar` is the same. Once #11950 is
enabled, the indexer would panic. It won't panic in the stable version
because we stop at the first `Unknown` token.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants