Allow token-based rules to work on source code with syntax errors #11915

Closed
dhruvmanila opened this issue Jun 18, 2024 · 0 comments · Fixed by #11950
Currently, the rules that work with tokens don't emit diagnostics past the location of the first syntax error. For example:

```python
foo;

"hello world

bar;
```

This will only raise useless-semicolon (E703) for `foo` and not for `bar`, because there's an unterminated string literal between them.

Playground: https://play.ruff.rs/4c17a92c-0189-4b98-b961-27b04db14599

The task here is to remove this limit and allow token-based rules to check all of the tokens. This raises a question: now that the parser can recover from an unclosed parenthesis (#11845), how do we make sure the rule logic knows about this and has correct information about the nesting level? Should we reduce the nesting level when we encounter a `Newline` token?
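
For illustration, here's a minimal sketch of that idea, assuming a simplified token stream; the `TokenKind` variants and the `nesting_after` helper are hypothetical and not Ruff's actual API:

```rust
// Hypothetical sketch (not Ruff's actual implementation): track the bracket
// nesting level over a simplified token stream, resetting it on `Newline`,
// since a newline token only ends a logical line outside of brackets.
enum TokenKind {
    Lpar,    // `(`
    Rpar,    // `)`
    Newline, // end of a logical line
    Other,   // anything else
}

fn nesting_after(tokens: &[TokenKind]) -> u32 {
    let mut nesting = 0u32;
    for token in tokens {
        match token {
            TokenKind::Lpar => nesting += 1,
            TokenKind::Rpar => nesting = nesting.saturating_sub(1),
            // Seeing a newline while `nesting > 0` means the parser recovered
            // from an unclosed `(`; reset so the following lines are checked
            // with the correct context.
            TokenKind::Newline => nesting = 0,
            TokenKind::Other => {}
        }
    }
    nesting
}

fn main() {
    // Roughly the tokens of `foo = (1, 2` with the `)` missing, followed by
    // the newline emitted once the logical line ends.
    let tokens = [
        TokenKind::Other,   // foo
        TokenKind::Other,   // =
        TokenKind::Lpar,    // (
        TokenKind::Other,   // 1, 2
        TokenKind::Newline, // line ends; `)` was never seen
    ];
    assert_eq!(nesting_after(&tokens), 0);
}
```

With a reset like this, token-based rules that consult the nesting level can keep producing sensible diagnostics on the lines that follow the error.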

We also need to make sure that this doesn't panic in any scenario (valid or invalid source code). This can be verified with extensive fuzzing.

dhruvmanila added the linter (Related to the linter) label Jun 18, 2024
dhruvmanila self-assigned this Jun 18, 2024
dhruvmanila added a commit that referenced this issue Jul 2, 2024
## Summary

This PR updates the linter, specifically the token-based rules, to work
on the tokens that come after a syntax error.

For context, the token-based rules previously only diagnosed tokens up to the first lexical error. This PR adds error resilience by introducing a `TokenIterWithContext` which updates the `nesting` level and tries to keep it in sync with what the lexer sees. This isn't 100% accurate: if the parser recovered from an unclosed parenthesis in the middle of a line, the context won't reduce the nesting level until it sees the newline token at the end of that line (a rough sketch of this idea appears after the test plan below).

resolves: #11915

## Test Plan

* Add test cases for a number of rules that are affected by this change.
* Run the fuzzer for an extended period and fix any other bugs it surfaces.
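
For a rough mental model of the `TokenIterWithContext` described above, here is a sketch of an iterator adapter that carries a nesting level alongside each token. It reuses the illustrative `TokenKind` enum from the earlier sketch, and the names and behavior are assumptions rather than Ruff's real implementation:

```rust
// Rough sketch: wrap a token iterator and expose the nesting level that was
// in effect when each token was produced (names are illustrative only).
struct TokenIterWithContext<I: Iterator<Item = TokenKind>> {
    tokens: I,
    nesting: u32,
}

impl<I: Iterator<Item = TokenKind>> TokenIterWithContext<I> {
    fn new(tokens: I) -> Self {
        Self { tokens, nesting: 0 }
    }
}

impl<I: Iterator<Item = TokenKind>> Iterator for TokenIterWithContext<I> {
    // Yield each token together with the nesting level before it.
    type Item = (TokenKind, u32);

    fn next(&mut self) -> Option<Self::Item> {
        let token = self.tokens.next()?;
        let level = self.nesting;
        match token {
            TokenKind::Lpar => self.nesting += 1,
            TokenKind::Rpar => self.nesting = self.nesting.saturating_sub(1),
            // If the parser recovered from an unclosed `(` mid-line, the level
            // stays elevated until the newline that ends the logical line.
            TokenKind::Newline => self.nesting = 0,
            TokenKind::Other => {}
        }
        Some((token, level))
    }
}
```

A rule can then iterate over `TokenIterWithContext::new(tokens)` and skip or adjust its checks based on the reported level, which is the kind of context the summary above refers to.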