Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruff applies auto-fix in files with syntax errors #11455

Closed
Dobatymo opened this issue May 17, 2024 · 13 comments · Fixed by #12134
Closed

Ruff applies auto-fix in files with syntax errors #11455

Dobatymo opened this issue May 17, 2024 · 13 comments · Fixed by #12134
Labels
linter Related to the linter

Comments

@Dobatymo
Copy link

When running ruff format ruff will not modify files if it encounters a SyntaxError (which is expected I think). However running ruff check --fix will encounter the SyntaxError as well, but will continue to "fix" and modify the files. I don't know if this is expected, but I find that behaviour surprising.

  • List of keywords you searched for before creating this issue: fix, syntaxerror

Code snippet:

typedef struct _STORAGE_DEVICE_NUMBER_EX {
    ("PartitionNumber", ULONG),
} STORAGE_DEVICE_NUMBER_EX, *PSTORAGE_DEVICE_NUMBER_EX;

ruff check --fix with remove the ; at the end of the line (which I didn't expect since the file is not actually Python.

Version: ruff 0.4.4 (3e8878a 2024-05-09)

@dhruvmanila
Copy link
Member

Can you provide some more details? How are you running Ruff? Is it from the command-line or is it via an editor? If it is in an editor context, could it be that it's another extension which is removing the semicolon?

@dhruvmanila dhruvmanila added the needs-info More information is needed from the issue author label May 17, 2024
@Dobatymo
Copy link
Author

Dobatymo commented May 17, 2024

Just using the normal command line. Windows 11. I made sure it must be ruff which modifies the file.

error: Failed to parse syntaxerror.py:1:9: Simple statements must be separated by newlines or semicolons
syntaxerror.py:1:9: E999 SyntaxError: Simple statements must be separated by newlines or semicolons
Found 2 errors (1 fixed, 1 remaining).

That's the output.

@Dobatymo Dobatymo changed the title ruff check --fix modifies files with syntaxerror ruff check --fix modifies files with SyntaxError May 17, 2024
@dhruvmanila
Copy link
Member

Ruff uses the file extension to determine whether it's a Python file or not as otherwise it's difficult to reliably determine whether a file contains Python source code or not. It seems that the file name is syntaxerror.py which has the .py extension.

Can you provide us any context as to why do you have a non-Python source code in a Python file?

@Dobatymo
Copy link
Author

I am working on simple C header to Python ctypes translator. And I was simply using ruff format to check if the files I generated are syntactically correct. I know I could have just used the python builtin parser, but having a nicely formatted file when the translation produced a syntactically valid file was a nice side effect.
Then for further investigation if the generated files are correct, I can ruff check --fix instead and was surprised to see that the file was modified even when it wasn't syntactially valid (something ruff format didn't do). So if this is expected behaviour this should be documented, but in my opinion it should not do any modifications (just like ruff format).

@zanieb
Copy link
Member

zanieb commented May 17, 2024

I think we'll move more towards Ruff working on files with syntax errors. I'd recommend either excluding these files from lint checks or selecting E999 to only check for syntax errors.

@charliermarsh
Copy link
Member

I honestly thought we didn't apply fixes when a file contains a syntax error (but looks like I'm wrong). I would be open to changing the behavior, not sure what @dhruvmanila thinks.

@dhruvmanila
Copy link
Member

I think this is the side effect of non-AST based rules which work with tokens, logical lines, etc. And, that Ruff checks all of the tokens up to the first error token. In your case, the code produces a valid token stream and so it checks for violations which uses the token stream. Ruff doesn't know that this isn't a syntactically valid code unless it's processed by the parser.

Then for further investigation if the generated files are correct, I can ruff check --fix instead and was surprised to see that the file was modified even when it wasn't syntactially valid (something ruff format didn't do).

The reason ruff format doesn't do this is because it works with the AST and the code you mentioned is invalid syntactically.

I honestly thought we didn't apply fixes when a file contains a syntax error (but looks like I'm wrong).

This isn't exactly possible today because unless the parser processes the token stream, Ruff can't know whether it contains a syntax error. And, the code provided by the author produces a valid token stream. This will change during this week when we combine the lexing and parsing step and then Ruff can get this knowledge.

I'm not opposed to this idea although this does mean that we'd stop fixing code like the following:

x;

# unterminated f-string
f"hello

In the future, we do want to make Ruff capable of applying a fix even if the source code contains syntax error. This will be done after we expand Ruff's capabilities to allow diagnosing issues in a file containing syntax errors.

td;dr I'm more in favor of documenting this behavior rather than disallowing to generate fixes.

@Dobatymo
Copy link
Author

Better documentation sounds good!

@dhruvmanila dhruvmanila added documentation Improvements or additions to documentation and removed needs-info More information is needed from the issue author labels May 20, 2024
@dhruvmanila
Copy link
Member

This seems like good place for the content to be in: https://docs.astral.sh/ruff/linter/#fixes

@dhruvmanila
Copy link
Member

I think we might have to disable fixes for source code with syntax errors with #11950 because otherwise Ruff could panic or go into infinite auto-fix loop where first auto-fix creates a need for the second auto-fix which then creates a need for the first auto-fix again. This infinite loop can be created quite easily when there's a syntax error. For example, #12094 or between E226 and E201 for (/.

@MichaReiser
Copy link
Member

Disabling fixes for token-based rules makse sense to me, unless we have a way to know whether a token range overlaps with a parser error range.

Long term, I think fixing files with syntax errors would be nice but only if the specific code doesn't overlap with a syntax error.

@dhruvmanila
Copy link
Member

Disabling fixes for token-based rules makse sense to me, unless we have a way to know whether a token range overlaps with a parser error range.

This seems like a good idea although I'm not sure if that could cover all cases. I guess it might be more useful to look at the logical line for token-based rules and the enclosing AST node for AST-based rules.

@dhruvmanila
Copy link
Member

Disabling fixes for token-based rules makse sense to me

I was more thinking of disabling fixes for all rules and not just token-based although it might be ok to keep the fix for line-based rules on.

@dhruvmanila dhruvmanila changed the title ruff check --fix modifies files with SyntaxError Ruff applies auto-fix in files with syntax errors Jul 1, 2024
@dhruvmanila dhruvmanila added linter Related to the linter and removed documentation Improvements or additions to documentation labels Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linter Related to the linter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants