-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PLW1514: extend or change autofix with encoding="utf-8"
#12069
Comments
I thought I raised something about this already somewhere, but I can't find it anymore. A better autofix imo is As bonus, the preferred encoding could be configurable, so that "locale", "utf-8", or a project's custom needs is used for autofixing. As a sidenote, whilst I agree a move to utf-8 is preferred, that would be an unsafe fix atm as it would change the behaviour and can definitely break code in real-world scenario. |
The entire point of PEP 597 is that "the same behaviour" is garbage.
The problem is that using the locale is generally broken and only sometimes correct, but can be very hard to discover (for western devs anyway), as developers tend to have non-misconfigured system, limit themselves to ASCII data, and have extended-ASCII-codepage systems (e.g. CP437). So while the current fixer "preserves the current behaviour" that doesn't mean it's not broken, the developer might just have not thought about the issue, ran the fixer, and now the broken behaviour gets embedded into the source as if it had been an actual decision rather than something that was not considered. That is exactly what happened to a colleague and why I opened this issue in the first place. As things are, I consider the fixer to be actively detrimental because the only thing it fixes is the warning, not the underlying issue. If "unsafe fixers" are not allowed I do consider this one to be that in both cases, and would vote to see it removed entirely. IO encoding is very much in the realm of developer decision, and the tool pretending it can know that will always fail in some way. |
Unsafe fixes are allowed -- we have a separate opt-in level for them. It seems reasonable to change the fix to |
This rule is worth requiring human intervention anyway, fixing to |
## Summary This is the _intended_ default that PEP 597 _wants_, but it's not backwards compatible. The fix is already unsafe, so it's better for us to recommend the desired and expected behavior. Closes #12069.
#8883 requested that the
open
/read
calls without an explicit encoding be autofixed to use the locale, however the use of the locale is exactly why PEP 597 / EncodingWarning was implemented: in this day and age, and especially when interacting with internal data or software-created files, following the possibly wonky and misconfigured locale of the host machine is actively detrimental. Even more so when accounting for Windows where the locale is essentially always useless (it generally defaults to a single-byte windows codepage, changing the locale to UTF8 -- codepage 65001 -- is possible but uncommon).At https://docs.astral.sh/ruff/rules/unspecified-encoding/ it is claimed that this is the PEP 597 recommendation, but it is the opposite of reality:
Using
encoding=locale.getpreferredencoding(False)
is a forward-compatibility measure, and the newencoding="locale"
is a shorter and less error prone to specify that, but using the locale is not the expected default.As a result, the autofixer for PLW1514 should either:
encoding-"utf-8"
by default, that is more likely to be correct, expected, and desired than notThe text was updated successfully, but these errors were encountered: