Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PLW1514: extend or change autofix with encoding="utf-8" #12069

Closed
xmo-odoo opened this issue Jun 27, 2024 · 4 comments · Fixed by #12370
Closed

PLW1514: extend or change autofix with encoding="utf-8" #12069

xmo-odoo opened this issue Jun 27, 2024 · 4 comments · Fixed by #12370
Assignees
Labels
fixes Related to suggested fixes for violations good first issue Good for newcomers help wanted Contributions especially welcome rule Implementing or modifying a lint rule

Comments

@xmo-odoo
Copy link

#8883 requested that the open/read calls without an explicit encoding be autofixed to use the locale, however the use of the locale is exactly why PEP 597 / EncodingWarning was implemented: in this day and age, and especially when interacting with internal data or software-created files, following the possibly wonky and misconfigured locale of the host machine is actively detrimental. Even more so when accounting for Windows where the locale is essentially always useless (it generally defaults to a single-byte windows codepage, changing the locale to UTF8 -- codepage 65001 -- is possible but uncommon).

At https://docs.astral.sh/ruff/rules/unspecified-encoding/ it is claimed that this is the PEP 597 recommendation, but it is the opposite of reality:

How to Teach This

For new users

Since EncodingWarning is used to write cross-platform code, there is no need to teach it to new users.

We can just recommend using UTF-8 for text files and using encoding="utf-8" when opening them.

Using encoding=locale.getpreferredencoding(False) is a forward-compatibility measure, and the new encoding="locale" is a shorter and less error prone to specify that, but using the locale is not the expected default.

As a result, the autofixer for PLW1514 should either:

  • autofix to encoding-"utf-8" by default, that is more likely to be correct, expected, and desired than not
  • if supported, provide a second(ary) autofix to use the locale instead
@AlexWaygood AlexWaygood added rule Implementing or modifying a lint rule fixes Related to suggested fixes for violations labels Jun 27, 2024
@Avasam
Copy link
Contributor

Avasam commented Jul 2, 2024

I thought I raised something about this already somewhere, but I can't find it anymore.
Even though I originally suggested the autofix for this rule (#8883) based on official doc. After trying it out in setuptools, I noticed that this autofix doesn't help, because locale.getpreferredencoding(False) will itself lead to a warning !

A better autofix imo is encoding="locale" if sys.version_info >= (3, 10) else None which does successfully keep the same behaviour, removes the runtime warnings and is easy to search7replace to "utf-8" in a manual pass.

As bonus, the preferred encoding could be configurable, so that "locale", "utf-8", or a project's custom needs is used for autofixing.

As a sidenote, whilst I agree a move to utf-8 is preferred, that would be an unsafe fix atm as it would change the behaviour and can definitely break code in real-world scenario.

@xmo-odoo
Copy link
Author

xmo-odoo commented Jul 2, 2024

A better autofix imo is encoding="locale" if sys.version_info >= (3, 10) else None which does successfully keep the same behaviour

The entire point of PEP 597 is that "the same behaviour" is garbage.

As a sidenote, whilst I agree a move to utf-8 is preferred, that would be an unsafe fix atm as it would change the behaviour and can definitely break code in real-world scenario.

The problem is that using the locale is generally broken and only sometimes correct, but can be very hard to discover (for western devs anyway), as developers tend to have non-misconfigured system, limit themselves to ASCII data, and have extended-ASCII-codepage systems (e.g. CP437).

So while the current fixer "preserves the current behaviour" that doesn't mean it's not broken, the developer might just have not thought about the issue, ran the fixer, and now the broken behaviour gets embedded into the source as if it had been an actual decision rather than something that was not considered. That is exactly what happened to a colleague and why I opened this issue in the first place.

As things are, I consider the fixer to be actively detrimental because the only thing it fixes is the warning, not the underlying issue. If "unsafe fixers" are not allowed I do consider this one to be that in both cases, and would vote to see it removed entirely. IO encoding is very much in the realm of developer decision, and the tool pretending it can know that will always fail in some way.

@charliermarsh
Copy link
Member

Unsafe fixes are allowed -- we have a separate opt-in level for them. It seems reasonable to change the fix to encoding-"utf-8" as you suggested above.

@Avasam
Copy link
Contributor

Avasam commented Jul 17, 2024

This rule is worth requiring human intervention anyway, fixing to encoding-"utf-8" would end up simplifying the fixer too (no version check, conditionals, or imports)

@charliermarsh charliermarsh added good first issue Good for newcomers help wanted Contributions especially welcome labels Jul 17, 2024
@charliermarsh charliermarsh self-assigned this Jul 17, 2024
charliermarsh added a commit that referenced this issue Jul 17, 2024
## Summary

This is the _intended_ default that PEP 597 _wants_, but it's not
backwards compatible. The fix is already unsafe, so it's better for us
to recommend the desired and expected behavior.

Closes #12069.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixes Related to suggested fixes for violations good first issue Good for newcomers help wanted Contributions especially welcome rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants