-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Massively reduce EncodingWarning
spam in tests
#4234
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Avasam, I would like to thank you for all the contributions you are doing to setuptools
. You have been very nice to the project and put a lot of effort in tackling very hard and time consuming tasks.
My minor comment about this are the following:
-
When dealing with
.pth
files specifically, the advice we got from CPython maintainers is to omit theencoding
parameter and ignore the warning specifically, instead ofgetpreferredencoding
: .pth files cannot contain folders with utf-8 names python/cpython#77102 (comment).I while ago I left a comment in
editable_wheel
to remind myself about that in the future. -
.py
files should always useUTF-8
right? (Because we only support Python3) -
In the
tests
folder, for everything else that is not.pth
or is explicitly trying to test encoding, we probably can riskutf-8
.
Edit: Unless getting the default encoding is extracted to a helper method, so the
I was thinking that too. Some or most tests could probably still be changed to use "utf-8" directly. But we could also just blanket ignore this warning (if that's doable) given the special situation setuptools always has. And it would achieve the same result of improving test results readability.
I have come across python files with non-utf-8 characters in the wild, specifically this one with Windows 1252 chars that I remember tripping up Ruff: >>> open("./Pythonwin/pywin/test/_dbgscript.py").readlines()
['# -*- coding: latin-1 -*-\n', '#\n', '# Script for testing pywin.debugger & interactive exec from test_pywin\n', '\n', '# Umlauts for encoding test: áéúäöü\n', '\n', 'aa = 11\n', 'aa = 22\n', '\n', '\n', 'class CC:\n', ' cc = 44\n', '\n', '\n', 'def ff(bb=55):\n', ' global aa\n', ' aa = 77\n', ' return aa + bb\n', '\n', '\n', 'ff()\n', 'aa = 33\n']
>>> open("./Pythonwin/pywin/test/_dbgscript.py", encoding="utf-8").readlines()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python39\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 129: invalid continuation byte And pywin32 also has these files with a different top-of-the-file encoding declaration: Granted these are Python 2 & Internet Explorer era files, and the one with non-UTF-8 chars is in a test. But apparently they exist? |
EncodingWarning: 'encoding' argument not specified
spam in testsEncodingWarning
spam in tests for Python 3.8 & 3.9
41d7d29
to
a460f2c
Compare
544aed8
to
ae88944
Compare
ae88944
to
b96c050
Compare
All tests pass (except coverage), and you can see using the diffcov test (because they only appear in failing test) that there's 4 EncodingWarning left down from ~124 ! (2 of which are from pip and I believe are happening outside pytest, ref.: pypa/pip#12070) That doesn't mean this is done though, I still need to:
Finally, once I have a full complete solution, this could likely be split up in smaller PRs. For example to start with all changes in tests that certainly won't affect end-users. |
EncodingWarning
spam in tests for Python 3.8 & 3.9EncodingWarning
spam in tests
fca4dc1
to
acd2a0b
Compare
35ac604
to
2bb129d
Compare
2bb129d
to
74c586f
Compare
74c586f
to
015135b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for doing this work. I know it's a lot.
I'd like to explore adopting the future default UTF-8 if possible.
setuptools/compat/encoding.py
Outdated
from typing import Optional | ||
|
||
|
||
locale_encoding = "locale" if sys.version_info >= (3, 10) else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my understanding that encoding='locale'
is essentially locking in the legacy behavior, which is UTF-8 on most systems, but cp1252 or similar on Windows. I believe the intention of the EncodingWarnings is to eventually default to UTF-8 regardless of the platform or environment. It would be my preference too if Setuptools could adopt UTF-8 and avoid the platform variance or at the very least minimize the variance to places where it's absolutely necessary.
When I made similar changes to distutils (pypa/distutils#232), I simply hard-coded "utf-8".
I still need to absorb the discussion, which I haven't the time right now, but I wanted to get this review in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding of encoding='locale'
seems to align with mine. As a point of precision for this PR's goal: It's not to be a long-term fix that is forward-compatible with Python 3.15 (current ETA of UTF-8 as default), but rather just to stop the warning spam whilst keeping the current behaviour + backwards compatibility (no possible breaking change) and keeping it easy to find where the encoding is set explicitly for an eventual update to UTF-8.
I'd like to explore adopting the future default UTF-8 if possible.
It would be my preference too if Setuptools could adopt UTF-8 and avoid the platform variance or at the very least minimize the variance to places where it's absolutely necessary.
As far as going full utf-8 goes, @abravalheri suggested starting with tests, where we couldn't directly affect any user of setuptools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the goal is to suppress the warnings, I would simply suppress them in the pytest.ini rather than coding in the less desirable behavior. At the very least, if this code is undesirable/transitional, I'd like to see a comment (probably linked to an open issue) indicating that there's anticipated work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discovered pretty late into doing this how the pytest.ini
file could be used to suppress these warnings. If you're fine suppressing them there, with an appropriate comment, to make current failed tests more readable, then I'll try to make it work that way instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this #4255 ?
If so, I'll immediately close this PR in favor of that one.
In #4261, I am testing out |
As mentioned in #4255 (comment) , the vast majority of the spam has been greatly reduced. And about half of what's left should be fixed upstream (and can be ignored by that PR). Let's try to use UTF-8 where applicable for non-test changes. |
Summary of changes
Explicitly addencoding=locale.getpreferredencoding(False)
to files opened as string. Starting Python 3.10 this can becomeencoding="locale"
Done by runningruff check . --select=PLW1514 --fix --unsafe-fixes
(no manual code change)I also added the check to Ruff configEdit: This was changed by a constant that's
"locale"
when supported.PLW1514
also wasn't added to the Ruff config because it requires the preview flag which itself causes new lint errors unrelated to this PRI didn't go for "utf-8" because it could be a breaking change for some projects on Windows.
Works toward #3810 (comment) (I don't think it closes, idk if there's more warnings, probably from vendored packages)
Pull Request Checklist
newsfragments/
.(See documentation for details)