Massively reduce `EncodingWarning` spam in tests #4234

Avasam · 2024-02-23T02:29:33Z

Summary of changes

~~Explicitly add encoding=locale.getpreferredencoding(False) to files opened as string. Starting Python 3.10 this can become encoding="locale"~~
~~Done by running ruff check . --select=PLW1514 --fix --unsafe-fixes (no manual code change)~~
~~I also added the check to Ruff config~~
Edit: This was changed by a constant that's "locale" when supported. PLW1514 also wasn't added to the Ruff config because it requires the preview flag which itself causes new lint errors unrelated to this PR

I didn't go for "utf-8" because it could be a breaking change for some projects on Windows.

Works toward #3810 (comment) (I don't think it closes, idk if there's more warnings, probably from vendored packages)

Pull Request Checklist

Changes have tests (all existing tests should pass, but with a lot less warnings now)
News fragment added in newsfragments/.
(See documentation for details)

abravalheri

Hi @Avasam, I would like to thank you for all the contributions you are doing to setuptools. You have been very nice to the project and put a lot of effort in tackling very hard and time consuming tasks.

My minor comment about this are the following:

When dealing with .pth files specifically, the advice we got from CPython maintainers is to omit the encoding parameter and ignore the warning specifically, instead of getpreferredencoding: .pth files cannot contain folders with utf-8 names python/cpython#77102 (comment).

I while ago I left a comment in editable_wheel to remind myself about that in the future.
.py files should always use UTF-8 right? (Because we only support Python3)
In the tests folder, for everything else that is not .pth or is explicitly trying to test encoding, we probably can risk utf-8.

Avasam · 2024-02-23T23:07:13Z

Alright so doing this made me notice another issue that leads me to believe we can't completely hide these warnings in a non-breaking way to stop to spam in tests (until Python 3.9 support is dropped so encoding="locale" can be used everywhere it should):

In diffcov (which runs Python 3.12), the warning is simply changed to EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. But since I'm only using getpreferredencoding for open, may as well use encoding="locale" at that point. (both available at Python 3.11). Which defeats the purpose of my attempted changes here.

Edit: Unless getting the default encoding is extracted to a helper method, so the getpreferredencoding warning appears only a few times.

In the tests folder, for everything else that is not .pth or is explicitly trying to test encoding, we probably can risk utf-8.

I was thinking that too. Some or most tests could probably still be changed to use "utf-8" directly.

But we could also just blanket ignore this warning (if that's doable) given the special situation setuptools always has. And it would achieve the same result of improving test results readability.

.py files should always use UTF-8 right? (Because we only support Python3)

I have come across python files with non-utf-8 characters in the wild, specifically this one with Windows 1252 chars that I remember tripping up Ruff:
https://github.com/mhammond/pywin32/blob/main/Pythonwin/pywin/test/_dbgscript.py

>>> open("./Pythonwin/pywin/test/_dbgscript.py").readlines()
['# -*- coding: latin-1 -*-\n', '#\n', '# Script for testing pywin.debugger & interactive exec from test_pywin\n', '\n', '# Umlauts for encoding test: áéúäöü\n', '\n', 'aa = 11\n', 'aa = 22\n', '\n', '\n', 'class CC:\n', '    cc = 44\n', '\n', '\n', 'def ff(bb=55):\n', '    global aa\n', '    aa = 77\n', '    return aa + bb\n', '\n', '\n', 'ff()\n', 'aa = 33\n']
>>> open("./Pythonwin/pywin/test/_dbgscript.py", encoding="utf-8").readlines()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 129: invalid continuation byte

And pywin32 also has these files with a different top-of-the-file encoding declaration:
https://github.com/mhammond/pywin32/blob/main/com/win32com/demos/iebutton.py
https://github.com/mhammond/pywin32/blob/main/com/win32com/demos/ietoolbar.py

Granted these are Python 2 & Internet Explorer era files, and the one with non-UTF-8 chars is in a test. But apparently they exist?

Avasam · 2024-02-24T06:12:18Z

All tests pass (except coverage), and you can see using the diffcov test (because they only appear in failing test) that there's 4 EncodingWarning left down from ~124 ! (2 of which are from pip and I believe are happening outside pytest, ref.: pypa/pip#12070)

That doesn't mean this is done though, I still need to:

Validate with you, the maintainers, whether the name and location of the new utilities is good
~~2. Manually search the code for .pth to try to set the encoding explicitly to None (unless you tell me that passing tests is enough)~~
~~3. Should I use encoding_for_open_for_mode everywhere where the mode argument to open is dynamic and could potentially be b (bytes read/write)? Rn I only added it where tests fail.~~

Finally, once I have a full complete solution, this could likely be split up in smaller PRs. For example to start with all changes in tests that certainly won't affect end-users.

jaraco

Thanks so much for doing this work. I know it's a lot.

I'd like to explore adopting the future default UTF-8 if possible.

pytest.ini

jaraco · 2024-03-03T11:54:01Z

setuptools/compat/encoding.py

+from typing import Optional
+
+
+locale_encoding = "locale" if sys.version_info >= (3, 10) else None


It's my understanding that encoding='locale' is essentially locking in the legacy behavior, which is UTF-8 on most systems, but cp1252 or similar on Windows. I believe the intention of the EncodingWarnings is to eventually default to UTF-8 regardless of the platform or environment. It would be my preference too if Setuptools could adopt UTF-8 and avoid the platform variance or at the very least minimize the variance to places where it's absolutely necessary.
When I made similar changes to distutils (pypa/distutils#232), I simply hard-coded "utf-8".
I still need to absorb the discussion, which I haven't the time right now, but I wanted to get this review in.

Your understanding of encoding='locale' seems to align with mine. As a point of precision for this PR's goal: It's not to be a long-term fix that is forward-compatible with Python 3.15 (current ETA of UTF-8 as default), but rather just to stop the warning spam whilst keeping the current behaviour + backwards compatibility (no possible breaking change) and keeping it easy to find where the encoding is set explicitly for an eventual update to UTF-8.

I'd like to explore adopting the future default UTF-8 if possible.
It would be my preference too if Setuptools could adopt UTF-8 and avoid the platform variance or at the very least minimize the variance to places where it's absolutely necessary.

As far as going full utf-8 goes, @abravalheri suggested starting with tests, where we couldn't directly affect any user of setuptools.

If the goal is to suppress the warnings, I would simply suppress them in the pytest.ini rather than coding in the less desirable behavior. At the very least, if this code is undesirable/transitional, I'd like to see a comment (probably linked to an open issue) indicating that there's anticipated work.

I discovered pretty late into doing this how the pytest.ini file could be used to suppress these warnings. If you're fine suppressing them there, with an appropriate comment, to make current failed tests more readable, then I'll try to make it work that way instead.

Something like this #4255 ?
If so, I'll immediately close this PR in favor of that one.

…ngWarning

abravalheri · 2024-03-06T20:02:01Z

In #4261, I am testing out "utf-8" for setuptools/tests discussed in #4234 (review).

…ngWarning

Avasam · 2024-03-09T00:11:33Z

As mentioned in #4255 (comment) , the vast majority of the spam has been greatly reduced. And about half of what's left should be fixed upstream (and can be ignored by that PR). Let's try to use UTF-8 where applicable for non-test changes.

abravalheri reviewed Feb 23, 2024

View reviewed changes

Avasam changed the title ~~Fix EncodingWarning: 'encoding' argument not specified spam in tests~~ Fix EncodingWarning spam in tests for Python 3.8 & 3.9 Feb 23, 2024

Avasam force-pushed the EncodingWarning branch 5 times, most recently from 41d7d29 to a460f2c Compare February 24, 2024 04:36

Avasam added 3 commits February 23, 2024 23:43

Run ruff check . --select=PLW1514 --fix --unsafe-fixes --preview

e9a428c

Fix test asserted arguments

b8dd8b0

centralize encoding_for_open

e8552eb

Avasam force-pushed the EncodingWarning branch 4 times, most recently from 544aed8 to ae88944 Compare February 24, 2024 05:43

Avasam marked this pull request as draft February 24, 2024 05:55

Avasam force-pushed the EncodingWarning branch from ae88944 to b96c050 Compare February 24, 2024 05:58

Avasam changed the title ~~Fix EncodingWarning spam in tests for Python 3.8 & 3.9~~ Massively reduce EncodingWarning spam in tests Feb 24, 2024

Avasam force-pushed the EncodingWarning branch 6 times, most recently from fca4dc1 to acd2a0b Compare February 24, 2024 18:18

Avasam added 2 commits February 24, 2024 13:26

Resolve more EncodingWarning not caught by Ruff

b52a55c

Check for mode in some methods

b5aef75

Avasam force-pushed the EncodingWarning branch 3 times, most recently from 35ac604 to 2bb129d Compare February 24, 2024 19:06

Avasam force-pushed the EncodingWarning branch from 2bb129d to 74c586f Compare February 24, 2024 19:42

Restricting EncodingWarning ignores in pytest.ini

015135b

Avasam force-pushed the EncodingWarning branch from 74c586f to 015135b Compare February 24, 2024 19:59

Avasam added 3 commits February 25, 2024 13:59

Simplify changes

f7ff8f9

Use locale_encoding_for_mode where mode could be binary + some cleanup

29b1867

Merge branch 'main' into EncodingWarning

8e320ee

Avasam marked this pull request as ready for review February 25, 2024 19:10

jaraco requested changes Mar 3, 2024

View reviewed changes

Avasam added 2 commits March 3, 2024 17:04

Merge branch 'main' of https://github.com/pypa/setuptools into Encodi…

c2af5f8

…ngWarning

Move down pypa/distutils suppressions and add PR number

8a99cfb

Avasam mentioned this pull request Mar 4, 2024

Turn EncodingWarning into errors and cleanup pytest.ini #4255

Merged

2 tasks

abravalheri mentioned this pull request Mar 6, 2024

Attempt on improving encoding in setuptools.tests #4261

Merged

2 tasks

Avasam marked this pull request as draft March 7, 2024 17:23

Avasam added 4 commits March 7, 2024 12:25

Merge branch 'main' of https://github.com/pypa/setuptools into Encodi…

0833a6c

…ngWarning

Merge branch 'main' of https://github.com/pypa/setuptools into Encodi…

d0720d0

…ngWarning

use compat.py39

b1a79d4

Reduce empty lines changes

538bfcc

Avasam closed this Mar 9, 2024

Avasam deleted the EncodingWarning branch May 9, 2024 20:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Massively reduce `EncodingWarning` spam in tests #4234

Massively reduce `EncodingWarning` spam in tests #4234

Avasam commented Feb 23, 2024 •

edited

Loading

abravalheri left a comment •

edited

Loading

Avasam commented Feb 23, 2024 •

edited

Loading

Avasam commented Feb 24, 2024 •

edited

Loading

jaraco left a comment

jaraco Mar 3, 2024

Avasam Mar 3, 2024 •

edited

Loading

jaraco Mar 4, 2024

Avasam Mar 4, 2024

Avasam Mar 4, 2024

abravalheri commented Mar 6, 2024

Avasam commented Mar 9, 2024

		from typing import Optional


		locale_encoding = "locale" if sys.version_info >= (3, 10) else None

Massively reduce EncodingWarning spam in tests #4234

Massively reduce EncodingWarning spam in tests #4234

Conversation

Avasam commented Feb 23, 2024 • edited Loading

Summary of changes

Pull Request Checklist

abravalheri left a comment • edited Loading

Choose a reason for hiding this comment

Avasam commented Feb 23, 2024 • edited Loading

Avasam commented Feb 24, 2024 • edited Loading

jaraco left a comment

Choose a reason for hiding this comment

jaraco Mar 3, 2024

Choose a reason for hiding this comment

Avasam Mar 3, 2024 • edited Loading

Choose a reason for hiding this comment

jaraco Mar 4, 2024

Choose a reason for hiding this comment

Avasam Mar 4, 2024

Choose a reason for hiding this comment

Avasam Mar 4, 2024

Choose a reason for hiding this comment

abravalheri commented Mar 6, 2024

Avasam commented Mar 9, 2024

Massively reduce `EncodingWarning` spam in tests #4234

Massively reduce `EncodingWarning` spam in tests #4234

Avasam commented Feb 23, 2024 •

edited

Loading

abravalheri left a comment •

edited

Loading

Avasam commented Feb 23, 2024 •

edited

Loading

Avasam commented Feb 24, 2024 •

edited

Loading

Avasam Mar 3, 2024 •

edited

Loading