Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-92613: Deprecate other uuencode functionality per PEP 594 & document as such #92758

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

CAM-Gerlach
Copy link
Member

@CAM-Gerlach CAM-Gerlach commented May 13, 2022

Per #92613 , for the following functionality deprecated by PEP 594 (PEP-594) but not formally implemented due to an oversight:

  • Low-level interfaces binascii.a2b_uu and binascii.b2a_uu for decoding and encoding uuencode data
  • The uu_codec binary transform in the codecs module, implementing uuencode as a Python codec
  • Support for decoding non-MIME uuencode payloads with the get_payload(decode=True) method of the legacy email.message.Message (compat32) API

This PR adds missing:

  • DeprecationWarnings in the Python/C code
  • Warning checks and handling in the tests
  • Deprecation notices in the docs, and
  • A descriptive entry in the What's New for 3.11

Fixes #92613

@CAM-Gerlach CAM-Gerlach requested a review from a team as a code owner May 13, 2022 03:20
@CAM-Gerlach CAM-Gerlach added docs Documentation in the Doc dir stdlib Python modules in the Lib dir 3.11 only security fixes needs backport to 3.11 only security fixes labels May 13, 2022
@CAM-Gerlach CAM-Gerlach requested a review from brettcannon May 13, 2022 05:45
@CAM-Gerlach CAM-Gerlach requested review from warsaw and pablogsal May 13, 2022 21:56
@CAM-Gerlach
Copy link
Member Author

@warsaw , I've tagged you on behalf of the email team for review of this PR's approach to handling the deprecation of uuencode in the legacy email.message.Message compat32 API, and @pablogsal , I've tagged you as release manager for 3.11, to review adding necessary deprecation messages to a couple final uuencode-related items that were explicitly slated by PEP 594 to be deprecated in 3.11 and to be removed in 3.13, and not yet formally documented as such (both per the recommendation of @brettcannon).

@CAM-Gerlach
Copy link
Member Author

@warsaw @python/email-team Any chance you could take a look at this?

To provide some more detail about the potential for a backward-compat impact of the email change, it only affects one specific method of the legacy email.message.Message object using the old compat32 policy, only when the non-default decode=True argument is passed (which was dropped in the modern email.message.EmailMessage API), and then only when the message body contains an attachment encoded with the obsolete, non-MIME uuencode codec. In this case, a clear DeprecationWarning is raised, so if this is an actual issue affecting modern usage, it can either be fixed, or we'll hear something about it over the next two and a half years and decide whether the deprecation should be postponed or reverted.

Once the actual removal occurs, get_payload would either pass the payload back without uudecoding, as it currently does for other unrecognized, obsolete or non-standard encodings, or could raise a warning or error if a uuencode content type was detected, for additional visibility.

Doing a bit more research, a grep.app search for .get_payload(decode=True) on grep.app returns 400 total hits. However, examining the first 40 results by hand, 24 were vendered/downstream copies of the email module and its test suite, 8 were tutorials or example code and 1 was Python 2-only, leaving only 7 that could theoretically be affected (extrapolated out to ≈70 total).

Furthermore, of those, nearly all are appear to be oriented toward processing new email, which is very unlikely to be receiving attachments encoded in a 1980s-vintage encoding—and in case they do, the uuencoded text will still be preserved, so they can just paste it into a uuencode decode website or program, which IIUC is exactly what people did back in the days when uuencode was still relevant; before MIME replaced it, there was no such thing as "attachments", and embedding uuencode data into MIME messages has always been non-standard.

@CAM-Gerlach CAM-Gerlach force-pushed the deprecate-uu-codec branch 2 times, most recently from 6a45435 to 9cc18b3 Compare June 25, 2022 03:01
@warsaw
Copy link
Member

warsaw commented Jun 26, 2022

Furthermore, of those, nearly all are appear to be oriented toward processing new email, which is very unlikely to be receiving attachments encoded in a 1980s-vintage encoding

This invokes the use case I'm worried about. Processing new mail as you say shouldn't use this old encoding, but processing old emails in archives could be broken by removing this. This could happen in e.g. Mailman if you had really old archives, you wouldn't be able to parse their attachments. OTOH, maybe I'm worrying too much about that. @maxking or @msapiro can you chime in?

@warsaw
Copy link
Member

warsaw commented Jun 26, 2022

The other thought is whether it's worth it to inline the uu functions in the email package, in the same way as was done for determining the subtype for audio and image data. It might be too much baggage to keep carrying around, even if made private.

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Jun 26, 2022

Ah. good point—in my grep.app searching, the great majority of the use cases I found were processing new (ish) email, as opposed to email archives where this could theoretically be a problem, but I didn't think about Mailman:

  • In a search for get_payload in Mailman Core, only one instance used decode=True, and that was in the test suite for an example (standard MIME) message, which would not be affected. Otherwise, all instances did not pass decode=True so the affected code path could not be invoked.
  • HyperKitty had three instances, none of which used decode=True
  • django-mailman3 had one hit for get_payload, which did not use decode=True.
  • Postorius, MailmanClient, Mailman bundler and mailman-hyperkitty had no usages, as expected. I didn't see anything else that looked particularly likely or critical.

So, unless there's something I missed, it looks like Mailman is not affected—since it is a very specific code path that needs to be hit to even have the possibility to trigger the deprecated behavior. If there are other users, we should hopefully discover them during the >two year period when DeprecationWarnings will be issued. However, if you feel its best to inline the functionality (which we can always do at any point prior to the final removal, if it has significant real-world impact, I'd be happy to help with that.

@msapiro
Copy link
Contributor

msapiro commented Jun 26, 2022

@warsaw wrote:

This could happen in e.g. Mailman if you had really old archives, you wouldn't be able to parse their attachments. OTOH, maybe I'm worrying too much about that. @maxking or @msapiro can you chime in?

It seems to me that the only place where this might be an issue is in hyperkitty_import of an old mbox, but hyprerkitty_import does not use a compat32 policy. It retrieves messages as raw bytes from the mbox and parses them into a message object with message_from_bytes(msg_raw, policy=policy.default). I can't think of anywhere else where we're dealing with old mboxes that might have non-MIME uuencoded data.

@brettcannon
Copy link
Member

So are we chopping this out in 3.11 (if we still can), taking it out in 3.12, or leaving it behind?

@CAM-Gerlach CAM-Gerlach removed 3.11 only security fixes needs backport to 3.11 only security fixes labels Aug 5, 2022
@CAM-Gerlach CAM-Gerlach removed the request for review from pablogsal August 5, 2022 23:52
@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Aug 5, 2022

I assume its too late for 3.11 now that rc1 is almost upon us (especially given how things have gone), so I imagine a push to 3.12 (with removal in 3.14) is the only practical solution.

Per my discussions with @warsaw here and on Discord as well as the results of several rounds of analysis of existing code bases, the backward compat impact appears to be very minimal at best, and the deprecation period would give us time to inline the relevant code or revert specific parts of the change in case it was a significant enough real-world issue.

Assuming we agree on moving forward, I can bump it to the 3.12 What's New instead and update the various bits that reference the versions accordingly.

@brettcannon
Copy link
Member

@CAM-Gerlach yep, it's too late for 3.11.

@CAM-Gerlach
Copy link
Member Author

I can bump it to the 3.12 What's New instead and update the various bits that reference the versions accordingly.

I've pushed a commit updating the various deprecating warnings, documentation messages, NEWS item and What's New content to target 3.12 as the deprecation version and 3.14 as the removal version.

@CAM-Gerlach
Copy link
Member Author

I re-rebased this again to fix the merge conflict and update to the latest main (since I'd already rebased it before for the same reason, and there was no outstanding review activity since then, so it wasn't going to do any further harm—per this repo's standard workflow, I now avoid rebasing and rebase-updates on PRs once they have been reviewed, absent special circumstances, but the previous rebase was before I'd adapted to this).

@brettcannon @warsaw would you like to review this, so we can start evaluating any potential impact of these deprecations as early as practical (if needed, I can undeprecate or revise the deprecation strategy here based on feedback, either before or after the 3.12 release). Thanks!

Lib/email/message.py Outdated Show resolved Hide resolved
@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Nov 8, 2022

Sorry for the force-push; in the last commit I accidentally committed several completely unrelated local test scripts that were untracked in my working tree, so I had to prune those out of the history.

@brettcannon
Copy link
Member

I consider this primarily an email team call at this point.

@ericvsmith
Copy link
Member

See issue #100308 for a non-email use of uuencoding. This usage is for a newly generated file, and is a non-email example.

@CAM-Gerlach
Copy link
Member Author

While it doesn't necessarily capture proprietary-only uses, a grep.app search for the binascii (a2b|b2a)_uu functions which searches all of GitHub only found 76 hits, of which all but ≈12 were just vendored copies of the stdlib, and at least half of the remaining were in tests, of stdlib or third-party code (and one was for EDGAR).

Also, for reference, a grep.app search for the already-deprecated import uu, there are 46 hits for import uu (plus 1 for from uu), of which only ≈7 are not vendored copies of the stdlib, and most are test code (and 1 for EDGAR). In general, these numbers are much lower than the 100s+ of uses of other modules deprecated by PEP 594). By comparison, there are around 25 000 + 6000 hits for import base64 + from base64.

If the functionality is still useful, you could consider maintaining a PyPI package with it, which would give you much more flexibility to fix issues like the one you mentioned, and deploy fixes and enhancements much more quickly than with a whole new Python feature version. I'd be willing to help advise you on what code would need to be moved, and how you might most easily structure it, as well as suggest migration advice so that existing use cases can move to it seamlessly with minimal or no changes.

@arhadthedev
Copy link
Member

.. deprecated-removed:: 3.12 3.14

We need to decide on this PR and its parent issue because 3.12 release is approaching.

@vstinner
Copy link
Member

All modules scheduled for removal by PEP 594 have been removed in Python 3.12 and Python 3.13. The last batch was done in #104773 especially the uu module.

@CAM-Gerlach: Would you mind to solve the conflicts on your PR?

@vstinner
Copy link
Member

We need to decide on this PR and its parent issue because 3.12 release is approaching.

The main branch is now Python 3.13. Honestly, IMO it's ok to keep this code a little bit longer, it has been there for maybe 30 years. 1 or 2 more Python releases is fine. At the same time, I'm also fine with deprecating it in Python 3.12 beta2, but you might ask to Release Manager to validate that.

@vstinner
Copy link
Member

Oh. I wasn't aware that the email package was using the binascii.a2b_uu() function. Is it really important to remove that feature of the email module? Maybe we can only deprecate binascii.b2a_uu() and keep binascii.a2b_uu()?

@vstinner
Copy link
Member

vstinner commented May 26, 2023

While the uuencode format may no longer be used in the wild today, using the email module to parse of old archives old mailing lists or old Maildir files remain an interesting use case, no?

binascii_a2b_uu() is about 86 lines of C code, whereas Lib/uu.py was about 216 lines of Python code. The uu module removal reduced the Python maintenance burden. But maybe here is balance is more towards supporting reading old email backups?

@CAM-Gerlach
Copy link
Member Author

We need to decide on this PR and its parent issue because 3.12 release is approaching.

That all should be updated to 3.13 -> 3.15 if we decide we want to do this.

@CAM-Gerlach: Would you mind to solve the conflicts on your PR?

I would, yeah, and update the various versions that need to be bumped again, but I'd like to hear at least a +0.5 from @warsaw or the @python/email-team before going ahead with all of it.

At the same time, I'm also fine with deprecating it in Python 3.12 beta2, but you might ask to Release Manager to validate that.

I thought it was policy to not introduce new deprecations during the beta period, but maybe I'm wrong here? IMO, if we're going to do this (which I still think we should), it would be best to push it to Python 3.13 given we want to give as much time as practical for users to report any issues which might cause a reconsideration of the various deprecations/planned removals (and conversely, makes a lot more comfortable going ahead with this now despite a few potential corner cases where this might still be used).

Oh. I wasn't aware that the email package was using the binascii.a2b_uu() function. Is it really important to remove that feature of the email module? Maybe we can only deprecate binascii.b2a_uu() and keep binascii.a2b_uu()?

See the extensive discussion above—it's only actually ever used if a non-default value is passed to an optional parameter of one method of one class in the legacy email interface, and would only cause a non-fatal degradation in content for extremely old messages (uuencded attachments would not be decoded automatically). IMO, I'm not sure its worth just removing one of the two parallel methods just for that—is there a precedent? If there is significant usage, they will get a clear deprecation warning, and if it is an important-enough problem, we can just revert that part of the deprecation.

However, if there isn't the risk tolerance for that, then it seems we may as well just leave both.

While the uuencode format may no longer be used in the wild today, using the email module to parse of old archives old mailing lists or old Maildir files remain an interesting use case, no?

Yeah, that's something we looked carefully at, but Mailman, Hyperkitty, Postorios, and related mailing list tools had zero affected usages and of the few usages in the wild of that particular legacy method, only a very small number could even theoretically be affected, and all those we found were oriented toward processing new email rather than archives. And the worst that would happen is those non-standard "attachments" would be displayed as plain text and decoded manually, just like people used to have to do back in uuencode's day.

And again, if this is a significant concern based on response to the deprecation warnings (and we could perhaps even roll out user-visible warnings if necessary), we could either provide clear guidance to putting the code on PyPI, postpone the removal, or un-deprecate that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes docs Documentation in the Doc dir stdlib Python modules in the Lib dir topic-email
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

PEP 594: uuencode codec and binascii functions apparently not properly deprecated or documented as such
10 participants