Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Use Literal for compression in zipfile (#9346)" #9367

Merged
merged 1 commit into from
Dec 16, 2022

Conversation

JukkaL
Copy link
Contributor

@JukkaL JukkaL commented Dec 15, 2022

This reverts commit 034cfab.

The commit exposed the value of the compression constants, which seem to be implementation details, in the public API. I don't see anything in the documentation about the values of the constants such as ZIP_STORED:
https://docs.python.org/3/library/zipfile.html?highlight=zipfile#zipfile.ZipFile

Example where this makes a difference:

from typing import Literal
import zipfile

def f1(p: str, compression: int) -> None:
    # Error: compression should have the type Literal[0, 8, 12, 14]
    zipfile.ZipFile(p, compression=compression)

def f2(p: str, compression: Literal[0, 8, 12, 14]) -> None:
    # Works, but cryptic and exposes internal implementation details
    zipfile.ZipFile(p, compression=compression)

The integer values of named constants need to be explicitly specified if somebody wants to wrap zipfipe.ZipFile, which arguably exposes implementation details in a problematic fashion.

Here is a real-world example where this caused a regression: https://github.com/pytorch/vision/blob/main/torchvision/datasets/utils.py#L301

This reverts commit 034cfab.

The commit exposed the value of the compression constants, which seem
to be implementation details, in the public API. I don't see anything
in the documentation about the values of the constants such as
`ZIP_STORED`:
https://docs.python.org/3/library/zipfile.html?highlight=zipfile#zipfile.ZipFile

Example where this makes a difference:
```
from typing import Literal
import zipfile

def f1(p: str, compression: int) -> None:
    """Error: compression should have the type Literal[0, 8, 12, 14]"""
    zipfile.ZipFile(p, compression=compression)

def f2(p: str, compression: Literal[0, 8, 12, 14]) -> None:
    """Works, but cryptic and exposes internal implementation details"""
    zipfile.ZipFile(p, compression=compression)
```

The values are of constants need to be explicitly specified if somebody
wants to wrap `zipfipe.ZipFile`, which arguably exposes implementation
details in a problematic fashion.

Here is a real-world example where this caused a regression:
https://github.com/pytorch/vision/blob/main/torchvision/datasets/utils.py#L301
@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

vision (https://github.com/pytorch/vision)
- torchvision/datasets/utils.py:301: error: Argument "compression" to "ZipFile" has incompatible type "int"; expected "Literal[0, 8, 12, 14]"  [arg-type]

jinja (https://github.com/pallets/jinja)
- src/jinja2/environment.py:864: error: Argument 3 to "ZipFile" has incompatible type "int"; expected "Literal[0, 8, 12, 14]"  [arg-type]

@JukkaL
Copy link
Contributor Author

JukkaL commented Dec 15, 2022

The test failure happens in test setup phase, and apparently it's been happening before (#9364) so it's likely unrelated to this PR.

@AlexWaygood
Copy link
Member

Cc. @Viicos

@srittau
Copy link
Collaborator

srittau commented Dec 15, 2022

We do provide the zipfile._CompressionMode type alias, which hides the implementation detail. Unfortunately this requires a type-only import, but I still prefer this for the additional type safety it provides. I would argue that the overhead of using the type-only import is for the more uncommon case of wrapping the function is justified to prevent wrong integers being passed in the more common case of calling zipfile.

@JukkaL
Copy link
Contributor Author

JukkaL commented Dec 15, 2022

I don't think that a type-only import is a good workaround, since it's inherently poorly discoverable -- it's not documented in the module documentation, and mypy error messages give no hint of it. The user would have to look at the stub file to figure out the existence of the alias, and since it has an underscore prefix, I bet many users will think that is is some internal thing that could be removed or renamed without a deprecation period.

Also I'd consider type-only imports from stubs an advanced feature that we should avoid whenever possible. Here I strongly believe that ease of use is more important than the extra type safety. If a user calls the function with an invalid value, it will fail immediately, so the issue is not subtle and easy to catch if there is any test coverage.

@JukkaL
Copy link
Contributor Author

JukkaL commented Dec 15, 2022

To be fair, this particular PR that I'd like to revert probably isn't going to cause widespread problems, but I'm worried that there's a tendency to keep increasing our reliance on type aliases that don't exist at runtime, and they will give more ammunition to typing skeptics who think that Python typing is "bolted-on" and too hard to use.

I believe that the 90+% of Python users who aren't using static type checking yet is a very important (perhaps the most important) audience for us, and they will be beginners if/when they start using typing, by definition. Thus making things easier to use instead more complicated feels essential to me, even at the cost of (a little) type safety.

@Viicos
Copy link
Contributor

Viicos commented Dec 15, 2022

I have to say your are making a point here, having to specify Literal[0, 8, 12, 14] in the function definition f1 makes it harder to understand, I don't have any preferences on this. I think it is a good improvement if Zipfile is used as is, without any wrapper function, but as soon as you need to specify a compression parameter (like the example functions above), people would need to use Literal, which is new in Python 3.8 (so people would need to use typing_extensions if they support py37).

If this is reverted, we might want to keep literal for constants, that doesn't hurt.

We might have the same issue in other places using Literal as well, I was thinking about open with the mode parameter, but there's already an overload with type hint str, but there's probably still places where we might encounter the same situation.

@srittau
Copy link
Collaborator

srittau commented Dec 16, 2022

Just to clarify: I'm -0 on reverting, but I'm not going to fight it. But we should really have a general discussion about flag arguments and Literal at some point.

@AlexWaygood
Copy link
Member

I can see both sides of the argument here; I don't really have a strong opinion either way. I was somewhat wary of the PR being merged to begin with, so I'm fine with it being reverted.

@AlexWaygood AlexWaygood merged commit 2b20b70 into python:main Dec 16, 2022
@AlexWaygood
Copy link
Member

AlexWaygood commented Dec 16, 2022

In the longer term, one possible way of improving things for this module: CPython could make a backwards-compatible change to turn these constants into members of an IntEnum:

import enum

@enum.global_enum
class CompressionMode(enum.IntEnum):
    ZIP_STORED = 0
    ZIP_DEFLATED = 8
    ZIP_BZIP2 = 12
    ZIP_LZMA = 14

Then typeshed could possibly be slightly stricter than the runtime on Python 3.12+ (due to the fact that, as @JukkaL says, the precise values of these constants aren't documented anywhere, so are something of an implementation detail):

import sys
from enum import IntEnum
from typing_extensions import TypeAlias

if sys.version_info >= (3, 12):
    class CompressionMode(IntEnum):
        ZIP_STORED: int
        ZIP_DEFLATED: int
        ZIP_BZIP2: int
        ZIP_LZMA: int

    ZIP_STORED = CompressionMode.ZIP_STORED
    ZIP_DEFLATED = CompressionMode.ZIP_DEFLATED
    ZIP_BZIP2 = CompressionMode.ZIP_BZIP2
    ZIP_LZMA = CompressionMode.ZIP_LZMA
    _CompressionMode: TypeAlias = CompressionMode
else:
    ZIP_STORED: int
    ZIP_DEFLATED: int
    ZIP_BZIP2: int
    ZIP_LZMA: int
    _CompressionMode: TypeAlias = int

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants