Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loading fails for typing.Tuple[()] #517

Closed
ryanthompson591 opened this issue Jun 22, 2022 · 17 comments
Closed

loading fails for typing.Tuple[()] #517

ryanthompson591 opened this issue Jun 22, 2022 · 17 comments

Comments

@ryanthompson591
Copy link

ryanthompson591 commented Jun 22, 2022

How to reproduce:

Run this code with 0.3.4 you will see it pass.
Run in 0.3.5.1 and you will see it fail.

import dill

from typing import Union, Tuple

FeatureValueType = Union[str, int, float]  # pylint: disable=invalid-name
SingletonSliceKeyType = Tuple[str, FeatureValueType]  # pylint: disable=invalid-name
A_TYPE = Union[Tuple[()], Tuple[SingletonSliceKeyType, ...]]

def a_function() -> A_TYPE:
  return ('string', 'feature_value_type')

serialized = dill.dumps(a_function)
loaded = dill.loads(serialized)

print(loaded())

I get the following error in 0.3.5.1

Traceback (most recent call last):
  File "/Users/ryanthompson/PycharmProjects/learningProjects/dill_regression/regression_test.py", line 13, in <module>
    loaded = dill.loads(serialized)
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/site-packages/dill/_dill.py", line 387, in loads
    return load(file, ignore, **kwds)
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/site-packages/dill/_dill.py", line 373, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/site-packages/dill/_dill.py", line 646, in load
    obj = StockUnpickler.load(self)
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/typing.py", line 804, in __getitem__
    return self.__getitem_inner__(params)
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/typing.py", line 261, in inner
    return func(*args, **kwds)
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/typing.py", line 830, in __getitem_inner__
    params = tuple(_type_check(p, msg) for p in params)
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/typing.py", line 830, in <genexpr>
    params = tuple(_type_check(p, msg) for p in params)
  File "/Users/ryanthompson/.pyenv/versions/3.8.11/lib/python3.8/typing.py", line 149, in _type_check
    raise TypeError(f"{msg} Got {arg!r:.100}.")
TypeError: Tuple[t0, t1, ...]: each t must be a type. Got ().
@mmckerns
Copy link
Member

I can confirm this fails in python 3.8 (at least) with dill master. Minimal code is:

>>> import dill
>>> from typing import Tuple
>>> t = Tuple[()]
>>> s = dill.dumps(t)
>>> dill.loads(s)
...
TypeError: Tuple[t0, t1, ...]: each t must be a type. Got ().

@mmckerns
Copy link
Member

mmckerns commented Jun 22, 2022

The message in the TypeError says that you should be passing a type, and not an expression (i.e. ()).
If instead you do this, it works as expected:

>>> import dill
>>> from typing import Tuple
>>> t = Tuple[tuple]
>>> s = dill.dumps(t)
>>> dill.loads(s)
typing.Tuple[tuple]

Is there a reason you are passing a tuple instance instead of a tuple?

This seems like a bug in python. Or it's a weird corner case...

>>> t = Tuple[[]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 804, in __getitem__
    return self.__getitem_inner__(params)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 261, in inner
    return func(*args, **kwds)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 830, in __getitem_inner__
    params = tuple(_type_check(p, msg) for p in params)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 830, in <genexpr>
    params = tuple(_type_check(p, msg) for p in params)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 149, in _type_check
    raise TypeError(f"{msg} Got {arg!r:.100}.")
TypeError: Tuple[t0, t1, ...]: each t must be a type. Got [].
>>> t = Tuple[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 804, in __getitem__
    return self.__getitem_inner__(params)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 261, in inner
    return func(*args, **kwds)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 830, in __getitem_inner__
    params = tuple(_type_check(p, msg) for p in params)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 830, in <genexpr>
    params = tuple(_type_check(p, msg) for p in params)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 149, in _type_check
    raise TypeError(f"{msg} Got {arg!r:.100}.")
TypeError: Tuple[t0, t1, ...]: each t must be a type. Got 1.
>>> t = Tuple[()]
>>> t
typing.Tuple[()]

It might be due to some artifact of this also working:

>>> t = Tuple[(int, str)]
>>> t
typing.Tuple[int, str]
>>> t = Tuple[int, str]
>>> t
typing.Tuple[int, str]

Regardless, if it's a valid way to declare a typed tuple, then it should be supported.

@mmckerns mmckerns changed the title Regression from in 0.3.5.1 from 0.3.4 - unable to decode tuple type of () loading fails for typing.Tuple[()] Jun 22, 2022
@mmckerns mmckerns added the bug label Jun 22, 2022
@ryanthompson591
Copy link
Author

It might be possible to change our code to work around this for now. However, ideally dill could support this as I see this type of code all over the place.

For example:
https://github.com/numpy/numpy/blob/df7ccc4034e04fd3fadaa33710c727589dc0d938/numpy/__init__.pyi#L2532
https://github.com/google/flax/blob/7f510055c71caa4470a126e0ce20dd7ec6805241/flax/core/frozen_dict.py#L151

My understanding is the Tuple[()] means an empty tuple whereas Tuple[tuple] has a different meaning. Any clue why it would work in a previous version?

@mmckerns
Copy link
Member

mmckerns commented Jun 22, 2022

Yeah, I just found this is indeed intentional. So, definitely needs support. I'll either need to step through the commits or look at the difference in the dill trace to see what changed. Hard to tell from an error thrown on loading.

@mmckerns
Copy link
Member

Here's what I'm seeing from dill-0.3.4:

>>> import dill
>>> from typing import Tuple
>>> t = Tuple[()]
>>> t
typing.Tuple[()]
>>> s = dill.dumps(t)
>>> dill.loads(s)
typing.Tuple
>>> 

So, essentially it didn't serialize correctly.

@mmckerns
Copy link
Member

mmckerns commented Jun 23, 2022

The old trace (note it's python 3.6, but that "shouldn't matter"):

>>> dill.detect.trace(True)
>>> s = dill.dumps(t)
T2: typing.Tuple[()]
F2: <function _create_type at 0x10cd36e18>
# F2
T4: <class 'typing.TupleMeta'>
# T4
T4: typing.Tuple
# T4
T1: <class 'tuple'>
F2: <function _load_type at 0x10cd36d90>
# F2
# T1
D2: <dict object at 0x10cdb7a20>
Cm: <staticmethod object at 0x10cd84240>
T1: <class 'staticmethod'>
# T1
F1: <function Tuple.__new__ at 0x10cd82598>
F2: <function _create_function at 0x10cd36ea0>
# F2
Co: <code object __new__ at 0x10cd63c00, file "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/typing.py", line 1291>
F2: <function _create_code at 0x10cd36f28>
# F2
# Co
D4: <dict object at 0x10c9c3ea0>
# D4
D2: <dict object at 0x10cd64c60>
# D2
# F1
# Cm
T1: <class 'frozenset'>
# T1
T4: <class '_weakrefset.WeakSet'>
# T4
D2: <dict object at 0x10cd87120>
T1: <class 'set'>
# T1
F1: <function WeakSet.__init__.<locals>._remove at 0x10cd82620>
Co: <code object _remove at 0x10c795c90, file "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_weakrefset.py", line 38>
# Co
D4: <dict object at 0x10c791750>
# D4
R1: <weakref at 0x10cd7b868; to 'WeakSet' at 0x10cd84278>
F2: <function _create_weakref at 0x10cd31620>
# F2
# R1
D2: <dict object at 0x10cd64e58>
# D2
# F1
# D2
D2: <dict object at 0x10cd87168>
F1: <function WeakSet.__init__.<locals>._remove at 0x10cd826a8>
D4: <dict object at 0x10c791750>
# D4
R1: <weakref at 0x10cd7b8b8; to 'WeakSet' at 0x10cd842e8>
# R1
D2: <dict object at 0x10cd64dc8>
# D2
# F1
# D2
D2: <dict object at 0x10cd871b0>
F1: <function WeakSet.__init__.<locals>._remove at 0x10cd82730>
D4: <dict object at 0x10c791750>
# D4
R1: <weakref at 0x10cd7b908; to 'WeakSet' at 0x10cd84358>
# R1
D2: <dict object at 0x10cd64f30>
# D2
# F1
# D2
T1: <class 'object'>
# T1
F1: <function _make_subclasshook.<locals>.__extrahook__ at 0x10cdbc268>
Co: <code object __extrahook__ at 0x10cd5d4b0, file "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/typing.py", line 889>
# Co
D4: <dict object at 0x10c9c3ea0>
# D4
Ce2: <cell at 0x10cd33cd8: TupleMeta object at 0x7fa01aae4858>
F2: <function _create_cell at 0x10cd31378>
# F2
# Ce2
D2: <dict object at 0x10cd64d38>
# D2
# F1
# D2
B1: <built-in function getattr>
F2: <function _get_attr at 0x10cd319d8>
# F2
# B1
M2: <module 'dill._dill' from '/Users/mmckerns/lib/python3.6/site-packages/dill-0.3.5.dev0-py3.6.egg/dill/_dill.py'>
# M2
B1: <built-in function setattr>
# B1
# T2

and the new trace (in python 3.7):

>>> dill.detect.trace(True)
>>> s = dill.dumps(t)

(i.e. totally handled by pickle.Pickler).

So this means, it would need a special case for creation. However, after thinking about it a bit, I'm pretty sure this is a python bug, and the functionality really should be added to pickle.

@mmckerns
Copy link
Member

mmckerns commented Jun 23, 2022

This might be something that won't get fixed in python, as typing.Tuple is deprecated. See: https://docs.python.org/3/library/typing.html#typing.Tuple and https://peps.python.org/pep-0585.

... and empty tuple type pickling works on builtins.tuple (in python 3.10):

>>> import pickle
>>> t = tuple[()]
>>> t
tuple[()]
>>> s = pickle.dumps(t)
>>> pickle.loads(s)
tuple[()]
>>>

@mmckerns
Copy link
Member

mmckerns commented Jun 23, 2022

Here's a potential workaround, and to me seems to be a python bug:

>>> import pickle
>>> import typing
>>> t = typing.Tuple[()]
>>> t
typing.Tuple[()]
>>> t = t.copy_with(())
>>> t
typing.Tuple[]
>>> s = pickle.dumps(t)
>>> pickle.loads(s)
typing.Tuple[()]
>>> typing.Tuple[]
  File "<stdin>", line 1
    typing.Tuple[]
                 ^
SyntaxError: invalid syntax

So, a function could be added to intercept the Tuple and apply copy_with (or alter the __args__):

>>> import typing
>>> t = typing.Tuple[()]
>>> p = t.copy_with(())
>>> t._name, t.__args__
('Tuple', ((),))
>>> p._name, p.__args__
('Tuple', ())
>>> 

The question then becomes: how does one know if the intent was to pickle a empty tuple type or the weird (and probably erroneous) empty Tuple? I believe that dill could store the _name and __args__, and then recreate the appropriate object from that.

@mmckerns mmckerns removed the bug label Jun 23, 2022
@tvalentyn
Copy link

Since typing.Tuple is deprecated, I would suggest to preserve the pickling behavior of 3.4 even if it less semantically precise, to have less breaking changes.

@mmckerns
Copy link
Member

mmckerns commented Jun 24, 2022

@tvalentyn: with the new dill shims module, we can preserve prior behavior so as to not break stored pickles. Is that what you are looking for? I'd argue that if there's a bugfix it should be applied (instead of preserving incorrect behavior), as long as stored pickles that were created before the change can be unpickled after the change. Also, it may be deprecated, but who knows how long it will be around before it's removed.

@ryanthompson591
Copy link
Author

What do you think about the idea of accepting a patch into the shims module to handle this edge case?

@ryanthompson591
Copy link
Author

FYI: I did file an issue with python
python/cpython#94245

@mmckerns
Copy link
Member

From the cpython issue (ref above):

We're not going to fix this for 3.9 and lower because those branches are in security fix-only mode, but we can still fix it on 3.10.

Hmm, so given that information and everything else above, I'm not sure how to best handle this yet. Will need some thought...

@tvalentyn
Copy link

@tvalentyn: with the new dill shims module, we can preserve prior behavior so as to not break stored pickles. Is that what you are looking for? I'd argue that if there's a bugfix it should be applied (instead of preserving incorrect behavior), as long as stored pickles that were created before the change can be unpickled after the change. Also, it may be deprecated, but who knows how long it will be around before it's removed.

I assumed, perhaps incorrectly, that it would be removed ~soon and if dill users didn't complain about this particular buggy behavior when pickling the deprecated portion of Python semantics, then perhaps it's not worth fixing it, given that the fix introduces issues for some other group of users due to a bug in cpython that's not going to be fixed.

that's my take. in a perfect world of course we'd fix the bug everywhere, but looks like that's not going to happen.

One option could be, buggy dill behavior only in 3.10+ after the cpython fixes the bug on their side. But perhaps adding such branching to dill would involve unnecessary complexity.

@mmckerns
Copy link
Member

mmckerns commented Jul 23, 2022

I ended up adding special handling to serialize two cases of the typing.Tuple, and for everything else, punt back to pickle. This is a change from the erroneous behavior (it worked by coincidence) in 0.3.4 and earlier. @tvalentyn: let me know if you think there needs to be some special handling for old pickles for Tuple[()]... as (shown above) they didn't actually serialize correctly.

@tvalentyn
Copy link

Thank you. Much appreciated. Left some comments.

@ryanthompson591
Copy link
Author

Great. Just tried this out with dill at head and it works great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants