-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement dynamic class provenance tracking to fix isinstance semantics and add support for dynamically defined enums #246
Conversation
Codecov Report
@@ Coverage Diff @@
## master #246 +/- ##
==========================================
+ Coverage 88.22% 89.15% +0.93%
==========================================
Files 1 1
Lines 552 627 +75
Branches 112 129 +17
==========================================
+ Hits 487 559 +72
- Misses 42 44 +2
- Partials 23 24 +1
Continue to review full report at Codecov.
|
@pitrou @pierreglaser if you have any suggestions / comments on this solution, please let me know. |
I had forgotten about #105 which implemented similar support for However I also think this is a dangerous feature in its current implementation. I think we need to better think trough when an how to reconcile dynamic types (including enums for also other dynamically defined types) on a cluster with some explicit scoping mechanism. |
d8d2b37
to
2c83439
Compare
Here is another valid use case that cannot be support by this PR: from enum import Enum
from dask.distributed import Client
class Color(Enum):
RED = 1
GREEN = 2
client = Client()
data_futures = client.scatter([Color.RED, Color.GREEN])
def is_red(x):
return x is Color.RED
for f in client.map(is_red, data_futures):
print(f.result())
|
@mrocklin you might be interested in this PR. Basically it makes it possible to properly support enums in dask and fix the isinstance semantics for objects passed as futures. @pitrou @mrocklin I would also love to get your feedback on the solution I used to tackle this issue that involves a WeakKeyDictionary / WeakValueDictionary pair protected by a lock. Maybe there is a better way to achieve a similar result. @pcmoritz @HyukjinKwon I would love it if you could test this PR with Ray and Spark to check that it does not introduce any regression. |
@mrocklin when launching the dask-distributed tests against this PR, https://travis-ci.org/cloudpipe/cloudpickle/jobs/494254377 It does not seem to be related to the change of this PR but I am not 100% sure and it seems deterministic. |
Resolved here: dask/distributed#2533 |
Alright: all downstream CI tests pass (dask, loky, joblib). I removed the hard dependency to the enum module so that it should fix a broken test on Py 2.7 on Ray. Waiting for more feedback. |
@llllllllll @ssanderson I would also love to get your input on this PR. |
From the Ray side, this PR passes the test suite, so looking forward to seeing it merged! Thanks for all the hard work :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took so long for me to look at this. I will try to be more responsive here in the future.
I really like the type class tracking work, this seems super useful and fixes hard to explain isinstance
bugs. I think there are a few cases that we still don't support with the enum serialization. It's pretty late right now so I am not sure yet what the correct fix is. I posted some tests in the comments below that show the behavior I am talking about. Tomorrow I can PR those tests against this branch and then look into making them pass.
class_tracker_id, extra): | ||
"""Build dynamic class with an empty __dict__ to be filled once memoized | ||
|
||
If class_tracker_id is not None, try to lookup an existing class definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can this be None if we are always passing in _ensure_tracking(obj)
on line 590. The enum case also appears to pass this as a non-None value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It used to be the case in a former version of this PR. However I wonder if it's not a good idea to make it possible to disable tracking by passing None for forward compatibility.
I am not sure. Would this be a YAGNI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this seems fine given that it is already implemented.
cloudpickle/cloudpickle.py
Outdated
@@ -468,6 +527,9 @@ def save_dynamic_class(self, obj): | |||
functions, or that otherwise can't be serialized as attribute lookups | |||
from global modules. | |||
""" | |||
if Enum is not None and issubclass(obj, Enum): | |||
return self._save_dynamic_enum(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is currently a case that we don't support, which we may want to consider. Currently the enum module has code that adds a failing reduce to the enum subclass if the mixed in type isn't (default) picklable.
# If a custom type is mixed into the Enum, and it does not know how
# to pickle itself, pickle.dumps will succeed but pickle.loads will
# fail. Rather than have the error show up later and possibly far
# from the source, sabotage the pickle protocol for this class so
# that pickle.dumps also fails.
#
# However, if the new class implements its own __reduce_ex__, do not
# sabotage -- it's on them to make sure it works correctly. We use
# __reduce_ex__ instead of any of the others as it is preferred by
# pickle over __reduce__, and it handles all pickle protocols.
if '__reduce_ex__' not in classdict:
if member_type is not object:
methods = ('__getnewargs_ex__', '__getnewargs__',
'__reduce_ex__', '__reduce__')
if not any(m in member_type.__dict__ for m in methods):
_make_class_unpicklable(enum_class)
This means I get the following error:
In [18]: class A:
...: def a(self): return 'a'
...:
...:
In [19]: class AEnum(A, enum.Enum):
...: value = 1
...:
...:
In [20]: cp.loads(cp.dumps(AEnum.value))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-24ee7c65848f> in <module>()
----> 1 cp.loads(cp.dumps(AEnum.value))
~/projects/python/cloudpickle/cloudpickle/cloudpickle.py in dumps(obj, protocol)
1015 try:
1016 cp = CloudPickler(file, protocol=protocol)
-> 1017 cp.dump(obj)
1018 return file.getvalue()
1019 finally:
~/projects/python/cloudpickle/cloudpickle/cloudpickle.py in dump(self, obj)
297 self.inject_addons()
298 try:
--> 299 return Pickler.dump(self, obj)
300 except RuntimeError as e:
301 if 'recursion' in e.args[0]:
/usr/lib64/python3.6/pickle.py in dump(self, obj)
407 if self.proto >= 4:
408 self.framer.start_framing()
--> 409 self.save(obj)
410 self.write(STOP)
411 self.framer.end_framing()
/usr/lib64/python3.6/pickle.py in save(self, obj, save_persistent_id)
494 reduce = getattr(obj, "__reduce_ex__", None)
495 if reduce is not None:
--> 496 rv = reduce(self.proto)
497 else:
498 reduce = getattr(obj, "__reduce__", None)
~/.virtualenvs/cloudpickle/lib/python3.6/enum.py in _break_on_call_reduce(self, proto)
44 """Make the given class un-picklable."""
45 def _break_on_call_reduce(self, proto):
---> 46 raise TypeError('%r cannot be pickled' % self)
47 cls.__reduce_ex__ = _break_on_call_reduce
48 cls.__module__ = '<unknown>'
TypeError: <AEnum.value: 1> cannot be pickled
However, I am able to pickle A
objects just fine with cloudpickle. I am not sure if we can fix this because it looks a lot like an explicit __reduce_ex__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that's what the type
argument is for. I forgot to address @pierreglaser's comments above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a deeper look with @pierreglaser and it seems that this is a fundamental limitation of enums that inherit from another Python class: this problem is already present for enums defined statically in importable modules. I therefore believe this cannot be fixed in cloudpickle but should be fixed upstream in Python's pickle or enum modules themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more specific:
In [6]: cat mymodule.py
from enum import Enum
class A:
def method_a(self):
return "a"
class E(A, Enum):
x = 1
y = 2
In [7]: from mymodule import E
In [8]: import pickle
In [9]: pickle.dumps(E.x)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-5c82fc8d827e> in <module>()
----> 1 pickle.dumps(E.x)
/opt/python3.7/lib/python3.7/enum.py in _break_on_call_reduce(self, proto)
42 """Make the given class un-picklable."""
43 def _break_on_call_reduce(self, proto):
---> 44 raise TypeError('%r cannot be pickled' % self)
45 cls.__reduce_ex__ = _break_on_call_reduce
46 cls.__module__ = '<unknown>'
TypeError: <E.x: 1> cannot be pickled
The class A
can be imported from mymodule
so maybe it's actually a bug in Python itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think somehow this is not the case anymore in recent python versions (3.7.1 and later):
Python 3.7.1 (tags/v3.7.1:260ec2c36a, Jan 24 2019, 18:18:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: class A:
...: def a(self): return "a"
...: import enum
...: class AEnum(A, enum.Enum):
...: value = 1
...: import pickle
...: pickle.dumps(AEnum.value)
Out[1]: b'\x80\x03c__main__\nAEnum\nq\x00K\x01\x85q\x01Rq\x02.'
See: Enum: mixin classes don't mix well with already mixed Enums
@llllllllll thanks for the review. I think I addressed your comments. |
It seems that my latest commits have broken some dask tests although I am not 100% sure because there is no output for 10min and travis kills the job without letting pytest report the failures. Will have to do that manually but I won't have time soon enough. If someone else wants to have a look don't hesitate. |
b01ec79
to
5c11202
Compare
@llllllllll I have rebased this PR to check that the downstream tests of dask distributed, ray, loky and joblib pass. I now pass all the bases as you suggested so and also added support for methods defined directly on the enum class itself. |
Sorry for nagging here, but is there anything which blocks this PR? The fix would be pretty useful for my team. |
I think we might want to release cloudpickle 1.0 without this change and 1.1 with this PR merged so that we have a pretty uncontroversial 1.0 release for cloudpickle as @mrocklin recently asked. |
@llllllllll I would like to merge this but I would really appreciate it if you could confirm that I took your comments into account as intended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that again. I just got back from C++now and did another pass. All of my comments have been addressed.
Thanks for working on this feature, this is really amazing work!
class_tracker_id, extra): | ||
"""Build dynamic class with an empty __dict__ to be filled once memoized | ||
|
||
If class_tracker_id is not None, try to lookup an existing class definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this seems fine given that it is already implemented.
Thanks you @llllllllll for your review. I am merging. |
@philippotto cloudpickle 1.1.0 is out with this fix included. |
Awesome, thank you! |
2.0.0 ===== - Python 3.5 is no longer supported. - Support for registering modules to be serialised by value. This allows code defined in local modules to be serialised and executed remotely without those local modules installed on the remote machine. ([PR #417](cloudpipe/cloudpickle#417)) - Fix a side effect altering dynamic modules at pickling time. ([PR #426](cloudpipe/cloudpickle#426)) - Support for pickling type annotations on Python 3.10 as per [PEP 563]( https://www.python.org/dev/peps/pep-0563/) ([PR #400](cloudpipe/cloudpickle#400)) - Stricter parametrized type detection heuristics in _is_parametrized_type_hint to limit false positives. ([PR #409](cloudpipe/cloudpickle#409)) - Support pickling / depickling of OrderedDict KeysView, ValuesView, and ItemsView, following similar strategy for vanilla Python dictionaries. ([PR #423](cloudpipe/cloudpickle#423)) - Suppressed a source of non-determinism when pickling dynamically defined functions and handles the deprecation of co_lnotab in Python 3.10+. ([PR #428](cloudpipe/cloudpickle#428)) 1.6.0 ===== - `cloudpickle`'s pickle.Pickler subclass (currently defined as `cloudpickle.cloudpickle_fast.CloudPickler`) can and should now be accessed as `cloudpickle.Pickler`. This is the only officially supported way of accessing it. ([issue #366](cloudpipe/cloudpickle#366)) - `cloudpickle` now supports pickling `dict_keys`, `dict_items` and `dict_values`. ([PR #384](cloudpipe/cloudpickle#384)) 1.5.0 ===== - Fix a bug causing cloudpickle to crash when pickling dynamically created, importable modules. ([issue #360](cloudpipe/cloudpickle#354)) - Add optional dependency on `pickle5` to get improved performance on Python 3.6 and 3.7. ([PR #370](cloudpipe/cloudpickle#370)) - Internal refactoring to ease the use of `pickle5` in cloudpickle for Python 3.6 and 3.7. ([PR #368](cloudpipe/cloudpickle#368)) 1.4.1 ===== - Fix incompatibilities between cloudpickle 1.4.0 and Python 3.5.0/1/2 introduced by the new support of cloudpickle for pickling typing constructs. ([issue #360](cloudpipe/cloudpickle#360)) - Restore compat with loading dynamic classes pickled with cloudpickle version 1.2.1 that would reference the `types.ClassType` attribute. ([PR #359](cloudpipe/cloudpickle#359)) 1.4.0 ===== **This version requires Python 3.5 or later** - cloudpickle can now all pickle all constructs from the ``typing`` module and the ``typing_extensions`` library in Python 3.5+ ([PR #318](cloudpipe/cloudpickle#318)) - Stop pickling the annotations of a dynamic class for Python < 3.6 (follow up on #276) ([issue #347](cloudpipe/cloudpickle#347)) - Fix a bug affecting the pickling of dynamic `TypeVar` instances on Python 3.7+, and expand the support for pickling `TypeVar` instances (dynamic or non-dynamic) to Python 3.5-3.6 ([PR #350](cloudpipe/cloudpickle#350)) - Add support for pickling dynamic classes subclassing `typing.Generic` instances on Python 3.7+ ([PR #351](cloudpipe/cloudpickle#351)) 1.3.0 ===== - Fix a bug affecting dynamic modules occuring with modified builtins ([issue #316](cloudpipe/cloudpickle#316)) - Fix a bug affecting cloudpickle when non-modules objects are added into sys.modules ([PR #326](cloudpipe/cloudpickle#326)). - Fix a regression in cloudpickle and python3.8 causing an error when trying to pickle property objects. ([PR #329](cloudpipe/cloudpickle#329)). - Fix a bug when a thread imports a module while cloudpickle iterates over the module list ([PR #322](cloudpipe/cloudpickle#322)). - Add support for out-of-band pickling (Python 3.8 and later). https://docs.python.org/3/library/pickle.html#example ([issue #308](cloudpipe/cloudpickle#308)) - Fix a side effect that would redefine `types.ClassTypes` as `type` when importing cloudpickle. ([issue #337](cloudpipe/cloudpickle#337)) - Fix a bug affecting subclasses of slotted classes. ([issue #311](cloudpipe/cloudpickle#311)) - Dont pickle the abc cache of dynamically defined classes for Python 3.6- (This was already the case for python3.7+) ([issue #302](cloudpipe/cloudpickle#302)) 1.2.2 ===== - Revert the change introduced in ([issue #276](cloudpipe/cloudpickle#276)) attempting to pickle functions annotations for Python 3.4 to 3.6. It is not possible to pickle complex typing constructs for those versions (see [issue #193]( cloudpipe/cloudpickle#193)) - Fix a bug affecting bound classmethod saving on Python 2. ([issue #288](cloudpipe/cloudpickle#288)) - Add support for pickling "getset" descriptors ([issue #290](cloudpipe/cloudpickle#290)) 1.2.1 ===== - Restore (partial) support for Python 3.4 for downstream projects that have LTS versions that would benefit from cloudpickle bug fixes. 1.2.0 ===== - Leverage the C-accelerated Pickler new subclassing API (available in Python 3.8) in cloudpickle. This allows cloudpickle to pickle Python objects up to 30 times faster. ([issue #253](cloudpipe/cloudpickle#253)) - Support pickling of classmethod and staticmethod objects in python2. arguments. ([issue #262](cloudpipe/cloudpickle#262)) - Add support to pickle type annotations for Python 3.5 and 3.6 (pickling type annotations was already supported for Python 3.7, Python 3.4 might also work but is no longer officially supported by cloudpickle) ([issue #276](cloudpipe/cloudpickle#276)) - Internal refactoring to proactively detect dynamic functions and classes when pickling them. This refactoring also yields small performance improvements when pickling dynamic classes (~10%) ([issue #273](cloudpipe/cloudpickle#273)) 1.1.1 ===== - Minor release to fix a packaging issue (Markdown formatting of the long description rendered on pypi.org). The code itself is the same as 1.1.0. 1.1.0 ===== - Support the pickling of interactively-defined functions with positional-only arguments. ([issue #266](cloudpipe/cloudpickle#266)) - Track the provenance of dynamic classes and enums so as to preseve the usual `isinstance` relationship between pickled objects and their original class defintions. ([issue #246](cloudpipe/cloudpickle#246)) 1.0.0 ===== - Fix a bug making functions with keyword-only arguments forget the default values of these arguments after being pickled. ([issue #264](cloudpipe/cloudpickle#264)) 0.8.1 ===== - Fix a bug (already present before 0.5.3 and re-introduced in 0.8.0) affecting relative import instructions inside depickled functions ([issue #254](cloudpipe/cloudpickle#254)) 0.8.0 ===== - Add support for pickling interactively defined dataclasses. ([issue #245](cloudpipe/cloudpickle#245)) - Global variables referenced by functions pickled by cloudpickle are now unpickled in a new and isolated namespace scoped by the CloudPickler instance. This restores the (previously untested) behavior of cloudpickle prior to changes done in 0.5.4 for functions defined in the `__main__` module, and 0.6.0/1 for other dynamic functions. 0.7.0 ===== - Correctly serialize dynamically defined classes that have a `__slots__` attribute. ([issue #225](cloudpipe/cloudpickle#225)) 0.6.1 ===== - Fix regression in 0.6.0 which breaks the pickling of local function defined in a module, making it impossible to access builtins. ([issue #211](cloudpipe/cloudpickle#211)) 0.6.0 ===== - Ensure that unpickling a function defined in a dynamic module several times sequentially does not reset the values of global variables. ([issue #187](cloudpipe/cloudpickle#205)) - Restrict the ability to pickle annotations to python3.7+ ([issue #193]( cloudpipe/cloudpickle#193) and [issue #196]( cloudpipe/cloudpickle#196)) - Stop using the deprecated `imp` module under Python 3. ([issue #207](cloudpipe/cloudpickle#207)) - Fixed pickling issue with singleton types `NoneType`, `type(...)` and `type(NotImplemented)` ([issue #209](cloudpipe/cloudpickle#209)) 0.5.6 ===== - Ensure that unpickling a locally defined function that accesses the global variables of a module does not reset the values of the global variables if they are already initialized. ([issue #187](cloudpipe/cloudpickle#187)) 0.5.5 ===== - Fixed inconsistent version in `cloudpickle.__version__`. 0.5.4 ===== - Fixed a pickling issue for ABC in python3.7+ ([issue #180]( cloudpipe/cloudpickle#180)). - Fixed a bug when pickling functions in `__main__` that access global variables ([issue #187]( cloudpipe/cloudpickle#187)). 0.5.3 ===== - Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types ([issue #144](cloudpipe/cloudpickle#144)). - itertools objects can also pickled ([PR #156](cloudpipe/cloudpickle#156)). - `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)). 0.5.2 ===== - Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131]( cloudpipe/cloudpickle#131)). - Make it possible to pickle classes and functions defined in faulty modules that raise an exception when trying to look-up their attributes by name. 0.5.1 ===== - Fixed `cloudpickle.__version__`. 0.5.0 ===== - Use `pickle.HIGHEST_PROTOCOL` by default. 0.4.4 ===== - `logging.RootLogger` can be also pickled ([PR #160](cloudpipe/cloudpickle#160)). 0.4.3 ===== - Fixed a regression: `AttributeError` when loading pickles that hold a reference to a dynamically defined class from the `__main__` module. ([issue #131]( cloudpipe/cloudpickle#131)). - Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types. ([issue #144](cloudpipe/cloudpickle#144)) 0.4.2 ===== - Restored compatibility with pickles from 0.4.0. - Handle the `func.__qualname__` attribute. 0.4.1 ===== - Fixed a crash when pickling dynamic classes whose `__dict__` attribute was defined as a [`property`](https://docs.python.org/3/library/functions.html#property). Most notably, this affected dynamic [namedtuples](https://docs.python.org/2/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields) in Python 2. (cloudpipe/cloudpickle#113) - Cloudpickle now preserves the `__module__` attribute of functions (cloudpipe/cloudpickle#118). - Fixed a crash when pickling modules that don't have a `__package__` attribute (cloudpipe/cloudpickle#116). 0.4.0 ===== * Fix functions with empty cells * Allow pickling Logger objects * Fix crash when pickling dynamic class cycles * Ignore "None" mdoules added to sys.modules * Support WeakSets and ABCMeta instances * Remove non-standard `__transient__` support * Catch exception from `pickle.whichmodule()` 0.3.1 ===== * Fix version information and ship a changelog 0.3.0 ===== * Import submodules accessed by pickled functions * Support recursive functions inside closures * Fix `ResourceWarnings` and `DeprecationWarnings` * Assume modules with `__file__` attribute are not dynamic 0.2.2 ===== * Support Python 3.6 * Support Tornado Coroutines * Support builtin methods
This is a fix for #244 (and #101) to add support for dynamically defined Enum subclasses.
Properly adding support for dynamic Enums required to more broadly fix the
isinstance
semantics as initially requested in #195.The proposed solution involves tracking the provenance of pickled dynamic class definitions with a pair of
weakref.WeakKeyDictionary
/weakref.WeakValueDictionary
protected by athreading.Lock
.