-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Regression on pickling classes from the __main__ module #132
Conversation
tests/testutils.py
Outdated
from subprocess import PIPE | ||
from subprocess import STDOUT | ||
from subprocess import CalledProcessError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you import all those names at once? :-)
tests/testutils.py
Outdated
f.write(source_code.encode('utf-8')) | ||
|
||
cmd = [sys.executable, source_file] | ||
pythonpath = "{cwd}/tests:{cwd}".format(cwd=os.getcwd()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you use __file__
or something? getcwd
sounds fragile.
Codecov Report
@@ Coverage Diff @@
## master #132 +/- ##
==========================================
- Coverage 84.08% 83.76% -0.32%
==========================================
Files 2 2
Lines 534 536 +2
Branches 97 98 +1
==========================================
Hits 449 449
- Misses 63 64 +1
- Partials 22 23 +1
Continue to review full report at Codecov.
|
@pitrou I have tried to make a minimal change to |
The coverage decrease is caused by the new branch that can only be executed in subprocess calls and therefore not accounted in the report. |
I don't understand the fix. How does it work? |
Apparently instances of |
I have pushed a simpler fix. |
@@ -628,12 +628,16 @@ def save_global(self, obj, name=None, pack=struct.pack): | |||
The name of this method is somewhat misleading: all types get | |||
dispatched here. | |||
""" | |||
if obj.__module__ == "__main__": | |||
return self.save_dynamic_class(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean save_dynamic_class
is also used for functions? If so, could you rename that method and update its docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think functions get dispatched here. The dispatch entries for save_global
are:
dispatch[type] = save_global
dispatch[types.ClassType] = save_global
whereas function-likes are dispatched via:
dispatch[types.FunctionType] = save_function
...
dispatch[types.MethodType] = save_instancemethod
...
dispatch[types.BuiltinFunctionType] = save_builtin_function
...
dispatch[classmethod] = save_classmethod
dispatch[staticmethod] = save_classmethod
I believe the name save_global
is a holdover from the base Pickler
class, which calls self.save_global
in some base class methods that we call into via super
, so renaming this isn't straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed a new commit to simplify that function even further and all tests still pass. However, I am not sure we are not breaking edge cases in third party libraries and applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run the test suite of both loky and joblib against this branch all tests pass as well so I am pretty confident that the changes are fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check whether save_dynamic_class
really gets a function as input and, if so, change the name or docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou I'm pretty confident save_dynamic_class
can never be called with a function as input. It accesses obj.__bases__
unconditionally, which doesn't exist on function objects.
@@ -628,12 +628,16 @@ def save_global(self, obj, name=None, pack=struct.pack): | |||
The name of this method is somewhat misleading: all types get | |||
dispatched here. | |||
""" | |||
if obj.__module__ == "__main__": | |||
return self.save_dynamic_class(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check whether save_dynamic_class
really gets a function as input and, if so, change the name or docstring?
No it does not. It's only for classes that cannot be looked up as attributes from global modules. In our tests those are:
|
The name of the PR is misleading: the new non-regression tests call interactively defined functions from the |
Ok, thank you for clarifying :-) |
cloudpickle/cloudpickle.py
Outdated
|
||
raise | ||
return self.save_dynamic_class(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does save_dynamic_class
raise a readable exception or do we want to keep the if
above?
Maybe it's good idea to keep the protection: >>> import cloudpickle
>>> import io
>>> p = cloudpickle.CloudPickler(io.BytesIO)
>>> p.save_dynamic_class(lambda x: x)
Fatal Python error: Cannot recover from stack overflow.
Thread 0x00007fc32d281700 (most recent call first):
File "/home/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/history.py", line 764 in _writeout_input_cache
File "/home/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/history.py", line 780 in writeout_cache
File "/home/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/history.py", line 58 in needs_sqlite
File "<decorator-gen-23>", line 2 in writeout_cache
File "/home/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/history.py", line 834 in run
File "/home/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/IPython/core/history.py", line 58 in needs_sqlite
File "<decorator-gen-24>", line 2 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Current thread 0x00007fc33466b700 (most recent call first):
File "/usr/lib/python3.6/pickle.py", line 264 in _getattribute
File "/usr/lib/python3.6/pickle.py", line 918 in save_global
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 635 in save_global
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 375 in save_function
File "/usr/lib/python3.6/pickle.py", line 476 in save
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 480 in save_dynamic_class
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 642 in save_global
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 375 in save_function
File "/usr/lib/python3.6/pickle.py", line 476 in save
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 480 in save_dynamic_class
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 642 in save_global
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 375 in save_function
...
Aborted (core dumped) |
Actually the snippet from the previous comment is misleading, I passed the >>> import cloudpickle
>>> import io
>>> p = cloudpickle.CloudPickler(io.BytesIO())
>>> p.save_dynamic_class(lambda x: x)
Traceback (most recent call last):
File "<ipython-input-8-4356e554f31b>", line 1, in <module>
p.save_dynamic_class(lambda x: x)
File "/home/ogrisel/code/cloudpickle/cloudpickle/cloudpickle.py", line 490, in save_dynamic_class
# encountered while saving will point to the skeleton class.
AttributeError: 'function' object has no attribute '__bases__' |
But what was the previous message? It seems to me that Pickler generally raises PicklingError. |
However, I am not sure we should try to preserve the behavior of the private API on wrong inputs. What really matters is that it should raise a meaningful exception when calling the public API with wrong inputs and calling |
@robertnishihara @iaroslav-ai could you please try this branch with your code to confirm that it fixes all your problems? |
@pitrou it's true that the upstream However this case should never happen in cloudpickle as far as I know because cloudpickle has been written precisely to be able to pickle dynamic class and function definitions. I am not sure how to write a test for this. |
This reverts commit d9e02fb.
@ogrisel Thanks, I just tried it out and this seems to fix my problem. |
Yup that seems to solve it. Thanks @ogrisel ! |
…lasses from the __main__ module
This PR aims to fix the regression reported in #131 and that affects 0.4.2 and later. Right now there is is just a non-regression test to reproduce the problem.
I believe the regression was silently introduced by @pitrou's cleanups in #122 although I am not 100% sure yet because this code is quite complex.