Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cloudpickle #8577

Merged
merged 1 commit into from
May 30, 2020
Merged

Update cloudpickle #8577

merged 1 commit into from
May 30, 2020

Conversation

suquark
Copy link
Member

@suquark suquark commented May 24, 2020

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@suquark
Copy link
Member Author

suquark commented May 24, 2020

This is not a full update, but to see if the CI could run correctly.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26267/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26268/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26275/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26279/
Test PASSed.

@suquark suquark force-pushed the update_cloudpickle branch from dfa3bca to 527ec14 Compare May 24, 2020 23:40
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26283/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26291/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26336/
Test FAILed.

@suquark
Copy link
Member Author

suquark commented May 26, 2020

jenkins retest this please

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26385/
Test FAILed.

@suquark
Copy link
Member Author

suquark commented May 27, 2020

Ready for review. Some CI fails with E AttributeError: 'ActorHandle' object has no attribute '__ray_kill__'. But this also happens on the master branch. So I suppose this is not an issue caused by this PR.

@robertnishihara
Copy link
Collaborator

  1. Is this directly copying the cloudpickle files, or are there any modifications?
  2. We should also update the version in ray/cloudpickle/init.py. Should probably also leave a comment pointing to the precise commit that this is coming from.
  3. Is this coming from a specific cloudpickle commit or from a specific release?

@suquark
Copy link
Member Author

suquark commented May 27, 2020

@robertnishihara yes, there is modifications as we did before. It is from a recent release. Let me update init.py to include the release number.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26446/
Test FAILed.

@robertnishihara
Copy link
Collaborator

@robertnishihara yes, there is modifications as we did before. It is from a recent release. Let me update init.py to include the release number.

@suquark can you remind me what the modifications are? Would they make sense to commit as part of cloudpickle?

@suquark
Copy link
Member Author

suquark commented May 28, 2020

@robertnishihara the only difference is in the cloudpickle_fast.py. In cloudpickle, cloudpickle_fast.py only targets at python3.8+, so I add some code to support older python versions (especially numpy support for python3.5).

Later I would create a PR in the numpy upstream to enable pickle protocol5 support for python3.5, so we can remote the numpy polyfill.

BTW, someone is working on pickle5 support (thus earlier python support) for cloudpickle: cloudpipe/cloudpickle#370. I think we should wait it merged, so then we can remove most of our changes.

@suquark suquark force-pushed the update_cloudpickle branch from 9383f08 to abeceef Compare May 28, 2020 18:59
@suquark
Copy link
Member Author

suquark commented May 28, 2020

sync with master branch to retrigger CI builds

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26488/
Test FAILed.

@suquark
Copy link
Member Author

suquark commented May 28, 2020

Ready to review. CI errors are same with the master branch, so they may be not related to this PR.

@jakirkham
Copy link

FWIW I think PR ( cloudpipe/cloudpickle#370 ) is close, but it has a couple of test failures on 3.5 and 3.6 that I haven't quite worked out. Python 3.7 already works with those changes. It would be nice to get this to cross the line if we can. Any help that people here might be able to offer would be much appreciated 🙂

@suquark
Copy link
Member Author

suquark commented May 29, 2020

@jakirkham yes, I think we have figure out how to combine pickle5 protocol, pickle5-backport, and python3.5-3.8 properly in Ray for a while. After this PR is merged, I can try to commit these changes to the cloudpickle upstream.

@suquark suquark force-pushed the update_cloudpickle branch from abeceef to 1f7044c Compare May 29, 2020 19:19
@suquark
Copy link
Member Author

suquark commented May 29, 2020

(Rebase on master to get rid of some CI errors)

@jakirkham
Copy link

jakirkham commented May 29, 2020

@suquark when it comes to implementing pickling methods on classes (__getstate__, __reduce__, etc.), the easiest path is to use NumPy arrays as arguments since these already can be pickled with older protocols and already have the logic to support out-of-band pickling with the backport package or integrated pickle protocol 5 support. It's also possible coerce other bytes-like objects to memoryviews, which numpy.asarray can consume zero-copy.

From the cloudpickle side of things, my guess is we can optimize memoryviews, which cloudpickle already special cases anyways. We have discussed doing this in issue ( cloudpipe/cloudpickle#365 ). So this seems like a pretty easy path to go down.

As to the particular test failures that I was stuck on earlier, Pierre has already helped sort those out. So I think mainly what we need at this stage is reviews from others.

I can't comment too much on Ray's serialization logic as I'm less familiar. Though hopefully those ideas are useful. Please let us know if you have other thoughts 🙂

@pcmoritz
Copy link
Contributor

pcmoritz commented May 29, 2020

Diff to the upstream cloudpickle (v1.4.1):

cloudpickle_fast.py:

17a18,19
> import _pickle
> import pickle
22a25,26
> from _pickle import Pickler
> 
28d31
<     cell_set, _make_empty_cell,
31,41c34
< if sys.version_info[:2] < (3, 8):
<     import pickle5 as pickle
<     from pickle5 import Pickler
<     load, loads = pickle.load, pickle.loads
< else:
<     import _pickle
<     import pickle
<     from _pickle import Pickler
<     load, loads = _pickle.load, _pickle.loads
< 
< import numpy
---
> load, loads = _pickle.load, _pickle.loads
151,153d143
<         clsdict.pop('_abc_cache', None)
<         clsdict.pop('_abc_negative_cache', None)
<         clsdict.pop('_abc_negative_cache_version', None)
155,172c145,147
<         registry = clsdict.pop('_abc_registry', None)
<         if registry is None:
<             # in Python3.7+, the abc caches and registered subclasses of a
<             # class are bundled into the single _abc_impl attribute
<             if hasattr(abc, '_get_dump'):
<                 (registry, _, _, _) = abc._get_dump(obj)
<                 clsdict["_abc_impl"] = [subclass_weakref()
<                                         for subclass_weakref in registry]
<             else:
<                 # FIXME(suquark): The upstream cloudpickle cannot work in Ray
<                 # because sometimes both '_abc_registry' and '_get_dump' does
<                 # not exist. Some strange typing objects may cause this issue.
<                 # Here the workaround just set "_abc_impl" to None.
<                 clsdict["_abc_impl"] = None
<         else:
<             # In the above if clause, registry is a set of weakrefs -- in
<             # this case, registry is a WeakSet
<             clsdict["_abc_impl"] = [type_ for type_ in registry]
---
>         (registry, _, _, _) = abc._get_dump(obj)
>         clsdict["_abc_impl"] = [subclass_weakref()
>                                 for subclass_weakref in registry]
218,234c193,200
<     if hasattr(obj, "co_posonlyargcount"):  # pragma: no branch
<         args = (
<                 obj.co_argcount, obj.co_posonlyargcount,
<                 obj.co_kwonlyargcount, obj.co_nlocals, obj.co_stacksize,
<                 obj.co_flags, obj.co_code, obj.co_consts, obj.co_names,
<                 obj.co_varnames, obj.co_filename, obj.co_name,
<                 obj.co_firstlineno, obj.co_lnotab, obj.co_freevars,
<                 obj.co_cellvars
<             )
<     else:
<         args = (
<             obj.co_argcount, obj.co_kwonlyargcount, obj.co_nlocals,
<             obj.co_stacksize, obj.co_flags, obj.co_code, obj.co_consts,
<             obj.co_names, obj.co_varnames, obj.co_filename,
<             obj.co_name, obj.co_firstlineno, obj.co_lnotab,
<             obj.co_freevars, obj.co_cellvars
<         )
---
>     args = (
>         obj.co_argcount, obj.co_posonlyargcount,
>         obj.co_kwonlyargcount, obj.co_nlocals, obj.co_stacksize,
>         obj.co_flags, obj.co_code, obj.co_consts, obj.co_names,
>         obj.co_varnames, obj.co_filename, obj.co_name,
>         obj.co_firstlineno, obj.co_lnotab, obj.co_freevars,
>         obj.co_cellvars
>     )
238,243d203
< def _make_cell(contents):
<     cell = _make_empty_cell()
<     cell_set(cell, contents)
<     return cell
< 
< 
247c207
<         contents = (obj.cell_contents,)
---
>         obj.cell_contents
249,255c209
<         contents = ()
< 
<     if sys.version_info[:2] < (3, 8):
<         if contents:
<             return _make_cell, contents
<         else:
<             return _make_empty_cell, ()
---
>         return types.CellType, ()
257c211
<         return types.CellType, contents
---
>         return types.CellType, (obj.cell_contents,)
421c375
<             cell_set(obj.__closure__[i], value)
---
>             obj.__closure__[i].cell_contents = value
442,486d395
< def _numpy_frombuffer(buffer, dtype, shape, order):
<     # Get the _frombuffer() function for reconstruction
<     from numpy.core.numeric import _frombuffer
<     array = _frombuffer(buffer, dtype, shape, order)
<     # Unfortunately, numpy does not follow the standard, so we still
<     # have to set the readonly flag for it here.
<     array.setflags(write=isinstance(buffer, bytearray) or not buffer.readonly)
<     return array
< 
< 
< def _numpy_ndarray_reduce(array):
<     # This function is implemented according to 'array_reduce_ex_picklebuffer'
<     # in numpy C backend. This is a workaround for python3.5 pickling support.
<     if sys.version_info >= (3, 8):
<         import pickle
<         picklebuf_class = pickle.PickleBuffer
<     elif sys.version_info >= (3, 5):
<         try:
<             import pickle5
<             picklebuf_class = pickle5.PickleBuffer
<         except Exception:
<             raise ImportError("Using pickle protocol 5 requires the pickle5 "
<                               "module for Python >=3.5 and <3.8")
<     else:
<         raise ValueError("pickle protocol 5 is not available for Python < 3.5")
<     # if the array if Fortran-contiguous and not C-contiguous,
<     # the PickleBuffer instance will hold a view on the transpose
<     # of the initial array, that is C-contiguous.
<     if not array.flags.c_contiguous and array.flags.f_contiguous:
<         order = "F"
<         picklebuf_args = array.transpose()
<     else:
<         order = "C"
<         picklebuf_args = array
<     try:
<         buffer = picklebuf_class(picklebuf_args)
<     except Exception:
<         # Some arrays may refuse to export a buffer, in which case
<         # just fall back on regular __reduce_ex__ implementation
<         # (gh-12745).
<         return array.__reduce__()
< 
<     return _numpy_frombuffer, (buffer, array.dtype, array.shape, order)
< 
< 
510,513c419
<     if sys.version_info[:2] >= (3, 8):
<         dispatch[types.CellType] = _cell_reduce
<     else:
<         dispatch[type(_make_empty_cell())] = _cell_reduce
---
>     dispatch[types.CellType] = _cell_reduce
567,576d472
< 
<         # This is a patch for python3.5
<         if isinstance(obj, numpy.ndarray):
<             if (self.proto < 5 or
<                     (not obj.flags.c_contiguous and not obj.flags.f_contiguous) or
<                     (issubclass(type(obj), numpy.ndarray) and type(obj) is not numpy.ndarray) or
<                     obj.dtype == "O" or obj.itemsize == 0):
<                 return NotImplemented
<             return _numpy_ndarray_reduce(obj)
< 
645,650c541,542
<             if sys.version_info[:2] >= (3, 8):
<                 closure = tuple(
<                     types.CellType() for _ in range(len(code.co_freevars)))
<             else:
<                 closure = tuple(
<                     _make_empty_cell() for _ in range(len(code.co_freevars)))
---
>             closure = tuple(
>                 types.CellType() for _ in range(len(code.co_freevars)))

@pcmoritz
Copy link
Contributor

pcmoritz commented May 29, 2020

EDIT: I made a mistake, it looks good, see above diff :)

@pcmoritz pcmoritz merged commit ebea5c4 into ray-project:master May 30, 2020
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/26530/
Test FAILed.

@suquark suquark deleted the update_cloudpickle branch July 9, 2020 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants