Skip to content

Commit

Permalink
Fix wrong unpickling with dask 2024.11 (#1993)
Browse files Browse the repository at this point in the history
### What kind of change does this PR introduce?
The issue introduced with dask 2024.11 concerned the passing of
arguments through the dask graph. Most SDBA classes are children of
`Parametrizable`, itself a child of `dict`. When stored in the graph,
objects like this are pickled. and then repickled when the function is
actually executed.

For some reason, the repickling stopped working and objects where
reinstated as dictionaries instead of their subclasses. This
particularly touched the `Grouper`, one of the only such objects
actually been passed through dask-backed computations.

Inheriting from `dict` was once not recommended but I think this is not
the case anymore. Anyway, I found that inheriting from
`collections.UserDict` actually solved the issue (after adapting the
internals). Not sure if I should raise the issue back to dask...

### Does this PR introduce a breaking change?
I changed how all classes of `xclim.sdba` are made, but in theory the
change is at a low, private level.
  • Loading branch information
Zeitsperre authored Nov 12, 2024
2 parents a3c1df7 + 9acce5e commit 445aefb
Show file tree
Hide file tree
Showing 4 changed files with 17 additions and 13 deletions.
8 changes: 6 additions & 2 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,15 @@ Changelog

v0.54.0 (unreleased)
--------------------
Contributors to this version: Trevor James Smith (:user:`Zeitsperre`),
Contributors to this version: Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`).

Breaking changes
----------------
* The minimum required version of `dask` has been increased to `2024.8.1`. `dask` versions at or above `2024.11` are not yet supported. (:issue:`1992`, :pull:`1991`).
* The minimum required version of `dask` has been increased to `2024.8.1`. (:issue:`1992`, :pull:`1991`).

Bug fixes
^^^^^^^^^
* Fixed pickling issue with ``xclim.sdba.Grouper`` and other classes for usage with `dask>=2024.11`. (:issue:`1992`, :pull:`1993`).

Internal changes
^^^^^^^^^^^^^^^^
Expand Down
2 changes: 1 addition & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ dependencies:
- cf_xarray >=0.9.3
- cftime >=1.4.1
- click >=8.1
- dask >=2024.8.1,<2024.11.0
- dask >=2024.8.1
- filelock >=3.14.0
- jsonpickle >=3.1.0
- numba >=0.54.1
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ dependencies = [
"cf-xarray >=0.9.3", # cf-xarray is differently named on conda-forge
"cftime >=1.4.1",
"click >=8.1",
"dask[array] >=2024.8.1,<2024.11.0",
"dask[array] >=2024.8.1",
"filelock >=3.14.0",
"jsonpickle >=3.1.0",
"numba >=0.54.1",
Expand Down
18 changes: 9 additions & 9 deletions xclim/sdba/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

from __future__ import annotations

from collections import UserDict
from collections.abc import Callable, Sequence
from inspect import _empty, signature # noqa

Expand All @@ -20,7 +21,7 @@


# ## Base class for the sdba module
class Parametrizable(dict):
class Parametrizable(UserDict):
"""Helper base class resembling a dictionary.
This object is _completely_ defined by the content of its internal dictionary, accessible through item access
Expand All @@ -35,24 +36,23 @@ class Parametrizable(dict):

def __getstate__(self):
"""For (json)pickle, a Parametrizable should be defined by its internal dict only."""
return self.parameters
return self.data

def __setstate__(self, state):
"""For (json)pickle, a Parametrizable in only defined by its internal dict."""
self.update(state)
# Unpickling skips the init, so we must _set_ data, we can't just update it - it's not there yet
self.data = {**state}

def __getattr__(self, attr):
"""Get attributes."""
try:
return self.__getitem__(attr)
except KeyError as err:
# Raise the proper error type for getattr
raise AttributeError(*err.args) from err
if attr == "data" or attr not in self.data:
return self.__getattribute__(attr)
return self.data[attr]

@property
def parameters(self) -> dict:
"""All parameters as a dictionary. Read-only."""
return {**self}
return {**self.data}

def __repr__(self) -> str:
"""Return a string representation."""
Expand Down

0 comments on commit 445aefb

Please sign in to comment.