Add cumsum to DatasetGroupBy #6525

Merged Jul 20, 2022 (17 commits)
Changes from 11 commits
2 changes: 2 additions & 0 deletions doc/api.rst
@@ -736,6 +736,7 @@ Dataset
DatasetGroupBy.all
DatasetGroupBy.any
DatasetGroupBy.count
DatasetGroupBy.cumsum
DatasetGroupBy.max
DatasetGroupBy.mean
DatasetGroupBy.median
@@ -765,6 +766,7 @@ DataArray
DataArrayGroupBy.all
DataArrayGroupBy.any
DataArrayGroupBy.count
DataArrayGroupBy.cumsum
DataArrayGroupBy.max
DataArrayGroupBy.mean
DataArrayGroupBy.median
4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -120,6 +120,10 @@ New Features
- Allow passing chunks in ``**kwargs`` form to :py:meth:`Dataset.chunk`, :py:meth:`DataArray.chunk`, and
:py:meth:`Variable.chunk`. (:pull:`6471`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Add :py:meth:`core.groupby.DatasetGroupBy.cumsum` and :py:meth:`core.groupby.DataArrayGroupBy.cumsum`.
By `Vladislav Skripniuk <https://github.com/VladSkripniuk>`_ and `Deepak Cherian <https://github.com/dcherian>`_. (:pull:`3147`, :pull:`6525`, :issue:`3141`)
- Expose `inline_array` kwarg from `dask.array.from_array` in :py:func:`open_dataset`, :py:meth:`Dataset.chunk`,
:py:meth:`DataArray.chunk`, and :py:meth:`Variable.chunk`. (:pull:`6471`)
- Expose the ``inline_array`` kwarg from :py:func:`dask.array.from_array` in :py:func:`open_dataset`,
:py:meth:`Dataset.chunk`, :py:meth:`DataArray.chunk`, and :py:meth:`Variable.chunk`. (:pull:`6471`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.
16 changes: 14 additions & 2 deletions xarray/core/groupby.py
@@ -24,10 +24,12 @@
from . import dtypes, duck_array_ops, nputils, ops
from ._reductions import DataArrayGroupByReductions, DatasetGroupByReductions
from .arithmetic import DataArrayGroupbyArithmetic, DatasetGroupbyArithmetic
from .common import ImplementsArrayReduce, ImplementsDatasetReduce
from .concat import concat
from .formatting import format_array_flat
from .indexes import create_default_index_implicit, filter_indexes_from_coords
from .npcompat import QUANTILE_METHODS, ArrayLike
from .ops import IncludeCumMethods
from .options import _get_keep_attrs
from .pycompat import integer_types
from .types import T_Xarray
@@ -1192,7 +1194,12 @@ def reduce_array(ar: DataArray) -> DataArray:


# https://github.com/python/mypy/issues/9031
class DataArrayGroupBy(DataArrayGroupByBase, DataArrayGroupByReductions): # type: ignore[misc]
class DataArrayGroupBy(
    DataArrayGroupByBase,
    DataArrayGroupByReductions,
    ImplementsArrayReduce,
    IncludeCumMethods,
):  # type: ignore[misc]
    __slots__ = ()


@@ -1346,5 +1353,10 @@ def assign(self, **kwargs: Any) -> Dataset:


# https://github.com/python/mypy/issues/9031
class DatasetGroupBy(DatasetGroupByBase, DatasetGroupByReductions): # type: ignore[misc]
class DatasetGroupBy(
    DatasetGroupByBase,
    DatasetGroupByReductions,
    ImplementsDatasetReduce,
    IncludeCumMethods,
):  # type: ignore[misc]
    __slots__ = ()
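
The change above gives both group-by classes a cumulative method by adding extra base classes rather than by writing `cumsum` on each class. A minimal sketch of that mixin pattern follows. It does not reproduce xarray's internals: `CumMethodsMixin`, `ToyGroupBy`, and the `map` hook are hypothetical names invented for illustration, and plain lists stand in for arrays.

```python
from itertools import accumulate


class CumMethodsMixin:
    """Hypothetical mixin: supplies cumsum to any class that defines map()."""

    __slots__ = ()

    def cumsum(self):
        # Delegate to the host class's map(), which applies a function
        # to each group separately and reassembles the results.
        return self.map(lambda values: list(accumulate(values)))


class ToyGroupBy(CumMethodsMixin):
    """Hypothetical stand-in for a group-by object over plain lists."""

    __slots__ = ("_values", "_group_ids")

    def __init__(self, values, group_ids):
        if len(values) != len(group_ids):
            raise ValueError("values and group_ids must have the same length")
        self._values = list(values)
        self._group_ids = list(group_ids)

    def map(self, func):
        # Split: record the positions belonging to each group id.
        positions = {}
        for index, group in enumerate(self._group_ids):
            positions.setdefault(group, []).append(index)
        # Apply and combine: write each group's result back in place,
        # so the output keeps the original element order.
        out = [None] * len(self._values)
        for indices in positions.values():
            result = func([self._values[i] for i in indices])
            for i, item in zip(indices, result):
                out[i] = item
        return out


grouped = ToyGroupBy([7, 3, 1, 1, 1, 1, 1], [0, 0, 1, 1, 2, 2, 2])
print(grouped.cumsum())
```

Keeping `__slots__ = ()` on the mixin, as the classes in the diff do, means the mixin adds no per-instance storage of its own.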
28 changes: 28 additions & 0 deletions xarray/tests/test_groupby.py
@@ -1990,3 +1990,31 @@ def func(arg1, arg2, arg3=0.0):
expected = xr.Dataset({"foo": ("time", [3.0, 3.0, 3.0]), "time": times})
actual = ds.resample(time="D").map(func, args=(1.0,), arg3=1.0)
assert_identical(expected, actual)


def test_groupby_cumsum():
    ds = xr.Dataset(
        {"foo": (("x",), [7, 3, 1, 1, 1, 1, 1])},
        coords={"x": [0, 1, 2, 3, 4, 5, 6], "group_id": ("x", [0, 0, 1, 1, 2, 2, 2])},
    )
    actual = ds.groupby("group_id").cumsum(dim="x")
    expected = xr.Dataset(
        {
            "foo": (("x",), [7, 10, 1, 2, 1, 2, 3]),
        },
        coords={
            "x": [0, 1, 2, 3, 4, 5, 6],
            "group_id": ds.group_id,
        },
    )
    # TODO: Remove drop_vars when GH6528 is fixed
    # when Dataset.cumsum propagates indexes, and the group variable?
    assert_identical(expected.drop_vars(["x", "group_id"]), actual)

    actual = ds.foo.groupby("group_id").cumsum(dim="x")
    expected.coords["group_id"] = ds.group_id
    expected.coords["x"] = np.arange(7)
    assert_identical(expected.foo, actual)


# TODO: move other groupby tests from test_dataset and test_dataarray over here
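
To see where the expected values `[7, 10, 1, 2, 1, 2, 3]` in `test_groupby_cumsum` come from, the arithmetic can be checked by hand with a running total per group. The helper below is a plain-Python sketch written for this purpose; `grouped_running_total` is a hypothetical name and is not part of xarray.

```python
from collections import defaultdict


def grouped_running_total(values, group_ids):
    """Return the running sum of values, tracked separately per group id."""
    if len(values) != len(group_ids):
        raise ValueError("values and group_ids must have the same length")
    totals = defaultdict(int)  # current running total for each group
    result = []
    for value, group in zip(values, group_ids):
        totals[group] += value
        result.append(totals[group])
    return result


# Same data as the test: "foo" values and the "group_id" coordinate.
foo = [7, 3, 1, 1, 1, 1, 1]
group_id = [0, 0, 1, 1, 2, 2, 2]
print(grouped_running_total(foo, group_id))
```

Group 0 accumulates 7 then 10, group 1 accumulates 1 then 2, and group 2 accumulates 1, 2, 3, which matches the `expected` dataset in the test.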