Add DatasetGroupBy.quantile #3527

keewis · 2019-11-13T23:03:07Z

This adds DatasetGroupBy.quantile by moving DataArrayGroupBy.quantile to GroupBy as proposed in #3018 (comment).

The tests are a modified copy of the ones from #2828. What confuses me is that expected_yy in test_ds_groupby_quantile needs the transpose whereas the equivalent in test_da_groupby_quantile doesn't. Does anyone have an idea about why that is?

Closes DatasetGroupBy does not implement quantile #3525
Tests added
Passes black . && mypy . && flake8
Fully documented, including whats-new.rst for all changes and api.rst for new API

keewis · 2019-11-13T23:10:34Z

is Bug Fixes the correct section for this?

dcherian · 2019-11-13T23:19:50Z

Let's do enhancements!

keewis · 2019-11-13T23:21:23Z

that would open a new category. Should I use New Features?

dcherian · 2019-11-13T23:21:57Z

👍

dcherian

Thanks @keewis !

I left some minor comments. The transpose is surprising. It would be good to track that down to either .map or Groupby._combine

xarray/core/groupby.py

keewis · 2019-11-14T17:31:58Z

the difference is here, where in DataArrayGroupBy the original dimension order is restored. DatasetGroupBy does not do that, so the test requires the transpose

dcherian · 2019-11-15T14:59:21Z

LGTM @keewis.

Re: docstring. If you jave time, can you add a couple of examples (just pick two from the tests) to illustrate the scalar vs vector behaviour of q?

keewis · 2019-11-15T15:05:34Z

sure. But as Dataset.quantile and DataArray.quantile don't have examples either, I think it might be worth to do that in a new PR?

Also, what will we do re transpose?

dcherian · 2019-11-15T15:07:55Z

👍

The transpose is fine, I think. It looks intentional and must be consistent with other Dataset/DataArray operations?

keewis · 2019-11-15T15:29:53Z

well, it was surprising to me. It also might be problematic that it matters if we first pull out a DataArray and then compute or the other way around:

>>> ds = xr.Dataset(
...     data_vars={
...         "a": (
...             ("x", "y"),
...             [[1, 11, 26], [2, 12, 22], [3, 13, 23],^I[4, 16,^I24], [5, 15, 25]],
...         )
...     },
...     coords={"x": [1, 1, 1, 2, 2], "y": [0, 0, 1]},
... )
>>> a_ds = ds.groupby("y").quantile(0).a
>>> a_da = ds.a.groupby("y").quantile(0)
>>> a_ds.identical(a_da)
False
>>> a_ds.transpose().identical(a_da)
True

dcherian · 2019-11-15T16:21:37Z

Hmm.. ok calling mean in this example passes the identical check so maybe there's something funny in quantile:

ds = xr.Dataset(
    data_vars={
        "a": (
            ("x", "y"),
            [[1, 11, 26], [2, 12, 22], [3, 13, 23],
             [4, 16,24], [5, 15, 25]],
        )
    },
    coords={"x": [1, 1, 1, 2, 2], "y": [0, 0, 1]},
).expand_dims({"z": [0,1,2]})
a_ds = ds.groupby("y").mean("x").a
a_da = ds.a.groupby("y").mean("x")
a_ds.identical(a_da)

keewis · 2019-11-15T16:22:41Z

but that's only because mean removes x, doesn't it?

keewis · 2019-11-15T16:24:19Z

I can also reproduce this with std

Edit: I'm fairly certain this is due to what I mentioned in #3527 (comment)

dcherian · 2019-11-15T18:39:44Z

I can also reproduce this with std

Which behaviour can you reproduce?

Replacing mean with std in my last example passes the identical check. Actually this works too

a_ds = ds.groupby("y").apply(lambda x: x).a
a_da = ds.a.groupby("y").apply(lambda x: x)
a_ds.identical(a_da)

max-sixty · 2019-11-15T19:36:50Z

Dimension reordering has come up a few times before (#1739, and others that I can't immediately find)

I think we should probably have a general solution & standard behavior here. Others should weigh in (CC @shoyer), but I would vote to not worry too much about it here, and prioritize simplicity and consistency with past behavior until we decide on a more general solution.

shoyer · 2019-11-15T19:41:41Z

the difference is here, where in DataArrayGroupBy the original dimension order is restored. DatasetGroupBy does not do that, so the test requires the transpose

It was not entirely obvious to me how to generalize "restoring dimension order" to Dataset. On a Dataset, it is not entirely clear which variable dimension order should be copied from. Maybe each variable in the result should have dimension order copied from the variable with the same name in the original dataset? But then what about new variables?

I agree @max-sixty, let's not worry about this too much for now.

dcherian · 2019-11-15T19:57:57Z

In it goes. Great work as usual, @keewis. Thanks!

keewis · 2019-11-15T23:55:14Z

Which behaviour can you reproduce?

ds = xr.Dataset(
    data_vars={
        "a": (
            ("x", "y"),
            [[1, 11, 26], [2, 12, 22], [3, 13, 23],
             [4, 16,24], [5, 15, 25]],
        )
    },
    coords={"x": [1, 1, 1, 2, 2], "y": [0, 0, 1]},
).expand_dims({"z": [0, 1, 1, 2, 2]})
a_ds = ds.groupby("y").std().a
a_da = ds.a.groupby("y").std()
a_ds.identical(a_da)

which seems to be the same as

a_ds = ds.groupby("y").mean("y").a
a_da = ds.a.groupby("y").mean("y")
a_ds.identical(a_da)

My impression is that this puts the dimension grouped over to the front:

before: ds.a.dims == ("z", "x", "y")
afterwards: a_ds.dims == ("y", "z", "x")

But I agree this discussion does not really belong here. So, thanks all!

* upstream/master: Added fill_value for unstack (pydata#3541) Add DatasetGroupBy.quantile (pydata#3527) ensure rename does not change index type (pydata#3532) Leave empty slot when not using accessors interpolate_na: Add max_gap support. (pydata#3302) units & deprecation merge (pydata#3530) Fix set_index when an existing dimension becomes a level (pydata#3520) add Variable._replace (pydata#3528) Tests for module-level functions with units (pydata#3493) Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502) FUNDING.yml (pydata#3523) Allow appending datetime & boolean variables to zarr stores (pydata#3504) warn if dim is passed to rolling operations. (pydata#3513) Deprecate allow_lazy (pydata#3435) Recursive tokenization (pydata#3515)

* upstream/master: (22 commits) Added fill_value for unstack (pydata#3541) Add DatasetGroupBy.quantile (pydata#3527) ensure rename does not change index type (pydata#3532) Leave empty slot when not using accessors interpolate_na: Add max_gap support. (pydata#3302) units & deprecation merge (pydata#3530) Fix set_index when an existing dimension becomes a level (pydata#3520) add Variable._replace (pydata#3528) Tests for module-level functions with units (pydata#3493) Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502) FUNDING.yml (pydata#3523) Allow appending datetime & boolean variables to zarr stores (pydata#3504) warn if dim is passed to rolling operations. (pydata#3513) Deprecate allow_lazy (pydata#3435) Recursive tokenization (pydata#3515) format indexing.rst code with black (pydata#3511) add missing pint integration tests (pydata#3508) DOC: update bottleneck repo url (pydata#3507) add drop_sel, drop_vars, map to api.rst (pydata#3506) remove syntax warning (pydata#3505) ...

* master: (24 commits) Tweaks to release instructions (pydata#3555) Clarify conda environments for new contributors (pydata#3551) Revert to dev version 0.14.1 whatsnew (pydata#3547) sparse option to reindex and unstack (pydata#3542) Silence sphinx warnings (pydata#3516) Numpy 1.18 support (pydata#3537) tweak whats-new. (pydata#3540) small simplification of rename from pydata#3532 (pydata#3539) Added fill_value for unstack (pydata#3541) Add DatasetGroupBy.quantile (pydata#3527) ensure rename does not change index type (pydata#3532) Leave empty slot when not using accessors interpolate_na: Add max_gap support. (pydata#3302) units & deprecation merge (pydata#3530) Fix set_index when an existing dimension becomes a level (pydata#3520) add Variable._replace (pydata#3528) Tests for module-level functions with units (pydata#3493) Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502) FUNDING.yml (pydata#3523) ...

keewis added 3 commits November 13, 2019 23:33

move the implementation of DataArrayGroupBy.quantile to GroupBy

da57caf

add tests for DatasetGroupBy

5b22514

update whats-new.rst

dc4846f

move the item in whats-new.rst into New Features

5c441c9

dcherian reviewed Nov 14, 2019

View reviewed changes

xarray/core/groupby.py Show resolved Hide resolved

xarray/core/groupby.py Outdated Show resolved Hide resolved

don't drop scalar quantile coords

5f9ca06

keewis force-pushed the fix-datasetgroupby-quantile branch from a0d5afe to 5f9ca06 Compare November 14, 2019 16:26

dcherian approved these changes Nov 15, 2019

View reviewed changes

dcherian merged commit 52d4845 into pydata:master Nov 15, 2019

keewis deleted the fix-datasetgroupby-quantile branch November 15, 2019 23:15

keewis mentioned this pull request Nov 27, 2019

Examples for quantile #3576

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add DatasetGroupBy.quantile #3527

Add DatasetGroupBy.quantile #3527

keewis commented Nov 13, 2019 •

edited

Loading

keewis commented Nov 13, 2019

dcherian commented Nov 13, 2019

keewis commented Nov 13, 2019

dcherian commented Nov 13, 2019

dcherian left a comment

keewis commented Nov 14, 2019 •

edited

Loading

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

keewis commented Nov 15, 2019 •

edited

Loading

dcherian commented Nov 15, 2019

max-sixty commented Nov 15, 2019

shoyer commented Nov 15, 2019

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

Add DatasetGroupBy.quantile #3527

Add DatasetGroupBy.quantile #3527

Conversation

keewis commented Nov 13, 2019 • edited Loading

keewis commented Nov 13, 2019

dcherian commented Nov 13, 2019

keewis commented Nov 13, 2019

dcherian commented Nov 13, 2019

dcherian left a comment

Choose a reason for hiding this comment

keewis commented Nov 14, 2019 • edited Loading

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

keewis commented Nov 15, 2019 • edited Loading

dcherian commented Nov 15, 2019

max-sixty commented Nov 15, 2019

shoyer commented Nov 15, 2019

dcherian commented Nov 15, 2019

keewis commented Nov 15, 2019

keewis commented Nov 13, 2019 •

edited

Loading

keewis commented Nov 14, 2019 •

edited

Loading

keewis commented Nov 15, 2019 •

edited

Loading