Add DatasetGroupBy.quantile #3527
is |
Let's do enhancements! |
that would open a new category. Should I use |
👍 |
Thanks @keewis! I left some minor comments. The transpose is surprising. It would be good to track that down to either .map or Groupby._combine.
a0d5afe to 5f9ca06
the difference is here, where in |
LGTM @keewis. Re: docstring. If you have time, can you add a couple of examples (just pick two from the tests) to illustrate the scalar vs vector behaviour of |
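The comment above is cut off, so what follows is only a guess at what such docstring examples might look like. It assumes the "scalar vs vector behaviour" refers to the quantile argument of the grouped quantile method, and it reuses the dataset from this thread. With a scalar the result has no "quantile" dimension; with a list it gains one.

```python
import xarray as xr

# Same toy data as used elsewhere in this thread.
ds = xr.Dataset(
    data_vars={
        "a": (
            ("x", "y"),
            [[1, 11, 26], [2, 12, 22], [3, 13, 23], [4, 16, 24], [5, 15, 25]],
        )
    },
    coords={"x": [1, 1, 1, 2, 2], "y": [0, 0, 1]},
)

# Scalar quantile: no "quantile" dimension is added to the result.
scalar_q = ds.groupby("x").quantile(0)

# Vector of quantiles: the result gains a "quantile" dimension of length 3.
vector_q = ds.groupby("x").quantile([0, 0.5, 1])

scalar_has_dim = "quantile" in scalar_q.a.dims
vector_has_dim = "quantile" in vector_q.a.dims
vector_len = vector_q.sizes["quantile"]
```

The sketch deliberately checks only for the presence of the "quantile" dimension, not its position, since dimension order is exactly what is under discussion here.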
sure. But as Also, what will we do re |
👍 The transpose is fine, I think. It looks intentional and must be consistent with other Dataset/DataArray operations? |
well, it was surprising to me. It also might be problematic that it matters if we first pull out a
>>> ds = xr.Dataset(
...     data_vars={
...         "a": (
...             ("x", "y"),
...             [[1, 11, 26], [2, 12, 22], [3, 13, 23], [4, 16, 24], [5, 15, 25]],
...         )
...     },
...     coords={"x": [1, 1, 1, 2, 2], "y": [0, 0, 1]},
... )
>>> a_ds = ds.groupby("y").quantile(0).a
>>> a_da = ds.a.groupby("y").quantile(0)
>>> a_ds.identical(a_da)
False
>>> a_ds.transpose().identical(a_da)
True |
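As the session above shows, identical is sensitive to dimension order. A minimal sketch of a comparison that ignores the order follows; identical_up_to_dim_order is a hypothetical helper, not part of xarray, and the arrays are made up for illustration rather than taken from the groupby result.

```python
import xarray as xr


def identical_up_to_dim_order(a: xr.DataArray, b: xr.DataArray) -> bool:
    """Hypothetical helper: compare two DataArrays, ignoring dimension order."""
    # Different sets of dimensions can never match, whatever the order.
    if set(a.dims) != set(b.dims):
        return False
    # Reorder a's dimensions to follow b's, then do the strict comparison.
    return a.transpose(*b.dims).identical(b)


a = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [0, 1, 2]},
    name="a",
)
b = a.transpose("y", "x")

strict = a.identical(b)                    # dimension order differs
relaxed = identical_up_to_dim_order(a, b)  # same data once reordered
```

A helper like this could be useful in tests, so that an expected value does not have to be transposed by hand to match the result.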
Hmm.. ok calling mean in this example passes the identical check so maybe there's something funny in quantile:
|
but that's only because |
I can also reproduce this with Edit: I'm fairly certain this is due to what I mentioned in #3527 (comment) |
Which behaviour can you reproduce? Replacing
|
Dimension reordering has come up a few times before (#1739, and others that I can't immediately find). I think we should probably have a general solution and standard behavior here. Others should weigh in (CC @shoyer), but I would vote to not worry too much about it here, and prioritize simplicity and consistency with past behavior until we decide on a more general solution. |
It was not entirely obvious to me how to generalize "restoring dimension order" to Dataset. On a Dataset, it is not entirely clear which variable dimension order should be copied from. Maybe each variable in the result should have dimension order copied from the variable with the same name in the original dataset? But then what about new variables? I agree @max-sixty, let's not worry about this too much for now. |
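To make the idea above concrete, here is a rough pure-Python sketch that works on dimension names only. restore_dim_order is a hypothetical function, and two choices in it are assumptions rather than anything decided in this thread: variables with no counterpart in the original are left untouched, and dimensions that are new in the result (such as "quantile") are placed first.

```python
from typing import Dict, Hashable, Tuple

Dims = Tuple[Hashable, ...]


def restore_dim_order(
    original: Dict[str, Dims], result: Dict[str, Dims]
) -> Dict[str, Dims]:
    """Hypothetical sketch: reorder each result variable's dims to follow the original."""
    restored = {}
    for name, dims in result.items():
        template = original.get(name)
        if template is None:
            # New variable: nothing to copy the order from, so leave it as is.
            restored[name] = dims
            continue
        # Dimensions that already existed keep their original relative order.
        known = [d for d in template if d in dims]
        # Dimensions that are new in the result go first (an assumption).
        new = [d for d in dims if d not in template]
        restored[name] = tuple(new + known)
    return restored


original = {"a": ("x", "y")}
result = {"a": ("y", "quantile", "x"), "b": ("y",)}
restored = restore_dim_order(original, result)
```

Each tuple in restored could then be passed to that variable's transpose. The open question about new variables remains: this sketch simply sidesteps it by not reordering them.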
In it goes. Great work as usual, @keewis. Thanks! |
ds = xr.Dataset(
    data_vars={
        "a": (
            ("x", "y"),
            [[1, 11, 26], [2, 12, 22], [3, 13, 23], [4, 16, 24], [5, 15, 25]],
        )
    },
    coords={"x": [1, 1, 1, 2, 2], "y": [0, 0, 1]},
).expand_dims({"z": [0, 1, 1, 2, 2]})
a_ds = ds.groupby("y").std().a
a_da = ds.a.groupby("y").std()
a_ds.identical(a_da)
which seems to be the same as
a_ds = ds.groupby("y").mean("y").a
a_da = ds.a.groupby("y").mean("y")
a_ds.identical(a_da)
My impression is that this puts the dimension grouped over to the front:
But I agree this discussion does not really belong here. So, thanks all! |
This adds DatasetGroupBy.quantile by moving DataArrayGroupBy.quantile to GroupBy as proposed in #3018 (comment).
The tests are a modified copy of the ones from #2828. What confuses me is that expected_yy in test_ds_groupby_quantile needs the transpose whereas the equivalent in test_da_groupby_quantile doesn't. Does anyone have an idea about why that is?
black . && mypy . && flake8
whats-new.rst for all changes and api.rst for new API