Support multiple dimensions in DataArray.argmin() and DataArray.argmax() methods #3936
Conversation
These return dicts of the indices of the minimum or maximum of a DataArray over several dimensions.
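For concreteness, here is a rough sketch of the intended usage (the array shape and dimension names are illustrative, not taken from the PR's tests):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(2, 3, 4), dims=["x", "y", "z"])

# A single dimension behaves as before and returns a DataArray of indices.
print(da.argmin(dim="y"))

# Several dimensions return a dict mapping each dimension name to a
# DataArray of indices, which can be passed straight to isel().
indices = da.argmin(dim=["y", "z"])
print(indices.keys())    # dict_keys(['y', 'z'])
print(da.isel(indices))  # the minima over ("y", "z"), one value per "x"
```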
This looks really comprehensive, thank you! Before doing a really careful review here, I'd like to try to work out the full API design we want. I'll write out some of my thoughts here, but your thoughts would also be very welcome! Here's my summary of the current situation:
This PR implements the multidimensional equivalent of argmin/argmax. My first concern is about the name: it isn't obvious to me whether … Another option would be to overload argmin/argmax themselves.
I think I like this last option best, but I would be curious what others think! @pydata/xarray any thoughts on this?
@shoyer I think your last option sounds good. Questions: …
Maybe worth noting, at the moment if you try to call …
+1 for overloading argmin/argmax. I also really like how neat this resultant property is: da.isel(da.argmin(list_of_dim)) == da.min(list_of_dim). We could even use a hypothesis test to check it...
Although it's breaking and would require a deprecation cycle, I think this is what we should aim for.
Yes let's take the time to make that clearer for users - this will be a commonly-used function.
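A minimal sketch of how that round-trip property could be checked in a test (plain pytest style rather than an actual hypothesis strategy; the test name and data are made up):

```python
import numpy as np
import xarray as xr

def test_argmin_isel_roundtrip():
    # Selecting at the argmin indices should recover the minimum taken
    # over the same dimensions.
    da = xr.DataArray(np.random.rand(3, 4, 5), dims=["x", "y", "z"])
    indices = da.argmin(dim=["y", "z"])
    xr.testing.assert_allclose(da.isel(indices), da.min(dim=["y", "z"]))
```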
+1 for making argmin/argmax work that way. According to the current docstring it should already work that way: "Reduce this DataArray's data by applying argmin along some dimension(s). Returns: New DataArray/Dataset object with …" But this behaviour is broken currently (it works only for one given dim). My main concern for changing the API as suggested above is: how should we discern (at least for …) …
That's a good question @kmuehlbauer, and the distinction probably needs to be clearer in the docs in general.
By this do you mean find the minimum as if the array were first (partially or totally) flattened along the given dims somehow? I'm not sure we provide that kind of behaviour anywhere in the current API.
@TomNicholas Probably I was a bit confused by the current docstring. I think I understand now and there should be no problem at all.
@johnomotani FYI: For #3871 (merged) there is #3922 (not yet merged) to fix dask-handling.
When argmin or argmax are called with a sequence for 'dim', they now return a dict with the indices for each dimension in dim.
If a single dim is passed to Dataset.argmin() or Dataset.argmax(), then pass through to _argmin_base or _argmax_base. If a sequence is passed for dim, raise an exception, because the result for each DataArray would be a dict, which cannot be stored in a Dataset.
The basic numpy-style argmin() and argmax() methods were renamed when adding support for handling multiple dimensions in DataArray.argmin() and DataArray.argmax(). Variable.argmin() and Variable.argmax() are therefore renamed as Variable._argmin_base() and Variable._argmax_base().
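A sketch of the Dataset behaviour those commit messages describe (the concrete exception type is assumed, so it is caught broadly here):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": (("x", "y"), np.random.rand(3, 4))})

# A single dim passes through to the per-variable reduction as before.
print(ds.argmin(dim="x"))

# A sequence of dims would produce a dict per variable, which a Dataset
# cannot hold, so it is rejected instead.
try:
    ds.argmin(dim=["x", "y"])
except Exception as err:
    print(type(err).__name__, err)
```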
I've updated so the new functionality is provided by argmin() and argmax().
The rename to _argmin_base()/_argmax_base() …
These test failures seem to have uncovered a larger issue: overriding a method injected by inject_all_ops_and_reduce_methods() …
Also, feel free to rewrite to avoid using inject_all_ops_and_reduce_methods() for argmin/argmax at all. Method injection is pretty hacky, and generally not worthwhile, e.g., see this note about removing it.
If a method (such as 'argmin') has been explicitly defined on a class (so that hasattr(cls, "argmin")==True), then do not inject that method, as it would override the explicitly defined one. Instead inject a private method, prefixed by "_injected_" (such as '_injected_argmin'), so that the injected method is available to the explicitly defined one. Do not perform the hasattr check on binary ops, because this breaks some operations (e.g. addition between DataArray and int in test_dask.py).
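A simplified sketch of the injection rule that commit message describes (illustrative only, not the actual ops.py code; the binary-ops exception mentioned above is omitted):

```python
def inject_reduce_method(cls, name, func):
    # If the class defines the method explicitly, keep that definition
    # and expose the generated one under a private name so the explicit
    # method can still delegate to it.
    if hasattr(cls, name):
        setattr(cls, "_injected_" + name, func)
    else:
        setattr(cls, name, func)
```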
Now not needed because of change to injection in ops.py.
Merge conflicts fixed, this PR should be ready to review/merge.
This is beautiful work!
I am so excited about getting this into xarray. We are now one short step away from getting a canonical answer to the most frequently asked question about xarray on StackOverflow! https://stackoverflow.com/questions/40179593/how-to-get-the-coordinates-of-the-maximum-in-xarray
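For reference, a sketch of how that StackOverflow question (coordinates of the maximum) could now be answered; the data and coordinate names here are made up:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.rand(3, 4),
    coords={"lat": [10.0, 20.0, 30.0], "lon": [0.0, 90.0, 180.0, 270.0]},
    dims=["lat", "lon"],
)

idx = da.argmax(dim=["lat", "lon"])  # {"lat": 0-d index, "lon": 0-d index}
point = da.isel(idx)                 # 0-d DataArray located at the maximum
print(point.coords["lat"].item(), point.coords["lon"].item())
```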
xarray/core/dataarray.py (outdated)
def argmin(
    self,
    dim: Union[Hashable, Sequence[Hashable]] = None,
    axis: Union[int, None] = None,
    keep_attrs: bool = None,
    skipna: bool = None,
) -> Union["DataArray", Dict[Hashable, "DataArray"]]:
In theory, it would be nice to use either TypeVar or @overload so tools like mypy can find out the type of the return value from argmin() based on the type of dim. But definitely don't worry about that now, we can save that for a follow-up :)
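A hypothetical sketch of the @overload idea on a stub class (simplified; in practice str is both Hashable and a Sequence, so the real overloads would need more care):

```python
from typing import Dict, Hashable, Sequence, overload


class DataArrayStub:
    """Stand-in class used only to illustrate the typing idea."""

    @overload
    def argmin(self, dim: Hashable = ...) -> "DataArrayStub": ...

    @overload
    def argmin(
        self, dim: Sequence[Hashable] = ...
    ) -> Dict[Hashable, "DataArrayStub"]: ...

    def argmin(self, dim=None):
        # The real logic lives in xarray; this stub only carries the
        # overloaded signatures for mypy.
        raise NotImplementedError
```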
I'm going to wait a little while before merging in case anyone else has comments, but otherwise will merge in a day or two (definitely before the next release).
Thanks @johnomotani This is great!
I have some minor comments below that should be easy to fix.
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Pass an explicit axis or dim argument instead to avoid the warning.
Prefer to pass reduce_dims=None when possible, including for variables with only one dimension. Avoids an error if an 'axis' keyword was passed.
I noticed a few issues while debugging: …
I think this is a bug.
Well, we'd need to somehow be able to use … I think no further changes to the …
Some missing values should be OK. In your example, though, … So there are no values in the array slice that argmin should be applied over, which is an error. I guess we could add some special handling for this (not sure what though, because we can't set a variable with type …).
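A sketch of the situation being discussed (illustrative array; the exact exception comes from the underlying nan-aware reduction, so it is caught broadly here):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([[np.nan, np.nan], [1.0, 2.0]], dims=["x", "y"])

# With skipna in effect, the slice at x=0 has no valid values left, so
# there is no index argmin could return for it, and an integer index
# array has no way to represent "missing".
try:
    print(da.argmin(dim="y"))
except Exception as err:
    print(type(err).__name__, err)
```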
Thanks @keewis! I think we've addressed all the review comments now.
Co-authored-by: keewis <keewis@users.noreply.github.com>
Thanks @johnomotani. This is a significant contribution.
Thanks @dcherian 😄 You're very welcome!
* upstream/master: (21 commits)
  fix typo in error message in plot.py (pydata#4188)
  Support multiple dimensions in DataArray.argmin() and DataArray.argmax() methods (pydata#3936)
  Show data by default in HTML repr for DataArray (pydata#4182)
  Blackdoc (pydata#4177)
  Add CONTRIBUTING.md for the benefit of GitHub
  Correct dask handling for 1D idxmax/min on ND data (pydata#4135)
  use assert_allclose in the aggregation-with-units tests (pydata#4174)
  Remove old auto combine (pydata#3926)
  Fix 4009 (pydata#4173)
  Limit length of dataarray reprs (pydata#3905)
  Remove <pre> from nested HTML repr (pydata#4171)
  Proposal for better error message about in-place operation (pydata#3976)
  use builtin python types instead of the numpy alias (pydata#4170)
  Revise pull request template (pydata#4039)
  pint support for Dataset (pydata#3975)
  drop eccodes in docs (pydata#4162)
  Update issue templates inspired/based on dask (pydata#4154)
  Fix failing upstream-dev build & remove docs build (pydata#4160)
  Improve typehints of xr.Dataset.__getitem__ (pydata#4144)
  provide a error summary for assert_allclose (pydata#3847)
  ...
Inspired by @fujiisoup's work in #1469. With #3871, replaces #1469. Provides a simpler solution to #3160.
Implemented so that …
- Passes isort -rc . && black . && mypy . && flake8
- whats-new.rst for all changes and api.rst for new API