slice using non-index coordinates #2028
Comments
I agree this is harder than it should be. Here's one way:
In [28]: a.where(a.currency=='EUR', drop=True)
Out[28]:
<xarray.DataArray (country: 2)>
array([20., 30.])
Coordinates:
  * country   (country) <U7 'Germany' 'France'
    currency  (country) <U3 'EUR' 'EUR'
I'm not sure whether |
we've discussed this before: #934 I agree that this would be nice to support in theory. The challenge is that we would need to create (and then possibly throw away?) a pandas.Index to do the actual indexing, or use a numpy search function like Conceptually, I think it makes sense to support indexing on arbitrary variables, which is simply more expensive if an index is not already set. Dimension coordinates would not be special except that they have indexes created automatically. |
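The two routes mentioned above (a throwaway pandas.Index versus a numpy search) can be sketched roughly as follows. This is only an illustration, not how xarray implements anything internally. The sample array `a` is an assumption, chosen to be consistent with the output shown earlier in the thread, and `np.isin` is just one possible numpy search function.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical sample data, chosen to match the output shown earlier in the thread.
a = xr.DataArray(
    [10.0, 20.0, 30.0],
    dims="country",
    coords={
        "country": ["UK", "Germany", "France"],
        "currency": ("country", ["GBP", "EUR", "EUR"]),
    },
)

# Option 1: build a temporary pandas.Index over the non-index coordinate,
# use it to look up integer positions, then discard it.
tmp_index = pd.Index(a["currency"].values)
positions = tmp_index.get_indexer_for(["EUR"])  # works with repeated labels
via_pandas = a.isel(country=positions)

# Option 2: a plain numpy search over the coordinate values.
mask = np.isin(a["currency"].values, ["EUR"])
via_numpy = a.isel(country=np.flatnonzero(mask))
```

Both variants select by integer position along `country`. Note that `get_indexer_for` returns -1 for labels that are absent, so a real implementation would need to check for that case.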
This has some connections to the broader indexes refactor envisioned in #1603. |
What's the easiest way to select on multiple values? Is it really this:
In [63]: da = xr.DataArray(np.random.rand(3,2), dims=list('ab'), coords={'c':(('a',),list('xyz'))})
In [64]: da.sel(a=(np.isin(da.c, list('xy'))))
Out[64]:
<xarray.DataArray (a: 2, b: 2)>
array([[0.383989, 0.174317],
[0.698948, 0.815993]])
Coordinates:
c (a) <U1 'x' 'y'
Dimensions without coordinates: a, b |
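For comparison, a minimal sketch of two other spellings of the same multi-value selection, using `DataArray.isin` to build the mask. The array contents here are made up (deterministic values instead of random ones) so that the result is easy to check.

```python
import numpy as np
import xarray as xr

# Same layout as the example above, but with deterministic (made-up) values.
da = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    dims=("a", "b"),
    coords={"c": ("a", list("xyz"))},
)

# Boolean mask along dimension "a": True where c is one of the wanted labels.
mask = da["c"].isin(["x", "y"])

# Variant 1: mask with where() and drop the rows that are entirely masked out.
subset_where = da.where(mask, drop=True)

# Variant 2: convert the mask to integer positions and index by position.
subset_isel = da.isel(a=np.flatnonzero(mask.values))
```

The `where` variant may upcast integer data to float because it fills with NaN before dropping, while the `isel` variant keeps the original dtype.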
@maxim-lian Probably. Or you could make the We should really add |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
Still relevant |
I am a little confused about the documentation relating to this issue. It says in the documentation at http://xarray.pydata.org/en/stable/data-structures.html#coordinates Has this issue been resolved? If so, an example of how to do this would be helpful in the documentation. If not, should the documentation be corrected? |
#3925 would fix this for 1D non-dim coords. We should update the docs (ping @TomNicholas) |
@dcherian any recommendations for 2D non-dim coords? I would like to subset a dataarray based on slices for |
xoak should work here: https://xoak.readthedocs.io/en/latest/ Here's an example with ocean model output: https://pop-tools.readthedocs.io/en/latest/examples/xoak-example.html . If you can wait a while, this will all work better once #5692 is merged. |
@max-sixty, is there any update on the OP's question, or could you help me achieve the following?
# sel based on a non-dim coordinate
# (using this coordinate directly .sel(product_id=26) would result in error "no index found for coordinate product_id")
%timeit xds.sel(product=xds.product_id==26)
1.54 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# sel based on the dim itself
%timeit xds.sel(product='GN91 Glove Medium')
499 µs ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit xds.where(xds.product_id==26, drop=True)
4.17 ms ± 39 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Anyways, xarray is brilliant and made my week :) |
Hi all, wanted to ask what the status of this feature request is given all of the recent work by @benbovy on explicit indexes. |
Exciting news!! Thanks for the quick response and the huge amount of work on explicit indexes. I'll be excited and grateful to enjoy the public API once it comes into its own :) |
With the latest release, v2022.09.0, this is now possible via
a = a.set_xindex("currency")
a.sel(currency="EUR")
# <xarray.DataArray (country: 2)>
# array([20, 30])
# Coordinates:
# * country (country) <U7 'Germany' 'France'
# * currency (country) <U3 'EUR' 'EUR'
Closed in #6971 (although |
What about slices? My non-index coord is a datetime, and I need to select between two dates. |
@aberges-grd If your non-index coordinate supports it (I guess it does?), you could assign a default index to the coordinate with |
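A minimal sketch of slicing on a datetime non-index coordinate, assuming the index is assigned with `set_xindex` as in the earlier comment (xarray v2022.09.0 or later). The dimension name `step`, the dates, and the data values are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: a "time" coordinate that lives on a dimension named "step",
# so it does not get an index automatically.
times = pd.date_range("2022-01-01", periods=6, freq="D")
da = xr.DataArray(np.arange(6), dims="step", coords={"time": ("step", times)})

# Assign a default (pandas-backed) index to the coordinate.
indexed = da.set_xindex("time")

# Label-based slicing now works; both endpoints are included.
subset = indexed.sel(time=slice("2022-01-02", "2022-01-04"))
```

Slicing like this relies on the coordinate values being sorted, which is the case for the date range used here.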
Thanks @benbovy, it works well. I am curious about using set_xindex with 2-dimensional non-index coordinates. A use case could be datasets with |
@gewitterblitz there is a kdtree-based index example in #7041 that works with multi-dimensional coordinates. You could also have a look at https://xoak.readthedocs.io/en/latest/ (it doesn't use Xarray indexes - soon hopefully - so the current API is via Xarray accessors). EDIT: seeing your previous #2028 (comment), not sure how you could use slices for label selection using those indexes as I don't think the wrapped scipy / sklearn kdtree objects support range queries. Other spatial indexes may support it (e.g., there's an example in https://github.com/martinfleis/xvec of selecting points using a |
Thanks, @benbovy. Yep, the kdtree objects don't like the range based slices. xoak has worked well in the past though. I'll keep an eye on xoak-xarray integration. |
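Since the kdtree-based indexes do not handle range queries, one fallback for range-style selection on 2-dimensional coordinates is a plain boolean mask. The sketch below does not use xoak or any Xarray index; the grid, the coordinate names `lat` and `lon`, and the bounds are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical grid with 2D lat/lon coordinates on dimensions (y, x).
lon2d, lat2d = np.meshgrid(np.linspace(0.0, 4.0, 5), np.linspace(10.0, 13.0, 4))
da = xr.DataArray(
    np.arange(20.0).reshape(4, 5),
    dims=("y", "x"),
    coords={"lat": (("y", "x"), lat2d), "lon": (("y", "x"), lon2d)},
)

# Boolean mask for the bounding box 11 <= lat <= 12 and 1 <= lon <= 3.
inside = (da.lat >= 11.0) & (da.lat <= 12.0) & (da.lon >= 1.0) & (da.lon <= 3.0)

# Keep only the rows and columns that contain at least one point in the box.
box = da.where(inside, drop=True)
```

On a regular grid like this one the result is a clean rectangle. On a truly curvilinear grid the retained block can contain NaN where points fall outside the box, and the mask scans every grid point, so this is slower than an index-based lookup on large grids.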
It should be relatively straightforward to allow slicing on coordinates that are not backed by an IndexVariable, or in other words coordinates that are on a dimension with a different name, as long as they are 1-dimensional (unsure about the multidimensional case).
E.g. given this array:
This is currently not possible:
It should be interpreted as a shorthand for: