Add set_xindex and drop_indexes methods (#6971)
* temporary API to set custom indexes

* add the temporary index API to DataArray

* add options argument to Index.from_variables()

It allows passing options to the constructor of a custom index class (if
any).

The **options arguments of Dataset.set_xindex() are passed through.

Also add type annotations to set_xindex().

* fix mypy

* remove temporary API warning

* add the Index class in Xarray's root namespace

* improve set_xindex docstrings and add to api.rst

* remove temp comments

* special case for pandas multi-index dim coord

* add tests for set_xindex

* error message tweaks

* set_xindex with 1 coord: avoid reordering coords

* mypy fixes

* add Dataset and DataArray drop_indexes methods

* improve assert_no_index_corrupted error msg

* drop_indexes: add tests

* add drop_indexes to api.rst

* improve docstrings of legacy methods

* add what's new entry

* try using correct typing w/o mypy complaining

* make index_cls arg optional

Try setting a pandas (multi-)index by default.

* docstrings fixes and tweaks

* make Index.from_variables options arg keyword only

* improve set_xindex invalid coordinates error msg

* add xarray.indexes namespace

* type tweaks

Co-authored-by: Keewis <keewis@posteo.de>
benbovy and keewis authored Sep 28, 2022
1 parent 2f0f95a commit e678a1d
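
The commit message says that the `**options` given to `set_xindex()` are passed through to `Index.from_variables()` as a keyword-only `options` argument. A minimal sketch of what that could look like from the user's side follows. `ScaledIndex` and its `scale` option are hypothetical names invented for illustration, and the class implements only `from_variables`, so it shows the pass-through and nothing else (no selection or alignment support).

```python
import numpy as np
import xarray as xr
from xarray.indexes import Index


class ScaledIndex(Index):
    """Hypothetical custom index that only records the options it was given."""

    def __init__(self, coord_names, scale):
        self.coord_names = coord_names
        self.scale = scale

    @classmethod
    def from_variables(cls, variables, *, options):
        # `options` holds the keyword arguments forwarded by set_xindex().
        return cls(tuple(variables), scale=options.get("scale", 1.0))


ds = xr.Dataset(coords={"label": ("x", np.array([10.0, 20.0, 30.0]))})

# `scale=2.0` is not interpreted by set_xindex(); it reaches from_variables().
indexed = ds.set_xindex("label", ScaledIndex, scale=2.0)
custom = indexed.xindexes["label"]
```

Keeping `options` keyword-only means a custom index class cannot confuse it with the positional `variables` mapping.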
Showing 9 changed files with 415 additions and 18 deletions.
10 changes: 10 additions & 0 deletions doc/api.rst
@@ -107,6 +107,7 @@ Dataset contents
Dataset.swap_dims
Dataset.expand_dims
Dataset.drop_vars
Dataset.drop_indexes
Dataset.drop_duplicates
Dataset.drop_dims
Dataset.set_coords
@@ -146,6 +147,7 @@ Indexing
Dataset.reindex_like
Dataset.set_index
Dataset.reset_index
Dataset.set_xindex
Dataset.reorder_levels
Dataset.query

@@ -298,6 +300,7 @@ DataArray contents
DataArray.swap_dims
DataArray.expand_dims
DataArray.drop_vars
DataArray.drop_indexes
DataArray.drop_duplicates
DataArray.reset_coords
DataArray.copy
@@ -330,6 +333,7 @@ Indexing
DataArray.reindex_like
DataArray.set_index
DataArray.reset_index
DataArray.set_xindex
DataArray.reorder_levels
DataArray.query

@@ -1080,13 +1084,19 @@ Advanced API
Variable
IndexVariable
as_variable
indexes.Index
Context
register_dataset_accessor
register_dataarray_accessor
Dataset.set_close
backends.BackendArray
backends.BackendEntrypoint

Default, pandas-backed indexes built into Xarray:

indexes.PandasIndex
indexes.PandasMultiIndex

These backends provide a low-level interface for lazily loading data from
external file-formats or protocols, and can be manually invoked to create
arguments for the ``load_store`` and ``dump_to_store`` Dataset methods:
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -21,6 +21,11 @@ v2022.07.0 (unreleased)

New Features
~~~~~~~~~~~~

- Add :py:meth:`Dataset.set_xindex` and :py:meth:`Dataset.drop_indexes` and
their DataArray counterparts for setting and dropping pandas or custom indexes
given a set of arbitrary coordinates. (:pull:`6971`)
By `Benoît Bovy <https://github.com/benbovy>`_ and `Justus Magin <https://github.com/keewis>`_.
- Enable taking the mean of dask-backed :py:class:`cftime.datetime` arrays
(:pull:`6556`, :pull:`6940`). By `Deepak Cherian
<https://github.com/dcherian>`_ and `Spencer Clark
68 changes: 68 additions & 0 deletions xarray/core/dataarray.py
@@ -2349,6 +2349,11 @@ def set_index(
"""Set DataArray (multi-)indexes using one or more existing
coordinates.
This legacy method is limited to pandas (multi-)indexes and
1-dimensional "dimension" coordinates. See
:py:meth:`~DataArray.set_xindex` for setting a pandas or a custom
Xarray-compatible index from one or more arbitrary coordinates.
Parameters
----------
indexes : {dim: index, ...}
@@ -2393,6 +2398,7 @@ def set_index(
See Also
--------
DataArray.reset_index
DataArray.set_xindex
"""
ds = self._to_temp_dataset().set_index(indexes, append=append, **indexes_kwargs)
return self._from_temp_dataset(ds)
@@ -2406,6 +2412,12 @@ def reset_index(
) -> DataArray:
"""Reset the specified index(es) or multi-index level(s).
This legacy method is specific to pandas (multi-)indexes and
1-dimensional "dimension" coordinates. See the more generic
:py:meth:`~DataArray.drop_indexes` and :py:meth:`~DataArray.set_xindex`
methods to drop and set, respectively, pandas or custom indexes for
arbitrary coordinates.
Parameters
----------
dims_or_levels : Hashable or sequence of Hashable
@@ -2424,10 +2436,41 @@ def reset_index(
See Also
--------
DataArray.set_index
DataArray.set_xindex
DataArray.drop_indexes
"""
ds = self._to_temp_dataset().reset_index(dims_or_levels, drop=drop)
return self._from_temp_dataset(ds)

    def set_xindex(
        self: T_DataArray,
        coord_names: str | Sequence[Hashable],
        index_cls: type[Index] | None = None,
        **options,
    ) -> T_DataArray:
        """Set a new, Xarray-compatible index from one or more existing
        coordinate(s).

        Parameters
        ----------
        coord_names : str or list
            Name(s) of the coordinate(s) used to build the index.
            If several names are given, their order matters.
        index_cls : subclass of :class:`~xarray.indexes.Index`, optional
            The type of index to create. By default, try setting
            a pandas (multi-)index from the supplied coordinates.
        **options
            Options passed to the index constructor.

        Returns
        -------
        obj : DataArray
            Another dataarray, with this dataarray's data and with a new index.

        """
        ds = self._to_temp_dataset().set_xindex(coord_names, index_cls, **options)
        return self._from_temp_dataset(ds)
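
As a usage illustration for the method above, here is a small sketch of `DataArray.set_xindex` with the default `index_cls=None`. The array and the coordinate names `label` and `group` are made up for the example. It assumes an Xarray release that includes this commit.

```python
import numpy as np
import xarray as xr
from xarray.indexes import PandasIndex, PandasMultiIndex

da = xr.DataArray(
    np.arange(3),
    dims="x",
    coords={
        "label": ("x", ["a", "b", "c"]),
        "group": ("x", [0, 0, 1]),
    },
)

# Single coordinate: a PandasIndex is created by default, which makes
# label-based selection on a non-dimension coordinate possible.
by_label = da.set_xindex("label")
picked = by_label.sel(label="b")

# Several coordinates along the same dimension: a PandasMultiIndex by default.
by_both = da.set_xindex(["group", "label"])
```

The original `da` is left untouched; each call returns a new object carrying the index.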

def reorder_levels(
self: T_DataArray,
dim_order: Mapping[Any, Sequence[int | Hashable]] | None = None,
@@ -2738,6 +2781,31 @@ def drop_vars(
ds = self._to_temp_dataset().drop_vars(names, errors=errors)
return self._from_temp_dataset(ds)

    def drop_indexes(
        self: T_DataArray,
        coord_names: Hashable | Iterable[Hashable],
        *,
        errors: ErrorOptions = "raise",
    ) -> T_DataArray:
        """Drop the indexes assigned to the given coordinates.

        Parameters
        ----------
        coord_names : hashable or iterable of hashable
            Name(s) of the coordinate(s) for which to drop the index.
        errors : {"raise", "ignore"}, default: "raise"
            If 'raise', raise a ValueError if any of the coordinates
            passed have no index or are not in the dataarray.
            If 'ignore', no error is raised.

        Returns
        -------
        dropped : DataArray
            A new dataarray with dropped indexes.
        """
        ds = self._to_temp_dataset().drop_indexes(coord_names, errors=errors)
        return self._from_temp_dataset(ds)
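
The behaviour described in the `drop_indexes` docstring can be sketched as follows. The data and the coordinate name `label` are illustrative only, and the example assumes an Xarray release that includes this commit.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(3),
    dims="x",
    coords={"label": ("x", ["a", "b", "c"])},
).set_xindex("label")

# Drop the index; the coordinate itself stays on the object.
dropped = da.drop_indexes("label")

# With the default errors="raise", a coordinate without an index is an error.
try:
    dropped.drop_indexes("label")
    raised = False
except ValueError:
    raised = True

# With errors="ignore", the same call returns without raising.
still_unindexed = dropped.drop_indexes("label", errors="ignore")
```

Unlike `drop_vars`, this removes only the index and leaves the coordinate variable in place.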

def drop(
self: T_DataArray,
labels: Mapping[Any, Any] | None = None,