Add set_index, reset_index and reorder_levels methods #1028

Merged
merged 17 commits into from
Dec 27, 2016
6 changes: 6 additions & 0 deletions doc/api.rst
Expand Up @@ -102,6 +102,9 @@ Indexing
Dataset.squeeze
Dataset.reindex
Dataset.reindex_like
Dataset.set_index
Dataset.reset_index
Dataset.reorder_levels

Computation
-----------
Expand Down Expand Up @@ -234,6 +237,9 @@ Indexing
DataArray.squeeze
DataArray.reindex
DataArray.reindex_like
DataArray.set_index
DataArray.reset_index
DataArray.reorder_levels

Comparisons
-----------
Expand Down
42 changes: 41 additions & 1 deletion doc/indexing.rst
Expand Up @@ -478,6 +478,47 @@ Both ``reindex_like`` and ``align`` work interchangeably between
# this is a no-op, because there are no shared dimension names
ds.reindex_like(other)

.. _multi-index handling:

Multi-index handling
Member

These docs are great, but I wouldn't call them "indexing methods" exactly. Maybe move this section to Reshaping and reorganizing data?

Member Author

Makes sense!

--------------------

Mirroring pandas, xarray's ``set_index``, ``reset_index`` and
``reorder_levels`` allow easy manipulation of ``DataArray`` or ``Dataset``
multi-indexes without modifying the data.

You can create a multi-index from several 1-dimensional variables and/or
coordinates using ``set_index``:

.. ipython:: python

    da = xr.DataArray(np.random.rand(4),
                      coords={'band': ('x', ['a', 'a', 'b', 'b']),
                              'wavenumber': ('x', np.linspace(200, 400, 4))},
                      dims='x')
    da
    mda = da.set_index(x=['band', 'wavenumber'])
    mda

These coordinates can now be used for indexing, e.g.,

.. ipython:: python

    mda.sel(band='a')
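
Because these methods mirror pandas, the same idea can be sketched with a plain ``pandas.MultiIndex``. This is only an illustrative analogy built from pandas and NumPy calls, not the code added in this PR; the variable names and sample values are made up for the example.

```python
import numpy as np
import pandas as pd

# Two 1-dimensional "coordinates" along the same axis, as in the example above.
band = ['a', 'a', 'b', 'b']
wavenumber = np.linspace(200, 400, 4)

# Combine them into one two-level index (the pandas analogue of set_index).
index = pd.MultiIndex.from_arrays([band, wavenumber],
                                  names=['band', 'wavenumber'])
series = pd.Series(np.arange(4.0), index=index)

# Select on a single level (roughly what mda.sel(band='a') does).
subset = series.xs('a', level='band')
print(subset)
```

Selecting on one level drops that level from the result, so ``subset`` is indexed by ``wavenumber`` alone.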

Conversely, you can use ``reset_index`` to extract multi-index levels as
coordinates (this is mainly useful for serialization):

.. ipython:: python

    mda.reset_index('x')
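
For comparison, a rough pandas counterpart of extracting index levels is ``Series.reset_index``, which turns the levels into ordinary columns. The sketch below assumes a made-up series named ``value``; it illustrates the concept and is not the xarray implementation.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b', 'b'], np.linspace(200, 400, 4)],
    names=['band', 'wavenumber'])
series = pd.Series(np.arange(4.0), index=index, name='value')

# Move both levels out of the index and into ordinary columns.
flat = series.reset_index()
print(flat.columns.tolist())
```

The flat layout no longer depends on a multi-index, which is why this kind of operation tends to help before writing data to formats that cannot store one.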

``reorder_levels`` allows changing the order of multi-index levels:

.. ipython:: python

    mda.reorder_levels(x=['wavenumber', 'band'])
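
The ``DataArray.reorder_levels`` method in this PR delegates to ``pandas.MultiIndex.reorder_levels``, so its effect can be previewed on a bare index. The index below is a made-up example that reuses the level names from this section.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b', 'b'], np.linspace(200, 400, 4)],
    names=['band', 'wavenumber'])

# Swap the two levels; the entries themselves are unchanged, only their
# position within each tuple differs.
reordered = index.reorder_levels(['wavenumber', 'band'])
print(list(reordered.names))
```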

Underlying Indexes
------------------

Expand All @@ -490,4 +531,3 @@ through the :py:attr:`~xarray.DataArray.indexes` attribute.
arr
arr.indexes
arr.indexes['time']

4 changes: 4 additions & 0 deletions doc/whats-new.rst
Expand Up @@ -62,6 +62,10 @@ Enhancements
(see :ref:`multi-level indexing`).
By `Benoit Bovy <https://github.com/benbovy>`_.

- Added ``set_index``, ``reset_index`` and ``reorder_levels`` methods to
easily create and manipulate multi-indexes (see :ref:`multi-index handling`).
By `Benoit Bovy <https://github.com/benbovy>`_.

- Added the ``compat`` option ``'no_conflicts'`` to ``merge``, allowing the
combination of xarray objects with disjoint (:issue:`742`) or
overlapping (:issue:`835`) coordinates as long as all present data agrees.
Expand Down
104 changes: 103 additions & 1 deletion xarray/core/dataarray.py
Expand Up @@ -15,7 +15,7 @@
from .common import AbstractArray, BaseDataObject
from .coordinates import (DataArrayCoordinates, LevelCoordinates,
Indexes)
from .dataset import Dataset
from .dataset import Dataset, merge_indexes, split_indexes
from .pycompat import iteritems, basestring, OrderedDict, zip
from .variable import (as_variable, Variable, as_compatible_data, IndexVariable,
default_index_coordinate,
Expand Down Expand Up @@ -821,6 +821,108 @@ def swap_dims(self, dims_dict):
ds = self._to_temp_dataset().swap_dims(dims_dict)
return self._from_temp_dataset(ds)

    def set_index(self, append=False, inplace=False, **indexes):
        """Set DataArray (multi-)indexes using one or more existing coordinates.

        Parameters
        ----------
        append : bool, optional
            If True, append the supplied index(es) to the existing index(es).
            Otherwise replace the existing index(es) (default).
        inplace : bool, optional
Member

@shoyer shoyer Oct 3, 2016

I'm not sure we need an inplace=True option. I guess it doesn't hurt. (Just more to test)

            If True, set new index(es) in-place. Otherwise, return a new DataArray
            object.
        **indexes : {dim: index, ...}
            Keyword arguments with names matching dimensions and values given
            by (lists of) the names of existing coordinates or variables to set
            as new (multi-)index.

        Returns
        -------
        reindexed : DataArray
Member

wrong name -- should not be reindexed

            Another dataarray, with this dataarray's data but replaced coordinates.

        See Also
        --------
        DataArray.reset_index
        """
        coords, _ = merge_indexes(indexes, self._coords, set(), append=append)
        if inplace:
            self._coords = coords
        else:
            return self._replace(coords=coords)
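
The ``inplace`` branch discussed in the review comment means the method returns ``None`` when it mutates the object and returns a new object otherwise. A minimal sketch of that pattern, using a hypothetical ``Labeled`` class and ``with_coords`` method that are not part of xarray:

```python
class Labeled(object):
    """Hypothetical stand-in for an object holding a dict of coordinates."""

    def __init__(self, coords):
        self._coords = dict(coords)

    def with_coords(self, inplace=False, **new_coords):
        coords = dict(self._coords)
        coords.update(new_coords)
        if inplace:
            # Mutate this object; like the methods in this diff, return None.
            self._coords = coords
        else:
            # Leave this object untouched and hand back a modified copy.
            return Labeled(coords)


obj = Labeled({'x': [0, 1]})

# Default behaviour: the original is unchanged and a new object is returned.
copy = obj.with_coords(y=[2, 3])
untouched = 'y' not in obj._coords

# In-place behaviour: the original is modified and nothing is returned.
result = obj.with_coords(inplace=True, y=[2, 3])
```

Both code paths need their own tests, which is the extra cost the reviewer points out.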

    def reset_index(self, dim, levels=None, drop=False, inplace=False):
        """Extract index(es) as new coordinates.

        Parameters
        ----------
        dim : str or list
            Name(s) of the dimension(s) for which to extract and reset
            the index.
        levels : list or None, optional
            If None (default) and if `dim` has a multi-index, extract all levels
            as new coordinates. Otherwise extract only the given list of level
            names. If more than one dimension is given in `dim`, `levels` should
            be a list of the same length as `dim` (or simply None to extract
            all indexes/levels from all given dimensions).
        drop : bool, optional
            If True, remove the specified levels instead of extracting them as
            new coordinates (default: False).
        inplace : bool, optional
            If True, modify the dataarray in-place. Otherwise, return a new
            DataArray object.

        Returns
        -------
        reindexed: DataArray
            Another dataarray, with this dataarray's data but replaced
            coordinates.

        See Also
        --------
        DataArray.set_index
        """
        coords, _ = split_indexes(dim, levels, self._coords, set(), drop=drop)
        if inplace:
            self._coords = coords
        else:
            return self._replace(coords=coords)

    def reorder_levels(self, inplace=False, **dim_order):
        """Rearrange index levels using input order.

        Parameters
        ----------
        inplace : bool, optional
            If True, modify the dataarray in-place. Otherwise, return a new
            DataArray object.
        **dim_order : optional
            Keyword arguments with names matching dimensions and values given
            by lists representing new level orders. Every given dimension
            must have a multi-index.

        Returns
        -------
        reindexed: DataArray
            Another dataarray, with this dataarray's data but replaced
            coordinates.
        """
        replace_coords = {}
        for dim, order in dim_order.items():
            coord = self._coords[dim]
            index = coord.to_index()
            if not isinstance(index, pd.MultiIndex):
                raise ValueError("coordinate %r has no MultiIndex" % dim)
            replace_coords[dim] = IndexVariable(coord.dims,
                                                index.reorder_levels(order))
        coords = self._coords.copy()
        coords.update(replace_coords)
        if inplace:
            self._coords = coords
        else:
            return self._replace(coords=coords)

def stack(self, **dimensions):
"""
Stack any number of existing dimensions into a single new dimension.
Expand Down