Add set_index, reset_index and reorder_levels methods #1028

Merged
merged 17 commits into from
Dec 27, 2016
6 changes: 6 additions & 0 deletions doc/api.rst
@@ -106,6 +106,9 @@ Indexing
Dataset.squeeze
Dataset.reindex
Dataset.reindex_like
Dataset.set_index
Dataset.reset_index
Dataset.reorder_levels

Computation
-----------
@@ -239,6 +242,9 @@ Indexing
DataArray.squeeze
DataArray.reindex
DataArray.reindex_like
DataArray.set_index
DataArray.reset_index
DataArray.reorder_levels

Comparisons
-----------
66 changes: 61 additions & 5 deletions doc/reshaping.rst
@@ -4,7 +4,7 @@
Reshaping and reorganizing data
###############################

These methods allow you to reorganize

.. ipython:: python
   :suppress:
@@ -95,23 +95,79 @@ always succeeds, even if the multi-index being unstacked does not contain all
possible levels. Missing levels are filled in with ``NaN`` in the resulting object:

.. ipython:: python

    stacked2 = stacked[::2]
    stacked2
    stacked2.unstack('z')

However, xarray's ``stack`` has an important difference from pandas: unlike
pandas, it does not automatically drop missing values. Compare:

.. ipython:: python

    array = xr.DataArray([[np.nan, 1], [2, 3]], dims=['x', 'y'])
    array.stack(z=('x', 'y'))
    array.to_pandas().stack()

We departed from pandas's behavior here because predictable shapes for new
array dimensions are necessary for :ref:`dask`.

.. _reshape.set_index:

Set and reset index
-------------------

Complementary to stack / unstack, xarray's ``.set_index``, ``.reset_index`` and
``.reorder_levels`` allow easy manipulation of ``DataArray`` or ``Dataset``
multi-indexes without modifying the data or its dimensions.

You can create a multi-index from several 1-dimensional variables and/or
coordinates using :py:meth:`~xarray.DataArray.set_index`:

.. ipython:: python

    da = xr.DataArray(np.random.rand(4),
                      coords={'band': ('x', ['a', 'a', 'b', 'b']),
                              'wavenumber': ('x', np.linspace(200, 400, 4))},
                      dims='x')
    da
    mda = da.set_index(x=['band', 'wavenumber'])
    mda

These coordinates can now be used for indexing, e.g.,

.. ipython:: python

    mda.sel(band='a')
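
At the pandas level, combining several 1-dimensional label arrays into one index corresponds to building a ``pandas.MultiIndex``. The sketch below is an illustration only. It assumes ``pd.MultiIndex.from_arrays`` as the construction step (the helper that xarray actually uses is not shown on this page), and it works on plain pandas objects rather than xarray ones:

```python
import numpy as np
import pandas as pd

# Two 1-D label arrays that share the same dimension.
band = ['a', 'a', 'b', 'b']
wavenumber = np.linspace(200, 400, 4)

# Combine them into a single two-level index (assumed construction step).
index = pd.MultiIndex.from_arrays([band, wavenumber],
                                  names=['band', 'wavenumber'])

# Selecting on one level: keep the entries whose 'band' level equals 'a'.
mask = index.get_level_values('band') == 'a'
selected = index[mask]
print(selected.get_level_values('wavenumber').tolist())
```

Selecting by level name is what makes ``mda.sel(band='a')`` possible once the levels are part of the index.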

Conversely, you can use :py:meth:`~xarray.DataArray.reset_index`
to extract multi-index levels as coordinates (this is mainly useful
for serialization):

.. ipython:: python

    mda.reset_index('x')

:py:meth:`~xarray.DataArray.reorder_levels` allows changing the order
of multi-index levels:

.. ipython:: python

    mda.reorder_levels(x=['wavenumber', 'band'])
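
The ``DataArray.reorder_levels`` implementation in this PR calls ``reorder_levels`` on the underlying ``pandas.MultiIndex``. As a minimal sketch of that pandas call in isolation (the label values here are made up for illustration):

```python
import pandas as pd

# A two-level index similar to the ('band', 'wavenumber') example.
index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b', 'b'], [200.0, 250.0, 300.0, 350.0]],
    names=['band', 'wavenumber'],
)

# reorder_levels permutes the levels; the entries keep their original order.
reordered = index.reorder_levels(['wavenumber', 'band'])

print(list(reordered.names))
print(reordered[0])
```

Only the order of the levels changes. The number of entries and their positions along the dimension stay the same.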

As of xarray v0.9, coordinate labels for each dimension are optional.
You can also use ``.set_index`` / ``.reset_index`` to add or remove
labels for one or several dimensions:

.. ipython:: python

    array = xr.DataArray([1, 2, 3], dims='x')
    array
    array['c'] = ('x', ['a', 'b', 'c'])
    array.set_index(x='c')
    array.set_index(x='c', inplace=True)
    array.reset_index('x', drop=True)
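
The three methods added in this PR share one ``inplace`` convention: with ``inplace=True`` they modify the object and return ``None``, otherwise they leave it untouched and return a new object. The following stand-alone sketch shows that convention with a hypothetical ``Labeled`` class and ``set_label`` method. Neither is xarray code:

```python
class Labeled:
    """Hypothetical stand-in for an object with replaceable coordinates."""

    def __init__(self, data, coords):
        self.data = data
        self.coords = dict(coords)

    def set_label(self, name, values, inplace=False):
        # Build the new coordinate mapping without touching self.
        coords = dict(self.coords)
        coords[name] = list(values)
        if inplace:
            # Mutate self and implicitly return None.
            self.coords = coords
        else:
            # Return a new object that shares the same data.
            return Labeled(self.data, coords)


obj = Labeled([1, 2, 3], {})
new = obj.set_label('x', 'abc')                   # obj is unchanged here
result = obj.set_label('x', 'abc', inplace=True)  # obj is modified, result is None
```

One consequence of this design is that the in-place form cannot be chained, since it returns ``None``.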

Shift and roll
--------------

3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -104,6 +104,9 @@ Enhancements
  as keyword arguments, e.g., ``ds.sel(time='2000-01')``
  (see :ref:`multi-level indexing`).
  By `Benoit Bovy <https://github.com/benbovy>`_.
- Added ``set_index``, ``reset_index`` and ``reorder_levels`` methods to
  easily create and manipulate (multi-)indexes (see :ref:`reshape.set_index`).
  By `Benoit Bovy <https://github.com/benbovy>`_.
- Added the ``compat`` option ``'no_conflicts'`` to ``merge``, allowing the
  combination of xarray objects with disjoint (:issue:`742`) or
  overlapping (:issue:`835`) coordinates as long as all present data agrees.
99 changes: 98 additions & 1 deletion xarray/core/dataarray.py
@@ -18,7 +18,7 @@
from .common import AbstractArray, BaseDataObject
from .coordinates import (DataArrayCoordinates, LevelCoordinatesSource,
                          Indexes)
from .dataset import Dataset, merge_indexes, split_indexes
from .pycompat import iteritems, basestring, OrderedDict, zip, range
from .variable import (as_variable, Variable, as_compatible_data,
                       IndexVariable,
@@ -846,6 +846,103 @@ def swap_dims(self, dims_dict):
        ds = self._to_temp_dataset().swap_dims(dims_dict)
        return self._from_temp_dataset(ds)

    def set_index(self, append=False, inplace=False, **indexes):
        """Set DataArray (multi-)indexes using one or more existing coordinates.

        Parameters
        ----------
        append : bool, optional
            If True, append the supplied index(es) to the existing index(es).
            Otherwise replace the existing index(es) (default).
        inplace : bool, optional
Review comment by @shoyer (Member), Oct 3, 2016:

I'm not sure we need an inplace=True option. I guess it doesn't hurt. (Just more to test)

            If True, set new index(es) in-place. Otherwise, return a new DataArray
            object.
        **indexes : {dim: index, ...}
            Keyword arguments with names matching dimensions and values given
            by (lists of) the names of existing coordinates or variables to set
            as new (multi-)index.

        Returns
        -------
        obj : DataArray
            Another dataarray, with this dataarray's data but replaced coordinates.

        See Also
        --------
        DataArray.reset_index
        """
        coords, _ = merge_indexes(indexes, self._coords, set(), append=append)
        if inplace:
            self._coords = coords
        else:
            return self._replace(coords=coords)

    def reset_index(self, dims_or_levels, drop=False, inplace=False):
        """Reset the specified index(es) or multi-index level(s).

        Parameters
        ----------
        dims_or_levels : str or list
            Name(s) of the dimension(s) and/or multi-index level(s) that will
            be reset.
        drop : bool, optional
            If True, remove the specified indexes and/or multi-index levels
            instead of extracting them as new coordinates (default: False).
        inplace : bool, optional
            If True, modify the dataarray in-place. Otherwise, return a new
            DataArray object.

        Returns
        -------
        obj : DataArray
            Another dataarray, with this dataarray's data but replaced
            coordinates.

        See Also
        --------
        DataArray.set_index
        """
        coords, _ = split_indexes(dims_or_levels, self._coords, set(),
                                  self._level_coords, drop=drop)
        if inplace:
            self._coords = coords
        else:
            return self._replace(coords=coords)

    def reorder_levels(self, inplace=False, **dim_order):
        """Rearrange index levels using input order.

        Parameters
        ----------
        inplace : bool, optional
            If True, modify the dataarray in-place. Otherwise, return a new
            DataArray object.
        **dim_order : optional
            Keyword arguments with names matching dimensions and values given
            by lists representing new level orders. Every given dimension
            must have a multi-index.

        Returns
        -------
        obj : DataArray
            Another dataarray, with this dataarray's data but replaced
            coordinates.
        """
        replace_coords = {}
        for dim, order in dim_order.items():
            coord = self._coords[dim]
            index = coord.to_index()
            if not isinstance(index, pd.MultiIndex):
                raise ValueError("coordinate %r has no MultiIndex" % dim)
            replace_coords[dim] = IndexVariable(coord.dims,
                                                index.reorder_levels(order))
        coords = self._coords.copy()
        coords.update(replace_coords)
        if inplace:
            self._coords = coords
        else:
            return self._replace(coords=coords)

    def stack(self, **dimensions):
        """
        Stack any number of existing dimensions into a single new dimension.