pydata · shoyer · Dec 15, 2016 · Nov 5, 2016 · Nov 12, 2016 · Nov 12, 2016
diff --git a/doc/api.rst b/doc/api.rst
@@ -47,6 +47,7 @@ Attributes
    Dataset.coords
    Dataset.attrs
    Dataset.indexes
+   Dataset.get_index
 
 Dictionary interface
 --------------------
@@ -196,6 +197,7 @@ Attributes
    DataArray.attrs
    DataArray.encoding
    DataArray.indexes
+   DataArray.get_index
 
 **ndarray attributes**:
 :py:attr:`~DataArray.ndim`

diff --git a/doc/computation.rst b/doc/computation.rst
@@ -196,7 +196,9 @@ This means, for example, that you always subtract an array from its transpose:
-You can explicitly broadcast xaray data structures by using the
+You can explicitly broadcast xarray data structures by using the
 :py:func:`~xarray.broadcast` function:
 
-    a2, b2 = xr.broadcast(a, b2)
+.. ipython:: python
+
+    a2, b2 = xr.broadcast(a, b)
     a2
     b2
 
@@ -215,15 +217,18 @@ operations. The default result of a binary operation is by the *intersection*
 
 .. ipython:: python
 
-    arr + arr[:1]
+    arr = xr.DataArray(np.arange(3), [('x', range(3))])
+    arr + arr[:-1]
 
-If the result would be empty, an error is raised instead:
+If coordinate values for a dimension are missing on either argument, all
+matching dimensions must have the same size:
 
 .. ipython::
 
     @verbatim
-    In [1]: arr[:2] + arr[2:]
-    ValueError: no overlapping labels for some dimensions: ['x']
+    In [1]: arr + xr.DataArray([1, 2], dims='x')
+    ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension size(s) {2} than the size of the aligned dimension labels: 3
+
 
 However, one can explicitly change this default automatic alignment type ("inner")
 via :py:func:`~xarray.set_options()` in context manager:

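A minimal pure-Python sketch of the alignment rule documented in the hunk above, useful for checking the two error cases by hand. It models a single dimension: each argument is a hypothetical `(labels, size)` pair, with `labels=None` meaning that argument has no coordinate labels along the dimension. The helper name `aligned_size` is an assumption for illustration; this is not xarray's implementation, which works on `pandas.Index` objects.

```python
# Hypothetical helper: a simplified model of inner-join alignment along one
# dimension. Not xarray's implementation.
def aligned_size(dim, *args):
    """Return (labels, size) for dimension ``dim`` after an inner join.

    Each element of ``args`` is a ``(labels, size)`` pair, where ``labels`` is
    a list of coordinate labels or ``None`` if there are no labels.
    """
    labelled = [labels for labels, _ in args if labels is not None]
    unlabelled_sizes = {size for labels, size in args if labels is None}

    if labelled:
        # Inner join: keep labels found in every labelled argument,
        # preserving the order of the first one.
        joined = [x for x in labelled[0]
                  if all(x in other for other in labelled[1:])]
        # Unlabelled arguments must match the size of the joined labels.
        if unlabelled_sizes and unlabelled_sizes != {len(joined)}:
            raise ValueError(
                f"arguments without labels along dimension {dim!r} cannot be "
                f"aligned because they have different dimension size(s) "
                f"{unlabelled_sizes} than the size of the aligned dimension "
                f"labels: {len(joined)}")
        return joined, len(joined)

    # No labels anywhere: every argument must already have the same size.
    if len(unlabelled_sizes) > 1:
        raise ValueError(
            f"arguments without labels along dimension {dim!r} cannot be "
            f"aligned because they have different dimension sizes: "
            f"{unlabelled_sizes}")
    return None, next(iter(unlabelled_sizes))


print(aligned_size('x', ([0, 1, 2], 3), ([0, 1], 2)))  # ([0, 1], 2)
```

The labelled case mirrors `arr + arr[:-1]`; calling `aligned_size('x', ([0, 1, 2], 3), (None, 2))` mirrors the `arr + xr.DataArray([1, 2], dims='x')` error.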
diff --git a/doc/data-structures.rst b/doc/data-structures.rst
@@ -67,18 +67,33 @@ in with default values:
 
     xr.DataArray(data)
 
-As you can see, dimensions and coordinate arrays corresponding to each
-dimension are always present. This behavior is similar to pandas, which fills
-in index values in the same way.
+As you can see, dimension names are always present in the xarray data model: if
+you do not provide them, defaults of the form ``dim_N`` will be created.
+
+.. note::
+
+  Prior to xarray v0.9, coordinates corresponding to dimensions were *also*
+  always present in xarray: xarray would create default coordinates of the form
+  ``range(dim_size)`` if coordinates were not supplied explicitly. This is no
+  longer the case.
 
 Coordinates can take the following forms:
 
-- A list of ``(dim, ticks[, attrs])`` pairs with length equal to the number of dimensions
-- A dictionary of ``{coord_name: coord}`` where the values are each a scalar value,
-  a 1D array or a tuple. Tuples are be in the same form as the above, and
-  multiple dimensions can be supplied with the form  ``(dims, data[, attrs])``.
-  Supplying as a tuple allows other coordinates than those corresponding to
-  dimensions (more on these later).
+- A list of values with length equal to the number of dimensions, providing
+  coordinate labels for each dimension. Each value must be of one of the
+  following forms:
+
+  * A :py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable`
+  * A tuple of the form ``(dims, data[, attrs])``, which is converted into
+    arguments for :py:class:`~xarray.Variable`
+  * A pandas object or scalar value, which is converted into a ``DataArray``
+  * A 1D array or list, which is interpreted as values for a one-dimensional
+    coordinate variable along the dimension that shares its name
+
+- A dictionary of ``{coord_name: coord}`` where values take the same forms as
+  the list items. Supplying coordinates as a dictionary allows coordinates
+  other than those corresponding to dimensions (more on these later). If you supply
+  ``coords`` as a dictionary, you must explicitly provide ``dims``.
 
 As a list of tuples:
 
@@ -128,7 +143,7 @@ Let's take a look at the important properties on our array:
     foo.attrs
     print(foo.name)
 
-You can even modify ``values`` inplace:
+You can modify ``values`` in place:
 
 .. ipython:: python
 
@@ -228,15 +243,19 @@ Creating a Dataset
 To make an :py:class:`~xarray.Dataset` from scratch, supply dictionaries for any
 variables (``data_vars``), coordinates (``coords``) and attributes (``attrs``).
 
-``data_vars`` are supplied as a dictionary with each key as the name of the variable and each
-value as one of:
+- ``data_vars`` should be a dictionary with each key as the name of the variable and each
+  value as one of:
 
-- A :py:class:`~xarray.DataArray`
-- A tuple of the form ``(dims, data[, attrs])``
-- A pandas object
+  * A :py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable`
+  * A tuple of the form ``(dims, data[, attrs])``, which is converted into
+    arguments for :py:class:`~xarray.Variable`
+  * A pandas object, which is converted into a ``DataArray``
+  * A 1D array or list, which is interpreted as values for a one-dimensional
+    coordinate variable along the dimension that shares its name
+
+- ``coords`` should be a dictionary of the same form as ``data_vars``.
 
-``coords`` are supplied as dictionary of ``{coord_name: coord}`` where the values are scalar values,
-arrays or tuples in the form of ``(dims, data[, attrs])``.
+- ``attrs`` should be a dictionary.
 
 Let's create some fake data for the example we show above:
 
@@ -257,10 +276,6 @@ Let's create some fake data for the example we show above:
                             'reference_time': pd.Timestamp('2014-09-05')})
     ds
 
-Notice that we did not explicitly include coordinates for the "x" or "y"
-dimensions, so they were filled in array of ascending integers of the proper
-length.
-
 Here we pass :py:class:`xarray.DataArray` objects or a pandas object as values
 in the dictionary:
 

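A simplified sketch of the dimension-naming rules described in the data-structures hunk above: explicit ``dims`` win, dictionary ``coords`` require explicit ``dims``, and otherwise defaults of the form ``dim_N`` are used. The function `infer_dims` is hypothetical and, as an assumption, handles list-style ``coords`` only as `(name, labels)` pairs; it is not xarray's constructor logic.

```python
# Hypothetical helper: a simplified model of how dimension names are chosen
# when constructing an array. Not xarray's implementation.
def infer_dims(ndim, coords=None, dims=None):
    """Return a tuple of dimension names."""
    if dims is not None:
        # A single string names a single dimension.
        dims = (dims,) if isinstance(dims, str) else tuple(dims)
        if len(dims) != ndim:
            raise ValueError(f"expected {ndim} dimension names, got {len(dims)}")
        return dims
    if isinstance(coords, dict):
        # Dict keys may include non-dimension coordinates, so the dimension
        # order cannot be read off them.
        raise ValueError("dims must be given explicitly when coords is a dict")
    if coords is not None:
        # Simplification: list-style coords are (name, labels) pairs only.
        return tuple(name for name, _ in coords)
    # Neither dims nor coords: use the default names dim_0, dim_1, ...
    return tuple(f"dim_{n}" for n in range(ndim))


print(infer_dims(2))  # ('dim_0', 'dim_1')
print(infer_dims(2, coords={'x': ['a', 'b']}, dims=('x', 'y')))  # ('x', 'y')
```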
diff --git a/doc/examples/quick-overview.rst b/doc/examples/quick-overview.rst
@@ -23,7 +23,7 @@ array or list, with optional *dimensions* and *coordinates*:
 .. ipython:: python
 
     xr.DataArray(np.random.randn(2, 3))
-    data = xr.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])
+    data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('x', 'y'))
     data
 
 If you supply a pandas :py:class:`~pandas.Series` or
@@ -121,31 +121,55 @@ xarray supports grouped operations using a very similar API to pandas:
     data.groupby(labels).mean('y')
     data.groupby(labels).apply(lambda x: x - x.min())
 
-Convert to pandas
------------------
+pandas
+------
 
-A key feature of xarray is robust conversion to and from pandas objects:
+Xarray objects can be easily converted to and from pandas objects:
 
 .. ipython:: python
 
-    data.to_series()
-    data.to_pandas()
+    series = data.to_series()
+    series
 
-Datasets and NetCDF
--------------------
+    # convert back
+    series.to_xarray()
 
-:py:class:`xarray.Dataset` is a dict-like container of ``DataArray`` objects that share
-index labels and dimensions. It looks a lot like a netCDF file:
+Datasets
+--------
+
+:py:class:`xarray.Dataset` is a dict-like container of aligned ``DataArray``
+objects. You can think of it as a multi-dimensional generalization of the
+:py:class:`pandas.DataFrame`:
 
 .. ipython:: python
 
-    ds = data.to_dataset(name='foo')
+    ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})
     ds
 
+Use dictionary indexing to pull out ``Dataset`` variables as ``DataArray``
+objects:
+
+.. ipython:: python
+
+    ds['foo']
+
+Variables in datasets can have different ``dtype`` and even different
+dimensions, but all dimensions are assumed to refer to points in the same shared
+coordinate system.
+
 You can do almost everything you can do with ``DataArray`` objects with
-``Dataset`` objects if you prefer to work with multiple variables at once.
+``Dataset`` objects (including indexing and arithmetic) if you prefer to work
+with multiple variables at once.
+
+NetCDF
+------
+
+NetCDF is the recommended binary serialization format for xarray objects. Users
+from the geosciences will recognize that the :py:class:`~xarray.Dataset` data
+model looks very similar to a netCDF file (which, in fact, inspired it).
 
-Datasets also let you easily read and write netCDF files:
+You can directly read and write xarray objects to disk using :py:meth:`~xarray.Dataset.to_netcdf`, :py:func:`~xarray.open_dataset` and
+:py:func:`~xarray.open_dataarray`:
 
 .. ipython:: python
 

diff --git a/doc/indexing.rst b/doc/indexing.rst
@@ -221,7 +221,7 @@ enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
 
 .. ipython:: python
 
-    data = xr.DataArray([1, 2, 3], dims='x')
+    data = xr.DataArray([1, 2, 3], [('x', [0, 1, 2])])
     data.sel(x=[1.1, 1.9], method='nearest')
     data.sel(x=0.1, method='backfill')
     data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
@@ -478,6 +478,30 @@ Both ``reindex_like`` and ``align`` work interchangeably between
     # this is a no-op, because there are no shared dimension names
     ds.reindex_like(other)
 
+.. _indexing.missing_coordinates:
+
+Missing coordinate labels
+-------------------------
+
+Coordinate labels for each dimension are optional (as of xarray v0.9).
+Label-based indexing with ``.sel`` and ``.loc`` uses standard positional,
+integer-based indexing as a fallback for dimensions without a coordinate label:
+
+.. ipython:: python
+
+    array = xr.DataArray([1, 2, 3], dims='x')
+    array.sel(x=[0, -1])
+
+Alignment between xarray objects where one or both do not have coordinate labels
+succeeds only if all dimensions of the same name have the same length.
+Otherwise, it raises an informative error:
+
+.. ipython::
+    :verbatim:
+
+    In [62]: xr.align(array, array[:2])
+    ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {2, 3}
+
 Underlying Indexes
 ------------------
 
@@ -491,3 +515,11 @@ through the :py:attr:`~xarray.DataArray.indexes` attribute.
    arr.indexes
    arr.indexes['time']
 
+Use :py:meth:`~xarray.DataArray.get_index` to get an index for a dimension,
+falling back to a default :py:class:`pandas.RangeIndex` if it has no coordinate
+labels:
+
+.. ipython:: python
+
+    array
+    array.get_index('x')
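To make the fallback rules in the new "Missing coordinate labels" and ``get_index`` sections concrete, here is a pure-Python sketch for one dimension. `select_along_dim` and `index_or_default` are hypothetical stand-ins (a plain list plays the role of the coordinate labels and `range` plays the role of `pandas.RangeIndex`); they are not xarray's API.

```python
# Hypothetical helpers: a simplified model of label lookup along one
# dimension, with positional fallback. Not xarray's implementation.
def select_along_dim(values, key, labels=None):
    """Pick items from ``values`` by label, or by position when unlabelled."""
    keys = key if isinstance(key, list) else [key]
    if labels is None:
        # No coordinate labels: treat keys as integer positions, so negative
        # integers count from the end, as in ``array.sel(x=[0, -1])``.
        picked = [values[k] for k in keys]
    else:
        # Labelled: look up the position of each label first.
        picked = [values[labels.index(k)] for k in keys]
    return picked if isinstance(key, list) else picked[0]


def index_or_default(size, labels=None):
    """Return the labels for a dimension, or range(size) if there are none."""
    return labels if labels is not None else range(size)


print(select_along_dim([1, 2, 3], [0, -1]))                      # [1, 3]
print(select_along_dim([1, 2, 3], 'b', labels=['a', 'b', 'c']))  # 2
print(index_or_default(3))                                       # range(0, 3)
```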
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
@@ -21,10 +21,32 @@ v0.9.0 (unreleased)
 Breaking changes
 ~~~~~~~~~~~~~~~~
 
+- Index coordinates for each dimension are now optional, and no longer created
+  by default :issue:`1017`. This has a number of implications:
+
+  - :py:func:`~align` and :py:meth:`~Dataset.reindex` can now raise an error if
+    dimension labels are missing and dimensions have different sizes.
+  - Because pandas does not support missing indexes, methods such as
+    ``to_dataframe``/``from_dataframe`` and ``stack``/``unstack`` no longer
+    roundtrip faithfully on all inputs. Use :py:meth:`~Dataset.reset_index` to
+    remove undesired indexes.
+  - ``Dataset.__delitem__`` and :py:meth:`~Dataset.drop` no longer delete/drop
+    variables that have dimensions matching a deleted/dropped variable.
+  - ``DataArray.coords.__delitem__`` is now allowed on variables matching
+    dimension names.
+  - ``.sel`` and ``.loc`` now handle indexing along a dimension without
+    coordinate labels by doing integer-based indexing. See
+    :ref:`indexing.missing_coordinates` for an example.
+  - :py:attr:`~Dataset.indexes` is no longer guaranteed to include all
+    dimension names as keys. The new method :py:meth:`~Dataset.get_index` has
+    been added to get an index for any dimension, falling back to a default
+    ``RangeIndex`` if the dimension has no coordinate labels.
+
 - The default behavior of ``merge`` is now ``compat='no_conflicts'``, so some
   merges will now succeed in cases that previously raised
   ``xarray.MergeError``. Set ``compat='broadcast_equals'`` to restore the
-  previous default.
+  previous default. See :ref:`combining.no_conflicts` for more details.
+
 - Reading :py:attr:`~DataArray.values` no longer always caches values in a NumPy
   array :issue:`1128`. Caching of ``.values`` on variables read from netCDF
   files on disk is still the default when :py:func:`open_dataset` is called with
@@ -150,6 +172,13 @@ Bug fixes
   should be computed or not.
   By `Fabien Maussion <https://github.com/fmaussion>`_.
 
+- Grouping with ``groupby`` over a dimension with non-unique values now gives
+  correct groups.
+  By `Stephan Hoyer <https://github.com/shoyer>`_.
+
+- Fixed accessing coordinate variables with non-string names from ``.coords``.
+  By `Stephan Hoyer <https://github.com/shoyer>`_.
+
 - :py:meth:`~xarray.DataArray.rename` now simultaneously renames the array and
   any coordinate with the same name, when supplied via a :py:class:`dict`
   (:issue:`1116`).
@@ -1280,7 +1309,7 @@ Enhancements
 
   .. ipython:: python
 
-      data = xray.DataArray([1, 2, 3], dims='x')
+      data = xray.DataArray([1, 2, 3], [('x', range(3))])
       data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
 
   This will be especially useful once pandas 0.16 is released, at which point

diff --git a/xarray/backends/common.py b/xarray/backends/common.py
@@ -33,25 +33,6 @@ def _decode_variable_name(name):
     return name
 
 
-def is_trivial_index(var):
-    """
-    Determines if in index is 'trivial' meaning that it is
-    equivalent to np.arange().  This is determined by
-    checking if there are any attributes or encodings,
-    if ndims is one, dtype is int and finally by comparing
-    the actual values to np.arange()
-    """
-    # if either attributes or encodings are defined
-    # the index is not trivial.
-    if len(var.attrs) or len(var.encoding):
-        return False
-    # if the index is not a 1d integer array
-    if var.ndim > 1 or not var.dtype.kind == 'i':
-        return False
-    arange = np.arange(var.size, dtype=var.dtype)
-    return np.all(var.values == arange)
-
-
 def robust_getitem(array, key, catch=Exception, max_retries=6,
                    initial_delay=500):
     """
@@ -203,12 +184,6 @@ def store_dataset(self, dataset):
 
     def store(self, variables, attributes, check_encoding_set=frozenset()):
         self.set_attributes(attributes)
-        neccesary_dims = [v.dims for v in variables.values()]
-        neccesary_dims = set(itertools.chain(*neccesary_dims))
-        # set all non-indexes and any index which is not trivial.
-        variables = OrderedDict((k, v) for k, v in iteritems(variables)
-                                if not (k in neccesary_dims and
-                                        is_trivial_index(v)))
         self.set_variables(variables, check_encoding_set)
 
     def set_attributes(self, attributes):

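For context on the removal above: the deleted ``store`` logic skipped writing any dimension-named variable that ``is_trivial_index`` judged equivalent to ``np.arange``, which was presumably safe only while xarray recreated such coordinates by default on read. The transcription below is an illustrative assumption (plain lists instead of NumPy arrays, no ``ndim`` check, and a hypothetical `variables_to_write` helper); it shows that a user-supplied ``x = [0, 1, 2]`` coordinate would have been dropped on write, and with default coordinates gone it would not come back when the file is read.

```python
def is_trivial_index(values, attrs=None, encoding=None):
    """Pure-Python transcription of the removed check, for illustration only."""
    # Any attributes or encoding make the index non-trivial.
    if attrs or encoding:
        return False
    # Only integer labels can be trivial (bool is excluded explicitly,
    # since it is a subclass of int in Python).
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return False
    # Trivial means the labels are exactly 0, 1, ..., n - 1.
    return list(values) == list(range(len(values)))


def variables_to_write(variables, dims):
    """Hypothetical mimic of the deleted filtering step in ``store``."""
    return {name: values for name, values in variables.items()
            if not (name in dims and is_trivial_index(values))}


variables = {'x': [0, 1, 2], 'foo': [1.5, 2.5, 3.5]}
written = variables_to_write(variables, dims={'x'})
print(sorted(written))  # ['foo']: the 'x' coordinate was silently dropped
```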
diff --git a/xarray/conventions.py b/xarray/conventions.py
@@ -913,7 +913,7 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
         identify coordinates.
     drop_variables: string or iterable, optional
         A variable or list of variables to exclude from being parsed from the
-        dataset.This may be useful to drop variables with problems or
+        dataset. This may be useful to drop variables with problems or
         inconsistent values.
 
     Returns
@@ -939,7 +939,7 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
         vars, attrs, concat_characters, mask_and_scale, decode_times,
         decode_coords, drop_variables=drop_variables)
     ds = Dataset(vars, attrs=attrs)
-    ds = ds.set_coords(coord_names.union(extra_coords))
+    ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))
     ds._file_obj = file_obj
     return ds
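The one-line change to ``decode_cf`` is easiest to see with plain sets. ``Dataset.set_coords`` rejects names that are not variables in the dataset, so intersecting with ``vars`` presumably guards against a ``coordinates`` attribute that mentions an absent variable, for example one excluded via ``drop_variables``. The helper `coords_to_set` and the variable names below are hypothetical, chosen for illustration.

```python
def coords_to_set(variables, coord_names, extra_coords):
    """Names to promote to coordinates, restricted to existing variables."""
    # set.intersection accepts any iterable; iterating a dict yields its keys,
    # so only names that are actually present in ``variables`` survive.
    return set(coord_names).union(extra_coords).intersection(variables)


variables = {'temperature': None, 'lat': None, 'lon': None}
print(sorted(coords_to_set(variables, {'lat', 'lon'}, {'time'})))  # ['lat', 'lon']
```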