
Merge branch 'master' into yohai-ds_scatter
* master:
  remove xfail from test_cross_engine_read_write_netcdf4 (pydata#2741)
  Reenable cross engine read write netCDF test (pydata#2739)
  remove bottleneck dev build from travis, this test env was failing to build (pydata#2736)
  CFTimeIndex Resampling (pydata#2593)
  add tests for handling of empty pandas objects in constructors (pydata#2735)
  dropna() for a Series indexed by a CFTimeIndex (pydata#2734)
  deprecate compat & encoding (pydata#2703)
  Implement integrate (pydata#2653)
  ENH: resample methods with tolerance (pydata#2716)
  improve error message for invalid encoding (pydata#2730)
  silence a couple of warnings (pydata#2727)
dcherian committed Feb 4, 2019
2 parents 7392c81 + 27cf53f commit 4e41fc3
Showing 24 changed files with 899 additions and 131 deletions.
3 changes: 2 additions & 1 deletion .github/stale.yml
@@ -28,7 +28,8 @@ staleLabel: stale
# Comment to post when marking as stale. Set to `false` to disable
markComment: |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically
If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically
# Comment to post when removing the stale label.
# unmarkComment: >
2 changes: 0 additions & 2 deletions .travis.yml
@@ -19,7 +19,6 @@ matrix:
- EXTRA_FLAGS="--run-flaky --run-network-tests"
- env: CONDA_ENV=py36-dask-dev
- env: CONDA_ENV=py36-pandas-dev
- env: CONDA_ENV=py36-bottleneck-dev
- env: CONDA_ENV=py36-rasterio
- env: CONDA_ENV=py36-zarr-dev
- env: CONDA_ENV=docs
@@ -31,7 +30,6 @@ matrix:
- CONDA_ENV=py36
- EXTRA_FLAGS="--run-flaky --run-network-tests"
- env: CONDA_ENV=py36-pandas-dev
- env: CONDA_ENV=py36-bottleneck-dev
- env: CONDA_ENV=py36-zarr-dev

before_install:
24 changes: 0 additions & 24 deletions ci/requirements-py36-bottleneck-dev.yml

This file was deleted.

2 changes: 2 additions & 0 deletions doc/api.rst
@@ -152,6 +152,7 @@ Computation
Dataset.diff
Dataset.quantile
Dataset.differentiate
Dataset.integrate

**Aggregation**:
:py:attr:`~Dataset.all`
@@ -321,6 +322,7 @@ Computation
DataArray.dot
DataArray.quantile
DataArray.differentiate
DataArray.integrate

**Aggregation**:
:py:attr:`~DataArray.all`
14 changes: 12 additions & 2 deletions doc/computation.rst
@@ -240,6 +240,8 @@ function or method name to ``coord_func`` option,
da.coarsen(time=7, x=2, coord_func={'time': 'min'}).mean()
.. _compute.using_coordinates:

Computation using Coordinates
=============================

@@ -261,9 +263,17 @@ This method can be used also for multidimensional arrays,
coords={'x': [0.1, 0.11, 0.2, 0.3]})
a.differentiate('x')
:py:meth:`~xarray.DataArray.integrate` computes integration using the
trapezoidal rule along the given coordinate,

.. ipython:: python
a.integrate('x')
.. note::
This method is limited to simple cartesian geometry. Differentiation along
multidimensional coordinate is not supported.
These methods are limited to simple cartesian geometry. Differentiation
and integration along a multidimensional coordinate are not supported.
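The trapezoidal rule that ``integrate`` applies along a coordinate reduces to a short NumPy sum over the coordinate values; a minimal sketch with hypothetical data (the array contents are made up for illustration):

```python
import numpy as np

# Hypothetical data: y = 2 * x sampled on a non-uniform coordinate,
# mirroring what a.integrate('x') computes along the 'x' coordinate.
x = np.array([0.0, 0.25, 0.5, 1.0])
y = 2 * x

# Trapezoidal rule: each interval's width times the mean of its endpoints.
integral = np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2.0)
```

The trapezoidal rule is exact for linear data, so here ``integral`` equals the analytic value of the integral of ``2*x`` from 0 to 1, i.e. 1.0.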


.. _compute.broadcasting:

35 changes: 23 additions & 12 deletions doc/time-series.rst
@@ -196,11 +196,20 @@ resampling group:
ds.resample(time='6H').reduce(np.mean)
For upsampling, xarray provides four methods: ``asfreq``, ``ffill``, ``bfill``,
and ``interpolate``. ``interpolate`` extends ``scipy.interpolate.interp1d`` and
supports all of its schemes. All of these resampling operations work on both
For upsampling, xarray provides six methods: ``asfreq``, ``ffill``, ``bfill``, ``pad``,
``nearest`` and ``interpolate``. ``interpolate`` extends ``scipy.interpolate.interp1d``
and supports all of its schemes. All of these resampling operations work on both
Dataset and DataArray objects with an arbitrary number of dimensions.

In order to limit the scope of the methods ``ffill``, ``bfill``, ``pad`` and
``nearest``, the ``tolerance`` argument can be set in coordinate units.
Data whose indices lie outside the given ``tolerance`` are set to ``NaN``.

.. ipython:: python
ds.resample(time='1H').nearest(tolerance='1H')
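Under the hood this behaves like a nearest-neighbour reindex with a distance cutoff. A standalone sketch of that logic on plain numeric "times" (the function name and data are hypothetical, not xarray API):

```python
import numpy as np

def nearest_with_tolerance(src_t, src_v, tgt_t, tol):
    """Take the nearest source value per target; NaN if farther than tol."""
    out = np.full(len(tgt_t), np.nan)
    for i, t in enumerate(tgt_t):
        j = int(np.argmin(np.abs(src_t - t)))   # index of nearest source
        if abs(src_t[j] - t) <= tol:            # within tolerance: fill
            out[i] = src_v[j]
    return out

# Sources at t=0 and t=10; upsample to every 5 units with tolerance 1:
# the middle target has no source within distance 1, so it stays NaN.
result = nearest_with_tolerance(np.array([0.0, 10.0]),
                                np.array([1.0, 2.0]),
                                np.array([0.0, 5.0, 10.0]), 1.0)
```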
For more examples of using grouped operations on a time dimension, see
:ref:`toy weather data`.

@@ -300,31 +309,34 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
da.differentiate('time')
- And serialization:
- Serialization:

.. ipython:: python
da.to_netcdf('example-no-leap.nc')
xr.open_dataset('example-no-leap.nc')
- Resampling along the time dimension for data indexed by a :py:class:`~xarray.CFTimeIndex`:

.. ipython:: python
da.resample(time='81T', closed='right', label='right', base=3).mean()
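The ``closed``/``label`` semantics used here already exist for a plain :py:class:`pandas.DatetimeIndex`; a small pandas sketch (with made-up half-hourly data) of the behaviour the CFTimeIndex path now mirrors:

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly series; this commit makes the equivalent
# .resample(...) call work when the index is a CFTimeIndex instead.
idx = pd.date_range('2000-01-01', periods=6, freq='30min')
s = pd.Series(np.arange(6.0), index=idx)

# closed='right' puts each bin's right edge inside the bin, and
# label='right' labels each bin by that right edge.
hourly = s.resample('60min', closed='right', label='right').mean()
```

With these options the six samples fall into four right-closed hourly bins with means 0.0, 1.5, 3.5 and 5.0.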
.. note::

While much of the time series functionality that is possible for standard
dates has been implemented for dates from non-standard calendars, there are
still some remaining important features that have yet to be implemented,
for example:

- Resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex` (:issue:`2191`, :issue:`2458`)
- Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes
(:issue:`2164`).

For some use-cases it may still be useful to convert from
a :py:class:`~xarray.CFTimeIndex` to a :py:class:`pandas.DatetimeIndex`,
despite the difference in calendar types (e.g. to allow the use of some
forms of resample with non-standard calendars). The recommended way of
doing this is to use the built-in
:py:meth:`~xarray.CFTimeIndex.to_datetimeindex` method:
despite the difference in calendar types. The recommended way of doing this
is to use the built-in :py:meth:`~xarray.CFTimeIndex.to_datetimeindex`
method:

.. ipython:: python
:okwarning:
@@ -334,8 +346,7 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
da
datetimeindex = da.indexes['time'].to_datetimeindex()
da['time'] = datetimeindex
da.resample(time='Y').mean('time')
However in this case one should use caution to only perform operations which
do not depend on differences between dates (e.g. differentiation,
interpolation, or upsampling with resample), as these could introduce subtle
20 changes: 20 additions & 0 deletions doc/whats-new.rst
@@ -24,6 +24,10 @@ Breaking changes
- Remove support for Python 2. This is the first version of xarray that is
Python 3 only. (:issue:`1876`).
By `Joe Hamman <https://github.com/jhamman>`_.
- The `compat` argument to `Dataset` and the `encoding` argument to
`DataArray` are deprecated and will be removed in a future release.
(:issue:`1188`)
By `Maximilian Roos <https://github.com/max-sixty>`_.

Enhancements
~~~~~~~~~~~~
@@ -45,6 +49,21 @@ Enhancements
By `Benoit Bovy <https://github.com/benbovy>`_.
- Dataset plotting API! Currently only :py:meth:`Dataset.plot.scatter` is implemented.
By `Yohai Bar Sinai <https://github.com/yohai>`_ and `Deepak Cherian <https://github.com/dcherian>`_.
- Resampling of standard and non-standard calendars indexed by
:py:class:`~xarray.CFTimeIndex` is now possible. (:issue:`2191`).
By `Jwen Fai Low <https://github.com/jwenfai>`_ and
`Spencer Clark <https://github.com/spencerkclark>`_.
- Add ``tolerance`` option to ``resample()`` methods ``bfill``, ``pad``,
``nearest``. (:issue:`2695`)
By `Hauke Schulz <https://github.com/observingClouds>`_.
- :py:meth:`~xarray.DataArray.integrate` and
:py:meth:`~xarray.Dataset.integrate` are newly added.
See :ref:`compute.using_coordinates` for details.
(:issue:`1332`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.
- :py:meth:`pandas.Series.dropna` is now supported for a
:py:class:`pandas.Series` indexed by a :py:class:`~xarray.CFTimeIndex`
(:issue:`2688`). By `Spencer Clark <https://github.com/spencerkclark>`_.

Bug fixes
~~~~~~~~~
@@ -114,6 +133,7 @@ Breaking changes
(:issue:`2565`). The previous behavior was to decode them only if they
had specific time attributes, now these attributes are copied
automatically from the corresponding time coordinate. This might
break downstream code that was relying on these variables to be
brake downstream code that was relying on these variables to be
not decoded.
By `Fabien Maussion <https://github.com/fmaussion>`_.
5 changes: 3 additions & 2 deletions xarray/backends/netCDF4_.py
@@ -217,8 +217,9 @@ def _extract_nc4_variable_encoding(variable, raise_on_invalid=False,
if raise_on_invalid:
invalid = [k for k in encoding if k not in valid_encodings]
if invalid:
raise ValueError('unexpected encoding parameters for %r backend: '
' %r' % (backend, invalid))
raise ValueError(
'unexpected encoding parameters for %r backend: %r. Valid '
'encodings are: %r' % (backend, invalid, valid_encodings))
else:
for k in list(encoding):
if k not in valid_encodings:
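The improved message follows a common validation pattern: name both the rejected keys and the accepted set, so the user can self-correct. A standalone sketch of the pattern (the function name and defaults are hypothetical, not the real xarray helper):

```python
def check_encoding(encoding, valid_encodings, backend='netCDF4'):
    """Reject unknown encoding keys, listing the valid ones in the error."""
    invalid = [k for k in encoding if k not in valid_encodings]
    if invalid:
        # Include the valid set so the error is actionable.
        raise ValueError(
            'unexpected encoding parameters for %r backend: %r. Valid '
            'encodings are: %r' % (backend, invalid, sorted(valid_encodings)))
    return dict(encoding)
```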
25 changes: 21 additions & 4 deletions xarray/coding/cftime_offsets.py
@@ -358,29 +358,41 @@ def rollback(self, date):
class Day(BaseCFTimeOffset):
_freq = 'D'

def as_timedelta(self):
return timedelta(days=self.n)

def __apply__(self, other):
return other + timedelta(days=self.n)
return other + self.as_timedelta()


class Hour(BaseCFTimeOffset):
_freq = 'H'

def as_timedelta(self):
return timedelta(hours=self.n)

def __apply__(self, other):
return other + timedelta(hours=self.n)
return other + self.as_timedelta()


class Minute(BaseCFTimeOffset):
_freq = 'T'

def as_timedelta(self):
return timedelta(minutes=self.n)

def __apply__(self, other):
return other + timedelta(minutes=self.n)
return other + self.as_timedelta()


class Second(BaseCFTimeOffset):
_freq = 'S'

def as_timedelta(self):
return timedelta(seconds=self.n)

def __apply__(self, other):
return other + timedelta(seconds=self.n)
return other + self.as_timedelta()


_FREQUENCIES = {
@@ -427,6 +439,11 @@ def __apply__(self, other):
_FREQUENCY_CONDITION)


# pandas defines these offsets as "Tick" objects, which for instance have
# distinct behavior from monthly or longer frequencies in resample.
CFTIME_TICKS = (Day, Hour, Minute, Second)
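The refactor gives every fixed-width ("tick") offset a common ``as_timedelta`` hook, so callers such as the resampler can treat them uniformly. A self-contained sketch of the pattern (simplified stand-in classes, not the real xarray ones):

```python
from datetime import datetime, timedelta

class BaseOffset:
    def __init__(self, n=1):
        self.n = n

    def __apply__(self, other):
        # Shared implementation: every tick offset just adds its timedelta.
        return other + self.as_timedelta()

class Day(BaseOffset):
    def as_timedelta(self):
        return timedelta(days=self.n)

class Hour(BaseOffset):
    def as_timedelta(self):
        return timedelta(hours=self.n)

# Fixed-width offsets, analogous to pandas "Tick" objects.
TICKS = (Day, Hour)

later = Hour(3).__apply__(datetime(2000, 1, 1))
```

Keeping ``__apply__`` in the base class means adding a new tick offset only requires defining ``as_timedelta``.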


def to_offset(freq):
"""Convert a frequency string to the appropriate subclass of
BaseCFTimeOffset."""
8 changes: 5 additions & 3 deletions xarray/coding/cftimeindex.py
@@ -335,11 +335,13 @@ def _maybe_cast_slice_bound(self, label, side, kind):
# e.g. series[1:5].
def get_value(self, series, key):
"""Adapted from pandas.tseries.index.DatetimeIndex.get_value"""
if not isinstance(key, slice):
return series.iloc[self.get_loc(key)]
else:
if np.asarray(key).dtype == np.dtype(bool):
return series.iloc[key]
elif isinstance(key, slice):
return series.iloc[self.slice_indexer(
key.start, key.stop, key.step)]
else:
return series.iloc[self.get_loc(key)]
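The ordering of the new branches matters: a boolean mask must be recognised before the slice and label paths, otherwise it would be mishandled by the label-based ``get_loc`` fallback. A standalone sketch of the dispatch (a plain positional lookup stands in for the real label-based ``get_loc``):

```python
import numpy as np

def get_value(values, key):
    if np.asarray(key).dtype == np.dtype(bool):
        # Boolean masks select positionally, like series.iloc[mask];
        # this check must come first.
        return values[np.asarray(key)]
    elif isinstance(key, slice):
        # Slices go through positional slicing (slice_indexer in xarray).
        return values[key]
    else:
        # Everything else falls back to a scalar lookup (get_loc in the
        # real index; a plain position in this sketch).
        return values[key]

vals = np.array([10.0, 20.0, 30.0])
```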

def __contains__(self, key):
"""Adapted from
2 changes: 1 addition & 1 deletion xarray/core/alignment.py
@@ -495,7 +495,7 @@ def _broadcast_array(array):
coords = OrderedDict(array.coords)
coords.update(common_coords)
return DataArray(data, coords, data.dims, name=array.name,
attrs=array.attrs, encoding=array.encoding)
attrs=array.attrs)

def _broadcast_dataset(ds):
data_vars = OrderedDict(
30 changes: 15 additions & 15 deletions xarray/core/common.py
@@ -713,6 +713,13 @@ def resample(self, indexer=None, skipna=None, closed=None, label=None,
array([ 0. , 0.032258, 0.064516, ..., 10.935484, 10.967742, 11. ])
Coordinates:
* time (time) datetime64[ns] 1999-12-15 1999-12-16 1999-12-17 ...
Limit scope of upsampling method
>>> da.resample(time='1D').nearest(tolerance='1D')
<xarray.DataArray (time: 337)>
array([ 0., 0., nan, ..., nan, 11., 11.])
Coordinates:
* time (time) datetime64[ns] 1999-12-15 1999-12-16 ... 2000-11-15
References
----------
@@ -749,23 +756,16 @@ def resample(self, indexer=None, skipna=None, closed=None, label=None,
dim_coord = self[dim]

if isinstance(self.indexes[dim_name], CFTimeIndex):
raise NotImplementedError(
'Resample is currently not supported along a dimension '
'indexed by a CFTimeIndex. For certain kinds of downsampling '
'it may be possible to work around this by converting your '
'time index to a DatetimeIndex using '
'CFTimeIndex.to_datetimeindex. Use caution when doing this '
'however, because switching to a DatetimeIndex from a '
'CFTimeIndex with a non-standard calendar entails a change '
'in the calendar type, which could lead to subtle and silent '
'errors.'
)

from .resample_cftime import CFTimeGrouper
grouper = CFTimeGrouper(freq, closed, label, base, loffset)
else:
# TODO: to_offset() call required for pandas==0.19.2
grouper = pd.Grouper(freq=freq, closed=closed, label=label,
base=base,
loffset=pd.tseries.frequencies.to_offset(
loffset))
group = DataArray(dim_coord, coords=dim_coord.coords,
dims=dim_coord.dims, name=RESAMPLE_DIM)
# TODO: to_offset() call required for pandas==0.19.2
grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base,
loffset=pd.tseries.frequencies.to_offset(loffset))
resampler = self._resample_cls(self, group=group, dim=dim_name,
grouper=grouper,
resample_dim=RESAMPLE_DIM)
