forked from pydata/xarray

Merge remote-tracking branch 'upstream/master' into fix/user-coordinates
* upstream/master:
  add missing pint integration tests (pydata#3508)
  DOC: update bottleneck repo url (pydata#3507)
  add drop_sel, drop_vars, map to api.rst (pydata#3506)
  remove syntax warning (pydata#3505)
  Dataset.map, GroupBy.map, Resample.map (pydata#3459)
  tests for datasets with units (pydata#3447)
  fix pandas-dev tests (pydata#3491)
  unpin pseudonetcdf (pydata#3496)
  whatsnew corrections (pydata#3494)
  drop_vars; deprecate drop for variables (pydata#3475)
  uamiv test using only raw uamiv variables (pydata#3485)
  Optimize dask array equality checks. (pydata#3453)
dcherian committed Nov 12, 2019
2 parents 2279c36 + 4e9240a commit d49ceef
Showing 34 changed files with 2,569 additions and 356 deletions.
2 changes: 1 addition & 1 deletion ci/azure/install.yml
@@ -16,7 +16,7 @@ steps:
--pre \
--upgrade \
matplotlib \
- pandas=0.26.0.dev0+628.g03c1a3db2 \ # FIXME https://github.com/pydata/xarray/issues/3440
+ pandas \
scipy
# numpy \ # FIXME https://github.com/pydata/xarray/issues/3409
pip install \
2 changes: 1 addition & 1 deletion ci/requirements/py36.yml
@@ -29,7 +29,7 @@ dependencies:
- pandas
- pint
- pip
- - pseudonetcdf<3.1 # FIXME https://github.com/pydata/xarray/issues/3409
+ - pseudonetcdf
- pydap
- pynio
- pytest
2 changes: 1 addition & 1 deletion ci/requirements/py37-windows.yml
@@ -29,7 +29,7 @@ dependencies:
- pandas
- pint
- pip
- - pseudonetcdf<3.1 # FIXME https://github.com/pydata/xarray/issues/3409
+ - pseudonetcdf
- pydap
# - pynio # Not available on Windows
- pytest
2 changes: 1 addition & 1 deletion ci/requirements/py37.yml
@@ -29,7 +29,7 @@ dependencies:
- pandas
- pint
- pip
- - pseudonetcdf<3.1 # FIXME https://github.com/pydata/xarray/issues/3409
+ - pseudonetcdf
- pydap
- pynio
- pytest
14 changes: 8 additions & 6 deletions doc/api.rst
@@ -94,7 +94,7 @@ Dataset contents
Dataset.rename_dims
Dataset.swap_dims
Dataset.expand_dims
- Dataset.drop
+ Dataset.drop_vars
Dataset.drop_dims
Dataset.set_coords
Dataset.reset_coords
@@ -118,6 +118,7 @@ Indexing
Dataset.loc
Dataset.isel
Dataset.sel
+ Dataset.drop_sel
Dataset.head
Dataset.tail
Dataset.thin
@@ -154,7 +155,7 @@ Computation
.. autosummary::
:toctree: generated/

- Dataset.apply
+ Dataset.map
Dataset.reduce
Dataset.groupby
Dataset.groupby_bins
@@ -263,7 +264,7 @@ DataArray contents
DataArray.rename
DataArray.swap_dims
DataArray.expand_dims
- DataArray.drop
+ DataArray.drop_vars
DataArray.reset_coords
DataArray.copy

@@ -283,6 +284,7 @@ Indexing
DataArray.loc
DataArray.isel
DataArray.sel
+ DataArray.drop_sel
DataArray.head
DataArray.tail
DataArray.thin
@@ -542,10 +544,10 @@ GroupBy objects
:toctree: generated/

core.groupby.DataArrayGroupBy
- core.groupby.DataArrayGroupBy.apply
+ core.groupby.DataArrayGroupBy.map
core.groupby.DataArrayGroupBy.reduce
core.groupby.DatasetGroupBy
- core.groupby.DatasetGroupBy.apply
+ core.groupby.DatasetGroupBy.map
core.groupby.DatasetGroupBy.reduce

Rolling objects
Expand All @@ -566,7 +568,7 @@ Resample objects
================

Resample objects also implement the GroupBy interface
- (methods like ``apply()``, ``reduce()``, ``mean()``, ``sum()``, etc.).
+ (methods like ``map()``, ``reduce()``, ``mean()``, ``sum()``, etc.).

.. autosummary::
:toctree: generated/
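
For context on the Resample note above: since Resample objects share the GroupBy interface, the renamed ``map`` works there too. A minimal sketch, not part of this commit; the data and frequency are invented:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series, purely for illustration
times = pd.date_range("2019-01-01", periods=365)
da = xr.DataArray(np.arange(365.0), coords={"time": times}, dims="time")

# Resample implements the GroupBy interface, so the renamed map()
# replaces the deprecated apply() here as well
monthly_anomaly = da.resample(time="1M").map(lambda x: x - x.mean())
```
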
6 changes: 3 additions & 3 deletions doc/computation.rst
@@ -183,7 +183,7 @@ a value when aggregating:

Note that rolling window aggregations are faster and use less memory when bottleneck_ is installed. This only applies to numpy-backed xarray objects.

- .. _bottleneck: https://github.com/kwgoodman/bottleneck/
+ .. _bottleneck: https://github.com/pydata/bottleneck/

We can also manually iterate through ``Rolling`` objects:

@@ -462,13 +462,13 @@ Datasets support most of the same methods found on data arrays:
abs(ds)
Datasets also support NumPy ufuncs (requires NumPy v1.13 or newer), or
- alternatively you can use :py:meth:`~xarray.Dataset.apply` to apply a function
+ alternatively you can use :py:meth:`~xarray.Dataset.map` to map a function
to each variable in a dataset:

.. ipython:: python
np.sin(ds)
- ds.apply(np.sin)
+ ds.map(np.sin)
Datasets also use looping over variables for *broadcasting* in binary
arithmetic. You can do arithmetic between any ``DataArray`` and a dataset:
2 changes: 1 addition & 1 deletion doc/dask.rst
@@ -292,7 +292,7 @@ For the best performance when using Dask's multi-threaded scheduler, wrap a
function that already releases the global interpreter lock, which fortunately
already includes most NumPy and Scipy functions. Here we show an example
using NumPy operations and a fast function from
- `bottleneck <https://github.com/kwgoodman/bottleneck>`__, which
+ `bottleneck <https://github.com/pydata/bottleneck>`__, which
we use to calculate `Spearman's rank-correlation coefficient <https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient>`__:

.. code-block:: python
4 changes: 2 additions & 2 deletions doc/data-structures.rst
@@ -393,14 +393,14 @@ methods (like pandas) for transforming datasets into new objects.

For removing variables, you can select and drop an explicit list of
variables by indexing with a list of names or using the
- :py:meth:`~xarray.Dataset.drop` methods to return a new ``Dataset``. These
+ :py:meth:`~xarray.Dataset.drop_vars` methods to return a new ``Dataset``. These
operations keep around coordinates:

.. ipython:: python
ds[['temperature']]
ds[['temperature', 'temperature_double']]
- ds.drop('temperature')
+ ds.drop_vars('temperature')
To remove a dimension, you can use :py:meth:`~xarray.Dataset.drop_dims` method.
Any variables using that dimension are dropped:
15 changes: 8 additions & 7 deletions doc/groupby.rst
@@ -35,10 +35,11 @@ Let's create a simple example dataset:
.. ipython:: python
- ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 3))},
-                 coords={'x': [10, 20, 30, 40],
-                         'letters': ('x', list('abba'))})
- arr = ds['foo']
+ ds = xr.Dataset(
+     {"foo": (("x", "y"), np.random.rand(4, 3))},
+     coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
+ )
+ arr = ds["foo"]
ds
If we groupby the name of a variable or coordinate in a dataset (we can also
@@ -93,15 +94,15 @@ Apply
~~~~~

To apply a function to each group, you can use the flexible
- :py:meth:`~xarray.DatasetGroupBy.apply` method. The resulting objects are automatically
+ :py:meth:`~xarray.DatasetGroupBy.map` method. The resulting objects are automatically
concatenated back together along the group axis:

.. ipython:: python
def standardize(x):
return (x - x.mean()) / x.std()
- arr.groupby('letters').apply(standardize)
+ arr.groupby('letters').map(standardize)
GroupBy objects also have a :py:meth:`~xarray.DatasetGroupBy.reduce` method and
methods like :py:meth:`~xarray.DatasetGroupBy.mean` as shortcuts for applying an
Expand Down Expand Up @@ -202,7 +203,7 @@ __ http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dimen
dims=['ny','nx'])
da
da.groupby('lon').sum(...)
da.groupby('lon').apply(lambda x: x - x.mean(), shortcut=False)
- da.groupby('lon').apply(lambda x: x - x.mean(), shortcut=False)
+ da.groupby('lon').map(lambda x: x - x.mean(), shortcut=False)
Because multidimensional groups have the ability to generate a very large
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
2 changes: 1 addition & 1 deletion doc/howdoi.rst
@@ -44,7 +44,7 @@ How do I ...
* - convert a possibly irregularly sampled timeseries to a regularly sampled timeseries
- :py:meth:`DataArray.resample`, :py:meth:`Dataset.resample` (see :ref:`resampling` for more)
* - apply a function on all data variables in a Dataset
- - :py:meth:`Dataset.apply`
+ - :py:meth:`Dataset.map`
* - write xarray objects with complex values to a netCDF file
- :py:func:`Dataset.to_netcdf`, :py:func:`DataArray.to_netcdf` specifying ``engine="h5netcdf", invalid_netcdf=True``
* - make xarray objects look like other xarray objects
6 changes: 3 additions & 3 deletions doc/indexing.rst
@@ -232,14 +232,14 @@ Using indexing to *assign* values to a subset of dataset (e.g.,
Dropping labels and dimensions
------------------------------

- The :py:meth:`~xarray.Dataset.drop` method returns a new object with the listed
+ The :py:meth:`~xarray.Dataset.drop_sel` method returns a new object with the listed
index labels along a dimension dropped:

.. ipython:: python
- ds.drop(space=['IN', 'IL'])
+ ds.drop_sel(space=['IN', 'IL'])
- ``drop`` is both a ``Dataset`` and ``DataArray`` method.
+ ``drop_sel`` is both a ``Dataset`` and ``DataArray`` method.

Use :py:meth:`~xarray.Dataset.drop_dims` to drop a full dimension from a Dataset.
Any variables with these dimensions are also dropped:
2 changes: 1 addition & 1 deletion doc/installing.rst
@@ -43,7 +43,7 @@ For accelerating xarray

- `scipy <http://scipy.org/>`__: necessary to enable the interpolation features for
xarray objects
- - `bottleneck <https://github.com/kwgoodman/bottleneck>`__: speeds up
+ - `bottleneck <https://github.com/pydata/bottleneck>`__: speeds up
NaN-skipping and rolling window aggregations by a large factor
- `numbagg <https://github.com/shoyer/numbagg>`_: for exponential rolling
window operations
2 changes: 1 addition & 1 deletion doc/quick-overview.rst
@@ -142,7 +142,7 @@ xarray supports grouped operations using a very similar API to pandas (see :ref:
labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')
labels
data.groupby(labels).mean('y')
- data.groupby(labels).apply(lambda x: x - x.min())
+ data.groupby(labels).map(lambda x: x - x.min())
Plotting
--------
24 changes: 22 additions & 2 deletions doc/whats-new.rst
@@ -38,6 +38,19 @@ Breaking changes

New Features
~~~~~~~~~~~~
+ - :py:meth:`Dataset.drop_sel` & :py:meth:`DataArray.drop_sel` have been added for dropping labels.
+   :py:meth:`Dataset.drop_vars` & :py:meth:`DataArray.drop_vars` have been added for
+   dropping variables (including coordinates). The existing ``drop`` methods remain as a backward compatible
+   option for dropping either labels or variables, but using the more specific methods is encouraged.
+   (:pull:`3475`)
+   By `Maximilian Roos <https://github.com/max-sixty>`_
+ - :py:meth:`Dataset.map` & :py:meth:`GroupBy.map` & :py:meth:`Resample.map` have been added for
+   mapping / applying a function over each item in the collection, reflecting the widely used
+   and least surprising name for this operation.
+   The existing ``apply`` methods remain for backward compatibility, though using the ``map``
+   methods is encouraged.
+   (:pull:`3459`)
+   By `Maximilian Roos <https://github.com/max-sixty>`_
- :py:meth:`Dataset.transpose` and :py:meth:`DataArray.transpose` now support an ellipsis (`...`)
to represent all 'other' dimensions. For example, to move one dimension to the front,
use ``.transpose('x', ...)``. (:pull:`3421`)
@@ -74,6 +87,10 @@ Bug fixes
- Fix grouping over variables with NaNs. (:issue:`2383`, :pull:`3406`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Sync with cftime by removing ``dayofwk=-1`` for cftime>=1.0.4.
+ - Use dask names to compare dask objects prior to comparing values after computation.
+   (:issue:`3068`, :issue:`3311`, :issue:`3454`, :pull:`3453`).
+   By `Deepak Cherian <https://github.com/dcherian>`_.
- Sync with cftime by removing `dayofwk=-1` for cftime>=1.0.4.
By `Anderson Banihirwe <https://github.com/andersy005>`_.
- Fix :py:meth:`xarray.core.groupby.DataArrayGroupBy.reduce` and
:py:meth:`xarray.core.groupby.DatasetGroupBy.reduce` when reducing over multiple dimensions.
@@ -98,7 +115,7 @@ Internal Changes
~~~~~~~~~~~~~~~~

- Added integration tests against `pint <https://pint.readthedocs.io/>`_.
-   (:pull:`3238`) by `Justus Magin <https://github.com/keewis>`_.
+   (:pull:`3238`, :pull:`3447`, :pull:`3508`) by `Justus Magin <https://github.com/keewis>`_.

.. note::

@@ -114,6 +131,8 @@ Internal Changes
- Run basic CI tests on Python 3.8. (:pull:`3477`)
By `Maximilian Roos <https://github.com/max-sixty>`_

+ - Enable type checking on default sentinel values (:pull:`3472`)
+   By `Maximilian Roos <https://github.com/max-sixty>`_

.. _whats-new.0.14.0:

@@ -3721,7 +3740,7 @@ Breaking changes
warnings: methods and attributes that were deprecated in xray v0.3 or earlier
(e.g., ``dimensions``, ``attributes```) have gone away.

- .. _bottleneck: https://github.com/kwgoodman/bottleneck
+ .. _bottleneck: https://github.com/pydata/bottleneck

Enhancements
~~~~~~~~~~~~
@@ -3752,6 +3771,7 @@ Enhancements
explicitly listed variables or index labels:

.. ipython:: python
+ :okwarning:
# drop variables
ds = xray.Dataset({'x': 0, 'y': 1})
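
Taken together, the two whats-new entries above amount to the following usage. A minimal sketch assuming xarray >= 0.14.1; the dataset here is invented:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 3))},
    coords={"x": [10, 20, 30, 40]},
)

ds.drop_vars("foo")      # drop a variable; previously ds.drop("foo")
ds.drop_sel(x=[10, 20])  # drop index labels; previously ds.drop(x=[10, 20])
ds.map(np.sin)           # map a function over each data variable;
                         # previously ds.apply(np.sin)
```
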
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,7 +1,7 @@
[tool:pytest]
python_files=test_*.py
testpaths=xarray/tests properties
- # Fixed upstream in https://github.com/kwgoodman/bottleneck/pull/199
+ # Fixed upstream in https://github.com/pydata/bottleneck/pull/199
filterwarnings =
ignore:Using a non-tuple sequence for multidimensional indexing is deprecated:FutureWarning
env =
58 changes: 38 additions & 20 deletions xarray/core/concat.py
@@ -2,6 +2,7 @@

from . import dtypes, utils
from .alignment import align
+ from .duck_array_ops import lazy_array_equiv
from .merge import _VALID_COMPAT, unique_variable
from .variable import IndexVariable, Variable, as_variable
from .variable import concat as concat_vars
@@ -189,26 +190,43 @@ def process_subset_opt(opt, subset):
# all nonindexes that are not the same in each dataset
for k in getattr(datasets[0], subset):
if k not in concat_over:
- # Compare the variable of all datasets vs. the one
- # of the first dataset. Perform the minimum amount of
- # loads in order to avoid multiple loads from disk
- # while keeping the RAM footprint low.
- v_lhs = datasets[0].variables[k].load()
- # We'll need to know later on if variables are equal.
- computed = []
- for ds_rhs in datasets[1:]:
-     v_rhs = ds_rhs.variables[k].compute()
-     computed.append(v_rhs)
-     if not getattr(v_lhs, compat)(v_rhs):
-         concat_over.add(k)
-         equals[k] = False
-         # computed variables are not to be re-computed
-         # again in the future
-         for ds, v in zip(datasets[1:], computed):
-             ds.variables[k].data = v.data
-         break
- else:
-     equals[k] = True
+ equals[k] = None
+ variables = [ds.variables[k] for ds in datasets]
+ # first check without comparing values i.e. no computes
+ for var in variables[1:]:
+     equals[k] = getattr(variables[0], compat)(
+         var, equiv=lazy_array_equiv
+     )
+     if equals[k] is not True:
+         # exit early if we know these are not equal or that
+         # equality cannot be determined i.e. one or all of
+         # the variables wraps a numpy array
+         break
+ else:
+     equals[k] = True
+
+ if equals[k] is False:
+     concat_over.add(k)
+
+ elif equals[k] is None:
+     # Compare the variable of all datasets vs. the one
+     # of the first dataset. Perform the minimum amount of
+     # loads in order to avoid multiple loads from disk
+     # while keeping the RAM footprint low.
+     v_lhs = datasets[0].variables[k].load()
+     # We'll need to know later on if variables are equal.
+     computed = []
+     for ds_rhs in datasets[1:]:
+         v_rhs = ds_rhs.variables[k].compute()
+         computed.append(v_rhs)
+         if not getattr(v_lhs, compat)(v_rhs):
+             concat_over.add(k)
+             equals[k] = False
+             # computed variables are not to be re-computed
+             # again in the future
+             for ds, v in zip(datasets[1:], computed):
+                 ds.variables[k].data = v.data
+             break
+     else:
+         equals[k] = True

elif opt == "all":
concat_over.update(
@@ -370,7 +388,7 @@ def ensure_common_dims(vars):
result = result.set_coords(coord_names)
result.encoding = result_encoding

- result = result.drop(unlabeled_dims, errors="ignore")
+ result = result.drop_vars(unlabeled_dims, errors="ignore")

if coord is not None:
# add concat dimension last to ensure that its in the final Dataset
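
The early-exit comparison above relies on ``lazy_array_equiv``, which reports ``True`` or ``False`` when equality is decidable without computing, and ``None`` when values must actually be compared. A rough sketch of that tri-state idea, not the actual xarray implementation:

```python
import dask.array as da

def lazy_equiv_sketch(arr1, arr2):
    """Decide array equality without computing, where possible.

    Returns True/False when the answer is knowable lazily, or None
    when the caller must fall back to a real (computing) comparison.
    """
    if arr1 is arr2:
        return True
    if arr1.shape != arr2.shape:
        return False
    if isinstance(arr1, da.Array) and isinstance(arr2, da.Array):
        if arr1.name == arr2.name:
            # identical task graphs -> identical values, no compute needed
            return True
    return None

x = da.ones((3, 3), chunks=2)
print(lazy_equiv_sketch(x, x))           # True: same object/graph
print(lazy_equiv_sketch(x, x + 0))       # None: different graphs, must compute
print(lazy_equiv_sketch(x, da.ones(4)))  # False: shapes differ
```
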