Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect user-specified coordinates attribute. #3487

Merged
merged 23 commits into from
Dec 10, 2019
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
cd8f727
Respect user-specified coordinates attribute.
dcherian Oct 4, 2019
59a87f1
Add whats-new
dcherian Nov 5, 2019
b80bf20
Better if statement.
dcherian Nov 5, 2019
4ca804d
maybe it's better to not raise an error if "coordinates" in attrs.
dcherian Nov 5, 2019
7a3cdd7
tweak whats-new.
dcherian Nov 5, 2019
acdfdcd
more thorough test.
dcherian Nov 6, 2019
c595795
Merge branch 'master' into fix/user-coordinates
Nov 6, 2019
ab0c67d
Emit one "coordinates" warning per dataset, instead of one per variable.
dcherian Nov 6, 2019
46de4b7
Preserve attrs["coordinates"] when roundtripping with decode_coords=F…
dcherian Nov 6, 2019
2279c36
Avoid raising warnings.
dcherian Nov 7, 2019
d49ceef
Merge remote-tracking branch 'upstream/master' into fix/user-coordinates
dcherian Nov 12, 2019
bb62590
Merge branch 'fix/user-coordinates' of github.com:dcherian/xarray int…
dcherian Nov 12, 2019
0abe242
fix whats-new
dcherian Nov 12, 2019
1a42c6c
[minor] add comments
dcherian Nov 14, 2019
e1a3823
Merge remote-tracking branch 'upstream/master' into fix/user-coordinates
dcherian Nov 14, 2019
1c97c1e
fix whats-new
dcherian Nov 14, 2019
5773e5e
Actually test global "coordinates" handling.
dcherian Nov 14, 2019
670f97f
filerwarning not necessary.
dcherian Nov 14, 2019
e1c48c9
Merge branch 'master' into fix/user-coordinates
dcherian Nov 19, 2019
77c7c0f
Merge remote-tracking branch 'upstream/master' into fix/user-coordinates
dcherian Dec 5, 2019
3ca669b
Add comment
dcherian Dec 5, 2019
d5d3734
Merge branch 'fix/user-coordinates' of github.com:dcherian/xarray int…
dcherian Dec 5, 2019
8c526fd
fix whats-new
dcherian Dec 5, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions doc/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -440,6 +440,20 @@ supported by netCDF4-python: 'standard', 'gregorian', 'proleptic_gregorian' 'nol
By default, xarray uses the 'proleptic_gregorian' calendar and units of the smallest time
difference between values, with a reference time of the first time value.


.. _io.coordinates:

Coordinates
...........

You can control the ``coordinates`` attribute written to disk by specifying ``DataArray.encoding["coordinates"]``.
If not specified, xarray automatically sets ``DataArray.encoding["coordinates"]`` to a space-delimited list
of names of coordinate variables that share dimensions with the variable.
This allows perfect roundtripping of xarray datasets but may not be desirable.
When an xarray dataset contains non-dimensional coordinates that do not share dimensions with any of
the variables, these coordinate variable names are saved under a "global" ``"coordinates"`` attribute.
This is not CF-compliant but again facilitates roundtripping of xarray datasets.

Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~

Expand Down
23 changes: 13 additions & 10 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,25 +30,25 @@ Breaking changes
(`cftime/126 <https://github.com/Unidata/cftime/issues/126>`_);
please use version 1.0.4.2 instead.

- All leftover support for dates from non-standard calendars through netcdftime, the
- All leftover support for dates from non-standard calendars through ``netcdftime``, the
module included in versions of netCDF4 prior to 1.4 that eventually became the
cftime package, has been removed in favor of relying solely on the standalone
cftime package (:pull:`3450`).
``cftime`` package (:pull:`3450`).
By `Spencer Clark <https://github.com/spencerkclark>`_.

New Features
~~~~~~~~~~~~
- :py:meth:`Dataset.transpose` and :py:meth:`DataArray.transpose` now support an ellipsis (`...`)
to represent all 'other' dimensions. For example, to move one dimension to the front,
use `.transpose('x', ...)`. (:pull:`3421`)
use ``.transpose('x', ...)``. (:pull:`3421`)
By `Maximilian Roos <https://github.com/max-sixty>`_
- Changed `xr.ALL_DIMS` to equal python's `Ellipsis` (`...`), and changed internal usages to use
`...` directly. As before, you can use this to instruct a `groupby` operation
to reduce over all dimensions. While we have no plans to remove `xr.ALL_DIMS`, we suggest
- Changed ``xr.ALL_DIMS`` to equal python's ``Ellipsis`` (`...`), and changed internal usages to use
`...` directly. As before, you can use this to instruct a ``groupby`` operation
to reduce over all dimensions. While we have no plans to remove ``xr.ALL_DIMS``, we suggest
using `...`. (:pull:`3418`)
By `Maximilian Roos <https://github.com/max-sixty>`_
- :py:func:`~xarray.dot`, and :py:func:`~xarray.DataArray.dot` now support the
`dims=...` option to sum over the union of dimensions of all input arrays
- :py:func:`xarray.dot`, and :py:meth:`DataArray.dot` now support the
``dims=...`` option to sum over the union of dimensions of all input arrays
(:issue:`3423`) by `Mathias Hauser <https://github.com/mathause>`_.
- Added new :py:meth:`Dataset._repr_html_` and :py:meth:`DataArray._repr_html_` to improve
representation of objects in jupyter. By default this feature is turned off
Expand All @@ -59,10 +59,13 @@ New Features
<https://docs.dask.org/en/latest/custom-collections.html#deterministic-hashing>`_
for xarray objects. Note that xarray objects with a dask.array backend already used
deterministic hashing in previous releases; this change implements it when whole
xarray objects are embedded in a dask graph, e.g. when :meth:`DataArray.map` is
xarray objects are embedded in a dask graph, e.g. when :py:meth:`DataArray.map_blocks` is
invoked. (:issue:`3378`, :pull:`3446`)
By `Deepak Cherian <https://github.com/dcherian>`_ and
`Guido Imperiale <https://github.com/crusaderky>`_.
- xarray now respects the ``DataArray.encoding["coordinates"]`` attribute when writing to disk.
See :ref:`io.coordinates` for more. (:issue:`3351`, :pull:`3487`)
By `Deepak Cherian <https://github.com/dcherian>`_.

Bug fixes
~~~~~~~~~
Expand All @@ -73,7 +76,7 @@ Bug fixes
- Use dask names to compare dask objects prior to comparing values after computation.
(:issue:`3068`, :issue:`3311`, :issue:`3454`, :pull:`3453`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Sync with cftime by removing `dayofwk=-1` for cftime>=1.0.4.
- Sync with cftime by removing ``dayofwk=-1`` for cftime>=1.0.4.
By `Anderson Banihirwe <https://github.com/andersy005>`_.
- Fix :py:meth:`xarray.core.groupby.DataArrayGroupBy.reduce` and
:py:meth:`xarray.core.groupby.DatasetGroupBy.reduce` when reducing over multiple dimensions.
Expand Down
37 changes: 24 additions & 13 deletions xarray/conventions.py
Original file line number Diff line number Diff line change
Expand Up @@ -663,31 +663,42 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
global_coordinates.discard(coord_name)

variables = {k: v.copy(deep=False) for k, v in variables.items()}

# These coordinates are saved according to CF conventions
for var_name, coord_names in variable_coordinates.items():
attrs = variables[var_name].attrs
for name, var in variables.items():
encoding = var.encoding
attrs = var.attrs
if "coordinates" in attrs:
raise ValueError(
"cannot serialize coordinates because variable "
"%s already has an attribute 'coordinates'" % var_name
warnings.warn(
f"'coordinates' found in attrs for variable {name!r}. This will be ignored. Instead please specify the 'coordinates' string in encoding.",
UserWarning,
)
attrs["coordinates"] = " ".join(map(str, coord_names))

if "coordinates" in encoding:
warnings.warn(
f"'coordinates' attribute detected on variable {name!r}. This may prevent faithful roundtripping of xarray Datasets.",
UserWarning,
)
attrs["coordinates"] = encoding["coordinates"]
else:
if variable_coordinates[name]:
attrs["coordinates"] = " ".join(map(str, variable_coordinates[name]))

# These coordinates are not associated with any particular variables, so we
# save them under a global 'coordinates' attribute so xarray can roundtrip
# the dataset faithfully. Because this serialization goes beyond CF
# conventions, only do it if necessary.
# Reference discussion:
# http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/057771.html
# http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/007571.html
if global_coordinates:
attributes = dict(attributes)
if "coordinates" in attributes:
raise ValueError(
"cannot serialize coordinates because the global "
"attribute 'coordinates' already exists"
warnings.warn(
f"cannot serialize global coordinates {global_coordinates!r} because the global "
f"attribute 'coordinates' already exists. This may prevent faithful roundtripping"
f"of xarray datasets",
UserWarning,
)
attributes["coordinates"] = " ".join(map(str, global_coordinates))
else:
attributes["coordinates"] = " ".join(map(str, global_coordinates))

return variables, attributes

Expand Down
17 changes: 17 additions & 0 deletions xarray/tests/test_backends.py
Original file line number Diff line number Diff line change
Expand Up @@ -810,6 +810,23 @@ def equals_latlon(obj):
assert "coordinates" not in ds["lat"].attrs
assert "coordinates" not in ds["lon"].attrs

original["temp"].encoding["coordinates"] = "lat"
with pytest.warns(UserWarning, match="'coordinates' attribute"):
with self.roundtrip(original) as actual:
assert_identical(actual, original)
original["precip"].encoding["coordinates"] = "lat"
with create_tmp_file() as tmp_file:
with pytest.warns(UserWarning, match="'coordinates' attribute"):
original.to_netcdf(tmp_file)
with open_dataset(tmp_file, decode_coords=True) as ds:
# lon was explicitly excluded from coordinates so it should
# be a data variable
assert "lon" not in ds["temp"].encoding["coordinates"]
assert "lon" not in ds["precip"].encoding["coordinates"]
assert "lon" not in ds.coords
assert "coordinates" not in ds["lat"].encoding
assert "coordinates" not in ds["lon"].encoding

def test_roundtrip_endian(self):
ds = Dataset(
{
Expand Down
16 changes: 16 additions & 0 deletions xarray/tests/test_conventions.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ def test_missing_fillvalue(self):
with pytest.warns(Warning, match="floating point data as an integer"):
conventions.encode_cf_variable(v)

@pytest.mark.filterwarnings("ignore::UserWarning")
def test_multidimensional_coordinates(self):
# regression test for GH1763
# Set up test case with coordinates that have overlapping (but not
Expand Down Expand Up @@ -136,6 +137,21 @@ def test_multidimensional_coordinates(self):
# Should not have any global coordinates.
assert "coordinates" not in attrs

def test_do_not_overwrite_user_coordinates(self):
orig = Dataset(
coords={"x": [0, 1, 2], "y": ("x", [5, 6, 7]), "z": ("x", [8, 9, 10])},
data_vars={"a": ("x", [1, 2, 3]), "b": ("x", [3, 5, 6])},
)
orig["a"].encoding["coordinates"] = "y"
orig["b"].encoding["coordinates"] = "z"
with pytest.warns(UserWarning, match="'coordinates' attribute"):
enc, _ = conventions.encode_dataset_coordinates(orig)
assert enc["a"].attrs["coordinates"] == "y"
assert enc["b"].attrs["coordinates"] == "z"
orig["a"].attrs["coordinates"] = "foo"
with pytest.warns(UserWarning, match="'coordinates' found in attrs"):
conventions.encode_dataset_coordinates(orig)

@requires_dask
def test_string_object_warning(self):
original = Variable(("x",), np.array(["foo", "bar"], dtype=object)).chunk()
Expand Down