Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect user-specified coordinates attribute. #3487

Merged
merged 23 commits into from
Dec 10, 2019
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
cd8f727
Respect user-specified coordinates attribute.
dcherian Oct 4, 2019
59a87f1
Add whats-new
dcherian Nov 5, 2019
b80bf20
Better if statement.
dcherian Nov 5, 2019
4ca804d
maybe it's better to not raise an error if "coordinates" in attrs.
dcherian Nov 5, 2019
7a3cdd7
tweak whats-new.
dcherian Nov 5, 2019
acdfdcd
more thorough test.
dcherian Nov 6, 2019
c595795
Merge branch 'master' into fix/user-coordinates
Nov 6, 2019
ab0c67d
Emit one "coordinates" warning per dataset, instead of one per variable.
dcherian Nov 6, 2019
46de4b7
Preserve attrs["coordinates"] when roundtripping with decode_coords=F…
dcherian Nov 6, 2019
2279c36
Avoid raising warnings.
dcherian Nov 7, 2019
d49ceef
Merge remote-tracking branch 'upstream/master' into fix/user-coordinates
dcherian Nov 12, 2019
bb62590
Merge branch 'fix/user-coordinates' of github.com:dcherian/xarray int…
dcherian Nov 12, 2019
0abe242
fix whats-new
dcherian Nov 12, 2019
1a42c6c
[minor] add comments
dcherian Nov 14, 2019
e1a3823
Merge remote-tracking branch 'upstream/master' into fix/user-coordinates
dcherian Nov 14, 2019
1c97c1e
fix whats-new
dcherian Nov 14, 2019
5773e5e
Actually test global "coordinates" handling.
dcherian Nov 14, 2019
670f97f
filerwarning not necessary.
dcherian Nov 14, 2019
e1c48c9
Merge branch 'master' into fix/user-coordinates
dcherian Nov 19, 2019
77c7c0f
Merge remote-tracking branch 'upstream/master' into fix/user-coordinates
dcherian Dec 5, 2019
3ca669b
Add comment
dcherian Dec 5, 2019
d5d3734
Merge branch 'fix/user-coordinates' of github.com:dcherian/xarray int…
dcherian Dec 5, 2019
8c526fd
fix whats-new
dcherian Dec 5, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions doc/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -440,6 +440,17 @@ supported by netCDF4-python: 'standard', 'gregorian', 'proleptic_gregorian' 'nol
By default, xarray uses the 'proleptic_gregorian' calendar and units of the smallest time
difference between values, with a reference time of the first time value.

Coordinates
...........

You can control the ``coordinates`` attribute written to disk by specifying ``encoding["coordinates"]``.
dcherian marked this conversation as resolved.
Show resolved Hide resolved
If not specified, xarray automatically sets the ``"coordinates"`` attribute to a space-delimited list
of names of coordinate variables that share dimensions with the variable.
This allows perfect roundtripping of xarray datasets but may not be desirable.
When an xarray dataset contains non-dimensional coordinates that do not share dimensions with any of
the variables, these coordinate variable names are saved under a global ``"coordinates"`` attribute.
This is not CF-compliant but again facilitates roundtripping of xarray datasets.

Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~

Expand Down
2 changes: 2 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,8 @@ New Features
invoked. (:issue:`3378`, :pull:`3446`)
By `Deepak Cherian <https://github.com/dcherian>`_ and
`Guido Imperiale <https://github.com/crusaderky>`_.
- xarray now respects the ``encoding["coordinates"]`` attribute when writing to disk. (:issue:`3351`, :pull:`3487`)
By `Deepak Cherian <https://github.com/dcherian>`_.

Bug fixes
~~~~~~~~~
Expand Down
37 changes: 24 additions & 13 deletions xarray/conventions.py
Original file line number Diff line number Diff line change
Expand Up @@ -663,31 +663,42 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
global_coordinates.discard(coord_name)

variables = {k: v.copy(deep=False) for k, v in variables.items()}

# These coordinates are saved according to CF conventions
for var_name, coord_names in variable_coordinates.items():
attrs = variables[var_name].attrs
for name, var in variables.items():
encoding = var.encoding
attrs = var.attrs
if "coordinates" in attrs:
raise ValueError(
"cannot serialize coordinates because variable "
"%s already has an attribute 'coordinates'" % var_name
warnings.warn(
f"'coordinates' found in attrs for variable {name!r}. This will be ignored. Instead please specify the 'coordinates' string in encoding.",
UserWarning,
)
attrs["coordinates"] = " ".join(map(str, coord_names))

if "coordinates" in encoding:
warnings.warn(
f"'coordinates' attribute detected on variable {name!r}. This may prevent faithful roundtripping of xarray Datasets.",
UserWarning,
)
attrs["coordinates"] = encoding["coordinates"]
else:
if variable_coordinates[name]:
attrs["coordinates"] = " ".join(map(str, variable_coordinates[name]))

# These coordinates are not associated with any particular variables, so we
# save them under a global 'coordinates' attribute so xarray can roundtrip
# the dataset faithfully. Because this serialization goes beyond CF
# conventions, only do it if necessary.
# Reference discussion:
# http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/057771.html
# http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/007571.html
if global_coordinates:
attributes = dict(attributes)
if "coordinates" in attributes:
raise ValueError(
"cannot serialize coordinates because the global "
"attribute 'coordinates' already exists"
warnings.warn(
f"cannot serialize global coordinates {global_coordinates!r} because the global "
f"attribute 'coordinates' already exists. This may prevent faithful roundtripping"
f"of xarray datasets",
UserWarning,
)
attributes["coordinates"] = " ".join(map(str, global_coordinates))
else:
attributes["coordinates"] = " ".join(map(str, global_coordinates))

return variables, attributes

Expand Down
11 changes: 11 additions & 0 deletions xarray/tests/test_backends.py
Original file line number Diff line number Diff line change
Expand Up @@ -810,6 +810,17 @@ def equals_latlon(obj):
assert "coordinates" not in ds["lat"].attrs
assert "coordinates" not in ds["lon"].attrs

original["temp"].encoding["coordinates"] = "lat"
with self.roundtrip(original) as actual:
assert_identical(actual, original)
with create_tmp_file() as tmp_file:
original.to_netcdf(tmp_file)
with open_dataset(tmp_file, decode_coords=False) as ds:
assert equals_latlon(ds.precip.attrs["coordinates"])
assert "lon" not in ds["temp"].attrs["coordinates"]
assert "coordinates" not in ds["lat"].attrs
assert "coordinates" not in ds["lon"].attrs
dcherian marked this conversation as resolved.
Show resolved Hide resolved

def test_roundtrip_endian(self):
ds = Dataset(
{
Expand Down
15 changes: 15 additions & 0 deletions xarray/tests/test_conventions.py
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,21 @@ def test_multidimensional_coordinates(self):
# Should not have any global coordinates.
assert "coordinates" not in attrs

def test_do_not_overwrite_user_coordinates(self):
orig = Dataset(
coords={"x": [0, 1, 2], "y": ("x", [5, 6, 7]), "z": ("x", [8, 9, 10])},
data_vars={"a": ("x", [1, 2, 3]), "b": ("x", [3, 5, 6])},
)
orig["a"].encoding["coordinates"] = "y"
orig["b"].encoding["coordinates"] = "z"
with pytest.warns(UserWarning, match="'coordinates' attribute"):
enc, _ = conventions.encode_dataset_coordinates(orig)
assert enc["a"].attrs["coordinates"] == "y"
assert enc["b"].attrs["coordinates"] == "z"
orig["a"].attrs["coordinates"] = "foo"
with pytest.warns(UserWarning, match="'coordinates' found in attrs"):
conventions.encode_dataset_coordinates(orig)

@requires_dask
def test_string_object_warning(self):
original = Variable(("x",), np.array(["foo", "bar"], dtype=object)).chunk()
Expand Down