Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CF accessor adds grid mapping as coordinate #513

Open
juseg opened this issue May 31, 2024 · 5 comments
Open

CF accessor adds grid mapping as coordinate #513

juseg opened this issue May 31, 2024 · 5 comments

Comments

@juseg
Copy link

juseg commented May 31, 2024

Possibly a duplicate of #357 or a discussion topic. Opinions welcomed.

Description

Since cf_xarray == 0.8.0 (#391), selecting variables by standard name with ds.cf[standard_name] results in a different set of coordinates than selecting them by their short name with ds[short_name]. This prevents re-injecting these variables in a compatible (and even the original) dataset using Dataset.assign() (and possibly other xarray merge operations). Instead trying to Dataset.assign() a cf-picked, georeferenced variable into a raises a MergeError.

Could you help me to understand if this is a design choice or a bug? In my case it comes as a breaking change and I am not exactly sure how to go about (as a cf-xarray fix or downstream in juseg/hyoga#73). Happy to contribute as I can.

Miminal example

import xarray as xr
import cf_xarray

# make grid-mapped fake data (or use xr.open_dataset)
ds = xr.Dataset({
    'tas': ((), None, {
        'grid_mapping': 'crs',
        'standard_name': 'air_temperature'}),
    'crs': ((), None, {
        'grid_mapping_name': 'latitude_longitude'})})

# ds and ds.cf variables have different coordinates
print('crs' in ds['tas'].coords)  # False
print('crs' in ds.cf['air_temperature'].coords)  # True

# assign a cf-picked variable (only works on cf_xarray < 0.8.0)
ds.assign(new=ds.cf['air_temperature'])  # xarray.core.merge.MergeError

Error message

Traceback (most recent call last):
  File "/home/julien/git/code/hyoga/bugs/cfxr08.py", line 17, in <module>
    ds.assign(new=ds.cf['air_temperature'])  # xarray.core.merge.MergeError
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/xarray/core/dataset.py", line 6080, in assign
    data.update(results)
  File "/usr/lib/python3.12/site-packages/xarray/core/dataset.py", line 4946, in update
    merge_result = dataset_update_method(self, other)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/xarray/core/merge.py", line 1104, in dataset_update_method
    return merge_core(
           ^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/xarray/core/merge.py", line 772, in merge_core
    raise MergeError(
xarray.core.merge.MergeError: unable to determine if these variables should be coordinates or not in the merged result: {'crs'}
@dcherian
Copy link
Contributor

dcherian commented May 31, 2024

This is a design choice :) .

Many users complain that xarray's default behaviour isn't what cf-xarray currently does.

Some solutions:

  1. Extract the variable name manually: ds[*ds.cf.standard_names['air_temperature']]
  2. Use the underlying variable object: ds.assign(a=ds.cf['air_temperature'].variable)

xarray.core.merge.MergeError: unable to determine if these variables should be coordinates or not in the merged result: {'crs'}

I also think this should just set crs as a coord. Not sure if there is an xarray issue for this.

@juseg
Copy link
Author

juseg commented Jun 4, 2024

This is a design choice :) .

Thank you for the clarification and code samples! :-) I will settle on 2. I think 1 needs double brackets:

ds[[*ds.cf.standard_names['air_temperature']]]

I also think this should just set crs as a coord. Not sure if there is an xarray issue for this.

I could not find an xarray issue on that (https://github.com/pydata/xarray/issues?q=is%3Aissue+mapping+MergeError). However I am still unconvinced whether grid mappings should be considered coordinates (see https://docs.xarray.dev/en/stable/user-guide/terminology.html#term-Coordinate).

Perhaps that is far-fetched, but would you consider a cf-xarray re-implementation of assign? I think it could assign by standard name, and overcome any xarray incompatibility.

ds.cf.assign(air_temperature=darray)

I do something like that at https://github.com/juseg/hyoga/blob/v0.3.0/hyoga/core/accessor.py#L134-L186.

@dcherian
Copy link
Contributor

dcherian commented Jun 4, 2024

could not find an xarray issue on that
I am still unconvinced whether grid mappings should be considered coordinates

pydata/xarray#6447 . In the xarray model "coordinates" is the only way to link variables to each other. That's why I think they should be... Either way, I think this merge should succeed, whether the variable is promoted to coordinates is a somewhat arbitrary implementation detail.

Perhaps that is far-fetched, but would you consider a cf-xarray re-implementation of assign?

Not far fetched at all! I'd be happy to merge a PR adding this.

@juseg
Copy link
Author

juseg commented Jun 5, 2024

OK I understand your point with the coordinates now. And great, I will be working on an assign PR then!

@dcherian
Copy link
Contributor

dcherian commented Jun 5, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants