drop spatial bounds coordinates if present in cmip6 cleaning #177
I haven't tested this in an actual workflow; I only checked it in a notebook. I was hoping to merge this into master to be able to test it in a workflow. @brews, could you have a look? Let me know if you see any problem with this change. Thank you!
@@ -357,6 +357,12 @@ def standardize_gcm(ds, leapday_removal=True):
        coords_to_drop, drop=True
    )

    # Some models have coordinates values (e.g. [1.0, 2.0]) for the spatial bounds dimension. We don't need this.
    if "bnds" in ds_cleaned.variables:
        ds_cleaned = ds_cleaned.drop(
- ds_cleaned = ds_cleaned.drop(
+ ds_cleaned = ds_cleaned.drop_vars(
I think .drop() was lightly deprecated in favor of the more specific .drop_vars() or .drop_sel() back in pydata/xarray#3475.
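For reference, here is a minimal sketch of how the two more specific methods differ. The toy dataset and the variable names (tas, height) are made up for illustration and are not taken from the cleaning code in this PR.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset, only meant to contrast the two methods.
ds = xr.Dataset(
    {"tas": (("time", "lat"), np.zeros((3, 2)))},
    coords={"time": [0, 1, 2], "lat": [-45.0, 45.0], "height": 2.0},
)

# drop_vars removes whole variables or coordinates by name.
without_height = ds.drop_vars("height")

# drop_sel removes index labels along a dimension, shrinking that dimension.
without_north = ds.drop_sel(lat=[45.0])
```

The change in this PR removes a whole coordinate variable, so .drop_vars() looks like the matching call.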
...Interesting, because xarray's error message says to use .drop():
ValueError: when setting region explicitly in to_zarr(), all variables in the dataset to write must have at least one dimension in common with the region's dimensions ['time'], but that is not the case for some variables here. To drop these variables from this dataset before exporting to zarr, write: .drop(['bnds'])
Might be worth a PR to xarray updating this message.
Thanks for making progress on this, @emileten! As I look at this, I see there might actually be a better way around this back in the regridding workflow template... let me try something and I'll follow up here or in ClimateImpactLab/downscaleCMIP6#494.
I'm going to close this because I think we resolved this problem elsewhere. Please reopen or file a new issue if still needed.
The presence of coordinate values associated with the spatial bounds dimension can cause errors downstream, in particular during distributed regridding. The issue linked above has an example of this kind of problem.
Since the libraries we use do not need this information (they have been working correctly on datasets that lack these coordinates), I decided to simply drop it during cleaning.
This only drops the coordinate, not the dimension, which we still need.
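A minimal sketch of the intended effect, using a made-up dataset (the lat_bnds variable and its values are assumptions for illustration, not output from a real CMIP6 model). It uses .drop_vars() as suggested in review rather than .drop().

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: "bnds" is a size-2 dimension that also carries
# coordinate values, as some models do.
ds = xr.Dataset(
    data_vars={
        "lat_bnds": (("lat", "bnds"), np.array([[-90.0, 0.0], [0.0, 90.0]])),
    },
    coords={
        "lat": [-45.0, 45.0],
        "bnds": [1.0, 2.0],
    },
)

# Same guard as in the diff: only act if the coordinate variable exists.
if "bnds" in ds.variables:
    ds_cleaned = ds.drop_vars("bnds")
else:
    ds_cleaned = ds

# The coordinate variable is gone, but the dimension is still there because
# lat_bnds is defined along it.
print("bnds" in ds_cleaned.variables)
print(ds_cleaned.sizes["bnds"])
```

After the drop, lat_bnds keeps its shape and "bnds" remains a dimension without attached coordinate values.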