conflicting values for variable 'lat_bnds' #162
The time dimension gets concatenated to non-time variables like lat_bnds (shown below) when multiple files are opened with open_mfdataset() without data_vars="minimal".
The possible solution, in addition to data_vars="minimal", is to also pass coords="minimal" and compat="override", as demonstrated in the code below.
# flake8: noqa F401
#%%
import xarray as xr
import xcdat
#%%
p = "/p/css03/scratch/cmip6/CMIP/NCC/NorCPM1/historical/r10i1p1f1/Amon/tas/gn/v20190914/*nc"
#%%
# xarray without data_vars="minimal"
# -------------------------------------
# This concatenates the time dimension to non-time variables
# (undesirable behavior).
ds_xr_no_min = xr.open_mfdataset(p)
ds_xr_no_min.lat_bnds.shape # (2160, 142, 2)
#%%
# xcdat with only data_vars="minimal"
# -------------------------------------
# MergeError: conflicting values for variable 'lat_bnds' on objects to be
# combined. You can skip this check by specifying compat='override'.
ds_xcdat_no_min = xcdat.open_mfdataset(p)
#%%
# xarray with all correct settings
# -------------------------------------
# Does not concatenate the time dimension, but an outer join is performed on
# slightly mismatching coordinate values, resulting in an increase in the
# number of latitude coordinate points (undesirable behavior).
ds_xr_settings = xr.open_mfdataset(
    p, data_vars="minimal", coords="minimal", compat="override"
)
ds_xr_settings.lat_bnds.shape # (142, 2)
#%%
# xcdat with all correct settings
# -------------------------------------
# Does not concatenate the time dimension, but an outer join is performed on
# slightly mismatching coordinate values, resulting in an increase in the
# number of latitude coordinate points (undesirable behavior).
ds_xcdat_settings = xcdat.open_mfdataset(p, coords="minimal", compat="override")
ds_xcdat_settings.lat_bnds.shape # (142, 2)
After resolving the concatenation of the time dimension and the data variable compatibility issues, I still noticed NaN values in lat_bnds and an increase in the number of latitude coordinate points.
I think NaN values are being produced because there are very small differences in lat_bnds values, as you mentioned (also shown in the code example below), and an outer join (union of object indexes) is performed by default. These NaN values subsequently increase the size of lat_bnds from (96, 2) to (142, 2) after calling open_mfdataset(). Comparing shapes of lat_bnds between datasets and performing a floating point comparison:
# flake8: noqa F401
#%%
import numpy as np
import xarray as xr
import xcdat
path = "/p/css03/scratch/cmip6/CMIP/NCC/NorCPM1/historical/r10i1p1f1/Amon/tas/gn/v20190914/"
p = f"{path}*nc"
#%%
# 1. Check xcdat with all correct settings
# -------------------------------------
# Does not concatenate the time dimension, but an outer join is performed on
# slightly mismatching coordinate values, resulting in an increase in the
# number of latitude coordinate points (undesirable behavior).
ds_mf = xcdat.open_mfdataset(p, coords="minimal", compat="override")
ds_mf.lat_bnds.shape # (142, 2)
# Check for nans
nan_indices = np.where(np.isnan(ds_mf.lat_bnds[:, :].values))[0]
nan_indices.size # 92
#%%
# 2. Check latitude sizes of individual files
# ------------------------------------------
# Make sure that the sizes of the latitude bounds are aligned
ds1 = xr.open_dataset(
    f"{path}tas_Amon_NorCPM1_historical_r10i1p1f1_gn_185001-201412.nc"
)
ds2 = xr.open_dataset(
    f"{path}tas_Amon_NorCPM1_historical_r10i1p1f1_gn_201501-201812.nc"
)
ds3 = xr.open_dataset(
    f"{path}tas_Amon_NorCPM1_historical_r10i1p1f1_gn_201901-202912.nc"
)
ds1.lat_bnds.shape # (96, 2)
ds2.lat_bnds.shape # (96, 2)
ds3.lat_bnds.shape # (96, 2)
#%%
# 3. Check for floating point differences between files
# --------------------------------------------------
np.testing.assert_allclose(ds1.lat_bnds, ds2.lat_bnds)
"""
Not equal to tolerance rtol=1e-07, atol=0
Mismatched elements: 2 / 192 (1.04%)
Max absolute difference: 2.84217094e-14
Max relative difference: 1.17190208e-15
x: array([[-9.000000e+01, -8.905263e+01],
[-8.905263e+01, -8.715789e+01],
[-8.715789e+01, -8.526316e+01],...
y: array([[-90. , -89.052632],
[-89.052632, -87.157895],
[-87.157895, -85.263158],.
"""
np.testing.assert_allclose(ds2.lat_bnds, ds3.lat_bnds)  # passes, no AssertionError
np.testing.assert_allclose(ds1.lat_bnds, ds3.lat_bnds)
"""
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Mismatched elements: 2 / 192 (1.04%)
Max absolute difference: 2.84217094e-14
Max relative difference: 1.17190208e-15
x: array([[-9.000000e+01, -8.905263e+01],
[-8.905263e+01, -8.715789e+01],
[-8.715789e+01, -8.526316e+01],...
y: array([[-90. , -89.052632],
[-89.052632, -87.157895],
[-87.157895, -85.263158],...
""" Comparing #%%
# 4. Use different joins to avoid concatenating additional coordinate points
# ------------------------------------------------------------------------
# a. Outer join (default)
# ~~~~~~~~~~~~~~~~~~~~~~~
# use the union of object indexes, produces nans if there are floating point
# diffs between values
ds_outer = xcdat.open_mfdataset(
    p, data_vars="minimal", coords="minimal", compat="override", join="outer"
)
ds_outer.lat_bnds.shape # (142, 2)
nan_indices = np.where(np.isnan(ds_outer.lat_bnds[:, :].values))[0]
nan_indices.size # 92
#%%
# b. Left join
# ~~~~~~~~~~~~~~~~
# use indexes from the first object with each dimension
ds_left = xcdat.open_mfdataset(
    p, data_vars="minimal", coords="minimal", compat="override", join="left"
)
ds_left.lat_bnds.shape # (96, 2)
nan_indices = np.where(np.isnan(ds_left.lat_bnds[:, :].values))[0]
nan_indices.size # 0
#%%
# c. Override join
# ~~~~~~~~~~~~~~~~~
# if indexes are of same size, rewrite indexes to be those of the first object
# with that dimension. Indexes for the same dimension must have the same size in
# all objects.
ds_override = xcdat.open_mfdataset(
    p, data_vars="minimal", coords="minimal", compat="override", join="override"
)
ds_override.lat_bnds.shape # (96, 2)
nan_indices = np.where(np.isnan(ds_override.lat_bnds[:, :].values))[0]
nan_indices.size # 0
# %%
ds_left.lat_bnds.identical(ds_override.lat_bnds) # True
# Conclusion -- Use data_vars="minimal", coords="minimal", compat="override",
# and join="left" or "override" if datasets have conflicting bounds values.
The possible solutions I found so far are to pass join="left" or join="override" alongside data_vars="minimal", coords="minimal", and compat="override".
Based on my findings above, I think we should provide those two options in the docs/docstring for cases where Datasets have conflicting values. Since xarray/xcdat provides keyword arguments to handle this edge case, we should probably avoid implementing code to try to handle it.
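A sketch of what such a docstring note might look like (the wording is hypothetical, not from xcdat):
# Hypothetical note for the xcdat.open_mfdataset docstring:
# If the input files have conflicting non-time values (e.g., tiny floating
# point differences in lat_bnds), pass coords="minimal", compat="override",
# and join="override" (or join="left") to avoid NaNs and inflated coordinate
# sizes in the merged Dataset.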
This seems like a reasonable solution. One slightly more complex way of handling this situation would be to try to detect this problem with a pre-processor function, which could issue a warning or raise an exception. I wasn't sure if the pre-processor functions can communicate information between files (e.g., to compare bounds across files) or if they must act independently on each netCDF file (and thus cannot compare bounds across files).
Thanks, I'll add documentation to cover this situation. The preprocessing function is performed on each file independently before the files are merged together into a single Dataset, so there is no communication between files.
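For reference, a minimal sketch of such a per-file pre-processor (the function name and the rounding approach are assumptions, not part of xcdat): since it receives one Dataset per file, it can only normalize bounds within that file, e.g., by rounding away ~1e-13 float noise before the merge, passed through the standard preprocess keyword of open_mfdataset:
import xarray as xr
def round_lat_bnds(ds: xr.Dataset) -> xr.Dataset:
    # Hypothetical per-file pre-processor: round latitude values and bounds
    # so ~1e-13 floating point noise cannot produce mismatched indexes.
    # It sees one file at a time, so it cannot compare bounds across files.
    for var in ("lat", "lat_bnds"):
        if var in ds:
            ds[var] = ds[var].round(9)
    return ds
# Assuming xcdat.open_mfdataset forwards preprocess to xarray:
# ds = xcdat.open_mfdataset(p, preprocess=round_lat_bnds)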
What versions of software are you using?
What are the steps to reproduce this issue?
What happens? Any logs, error output, etc?
I think this is happening because the lat_bnds values have slight (e.g., ~1e-13) differences in the different netCDF files.
Any other comments?
Opening with xarray works fine (as long as you do not set data_vars="minimal"), though the lat_bnds is larger than expected. It appears that the bounds differ depending on timestep.
I'm not totally sure why there are NaN values. They don't appear when I use ncdump -v lat_bnds /p/css03/scratch/cmip6/CMIP/NCC/NorCPM1/historical/r10i1p1f1/Amon/tas/gn/v20190914/tas_Amon_NorCPM1_historical_r10i1p1f1_gn_185001-201412.nc, but they do appear after loading the dataset with xarray.
I do not know what to do with this problem, though CDAT seems to be able to load the bounds with no problem.
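A quick way to confirm that the NaNs come from the multi-file alignment rather than from the files themselves, mirroring the ncdump check above (a sketch; the expected False follows from the ncdump output):
import numpy as np
import xarray as xr
# Open a single file directly: no cross-file alignment happens here, so the
# raw lat_bnds should contain no NaNs, matching what ncdump shows.
ds = xr.open_dataset(
    "/p/css03/scratch/cmip6/CMIP/NCC/NorCPM1/historical/r10i1p1f1/Amon/tas/"
    "gn/v20190914/tas_Amon_NorCPM1_historical_r10i1p1f1_gn_185001-201412.nc"
)
np.isnan(ds.lat_bnds.values).any()  # expected: False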