
[Bug]: spatial average error following temporal resample #583

Closed
arfriedman opened this issue Jan 2, 2024 · 6 comments

What happened?

I am encountering an error when taking an xcdat spatial average following xarray temporal resampling.

I am working with monthly model precipitation output named pr_mon:

In [7]: pr_mon
Out[7]:
<xarray.Dataset>
Dimensions:   (lon: 192, lat: 96, time: 7056, bnds: 2)
Coordinates:
  * lon       (lon) float64 -180.0 -178.1 -176.2 -174.4 ... 174.4 176.2 178.1
  * lat       (lat) float64 88.57 86.72 84.86 83.0 ... -84.86 -86.72 -88.57
  * time      (time) object 1421-01-31 00:00:00 ... 2008-12-31 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lon_bnds  (lon, bnds) float64 -180.9 -179.1 -179.1 ... 177.2 177.2 179.1
    lat_bnds  (lat, bnds) float64 89.5 87.65 87.65 85.79 ... -87.65 -87.65 -89.5
    precip    (time, lat, lon) float64 13.27 13.14 13.05 ... 3.607 3.466 3.355

Taking a spatial average of this monthly variable works fine:
In [8]: pr_mon_avg = pr_mon.spatial.average("precip")

However, after I temporally resample the data using xarray, xcdat spatial averaging raises an error. Here is an example with a three-month temporal sum:


In [10]: pr_3M = pr_mon.resample(time="3ME").sum()
In [11]: pr_3M
Out[11]:
<xarray.Dataset>
Dimensions:   (lon: 192, lat: 96, time: 2353, bnds: 2)
Coordinates:
  * lon       (lon) float64 -180.0 -178.1 -176.2 -174.4 ... 174.4 176.2 178.1
  * lat       (lat) float64 88.57 86.72 84.86 83.0 ... -84.86 -86.72 -88.57
  * time      (time) object 1421-01-31 00:00:00 ... 2009-01-31 00:00:00

In [12]: pr_3M_avg = pr_3M.spatial.average("precip")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[12], line 1
----> 1 pr_3M_avg = pr_3M.spatial.average("precip")

File ~/miniconda3/envs/Nile-Laki/lib/python3.11/site-packages/xcdat/spatial.py:195, in SpatialAccessor.average(self, data_var, axis, weights, keep_weights, lat_bounds, lon_bounds)
    192 elif isinstance(weights, xr.DataArray):
    193     self._weights = weights
--> 195 self._validate_weights(dv, axis)
    196 ds[dv.name] = self._averager(dv, axis)
    198 if keep_weights:

File ~/miniconda3/envs/Nile-Laki/lib/python3.11/site-packages/xcdat/spatial.py:687, in SpatialAccessor._validate_weights(self, data_var, axis)
    685     dim_name = get_dim_keys(data_var, key)
    686     if dim_name not in self._weights.dims:
--> 687         raise KeyError(
    688             f"The weights DataArray does not include an {key} axis, or the "
    689             "dimension names are not the same."
    690         )
    692 # Check the weight dim sizes equal data var dim sizes.
    693 dim_sizes = {key: data_var.sizes[key] for key in self._weights.sizes.keys()}

KeyError: 'The weights DataArray does not include an X axis, or the dimension names are not the same.'

Do you know what might account for this error?

What did you expect to happen? Are there any possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

xCDAT version 0.6.1

In [13]: xr.show_versions()
/home/andrew/miniconda3/envs/Nile-Laki/lib/python3.11/site-packages/_distutils_hack/init.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-86-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.12.1
distributed: 2023.12.1
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2

@pochedls (Collaborator) commented Jan 2, 2024

Thanks for reporting this. In trying to see if something obvious was wrong, I ended up having a similar (but different) issue.

# grab a random file
wget http://esgf-data.ucar.edu/thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120/pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc

I get an issue with the resample command

# ESGF file
fn = 'pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc'
# or, locally, /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120/pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc
ds = xc.open_dataset(fn)
pr_3M = ds.pr.resample(time="3ME").sum()

ValueError: Invalid frequency string provided

but the following resample string works (and results in a different error message)

pr_3M = ds.resample(time="3M").sum()
pr_3M_avg = pr_3M.spatial.average('pr')

ValueError: More than one grid cell spans prime meridian.

@tomvothecoder (Collaborator) commented Jan 3, 2024

@arfriedman Can you provide us with a public link to the dataset so we can take a look at it? It looks like the precip variable is either missing the CF "axis" or "standard_name" attributes or they are incorrectly set.

I get an issue with the resample command

# ESGF file
fn = 'pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc'
# or, locally, /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120/pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc
ds = xc.open_dataset(fn)
pr_3M = ds.pr.resample(time="3ME").sum()

ValueError: Invalid frequency string provided

but the following resample string works (and results in a different error message)

pr_3M = ds.pr.resample(time="3ME").sum()
pr_3M_avg = pr_3M.spatial.average('pr')

ValueError: More than one grid cell spans prime meridian.

@pochedls I tried your code but it didn't work for me. The code below works for resampling, but I also get the ValueError for spatial averaging. I resample using "3M" and not "3ME".

import xcdat as xc

fn = "/p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120/pr_Amon_CESM2-FV2_historical_r1i1p1f1_gn_185001-189912.nc"

# Resampling works
ds = xc.open_dataset(fn)
ds = ds.resample(time="3M").sum()

# ValueError: More than one grid cell spans prime meridian.
ds_3m_avg = ds.spatial.average("pr")

@pochedls (Collaborator) commented Jan 3, 2024

Thanks @tomvothecoder – I pasted in the wrong code excerpt (I also used 3M) – this is fixed now. We're getting the same error.

Looking at this again, I think the longitude bounds get mangled by resample: they come back with a different shape (the latitude bounds are modified as well).

@arfriedman – Do your bounds also change? Is there a way to use resample while retaining the lon_bnds shape? A potential workaround in this case is to copy the original bounds back onto the dataset after resampling.

ds_resampled['lon_bnds'] = ds_original.lon_bnds
ds_resampled['lat_bnds'] = ds_original.lat_bnds

@tomvothecoder / @arfriedman – do you think this is an xarray issue / question?

original bounds

ds = xc.open_dataset(fn)
print(ds.lon)

<xarray.DataArray 'lon' (lon: 144)>
array([ 0. , 2.5, 5. , 7.5, 10. , 12.5, 15. , 17.5, 20. , 22.5,
25. , 27.5, 30. , 32.5, 35. , 37.5, 40. , 42.5, 45. , 47.5,
50. , 52.5, 55. , 57.5, 60. , 62.5, 65. , 67.5, 70. , 72.5,
75. , 77.5, 80. , 82.5, 85. , 87.5, 90. , 92.5, 95. , 97.5,
100. , 102.5, 105. , 107.5, 110. , 112.5, 115. , 117.5, 120. , 122.5,
125. , 127.5, 130. , 132.5, 135. , 137.5, 140. , 142.5, 145. , 147.5,
150. , 152.5, 155. , 157.5, 160. , 162.5, 165. , 167.5, 170. , 172.5,
175. , 177.5, 180. , 182.5, 185. , 187.5, 190. , 192.5, 195. , 197.5,
200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5,
225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5,
250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5,
275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5,
300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5,
325. , 327.5, 330. , 332.5, 335. , 337.5, 340. , 342.5, 345. , 347.5,
350. , 352.5, 355. , 357.5])
Coordinates:
  * lon      (lon) float64 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
    ...
print(ds.lon_bnds.values)

[[ -1.25 1.25]
[ 1.25 3.75]
...
[356.25 358.75]]

print(ds.lon_bnds.shape)

(144, 2)

Bounds after re-sample

ds2 = ds.resample(time="3M").sum()
print(np.allclose(ds.lon, ds2.lon))

True

print(ds2.lon_bnds.values)

[[[ -1.25 1.25]
[ 1.25 3.75]
[ 3.75 6.25]
...
[351.25 353.75]
[353.75 356.25]
[356.25 358.75]]

[[ -1.25 1.25]
[ 1.25 3.75]
[ 3.75 6.25]
...
[351.25 353.75]
[353.75 356.25]
[356.25 358.75]]
...

print(ds2.lon_bnds.shape)

(201, 144, 2)

@arfriedman (Author) commented Jan 4, 2024

Thanks @tomvothecoder and @pochedls.

@tomvothecoder here's a link to the model output -- apologies for the large file size:
https://unibe365-my.sharepoint.com/:u:/g/personal/andrew_friedman_unibe_ch/Ec0kAh1-AthCnUVrdH5oGvwBf2zXmzLH0kENqwPL5iqnuA?e=y1fqW4

@pochedls I confirm that spatial averaging works after applying your work-around of copying the original dataset lat and lon bounds. To me this does suggest that the issue is from xarray.

@tomvothecoder (Collaborator)

@pochedls @arfriedman

xarray.Dataset.resample() adds the time dimension to variables that did not originally have one. There is an Xarray issue on this topic that has been open since 2018; @shoyer's comment explains the undesired behavior and the possible fixes that would need to be implemented in the Xarray API.

If anybody is interested, reviving the thread sounds like a good idea since the last comment was from early 2022.

Example

Before resampling -- no time dimension on "lat_bnds" and "lon_bnds"

ds = xc.open_dataset(fn)
print(ds.data_vars)
"""
Data variables:
    pr         (time, lat, lon) float32 ...
    time_bnds  (time, nbnd) object ...
    lat_bnds   (lat, nbnd) float64 ...
    lon_bnds   (lon, nbnd) float64 ...
"""

After resampling -- time dimension added to "lat_bnds" and "lon_bnds" -> breaks spatial averaging

ds_3m = ds.resample(time="3M").sum()
"""
Data variables:
    pr        (time, lat, lon) float32 2.919e-06 2.918e-06 ... 4.918e-06
    lat_bnds  (time, lat, nbnd) float64 -90.95 -89.05 -89.05 ... 89.05 90.95
    lon_bnds  (time, lon, nbnd) float64 -1.25 1.25 1.25 ... 356.2 356.2 358.8
"""
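The same behavior can be reproduced without the ESGF file. Here is a minimal synthetic sketch (the variable names and frequency string are illustrative, not taken from the dataset in this thread) showing resample() broadcasting the time dimension onto a variable that did not have one:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset: "pr" depends on time, "lat_bnds" does not.
time = pd.date_range("2000-01-01", periods=6, freq="MS")
ds = xr.Dataset(
    data_vars={
        "pr": (("time", "lat"), np.ones((6, 3))),
        "lat_bnds": (("lat", "bnds"), np.zeros((3, 2))),
    },
    coords={"time": time, "lat": [0.0, 1.0, 2.0]},
)

# Resample to 3-month bins; "lat_bnds" should be untouched, but isn't.
ds_3m = ds.resample(time="3MS").sum()

print(ds.lat_bnds.dims)     # ('lat', 'bnds')
print(ds_3m.lat_bnds.dims)  # a 'time' dimension has been added
```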

Workaround

Use bounds from the original dataset

ds_3m["lat_bnds"] = ds.lat_bnds.copy()
ds_3m["lon_bnds"] = ds.lon_bnds.copy()
ds_3m_avg = ds_3m.spatial.average("pr")
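A more general variant of this workaround, sketched below on synthetic data (the names `pr`, `lat_bnds`, and `static` are illustrative), is to drop every data variable without a time dimension before resampling and merge them back afterward, so no bounds names need to be hard-coded:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the dataset discussed in this thread.
time = pd.date_range("2000-01-01", periods=6, freq="MS")
ds = xr.Dataset(
    data_vars={
        "pr": (("time", "lat"), np.ones((6, 3))),
        "lat_bnds": (("lat", "bnds"), np.zeros((3, 2))),
    },
    coords={"time": time, "lat": [0.0, 1.0, 2.0]},
)

# Resample only the time-dependent variables, then merge the static
# (bounds-like) variables back unchanged.
static = [name for name in ds.data_vars if "time" not in ds[name].dims]
ds_3m = ds.drop_vars(static).resample(time="3MS").sum()
ds_3m = ds_3m.merge(ds[static])

print(ds_3m.lat_bnds.dims)  # still ('lat', 'bnds')
```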

@tomvothecoder (Collaborator)

Closing this issue since it is rooted in Xarray.
