
[Bug]: .spatial.average(lat_bnds=(a,b)) does not appear to be working properly #494

Closed
AaronDonahue opened this issue Jun 2, 2023 · 27 comments · Fixed by #495
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@AaronDonahue commented Jun 2, 2023

What happened?

I am a new user of xCDAT, so this could definitely be user error. I have a dataset that I am loading on Perlmutter with this code:

# Load SCREAMv1 regridded files with the Surface related output
directory_prefix='/global/cfs/cdirs/e3sm/terai/SCREAM/v1_production'
f_Jan_SurfVars=xcdat.open_mfdataset(directory_prefix+'/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.20*nc', add_bounds=True, parallel=True)

and then I attempt to take a spatial average with latitude bounds. As a sanity check, I took the average for wildly different areas of the globe, expecting them to be different, but they end up matching exactly.
I noticed this when plotting the average for more realistic cases and seeing that the plots matched.

What did you expect to happen? Are there any possible answers you came across?

I expected the two different lat bounds to produce different answers.

Minimal Complete Verifiable Example (MVCE)

# Load SCREAMv1 regridded files with the Surface related output
directory_prefix='/global/cfs/cdirs/e3sm/terai/SCREAM/v1_production'
f_Jan_SurfVars=xcdat.open_mfdataset(directory_prefix+'/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.20*nc', add_bounds=True, parallel=True)

varname = "T_2m"
filespec = f_Jan_SurfVars
a = np.array(filespec.spatial.average(varname, lat_bounds=(0,10),axis=["X","Y"], weights=v1_weights)[varname].sel(time=slice('2020-01-20 00:00:00','2020-02-28 23:59:59')))
b = np.array(filespec.spatial.average(varname, lat_bounds=(80,90),axis=["X","Y"], weights=v1_weights)[varname].sel(time=slice('2020-01-20 00:00:00','2020-02-28 23:59:59')))
errmax = np.amax(np.abs(a-b))
print("%e, %e, %e" %(a[0],b[0], errmax))

produces:

2.852244e+02, 2.852244e+02, 0.000000e+00

Relevant log output

No response

Anything else we need to know?

No response

Environment

TBH I don't know how to do this; xcdat.show_versions() throws an error. I am using the e3sm_unified environment on NERSC Perlmutter, which I loaded with

source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh

xarray.show_versions() produces:

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.46_12.0.73-cray_shasta_c
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.10.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.10.0
distributed: 2022.10.0
matplotlib: 3.6.2
cartopy: 0.21.0
seaborn: 0.12.1
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: 0.20.1
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: 7.2.0
IPython: 8.7.0
sphinx: None
@AaronDonahue added the type: bug label Jun 2, 2023
@pochedls (Collaborator) commented Jun 2, 2023

@AaronDonahue – I don't have permissions to see these files – what does filespec[varname] look like? Is it on a lat/lon grid (spatial averaging in xcdat is for rectilinear grids)?

@crterai commented Jun 2, 2023

@pochedls - They were files I had created, so I've just updated the permissions, so you should be able to see them. They're regridded files, so they are on a rectilinear grid.

@pochedls (Collaborator) commented Jun 2, 2023

Okay – not a grid issue. I think I see the problem. @AaronDonahue is supplying both weights and lat_bounds. Note (emphasis mine, from the docs):

lat_bounds (Optional[RegionAxisBounds], optional) – A tuple of floats/ints for the regional latitude lower and upper boundaries. This arg is used when calculating axis weights, but is ignored if weights are supplied. The lower bound cannot be larger than the upper bound, by default None.

If weights are not provided, xcdat basically makes weights that conform to your region (so zero weight outside your region of interest, partial weight for grid cells partially in the domain, and full weight inside the region of interest).
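
For intuition, the latitude weights behave roughly like this (a toy, untested sketch with made-up bounds – not xcdat's actual implementation):

import numpy as np

# toy 1-D latitude cell bounds (degrees): four 10-degree cells
lat_bnds = np.array([[-20., -10.], [-10., 0.], [0., 10.], [10., 20.]])
region = (-5., 10.)  # hypothetical lat_bounds passed to the averager

# clip each cell to the region; latitude weight ~ difference of sin(lat)
lower = np.clip(lat_bnds[:, 0], region[0], region[1])
upper = np.clip(lat_bnds[:, 1], region[0], region[1])
weights = np.sin(np.deg2rad(upper)) - np.sin(np.deg2rad(lower))
# -> 0 for [-20,-10], partial for [-10,0], full for [0,10], 0 for [10,20]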

I am guessing you are providing your own spatial weights (or some other kind of weights), but you would like xcdat to update those weights to conform to a different region – is that right? Without going into detail, modifying user-provided weights could be a little tricky – I can explain if you're interested.

Can you just let xcdat calculate the weights (or are they not spatial weights)? Another alternative approach would be to subset the domain(s) and then average (though in that case I think you would basically just be using xarray's weighted mean functionality and there would be no spatial treatment for grid cells partially in your averaging domain).

@crterai – I'm still getting the following:

ls: cannot access '/global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/tput.scream.SurfVars.INSTANT.nmins_x15.20*nc': Permission denied

I think you need to use something like chmod a+rx /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/ and chmod -R a+r /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/*

@pochedls (Collaborator) commented Jun 2, 2023

One other option would be to use get_weights() to get area weights corresponding to the region of interest. You could then combine the spatial weights with your weights. I think something like this might work (untested):

area_weights = ds.spatial.get_weights(lat_bounds=(-10, 10))  # area weights conforming to the region
my_weights = ...  # your custom weights (broadcastable against area_weights)
weights = area_weights * my_weights  # combine spatial and custom weights
ds.spatial.average(varId, axis=["X", "Y"], weights=weights)
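
(I believe the averager normalizes by the sum of the weights, so only the relative values of the combined weights should matter, not their overall scale.)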

@crterai commented Jun 2, 2023

Thanks for the suggestion, @pochedls. Okay, so weights and lat_bounds don't work well together. I can try one of your suggested fixes.

And hmm, that's odd. Maybe the files need executable access as well...
This should fix it:

terai@perlmutter:login05:/pscratch/sd/t/terai/E3SM_code/v3_branch> ls -lah /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.2020-0*nc
-rwxr-xr-x 1 terai terai 626M Jan 25 11:39 /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.2020-01-20-00900.nc
-rwxr-xr-x 1 terai terai 626M Jan 25 11:39 /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.2020-01-21-00900.nc
-rwxr-xr-x 1 terai terai 626M Jan 25 11:39 /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.2020-01-22-00900.nc
-rwxr-xr-x 1 terai terai 626M Jan 25 11:39 /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.2020-01-23-00900.nc
-rwxr-xr-x 1 terai terai 626M Jan 25 11:39 /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.2020-01-24-00900.nc

and

ls -lah /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2
total 189G
drwxr-xr-x 4 terai terai  32K May 11 14:51 .
drwxr-xr-x 6 terai terai  16K Jan 25 11:39 ..
drwxr-xr-x 2 terai terai 4.0K Mar 22 14:02 1800x3600
-rw-r--r-- 1 terai terai 1.9G Jan 25 11:39 CAT_LW_flux_up_tom.2020-01-20_02-28.nc

@AaronDonahue (Author)

@pochedls, thanks for the help. I tried letting xcdat calculate the weights but ran into a separate issue:

area_weights = filespec.spatial.get_weights(lat_bounds=(-10, 10),axis=["X"])

produces the error:

xarray can't set arrays with multiple array indices to dask yet.

I get the same error if I use spatial.average as in the example above, or when passing axis=["X","Y"]. I did a quick Google search and found you hit the same error back in 2021: #115. Is something similar happening here too, do you think? Or is it user error? Pinging @tomvothecoder just in case he has insight, since he worked on #115 as well.

@AaronDonahue (Author)

Update: I think I've got it. When loading the data I'm using parallel=True, and I think that is causing this issue.

f_Jan_SurfVars=xcdat.open_mfdataset(directory_prefix+'/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.20*nc', add_bounds=True, parallel=True)

fails, but a different dataset, f_Oct_SurfVars, which I loaded without parallel=True, works.

f_Oct_SurfVars=xcdat.open_mfdataset(directory_prefix+'/ne1024pg2_ne1024pg2_40dayrun/output.scream.SurfVars.INSTANT.nmins_x15.20*nc', add_bounds=True)

@AaronDonahue (Author)

I actually think the above may be a red herring. If I remove parallel=True from how I load the f_Jan_SurfVars variable, I still get the error. I'm not sure what the difference is, although I can see that the chunk size for variables in the Oct dataset is 4, while for Jan it is 96. Could this be the issue?

@pochedls (Collaborator) commented Jun 5, 2023

I get Permission denied from ls /global/cfs/cdirs/e3sm/terai/SCREAM/v1_production/ne1024pg2_ne1024pg2_DY2/output.scream.SurfVars.INSTANT.nmins_x15.20*nc.

I'm happy to look at this, but if you could create a complete, minimal example (import, open dataset, do something, error message) with data that I can access (e.g., "give" me a file on the LC or something) that would be helpful.

@AaronDonahue (Author)

@crterai, could you give @pochedls permission to access the files above? @pochedls, I'll see what I can do about getting a simpler reproducer. I'm at a loss because I don't understand what difference between the Jan and Oct datasets causes the error, except that the Oct dataset has fewer time snapshots per file (4) than Jan (96).

@crterai commented Jun 5, 2023

Okay, it seems like somewhere in that directory tree access is denied to @pochedls, so I've copied the files over to a scratch directory:
/pscratch/sd/t/terai/EAMxx/shareWpochedls/output.scream.SurfVars.INSTANT.nmins_x15.2020-*nc
Can you try accessing the files there?

@pochedls (Collaborator) commented Jun 5, 2023

@crterai shared the data. The initial issue was that both weights and lat_bounds were passed into .spatial.average(), but that function ignores lat_bounds if custom weights are provided.

The next issue was that the longitude bounds wrap around the prime meridian in this dataset. This creates problems because weights are determined by the difference between the longitude bound values, and the prime meridian introduces a discontinuity (e.g., [359.6, 0.35] in this case). The logic to correct this does not work for multifile datasets, which led to the error xarray can't set arrays with multiple array indices to dask yet. (it came from a function _force_domain_order_low_to_high).
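
To put rough numbers on it (a quick numpy illustration with a hypothetical neighboring cell – not the actual xcdat code):

import numpy as np

# a normal cell vs. the wrapping cell in this dataset (degrees east)
normal_bnds = np.array([0.3515625, 1.0546875])  # hypothetical neighbor
wrap_bnds = np.array([359.6484375, 0.3515625])  # straddles the prime meridian

# weights come from the bound difference, so the wrapping cell looks
# ~500x too wide (~359.3 degrees instead of ~0.7)
normal_width = np.abs(np.diff(normal_bnds))  # ~0.703
wrap_width = np.abs(np.diff(wrap_bnds))      # ~359.297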

I also noticed that _force_domain_order_low_to_high could produce incorrect weights for datasets that wrap the prime meridian (this didn't occur here because of the above issues).

I've addressed the multifile dataset issue and the prime meridian issues in PR #495, which will hopefully be included in our next release.

@crterai commented Jun 5, 2023

Thanks for looking into this, @pochedls. Until the next release, what would you recommend we do for regional averages with the dataset we have? Should we not use the .spatial.average function and calculate it a different way?

@pochedls (Collaborator) commented Jun 6, 2023

I think you could subset your dataset and then average:

import xarray as xr

# open one file and subset to the latitude band of interest
fn = 'output.scream.SurfVars.INSTANT.nmins_x15.2020-01-31-00900.nc'
ds = xr.open_dataset(fn)
ds = ds.sel(lat=slice(-10, 10))

# area-weighted mean over the subset domain
area = ds.area
tas = ds.T_2m
tasw = tas.weighted(area)
tasmean = tasw.mean(("lat", "lon"))

This should be fine for large spatial averages, but note that plain subsetting gives no partial weighting to grid cells straddling the edges of the domain of interest, which matters more for small domains.

@AaronDonahue (Author)

Excellent, thanks for this, Stephen. I was able to use your snippet above to write my own mini spatial-averaging function, which I can use for now.

@pochedls (Collaborator) commented Jun 8, 2023

@AaronDonahue / @crterai / @chengzhuzhang / @tomvothecoder

I spent some time re-evaluating spatial averaging in light of this issue (see #500). This was useful for understanding why xCDAT sometimes gets different spatial average values compared to CDAT (in short, the differences are small, and I think xcdat is doing the right thing when differences do exist).

The initial error that @AaronDonahue got (xarray can't set arrays with multiple array indices to dask yet.) was easy to resolve, but in fixing it I noticed another problem. This dataset has a longitude bound spanning 359.6484375 : 0.3515625. This kind of bound is more commonly written as -0.3515625 : 0.3515625. Since the weights are calculated as the difference between the bounds (i.e., 359.6484375 - 0.3515625 instead of 0.3515625 - -0.3515625), the corresponding longitude value gets far too much weight in the spatial averaging routine (yielding erroneous spatial average values).
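
The correction is conceptually simple (a sketch of the idea, not the PR's actual code):

import numpy as np

lon_bnds = np.array([359.6484375, 0.3515625])

# rewrite the lower bound as negative so the bounds increase monotonically
if lon_bnds[0] > lon_bnds[1]:
    lon_bnds[0] -= 360.0  # -> [-0.3515625, 0.3515625]

width = lon_bnds[1] - lon_bnds[0]  # 0.703125 degrees, as expected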

This seems to be an uncommon issue:

  • It only affects datasets with grid cells spanning the prime meridian.
  • CMIP data does not seem to be affected (a longitude bound spanning 359 : 1 is written as -1 : 1 in the CMIP data I looked at).
  • If longitude bounds are missing, xcdat calculates increasing bounds (so this wouldn't be a problem if bounds are missing in the dataset).
  • E3SM 1-degree data does not seem to have bounds that span the prime meridian (I think the standard data has bounds corresponding to 359 : 360 and 0 : 1).
  • If the user opened the dataset as a multifile dataset, they would have hit the error that @AaronDonahue hit (before getting to an incorrect spatial average).

In general, I think this tends to be rare because bounds are supposed to increase (see Fig. 7.1). My PR catches and addresses this situation when it occurs.

I wanted to raise this and explain the bug that I found (and the corresponding PR); if you think other users may have hit this issue, or if you have datasets that I could test with this PR, let me know.

@chengzhuzhang (Collaborator) commented Jun 9, 2023

@pochedls thank you very much for your effort documenting this uncommon case and providing a fix. The provided file from SCREAM (/pscratch/sd/t/terai/EAMxx/shareWpochedls/output.scream.SurfVars.INSTANT.nmins_x15.2020-*nc) was processed with ncremap, and I think it is worthwhile to raise this with @czender so that Charlie is aware of this behavior in remapped SCREAM datasets and can advise whether we should change it, perhaps by updating the mapping file or ncremap.

I suspect this could affect some use cases of regridding as well.

@czender commented Jun 13, 2023

I'm trying to figure out if/what is being asked of me. My understanding is that a lon_bounds variable created by ncremap led to some issues because xCDAT expects cells that straddle the prime meridian to have negative and positive longitudes rather than longitudes in [0, 360]. Is this correct? And, further, that a PR for xCDAT has been submitted to resolve this issue? Is that correct? Is the suggestion here for ncremap to use the negative/positive longitude convention, or is this more of a helpful reminder to "be aware" that such tricky bounds variables exist and should be treated carefully? Please clarify so I don't drop the ball...

@chengzhuzhang (Collaborator)

Hi Charlie,

Thanks for following up!

I'm trying to figure out if/what is being asked of me. My understanding is that a lon_bounds variable created by ncremap led to some issues because xCDAT expects cells that straddle the prime meridian to have negative and positive longitudes rather than longitudes in [0, 360]. Is this correct? And, further, that a PR for xCDAT has been submitted to resolve this issue? Is that correct?

Yes to both

Is the suggestion here for ncremap to use the negative/positive longitude convention, or is this more of a helpful reminder to "be aware" that such tricky bounds variables exist and should be treated carefully? Please clarify so I don't drop the ball...

Based on the CF conventions doc (Fig. 7.1), the bounds are supposed to be increasing. Although the submitted PR resolves this particular issue, I think it is still worthwhile to switch to the negative/positive longitude convention for lon_bounds created by ncremap. The tricky bounds could impact other operations that rely on bounds.

@czender commented Jun 13, 2023

Thanks, Jill, for the pointer. I did not realize there was a CF convention exactly about this issue. Last question: where exactly is the dataset that ncremap created that includes the non-compliant lon_bounds variable? I need to look at its history to see how it was created in order to verify that I resolve the issue.

@chengzhuzhang (Collaborator)

I was not sure about the convention, but @pochedls found the section in the document. @crterai shared the files on NERSC: /pscratch/sd/t/terai/EAMxx/shareWpochedls/output.scream.SurfVars.INSTANT.nmins_x15.2020-*nc. Thank you for helping with this.

@pochedls (Collaborator)

@taylor13 – I wanted to see if you could review one aspect of this issue (@czender / @chengzhuzhang - FYI – I thought it would be useful for @taylor13 to weigh in here).

If I have a longitude boundary that spans across 0° longitude, e.g., [359, 1], should it be written as [359, 1] or [-1, 1]?

All of the CMIP data I have looked at (looping over historical tas data for all models) writes the bounds as the latter, [-1, 1], and CF conventions seem to suggest that bounds should be increasing. But longitude is also special (it is periodic), so maybe this doesn't apply?

I think this ultimately doesn't matter for xcdat (we now handle this case), but should data generally be written as [-1, 1] (or is either [-1, 1] or [359, 1] okay)?

@taylor13

CF requires coordinates to be monotonically increasing or decreasing, and the bounds on a 1-D coordinate should be consistent with this (so if the coordinate increases, the second bound should be greater than the first bound). This makes it simple to calculate the cell width. So in your example, [359, 1] is incorrect and [-1, 1] is correct. (I don't think CF generally is aware of cyclic coordinates, except maybe time conversions involving hours in a day and days in a year.)

In the special case of a single cell located at 0°E spanning [-1, 1], it would be legal to store bounds of either [-1, 1] or [1, -1], but I think [359, 1] would be interpreted as having a width of 358 degrees longitude.

pochedls added a commit that referenced this issue Jun 13, 2023
…on bounds span prime meridian (#495)

* Fix for #494 (and tested for robustness in #500)
* Updates spatial averaging logic to handle datasets with non-monotonic longitude bounds

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
@pochedls (Collaborator)

@crterai / @AaronDonahue – we produced a bug fix for this issue that will be incorporated into the next version of xcdat (not yet released). Could you check whether this issue is resolved in the candidate/pre-release (see this discussion)?

@czender commented Jul 12, 2023 via email

@crterai commented Jul 13, 2023

@pochedls - I can give it a try.

@tomvothecoder (Collaborator)

Thanks @crterai! I tagged you and @AaronDonahue on the related test case in our v0.6.0rc1 testing plan, which Steve also mentioned in his comment above.
