Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Additional dimension for lat_bnds and lon_bnds #390

Closed
ShihengDuan opened this issue Nov 15, 2022 · 6 comments
Closed

[Bug]: Additional dimension for lat_bnds and lon_bnds #390

ShihengDuan opened this issue Nov 15, 2022 · 6 comments
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@ShihengDuan
Copy link

What happened?

When I load data with xcdat, sometimes it generates additional dimensions for lat_bnds and lon_bnds. An example code is here:

MIROC_cmip_tas = xc.open_dataset('tas/tas_MIROC5_piControl.nc')
MIROC_cmip_tas = MIROC_cmip_tas.sel(time=slice("2100-01-01", "2619-01-01"))
MIROC_cmip_tas = MIROC_cmip_tas.drop_vars('height')
print(MIROC_cmip_tas['lat_bnds'])
print(MIROC_cmip_tas['lon_bnds'])

It generates the following output:

<xarray.DataArray 'lat_bnds' (time: 6228, lat: 132, bnds: 2)>
[1644192 values with dtype=float64]
Coordinates:
  * lat      (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
  * time     (time) object 2100-01-16 12:00:00 ... 2618-12-16 12:00:00
Dimensions without coordinates: bnds

For lon_bnds, an additional latitude dimension is generated.

<xarray.DataArray 'lon_bnds' (lat: 132, time: 6228, lon: 256, bnds: 2)>
[420913152 values with dtype=float64]
Coordinates:
  * lat      (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
  * lon      (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
  * time     (time) object 2100-01-16 12:00:00 ... 2618-12-16 12:00:00
Dimensions without coordinates: bnds

The situation changes for different files. Sometimes I got (time, lon, bnds) for lon_bnds.

The version of xCDAT is 0.4.0.

What did you expect to happen?

Keep only latitude/longitude and bnds dimensions for bnds variables.

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

xCDAT is 0.4.0.
xr.show_versions() yields the following

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.71.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.22.3
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.0
distributed: 2022.9.0
matplotlib: 3.5.2
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.2.2
conda: None
pytest: None
IPython: 8.6.0
sphinx: None
@ShihengDuan ShihengDuan added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Nov 15, 2022
@pochedls
Copy link
Collaborator

When I look at the what I think is the original version of this dataset, I get the following:

import xcdat as xc
fn = '/p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/piControl/mon/atmos/Amon/r1i1p1/tas/1/*.nc'
ds = xc.open_mfdataset(fn)

MergeError: conflicting values for variable 'lat_bnds' on objects to be combined. You can skip this check by specifying compat='override'.

See #162 for more on this.

I think this error arises when the lat_bnds values vary across files, which I suspect may be what is happening here. It looks like the data has been re-saved out in tas/tas_MIROC5_piControl.nc. What does an ncdump -h look like @ShihengDuan?

@ShihengDuan
Copy link
Author

Here is the output from ncdump -h:

(duan) /home/duan5/ISIMIP/MIROC5/piControl/tas$ ncdump -h tas_MIROC5_piControl.nc 
netcdf tas_MIROC5_piControl {
dimensions:
	lat = 132 ;
	lon = 256 ;
	time = 10440 ;
	bnds = 2 ;
variables:
	double lat(lat) ;
		lat:_FillValue = NaN ;
		lat:bounds = "lat_bnds" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
		lat:long_name = "latitude" ;
		lat:standard_name = "latitude" ;
	double lon(lon) ;
		lon:_FillValue = NaN ;
		lon:bounds = "lon_bnds" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
		lon:long_name = "longitude" ;
		lon:standard_name = "longitude" ;
	double time(time) ;
		time:_FillValue = NaN ;
		time:bounds = "time_bnds" ;
		time:axis = "T" ;
		time:long_name = "time" ;
		time:standard_name = "time" ;
		time:units = "days since 2000-01-01" ;
		time:calendar = "noleap" ;
	double time_bnds(lat, time, bnds) ;
		time_bnds:_FillValue = NaN ;
		time_bnds:coordinates = "height" ;
	double lat_bnds(time, lat, bnds) ;
		lat_bnds:_FillValue = NaN ;
		lat_bnds:coordinates = "height" ;
	double lon_bnds(lat, time, lon, bnds) ;
		lon_bnds:_FillValue = NaN ;
		lon_bnds:coordinates = "height" ;
	double height ;
		height:_FillValue = NaN ;
		height:units = "m" ;
		height:axis = "Z" ;
		height:positive = "up" ;
		height:long_name = "height" ;
		height:standard_name = "height" ;
	float tas(time, lat, lon) ;
		tas:_FillValue = 1.e+20f ;
		tas:standard_name = "air_temperature" ;
		tas:long_name = "Near-Surface Air Temperature" ;
		tas:units = "K" ;
		tas:original_name = "T2" ;
		tas:cell_methods = "time: mean" ;
		tas:cell_measures = "area: areacella" ;
		tas:history = "2011-09-15T06:51:00Z altered by CMOR: Treated scalar dimension: \'height\'. 2011-09-15T06:51:00Z altered by CMOR: replaced missing value flag (-999) with standard missing value (1e+20). 2011-09-15T06:51:00Z altered by CMOR: Inverted axis: lat." ;
		tas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_MIROC5_piControl_r0i0p0.nc areacella: areacella_fx_MIROC5_piControl_r0i0p0.nc" ;
		tas:coordinates = "height" ;
		tas:missing_value = 1.e+20f ;

// global attributes:
		:institution = "AORI (Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, Japan), NIES (National Institute for Environmental Studies, Ibaraki, Japan), JAMSTEC (Japan Agency for Marine-Earth Science and Technology, Kanagawa, Japan)" ;
		:institute_id = "MIROC" ;
		:experiment_id = "piControl" ;
		:source = "MIROC5 2010 atmosphere: MIROC-AGCM6 (T85L40); ocean: COCO (COCO4.5, 256x224 L50); sea ice: COCO (COCO4.5); land: MATSIRO (MATSIRO, L6); aerosols: SPRINTARS (SPRINTARS 5.00, T85L40)" ;
		:model_id = "MIROC5" ;
		:forcing = "N/A" ;
		:parent_experiment_id = "N/A" ;
		:parent_experiment_rip = "N/A" ;
		:branch_time = 0. ;
		:contact = "Masahiro Watanabe (hiro@aori.u-tokyo.ac.jp), Seita Emori (emori@nies.go.jp), Masayoshi Ishii (ism@jamstec.go.jp), Masahide Kimoto (kimoto@aori.u-tokyo.ac.jp)" ;
		:references = "Watanabe et al., 2010: Improved climate simulation by MIROC5: Mean states, variability, and climate sensitivity. J. Climate, 23, 6312-6335" ;
		:initialization_method = 1 ;
		:physics_version = 1 ;
		:tracking_id = "d5a62be5-f4e9-4e63-b63f-5fea30c77a52" ;
		:product = "output" ;
		:experiment = "pre-industrial control" ;
		:frequency = "mon" ;
		:creation_date = "2011-09-15T06:51:00Z" ;
		:history = "2011-09-15T06:51:00Z CMOR rewrote data to comply with CF standards and CMIP5 requirements." ;
		:Conventions = "CF-1.4" ;
		:project_id = "CMIP5" ;
		:table_id = "Table Amon (26 July 2011) 976b7fd1d9e1be31dddd28f5dc79b7a1" ;
		:title = "MIROC5 model output prepared for CMIP5 pre-industrial control" ;
		:parent_experiment = "N/A" ;
		:modeling_realm = "atmos" ;
		:realization = 1 ;
		:cmor_version = "2.7.1" ;

I concatenated all the piControl simulation with the following code:

files = glob.glob('data/tas_Amon_*.nc')
# print(sorted(files))
tas = xa.open_mfdataset(sorted(files))
print(tas)
print(np.unique(tas.time, return_counts=True))
print('Done')
print(len(np.unique(tas.time)))
print(len(tas.time))

tas.to_netcdf('tas_MIROC5_piControl.nc')

I forgot why I didn't do tas=tas.compute() before saving to nc files, but when I add this code, it still shows time dimension. See below:

(duan) /home/duan5/ISIMIP/MIROC5/piControl/tas$ python
Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xcdat as xc
>>> data = xc.open_dataset('tas_MIROC5_piControl_compute.nc')
>>> data['lat_bnds']
<xarray.DataArray 'lat_bnds' (time: 10440, lat: 132, bnds: 2)>
[2756160 values with dtype=float64]
Coordinates:
  * lat      (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
  * time     (time) object 2000-01-16 12:00:00 ... 2869-12-16 12:00:00
    height   float64 2.0
Dimensions without coordinates: bnds
>>> data = xc.open_dataset('tas_MIROC5_piControl.nc')
>>> data['lat_bnds']
<xarray.DataArray 'lat_bnds' (time: 10440, lat: 132, bnds: 2)>
[2756160 values with dtype=float64]
Coordinates:
  * lat      (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
  * time     (time) object 2000-01-16 12:00:00 ... 2869-12-16 12:00:00
    height   float64 2.0
Dimensions without coordinates: bnds
>>> 

tas_MIROC5_piControl.nc is the original file and tas_MIROC5_piControl_compute.nc is what I get when I do tas.compute() before saving to nc file.

@pochedls
Copy link
Collaborator

@ShihengDuan – I realize that when I go back to the original issue on this ticket, if I use xarray instead of xcdat when opening the dataset (or the original data), I think I have the same issue (lat_bnds has time as a dim). I think this may either be an issue with the underlying data that needs to be resolved or an issue with xarray (on which xcdat is based).

Do you agree?

@ShihengDuan
Copy link
Author

Oh yes, I didn't realize that. I guess it is related to how the original data file is formatted?

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Nov 15, 2022

I think this may either be an issue with the underlying data that needs to be resolved or an issue with xarray (on which xcdat is based).

Based on the error below, I believe this is an issue with the underlying data rather than with xarray.

import xcdat as xc
fn = '/p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/piControl/mon/atmos/Amon/r1i1p1/tas/1/*.nc'
ds = xc.open_mfdataset(fn)
MergeError: conflicting values for variable 'lat_bnds' on objects to be combined. You can skip this check by specifying compat='override'.

Here is info from the FAQs that covers this specific issue and a workaround with xarray:
https://xcdat.readthedocs.io/en/latest/faqs.html#how-do-i-open-a-multi-file-dataset-with-values-that-conflict

In xarray, the default setting for checking compatibility across a multi-file dataset is compat='no_conflicts'. If conflicting values exists between files, xarray raises MergeError: conflicting values for variable <VARIABLE NAME> on objects to be combined. You can skip this check by specifying compat="override".

If you still intend on working with these datasets and recognize the source of the issue (e.g., minor floating point diffs), follow the instructions below. Please understand the potential implications before proceeding!

The xcdat/xarray workaround is to configure open_dataset()/open_mfdataset() with these settings:

import xcdat as xc
fn = '/p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/piControl/mon/atmos/Amon/r1i1p1/tas/1/*.nc'

ds = xc.open_mfdataset(fn, compat="override", join="override", coords="minimal")

print(ds.lat_bnds)

# Notice how with `coords="minimal"`, time dimension is not being added to lat_bnds and lon_bnds
<xarray.DataArray 'lat_bnds' (lat: 128, bnds: 2)>
dask.array<open_dataset-752fd673e13445722f06a881808577f1lat_bnds, shape=(128, 2), dtype=float64, chunksize=(128, 2), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
    height   float64 ...
Dimensions without coordinates: bnds
  1. compat="override": skip comparing and pick variable from first
  2. join="override": if indexes are of same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.
  3. coords="minimal": Only coordinates in which the dimension already appears are included.
    • This avoids dimensions unintentionally being added to variables

@ShihengDuan
Copy link
Author

Thanks for the information. This makes sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

No branches or pull requests

3 participants