Dataset.resample() adds time dimension to independent variables #2145

malmans2 · 2018-05-17T01:15:01Z

Code Sample, a copy-pastable example if possible

ds = ds.resample(time='1D').mean(keep_attrs=True)

Problem description

I'm downsampling in time a dataset that also contains timeless variables.
I've noticed that resample adds the time dimension to these timeless variables.
One workaround is:

1. Split the dataset into a timeless and a time-dependent dataset.
2. Resample the time-dependent dataset.
3. Merge the two datasets.

This is not a big deal, but I was wondering if I'm missing some flag that avoids this behavior.
If not, is it something that can be easily implemented in resample?
It would be very useful for datasets with variables on staggered grids.
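For reference, the three workaround steps can be wrapped in a small helper. This is a minimal sketch, not an xarray API: `resample_time_dependent` is a hypothetical name, it assumes a reasonably recent xarray (for `Dataset.drop_dims` and the mapping form of `resample`), and it only handles `mean()`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def resample_time_dependent(ds, freq, dim="time"):
    """Hypothetical helper: resample only the variables that have `dim`.

    Variables without `dim` are carried over unchanged instead of being
    broadcast along the resampled axis.
    """
    # Names of data variables that actually depend on `dim`
    dependent = [name for name, var in ds.data_vars.items() if dim in var.dims]
    # drop_dims removes every variable (including the `dim` coordinate) that uses `dim`
    independent = ds.drop_dims(dim)
    # Resample the time-dependent subset; the mapping form lets `dim` be a parameter
    resampled = ds[dependent].resample({dim: freq}).mean()
    # Recombine: the independent variables keep their original dimensions
    return xr.merge([independent, resampled])


# Small example: a 30-day daily axis resampled into 10-day bins
time = pd.date_range("2018-01-01", periods=30, freq="D")
space = np.arange(4)
ds = xr.Dataset(
    {
        "var_withtime": (("time", "space"), np.random.randn(len(time), len(space))),
        "var_timeless": (("space",), np.random.randn(len(space))),
    },
    coords={"time": time, "space": space},
)

result = resample_time_dependent(ds, "10D")
print(result["var_timeless"].dims)  # ('space',)
print(result.sizes["time"])  # 3
```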

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.17.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.3
pandas: 0.20.2
numpy: 1.12.1
scipy: 0.19.1
netCDF4: 1.2.4
h5netcdf: 0.5.1
h5py: 2.7.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.4
distributed: 1.21.8
matplotlib: 2.0.2
cartopy: 0.16.0
seaborn: 0.7.1
setuptools: 39.1.0
pip: 9.0.1
conda: 4.5.3
pytest: 3.1.2
IPython: 6.1.0
sphinx: 1.6.2


fmaussion · 2018-05-17T14:29:46Z

Thanks for the report! Do you think you can craft a minimal working example?

malmans2 · 2018-05-18T16:50:47Z

In my previous comment I said that this would be useful for staggered grids, but then I realized that resample only operates on the time dimension. Anyway, here is my example:

import xarray as xr
import pandas as pd
import numpy as np

# Create coordinates
time  = pd.date_range('1/1/2018', periods=365, freq='D')
space = np.arange(10)

# Create random variables
var_withtime1 = np.random.randn(len(time), len(space))
var_withtime2 = np.random.randn(len(time), len(space))
var_timeless1 = np.random.randn(len(space))
var_timeless2 = np.random.randn(len(space))

# Create dataset
ds = xr.Dataset({'var_withtime1': (['time', 'space'], var_withtime1),
                 'var_withtime2': (['time', 'space'], var_withtime2),
                 'var_timeless1': (['space'], var_timeless1),
                 'var_timeless2': (['space'], var_timeless2)},
                coords={'time': (['time',], time),
                        'space': (['space',], space)})

# Standard resample: this adds the time dimension to the timeless variables
ds_resampled = ds.resample(time='1M').mean()

# My workaround: this does not add the time dimension to the timeless variables
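# Note: ds.variables also includes the coordinates, so the 'time' coordinate
# is dropped from ds_timeless and does not conflict when merging
ds_withtime = ds.drop_vars([var for var in ds.variables if 'time' not in ds[var].dims])
ds_timeless = ds.drop_vars([var for var in ds.variables if 'time' in ds[var].dims])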
ds_workaround = xr.merge([ds_timeless, ds_withtime.resample(time='1M').mean()])

Datasets:

>>> ds
<xarray.Dataset>
Dimensions:        (space: 10, time: 365)
Coordinates:
  * time           (time) datetime64[ns] 2018-01-01 2018-01-02 2018-01-03 ...
  * space          (space) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    var_withtime1  (time, space) float64 -1.137 -0.5727 -1.287 0.8102 ...
    var_withtime2  (time, space) float64 1.406 0.8448 1.276 0.02579 0.5684 ...
    var_timeless1  (space) float64 0.02073 -2.117 -0.2891 1.735 -1.535 0.209 ...
    var_timeless2  (space) float64 0.4357 -0.3257 -0.8321 0.8409 0.1454 ...

>>> ds_resampled
<xarray.Dataset>
Dimensions:        (space: 10, time: 12)
Coordinates:
  * time           (time) datetime64[ns] 2018-01-31 2018-02-28 2018-03-31 ...
  * space          (space) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    var_withtime1  (time, space) float64 0.08149 0.02121 -0.05635 0.1788 ...
    var_withtime2  (time, space) float64 0.08991 0.5728 0.05394 0.214 0.3523 ...
    var_timeless1  (time, space) float64 0.02073 -2.117 -0.2891 1.735 -1.535 ...
    var_timeless2  (time, space) float64 0.4357 -0.3257 -0.8321 0.8409 ...

>>> ds_workaround
<xarray.Dataset>
Dimensions:        (space: 10, time: 12)
Coordinates:
  * space          (space) int64 0 1 2 3 4 5 6 7 8 9
  * time           (time) datetime64[ns] 2018-01-31 2018-02-28 2018-03-31 ...
Data variables:
    var_timeless1  (space) float64 0.4582 -0.6946 -0.3451 1.183 -1.14 0.1849 ...
    var_timeless2  (space) float64 1.658 -0.1719 -0.2202 -0.1789 -1.247 ...
    var_withtime1  (time, space) float64 -0.3901 0.3725 0.02935 -0.1315 ...
    var_withtime2  (time, space) float64 0.07145 -0.08536 0.07049 0.1025 ...

fmaussion · 2018-05-18T17:21:11Z

I see. Note that groupby does the same. I don't know what the rationale is behind that decision, but there might be a reason...

shoyer · 2018-05-22T19:34:48Z

This is not really desirable behavior, but it's an implication of how xarray implements ds.resample(time='1M').mean():

Resample is converted into a groupby call, e.g., ds.groupby(time_starts).mean('time')
.mean('time') for each grouped dataset averages over the 'time' dimension, resulting in a dataset with only a 'space' dimension, e.g.,

>>> list(ds.resample(time='1M'))[0][1].mean('time')
<xarray.Dataset>
Dimensions:        (space: 10)
Coordinates:
  * space          (space) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    var_withtime1  (space) float64 0.008982 -0.09879 0.1361 -0.2485 -0.023 ...
    var_withtime2  (space) float64 0.2621 0.06009 -0.1686 0.07397 0.1095 ...
    var_timeless1  (space) float64 0.8519 -0.4253 -0.8581 0.9085 -0.4797 ...
    var_timeless2  (space) float64 0.8006 1.954 -0.5349 0.3317 1.778 -0.7954 ...

concat() is used to combine grouped datasets into the final result, but it doesn't know anything about which variables were aggregated, so every data variable gets the "time" dimension added.
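The mechanism described above can be reproduced by hand with public API only. This is a sketch, not xarray's internal resample code: it assumes three 10-day groups selected by position (matching what `'10D'` would produce for this 30-day axis) and passes `data_vars="all"` explicitly to `xr.concat`.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2018-01-01", periods=30, freq="D")
ds = xr.Dataset(
    {
        "var_withtime": (("time", "space"), np.random.randn(30, 4)),
        "var_timeless": (("space",), np.random.randn(4)),
    },
    coords={"time": time, "space": np.arange(4)},
)

# 1. Split along "time" into three 10-day groups
starts = [0, 10, 20]
groups = [ds.isel(time=slice(i, i + 10)) for i in starts]

# 2. Reduce each group over "time": the time-dependent variable loses the
#    dimension, while the timeless variable comes back unchanged
reduced = [g.mean("time") for g in groups]
print(reduced[0]["var_withtime"].dims)  # ('space',)
print(reduced[0]["var_timeless"].dims)  # ('space',)

# 3. Concatenate along a new "time" dimension. After step 2 concat cannot tell
#    which variables were aggregated, so with data_vars="all" both gain "time"
labels = pd.Index(time[starts], name="time")
combined = xr.concat(reduced, dim=labels, data_vars="all")
print(combined["var_timeless"].dims)  # ('time', 'space')
```

The timeless variable ends up simply repeated once per group, which matches the `ds_resampled` output in the example above.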

To fix this I would suggest three steps:

1. Add a keep_dims argument to xarray reductions like mean(), indicating that a dimension should be preserved with length 1, like NumPy's keepdims=True (keepdims=True for xarray reductions #2170).
2. Fix concat to only concatenate variables that already have the concatenated dimension, as discussed in concat_dim getting added to *all* variables of multifile datasets #2064.
3. Use keep_dims=True in groupby reductions. Then the result should automatically include the aggregated dimension only on the variables that were aggregated. This would conveniently allow us to remove existing logic in groupby() for restoring the original order of aggregated dimensions (see _restore_dim_order()).
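To make the proposal concrete, the first two steps can be emulated with existing public API. This is a hypothetical sketch, not how xarray implements (or would implement) it: `mean_keep_static` is an illustrative name, `expand_dims` stands in for the proposed `keep_dims=True`, `xr.concat(..., data_vars="minimal")` stands in for the proposed concat fix, and groups are assumed to be given as positional slices with their labels.

```python
import numpy as np
import pandas as pd
import xarray as xr


def mean_keep_static(ds, group_slices, labels, dim="time"):
    """Hypothetical sketch of the proposed behaviour (not xarray API).

    Variables containing `dim` are averaged but keep `dim` with length 1
    (emulating keep_dims=True); variables without `dim` are left alone.
    """
    pieces = []
    for sl, label in zip(group_slices, labels):
        g = ds.isel({dim: sl})
        data = {}
        for name, var in g.data_vars.items():
            if dim in var.dims:
                # Emulate keep_dims=True: reduce, then restore a length-1 `dim`
                data[name] = var.mean(dim).expand_dims({dim: [label]})
            else:
                # Not aggregated, so leave the variable untouched
                data[name] = var
        pieces.append(xr.Dataset(data))
    # data_vars="minimal": only variables that already have `dim` are concatenated;
    # the others must be equal across pieces and are kept as they are
    return xr.concat(pieces, dim=dim, data_vars="minimal")


time = pd.date_range("2018-01-01", periods=30, freq="D")
ds = xr.Dataset(
    {
        "var_withtime": (("time", "space"), np.random.randn(30, 4)),
        "var_timeless": (("space",), np.random.randn(4)),
    },
    coords={"time": time, "space": np.arange(4)},
)

starts = [0, 10, 20]
result = mean_keep_static(ds, [slice(i, i + 10) for i in starts], time[starts])
print(result["var_withtime"].dims)  # ('time', 'space')
print(result["var_timeless"].dims)  # ('space',)
```

Because the aggregated variables already carry the concatenated dimension, concat no longer has to guess which variables to extend, which is the point of combining steps 1 and 2.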

dcherian · 2022-03-21T05:15:52Z

There is compatibility code in GroupBy._binary_op that could be removed when this is fixed. (See #6160)

fmaussion changed the title ~~Resample add usless dimensions~~ Dataset.resample() adds time dimension to independant variables May 18, 2018

bonnland mentioned this issue Jul 14, 2019: concat_dim getting added to *all* variables of multifile datasets #2064 (Open)

dcherian added the topic-groupby label Oct 15, 2019

dcherian mentioned this issue Oct 16, 2019: Mean called on groupby object adds dimensions to undesired variables #3398 (Closed)

dcherian mentioned this issue Jan 11, 2022: Vectorize groupby binary ops #6160 (Merged)

dcherian added the grant-nasa label Feb 24, 2022

Illviljan mentioned this issue Jul 15, 2022: Refactor groupby binary ops code. #6789 (Merged)

tomvothecoder mentioned this issue Jan 4, 2024: [Bug]: spatial average error following temporal resample xCDAT/xcdat#583 (Closed)

dcherian mentioned this issue Jul 25, 2024: DataArray.set_index can add new dimensions that are absent on the underlying array. #9278 (Open)

max-sixty changed the title ~~Dataset.resample() adds time dimension to independant variables~~ Dataset.resample() adds time dimension to independent variables Jul 25, 2024

dcherian removed the grant-nasa label Jul 26, 2024
