Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

warn when updating coord.values : indexes are not updated #3470

Closed
folmerkrikken opened this issue Oct 30, 2019 · 7 comments · Fixed by #3862
Closed

warn when updating coord.values : indexes are not updated #3470

folmerkrikken opened this issue Oct 30, 2019 · 7 comments · Fixed by #3862

Comments

@folmerkrikken
Copy link

Hi, I've recently come across what it seems to be a bit annoying bug. After adding an offset to the time dimension using pd.add_offset(), the wrong times are selected when using .sel(). I've come across this behavior in recent versions of xarray, starting from 0.12. I've now worked around it by using ds.assign_coords(time = ...). I'm not sure however if this is a possible bug in Xarray or Pandas. Below a minimum working example.

MCVE Code Sample

import xarray as xr
import pandas as pd
import numpy as np

temp = 15 + 8 * np.random.randn(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset({'temperature': (['x', 'y', 'time'],  temp)},
                 coords={'lon': (['x', 'y'], lon),
                         'lat': (['x', 'y'], lat),
                         'time': pd.date_range('2014-09-01', periods=3,freq='MS'),
                         })

seltime = ds.time[2]
ds.time.values = pd.to_datetime(ds.time.values) + pd.DateOffset(months=2)
print(ds.sel(time=seltime).time)
<xarray.DataArray 'time' ()>
array('2015-01-01T00:00:00.000000000', dtype='datetime64[ns]')
Coordinates:
    time     datetime64[ns] 2015-01-01
print(seltime)
<xarray.DataArray 'time' ()>
array('2014-11-01T00:00:00.000000000', dtype='datetime64[ns]')
Coordinates:
    time     datetime64[ns] 2014-11-01

Expected Output

Expected output is that seltime and ds.sel(time=seltime).time are the same

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 (default, Apr 3 2019, 19:16:38) [GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] python-bits: 64 OS: Linux OS-release: 4.15.0-58-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.0 libnetcdf: 4.6.0

xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.4.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: 1.1.0
PseudoNetCDF: None
rasterio: 1.0.26
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.3.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.2.0
pip: 9.0.1
conda: None
pytest: None
IPython: 7.7.0
sphinx: None

@dcherian
Copy link
Contributor

ds.time.values = pd.to_datetime(ds.time.values) + pd.DateOffset(months=2)
By doing this, ds.indexes["time"] is not updated.

@folmerkrikken How about ds.assign_coords(time=pd.to_datetime(ds.time.values) + pd.DateOffset(months=2))

@shoyer I've seen this type of code elsewhere: direct assignment to .values and .data of a coordinate variable which then leads to inconsistent objects and confusing behaviour. Is there something we can do to alert users?

@max-sixty
Copy link
Collaborator

I've seen this type of code elsewhere: direct assignment to .values and .data of a coordinate variable which then leads to inconsistent objects and confusing behaviour. Is there something we can do to alert users?

Should we raise an error? Is there ever a use here?
It might be messy to implement though; the ds.time would have to know it's a coord.
Does the bigger index refactor help us at all here?

@folmerkrikken
Copy link
Author

@dcherian Yes, ds.assign_coords does work and that's what I'm using now.

So can ds.assign_coords(..) be seen as the right way and changing the values inplace with ds.time.values = pd.... the wrong way? It all worked fine until changing from version 0.11.3 to 0.12.0.

@dcherian dcherian changed the title Wrong time selected after pd.add_offset() indexing after updating coord.values Dec 16, 2019
@dcherian dcherian changed the title indexing after updating coord.values warn when updating coord.values : indexes are not updated Dec 16, 2019
@kefirbandi
Copy link
Contributor

Is it already decided what the resolution should be?

  • Giving a warning, as the title of this thread suggests?
  • Disable setting .values directly for dimensions?
  • Or making sure that .indexes are updated when .values are set directly

@shoyer
Copy link
Member

shoyer commented Dec 16, 2019

The main problem is that when you write ds.time.values = something, you are modifying an attribute of a DataArray that happens to use the same variable as the as the Dataset, but otherwise the DataArray and Dataset are entirely independent. The DataArray does not have a reference to the original dataset, so it can't check or update the indexes.

This happened to work in previous versions of xarray, but only because indexes always mapped directly into the data of an xarray.Variable. But I don't think this is the right pattern to continue in the future, especially because we want to do define indexes over multiple variables (e.g., for MultiIndex).

I think what we probably want to do is mark variables used in indexes as having immutable data in xarray's data model, and raise an error for attempts to modify them in-place.

@dcherian
Copy link
Contributor

I think what we probably want to do is mark variables used in indexes as having immutable data in xarray's data model, and raise an error for attempts to modify them in-place.

How do I implement this?

@keewis
Copy link
Collaborator

keewis commented Jan 14, 2020

I might be wrong, but I think you should be able to implement this by overriding Variable.values.setter in IndexVariable, the same way it is done for Variable.data.setter.

dcherian added a commit to dcherian/xarray that referenced this issue Mar 17, 2020
dcherian added a commit to dcherian/xarray that referenced this issue Mar 17, 2020
dcherian added a commit that referenced this issue Mar 23, 2020
…ta (#3862)

* Raise error when assigning IndexVariable.values, IndexVariable.data

Fixes #3470

* fix existing tests

* Add new test

* whats-new

* Fix more existing tests

* Update doc/whats-new.rst

* fix docs

* update whats-new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants