MultiIndex listed multiple times in Dataset.indexes property #6752
Comments
Thanks for trying out our pre-release, @lukasbindreiter! This is an intentional change. Can you tell us more about why this breaks your code?
The change is because, starting from version 2022.6.0, multi-index level coordinates are no longer "virtual" but now correspond to real coordinates. The
There have been some discussions prior to the explicit indexes refactor about whether those properties should return a mapping of unique vs. non-unique index objects. We chose the latter as it simplifies a lot of things internally (and perhaps externally too).
@lukasbindreiter although it is unlikely that we'll change this in the future, it would be interesting to get your feedback! How does this choice impact your workflow? Note that both
We used the
Saving dataset as NetCDF:
And then loading it again:
When testing the pre-release version I noticed some of our tests failing, which is why I raised this issue in the first place, in case those changes were unwanted. I was not aware that you were actively working on multi-index changes, and I was therefore not expecting API changes here. With that in mind I'll probably be able to adapt our code to this new API of
@benbovy I also just tested the
Taking the above dataset:
> ds.indexes.get_unique()
TypeError: unhashable type: 'MultiIndex'
However, for
> ds.xindexes.get_unique()
[<xarray.core.indexes.PandasMultiIndex at 0x7f105bf1df20>]
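The `TypeError` above is consistent with pandas index objects being unhashable, so they cannot be collected into a `set`. As a rough sketch (not the workaround suggested in this thread, and not xarray's own implementation), distinct index objects can be picked out of a mapping by comparing object identity instead. The helper `unique_by_identity` and the stand-in class `FakeIndex` are hypothetical names introduced only for illustration; the example uses plain Python objects so that it runs without xarray installed.

```python
from typing import Hashable, List, Mapping, TypeVar

T = TypeVar("T")


def unique_by_identity(mapping: Mapping[Hashable, T]) -> List[T]:
    """Return the distinct values of `mapping`, compared by identity.

    Values are never hashed, so this also works for unhashable objects.
    The order of first appearance is preserved.
    """
    seen_ids = set()
    unique: List[T] = []
    for value in mapping.values():
        if id(value) not in seen_ids:
            seen_ids.add(id(value))
            unique.append(value)
    return unique


class FakeIndex:
    """Hypothetical stand-in for an unhashable index object."""

    __hash__ = None  # hash(FakeIndex()) raises TypeError, like a pandas index


# One multi-index shared by three coordinate names, plus a separate index.
midx = FakeIndex()
time = FakeIndex()
indexes = {"z": midx, "a": midx, "b": midx, "time": time}

result = unique_by_identity(indexes)
print(len(result))
```

The same helper could presumably be called on the mapping returned by `ds.indexes`, provided the repeated entries refer to the same pandas object.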
Thanks for the issue report @lukasbindreiter, I opened #6987. As a workaround, you could use
Regarding (de)serialization from/to netCDF or other formats, I wonder if building multi-indexes or other custom indexes when opening the dataset could be done via some custom Xarray IO backend (https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html). I'm not sure how easy or hard it is to implement a custom backend on top of an existing one, though.
For the serialization, Xarray doesn't support custom writable backends (yet), but since multi-index levels are now real coordinates, maybe a custom backend is not really needed. Right now Xarray raises
Thanks for the suggestions, I'll look into this.
And with regards to the (de)serialization: I haven't investigated yet how the
As for the original issue discussed here: that can probably be closed then, since it was an intentional change.
Not yet, this still has to be detailed in the documentation (tracked in #6293 along with other todo items related to indexes). The
Yes, I think we can close it. Thanks for your feedback and for the issue report!
What happened?
When upgrading to 2022.6.0.rc0 from 2022.3.0 I noticed a possibly unexpected breaking change in the Dataset.indexes property. MultiIndices are now listed for each dimension they apply to, as well as once for the multi-index itself, when accessing dataset.indexes.
What did you expect to happen?
Same behaviour as before, see example below.
Minimal Complete Verifiable Example
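A minimal sketch of the reported behaviour, assuming xarray 2022.6.0 or later. It is not the reporter's original example, and the names `v`, `a`, `b` and `z` are illustrative. Stacking two dimensions builds a multi-index, which should then appear in `indexes` under the stacked dimension and under each level name.

```python
import numpy as np
import xarray as xr

# Small dataset with two ordinary dimensions (illustrative names).
ds = xr.Dataset(
    {"v": (("a", "b"), np.zeros((2, 3)))},
    coords={"a": ["x", "y"], "b": [0, 1, 2]},
)

# Stacking "a" and "b" creates a multi-index along the new dimension "z".
stacked = ds.stack(z=("a", "b"))

# With xarray >= 2022.6.0 the multi-index is listed under "z", "a" and "b".
index_names = set(stacked.indexes)
print(sorted(index_names))

# The xarray-level mapping can be reduced to distinct index objects.
unique = stacked.xindexes.get_unique()
print(len(unique))
```

With 2022.3.0, the same `indexes` mapping would be expected to list the multi-index only once, under the stacked dimension.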
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.3.0
pandas: 1.4.3
numpy: 1.23.0
scipy: 1.9.0rc1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3b3
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 56.0.0
pip: 21.3.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.5.0
INSTALLED VERSIONS
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.6.0rc0
pandas: 1.4.3
numpy: 1.23.0
scipy: 1.9.0rc1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3b3
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 56.0.0
pip: 21.3.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.5.0