Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dropping a MultiIndex variable raises an error after explicit indexes refactor #6505

Closed
shoyer opened this issue Apr 20, 2022 · 3 comments · Fixed by #6592 or #6798
Closed

Dropping a MultiIndex variable raises an error after explicit indexes refactor #6505

shoyer opened this issue Apr 20, 2022 · 3 comments · Fixed by #6592 or #6798

Comments

@shoyer
Copy link
Member

shoyer commented Apr 20, 2022

What happened?

With the latest released version of Xarray, it is possible to delete all variables corresponding to a MultiIndex by simply deleting the name of the MultiIndex.

After the explicit indexes refactor (i.e,. using the "main" development branch) this now raises error about how this would "corrupt" index state. This comes up when using drop() and assign_coords() and possibly some other methods.

This is not hard to work around, but we may want to consider this bug a blocker for the next Xarray release. I found the issue surfaced in several projects when attempting to use the new version of Xarray inside Google's codebase.

CC @benbovy in case you have any thoughts to share.

What did you expect to happen?

For now, we should preserve the behavior of deleting the variables corresponding to MultiIndex levels, but should issue a deprecation warning encouraging users to explicitly delete everything.

Minimal Complete Verifiable Example

import xarray

array = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords={'x': ['a', 'b']},
)
stacked = array.stack(z=['x', 'y'])
print(stacked.drop('z'))
print()
print(stacked.assign_coords(z=[1, 2, 3, 4]))

Relevant log output

ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 9>()
      3 array = xarray.DataArray(
      4     [[1, 2], [3, 4]],
      5     dims=['x', 'y'],
      6     coords={'x': ['a', 'b']},
      7 )
      8 stacked = array.stack(z=['x', 'y'])
----> 9 print(stacked.drop('z'))
     10 print()
     11 print(stacked.assign_coords(z=[1, 2, 3, 4]))

File ~/dev/xarray/xarray/core/dataarray.py:2425, in DataArray.drop(self, labels, dim, errors, **labels_kwargs)
   2408 def drop(
   2409     self,
   2410     labels: Mapping = None,
   (...)
   2414     **labels_kwargs,
   2415 ) -> DataArray:
   2416     """Backward compatible method based on `drop_vars` and `drop_sel`
   2417
   2418     Using either `drop_vars` or `drop_sel` is encouraged
   (...)
   2423     DataArray.drop_sel
   2424     """
-> 2425     ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
   2426     return self._from_temp_dataset(ds)

File ~/dev/xarray/xarray/core/dataset.py:4590, in Dataset.drop(self, labels, dim, errors, **labels_kwargs)
   4584 if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)):
   4585     warnings.warn(
   4586         "dropping variables using `drop` will be deprecated; using drop_vars is encouraged.",
   4587         PendingDeprecationWarning,
   4588         stacklevel=2,
   4589     )
-> 4590     return self.drop_vars(labels, errors=errors)
   4591 if dim is not None:
   4592     warnings.warn(
   4593         "dropping labels using list-like labels is deprecated; using "
   4594         "dict-like arguments with `drop_sel`, e.g. `ds.drop_sel(dim=[labels]).",
   4595         DeprecationWarning,
   4596         stacklevel=2,
   4597     )

File ~/dev/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors)
   4546 if errors == "raise":
   4547     self._assert_all_in_dataset(names)
-> 4549 assert_no_index_corrupted(self.xindexes, names)
   4551 variables = {k: v for k, v in self._variables.items() if k not in names}
   4552 coord_names = {k for k in self._coord_names if k in variables}

File ~/dev/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names)
   1392 common_names_str = ", ".join(f"{k!r}" for k in common_names)
   1393 index_names_str = ", ".join(f"{k!r}" for k in index_coords)
-> 1394 raise ValueError(
   1395     f"cannot remove coordinate(s) {common_names_str}, which would corrupt "
   1396     f"the following index built from coordinates {index_names_str}:\n"
   1397     f"{index}"
   1398 )

ValueError: cannot remove coordinate(s) 'z', which would corrupt the following index built from coordinates 'z', 'x', 'y':
<xarray.core.indexes.PandasMultiIndex object at 0x148c95150>

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: 33cdabd
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.18.3.dev137+g96c56836
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None

@shoyer shoyer added bug needs triage Issue that has not been reviewed by xarray team member and removed needs triage Issue that has not been reviewed by xarray team member labels Apr 20, 2022
@benbovy
Copy link
Member

benbovy commented Apr 21, 2022

I agree a depreciation warning is a bit nicer than an error message for this specific case.

If we agree on eventually removing the pandas multi-index dimension coordinate, perhaps this issue should be addressed in that wider context as it will be directly impacted? It would make our plans clear to users if we issue a depreciation warning message that already mentions this future removal (and that we could also issue now or later in other places like sel).

@dcherian
Copy link
Contributor

dcherian commented May 4, 2022

Here's another version of the same bug affecting `Dataset.assign)

import xarray

array = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords={'x': ['a', 'b']},
)
stacked = array.stack(z=['x', 'y']).to_dataset(name="a")
stacked.assign_coords(f1=2*stacked.a)  # works
stacked.assign(f2=2*stacked.a)  # error
``` --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Input In [33], in () 8 stacked = array.stack(z=['x', 'y']).to_dataset(name="a") 9 stacked.assign_coords(f1=2*stacked.a) ---> 10 stacked.assign(f2=2*stacked.a)

File ~/work/python/xarray/xarray/core/dataset.py:5444, in Dataset.assign(self, variables, **variables_kwargs)
5442 results = data._calc_assign_results(variables)
5443 # ... and then assign
-> 5444 data.update(results)
5445 return data

File ~/work/python/xarray/xarray/core/dataset.py:4421, in Dataset.update(self, other)
4385 def update(self, other: CoercibleMapping) -> Dataset:
4386 """Update this dataset's variables with those from another dataset.
4387
4388 Just like :py:meth:dict.update this is a in-place operation.
(...)
4419 Dataset.merge
4420 """
-> 4421 merge_result = dataset_update_method(self, other)
4422 return self._replace(inplace=True, **merge_result._asdict())

File ~/work/python/xarray/xarray/core/merge.py:1068, in dataset_update_method(dataset, other)
1062 coord_names = [
1063 c
1064 for c in value.coords
1065 if c not in value.dims and c in dataset.coords
1066 ]
1067 if coord_names:
-> 1068 other[key] = value.drop_vars(coord_names)
1070 return merge_core(
1071 [dataset, other],
1072 priority_arg=1,
1073 indexes=dataset.xindexes,
1074 combine_attrs="override",
1075 )

File ~/work/python/xarray/xarray/core/dataarray.py:2406, in DataArray.drop_vars(self, names, errors)
2387 def drop_vars(
2388 self, names: Hashable | Iterable[Hashable], *, errors: str = "raise"
2389 ) -> DataArray:
2390 """Returns an array with dropped variables.
2391
2392 Parameters
(...)
2404 New Dataset copied from self with variables removed.
2405 """
-> 2406 ds = self._to_temp_dataset().drop_vars(names, errors=errors)
2407 return self._from_temp_dataset(ds)

File ~/work/python/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors)
4546 if errors == "raise":
4547 self._assert_all_in_dataset(names)
-> 4549 assert_no_index_corrupted(self.xindexes, names)
4551 variables = {k: v for k, v in self._variables.items() if k not in names}
4552 coord_names = {k for k in self._coord_names if k in variables}

File ~/work/python/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names)
1392 common_names_str = ", ".join(f"{k!r}" for k in common_names)
1393 index_names_str = ", ".join(f"{k!r}" for k in index_coords)
-> 1394 raise ValueError(
1395 f"cannot remove coordinate(s) {common_names_str}, which would corrupt "
1396 f"the following index built from coordinates {index_names_str}:\n"
1397 f"{index}"
1398 )

ValueError: cannot remove coordinate(s) 'y', 'x', which would corrupt the following index built from coordinates 'z', 'x', 'y':
<xarray.core.indexes.PandasMultiIndex object at 0x16d3a6430>

</details>

dcherian added a commit to dcherian/xarray that referenced this issue May 11, 2022
@shoyer shoyer reopened this Jul 13, 2022
@shoyer
Copy link
Member Author

shoyer commented Jul 13, 2022

Reopening because my second example print(stacked.assign_coords(z=[1, 2, 3, 4])) is still broken with the same error message. It would be ideal to fix this before the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants