Broken state when using assign_coords with multiindex #7097

Closed
4 tasks done
znichollscr opened this issue Sep 28, 2022 · 2 comments · Fixed by #7101

znichollscr commented Sep 28, 2022

What happened?

I was trying to assign coordinates on a dataset that had been created using stack. After assigning the coordinates, the dataset ended up in a broken state in which its computed length was negative, so both len() and printing the dataset raised a ValueError.

What did you expect to happen?

I think the issue is with the updating of _coord_names, perhaps in the method with this signature:

def _maybe_drop_multiindex_coords(self, coords: set[Hashable]) -> None:

I expected to just be able to assign the coords and then print the array to see the result.

Minimal Complete Verifiable Example

import xarray as xr


ds = xr.DataArray(
    [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
    dims=("lat", "year", "month"),
    coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
    name="test",
).to_dataset()

stacked = ds.stack(time=("year", "month"))
stacked = stacked.assign_coords(
    {"time": [y + m / 12 for y, m in stacked["time"].values]}
)

# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)
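
For reference, the values passed to assign_coords above can be worked out without xarray. This is a minimal sketch that assumes the stacked "time" index enumerates the (year, month) pairs as a Cartesian product with month varying fastest; the names years, months and decimal_years are illustrative only.

```python
from itertools import product

# Coordinate values taken from the example above.
years = [2010, 2020]
months = [3, 6]

# Assumed ordering of the stacked index: every (year, month) pair,
# with month varying fastest.
pairs = list(product(years, months))

# Same expression as in the list comprehension passed to assign_coords.
decimal_years = [y + m / 12 for y, m in pairs]

print(pairs)
print(decimal_years)
```

The result is four plain floats, so the new "time" coordinate is an ordinary one-dimensional coordinate rather than a multi-index with "year" and "month" levels.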

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Traceback (most recent call last):
  File "mre.py", line 17, in <module>
    len(stacked)
  File ".../xarray-tests/xarray/core/dataset.py", line 1364, in __len__
    return len(self.data_vars)
ValueError: __len__() should return >= 0

Anything else we need to know?

Here's a test (I put it in test_dataarray.py but maybe there is a better spot)

def test_assign_coords_drop_coord_names(self) -> None:
        ds = DataArray(
            [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
            dims=("lat", "year", "month"),
            coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
            name="test",
        ).to_dataset()

        stacked = ds.stack(time=("year", "month"))
        stacked = stacked.assign_coords(
            {"time": [y + m / 12 for y, m in stacked["time"].values]}
        )

        # this seems to be handled correctly
        assert set(stacked._variables.keys()) == {"test", "time", "lat"}
        # however, _coord_names doesn't seem to update as expected
        # the below fails
        assert set(stacked._coord_names) == {"time", "lat"}

        # the incorrect value of _coord_names means that everything below fails too.
        # The failure is because the length of a dataset is calculated (via len(data_vars)) as
        # len(dataset._variables) - len(dataset._coord_names). For the situation
        # above, where len(dataset._coord_names) is greater than len(dataset._variables),
        # you get a length less than zero, which then fails because __len__ must return
        # a value greater than or equal to zero

        # Both these fail with ValueError: __len__() should return >= 0
        len(stacked)
        print(stacked)
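
The arithmetic described in the test comments can be illustrated without xarray. The sketch below uses a hypothetical ToyDataset class (not xarray's Dataset) whose __len__ mirrors the calculation described above, len(_variables) - len(_coord_names). The stale names "year" and "month" are an assumption about which entries were left behind in _coord_names; the report itself only establishes that _coord_names held more entries than _variables.

```python
class ToyDataset:
    """Hypothetical stand-in for illustration; this is not xarray's Dataset."""

    def __init__(self, variables, coord_names):
        self._variables = dict(variables)
        self._coord_names = set(coord_names)

    def __len__(self):
        # Mirrors the calculation described in the report:
        # number of data variables = all variables minus coordinate names.
        return len(self._variables) - len(self._coord_names)


variables = {"test": None, "time": None, "lat": None}

# Consistent state: every coordinate name is also a variable.
consistent = ToyDataset(variables, {"time", "lat"})
consistent_len = len(consistent)

# Assumed broken state: "year" and "month" linger in the coordinate names
# even though the matching variables are gone.
stale = ToyDataset(variables, {"time", "lat", "year", "month"})
try:
    len(stale)
    error_message = None
except ValueError as err:
    # Python rejects a negative return value from __len__.
    error_message = str(err)

print(consistent_len)
print(error_message)
```

With three variables and four coordinate names the subtraction gives -1, and Python's len() turns that into the ValueError shown in the traceback.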

Environment

INSTALLED VERSIONS

commit: e678a1d
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None

znichollscr added the bug and needs triage labels Sep 28, 2022

benbovy commented Sep 28, 2022

Hi @znichollscr, thanks for the report. Indeed it looks like _coord_names are not updated properly.

benbovy added the topic-indexing label and removed the needs triage label Sep 28, 2022
benbovy self-assigned this Sep 28, 2022

znichollscr (Author) commented

Thanks, somehow I missed the warning... Doing as it advised fixed the issue on my side too, thanks for your reply and fix!
