Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset.drop() no longer works with a set as an argument #3552

Closed
xylar opened this issue Nov 20, 2019 · 15 comments · Fixed by #3693
Closed

Dataset.drop() no longer works with a set as an argument #3552

xylar opened this issue Nov 20, 2019 · 15 comments · Fixed by #3693
Labels

Comments

@xylar
Copy link

xylar commented Nov 20, 2019

MCVE Code Sample

# Your code here
#!/usr/bin/env python

import xarray
import numpy

ds = xarray.Dataset()
ds['x'] = ('x', numpy.linspace(0., 1., 100))
ds['y'] = ('y', numpy.linspace(0., 1., 50))
print(ds)
ds = ds.drop(set('x'))
print(ds)

Expected Output

<xarray.Dataset>
Dimensions:  (x: 100, y: 50)
Coordinates:
  * x        (x) float64 0.0 0.0101 0.0202 0.0303 ... 0.9697 0.9798 0.9899 1.0
  * y        (y) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
Data variables:
    *empty*
<xarray.Dataset>
Dimensions:  (y: 50)
Coordinates:
  * y        (y) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
Data variables:
    *empty*

Problem Description

In versions before xarray 0.14.1, the code above involving calls to Dataset.drop() with a set as an argument worked as expected. With the new release, the result is an error as shown below. This is breaking backwards compatibility with our software (MPAS-Analysis).

<xarray.Dataset>
Dimensions:  (x: 100, y: 50)
Coordinates:
  * x        (x) float64 0.0 0.0101 0.0202 0.0303 ... 0.9697 0.9798 0.9899 1.0
  * y        (y) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
Data variables:
    *empty*
Traceback (most recent call last):
  File "./drop_issue.py", line 10, in <module>
    ds = ds.drop(set('x'))
  File "/home/xylar/miniconda3/envs/test/lib/python3.7/site-packages/xarray/core/dataset.py", line 3643, in drop
    return self.drop_sel(labels, errors=errors)
  File "/home/xylar/miniconda3/envs/test/lib/python3.7/site-packages/xarray/core/dataset.py", line 3689, in drop_sel
    labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
  File "/home/xylar/miniconda3/envs/test/lib/python3.7/site-packages/xarray/core/utils.py", line 257, in either_dict_or_kwargs
    "the first argument to .%s must be a dictionary" % func_name
ValueError: the first argument to .drop must be a dictionary

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-1063-oem machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.1

xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.1
dask: 2.8.0
distributed: 2.8.0
matplotlib: 3.1.2
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.6.0.post20191101
pip: 19.3.1
conda: None
pytest: 5.3.0
IPython: None
sphinx: None

@xylar
Copy link
Author

xylar commented Nov 20, 2019

I believe a set() object should be a valid labels argument according to the documentation of drop():
http://xarray.pydata.org/en/v0.14.0/generated/xarray.Dataset.drop.html#xarray.Dataset.drop

@dcherian
Copy link
Contributor

How about drop_vars instead?

@rabernat
Copy link
Contributor

@dcherian - are you suggesting that @xylar change his downstream code to use drop_vars instead? drop_vars has been deprecated. I don't think that's the right response here.

Instead we should fix this bug. The docs for drop suggest that you should be able to provide a set as an input. Somehow we made that not work in the latest release.

@dcherian
Copy link
Contributor

Yes yes I agree!

drop_vars would be a quick unblocking fix...

@xylar
Copy link
Author

xylar commented Nov 20, 2019

@dcherian - are you suggesting that @xylar change his downstream code to use drop_vars instead? drop_vars has been deprecated. I don't think that's the right response here.

I'm confused. In 0.14.1, drop_vars seems to have just been introduced and drop seems to have been dropped.

How about drop_vars instead?

Yes, I switched to drop_vars but first I had to make a new build of my conda package that constrains the dependency xarray <0.14.0 because it's broken with 0.14.1. So that's why I'm concerned. I have to do a new release before this fix propagates to my users.

So @rabernat and @dcherian, please clarify if I should be switching to drop_vars or if I am going to run into issues yet again if I do so?

@keewis
Copy link
Collaborator

keewis commented Nov 20, 2019

I think drop was deprecated in favor of drop_sel and drop_vars (#3475). Using v0.14.1, this works:

ds = xr.Dataset({"a": ("b", [0, 1, 2]), "c": [-1, 2, 4]}, coords={"b": list("abc")})
ds.drop_vars({"a", "b"})
ds.drop_vars(["a", "b"])
ds.drop(["a", "b"])

Additionally, I think we have a documentation bug: I can't reach the documentation on drop.

@keewis
Copy link
Collaborator

keewis commented Nov 20, 2019

so I guess we need to figure out why sets don't work anymore?

@xylar
Copy link
Author

xylar commented Nov 20, 2019

Additionally, I think we have a documentation bug: I can't reach the documentation on drop.

Yes, I noticed that, too. I assumed it wasn't being documented anymore because it was considered deprecated.

@xylar
Copy link
Author

xylar commented Nov 20, 2019

I apologize for this slight tangent but it's related to why we use sets in drop(). Is there a good way in xarray to only keep a list of variables? Our approach is to create a set of all variable and subtract from it a set of variables we want to keep and to drop the resulting set of variables we don't want to keep. But it's hard for me to imagine others don't also need this functionality.

@keewis
Copy link
Collaborator

keewis commented Nov 20, 2019

explicitly selecting should work like this:

ds = xr.Dataset(
    {"a": ("b", [0, 1, 2]), "c": ("b", [-1, 2, 4]), "d": ("b", [5, 1, 3])},
    coords={"b": list("abc")},
)
variables = ["a", "d"]
ds[variables]

I think Dataset.__getitem__ allows any kind of sequence here (e.g. sets)

@xylar
Copy link
Author

xylar commented Nov 20, 2019

Ah, thanks very much! I'll switch to using that syntax instead.

@rabernat
Copy link
Contributor

drop_vars has been deprecated. Apparently that is wrong. Sorry for adding to the noise.

xylar added a commit to xylar/MPAS-Analysis that referenced this issue Nov 20, 2019
We were using a subset_variables() function whereas it is simple
to just index the dataset with the desired variables, see:
pydata/xarray#3552 (comment)
@keewis keewis mentioned this issue Nov 20, 2019
1 task
@keewis
Copy link
Collaborator

keewis commented Nov 20, 2019

the reason for set raising an error is that in drop is_list_like(labels) or is_scalar(labels) is used instead of isinstance(labels, str) or not isinstance(labels, Iterable). Since is_list_like explicitly checks for tuple and list, passing set obviously won't call the right method.

@max-sixty
Copy link
Collaborator

drop_vars was just added! Let me know any questions / issues. We should fix set, though for your case it's best to switch to drop_vars

👍 re @keewis 's point re ds[variables]

@benbovy
Copy link
Member

benbovy commented Dec 10, 2019

We should fix set, though for your case it's best to switch to drop_vars

This also happens with other iterables like dict_keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants