bool(ds) should raise a "the truth value of a Dataset is ambiguous" error #6124
Comments
A DataFrame has a similar error, same cooks I suppose:
bool(pd.DataFrame())
*** ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
bool(pd.DataFrame([[0, 2], [0, 4]], columns=['A', 'B']))
*** ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
bool({})
False
bool({'a': False})
True
I see "if not empty do x"-checks all the time with dicts in Python code,
I definitely empathize with the tradeoff here. That you found xarray's tests were making this error is fairly damning. But the biggest impediment to changing this behavior is that If there's a synthesis of keeping the truthiness while reducing the chance of these mistakes, that would be very welcome. I'm not sure this is an improvement, but in the example converting to a (I wrote this before seeing @Illviljan 's response, which is very similar)
Yeah… I do understand how it’s currently working and why, and the behavior is certainly intuitive to those who appreciate the mapping inheritance. That said, I feel I have to make a last stand argument because this trips people up quite often (on my team and elsewhere). I haven’t yet come across an example of anyone using this correctly, but I see users misusing it all the time. The examples and behavior you’re showing @Illviljan seem to me more like the natural result of an implementation detail than a critical principle of the dataset design. While it’s obvious why I don’t know much about the mapping protocol or how closely it must be followed. Is the idea here that packages building on xarray (or interoperability features in e.g. numpy or dask) depend on a strict adherence to the full spec?
@max-sixty I’m not sure what this would look like. Do you mean a warning, or are you hinting that the bar that would need to be met is a silver bullet that preserves bool(ds) but somehow isn’t confusing?
The original intention here was definitely to honor the
I realize this may be a larger discussion, but the implementation is so easy I went ahead and filed a PR that issues a PendingDeprecationWarning in
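For readers unfamiliar with the warning-first approach mentioned in this comment, here is a minimal sketch of what issuing a PendingDeprecationWarning from `__bool__` can look like. It is not the code from the PR: `WarningDataset` and the warning message are hypothetical, and the class is a plain-Python stand-in rather than an xarray object.

```python
import warnings


class WarningDataset:
    """Hypothetical container; only the __bool__ behavior matters here."""

    def __init__(self, data_vars):
        self._data_vars = dict(data_vars)

    def __bool__(self):
        # Keep the existing answer, but signal that it may go away.
        warnings.warn(
            "coercing this container to a bool may raise an error in the future",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        return bool(self._data_vars)


with warnings.catch_warnings(record=True) as caught:
    # PendingDeprecationWarning is ignored by default, so enable it explicitly.
    warnings.simplefilter("always")
    result = bool(WarningDataset({"a": False}))

print(result)                                 # True
print([w.category.__name__ for w in caught])  # ['PendingDeprecationWarning']
```

Because this warning category is silenced by Python's default filters, a change like this is mostly visible to people running test suites with warnings enabled, which is one reason it is a gentle first step.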
Would this mean that
@dcherian So I guess we decided to go for it? :) @max-sixty We can still inherit from Mapping and just explicitly raise
Sorry! I thought we had consensus but perhaps not? Shall we revert? That said it's
TBC, I am fine merging! I would lightly vote against it, but weigh my vote far below @shoyer 's. And more broadly let's not wait for everyone to agree on everything!
I do wonder at what point a mapping isn't a mapping anymore? For example DataFrames aren't considered mappings:
isinstance(df, collections.abc.Mapping)
Out[4]: False
And if we are to follow pandas' example maybe we should just remove the Mapping inheritance? Line 584 in 60754fd
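To make the Mapping question concrete, the sketch below uses a hypothetical `DuckDictLike` class (not pandas or xarray code) that offers dict-style access without inheriting from `collections.abc.Mapping`. It suggests that dropping the inheritance would change the `isinstance` result, but would not by itself change truthiness, since Python falls back to `__len__` when no `__bool__` is defined.

```python
from collections.abc import Mapping


class DuckDictLike:
    """Hypothetical class with dict-style access that does not inherit from Mapping."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def keys(self):
        return self._data.keys()


obj = DuckDictLike({"a": 1})

print(isinstance({}, Mapping))   # True
print(isinstance(obj, Mapping))  # False: Mapping is not checked structurally
print(dict(obj))                 # {'a': 1}: keys() plus __getitem__ is enough for dict()
print(bool(DuckDictLike({})))    # False: truthiness falls back to __len__
```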
After discussing this a little more, I am on the fence about whether this change is a good idea. If we can't come to consensus about expected behavior, then that is probably an indication that we should leave things the same to avoid churn.
I made a Twitter poll, let's see what that says 😄 https://twitter.com/xarray_dev/status/1478776987925684224?s=20
$0.02 from the peanut gallery is that my mental model of While I'm not going to sit here and argue that
I'm also late to the party but I would say I fall squarely in the Dataset is a dict-like camp. If we remove
Throwing this out there - happy to be shot down if people are opposed.
Current behavior / griping
Currently, coercing a dataset to a boolean invokes ds.__bool__, which in turn calls bool(ds.data_vars):
This has the unfortunate property of returning True as long as there is at least one data_variable, regardless of the contents.
Currently, the behavior of Dataset.__bool__ is, at least as far as I've seen, never helpful but frequently unhelpful. I've seen (and written) tests written for DataArrays being passed a Dataset and suddenly the tests are meaningless so many times. Conversely, I've never found a legitimate use case for bool(ds). As far as I can tell, this is essentially the same as len(ds.data_vars) > 0.
In fact, while testing out my proposed changes below on a fork, I found two tests in the xarray test suite that had succumbed to this issue: see #6122 and #6123.
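The behavior described in this section can be illustrated without xarray. The sketch below uses a hypothetical `ToyDataset` class as a stand-in, assuming only what the text states: truthiness is delegated to the container of data variables, so it reflects whether any variables exist and ignores their values.

```python
from collections.abc import Mapping


class ToyDataset(Mapping):
    """Hypothetical stand-in for a dict-like container; not xarray's Dataset."""

    def __init__(self, data_vars):
        self._data_vars = dict(data_vars)

    def __getitem__(self, key):
        return self._data_vars[key]

    def __iter__(self):
        return iter(self._data_vars)

    def __len__(self):
        return len(self._data_vars)

    def __bool__(self):
        # Truthiness depends only on whether any variables exist,
        # never on the values they hold.
        return bool(self._data_vars)


empty = ToyDataset({})
all_false = ToyDataset({"a": [False, False]})

print(bool(empty))          # False
print(bool(all_false))      # True, even though every value is False
print(len(all_false) > 0)   # True, the same answer as bool(all_false)
```

This is why an `assert result` written for an array-like object keeps passing when it is handed a container like this: the assertion only checks that the container is non-empty.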
This has been discussed before - see #4290. This discussion focused on the question "should bool(xr.Dataset({'a': False})) return False?". I agree that it's not clear when it should be false and picking a behavior which deviates from Mapping feels arbitrary and gross.
Proposed behavior
I'm proposing that the API be changed, so that bool(xr.Dataset({'a': False})) raises an error, similar to the implementation in pd.Series.
In this implementation in pandas, attempting to evaluate even a single-element series as a boolean raises an error:
I understand hesitancy around changing the core API. That said, if anyone can find an important, correct use of bool(ds) in the wild I'll eat my hat :)
Implementation
This could be as simple as raising an error on ds.__bool__, something like:
The only other change that would be needed is an assertion that directly calls bool(ds) in test_dataset::TestDataset.test_properties, which checks for the exact behavior I'm changing:
This would need to be changed to:
If this sounds good, I can submit a PR with these changes.
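As a rough illustration of the proposal, and not the code from the original post or any PR, the sketch below shows a hypothetical `StrictDataset` that still inherits from `Mapping` but raises from `__bool__`. The exception type (`ValueError`, following the pandas message quoted in the comments) and the error wording are assumptions.

```python
from collections.abc import Mapping


class StrictDataset(Mapping):
    """Hypothetical dict-like container that refuses boolean coercion."""

    def __init__(self, data_vars):
        self._data_vars = dict(data_vars)

    def __getitem__(self, key):
        return self._data_vars[key]

    def __iter__(self):
        return iter(self._data_vars)

    def __len__(self):
        return len(self._data_vars)

    def __bool__(self):
        # Assumed exception type and message, modeled on the pandas error.
        raise ValueError(
            "The truth value of a StrictDataset is ambiguous. "
            "Use len(ds) > 0 to test for emptiness."
        )


ds = StrictDataset({"a": False})

error_message = None
try:
    bool(ds)
except ValueError as err:
    error_message = str(err)

print(error_message)
print(len(ds) > 0)              # True: the explicit emptiness check still works
print(isinstance(ds, Mapping))  # True: the rest of the mapping interface is intact
```

Callers who relied on `bool(ds)` as an emptiness check would switch to the explicit `len(ds) > 0` form, which states the intent directly.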