What should Dataset.count return for missing dims?
#6749
Comments
This is quite confusing and I doubt it's intentional. I would've expected

The final value is the result of

import numpy as np
from xarray.core.duck_array_ops import isnull
np.sum(np.logical_not(isnull(ds.b.data)), axis=())
# np.sum([True, True], axis=())

What happens when you call a ufunc with an empty axis tuple? I bet this is just casting bool to int.
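The empty-axis question can be checked with plain NumPy. This is a minimal sketch, assuming a small boolean array stands in for `np.logical_not(isnull(ds.b.data))`; the variable names are illustrative only. With `axis=()` nothing is reduced, so the result keeps the input's shape and each `True` simply shows up as a 1.

```python
import numpy as np

# Assumed stand-in for np.logical_not(isnull(ds.b.data)) with two valid values.
mask = np.array([True, True])

# axis=() means "reduce over no axes": the shape is unchanged.
no_reduction = np.sum(mask, axis=())

# For comparison, the default axis=None reduces over every axis.
full_reduction = np.sum(mask)

print(no_reduction, no_reduction.shape)
print(full_reduction)
```

So the ones reported for a variable without the requested dimension look like per-element counts over zero axes rather than a count along that dimension.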
This should also happen with all other ufuncs then?
We discussed:
For the other reductions

import numpy as np
import xarray as xr
from xarray.core.duck_array_ops import count
ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
for func in [np.nansum, np.nanprod, np.nanmean, np.nanvar, np.nanstd, count]:
    print(f"{func.__name__!s}({ds.b.data}, axis=()) = {func(ds.b.data, axis=())}")

gives
I guess the output for nansum and nanprod doesn't match what you would get by broadcasting along the absent dimension.
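To make the mismatch concrete, here is a rough comparison in plain NumPy. It assumes the same `b = [4, 5]` as in the snippet above and an absent dimension of length 3 (the size of "x" in that dataset); `x_size` and the other names are illustrative.

```python
import numpy as np

b = np.array([4, 5])  # variable that only has dimension "y"
x_size = 3            # assumed length of the absent dimension "x"

# Reducing over no axes leaves the values untouched.
as_is_sum = np.nansum(b, axis=())
as_is_prod = np.nanprod(b, axis=())

# Broadcasting along "x" first and then reducing over it repeats each
# value x_size times before summing or multiplying.
b_broadcast = np.broadcast_to(b, (x_size, b.size))
broadcast_sum = np.nansum(b_broadcast, axis=0)
broadcast_prod = np.nanprod(b_broadcast, axis=0)

print(as_is_sum, broadcast_sum)
print(as_is_prod, broadcast_prod)
```

Under broadcasting semantics the sum scales with the size of the missing dimension and the product is raised to that power, whereas the empty-axis reduction returns the input values as they are.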
I think that changing the behavior of sum is quite a large breaking change.
Another option is to add an option: But for workflows of variables that are either DataArray or Dataset, this argument should be added to
What is your issue?
When using a dataset with multiple variables, Dataset.count("x") returns ones for variables that are missing dimension "x", e.g.:
I can understand why "1" can be a valid answer, but the result is probably a bit philosophical.
For my use case I would like it to return an array of ds.sizes["x"] / 0. I think this is also a valid return value, considering the broadcasting rules, where the size of the missing dimension is actually known in the dataset.
Maybe one could make this behavior adjustable with a kwarg, e.g. missing_dim_value: {int, "size"}, default 1.
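As a rough illustration of the requested behavior, the sketch below wraps `Dataset.count` in a user-side helper. The helper name `count_with_missing_dims` is hypothetical, and `missing_dim_value` is only the keyword proposed above, not an existing xarray argument. Variables that lack the dimension are filled explicitly, so the sketch does not depend on what `Dataset.count` itself returns for them.

```python
import xarray as xr


def count_with_missing_dims(ds, dim, missing_dim_value=1):
    """Hypothetical helper: count along `dim`, with a configurable
    per-element value for variables that do not have `dim`."""
    counted = ds.count(dim)
    # "size" means: behave as if the variable were broadcast along `dim`.
    fill = ds.sizes[dim] if missing_dim_value == "size" else missing_dim_value
    for name, var in ds.data_vars.items():
        if dim not in var.dims:
            # Non-null elements get `fill`, null elements get 0.
            counted[name] = var.notnull() * fill
    return counted


ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
result = count_with_missing_dims(ds, "x", missing_dim_value="size")
print(result["a"].values, result["b"].values)
```

With `missing_dim_value="size"`, each non-null element of `b` is counted `ds.sizes["x"]` times and each null element zero times, which is the "ds.sizes["x"] / 0" result described above.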