xarray.DataArray.where always returns array of float64 regardless of input dtype #3390
Comments
@pmallas - it looks like you figured this out but I'll just report on what was likely the confusion here. Xarray's where methods use xref: http://xarray.pydata.org/en/stable/computation.html#missing-values, http://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html
Yes, I read the return type as the 'same type as caller' and at first I expected the array dtype to be the same. I soon realized that means a DataArray or Dataset. And for your output array to support NaN values, it has to be float. My bad - sorry for the clutter.
@pmallas it would be nice to update the docstring to make that clear if you are up for it
@dcherian Ok, I think I proposed a change correctly - never done this before.
Looks great. You did well!
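For anyone landing here later, the point about NaN needing a float array can be checked with plain NumPy. This is only a small sketch of the underlying dtype promotion, not xarray's actual implementation; the variable names are made up for illustration.

```python
import numpy as np

values = np.arange(6, dtype=np.int64)
mask = values < 3

# Integer arrays have no representation for NaN, so the assignment fails.
try:
    values.copy()[0] = np.nan
    nan_fits_in_int = True
except ValueError:
    nan_fits_in_int = False

# Filling the masked-out positions with NaN therefore promotes the result to float.
masked = np.where(mask, values, np.nan)

print(nan_fits_in_int)  # False
print(masked.dtype)     # float64
```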
If My use case is a simple slicing of a dataset -- no missing values. The use of I can work around using
The trouble with returning the same I don't entirely remember why we don't allow I suspect it might have something to do with alignment. But as long as
What about the case of no missing values, when I'm capable of just recasting for my use case, if this is becoming an idea that would be difficult to maintain/document.
Could you give a concrete example of what this would look like? It seems rather unlikely to me to have an example of I guess it could happen if you're trying to index out exactly one element along a dimension? In the long term, the cleaner solution for this will be some form of support for more flexible / multi-dimensional indexing.
What about the case of no missing values, when other wouldn't be needed?
Could the same dtype be returned then? This is my case, since I'm
re-purposing where to do sel for non-dimension coordinates.
Could you give a concrete example of what this would look like?
It seems rather unlikely to me to have an example of where with drop=True
where the condition is *exactly* aligned with the grid, such that there
are no missing values.
I guess it could happen if you're trying to index out exactly one element
along a dimension?
That's exactly right. I am just selecting one slice of a data array, using
`data.where(data.coords['stain'] == 'DAPI')`.
In the long term, the cleaner solution for this will be some form of
support for more flexible / multi-dimensional indexing.
Agreed. Once I actually get things running, I'll be ready to try and
contribute fixes for all my TODOs that reference xarray github issues. :)
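A rough sketch of the single-slice case described above, assuming a made-up array where `stain` is a non-dimension coordinate on a hypothetical `channel` dimension. It compares masking with `where` against positional selection with `isel`, which never introduces NaN and so should keep the integer dtype.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the data described above: "stain" is a
# non-dimension coordinate attached to the "channel" dimension.
data = xr.DataArray(
    np.arange(12, dtype=np.int64).reshape(3, 2, 2),
    dims=("channel", "y", "x"),
    coords={"stain": ("channel", ["DAPI", "GFP", "RFP"])},
)

is_dapi = data.coords["stain"] == "DAPI"

# NaN-based masking: the result comes back with a floating dtype.
masked = data.where(is_dapi, drop=True)

# Positional selection: convert the boolean mask to integer positions first.
picked = data.isel(channel=np.flatnonzero(is_dapi.values))

print(masked.dtype, picked.dtype)
```

Both results hold the same values here; only the dtype differs.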
Actually, this is a really common pattern:
ds = xr.tutorial.open_dataset('air_temperature')
ds.where(ds.time.dt.hour.isin([0, 12]), drop=True)
The efficient way to do this is
ds.loc[{"time": ds.time.dt.hour.isin([0, 12])}]
or
ds.sel(time=ds.time.dt.hour.isin([0, 12]))
At this point Lines 1270 to 1273 in 48378c4
Shall we raise a warning in
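To make the comparison above concrete without downloading the tutorial file, here is a sketch on a small synthetic dataset. The variable name `air`, the integer values, and the 6-hourly time axis are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# Synthetic 6-hourly integer data standing in for the tutorial dataset.
time = (
    np.datetime64("2013-01-01T00") + np.arange(8) * np.timedelta64(6, "h")
).astype("datetime64[ns]")
ds = xr.Dataset(
    {"air": ("time", np.arange(8, dtype=np.int64))},
    coords={"time": time},
)

mask = ds.time.dt.hour.isin([0, 12])

by_where = ds.where(mask, drop=True)  # masks with NaN, then drops
by_sel = ds.sel(time=mask)            # boolean selection along "time"
by_loc = ds.loc[{"time": mask}]       # same selection via .loc

print(by_where["air"].dtype, by_sel["air"].dtype)
```

In this sketch the `where` result is promoted to a floating dtype, while `sel` and `loc` return the original integers.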
I'm not sure that either of these is a good idea. The problem with raising a warning is that this is well-defined behavior. It may not always be useful, but well defined but useless behavior arises all the time in programs, so it's annoying to raise a warning for a special case. The problem with skipping
MCVE Code Sample
import numpy as np
import xarray as xr
a = xr.DataArray(np.arange(25).reshape(5, 5), dims=('x', 'y'))
print(a.dtype)  # int32
a_sub = a.where(a.x + a.y < 4)
print(a_sub.dtype)  # float64
Expected Output
a_sub should be an xarray of dtype int32
Problem Description
The documentation (http://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html)
states that the return type should be the same type as the caller. However, the returned dtype is always float64.
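As a possible workaround sketch (not a change in xarray's behavior), passing an explicit integer `other` to `where` avoids the NaN fill, so the integer dtype should be kept. The sentinel value `-1` and the explicit `int64` dtype and coordinates are assumptions added to keep the example platform-independent.

```python
import numpy as np
import xarray as xr

# Same shape as the MCVE, with an explicit dtype and coordinates so the
# result does not depend on platform defaults.
a = xr.DataArray(
    np.arange(25, dtype=np.int64).reshape(5, 5),
    dims=("x", "y"),
    coords={"x": np.arange(5), "y": np.arange(5)},
)
cond = a.x + a.y < 4

default_fill = a.where(cond)             # masked entries become NaN
sentinel_fill = a.where(cond, other=-1)  # masked entries become -1

print(default_fill.dtype)   # float64
print(sentinel_fill.dtype)  # int64
```

This only helps when some value can safely act as a sentinel; otherwise casting back after the selection is the alternative mentioned in the thread.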
Output of
xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1 | packaged by conda-forge | (default, Mar 13 2019, 13:32:59) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.2
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None