Vectorized indexing (isel) of chunked data with 1D indices gives weird chunks #4555
Comments
dask/dask#3648 seems related.
The 1D case works now; it uses the new take implementation, which takes care of the chunks. One thing that bothers me, though, is the 2D indexing, which just returns a single chunk and doesn't support Dask Arrays. @dcherian 2 questions:
Thanks @phofl. Re: the nD case, we sequentially apply each indexer here: xarray/xarray/core/indexing.py, lines 1618 to 1627 in ca2e9d6
Dask arrays as indexers do not help Xarray today, since we need to construct output coordinates from the indexer, so we'll just compute it. I think there are definitely some uses for it. index using the output of
Isn't this one using vindex instead of oindex? I have a PR here, dask/dask#11330, that fixes this, I think. I noticed that the vindex path seems to be more common than I expected.
Very nice!
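For anyone following the vindex/oindex discussion above, here is a small plain-Python sketch of the difference between the two indexing modes. It does not use dask or xarray; the helper names `outer_index` and `vectorized_index` and the toy data are made up for illustration only.

```python
# Toy 2-D "array" as nested lists; each value encodes its position as 10*row + col.
data = [[10 * r + c for c in range(4)] for r in range(3)]


def outer_index(arr, rows, cols):
    """Orthogonal (outer) indexing: every selected row crossed with every selected column."""
    return [[arr[r][c] for c in cols] for r in rows]


def vectorized_index(arr, rows, cols):
    """Vectorized (pointwise) indexing: rows[i] is paired with cols[i]."""
    if len(rows) != len(cols):
        raise ValueError("indexers must have the same length")
    return [arr[r][c] for r, c in zip(rows, cols)]


rows, cols = [0, 2], [1, 3]
outer = outer_index(data, rows, cols)           # 2 x 2 block of values
pointwise = vectorized_index(data, rows, cols)  # one value per (row, col) pair

print(outer)      # [[1, 3], [21, 23]]
print(pointwise)  # [1, 23]
```

With the same two indexers, the outer form returns a block with one axis per indexer, while the pointwise form returns a single axis whose length matches the indexers.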
What happened:
Applying .isel() on a DataArray or Dataset with chunked data using 1-d indices (either stored in an xarray.Variable or a numpy.ndarray) gives weird chunks (i.e., a lot of chunks with small sizes).
What you expected to happen:
More consistent chunk sizes.
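To make "a lot of chunks with small sizes" more concrete, here is a toy model in plain Python. It is only a sketch under a stated assumption, not dask's actual algorithm: it assumes a new output chunk starts whenever two consecutive indices fall into different input chunks. The function name `toy_output_chunks`, the array length, and the chunk size are all hypothetical.

```python
import random
from itertools import groupby


def toy_output_chunks(indices, chunk_size):
    """Return output chunk sizes under the toy rule described above.

    Assumption (not dask's real implementation): consecutive indices that
    live in the same input chunk share an output chunk; a change of input
    chunk starts a new output chunk.
    """
    source_chunk_ids = (i // chunk_size for i in indices)
    return tuple(len(list(group)) for _, group in groupby(source_chunk_ids))


random.seed(0)
n, chunk_size = 1000, 100            # 1000 elements stored in 10 chunks of 100

sorted_idx = list(range(0, n, 2))    # 500 monotonically increasing indices
shuffled_idx = sorted_idx[:]
random.shuffle(shuffled_idx)         # same indices in random order

sorted_chunks = toy_output_chunks(sorted_idx, chunk_size)
shuffled_chunks = toy_output_chunks(shuffled_idx, chunk_size)

print("sorted:  ", len(sorted_chunks), "chunks")
print("shuffled:", len(shuffled_chunks), "chunks")
```

Under this assumption, sorted indices keep one output chunk per input chunk, while randomly ordered indices switch input chunks almost every step, so the output fragments into many tiny chunks.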
Minimal Complete Verifiable Example:
Let's create a chunked DataArray
Selecting random indices results in a lot of small chunks
What I would expect
This works fine with 2+ dimensional indexers, e.g.,
Anything else we need to know?:
I suspect the issue is here: xarray/xarray/core/variable.py, lines 616 to 617 in 063606b
In the example above I think we still want vectorized indexing (i.e., call dask.array.Array.vindex[] instead of dask.array.Array[]).
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:21:09)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.19.0
distributed: 2.25.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: None
pytest: 5.4.3
IPython: 7.16.1
sphinx: 3.2.1