Avoid calling np.asarray on lazy indexing classes #6874
This returns the underlying array type instead of always casting to np.array. This is necessary for Zarr stores where the Zarr Array wraps a cupy array (for example kvikio.zarr.GDSStore). In that case, we cannot call np.asarray because __array__ is expected to always return a numpy array. We use get_array in Variable.data to make sure we don't load arrays from such GDSStores.
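To illustrate the failure mode described above, here is a minimal sketch that needs only NumPy. `FakeDeviceArray` and `LazyWrapper` are hypothetical stand-ins (not xarray or cupy classes): the first refuses implicit conversion the way a GPU array might, the second mimics a lazy indexing class that exposes both `__array__` and a `get_duck_array` method.

```python
import numpy as np


class FakeDeviceArray:
    """Hypothetical stand-in for a GPU array (for example a cupy array)."""

    def __init__(self, values):
        self._values = list(values)
        self.shape = (len(self._values),)
        self.ndim = 1
        self.dtype = np.dtype("float64")

    def __array__(self, dtype=None, copy=None):
        # Assumption: the device array refuses implicit host conversion.
        raise TypeError("implicit conversion to a NumPy array is not allowed")


class LazyWrapper:
    """Hypothetical lazy indexing class wrapping some array-like object."""

    def __init__(self, array):
        self.array = array

    def __array__(self, dtype=None, copy=None):
        # Always coerces to NumPy, so it fails for FakeDeviceArray.
        return np.asarray(self.array, dtype=dtype)

    def get_duck_array(self):
        # Returns the wrapped array without changing its type.
        return self.array


wrapped = LazyWrapper(FakeDeviceArray([1.0, 2.0]))

try:
    np.asarray(wrapped)
    coerced = True
except TypeError:
    coerced = False

duck = wrapped.get_duck_array()
print(coerced, type(duck).__name__)
```

The point of the sketch is only that the coercing path raises while the duck-array path hands back the original object untouched.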
instead of always casting to np.asarray
for more information, see https://pre-commit.ci
As I understand it, the main purpose here is to remove Xarray lazy indexing class. Maybe call this |
Clean up short_array_repr.
# so we need the explicit check for ExplicitlyIndexed
if isinstance(array, ExplicitlyIndexed):
    array = array.get_duck_array()
return _wrap_numpy_scalars(array)
Adding _wrap_numpy_scalars allows us to handle scalars being returned by the backend. This seems OK to me in that we place fewer restrictions on the backend (and is backward compatible).
xarray/xarray/core/indexing.py
Lines 607 to 612 in 3ee7b5a
def _wrap_numpy_scalars(array):
    """Wrap NumPy scalars in 0d arrays."""
    if np.isscalar(array):
        return np.array(array)
    else:
        return array
But now the issue is that we should pass an appropriate like argument to np.array, but I don't see how to do that from a scalar array.
Good news is that backends can avoid this complication by returning arrays, so we could just ignore this ugly bit for now.
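As a small illustration of the scalar-wrapping behaviour discussed here, the sketch below re-implements the helper under the hypothetical name `wrap_numpy_scalars` and shows that a scalar becomes a 0-d NumPy array while an existing array passes through unchanged. Because the wrapping uses `np.array`, the result is always a NumPy array, which is the limitation raised about the `like` argument.

```python
import numpy as np


def wrap_numpy_scalars(array):
    """Wrap scalars in 0-d NumPy arrays; pass everything else through."""
    if np.isscalar(array):
        return np.array(array)
    return array


# What a backend might return when indexing down to a single element.
scalar = np.float64(3.5)
wrapped = wrap_numpy_scalars(scalar)

# Arrays are returned as-is, not copied or converted.
already_array = np.arange(3)
passthrough = wrap_numpy_scalars(already_array)

print(type(wrapped).__name__, wrapped.ndim, passthrough is already_array)
```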
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> Co-authored-by: Stephan Hoyer <shoyer@google.com>
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@Illviljan feel free to push any typing changes to this PR. I think that would really help clarify the interface. I tried adding a |
I don't have a better idea than to do |
* main: (40 commits)
  Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638)
  add timeouts for tests (pydata#7657)
  Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667)
  [pre-commit.ci] pre-commit autoupdate (pydata#7651)
  Allow all integer dtypes in `polyval` (pydata#7619)
  [skip-ci] dev whats-new (pydata#7660)
  Redo whats-new for 2023.03.0 (pydata#7659)
  Set copy=False when calling pd.Series (pydata#7642)
  Pin pandas < 2 (pydata#7650)
  Whats-new for release 2023.03.0 (pydata#7643)
  Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648)
  Use more descriptive link texts (pydata#7625)
  Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598)
  Fix `pcolormesh` with str coords (pydata#7612)
  [skip-ci] Fix groupby binary ops benchmarks (pydata#7603)
  Remove incomplete sentence in IO docs (pydata#7631)
  Allow indexing unindexed dimensions using dask arrays (pydata#5873)
  Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618)
  [pre-commit.ci] pre-commit autoupdate (pydata#7620)
  add a test for scatter colorbar extend (pydata#7616)
  ...
I'd like to merge this at the end of next week. It now has tests and should be backwards compatible with external backends. A good next step would be to finish up #7020
This is motivated by https://docs.rapids.ai/api/kvikio/stable/api.html#kvikio.zarr.GDSStore which on read loads the data directly into GPU memory.
Currently we rely on np.asarray to convert a BackendArray wrapped with a number of lazy indexing classes to a real array, but this breaks for GDSStore because the underlying array is a cupy array, so using np.asarray raises an error. np.asarray will raise if a non-numpy array is returned, so we need to use something else.

Here I added get_array which, like np.array, recurses down until it receives a duck array.

Quite a few things are broken I think, but I'd like feedback on the approach.

I considered np.asanyarray(..., like=...) but that would require the lazy indexing classes to know what they're wrapping, which doesn't seem right.

Ref: xarray-contrib/cupy-xarray#10 which adds a kvikio backend entrypoint
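
The recursion described in this PR can be sketched roughly as follows. This is not xarray's implementation: `LazilyIndexed` is a hypothetical wrapper that stores an index key, and `TaggedArray` is an `ndarray` subclass used only as a stand-in for a non-NumPy duck array so the example runs without a GPU. The wrappers never need to know what type they hold; they simply unwrap inward and apply their key to whatever comes back.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """ndarray subclass standing in for a non-NumPy duck array."""


class LazilyIndexed:
    """Hypothetical lazy wrapper that applies a stored key on demand."""

    def __init__(self, array, key=slice(None)):
        self.array = array
        self.key = key

    def get_duck_array(self):
        inner = self.array
        # Recurse through nested wrappers until a real array is reached.
        if isinstance(inner, LazilyIndexed):
            inner = inner.get_duck_array()
        # Index the underlying array; its type is left untouched.
        return inner[self.key]


data = np.arange(10).view(TaggedArray)
nested = LazilyIndexed(LazilyIndexed(data, slice(2, 8)), slice(None, None, 2))

result = nested.get_duck_array()
coerced = np.asarray(result)

print(type(result).__name__, result.tolist(), type(coerced).__name__)
```

Here `result` keeps the `TaggedArray` type, whereas passing it through `np.asarray` drops back to a plain `ndarray`, which is the kind of coercion the PR aims to avoid.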