Use shape and dtype as typevars in NamedArray #8294
Co-authored-by: Michael Niklas <mick.niklas@gmail.com>
And back to the drawing board.
Yeah, all these NamedArray PRs will have heavy merge conflicts :/
Getting the tests green went easier than expected. That's suspicious! I'll dig into that in a follow-up PR.
```python
def as_compatible_data(
    data: T_DuckArray | np.typing.ArrayLike, fastpath: bool = False
) -> T_DuckArray:
    if fastpath and getattr(data, "ndim", 0) > 0:
        # can't use fastpath (yet) for scalars
        return cast(T_DuckArray, data)
```
@Illviljan, I've been reviewing the latest changes on the main branch and I've noticed that this pull request removed the `as_compatible_data` function as well as `fastpath` in `NamedArray`'s constructor. I'm curious whether this was intentional or whether there was some discussion about it that I may have missed.
Never mind, I just saw this line in the PR description:

> `__init__` for `NamedArray` will now just assume the input data is correct (at runtime at least); mypy will catch any non-supported array types. There's some precedent to this.
> The ugly `fastpath` argument is therefore not needed.
I think we should invert this. Internally it's OK to use `from_array`, but a user should be able to do `NamedArray('x', [1, 2, 3])` without issues. I like the idea of a classmethod `NamedArray.from_array` for the `fastpath` use case.
Who is the user of `NamedArray` again? Aren't we doing a lightweight `Variable` now?
The modern array packages I've looked at (Cubed, `np.array_api`) either don't allow an init or just assume the input is correct. They instead recommend using `asarray` or `from_array` functions.
`NamedArray(('x',), np.array([1, 2, 3]))` is not that bad. `xp.asarray([1, 2, 3], dims="x")` -> `NamedArray(dims=("x",), data=np.array([1, 2, 3]))` is quick too.
I think it's better to start strict (and fast) and see whether users actually think it's a problem.
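To make the two positions above concrete, here is a minimal sketch of the "strict `__init__`, coercing `from_array`" split. The class `StrictNamedArray` is hypothetical and much simpler than xarray's `NamedArray`; its `from_array` classmethod only illustrates the idea from the thread and is not xarray's actual API. `__init__` trusts its arguments (no `fastpath` flag needed), while the classmethod does the convenient normalisation and conversion.

```python
from __future__ import annotations

from collections.abc import Hashable
from typing import Any

import numpy as np
import numpy.typing as npt


class StrictNamedArray:
    """Hypothetical toy class: __init__ trusts its input; there is no fastpath flag."""

    def __init__(self, dims: tuple[Hashable, ...], data: np.ndarray[Any, Any]) -> None:
        # No coercion or checks here: unsupported inputs are left for a static
        # type checker (e.g. mypy) to reject.
        self.dims = dims
        self.data = data

    @classmethod
    def from_array(
        cls, dims: Hashable | tuple[Hashable, ...], data: npt.ArrayLike
    ) -> StrictNamedArray:
        """Convenience constructor that coerces its inputs before calling __init__."""
        # A tuple is itself Hashable, so a tuple here is always read as several
        # dims, never as one tuple-valued dim name.
        dims_tuple = dims if isinstance(dims, tuple) else (dims,)
        array = np.asarray(data)
        if len(dims_tuple) != array.ndim:
            raise ValueError(
                f"dims {dims_tuple!r} do not match data with ndim={array.ndim}"
            )
        return cls(dims_tuple, array)


fast = StrictNamedArray(("x",), np.array([1, 2, 3]))  # caller guarantees the types
easy = StrictNamedArray.from_array("x", [1, 2, 3])  # list is coerced to an ndarray
```

Inverting the split, as suggested earlier in the thread, would move the `np.asarray` call into `__init__` and make the classmethod the trusted fast path; the trade-off is per-call coercion cost versus convenience for casual users.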
> I think it's better to start strict (and fast) and see whether users actually think it's a problem.

I'm pro being strict in `NamedArray()`. xarray still gets to keep its `as_compatible_data()` check.
Hey, I see that this was fairly recently merged. I have a question, and I was hoping it'd be appropriate to post here. Why is this line (xarray/xarray/namedarray/_typing.py, line 68 in 562f2f8) not `_Dim = str`? I'm trying to write code like `da_dims: tuple[str] = da.dims`, which doesn't work since
On the other hand,

```python
>>> da = xr.DataArray(data=[1,2,3], dims=[7])
TypeError: dimension 7 is not a string
```

so it seems like it can't be any hashable other than
Because dim can be anything:

```python
from typing import Hashable

dims_str: tuple[Hashable, ...] = ("x",)
dims_int: tuple[Hashable, ...] = (654, 23)
dims_tuple: tuple[Hashable, ...] = (("#", "sdf"), ("s",))

# mypy --strict:
# Success: no issues found in 1 source file
# pyright 1.1.280
# 0 errors, 0 warnings, 0 informations
# Completed in 0.865sec
```

This PR deals with the typing of

```python
import numpy as np
import xarray as xr

a = xr.namedarray.core.NamedArray(data=np.array([1, 2, 3]), dims=(7,))
b = xr.Variable(data=np.array([1, 2, 3]), dims=(7,))
c = xr.Dataset({"b": b})
d = xr.DataArray(data=[1, 2, 3], dims=(7,))  # error
```
Let me elaborate on this a bit...
Because we mostly support non-string types for dimension and variable names.
That is a limitation of how things currently work. In this case you will have to use
The constructors are still a weak spot of typing so far (as are the error messages, it seems) because they allow many different combinations of how to create a DataArray (Dataset) and are therefore highly dynamic and difficult to type statically.
Thanks so much @Illviljan and @headtr1ck for the fast and detailed response!!! As per @headtr1ck's suggestion, I opened #8546.
Using a different TypeVar strategy compared to #8281. The idea here is to use shape and dtype as the TypeVars instead, just like numpy does.
Previously I tried to use the `_data` array as the TypeVar, but that causes all kinds of issues, since a TypeVar is usually invariant and can't be updated to a new type. Since the dtype changes very frequently during array operations, it quickly gets difficult to pass along the correct typing.
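A minimal sketch of what "typevar shape and dtype, just like numpy does" can look like. The class `SketchArray` and the TypeVar names are assumptions made for illustration and are not copied from xarray's `_typing.py`. The wrapper is generic over shape and dtype the way `np.ndarray[Shape, DType]` is (whereas `np.typing.NDArray[ScalarType]` only parametrizes the scalar type), so a dtype-changing operation such as `astype` returns a differently parametrized instance instead of trying to update one invariant TypeVar bound to the whole wrapped array.

```python
from __future__ import annotations

from typing import Any, Generic, TypeVar

import numpy as np

# Hypothetical stand-ins for the shape/dtype TypeVars described above;
# xarray's own definitions may differ.
_ShapeType_co = TypeVar("_ShapeType_co", covariant=True)
_DType_co = TypeVar("_DType_co", bound=np.dtype[Any], covariant=True)
_DType = TypeVar("_DType", bound=np.dtype[Any])


class SketchArray(Generic[_ShapeType_co, _DType_co]):
    """Toy wrapper that is generic over shape and dtype, like np.ndarray[Shape, DType]."""

    def __init__(self, data: np.ndarray[_ShapeType_co, _DType_co]) -> None:
        self._data = data

    @property
    def dtype(self) -> _DType_co:
        return self._data.dtype

    def astype(self, dtype: _DType) -> SketchArray[_ShapeType_co, _DType]:
        # The dtype changes, so a new parametrization is returned; the shape
        # TypeVar is carried over unchanged.
        return SketchArray(self._data.astype(dtype))


ints = SketchArray(np.array([1, 2, 3], dtype=np.int64))
floats = ints.astype(np.dtype(np.float32))
print(ints.dtype, floats.dtype)  # int64 float32
```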
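- `__init__` for `NamedArray` will now just assume the input data is correct (at runtime at least); mypy will catch any non-supported array types. There's some precedent to this. The ugly `fastpath` argument is therefore not needed.
- `duckarray[ShapeType, DType]` (corresponding to `np.ndarray`) or `DuckArray[ScalarType]` (corresponding to `np.typing.NDArray`) are the recommended ones.
- `is_duck_array` functions with typeguards because `isinstance` also works on the `else` clause.
- `NamedArray.shape` does not support unknown dimensions #8291

References: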
- https://github.com/tomwhite/cubed/blob/ea885193dd37d27917a24878b51bb086aaef5fb1/cubed/core/ops.py#L34
- https://stackoverflow.com/questions/74633074/how-to-type-hint-a-generic-numpy-array
- https://numpy.org/doc/stable/reference/arrays.scalars.html#scalars
- https://github.com/numpy/numpy/blob/040ed2dc9847265c581a342301dd87d2b518a3c2/numpy/__init__.pyi#L1423
- https://github.com/numpy/numpy/blob/040ed2dc9847265c581a342301dd87d2b518a3c2/numpy/_typing/_array_like.py#L32
- https://stackoverflow.com/questions/69186176/determine-if-subclass-has-a-base-classs-method-implemented-in-python