-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Typing of abstract base classes #7020
Conversation
The inheritance in the indexes is quite convoluted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks again!
There's an error around the abstract transpose method, which is causing lots of tests to fail, but that aside let's merge!
self: T_ExplicitlyIndexed, key: Any | ||
) -> T_ExplicitlyIndexed | NumpyIndexingAdapter | np.ndarray | np.datetime64 | np.timedelta64: | ||
... | ||
# TODO: fix the return type? Something more generic? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't it also return something like a str
, assuming the index holds strings? Or am I misunderstanding this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is the PandasIndexingAdapter
which has some special casing for datetimes.
Anyway it is quite confusing to me, when you return an Index and when you return a np.ndarray (or scalar for that matter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreee it is confusing.
I think these annotations are quite valuable though. Can we merge a version of this PR that just documents current behaviour? Then we can work to clean it up in a different PR.
I tried removing the abstract transpose method from the Baseclass but then several mypy complaints appeared. |
I think this is the mess I ran in to in a different PR: #6874 (comment) . It's basically another |
I have now removed the requirement to implement a |
@@ -51,9 +51,13 @@ def __init__(self, variable_name, datastore): | |||
self.datastore = datastore | |||
self.variable_name = variable_name | |||
array = self.get_variable().data | |||
self.shape = array.shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment describing why this was necessary please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I don't really understand why.
The Abstract Array class defines an abstract shape property.
Mypy infers correctly that setting self.shape
in the constructor fulfills this requirement. (Liskov is ok here, since the base class wants at least a readable attribute, and does not care about writable)
However the ABCMeta class does not seem to understand this and raises an error at runtime (even worse than just while type checking...)
I have just noticed that the tests are still failing since many other backends have the same issue.
Not sure what to do with this, maybe it is ok to force external backends to adhere to this standard as well? At least they will notice quickly that something is broken since they cannot create their classes anymore...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was some discussion about this in #6239 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the abc inspection checks at a per-class level, so instance attributes cannot be resolved.
I think there are two solutions to this:
- Force subclasses to define
shape
as a property. Should be easy for internal classes, external ones might have to change or they stop working. - Don't use abstractmethod for shape and simply use
shape: tuple[int, ...]
in the Baseclass to tell type checkers that such a property should exist.
Nr. 1 might be the cleaner solution since shape cannot be set anyway without breaking things.
Unfortunately it could break external libs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for deep diving into this!
This is definitely one of the darker corners of Xarray.
@@ -577,19 +583,20 @@ class NDArrayMixin(NdimSizeLenMixin): | |||
""" | |||
|
|||
__slots__ = () | |||
array: Any # TODO: what is this type, maybe ExplicitlyIndexed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything that implements dtype
(returning np.dtype
), shape
(returning tuple[int, ...]
) and __getitem__
(not exactly sure what the return value needs to be, but it probbaly doesn't matter for this interface). I guess is some very basic version of a "duck numpy array," of the sort that is often used for loading data from different storage backends.
As an aside, the comment in the second paragraph of the docstring able overriding properties should probably be deleted: "A subclass should... set the array
property and override one or more of dtype
, shape
and __getitem__
."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think a Protocol
would be the correct solution here.
Maybe a SupportsMinimalArray
.
The return type of __getitem__
could be Any for now, would be still an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'll do that in a different PR to keep the changes smaller. Quite sure such a change will lead to many many different problems that need fixing.
self: T_ExplicitlyIndexed, key: Any | ||
) -> T_ExplicitlyIndexed | NumpyIndexingAdapter | np.ndarray | np.datetime64 | np.timedelta64: | ||
... | ||
# TODO: fix the return type? Something more generic? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really figure this out! I would love if we could constrain Xarray's interface more tightly here to simplify the programming model and reduce bugs. Type annotations are probably key!
|
||
@property | ||
@abstractmethod | ||
def data(self) -> Any: # TODO: type me? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The informal guarantee is that .data
is numpy duck array, i.e., something with an interface matching np.ndarray
. Unfortunately, this is a somewhat ill-defined notation at present from a type-system perspective.
In practice, Xarray only guarantees that .data
passes these checks:
Lines 259 to 270 in 63ba862
def is_duck_array(value: Any) -> bool: | |
if isinstance(value, np.ndarray): | |
return True | |
return ( | |
hasattr(value, "ndim") | |
and hasattr(value, "shape") | |
and hasattr(value, "dtype") | |
and ( | |
(hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__")) | |
or hasattr(value, "__array_namespace__") | |
) | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think again we could introduce some Protocol
here, a bit messy with the two special cases of array_namespace vs function but maybe simply a Union of two Protocols could work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe such a simple Protocol will not solve the typing issues here, since we probably expect that it supports most numpy ndarray methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this problem.
I think the really best solution would be to make DataArryay
a Generic Type which indicates the underlying datastructure (something like DataArray[np.ndarray]
).
Only two problems:
there is no way to represent the data type of coordinates independently and also for Datasets things don't work if the underlying data type is mixed.
This kind of typing would make it more convoluted for beginners (maybe it is ok though).
Maybe I open an extra issue for this.
return _as_array_or_item(self._data) | ||
return _as_array_or_item( | ||
self._data | ||
) # type:ignore[return-value] # TODO: fix this mess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
See these comments for context:
xarray/xarray/core/variable.py
Lines 257 to 262 in 073f414
This function mostly exists because 0-dimensional ndarrays with | |
dtype=datetime64 are broken :( | |
https://github.com/numpy/numpy/issues/4337 | |
https://github.com/numpy/numpy/issues/7619 | |
TODO: remove this (replace with np.asarray) once these issues are fixed |
numpy/numpy#4337 is actually fixed now, so perhaps we could indeed simply use np.asarray
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second issue is still open.
But to me it seems that everywhere it is expected that Variable.values
returns a np.ndarray
, so I am not sure where this special casing is required.
Is there a test where this special case can be seen?
I think the implementation of a protocol for duckarrays should be done in another PR. The mess of |
Already one problem is showing from "enforcing" a shape property by setting is as an abstract property. Anyone actually know why a abstract property cannot be fulfilled by setting an attribute? At the end I require at least a readable variable, but do not care whether it is settable or not... Alternative: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABC run, I find them easier to read. https://docs.python.org/3/whatsnew/3.4.html#abc
@@ -144,24 +145,49 @@ def wrapped_func(self, dim=None, **kwargs): # type: ignore[misc] | |||
).strip() | |||
|
|||
|
|||
class AbstractArray: | |||
class AbstractArray(metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class AbstractArray(metaclass=ABCMeta): | |
class AbstractArray(ABC): |
Inheriting from ABC doesn't look as scary as metaclass=ABCMeta
@@ -1,6 +1,7 @@ | |||
from __future__ import annotations | |||
|
|||
import warnings | |||
from abc import ABCMeta, abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from abc import ABCMeta, abstractmethod | |
from abc import ABC, abstractmethod |
@@ -235,7 +269,7 @@ def sizes(self: Any) -> Frozen[Hashable, int]: | |||
return Frozen(dict(zip(self.dims, self.shape))) | |||
|
|||
|
|||
class AttrAccessMixin: | |||
class AttrAccessMixin(metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class AttrAccessMixin(metaclass=ABCMeta): | |
class AttrAccessMixin(ABC): |
@@ -46,6 +46,7 @@ | |||
import re | |||
import sys | |||
import warnings | |||
from abc import ABCMeta, abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from abc import ABCMeta, abstractmethod | |
from abc import ABC, abstractmethod |
@@ -541,15 +542,20 @@ def __repr__(self) -> str: | |||
return f"{type(self).__name__}({list(self)!r})" | |||
|
|||
|
|||
class NdimSizeLenMixin: | |||
class NdimSizeLenMixin(metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class NdimSizeLenMixin(metaclass=ABCMeta): | |
class NdimSizeLenMixin(ABC): |
try: | ||
return self.shape[0] | ||
except IndexError: | ||
raise TypeError("len() of unsized object") | ||
|
||
|
||
class NDArrayMixin(NdimSizeLenMixin): | ||
class NDArrayMixin(NdimSizeLenMixin, metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class NDArrayMixin(NdimSizeLenMixin, metaclass=ABCMeta): | |
class NDArrayMixin(NdimSizeLenMixin, ABC): |
Not sure about the order here. Maybe it's the opposite?
@@ -3,12 +3,24 @@ | |||
import enum | |||
import functools | |||
import operator | |||
from abc import ABCMeta, abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from abc import ABCMeta, abstractmethod | |
from abc import ABC, abstractmethod |
T_ExplicitlyIndexed = TypeVar("T_ExplicitlyIndexed", bound="ExplicitlyIndexed") | ||
|
||
|
||
class ExplicitlyIndexed(metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class ExplicitlyIndexed(metaclass=ABCMeta): | |
class ExplicitlyIndexed(ABC): |
|
||
class ExplicitlyIndexedNDArrayMixin(NDArrayMixin, ExplicitlyIndexed): | ||
|
||
class ExplicitlyIndexedNDArrayMixin(NDArrayMixin, ExplicitlyIndexed, metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class ExplicitlyIndexedNDArrayMixin(NDArrayMixin, ExplicitlyIndexed, metaclass=ABCMeta): | |
class ExplicitlyIndexedNDArrayMixin(NDArrayMixin, ExplicitlyIndexed, ABC): |
last or first position?
This PR adds some typing to several abstract base classes that are used in xarray.
Most of it is working, only one major point I could not figure out:
What is the type of
NDArrayMixin.array
???I would appreciate it if someone that has more insight into this would help me.
Several minor open points:
ExplicitlyIndexed.__getitem__
ExplicitlyIndexed.transpose
AbstractArray.data
Variable.values
seems to be able to return scalar values which is incompatible with theAbstractArray
definition.Overall it seems that typing has helped to find some problems again :)
Mypy should fail for tests, I did not adopt them yet, want to solve the outstanding issues first.