ENH: Type annotations for Index #36708
Comments
We have almost 0 support for actually containing np.datetime64 in an index itself; we support DatetimeIndex for this purpose.
I am aware that the index can't contain an
To begin with, a naive typing scheme for Index would look like this:

from typing import Generic, List, TypeVar

T = TypeVar("T")

class Index(Generic[T]):
    def __init__(self, data: List[T]): ...
    def __getitem__(self, i) -> T: ...

We can try it out, and it seems to work just fine initially:

>>> from pandas import Index
>>> i = Index([True, False])
>>> i[0]
True
>>> i[1]
False
>>> i = Index(["hello", "world"])
>>> i[0]
'hello'

Success! Except...

>>> from numpy import datetime64
>>> i = Index([datetime64(100000, "s")])
>>> i[0]
Timestamp('1970-01-02 03:46:40')
>>> isinstance(i[0], datetime64)
False

So it turns out the naive typing scheme doesn't work: you need to deal with the fact that the items you get out of the index might be a different type than what went in. And my sketch in my initial issue description is an attempt to demonstrate how one might be able to do it.
Thanks @itamarst, we always welcome PRs adding type annotations. A couple of points to note.
We currently support Python 3.7 and don't have typing_extensions as a required dependency.
Could you expand on "We don't yet have this on CI but normally request that type parameters are added during review." I don't understand what that means, sorry. And why are you disallowing generics?
from https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-disallow-any-generics
in the prototype above Is the
Ah, I see. It ought to be subclassing
Do you have infrastructure for testing type checking? Part of what would need doing is assertions saying "this is type checked as correct, that is type checked as incorrect".
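One possible shape for such assertions, sketched with a toy generic class rather than pandas itself: typing.assert_type (Python 3.11+) asks a type checker to confirm an inferred type and simply returns its argument at run time. ToyIndex is a hypothetical stand-in, and the fallback definition is only there so the snippet also runs on older interpreters.

```python
from typing import Generic, List, TypeVar

try:
    from typing import assert_type  # available in Python 3.11+
except ImportError:
    def assert_type(val, typ):
        # Run-time fallback: mirrors typing.assert_type, which returns val unchanged.
        return val

T = TypeVar("T")


class ToyIndex(Generic[T]):
    """Hypothetical stand-in for the naive Index[T] scheme."""

    def __init__(self, data: List[T]) -> None:
        self._data = list(data)

    def __getitem__(self, i: int) -> T:
        return self._data[i]


bools = ToyIndex([True, False])

# "This is type checked as correct": a checker should infer bool here.
first = assert_type(bools[0], bool)

# "That is type checked as incorrect": a checker is expected to reject
# the line below, so it stays commented out in this sketch.
# assert_type(bools[0], str)
```

Running a checker over a file of such assertions would be one way to exercise the public annotations; how this would be wired into CI is left open here.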
We run mypy on the codebase in CI for checking internal consistency, see for example https://github.com/pandas-dev/pandas/runs/1183480483
We don't have any checks for the public facing api (only from internal calls to public functions). #28142 has been opened for discussion on how types will be made available for pandas users. I don't recall any discussion on a test suite for testing the public api.

In our type annotation journey, even after lots of effort and type annotations added, there is still so much to do. pandas is a large codebase. We still have many unannotated functions not being checked (we have many modules with
One useful thing is that NumPy master now has type annotations, and the next release will hopefully make that public, so that should help some.
pandas-stubs recently added support for
It will probably take quite some time until this change makes it into pandas (waiting to see whether people report issues about it on pandas-stubs, and it would be a rather large PR).
Is your feature request related to a problem?
As described in #26766, it would be good to have type annotations for Index.
Describe the solution you'd like
I would like the type of the publicly-exposed sub-objects to be part of the Index type. For example, these two Index instances contain Timestamp from the user's perspective, regardless of the internal implementation:

Because Index returns different subclasses of itself, getting the type checker to acknowledge that correctly is tricky. What's more, you'll notice a naive "Index[S] based on the fact it's created with List[S]" won't work: Index([np.datetime64(100, "s")]) contains Timestamp instances, at least as far as the user is concerned, and Timestamp is very much not a np.datetime64.

Here is the only solution I've come up with that works. See also python/mypy#9482: there is no way at the moment to have this work without breaking up Index into a parent class that does __new__ and a subclass that does all the work.

The basic idea is that you have a protocol, IndexType. This is a sketch, because demonstrating this with real code would be that much harder:

API breaking implications
Hopefully nothing.
Describe alternatives you've considered
I tried lots and lots of other ways of structuring this. None of them worked, except this variant.
Additional context
Part of my motivation here is to help use type checking so that users can check whether switching from Pandas to Pandas-alikes like Modin/Dask/etc. works, by having everyone use matching type annotations.
As such, just saying "this API accepts an Index" is not good enough, because some Pandas APIs have e.g. special cases for Index[bool]; you really do need to have some way of indicating the Index type for annotations to be sufficiently helpful.
What I'd like
Some feedback on whether this approach is something you all would be OK with. If so, I can try to implement it for the real Index classes.