-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Typing Stubs and PEP 561 compatibility #28142
Comments
If we go down that road, we could consider using stub files in cython to make that code valid python (which would allow us to lint) |
@WillAyd @jbrockmendel @simonjayhawkins How would you describe current state of things? I've seen that many core components already have pretty good API coverage. Is it realistic to expect this to happen in foreseeable future (let's say next release or two)? To explain the context of this question ‒ for the last three years I've been working on stub files for Apache Spark. Over this time contact surface between Pandas and PySpark grown significantly, mostly due to introduction and active development of so-called Pandas udfs. Since Pandas doesn't advertise its annotation it effectively creates a growing gap, in practice not covered by type checkers. Furthermore lack of actionable annotations leads to rather ugly escalations in case of polymorphic functions, which accept Pandas objects, as well as other types, Now... For some time, to partially address the problem, I've been using
This approach is not without its own problems, but does the trick. If Pandas is going to PEP 561 these will become obsolete, but if such move is not going to happen any time soon, I will consider formalizing this approach, and agitating for required adjustments in core PySpark. However, given amount of red tape, I'd really like to avoid it :) |
Thank you for the answer @WillAyd. I didn't mean stubs explicitly, but annotations that are visible to external checkers (which tend to assume no annotations, if no stubs packages or For me it is basically a decision between waiting a bit longer (if these are expected to be ready soon. From that perspective even dynamic annotations are good enough) or adding elaborate workarounds. |
As an end user I am also having the use of annotationg code with say I am bit confused here. in order to distribute the annotations isn't that enough to add Update: However I am bit still confused on what is need to make this happen (if it is happening). |
There is a bit more to that. In general it is good to have annotations that pass type checks with standardish settings, otherwise you're likely to break things downstream. |
Related issue: #12412 |
Need to keep an eye on Microsoft's efforts here: https://github.com/microsoft/python-type-stubs/tree/main/pandas |
Regarding our (Microsoft) stubs:
* I believe that if the stub files exist in the package distribution alongside the Python files, then we can use them, and they will take priority over the .py files (@msfterictraut could confirm). So that would be one way we could handle an incremental move to inline types. But it would require package authors to buy into that approach. |
@gramster, you're correct that stub files that are packaged alongside the Python source files will take precedence when used by a type checker like pyright or mypy. This is the behavior dictated by PEP 561. Until recently, library authors had no way to verify the completeness of their type information within a "py.typed" package. We recently added a mode to pyright that analyzes a package and reports missing or incorrect type information. For more details, see this documentation. |
also need to keep an eye on cython/cython#3818 |
@gramster or @erictraut could you give some background on the "content" of the Microsoft stubs? (since we also already have type annotations in pandas as well) |
We added a lot more annotations, and becuase these are stub files, included annotations for a lot of APIs implemented in Cython. |
Thanks to all the folks who joined the dev call on Wednesday. Here's a recap of (my understanding of) the current status, and a tentative proposal for how to proceed. More notes at https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?usp=sharing. https://github.com/microsoft/python-type-stubs/tree/main/pandas contains type stub (
There weren't any objections from the pandas side to taking on maintenance of the stubs. Most of the discussion was around logistics of actually upstreaming them. pandas is already using inline types wherever possible. I think there's broad agreement that we want to continue that wherever possible (i.e. everywhere but extension modules). My suggestion is to manually go through, file-by-file, and migrate the typestub files to inline types. This will require some effort by maintainers who are experienced Finally, we'll need to coordinate with the microsoft/python-type-stubs team once we've upstreamed individual files. Perhaps a checklist on a GitHub issue, or a google spreadsheet. Once a module has been upstreamed to pandas, we'd like to avoid changing it (just) in microsoft/python-type-stubs. One thing I'm not sure of: what should happen once pandas has completely upstreamed all the files? We'd like the inline types to take precedence by code-completion tools in editors, so they should probably be removed for microsoft/python-type-stubs. But the pyright maintainers mentioned some ancillery benefits to |
One small point is that I don't know if that changes the position of the pyright maintainers, or if this a more general problem... |
I think for us, we will either come up with an alternate solution to the
docstring problem or regenerate full stubs from the pandas source and ship
those with docstrings added. We’d obviously prefer that docstrings not be
dynamically created but we have workarounds if they are.
…On Fri, Jan 15, 2021 at 6:15 AM Terji Petersen ***@***.***> wrote:
One small point is that read_csv is no longur dynamically created. read_table
is also anormal function.
I don't know if that changes the position of the pyright maintainers, or
if this a more general problem...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#28142 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAVCPCDNSVMM33LJF7OB65TS2BEW3ANCNFSM4IPKIX7A>
.
|
Let's make sure that we track the two use cases of typing:
For (2) we need to figure out a way to test that we have properly typed the public API. The work that @gramster has done provides the typing for much of the public API, and that's what we want to integrate in. I still don't know how we create a set of tests for CI that help us verify that the typing of the public API is correct. One thing we could consider doing now is to add a @gramster if we were to do that, how would |
Reckon it's OK to add a |
If you add py.typed, that means that the pandas lib should be treated as type complete and the canonical source of info, and type checkers will prefer it over any other installed stubs (including stubs shipped by Pylance, various other stub projects); if the typing isn't complete yet, it may be a worse experience without much of a workaround (the preference is defined in PEP 561). I know there are a lot of types for pandas still in Microsoft's python-type-stubs repo; I'm not sure where the effort to merge those in here for parity currently stands. |
This doesn't seem right. Since you mentioned PEP 561 it states that first in the resolution order are
and inline hints have second lower precedence (with only typeshed being lower).
So with inline packages you can still override any type hint, just cannot depend on the typeshed. This can be easily tested (corresponding code can be found here) As of that:
|
I'd like to discuss the idea I sent out via email about maintaining the stubs as a |
Let's discuss tomorrow - there'd need to be a way of making sure it doesn't go out-of-date. Also, there's already https://github.com/VirtusLab/pandas-stubs |
I'm happy to be wrong on the resolution order; I just recall Pylance's set of bundled stubs behaving more like an extra typeshed such that we wouldn't have to worry about packages becoming py.typed, except that in this instance, our types are preferred. It wouldn't be too hard to test, just install pandas and IMO inlined types are the end-all-be-all, it's just that they have to be pretty solid to end up with a good UX; @Dr-Irv has kindly put a lot of effort into improving the stubs in the meantime over a pretty long timeframe. |
Jake is correct about the resolution order in Pylance. As per PEP 561, users can explicitly configure a "stubPath" that overrides other type information, but the stubs that ship with Pylance are treated like typeshed stubs and have the lowest priority. So if pandas becomes "py.typed", it will override the stubs that Pylance includes. That's fine as long as the inlined type information shipped with pandas is relatively complete (preferably a superset of what is included in the Pylance stubs today). |
I hate to make a long comment thread even longer, but is there any reason (i.e. is the design so different) why not consider migrating Microsoft stubs into pandas as-is and following with step-wise migration to inline hints? I am aware that @erictraut might disagree with me here (sorry), but that's immediate improvement for majority of users (and dependent projects). It also reduces risk of divergence between independent stubs and ongoing annotation effort within the project, which might be a lesser concern for the end users, but pretty serious issue in case of any typed library that depends on pandas. |
@zero323, I think that's a great idea. I don't disagree in the slightest. If there's anything we can do to facilitate this, please let us know. |
@erictraut @zero323 I've asked that this be put on the agenda for our monthly pandas development meeting tomorrow (December 8, 1PM Eastern time). All are welcome to join. Details can be found here: https://pandas.pydata.org/docs/development/meeting.html#calendar |
We have scheduled a meeting on January 7, 2022, at 5PM UTC time to discuss this topic in more depth. Information about that meeting can be found on the pandas development meeting calendar here: https://pandas.pydata.org/docs/development/meeting.html . All are welcome to attend. Aside from ongoing efforts to add typing to the pandas source, there have been two efforts to provide type stubs for pandas:
One interesting aspect of the The key developers of both of those projects have been invited to participate in that meeting and have indicated that they will be able to attend. We last discussed this topic in depth at our January, 2021 development meeting which led to a summary given above #28142 (comment) In order to provide type stubs that support the pandas public API, I see a few options going forward:
If we choose any of the first 3 options, we should also consider how we want to test the provided stubs. In the I think there are pros and cons to each of the above options (or maybe there are other options as well), and they can be discussed here or at our meeting on January 7, 2022. |
In Xarray we are attempting to use our pytests as tests for our typing too (e.g. pydata/xarray#5690 (comment); and pydata/xarray#5694 is one attempt at generalizing, currently lingering). That ensures the types are sufficiently broad. Though it doesn't ensure the types are as narrow as possible (i.e. assigning everything as Notably, I only realized that tests require some annotation at the test level; e.g. From an ecosystem POV, speaking as an xarray core-dev — we would benefit from upstream projects like pandas having a And notwithstanding the issues around replacing more complete alternative versions that pandas now has, typing doesn't need to be perfect in order to ship with a |
That's quite broad topic, but in general you can have multiple testing strategies. For "positive" cases (annotations are consistent and capture common usage examples) you can:
For negative cases (checkers should detect incorrect usage):
|
pip3.10 install -U 'psycopg[c]' engine = create_engine(
"postgresql+psycopg://gert:p@localhost/gert", echo=False, future=True
)
with engine.begin() as conn:
stmt = select(A)
df = pd.read_sql(stmt, conn)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
print(df)
df.to_sql('test', conn, if_exists='replace', index = False) Type of "to_sql" is partially unknown |
@gertcuykens please try the |
There are a few reasons:
That's why we created pandas-stubs. It's meant to support type checking of user code, and we can do a LOT more checking there than is possible within the pandas code itself. We also have a testing mechanism that tests the validity of those type stubs. Install pandas-stubs and use it. You'll be glad you did! Originally posted by @Dr-Irv in #49865 (comment) |
@simonjayhawkins hi, just checking in to see if |
I'll answer for him, and the answer is yes. |
xref #28135 (comment)
do we want to make pandas PEP 561 compatible?
https://mypy.readthedocs.io/en/latest/installed_packages.html#making-pep-561-compatible-packages
The text was updated successfully, but these errors were encountered: