Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Bring over Microsoft typing stubs as starting point for typing public API #45253

Closed
Dr-Irv opened this issue Jan 7, 2022 · 12 comments
Closed
Labels
Enhancement Typing type annotations, mypy/pyright type checking

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 7, 2022

From discussion on pandas typing meeting on January 7, 2022 (notes at https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/)

Follow up to discussion here: #28142 (comment)

Goal is to copy over the pandas stubs created by Microsoft that are shipped with Visual Studio Code:
https://github.com/microsoft/python-type-stubs/tree/main/pandas

Unclear whether this should be done a file at a time, or all at once. Need to resolve conflicts between the pandas source and those stubs. Possible to publish partial stubs by indicating they are partial within a py.typed file.
@gramster (main author of the Microsoft stubs)may be willing to help (nudge, nudge)

Ideally, the testing framework for the stubs should be created first. See #45252

@Dr-Irv Dr-Irv added Enhancement Typing type annotations, mypy/pyright type checking labels Jan 7, 2022
@joannasendorek
Copy link

joannasendorek commented Jan 10, 2022

As discussed on the meeting, we have conducted tests of python-type-stubs with the test snippets coming from pandas-stubs library. We discovered the following:

  1. As already discussed briefly (Is compatibility with mypy a goal of this project? microsoft/python-type-stubs#42) stubs seem to be failing with mypy checker - giving us 179 errors, mostly functions not being fully annotated, for example:

third_party/3/pandas/core/nanops.pyi:7: error: Function is missing a type annotation for one or more arguments
third_party/3/pandas/core/nanops.pyi:8: error: Function is missing a type annotation for one or more arguments

  1. Regarding the test snippets, we identified 160 errors in total - these are all examples of properly used API, which were identified as wrong by mypy using python-type-stubs. Majority of errors is due to Series being generic in python-type-stubs. Therefore, the code such as:
s2: pd.Series = s1.ffill(inplace=False)

is identified as wrong with:

tests/snippets/test_series.py:480: error: Missing type parameters for generic type "Series"

In pandas-stubs we haven't decided to make Series generic for this reason, but I personally would gladly see Series becoming generic in the next pandas releases, cause this could help with providing better annotation for some of the Series methods.

There are also some errors connected to missing valid overload variant of stubs, such as:

s2: pd.Series = s1.ffill(inplace=False)

tests/snippets/test_series.py:480: error: No overload variant of "ffill" of "Series" matches argument type "bool"

Some other example of proper call not being covered:

tests/snippets/test_frame.py:153: error: Argument "index" to "drop" of "DataFrame" has incompatible type "Set[int]"; expected "Union[List[str], List[int], Index[Any], None]"
tests/snippets/test_frame.py:154: error: Argument "columns" to "drop" of "DataFrame" has incompatible type "Set[str]"; expected "Union[str, List[Any], Index[Any], None]"
tests/snippets/test_frame.py:155: error: Argument "index" to "drop" of "DataFrame" has incompatible type "int"; expected "Union[List[str], List[int], Index[Any], None]"
tests/snippets/test_frame.py:156: error: Argument "labels" to "drop" of "DataFrame" has incompatible type "int"; expected "Union[str, List[Any], None]"

I attach the full testing report.
test_log.txt

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jan 10, 2022

@joannasendorek Thanks for doing these tests. Some comments:

  1. From my experience, we know that some of the stubs in python-type-stubs are incorrect, and your tests are identifying a lot of those cases, so we'd have to start fixing those stubs for them to pass your set of tests. As we discussed in the meeting, I've been doing a number of PR's for that repo to fix usage cases that I and my colleagues have come across in our code.

  2. With respect to having Series be generic.... In the python-type-stubs, they are designed to be used with pyright as the type checker, so having a non-generic declaration is accepted by pyright. For example, this code:

def test_sort_values():
    s = pd.Series([4, 2, 1, 3])
    res: pd.Series = s.sort_values(0)

which failed in the tests with mypy works fine with pyright as the type checker. This may be a case where pyright is doing what most people would want, i.e., allowing a non-generic declaration on a type that is generic in the stub. You wrote:

I personally would gladly see Series becoming generic in the next pandas releases, cause this could help with providing better annotation for some of the Series methods.

That would be a big change to pandas. As best as I can tell, the way that numpy did this is by having ndarray be non-generic, and then have a type NPT.ndarray[_type_] allow more type specificity. We could consider that for the future.

Can you see what the test results are if you use pyright as the type checker, using your tests, both with the pandas-stubs and with python-type-stubs ? We might want to consider using pyright as the type checker for the stubs.

@a-reich
Copy link

a-reich commented Jan 10, 2022

Sorry just trying to follow along as an observer, but regarding the generic question:

As best as I can tell, the way that numpy did this is by having ndarray be non-generic, and then have a type NPT.ndarray[type] allow more type specificity

AFAICT the regular ndarray class is, in fact declared as generic in latest numpy’s stubs?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jan 10, 2022

AFAICT the regular ndarray class is, in fact declared as generic in latest numpy’s stubs?

Yes, you are right. I was looking at np.arange() which doesn't return a type in numpy 1.21.2, so that ar: np.ndarray = np.arange(10) works, but np.arange() does return a generic type in numpy 1.22.0, so there is something in the microsoft type stubs where methods like sort_values() return Series[S1] that is causing it to get flagged if you do res: pd.Series = s.sort_values(0) but things work OK when numpy is using a similar pattern.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jan 11, 2022

@joannasendorek Can you indicate from your tests which mypy options you changed from the defaults?

I'm wondering if we should do your tests that you did with your tests and the MS type stubs, but using the options for mypy that we currently have in pandas/pyproject.toml

@joannasendorek
Copy link

@Dr-Irv sure, sorry for late reply. I used the exact configuration from pandas-stubs repo, you might check it here: https://github.com/VirtusLab/pandas-stubs/blob/master/mypy.ini - used for checking the stubs themselves and https://github.com/VirtusLab/pandas-stubs/blob/master/mypy_env.ini - used for checking snippets using MS stubs.

@twoertwein
Copy link
Member

twoertwein commented Jan 13, 2022

@joannasendorek The first two classes of errors you mentioned (partial annotations and generic classes without a concrete specifier) are both suppressed in pandas's mypy config (disallow_any_generics = false and disallow_incomplete_defs = false).

Unclear whether this should be done a file at a time, or all at once. Need to resolve conflicts between the pandas source and those stubs.

Probably needs to be done file by file as 1) there might be many conflicts (see for example #43744: Microsoft stubs for _libs) or 2) some of the 3rd party stubs are very partial (for example offsets.pyi and interval.pyi in VirtusLab's stubs).

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jan 13, 2022

@Dr-Irv sure, sorry for late reply. I used the exact configuration from pandas-stubs repo, you might check it here: https://github.com/VirtusLab/pandas-stubs/blob/master/mypy.ini - used for checking the stubs themselves and https://github.com/VirtusLab/pandas-stubs/blob/master/mypy_env.ini - used for checking snippets using MS stubs.

Can you try the MS stubs with the settings we use in the pandas project (found in pandas/pyproject.toml)

@joannasendorek
Copy link

I attach the results of running described above mypy tests with the pandas settings.

As a quick summary number of errors dropped from 160 to 119.

test_log_2.txt

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jan 17, 2022

I attach the results of running described above mypy tests with the pandas settings.

As a quick summary number of errors dropped from 160 to 119.

Thanks @joannasendorek !

I think the question now is whether we get these cleaned up as part of moving the stubs over from Microsoft to pandas, OR we get them cleaned up as part of the Microsoft project, and then move them over.

I created an issue over there microsoft/python-type-stubs#125 to see what they have to say.

@mroeschke mroeschke mentioned this issue Jan 17, 2022
2 tasks
@gramster
Copy link

gramster commented Jan 21, 2022

It looks like I should just be able to open these tests in VS Code with Pylance and see any issues. I am doing that and I don't see the very first issue, but I'm also using latest stubs from the repo and it looks like you have made some changes, Irv. So I will look at the things that are still being squiggled.

Of course, that means I am assuming that these test snippets should be clean and free of warnings; is that a correct assumption?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jun 3, 2022

Have created the https://github.com/pandas-dev/pandas-stubs project from MS stubs.

@Dr-Irv Dr-Irv closed this as completed Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

No branches or pull requests

5 participants