Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stubgen improvements: preserving existing annotations #3169

Merged
merged 22 commits into from
Sep 22, 2017

Conversation

dmoisset
Copy link
Contributor

This PR modifies heavily the stubgen to be able to create better stubs when processing already annotated files (Implements #2106 ). It essentially adds annotations on the stubs for:

  • Variables/attributes with annotations
  • Annotations on functions/methods
  • TypeVar declarations
  • Type Aliases
    It also tries to add all the required imports into the stub to make the resulting file valid, and handle properly forward references when necessary. It also adds some fixes (handling of generics as base classes, some formatting fixes in the output) that were natural consequences of some of the refactoring I made.

I know this is a large patch (and refactors some pieces of the generator), so feel free to comment if it needs discussion.

@dmoisset
Copy link
Contributor Author

Just checked other stubgen issues and this seems to fix #1782 too

@@ -20,42 +20,52 @@ def g(arg): ...
def f(a, b=2): ...
def g(b=-1, c=0): ...
[out]
def f(a, b: int = ...): ...
def g(b: int = ..., c: int = ...): ...
def f(a, b: int=...): ...
Copy link
Member

@ilevkivskyi ilevkivskyi Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I like the old style better, also see this example from PEP 8:

Yes:

def munge(sep: AnyStr = None): ...
def munge(input: AnyStr, sep: AnyStr = None, limit=1000): ...

No:

def munge(input: AnyStr=None): ...
def munge(input: AnyStr, limit = 1000): ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeshed's convention is to put spaces around the =.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I was used to the old rule, not the special case for annotations. I'll change that

@dmoisset dmoisset changed the title Stubgen imrpovements: preserving existing annotations Stubgen improvements: preserving existing annotations Apr 13, 2017
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this PR, it will definitely make stubgen more usable. Here are more detailed comments.


y: C
[out]
x: 'C'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ForwardRefs are not needed in stub files. So that I would propose to keep all annotations simple, it would create less "noise" when reading stubs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that (and a lot of the patch could be made simpler by removing forward ref handling). I couldn't find that documented anywhere (looking at PEP 484); is that actually a defined behaviour or an implementation artifact?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a consequence of the fact that .pyi files are actually never executed, and the only reason of ForwardRef's is to "hide" something from the Python runtime. This is already heavily used in typeshed stubs to simplify them, if you think it is worth mentioning this in PEP 484, then you could make a short PR to python/peps repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, changed this

y: Any

[case testMultipleAssignmentAnnotated]
x, y = 1, "2" # type: int, str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add few tests with wrong annotations, for example with more types than variables, and other possible mismatches, just to be sure that stubgen will not crash on those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK... the current code is not crashing, but generating "something" instead of erroring out. I can put an error message although I'm a bit converned that validation is a bit out of scope from the stubgen tool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit converned that validation is a bit out of scope from the stubgen tool.

Just not crashing on invalid input is already OK, I think. If you can produce a reasonable warning, then it's perfect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. the current status is that it doesn't crash, just ignores the "extra" data

def f(self): ...
def g(self): ...

[case testExportViaRelativeImport]
from .api import get
[out]
from .api import get as get
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this form should be preserved. There is currently rule in PEP 484 that says that such form should be used for names that are actually imported (and therefore re-exported by Python runtime). Mypy currently does not follow this rule and treats all names as re-exported, see #2927, but it will be fixed soon.

Therefore, to prevent breakage of stubgen stubs after the fix, and to comply with PEP 484 this form from mod import X as X should be preserved for names that are actually imported in original file. At the same time, a short form from mod import X should be used for auxiliary/fake imports created by stubgen (like typing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after reading this comment and pep-484, I'm not sure I fully understand what is required here. What I have is:

  • If a stub needs to export an imported name, it should import it with an alias (even if the alias is name is the same as the original). Is that correct?
  • But then, how does stubgen know which imported names where "intended" to be reexported? should it use __all__ and ignore everything else? should it reexport everything just in case? should it reexport only names imported in a certain way?

I could use a hand here at least with the test cases, once I have some boundaries defined by those I could probably tweak the implementation to match the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a stub needs to export an imported name, it should import it with an alias (even if the alias is name is the same as the original). Is that correct?

Yes.

But then, how does stubgen know which imported names where "intended" to be reexported? should it use all and ignore everything else? should it reexport everything just in case? should it reexport only names imported in a certain way?

I think the safest bet would be to re-export all globally visible names (i.e. ignore imports in function bodies), if there is not __all__ defined, and only re-export names form __all__ if there is one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm implementing the __all__ scenario. I'm not doing the "reexport everything if there's no __all__" because I think the result ends up being unnecessarily noisy for most cases (you'll end up with tons of modules saying from typing import List as List because they don't have an __all__). In any case, it's easy to change our minds later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll end up with tons of modules saying from typing import List as List because they don't have an __all__.

You can just manually exclude names imported from typing. In general, this is an important point, since it already caused some bugs in typeshed, see e.g. python/typeshed#1484. I would prefer this test to work.

(Also I don't think people will be unhappy about a bit more bulky stub, but they will be certainly unhappy about false positives when as name isn't present).

def visit_unbound_type(self, t: UnboundType)-> str:
s = t.name
base = s.split('.')[0]
self.stubgen.import_tracker.require_name(base)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow the logic here. If I have an unbound type X.Y.Z, will this require import X? But how do we know X is actually a module? Maybe it would be better to just use Any for unbound types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you have an X.Y.Z annotatation, what other things could X be? I'm assuming module here because if it's not the only choice, it's probably the best one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not sure it is a good idea to guess something for an unbound type. I still think it would be better to use Any. Why do you think we should do something at all with unbound types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stubgen runs mostly at a syntactical level. Semantic analysis phase is not run within it. So essentially 99% of the type annotations it will find will still be unbound

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stubgen runs mostly at a syntactical level. Semantic analysis phase is not run within it.

OK, is it easy to fix this? If not, then maybe add a # TODO: item here and/or open an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's easy to fix at all; it's probably a major change in the tool, the design decision of working syntactically is heavily embedded on it from day one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also have an issue for this. Maybe some day we will have time to improve this.

mypy/stubgen.py Outdated
if alias:
self.reverse_alias[alias] = name

def add_import(self, module: str, alias: str=None) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it was already discussed that PEP 8 specifies spaces around = for this case, it should be alias: str = None not alias: str=None. Also note that we are moving towards explicit Optional types for args with default None.

(This also appears few time below.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

self._state = CLASS

def is_type_expression(self, expr: Expression, top_level: bool=True) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see more tests for this, to be sure that complex (generic) aliases like Union[T, List[T]] are recognized, but things like type_map[int] are not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a test for the former...
The latter will actually go through as an alias and I'm not sure how to prevent it with the kind of information I have in stubgen (I'm assuming that in most libraries that kind of things won't happen too much at a module top level).

mypy/stubgen.py Outdated
def require_name(self, name: str) -> None:
self.required_names.add(name.split('.')[0])

def import_lines(self) -> List[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is difficult to follow the logic of this method. Maybe a doctring (and comments) will help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

mypy/stubgen.py Outdated
self.module_for = {} # type: Dict[str, Optional[str]]
self.direct_imports = {} # type: Dict[str, str]
self.reverse_alias = {} # type: Dict[str, str]
self.required_names = set() # type: Set[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would document the meaning of these four variables. Maybe just add a comment above every line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added these

import urllib.parse
__all__ = ['urllib']
[out]
import urllib.parse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should still be an error comment in output like

# Names in __all__ with no definition:
#    urllib

before or after the import statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running import urllib.parse defines the urllib name and attaches it to the correct module, why this should be an error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, then ignore this comment (or add another test where this produce an error, like from urllib import parse)

@@ -552,14 +602,148 @@ def f(a): ...
[case testInferOptionalOnlyFunc]
class A:
x = None
def __init__(self, a=None) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also keep an additional test for __init__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@ilevkivskyi
Copy link
Member

@dmoisset Are you still working on this? I think this would be a useful addition (in particular in the context of helping maintainers of third-party libraries that want to support typing).

@dmoisset
Copy link
Contributor Author

@ilevkivskyi I've been dealing with other stuff, I might take a look at this again during the rest of the month

@JukkaL JukkaL removed their assignment Aug 15, 2017
@ilevkivskyi
Copy link
Member

@dmoisset Sorry for pinging again, this is the oldest open PR now (and IMO it contains an important feature).

@ilevkivskyi ilevkivskyi self-assigned this Aug 31, 2017
@ilevkivskyi
Copy link
Member

Thanks for an update! I replied to your questions above, will make a thorough review soon.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is almost ready, just few minor comments/questions.

mypy/stubgen.py Outdated

def get_init(self, lvalue: str, rvalue: Expression) -> Optional[str]:
def get_init(self, lvalue: str, rvalue: Expression,
annotation: Optional[Type]=None) -> Optional[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space around =.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

mypy/stubgen.py Outdated
init_stmt = arg_.initialization_statement
if init_stmt:
if isinstance(init_stmt.rvalue, NameExpr) and init_stmt.rvalue.name == 'None':
initializer = 'None'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a recent agreement that all optional types should be explicit in typeshed, and it is now checked with --no-implicit-optional. If the implicit Optional are added already at the parse stage, then I think the initializer should be always ..., even for optional types. Otherwise, we can keep None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure of the exact context here, but the typeshed convention is that all defaults should be ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok removing this (actually the result was already ok, but this was a leftover when I tried to take advantage of the implicit optional)

mypy/stubgen.py Outdated
@@ -35,6 +35,7 @@
- we don't seem to always detect properties ('closed' in 'io', for example)
"""

import builtins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably leftover of some previous change, removing...


[case testClassVariable]
class C:
x = 1
[out]
class C:
x = ... # type: int
x: int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For class attributes it is important to know whether an attribute was just declared like x: int or actually defined like x: int = 1. There is a difference for protocol classes -- in former case an explicit subclass cannot be instantiated. Therefore, I think the translation for class attributes should be like this:

x: int -> x: int
x = 1 -> x: int = ...
x: int = 1 -> x: int = ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK... what about instance attributes that stubgen finds because of self accesses? (for example, when you have self.x = 1 in __init__). Should those be just x: int or should they also be x: int = ...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK... what about instance attributes that stubgen finds because of self accesses? (for example, when you have self.x = 1 in __init__). Should those be just x: int or should they also be x: int = ...?

I think it should be x: int = ... if there is an assignment in __init__ (again mostly from the protocols POW).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've made the change (and some others), I will be pushing un update after some testing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@@ -367,19 +413,19 @@ from re import match, search, sub
__all__ = ['match', 'sub', 'x']
x = 1
[out]
from re import match as match, sub as sub
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget about fixing this as discussed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


noalias1: Any
noalias2: Any
noalias3: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you preserve @abstractmethod and other decorators? I think it is important to preserve @abstractmethod, @classmethod, @staticmethod, and @property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make any changes here wrt the original stubgen. I see some tests checking for property/staticmethod/classmethod being preserved, and other decorators removed (I think abstractmethod is not preserved). If you don't mind, I would add that as a separate issue to avoid scope creep here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, could you please open an issue for this?

@gvanrossum
Copy link
Member

Would love to see this land -- who's waiting for whom?

@ilevkivskyi
Copy link
Member

I am waiting for @dmoisset to implement few last comments.

dmoisset added a commit to dmoisset/peps that referenced this pull request Sep 22, 2017
@dmoisset
Copy link
Contributor Author

OK, I'm pushed a new update. There are a few issues that we've discussed in the comments that were actually issues in the previous version. Something that could be done if you agree is merge this and add these items as separate issues. What I have is:

@dmoisset
Copy link
Contributor Author

(to clarify: I can add the issues, just wanted to sync on how we proceed)

@gvanrossum
Copy link
Member

gvanrossum commented Sep 22, 2017 via email

@dmoisset
Copy link
Contributor Author

Oh, the apologies are on me; code reviews have been very quick, and the delays have been on my side. My chances to work on this are limited so it can take me a long time to move forward after the reviews

@ilevkivskyi
Copy link
Member

There's no clear road to take to export names when __all__ is not defined #3169 (comment) (The difficulty here is deciding what to do, implementation is easy)

The problem is that stubgen does not conform to PEP 484 both without and with your PR.
I think PEP 484 is clear here and we should just go with it, instead of reconsidering the as name rule again.

I checked a bit what tests should be updated and there will be a lot of import foo as foo everywhere to add and I'm worried that in general it's not what's desired.

These tests might be older than PEP 484 itself, so I am not surprised. We just need to update the tests, although I am fine to do this in a separate PR.

To summarize, I am fine with merging this PR as is to avoid unnecessary growth (it is already large). But, please open four separate issues for the problems you mentioned and mention this PR in every issue.

@ilevkivskyi ilevkivskyi merged commit 4fc4ae2 into python:master Sep 22, 2017
@ilevkivskyi
Copy link
Member

@dmoisset
Thank you for working on this! This will make stubgen more usable, and I will be happy i fyou continue to work on the above mentioned issues.

@gvanrossum
Copy link
Member

Thanks so much Daniel for working on this! And Ivan for reviewing. Stubgen doesn't get enough developer love.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants