The if TYPE_CHECKING problem #1

Open
JelleZijlstra opened this issue Mar 27, 2021 · 38 comments

@JelleZijlstra
Contributor

JelleZijlstra commented Mar 27, 2021

This was brought up by Joseph Perez on the mailing list. The problem is that this is a fairly common idiom:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from expensive_module import SomeType

def f(x: SomeType) -> None: ...

f.__annotations__   # NameError under current PEP 649

My only idea for a solution is to make an undefined name produce some special object, like typing.ForwardRef, in an annotation. But that may introduce more complexity, because annotations aren't just names.

I can think of three more operations we'd have to support with current standard library typing:

  • SomeType may be generic so we'll have to support SomeType[int]
  • It may be part of a Union and the user may write SomeType | int
  • It may appear on the index side of a generic, like list[SomeType], and get caught up by overzealous runtime typechecking. For example, typing.Union would currently reject it.

(I opened this issue because I feel like it's an easier way to have a focused discussion on a single problem. If you disagree, feel free to let me know.)

@carljm

carljm commented Mar 27, 2021

In order to make this pattern work well, I believe we will eventually have to move beyond the if False: approach to guarding annotation-only imports. Ultimately I believe we should have dedicated syntax for these imports (bike shedding the exact syntax postponed for now.)

If we have that, then the runtime can preserve the fully qualified path to the type-only import instead of just its name, and record that (along with perhaps an AST-like representation of additional operations performed on it, such as indexing, bit-or, etc), allowing the lazy __annotations__ attribute to fully reify the actual type as it would exist if it had been originally imported in the normal way.

(I recognize that this is probably a bridge too far for Python 3.10, but I hope if we find a plan for 3.10, it can be a step on the way to something like this.)

@larryhastings
Owner

I'm very interested in this fairly common idiom. I don't have a giant Python code base that uses static type analysis, so I just don't understand the use case. I'd appreciate it if you could tell me:

  • If f() takes a parameter of type SomeType, how does this code at runtime work without somebody somewhere importing expensive_module? Don't you need to import that module in order to create objects of that type? In which case you're paying to import the module anyway, so the second and subsequent imports are cheap.
  • What are people doing with f.__annotations__ at runtime? I thought the code using static type analysis didn't really look at the annotations for anything.

@larryhastings
Owner

Also, both PEP 649 and the current stringized annotations in 3.10a should speed up importing, by making the annotations faster to calculate. Do either of those speed up importing expensive_module enough that maybe importing it isn't such a big deal anymore?

@carljm

carljm commented Mar 27, 2021

One reason for the idiom is that typing frequently results in import cycles that would not otherwise occur. E.g. if a.f takes a parameter of b.SomeType, and module b calls a.f with that parameter, there is no import cycle for runtime purposes but there is one once the annotation is introduced.

Also import expense is relevant. In a very large system (or even a smaller CLI tool) you may have code paths that are unlikely to be followed in a given process, but may be. There is value in having an expensive import required for that code path not occur immediately at process startup (delaying initial responsiveness of the entire system.)

@carljm

carljm commented Mar 27, 2021

It's a little harder to answer the runtime uses for __annotations__, since those are not standardized. There are various libraries for doing runtime enforcement of type annotations (in addition to the static checking performed by eg mypy); these libraries use __annotations__. At Instagram we have a JSON API schema library that uses type annotations to define the shape of the schema objects, and uses __annotations__ to introspect these shapes at runtime (as well as to provide auto generated schema documentation.) I think these are pretty typical uses of __annotations__.

The problem discussed here is already an issue for us in that API schema library use case. Today get_type_hints() cannot resolve annotations imported only under if TYPE_CHECKING, because they aren't in the module namespace. So this prevents use of if TYPE_CHECKING to guard imports in modules defining API schemas, and this is a top FAQ for users of that library, since we make extensive use of if TYPE_CHECKING guarded imports in all other modules.

In that sense this is not a new problem, your PEP just shifts it from being a get_type_hints problem to being an __annotations__ problem. Which is a significant regression because then you can't even get at the raw annotations to implement your own workaround.
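A minimal sketch of the failure described above. It uses decimal.Decimal as a stand-in for a type imported only under if TYPE_CHECKING, and a hand-quoted annotation in place of from __future__ import annotations; both are assumptions made so the snippet runs on its own. The raw annotation stays available as a string, but get_type_hints() cannot resolve it:

```python
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    # Stand-in for a guarded import; this branch never runs at runtime.
    from decimal import Decimal


def f(x: "Decimal") -> None: ...


# The raw annotation is still a plain string that a library could inspect.
raw_annotations = f.__annotations__

# Resolving it fails, because Decimal is not in the module namespace.
try:
    resolved = get_type_hints(f)
    error = None
except NameError as exc:
    resolved = None
    error = exc
```

With strings, a library can still fall back to its own resolution logic; that is the escape hatch carljm refers to.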

@carljm

carljm commented Mar 27, 2021

Making __annotations__ faster to create at import is certainly a good thing, but in the case of large modules with many classes and methods, I don't think it's a qualitative change in the need to care about expensive imports at all.

@larryhastings
Owner

Circular imports are a gnarly problem. But maybe PEP 649 permits a gnarly solution?

Let's say module a depends on module b, and module b depends on module a. Let's also say that there's a common module c that everybody imports anyway, that we can require everyone imports it before they can use either a or b.

First, we pick one of a or b to be first. Let's say we choose a. We then do this:

  • a does not import b.
  • b imports a.
  • c imports a, then b, then executes a.b = b.

a should now import fine, and the annotations in a won't look up the value of b until after we've done this mild hack.

I mean, it's not great, but it seems like it might solve the circular import problem.

@larryhastings
Owner

Are these large code bases with if TYPE_CHECKING already reliant on stringized annotations in order to make their code sufficiently performant?

And, if so, what did these large codebases do before they could import stringized annotations? Were they just slow, and everyone groused about it?

@JelleZijlstra
Contributor Author

I agree with what Carl wrote above; expensive_module maybe wasn't the right name to use in my example. You can see some real-world examples of usage of if TYPE_CHECKING in the mypy codebase: https://github.com/python/mypy/search?q=%22if+TYPE_CHECKING%22 . Many of them are guarding circular imports, but there are some other cases, like the usage of sqlite3 in https://github.com/python/mypy/blob/538d36481526135c44b90383663eaa177cfc32e3/mypy/metastore.py#L18 (avoiding the import so that mypy mostly works if sqlite3 doesn't exist at runtime) and avoiding a slow import in https://github.com/python/mypy/blob/4827f3adf0acebd2dea7cc0dbf358b8daede6694/mypy/build.py#L40 (we only need "Reports" if mypy is actually generating a report; it's slow because it needs to import some big XML library). Mypy can't use from __future__ import annotations because it still supports 3.5 and 3.6.

In your gnarly solution, I'm guessing that a would include annotations like b.SomeType without directly importing b. That would work at runtime, but it would be difficult for a static type checker or linter to support, since they would now think b is an undefined name until they happen to look at a different module.

Another workaround is to just tell people to keep using string annotations in cases where they're using if TYPE_CHECKING. It's not great, and it makes PEP 649 arguably a regression relative to PEP 563, but we already have to live with it in some other cases (like TypeVar bounds), so maybe it's not the end of the world.

@larryhastings
Owner

If you used my proposed solution for circular imports for runtime use, could you also continue the current if TYPE_CHECKING: import b workaround for module a for static type analysis?

@larryhastings
Owner

You mean, hand-stringized annotations? I'm hoping that, if PEP 649 gets accepted, we can deprecate and eventually remove from __future__ import annotations.

@JelleZijlstra
Contributor Author

Are these large code bases with if TYPE_CHECKING already reliant on stringized annotations in order to make their code sufficiently performant?

We have a large codebase with a lot of type annotations (not quite as large as Carl's, but still). Importing it is slow because it's a massive amount of code, not because of the annotations. I'm sure the annotations make it a bit worse, but it doesn't make enough of a difference to care about. We already set things up to cache the import and minimize the number of times we have to import the codebase.

If you used my proposed solution for circular imports for runtime use, could you also continue the current if TYPE_CHECKING: import b workaround for module a for static type analysis?

We could, good point. It's still going to be a bit fragile because our import cycles tend to involve more than just a few files, but maybe we can live with it.

You mean, hand-stringized annotations? I'm hoping that, if PEP 649 gets accepted, we can deprecate and eventually remove from __future__ import annotations.

Yes, I mean hand-stringized annotations, like the mypy function that has -> 'sqlite3.Connection':.

@carljm

carljm commented Mar 27, 2021

I can only speak for one or two large code bases, but our path was roughly: we started introducing type annotations, we had more and more of them, we noticed they were a significant contributor to startup time and overall CPU, we started using more hand-stringized annotations, then we were happy to see PEP 563 which allowed us to add from __future__ import annotations everywhere, get rid of most hand-stringized annotations, and move most typing-only imports under if TYPE_CHECKING.

Edit: I should note that a lot of the runtime CPU cost that became a problem was due to the Python 3.6 GenericMeta implementation, which is much improved now. But we still wouldn't want to go back to pre-563 days, for reasons of import cycles and delaying expensive and likely not to be needed imports.

@carljm

carljm commented Mar 27, 2021

I think from our perspective the "module c" workaround wouldn't meet the usability or maintainability bar and we wouldn't adopt it. But the PEP also wouldn't make things any worse for us today, it just would make it harder for us to implement our own extension to get_type_hints() that can handle guarded imports. Which is something we've talked about but isn't high on the priority list, since for now we're able to just avoid guarded imports in those modules where we need runtime access to annotations. If we needed to do it after your PEP we'd have to revert to hand stringized annotations in those cases.

I do think the "module c" workaround points in the direction of what a longer term better resolution would be, as I was getting at in my first comment. Really we just want efficient lazy imports (probably via some form of module-level "cached property"), which I think Neil Schemenauer had a proof of concept of at one point.
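For readers unfamiliar with the idea, here is a rough sketch of a module-level "cached property" built on the module __getattr__ hook from PEP 562. It is not Neil Schemenauer's proof of concept; make_lazy_module, the schemas module name, and the use of fractions.Fraction as the lazily imported type are all hypothetical, chosen only to keep the example self-contained:

```python
import importlib
import types


def make_lazy_module(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access.

    lazy_attrs maps an attribute name to a (module name, member name) pair.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            target_module, target_member = lazy_attrs[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(target_module), target_member)
        # Cache the result so the hook is not hit again for this name.
        setattr(module, attr, value)
        return value

    module.__getattr__ = __getattr__
    return module


# Hypothetical module exposing Fraction without importing fractions up front.
schemas = make_lazy_module("schemas", {"Fraction": ("fractions", "Fraction")})
```

Accessing schemas.Fraction triggers the import and caches the class in the module's namespace. Note that the hook only covers attribute access on the module object, not bare names inside the module, which is one reason it falls short of a transparent lazy import.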

@gvanrossum

Sorry to state the perhaps obvious here, I have limited screen time available and this thread has already ballooned, so I'm treating it as append-only. :-)

IIUC the issue with import recursion (and to some extent large code bases) is that before annotations were in use, one could often get away with not importing a module even though an instance of a class defined in that module was used as an argument, because of duck typing. But with annotations you need to have the class in your namespace so you can name it in the annotation. And that means you have to import the module containing the class. So now you are adding imports, and before you know it you either have a new circular import (in large, mature code bases those occur frequently) or you import everything that could be used, regardless of whether it is actually needed.

For example, suppose you have a UserAccount class whose instances have a list of BackupFile objects. But most users never use backup files, so that list is nearly always empty, and the BackupFile code just manipulates the list directly. But when adding type annotations, you have to add backup_files: list[BackupFile] to the UserAccount class, and now the module defining UserAccount depends on the module defining BackupFile, and everything it depends on; or perhaps you have now introduced an import cycle, because the BackupFile module also imports the UserAccount class. The 'if False' or 'if TYPE_CHECKING' hack limits this dependency to static type checkers, where import cycles are less of a problem, and everything is loaded together anyway.
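The UserAccount example might look roughly like this in code. The backups module name is hypothetical, and the annotation is hand-quoted (rather than relying on from __future__ import annotations) so the snippet runs as a single file:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Hypothetical module; only a static type checker follows this import,
    # so no runtime dependency (or import cycle) is created.
    from backups import BackupFile


class UserAccount:
    # Quoted so the name is never looked up when the class body executes.
    backup_files: "list[BackupFile]"

    def __init__(self) -> None:
        self.backup_files = []


account = UserAccount()
```

At runtime the module defining UserAccount never imports backups; the cost is that the annotation is only a string until someone resolves it.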

@larryhastings
Owner

larryhastings commented Apr 6, 2021

Misquoting something I heard from a friend: there's no problem you can't solve in computer science with another layer of indirection, except for the "too many layers of indirection" problem.

a.py:

def b_decl():
    import b
    return b.B

class A:
    def __init__(self, b:b_decl()): ...

b.py:

def a_decl():
    import a
    return a.A

class B:
    def __init__(self, a:a_decl()): ...

Would that sufficiently solve the circular import problem?

@JelleZijlstra
Contributor Author

Type checkers will generally reject that as an invalid annotation: you can't use a call expression in an annotation.

Personally I'm now OK with just recommending that you write a: "A" here, at least until we get some kind of lazy import as @carljm alluded to above. It's ugly but it doesn't come up that often (at least in our codebase) and there are already a few other places in typing where you have to use quotes even with from __future__ import annotations/co_annotations semantics.

@carljm

carljm commented Apr 6, 2021

In brief, no :)

I think you might be operating under a misapprehension that these cyclic imports are a rare or unusual case, such that it's acceptable to introduce five or ten lines of extra code for each one where it should have otherwise been a simple annotation. This is not so -- these are very common. So "write a three-line wrapper function for every would-be cyclic import" is as much a non-starter as "introduce a new third module for every cyclic import case."

Also, I don't think any of the current type checkers are able to handle such annotations, so this option requires updates to every type checker.

Again, I think you're pushing in the right semantic direction (lazy imports), but it needs to be much more transparent and less syntactically burdensome to be a practical option.

@larryhastings
Owner

Oh, right, yeah. Sorry. I forget static type checkers are so finicky.

@carljm

carljm commented Apr 6, 2021

I agree with @JelleZijlstra that it's ok to move forward without solving this, not because the cyclic import case isn't common (for us at least it's quite common), but because the "cyclic import plus need to resolve annotations at runtime" case isn't common. And most important, that case already doesn't work today, since get_type_hints already requires the name to be in the module at runtime.

@larryhastings
Owner

What about a mock object replacing expensive_module?

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from expensive_module import SomeType
else:
    class AnythingMock:
        def __init__(self, name):
            self.___name = name

        def __getattr__(self, name):
            child_name = f"{self.___name}.{name}"
            value = AnythingMock(child_name)
            setattr(self, name, value)
            return value

        def __repr__(self):
            return f"<AnythingMock {self.___name}>"

    expensive_module = AnythingMock("expensive_module")
    SomeType = expensive_module.SomeType

def f(x: SomeType) -> None: ...

print(f.__annotations__)

@larryhastings
Owner

Here's a slightly more elaborate version that supports SomeType | int, SomeType[int], etc.

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from expensive_module import SomeType
else:
    class AnythingMock:
        def __init__(self, name):
            self.___name = name
            self.___items = {}

        def __getattr__(self, name):
            child_name = f"{self.___name}.{name}"
            value = AnythingMock(child_name)
            setattr(self, name, value)
            return value

        def __getitem__(self, name):
            name = str(name)
            if name not in self.___items:
                child_name = f"{self.___name}[{name}]"
                value = AnythingMock(child_name)
                self.___items[name] = value
            return self.___items[name]

        def __or__(self, other):
            return AnythingMock(f"{self!r}|{other!s}")

        def __repr__(self):
            return f"<AnythingMock {self.___name}>"

        def __str__(self):
            return f"<AnythingMock {self.___name}>"

    expensive_module = AnythingMock("expensive_module")
    SomeType = expensive_module.SomeType


def f(w: SomeType, x: SomeType | int, y:SomeType[int], z:SomeType[int]|expensive_module.OtherType) -> None: ...

print(f.__annotations__)

@carljm

carljm commented Apr 6, 2021

I think that requiring an else that defines or imports an alternative for every name imported under if TYPE_CHECKING is still too much boilerplate. But I also think, as mentioned above, that we should be able to pull way back on what needs to be done in the short term for purposes of this PEP.

The status quo today with PEP 563 is that you can get __annotations__ off an object and be sure that won't raise, but get_type_hints on that object might raise if some things in its annotations are imported under if TYPE_CHECKING and therefore not actually in the module namespace at runtime. Anyone who wants more advanced capabilities can either access the raw __annotations__ themselves and get strings to work with, or pass augmented namespaces to get_type_hints().
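As an illustration of the "augmented namespaces" option, here is a small sketch. The resolve_with_extra helper and its mapping format are hypothetical; only the localns parameter of typing.get_type_hints() is real API. fractions.Fraction stands in for a type that was imported only under if TYPE_CHECKING:

```python
import importlib
from typing import get_type_hints


def f(x: "Fraction") -> None: ...


def resolve_with_extra(func, extra):
    """Resolve annotations, importing the names listed in extra on demand.

    extra maps an annotation name to a (module name, member name) pair.
    """
    namespace = {
        name: getattr(importlib.import_module(module), member)
        for name, (module, member) in extra.items()
    }
    return get_type_hints(func, localns=namespace)


hints = resolve_with_extra(f, {"Fraction": ("fractions", "Fraction")})
```

The caller has to know which names are missing and where they live, which is the information a guarded import already states but does not preserve at runtime.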

If we could arrange such that if you access __annotations__ and some of the annotations are not in the namespace, instead of an error you get some kind of Reference object for those annotations (that contains something like the AST for the annotation, or whatever representation is easiest to provide), that would leave things in at least as good a place as they are today. Accessing __annotations__ is still safe, and the relevant information about the annotation is preserved if someone wants to implement some other kind of resolution.

Failing that, I would go back to hand-stringifying annotations in this situation before choosing any other manual workaround that's been proposed in this thread so far.

@gvanrossum

gvanrossum commented Apr 6, 2021 via email

@carljm

carljm commented Apr 6, 2021

Yeah, the rub is what Reference contains and how costly it is to preserve and materialize...

@JelleZijlstra
Contributor Author

My original suggestion was to turn NameErrors in annotations into special objects, similar to the Reference objects that @carljm suggests above. A draft of this approach is implemented in #3, and it still means that we won't need AST stringification in CPython any more.

@gvanrossum

Oh, I see. Your PR catches NameErrors and replaces them with something that reports back the name. Yes, that should work.
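To make the idea concrete, here is a pure-Python approximation. It is not the implementation in #3: it evaluates a stringized annotation with eval() against a namespace that hands back a placeholder for any unknown name. Reference, _LenientNamespace, and evaluate_leniently are hypothetical names. The placeholder supports the operations listed at the top of the thread (subscripting, |, attribute access, and appearing as a subscript argument):

```python
import builtins


class Reference:
    """Hypothetical placeholder that remembers the expression it stands for."""

    def __init__(self, expr):
        self.expr = expr

    def __getattr__(self, name):
        # Refuse dunder lookups so protocol probes do not create References.
        if name.startswith("__"):
            raise AttributeError(name)
        return Reference(f"{self.expr}.{name}")

    def __getitem__(self, item):
        return Reference(f"{self.expr}[{_show(item)}]")

    def __or__(self, other):
        return Reference(f"{self.expr} | {_show(other)}")

    def __ror__(self, other):
        return Reference(f"{_show(other)} | {self.expr}")

    def __repr__(self):
        return f"Reference({self.expr!r})"


def _show(obj):
    """Render an operand roughly the way it was written in the annotation."""
    if isinstance(obj, Reference):
        return obj.expr
    if isinstance(obj, tuple):
        return ", ".join(_show(part) for part in obj)
    return obj.__name__ if type(obj) is type else repr(obj)


class _LenientNamespace(dict):
    def __missing__(self, name):
        # Builtins such as int and list must keep resolving normally.
        if hasattr(builtins, name):
            return getattr(builtins, name)
        return Reference(name)


def evaluate_leniently(annotation, namespace):
    """Evaluate an annotation string, substituting References for unknown names."""
    return eval(annotation, {}, _LenientNamespace(namespace))


ref = evaluate_leniently("SomeType[int] | None", {})
```

Here ref is a Reference whose expr is "SomeType[int] | None", so nothing is lost relative to the string form, while names that do exist resolve to real objects.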

@larryhastings
Owner

I think that requiring an else that defines or imports an alternative for every name imported under if TYPE_CHECKING is still too much boilerplate.

So inject the mock objects into sys.modules and let the actual import statements do the work of creating the names.

from typing import TYPE_CHECKING

if not TYPE_CHECKING:  # install the stub only at runtime; type checkers see the real module
    class AnythingMock:
        def __init__(self, name):
            self.___name = name
            self.___items = {}

        def __getattr__(self, name):
            child_name = f"{self.___name}.{name}"
            value = AnythingMock(child_name)
            setattr(self, name, value)
            return value

        def __getitem__(self, name):
            name = str(name)
            if name not in self.___items:
                child_name = f"{self.___name}[{name}]"
                value = AnythingMock(child_name)
                self.___items[name] = value
            return self.___items[name]

        def __or__(self, other):
            return AnythingMock(f"{self!r}|{other!s}")

        def __repr__(self):
            return f"<AnythingMock {self.___name}>"

        def __str__(self):
            return f"<AnythingMock {self.___name}>"

    expensive_module = AnythingMock("expensive_module")
    import sys
    sys.modules['expensive_module'] = expensive_module

from expensive_module import SomeType


def f(w: SomeType, x: SomeType | int, y:SomeType[int], z:SomeType[int]|expensive_module.OtherType) -> None: ...

print(f.__annotations__)

If you moved this injection code into a central module, let's call it no_type_checking.py, you could search-and-replace every if TYPE_CHECKING: in your code base and mechanically convert it as follows:

  1. replace the if TYPE_CHECKING: line with import no_type_checking, and
  2. outdent the import lines that used to be in the if TYPE_CHECKING: block.

Now you only have to create the mock modules--not every imported object--and only in one place. And if you forget to add one, your code still works, it just imports the original expensive module.

My original suggestion was to turn NameErrors in annotations into special objects, similar to the Reference objects that @carljm suggests above.

While "delayed annotations using descriptors that magically abolish NameErrors" is definitely preferable in my mind to "annotations are automatically turned into strings", I still hold out hope we can find another less-magical approach.

@carljm

carljm commented Apr 8, 2021

I don't see "stuffing sys.modules with stub replacements as an import side effect" as either workable for real use, or as less magical than the NameError replacement approach.

The cycle problem does not impact one specific "expensive module" that one can choose in advance to stub out. It could impact any module in the code base, and a different set of modules over time.

Also, who is then responsible for replacing this stub with the real module? Import won't do it, it'll find sys.modules already populated and happily return the stub.
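The behaviour in question is easy to check. In this sketch the module name hypothetical_expensive_module is made up, and a bare types.ModuleType stands in for the AnythingMock stub:

```python
import importlib
import sys
import types

MODULE_NAME = "hypothetical_expensive_module"

# Pre-populate sys.modules with a stand-in, as the stub approach would.
stub = types.ModuleType(MODULE_NAME)
sys.modules[MODULE_NAME] = stub

try:
    # The import system consults sys.modules first and returns the stub;
    # it never searches for a real module of that name.
    imported = importlib.import_module(MODULE_NAME)
finally:
    # Clean up so the stand-in does not leak into the rest of the process.
    del sys.modules[MODULE_NAME]
```

After this runs, imported is the very same object as stub, so any later code path that genuinely needs the real module would have to remove the entry and re-import explicitly.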

I'm afraid I can't see any possibility of this stub approach being an acceptable solution for prod use.

@larryhastings
Owner

I don't see "stuffing sys.modules with stub replacements as an import side effect" as either workable for real use, or as less magical than the NameError replacement approach.

It is obviously less magical than the proposed "swallow NameError and replace with a new Reference object" approach, because it doesn't require changing the language. You could do it today in Python 3.9.

And, if it works, surely it is by definition workable?

The cycle problem does not impact one specific "expensive module" that one can choose in advance to stub out. It could impact any module in the code base, and a different set of modules over time.

Right, but the set of modules is known at compile-time. Obviously it's known, because it's hard-coded in the source tree. It's the set of modules that are supposed to go in the if TYPE_CHECKING: block.

So you stub out that set of modules in my proposed no_type_checking.py. And as the list of modules changes over time, you modify that set of modules stubbed out in no_type_checking.py.

It's actually an improvement over the current approach in that respect, where that set of imports is moved into / out of an if TYPE_CHECKING: block in every individual file. If you miss moving a file into the TYPE_CHECKING block you still take the speed hit. With no_type_checking.py the list of stubbed-out modules lives in exactly one place, which makes maintenance easier.

You could even have some error-checking in there. If somebody imported one of the modules that's supposed to get stubbed-out before no_type_checking.py ran, it could notice and report the error.

Also, who is then responsible for replacing this stub with the real module? Import won't do it, it'll find sys.modules already populated and happily return the stub.

You can't replace the stub with the real module. But I thought the whole goal was to avoid importing the module at runtime. The problem I was trying to solve was "we need to not import this module in production, because it's expensive, but we need our code to still run, and we want the annotations to minimally work because sometimes people look at them".

I'm afraid I can't see any possibility of this stub approach being an acceptable solution for prod use.

If the problem is "we have circular imports," you're right, this won't solve your problem. Circular imports are a pretty gnarly problem in Python and so far nobody has come up with any good solutions. And type hints seem to make them worse.

But the initial post in this issue is about "we don't want to import expensive_module at runtime". So far I remain convinced that my proposed workaround with injecting a stub into sys.modules would do a dandy job solving that problem. I genuinely don't understand why you're so negative about it.

Perhaps it would be best if we moved the discussion about circular imports into a second issue on this repo. Attempting to solve two different problems ("import expensive_module" vs "circular imports") in one conversation is not lending clarity to either discussion.

@larryhastings
Owner

larryhastings commented Apr 8, 2021

Y'know, I just tried it, and a simple example of circular dependencies and circular imports worked first try. My code:

a.py:

from __future__ import co_annotations

import b

class A:
    def method(self, b:b.B): pass

b.py:

from __future__ import co_annotations

import a

class B:
    def method(self, a:a.A): pass

c.py:

from __future__ import co_annotations

import a
import b

print(a.A.method)
print(b.B.method)

print(a.A.method.__annotations__)
print(b.B.method.__annotations__)

I ran ./python c.py in my co_annotations directory, and it printed the methods and their annotations without complaint. It printed:

<function A.method at 0x7f34afcfb180>
<function B.method at 0x7f34afcfb0e0>
{'b': <class 'b.B'>}
{'a': <class 'a.A'>}

I'm prepared to believe that there are gnarlier circular dependencies / circular import problems where PEP 649 falls down. But at a minimum, it sure seems to work correctly with this simple example.

Can someone give me an example where they can get a NameError from evaluating an annotation, using a circular dependency / circular import problem, when running against the current repo?

@carljm

carljm commented Apr 8, 2021

Hi Larry!

You can't replace the stub with the real module. But I thought the whole goal was to avoid importing the module at runtime. The problem I was trying to solve was "we need to not import this module in production, because it's expensive, but we need our code to still run, and we want the annotations to minimally work because sometimes people look at them".

I think we've successfully uncovered at least one miscommunication! As far as I'm aware, this problem statement does not describe a problem that anyone has. If you never want a module imported at runtime, why would you have the module around and accessible to import at all? And why would you have functions annotated to take/return instances of classes from that module, if at runtime such instances could not exist?

The problem described above that most closely resembles this is when I said "In a very large system (or even a smaller CLI tool) you may have code paths that are unlikely to be followed in a given process, but may be. There is value in having an expensive import required for that code path not occur immediately at process startup (delaying initial responsiveness of the entire system.)" But you'll note that this describes a use case where the "expensive module" is still needed at runtime, just not needed in all (or the most common) execution paths.

And this miscommunication leads to another one:

Right, but the set of modules is known at compile-time. Obviously it's known, because it's hard-coded in the source tree. It's the set of modules that are supposed to go in the if TYPE_CHECKING: block.

The "set of modules that are supposed to go in the TYPE_CHECKING block" is not a single universal set. Depending on the needs of the runtime code in each module, or on import cycle factors, a different set of imports belongs in if TYPE_CHECKING in different modules.

Perhaps it would be best if we moved the discussion about circular imports into a second issue on this repo.

I don't think so. Once we resolve the misunderstandings above, there is only one real problem here; import cycles and import expense are just two symptoms of it.

It would be best to entirely set aside the "an expensive module" framing of the problem; as @JelleZijlstra mentioned early in this thread, that was an overly-simplified and non-representative initial framing. A better core statement of the problem is something like this:

"Type annotations tend to greatly increase the import dependency fanout of a typical module. Increasing the import dependency fanout when there is no runtime need for the increased fanout is a bad thing, because it leads to many more import cycles and it unnecessarily front-loads import expense (even if there is no singular "expensive module" but rather just many many modules in the transitive dependency chain which in aggregate are expensive to import)."

Circular imports are a pretty gnarly problem in Python and so far nobody has come up with any good solutions.

In general this is true. But for the problem that "type annotations make import cycles a lot more common," there is already a really excellent solution (from the end-user perspective), which is already in wide use: PEP 563 and if TYPE_CHECKING are a very effective solution with downsides that are in practice quite manageable (indenting imports under if TYPE_CHECKING is ugly, and you have to use typing.get_type_hints() to runtime-reify your annotations). The goal here is not to regress from that good-enough existing solution :)

I genuinely don't understand why you're so negative about it.

I hope the above has clarified that the reason I was negative about it is that it doesn't work to solve any problem that I have :)

a simple example of circular dependencies and circular imports worked first try

Yes, this is generally how cyclic imports work in Python. If you import the entire module (rather than from ... import individual objects from it), and your module doesn't depend on the contents of the other module in order to execute the module body at import time, then you can get away with a cycle! If you change your example to use from a import A and/or from b import B, I think you'll see the cycle at runtime, and if you fix the runtime cycle by guarding an import with if TYPE_CHECKING, I think you'll see the NameError on annotations.
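The three situations can be reproduced with throwaway modules. This sketch writes two files named cyc_a.py and cyc_b.py (hypothetical names) into a temporary directory and imports the first. It uses ordinary quoted annotations rather than the co_annotations prototype, so it shows only the import behaviour, not the NameError on evaluating annotations:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_pair(a_source, b_source):
    """Write two throwaway modules, import the first, and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "cyc_a.py").write_text(a_source)
        Path(tmp, "cyc_b.py").write_text(b_source)
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("cyc_a")
            return "ok"
        except ImportError:
            return "ImportError"
        finally:
            # Undo every side effect so the next call starts clean.
            sys.path.remove(tmp)
            for name in ("cyc_a", "cyc_b"):
                sys.modules.pop(name, None)
            importlib.invalidate_caches()


B_FROM_IMPORT = 'from cyc_a import A\n\nclass B:\n    def method(self, a: "A"): pass\n'

outcomes = {
    # Whole-module imports tolerate the cycle.
    "whole module": import_pair(
        'import cyc_b\n\nclass A:\n    def method(self, b: "cyc_b.B"): pass\n',
        'import cyc_a\n\nclass B:\n    def method(self, a: "cyc_a.A"): pass\n',
    ),
    # from-imports on both sides need names that do not exist yet.
    "from import": import_pair(
        'from cyc_b import B\n\nclass A:\n    def method(self, b: "B"): pass\n',
        B_FROM_IMPORT,
    ),
    # Guarding one side with TYPE_CHECKING removes the runtime cycle.
    "guarded": import_pair(
        "from typing import TYPE_CHECKING\n"
        "if TYPE_CHECKING:\n"
        "    from cyc_b import B\n\n"
        'class A:\n    def method(self, b: "B"): pass\n',
        B_FROM_IMPORT,
    ),
}
```

In the guarded case the import succeeds, but B is no longer a name in cyc_a's namespace, which is exactly where the annotation lookup goes wrong.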

You might reasonably object that in that case we should just use the form of imports that allows cycles to work! And I would agree in general, but in more complex cases with deeply nested submodules this isn't ergonomic, and as discussed above it's still not desirable to add more spurious runtime dependencies to your modules just for type-checking purposes.
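The difference between the import styles can be reproduced with a small, self-contained script. This is only a sketch: the module names `cyc_a` and `cyc_b` and the helper `import_pair` are invented for illustration, and the annotations are written as strings to mimic PEP 563.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def import_pair(a_source: str, b_source: str) -> str:
    """Write two throwaway modules, import the first, and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "cyc_a.py").write_text(textwrap.dedent(a_source))
        Path(tmp, "cyc_b.py").write_text(textwrap.dedent(b_source))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module("cyc_a")
            return "ok"
        except ImportError:
            return "ImportError"
        finally:
            # Leave no trace so the next variant starts from a clean state.
            sys.path.remove(tmp)
            sys.modules.pop("cyc_a", None)
            sys.modules.pop("cyc_b", None)


# Whole-module imports: neither module body needs the other's contents.
whole_module = import_pair(
    """
    import cyc_b
    class A:
        def method(self, b: "cyc_b.B"): pass
    """,
    """
    import cyc_a
    class B:
        def method(self, a: "cyc_a.A"): pass
    """,
)

# from-imports: cyc_b asks for cyc_a.A before cyc_a has defined it.
from_import = import_pair(
    """
    from cyc_b import B
    class A:
        def method(self, b: "B"): pass
    """,
    """
    from cyc_a import A
    class B:
        def method(self, a: "A"): pass
    """,
)

# Guarding one side with TYPE_CHECKING removes the runtime cycle.
guarded = import_pair(
    """
    from cyc_b import B
    class A:
        def method(self, b: "B"): pass
    """,
    """
    from typing import TYPE_CHECKING
    if TYPE_CHECKING:
        from cyc_a import A
    class B:
        def method(self, a: "A"): pass
    """,
)

print(whole_module, from_import, guarded)
```

In the guarded variant the name `A` never exists in `cyc_b` at runtime, which is exactly the situation where eagerly resolving the annotation would fail.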

It is obviously less magical... because it doesn't require changing the language

Bit of a digression maybe; "magical" is not a well-specified term. I only pursue the digression because it might help clarify why we find different approaches more or less objectionable. IMO "requires changing the language" is a pretty arbitrary definition of what constitutes "magical" (in a negative use of the term), and I don't think it's a particularly apt definition. I would say something along the lines of "leads to unexpected behavior" or "likely to leak as an abstraction and cause unintended problems" are better definitions, and these are mostly orthogonal to "requires changes to the language." Python as it exists today has plenty of scope for "magic" already (for better or worse), and some changes to the language may allow for less "magical" solutions to some problems.

@JacobHayes

JacobHayes commented Apr 23, 2021

Bike shedding some syntax and semantics here, perhaps we could introduce a deferred modifier to the import statement (deferred import x or deferred from x import Y) that would emit ForwardRef-like values tracking the module + optional member. Unlike ForwardRef though, they would "resolve-on-access", just like this PEP's __annotations__. Naively, it seems like the same kind of data descriptor pattern we are using for module level type hints would work for imported values (one descriptor per imported name?).

Within an annotation, these values (and member lookups) would be ignored (eg: "stringized" like PEP 563 or deferred as here/PEP 649). Accessing those values at runtime would trigger the import to be "resolved" - it doesn't magically solve circular imports. Explicitly: runtime annotation inspection (get_type_hints, the proposed inspect.annotations, etc) would also resolve the references (and potentially cause an import cycle, instead of a NameError). A flag to maintain the forward reference would be nice to have, but might be tricky if the implementation is a descriptor on the module.

I'm not very familiar with syntax parsing, but given import and from are already keywords w/ limited use, I wonder if we could add a modifier unambiguously and "conflict" free. ie: there are no names preceding the existing import and from keywords (except for yield from).

Outcome:

  • Replace if TYPE_CHECKING with deferred imports (simple cases might avoid import typing entirely)
  • Static type checking crowd maintains faster imports and avoids import cycles
  • Runtime annotations/hinting crowd gets a reference to otherwise hidden imports
    • As long as annotations aren't evaluated until after the cycle is broken, all is well
    • If annotations are evaluated eagerly (eg: @decorator, __init_subclass__, etc), they may now get an import cycle instead of NameError, barring a flag or similar to maintain the forward reference.
  • Questionably too "magical"

--

It might look something like this:

a.py:

from __future__ import co_annotations

deferred from b import B # -> `B = Ref("b.B")`

class A:
    def method(self, b: B): pass

b.py:

from __future__ import co_annotations

deferred import a # -> `a = Ref("a")`

class B:
    def method(self, a: a.A): pass

edit:

Ah, for some reason I thought descriptors did work on modules, but that appears not to be the case. 😁 A wrapt or lazy-object-proxy approach/prototype might have the right semantics to toy with at the least.

`lazy_object_proxy` example
from importlib import import_module

from lazy_object_proxy import Proxy


def ref(name, ismember=False):
    def resolve():
        if ismember:
            module_name, _, member_name = name.rpartition(".")
            return getattr(import_module(module_name), member_name)
        return import_module(name)

    return Proxy(resolve)


path = ref("os.path")
Path = ref("pathlib.Path", ismember=True)

print("marker 1")

print(path)
print(path.join("/", "/home"))

print("marker 2")

print(Path)
print(Path("/") / "home")

# marker 1
# <module 'posixpath' from '/Users/jacobhayes/.pyenv/versions/3.9.0/lib/python3.9/posixpath.py'>
# /home
# marker 2
# <class 'pathlib.Path'>
# /home

With __class_getitem__, Generic+TypeVar+Literal, and a tiny bit of cast or mypy plugin magic (take Literal and lookup original type), it could probably pass type check and runtime pretty reasonably.
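For whole-module references (not individual members), the standard library already offers a similar resolve-on-access behaviour through `importlib.util.LazyLoader`. The sketch below follows the recipe in the `importlib` documentation; it is not the `deferred` syntax proposed above, and `lazy_import` is just a helper name chosen here.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # Sets up the lazy module; the module body is not executed yet.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The first attribute access triggers the real import.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

Like the proxy approach, this only postpones the import; touching the module while a cycle is still unresolved fails just as an eager import would.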

@carljm

carljm commented May 14, 2021

@JacobHayes Yup, I think opt-in deferred imports are a good direction for solving this problem (as well as providing a nicer general solution for cyclic imports than Python has ever previously had.) I think for performance reasons the implementation should probably be native in the runtime rather than implemented in pure Python, and ideally totally transparent to language semantics (other than changing when import side effects occur). A co-worker has already been playing around with an implementation of this, there are some tricky issues but it looks promising.

@carljm

carljm commented Apr 8, 2022

I think the idea described at #2 (comment) also provides a workable approach to this problem.

@ncoghlan

ncoghlan commented Oct 26, 2022

With the release of Python 3.11, browsing the What's New had me re-reading Łukasz's article at https://lukasz.langa.pl/61df599c-d9d8-4938-868b-36b67fdb4448/.

That article got me thinking again along the lines of @JelleZijlstra's idea of making name lookup in annotations inherently lazy so NameError turned into a typing forward reference at evaluation time rather than being raised as an exception.

As @carljm noted in the comments on #2, that doesn't technically require compiler changes, it just requires the use of a non-standard globals dictionary when doing the evaluation. Writing such a replacement globals isn't entirely trivial (if you want to avoid implicitly wrapping all builtin references in ForwardRef and avoid copying the globals dict and allow assignment expressions to write back to the original globals dict), but it's straightforward to write one that's sufficient to illustrate the concept (the only thing this sketch doesn't allow is writes back to the original globals(), and PEP 649 disallows use of the walrus operator during annotation evaluation anyway):

>>> import builtins
>>> from typing import ForwardRef
>>> class ImplicitForwardRef(dict):
...     def __init__(self, ns):
...         self._ns = ns
...     def __missing__(self, key):
...         try:
...             return self._ns[key]
...         except KeyError:
...             try:
...                 return getattr(builtins, key)
...             except AttributeError:
...                 pass
...         return ForwardRef(key)
...
>>> eval("(A, ImplicitForwardRef, int)", ImplicitForwardRef(globals()))
(ForwardRef('A'), <class '__main__.ImplicitForwardRef'>, <class 'int'>)

And then as Jelle noted in the original post, ForwardRef would need runtime enhancements to support typing related expressions, which would in turn require a LazyExpression runtime type (or separate ForwardGeneric and ForwardUnion types) to represent the operations that can't be fully resolved until the underlying forward reference is resolved.

@alson

alson commented Mar 14, 2023

I think we've successfully uncovered at least one miscommunication! As far as I'm aware, this problem statement does not describe a problem that anyone has. If you never want a module imported at runtime, why would you have the module around and accessible to import at all? And why would you have functions annotated to take/return instances of classes from that module, if at runtime such instances could not exist?

One use case I think has not been discussed much here is the use of type hints purely for the benefit of the IDE / linter, like for code completion. The AWS client library boto3 does not provide any type hints, so we use botostubs, which defines stub classes to use as type hints. But botostubs is a fairly big module that we do not want to include in our production build, so the current solution is declaring botostubs as a dev dependency only, and gating the import behind TYPE_CHECKING:

from __future__ import annotations
from typing import TYPE_CHECKING

import boto3

if TYPE_CHECKING:
    import botostubs

def get_thing_from_ssm(thing):
    ssm: botostubs.SSM = boto3.client('ssm')
    return ssm.get_parameter(Name=thing)

The benefit of botostubs is that the IDE will complete the methods on ssm and their parameters based on the type stubs.
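One detail relevant to this use case: annotations on local variables inside a function are never evaluated or stored at runtime, so a snippet like the one above runs without the stubs package, with or without `from __future__ import annotations`. A minimal sketch, where `missing_stubs` is a deliberately undefined, hypothetical name:

```python
def first_item(items):
    # `missing_stubs` is never imported or defined anywhere; Python does not
    # evaluate annotations on local variables, so this line is harmless.
    head: missing_stubs.Item = items[0]
    return head


print(first_item(["a", "b"]))        # prints: a
print(first_item.__annotations__)    # prints: {}
```

The guard is still needed for the `import botostubs` statement itself, and for any annotation that appears in a function signature, class body, or module scope.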

@ncoghlan

The accepted version of PEP 649 covers this.
