-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop TypedDictAnalyzer from leaking synthetic types #13732
Stop TypedDictAnalyzer from leaking synthetic types #13732
Conversation
This comment has been minimized.
This comment has been minimized.
We do this for named tuples since they can have methods. Ideally, we should not need to visit the body in normal way for typed dicts, as the semantics of assignment stmt is different. This solution is also OK however (at least in short term). |
1690984
to
63c1080
Compare
This comment has been minimized.
This comment has been minimized.
Fixes python#10007 Mypy currently crashes when you try: 1. Creating a class-based TypedDict containing a malformed type hint 2. Asking it to compute fine-grained dependencies, either via running dmypy or by setting the `--cache-fine-grained` flag. Here is the exact sequence of events that leads to this crash: 1. Since the type annotation is malformed, semanal initially gives the type annotation a type of `RawExpressionType`. 2. TypedDictAnalyzer (correctly) determines determines that the type of the malformed type annotation should be treated as just `Any` in: https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal_typeddict.py#L289 3. TypedDictAnalyzer forgets to modify `stmt.type` like we normally do after calling `self.anal_type` in normal classes: https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L3022 4. Mypy _does_ use the `Any` type when constructing the TypeInfo for the TypedDict. This is why mypy will not crash under most conditions: the correct type is being used in most places. 5. Setting `--cache-fine-grained` will make mypy perform one final pass against the AST to compute fine-grained dependencies. As a part of this process, it traverses the AssigmentStatement's `type` field using TypeTriggersVisitor. 6. TypeTriggersVisitor is _not_ a SyntheticTypeVisitor. So, the visitor trips an assert when we try traversing into the RawExpressionType. Interestingly, this same crash does not occur for NamedTuples despite the fact that NamedTupleAnalyzer also does not set `stmt.type`: https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal_namedtuple.py#L177 It turns out this is because semanal.py will end up calling the `analyze_class_body_common(...)` function after NamedTupleAnalyzer runs, but _not_ after TypedDictAnalyzer runs: - https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L1510 - https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L1479 I'm not sure why this is: ideally, the two analyzers ought to have as similar semantics as possible. But refactoring this felt potentially disruptive, so I went for the narrower route of just patching TypedDictAnalyzer.
63c1080
to
1f9a1c0
Compare
This comment has been minimized.
This comment has been minimized.
Probably we should print only e.g. last 100 lines in the deferral trace. Otherwise it is very noisy. Also FWIW I don't think this PR introduced a new bug, it probably just exposing some existing flaw. Still, it is probably better to fix it before merging, as crashes are always bad. |
Actually, I think the crash may be caused by 0f4e0fb @Michael0x2a You can try removing that |
This comment has been minimized.
This comment has been minimized.
@ilevkivskyi -- whoops, didn't see your comment until just now. I just pushed a commit which tried throwing a I also included a minimal repro in my commit, in case you want to take a look -- I'm not confident I fully understand when/where we're supposed to be deferring and checking for placeholder types. (The actual crash in numpy was due to a TypedDict in numpy.random containing a field of type I also tried commenting out my change and commenting out the defer call from 0f4e0fb, but I still got the crash. I also tried triggering the crash with NamedTuples, but wasn't able to repro interesting enough -- not sure why its behavior would differ from TypedDicts here either. |
This comment has been minimized.
This comment has been minimized.
# | ||
# TODO: Find some way of refactoring and partially unifying | ||
# these two codepaths? | ||
if not has_placeholder(analyzed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think #13516 might have been the root cause of the previous mypy_primer failures -- the previous version of this diff probably just never worked when we enabled recursive types, so it broke once I rebased pass the diff flipping the defaults.
I'm not really sure this is the correct fix though; it feels sort of ad-hoc.
I also need to figure out what's up with the bokeh error from mypy_primer -- I'm having a difficult time repro-ing those new failures locally.
Although the fix looks like a hack, it may be the best we can have now. Most likely the problem is that logic around placeholders in The above is just a guess I can look in more details later today or tomorrow. |
Yes, essentially my guess was right. For example in def visit_instance(self, t: Instance) -> Type:
return t If you make it actually traverse the type args, the test passes. But I bet there are several similar scenarios that will cause a crash. So you still should have the In general, I think the weird state where |
Fixes python#13066 During the semanal phase, mypy opts to ignore and skip processing any malformed or illegal statements inside of a TypedDict class definition, such as method definitions. Skipping semanal analysis on these statements can cause any number of odd downstream problems: the type-checking phase assumes that all semanal-only semantic constructs (e.g. FakeInfo) have been purged by this point, and so can crash at any point once this precondition has been violated. This diff opts to solve this problem by filtering down the list of statements so we keep only the ones we know are legal within a TypedDict definition. The other possible solution to this problem is to modify mypy so we skip checking TypedDict class bodies entirely during type checking and fine-grained deps analysis. Doing this would also let address python#10007 and supersede my other diff python#13732. I decided against doing this for now because: 1. I wasn't sure if this was actually safe, especially in the fine-grained deps phase and for mypyc. 2. I think no matter what, the semanal phase should not leak semanal-only types: relaxing this postcondition would make it harder to reason about mypy. So, we'd probably want to make this change regardless of what we do in the later phases.
Fixes #13066 During the semanal phase, mypy opts to ignore and skip processing any malformed or illegal statements inside of a TypedDict class definition, such as method definitions. Skipping semanal analysis on these statements can cause any number of odd downstream problems: the type-checking phase assumes that all semanal-only semantic constructs (e.g. FakeInfo) have been purged by this point, and so can crash at any point once this precondition has been violated. This diff opts to solve this problem by filtering down the list of statements so we keep only the ones we know are legal within a TypedDict definition. The other possible solution to this problem is to modify mypy so we skip checking TypedDict class bodies entirely during type checking and fine-grained deps analysis. Doing this would also let address #10007 and supersede my other diff #13732. I decided against doing this for now because: 1. I wasn't sure if this was actually safe, especially in the fine-grained deps phase and for mypyc. 2. I think no matter what, the semanal phase should not leak semanal-only types: relaxing this postcondition would make it harder to reason about mypy. So, we'd probably want to make this change regardless of what we do in the later phases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, I think this is a good solution for now, no need to wait for a long-term plan for typeanal.py
.
@Michael0x2a It looks like the crash still happens on master, I would propose to merge this (assuming the tests pass after my merge). |
This comment has been minimized.
This comment has been minimized.
Hm, @Michael0x2a it looks like there is actually a bug in this PR, see Could you please take a look? |
Yeah, will do later this week. I was stumped earlier trying to figure out how to repro the issues w/ Bokeh -- hopefully the latest discord error will give me something more useful to work with. |
@Michael0x2a Ah, the thing with Bokeh is not a bug in your PR, it is a bug(?) in import bazz # type: ignore
A = list[bazz.Bazz]
x: A
y: A
z: A mypy will show errors on all three variables, instead of once for the type alias target. Maybe this was intentional, but most likely it was just overlooked. If you want to fix it, it is a simple fix, add |
Diff from mypy_primer, showing the effect of this PR on open source code: discord.py (https://github.com/Rapptz/discord.py)
- discord/components.py:282: error: Incompatible types (expression has type "Union[int, int, int, int, int, int, int, int]", TypedDict item "type" has type "Literal[3, 5, 6, 7, 8]") [typeddict-item]
+ discord/components.py:281: error: Missing key "options" for TypedDict "SelectMenu" [typeddict-item]
bokeh (https://github.com/bokeh/bokeh)
+ src/bokeh/document/json.py: note: In class "ColumnDataChanged":
+ src/bokeh/document/json.py:86:5: error: Type of variable becomes "Dict[str, Union[Sequence[Any], ndarray[Any, dtype[Any]], Any, Any]]" due to an unfollowed import [no-any-unimported]
+ src/bokeh/document/json.py: note: In class "ColumnsStreamed":
+ src/bokeh/document/json.py:93:5: error: Type of variable becomes "Dict[str, Union[Sequence[Any], ndarray[Any, dtype[Any]], Any, Any]]" due to an unfollowed import [no-any-unimported]
|
This is now superseded by #17495 |
Fixes #10007
Mypy currently crashes when you try:
--cache-fine-grained
flag.Here is the exact sequence of events that leads to this crash:
Since the type annotation is malformed, semanal initially gives the type annotation a type of
RawExpressionType
.TypedDictAnalyzer (correctly) determines determines that the type of the malformed type annotation should be treated as just
Any
in:mypy/mypy/semanal_typeddict.py
Line 289 in f5ce4ee
TypedDictAnalyzer forgets to modify
stmt.type
like we normally do after callingself.anal_type
in normal classes:mypy/mypy/semanal.py
Line 3022 in f5ce4ee
Mypy does use the
Any
type when constructing the TypeInfo for the TypedDict. This is why mypy will not crash under most conditions: the correct type is being used in most places.Setting
--cache-fine-grained
will make mypy perform one final pass against the AST to compute fine-grained dependencies. As a part of this process, it traverses the AssigmentStatement'stype
field using TypeTriggersVisitor.TypeTriggersVisitor is not a SyntheticTypeVisitor. So, the visitor trips an assert when we try traversing into the RawExpressionType.
Interestingly, this same crash does not occur for NamedTuples despite the fact that NamedTupleAnalyzer also does not set
stmt.type
:mypy/mypy/semanal_namedtuple.py
Line 177 in f5ce4ee
It turns out this is because semanal.py will end up calling the
analyze_class_body_common(...)
function after NamedTupleAnalyzer runs, but not after TypedDictAnalyzer runs:mypy/mypy/semanal.py
Line 1510 in f5ce4ee
mypy/mypy/semanal.py
Line 1479 in f5ce4ee
I'm not sure why this is: ideally, the two analyzers ought to have as similar semantics as possible. But refactoring this felt potentially disruptive, so I went for the narrower route of just patching TypedDictAnalyzer.