Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop TypedDictAnalyzer from leaking synthetic types #13732

Conversation

Michael0x2a
Copy link
Collaborator

Fixes #10007

Mypy currently crashes when you try:

  1. Creating a class-based TypedDict containing a malformed type hint
  2. Asking it to compute fine-grained dependencies, either via running dmypy or by setting the --cache-fine-grained flag.

Here is the exact sequence of events that leads to this crash:

  1. Since the type annotation is malformed, semanal initially gives the type annotation a type of RawExpressionType.

  2. TypedDictAnalyzer (correctly) determines determines that the type of the malformed type annotation should be treated as just Any in:

    analyzed = self.api.anal_type(

  3. TypedDictAnalyzer forgets to modify stmt.type like we normally do after calling self.anal_type in normal classes:

    analyzed = self.anal_type(s.type, allow_tuple_literal=allow_tuple_literal)

  4. Mypy does use the Any type when constructing the TypeInfo for the TypedDict. This is why mypy will not crash under most conditions: the correct type is being used in most places.

  5. Setting --cache-fine-grained will make mypy perform one final pass against the AST to compute fine-grained dependencies. As a part of this process, it traverses the AssigmentStatement's type field using TypeTriggersVisitor.

  6. TypeTriggersVisitor is not a SyntheticTypeVisitor. So, the visitor trips an assert when we try traversing into the RawExpressionType.

Interestingly, this same crash does not occur for NamedTuples despite the fact that NamedTupleAnalyzer also does not set stmt.type:

analyzed = self.api.anal_type(

It turns out this is because semanal.py will end up calling the analyze_class_body_common(...) function after NamedTupleAnalyzer runs, but not after TypedDictAnalyzer runs:

I'm not sure why this is: ideally, the two analyzers ought to have as similar semantics as possible. But refactoring this felt potentially disruptive, so I went for the narrower route of just patching TypedDictAnalyzer.

@github-actions

This comment has been minimized.

@ilevkivskyi
Copy link
Member

I'm not sure why this is: ideally, the two analyzers ought to have as similar semantics as possible.

We do this for named tuples since they can have methods. Ideally, we should not need to visit the body in normal way for typed dicts, as the semantics of assignment stmt is different. This solution is also OK however (at least in short term).

@Michael0x2a Michael0x2a force-pushed the fix-typeddict-semanal-analyzer-leaking-synthetic-types branch from 1690984 to 63c1080 Compare September 29, 2022 08:37
@github-actions

This comment has been minimized.

Fixes python#10007

Mypy currently crashes when you try:

1. Creating a class-based TypedDict containing a malformed type hint
2. Asking it to compute fine-grained dependencies, either via
   running dmypy or by setting the `--cache-fine-grained` flag.

Here is the exact sequence of events that leads to this crash:

1. Since the type annotation is malformed, semanal initially gives
   the type annotation a type of `RawExpressionType`.

2. TypedDictAnalyzer (correctly) determines determines that the
   type of the malformed type annotation should be treated as
   just `Any` in:
   https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal_typeddict.py#L289

3. TypedDictAnalyzer forgets to modify `stmt.type` like we normally
   do after calling `self.anal_type` in normal classes:
   https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L3022

4. Mypy _does_ use the `Any` type when constructing the TypeInfo
   for the TypedDict. This is why mypy will not crash under most
   conditions: the correct type is being used in most places.

5. Setting `--cache-fine-grained` will make mypy perform one final
   pass against the AST to compute fine-grained dependencies. As
   a part of this process, it traverses the AssigmentStatement's `type`
   field using TypeTriggersVisitor.

6. TypeTriggersVisitor is _not_ a SyntheticTypeVisitor. So, the
   visitor trips an assert when we try traversing into the
   RawExpressionType.

Interestingly, this same crash does not occur for NamedTuples
despite the fact that NamedTupleAnalyzer also does not set `stmt.type`:
https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal_namedtuple.py#L177

It turns out this is because semanal.py will end up calling the
`analyze_class_body_common(...)` function after NamedTupleAnalyzer
runs, but _not_ after TypedDictAnalyzer runs:

- https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L1510
- https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L1479

I'm not sure why this is: ideally, the two analyzers ought to have
as similar semantics as possible. But refactoring this felt potentially
disruptive, so I went for the narrower route of just patching TypedDictAnalyzer.
@Michael0x2a Michael0x2a force-pushed the fix-typeddict-semanal-analyzer-leaking-synthetic-types branch from 63c1080 to 1f9a1c0 Compare October 23, 2022 21:23
@github-actions

This comment has been minimized.

@ilevkivskyi
Copy link
Member

Probably we should print only e.g. last 100 lines in the deferral trace. Otherwise it is very noisy. Also FWIW I don't think this PR introduced a new bug, it probably just exposing some existing flaw. Still, it is probably better to fix it before merging, as crashes are always bad.

@ilevkivskyi
Copy link
Member

Actually, I think the crash may be caused by 0f4e0fb

@Michael0x2a You can try removing that .defer() and try locally. Also when you will find a simple repro, could please try if there is a similar crash on current master if you replace TypedDict with NamedTuple?

@github-actions

This comment has been minimized.

@Michael0x2a
Copy link
Collaborator Author

Michael0x2a commented Oct 24, 2022

@ilevkivskyi -- whoops, didn't see your comment until just now.

I just pushed a commit which tried throwing a has_placeholder(...) in front of the anal_type call, but it looks like that ended up just breaking the recursive typeddict tests instead so it's back to the drawing board for me.

I also included a minimal repro in my commit, in case you want to take a look -- I'm not confident I fully understand when/where we're supposed to be deferring and checking for placeholder types.

(The actual crash in numpy was due to a TypedDict in numpy.random containing a field of type ndarray[Any, dtype[uint64]] -- the dtype[uint64] apparently is a placeholder type on the first pass.)

I also tried commenting out my change and commenting out the defer call from 0f4e0fb, but I still got the crash. I also tried triggering the crash with NamedTuples, but wasn't able to repro interesting enough -- not sure why its behavior would differ from TypedDicts here either.

@github-actions

This comment has been minimized.

#
# TODO: Find some way of refactoring and partially unifying
# these two codepaths?
if not has_placeholder(analyzed):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #13516 might have been the root cause of the previous mypy_primer failures -- the previous version of this diff probably just never worked when we enabled recursive types, so it broke once I rebased pass the diff flipping the defaults.

I'm not really sure this is the correct fix though; it feels sort of ad-hoc.

I also need to figure out what's up with the bokeh error from mypy_primer -- I'm having a difficult time repro-ing those new failures locally.

@ilevkivskyi
Copy link
Member

Although the fix looks like a hack, it may be the best we can have now. Most likely the problem is that logic around placeholders in typeanal.py (in particular visit_placeholder_type(), btw IIRC all tests actually still pass if I remove it) never actually worked. This is also why on every iteration semantic analizer usually analyzes a type "from scratch" (i.e. starting from UnboundType), while your change actually makes it re-analyze an already analyzed type.

The above is just a guess I can look in more details later today or tomorrow.

@ilevkivskyi
Copy link
Member

ilevkivskyi commented Oct 24, 2022

Yes, essentially my guess was right. For example in typeanal.py:

def visit_instance(self, t: Instance) -> Type:
    return t

If you make it actually traverse the type args, the test passes. But I bet there are several similar scenarios that will cause a crash. So you still should have the if.

In general, I think the weird state where typeanal.py is, because some people have a mental model that it should be "fixed point" style analyzer (as e.g. semantic analyzer is). While in fact only visit_unbound_type() has well thought out logic covering various corner cases etc. I guess we may need to agree on how typeanal.py should work, and clean it up accordingly. cc @JukkaL

Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Oct 29, 2022
Fixes python#13066

During the semanal phase, mypy opts to ignore and skip processing
any malformed or illegal statements inside of a TypedDict class
definition, such as method definitions.

Skipping semanal analysis on these statements can cause any number
of odd downstream problems: the type-checking phase assumes that all
semanal-only semantic constructs (e.g. FakeInfo) have been purged
by this point, and so can crash at any point once this precondition has
been violated.

This diff opts to solve this problem by filtering down the list of
statements so we keep only the ones we know are legal within a TypedDict
definition.

The other possible solution to this problem is to modify mypy so we
skip checking TypedDict class bodies entirely during type checking and
fine-grained deps analysis. Doing this would also let address python#10007
and supersede my other diff python#13732.

I decided against doing this for now because:

1. I wasn't sure if this was actually safe, especially in the
   fine-grained deps phase and for mypyc.
2. I think no matter what, the semanal phase should not leak
   semanal-only types: relaxing this postcondition would make
   it harder to reason about mypy. So, we'd probably want to make this
   change regardless of what we do in the later phases.
JukkaL pushed a commit that referenced this pull request Oct 31, 2022
Fixes #13066

During the semanal phase, mypy opts to ignore and skip processing any
malformed or illegal statements inside of a TypedDict class definition,
such as method definitions.

Skipping semanal analysis on these statements can cause any number of
odd downstream problems: the type-checking phase assumes that all
semanal-only semantic constructs (e.g. FakeInfo) have been purged by
this point, and so can crash at any point once this precondition has
been violated.

This diff opts to solve this problem by filtering down the list of
statements so we keep only the ones we know are legal within a TypedDict
definition.

The other possible solution to this problem is to modify mypy so we skip
checking TypedDict class bodies entirely during type checking and
fine-grained deps analysis. Doing this would also let address #10007 and
supersede my other diff #13732.

I decided against doing this for now because:

1. I wasn't sure if this was actually safe, especially in the
fine-grained deps phase and for mypyc.
2. I think no matter what, the semanal phase should not leak
semanal-only types: relaxing this postcondition would make it harder to
reason about mypy. So, we'd probably want to make this change regardless
of what we do in the later phases.
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, I think this is a good solution for now, no need to wait for a long-term plan for typeanal.py.

@ilevkivskyi
Copy link
Member

@Michael0x2a It looks like the crash still happens on master, I would propose to merge this (assuming the tests pass after my merge).

@github-actions

This comment has been minimized.

@ilevkivskyi
Copy link
Member

Hm, @Michael0x2a it looks like there is actually a bug in this PR, see mypy_primer result for discord. NotRequired (and Required) are stripped when analyzing type, and then we put the stripped type back as stmt.type. So if there is more than one semantic analysis iteration (some forward ref e.g.) this information will be lost when we will again re-analyze stmt.type.

Could you please take a look?

@Michael0x2a
Copy link
Collaborator Author

Yeah, will do later this week.

I was stumped earlier trying to figure out how to repro the issues w/ Bokeh -- hopefully the latest discord error will give me something more useful to work with.

@ilevkivskyi
Copy link
Member

@Michael0x2a Ah, the thing with Bokeh is not a bug in your PR, it is a bug(?) in --disallow-any-unimported. For example, if you have something like

import bazz # type: ignore

A = list[bazz.Bazz]

x: A
y: A
z: A

mypy will show errors on all three variables, instead of once for the type alias target. Maybe this was intentional, but most likely it was just overlooked. If you want to fix it, it is a simple fix, add self.msg.unimported_type_becomes_any("Type alias target component", res, s) and then transform Anys to special_form (to prevent more errors) in check_and_set_up_type_alias().

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2022

Diff from mypy_primer, showing the effect of this PR on open source code:

discord.py (https://github.com/Rapptz/discord.py)
- discord/components.py:282: error: Incompatible types (expression has type "Union[int, int, int, int, int, int, int, int]", TypedDict item "type" has type "Literal[3, 5, 6, 7, 8]")  [typeddict-item]
+ discord/components.py:281: error: Missing key "options" for TypedDict "SelectMenu"  [typeddict-item]

bokeh (https://github.com/bokeh/bokeh)
+ src/bokeh/document/json.py: note: In class "ColumnDataChanged":
+ src/bokeh/document/json.py:86:5: error: Type of variable becomes "Dict[str, Union[Sequence[Any], ndarray[Any, dtype[Any]], Any, Any]]" due to an unfollowed import  [no-any-unimported]
+ src/bokeh/document/json.py: note: In class "ColumnsStreamed":
+ src/bokeh/document/json.py:93:5: error: Type of variable becomes "Dict[str, Union[Sequence[Any], ndarray[Any, dtype[Any]], Any, Any]]" due to an unfollowed import  [no-any-unimported]

@ilevkivskyi
Copy link
Member

This is now superseded by #17495

@ilevkivskyi ilevkivskyi closed this Jul 6, 2024
ilevkivskyi added a commit that referenced this pull request Jul 6, 2024
Fixes #10007
Fixes #17477

This fixes the crash as proposed in
#13732, but also fixes some
inconsistencies in `Any` types exposed by the fix.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
JukkaL pushed a commit that referenced this pull request Jul 18, 2024
Fixes #10007
Fixes #17477

This fixes the crash as proposed in
#13732, but also fixes some
inconsistencies in `Any` types exposed by the fix.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dmypy crash with “Invalid type comment or annotation”
3 participants