Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider field value types when disambiguating a union of TypedDicts #11199

Closed
wants to merge 2 commits into from

Conversation

bluetech
Copy link
Contributor

@bluetech bluetech commented Sep 26, 2021

Description

Fixes #8533.

Previously, given a union of TypedDicts, e.g. A|B in

from typing import TypedDict, Literal, Union

class A(TypedDict):
    tag: Literal['A']
    extra_a: str

class B(TypedDict):
    tag: Literal['B']
    extra_b: str

when needing to disambiguate the union, e.g.

td: A|B = {
    'tag': 'A',
    'extra_a': 'foo',
}

mypy would only consider the keys of the dict expression and
TypedDict, e.g. 'tag' and 'extra_a'. But if multiple members of the
union have the same shape, only distinguished by a value type, the
disambiguation fails, e.g.

class A(TypedDict):
    tag: Literal['A']

class B(TypedDict):
    tag: Literal['B']

td: A|B = {  # E: Type of TypedDict is ambiguous, could be any of ("A", "B")
    'tag': 'A',
}

To allow this, also consider the types of the dict expression's values
when narrowing the candidates from the union.

Test Plan

I've added a test case and also successfully ran on my own code where I've hit this issue. The test fails before and passes after.

Fixes python#8533.

Previously, given a union of TypedDicts, e.g. `A|B` in

```py
from typing import TypedDict, Literal, Union

class A(TypedDict):
    tag: Literal['A']
    extra_a: str

class B(TypedDict):
    tag: Literal['B']
    extra_b: str
```

when needing to disambiguate the union, e.g.

```
td: A|B = {
    'tag': 'A',
    'extra_a': 'foo',
}
```

mypy would only consider the *keys* of the dict expression and
TypedDict, e.g. 'tag' and 'extra_a'. But if multiple members of the
union have the same shape, only distinguished by a value type, the
disambiguation fails, e.g.

```py
class A(TypedDict):
    tag: Literal['A']

class B(TypedDict):
    tag: Literal['B']

td: A|B = {  # E: Type of TypedDict is ambiguous, could be any of ("A", "B")
    'tag': 'A',
}
```

To allow this, also consider the types of the dict expression's values
when narrowing the candidates from the union.
@bluetech
Copy link
Contributor Author

Note: I don't really know if self.accept/is_subtype is the right way to do this check. In the issue (#8533) @Michael0x2a implied there might need to be more substantial changes to the flow of disambiguation/type-checking, I suspect that's true. But since it's a small change and seems to work well, I'm submitting it anyway.

C2 = TypedDict('C2', {'@type': Literal['c-type'], 'extra': str})

x: Union[A, B, C1, C2]
x = {'@type': 'a-type'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but in the test case above [case testTypedDictUnionUnambiguousCase] - the inferred type of c: Union[A, B] = {'@type': 'a-type', 'a': 'Test'} is Union[TypedDict('__main__.A', {'@type': Literal['a-type'], 'a': builtins.str}), TypedDict('__main__.B', {'@type': Literal['b-type'], 'b': builtins.int})] which looks inconsistent with this case.

Should it be this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference if it's annotated directly c: Union[A, B] = ... or separately c: Union[A, B]; c = .... Initially I emulated the style of the previous test (annotated directly), but then switched to this one because it's shorter. I cannot explain why the difference exist, naively I would have thought they should be equivalent. I can try to figure it out if you think it's pertinent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naively I would have thought they should be equivalent

Yeah, me too! 😄 I would be very confused if I saw this. I don't remember any other cases when x: Annotation; x = ... and x: Annotation = ... would produce different results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, this behavior is not particular to TypedDict:

class A: pass
class B: pass

x: A|B
x = A()
reveal_type(x)  # Revealed type is A

y: A|B = A()
reveal_type(y)  # Revealed type is Union[A, B]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this behavior is suprising. We have it for historical reasons and because without this feature there would be no easy way to force a wider/more general type for a variable in a type-safe manner.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

This is probably fine. There's one context I can think of where this might be problematic. Can you add a test case where you do something like this:

x: TD1 | TD2
x = {'x': 1, 'y': []}   # Note empty list, disambiguate to only TD1 or TD2 based on 'x'
reveal_type(x)

The type of y should be list[int], for example, in the TypedDict type. I'm wondering how the type of the empty list would be inferred.

@bluetech
Copy link
Contributor Author

bluetech commented Oct 1, 2021

@JukkaL thanks for reviewing!

I added the test you suggested, and it's seems good.

Also added a check for an ambiguous case, where the xs have the same type, but the ys are List[int] or List[str]. Given [], it should be ambiguous, and it is. But then I've had a thought to also test this given [1], which should go to the List[int] TypedDict and not be ambiguous -- but it is. So I need to look into it.

For reference, the failing check is

from typing import Union, List
from typing_extensions import TypedDict

A = TypedDict('A', {'x': str, 'y': List[int]})
B = TypedDict('B', {'x': int, 'y': List[int]})
C = TypedDict('C', {'x': int, 'y': List[str]})

abc_unambiguous: Union[A, B, C]                                                                                                            
abc_unambiguous = {'x': 0, 'y': [1]} # N: Revealed type is "TypedDict('__main__.B', {'x': builtins.int, 'y': builtins.list[builtins.int]})"
                                     # Actual: Type of TypedDict is ambiguous, could be any of ("B", "C")

@bluetech
Copy link
Contributor Author

I tried to find a solution but failed unfortunately. The problem is this: as part of the proposed algorithm, I check the dict expr (the rvalue) against each TypedDict in the union. For each TypedDict, I go over each key-value and ask "is the dict expr key's value (in the example: [1]) compatible with the TypedDict key value's type (in the example: List[str])?". The "compatible" check is done like this:

value_type = self.accept(node=<dict expr key's value expr>, type_context=<TypedDict key's value type>`)
is value_type a subtype of <TypedDict key's value type>?

I pass type_context to self.accept because otherwise it will return str for strings instead of Literal when the TypedDict key's value type being matched against is a literal, and won't match when it should.

But then, self.accept(ListExpr[IntExpr(1)], type_context=list[str]) does two things I don't want: emits an incompatible-type error, and returns list[str], which matches when it shouldn't.

So there's a problem if I pass the type context, and there's a problem if I don't.

Hopefully someone else can figure it out!

@bluetech bluetech closed this Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Type of TypedDict is ambiguous" despite tag fields being distinct Literal types
3 participants