Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up make_simplified_union, fix recursive tuple crash #15128

Merged
merged 8 commits into from
May 6, 2023

Conversation

hauntsaninja
Copy link
Collaborator

@hauntsaninja hauntsaninja commented Apr 25, 2023

Fixes #15192

The following code optimises make_simplified_union in the common case that there are exact duplicates in the union. In this regard, this is similar to #15104

There's a behaviour change in one unit test. I think it's good? We'll see what mypy_primer has to say.

To get this to work, I needed to use partial tuple fallbacks in a couple places (these maybe had the potential to be latent crashes anyway?) There were some interesting things going on with recursive type aliases and type state assumptions

This is about a 25% speedup on the pydantic codebase and about a 2% speedup on self check (measured with uncompiled mypy)

hauntsaninja and others added 2 commits April 25, 2023 10:18
The following code optimises make_simplified_union in the common case
that there are exact duplicates in the union. In this regard, this is
similar to python#15104

To get this to work, I needed to use partial tuple fallbacks in a couple
places (these maybe had the potential to be latent crashes anyway?)
There were some interesting things going on with recursive type aliases
and type state assumptions

This is about a 25% speedup on the pydantic codebase and about a 2%
speedup on self check (measured with uncompiled mypy)
@github-actions

This comment has been minimized.

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Apr 25, 2023

Looks like there's a major performance regression in materialize

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

pandas-stubs (https://github.com/pandas-dev/pandas-stubs) got 1.14 faster (68.0s -> 59.5s)

speedrun.com_global_scoreboard_webapp (https://github.com/Avasam/speedrun.com_global_scoreboard_webapp) got 1.24 faster (43.8s -> 35.2s)

@hauntsaninja hauntsaninja marked this pull request as ready for review April 26, 2023 00:11
@hauntsaninja
Copy link
Collaborator Author

(note to self: once this is merged, rename is_simple_literal to is_simple_str_literal or see if we can get rid of it)

@JukkaL
Copy link
Collaborator

JukkaL commented May 3, 2023

I did a measurement using misc/perf-compare.py and this seems roughly performance-neutral for self check using a compiled mypy:

=== Results ===

hauntsaninja-msu          4.393s (0.0%)
master                    4.405s (+0.3%)

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'm happy to see additional performance wins. I finally got around to reviewing this now that my jet lag is better.

Left a few additional ideas (optional, since this already seems like a nice improvement).

mypy/typeops.py Show resolved Hide resolved
mypy/typeops.py Show resolved Hide resolved
mypy/typeops.py Outdated Show resolved Hide resolved
# If deleted subtypes had more general truthiness, use that
orig_item = new_items[duplicate_index]
if not orig_item.can_be_true and ti.can_be_true:
new_items[duplicate_index] = true_or_false(orig_item)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only change can_be_true, or is it ok to only adjust can_be_false?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. I'm going to preserve the existing behaviour for now and explore this in another PR. These code paths don't have many unit tests (see #15094 and #15098), so I want to be careful here

if not orig_item.can_be_true and ti.can_be_true:
new_items[duplicate_index] = true_or_false(orig_item)
elif not orig_item.can_be_false and ti.can_be_false:
new_items[duplicate_index] = true_or_false(orig_item)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above.

@hauntsaninja hauntsaninja changed the title Speed up make_simplified_union, remove a potential crash Speed up make_simplified_union, fix a crash May 5, 2023
@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented May 5, 2023

This fixes a crash that just got reported #15192, so I added a regression test for it.

I added the two micro optimisations suggested.

@github-actions
Copy link
Contributor

github-actions bot commented May 5, 2023

Diff from mypy_primer, showing the effect of this PR on open source code:

pandas-stubs (https://github.com/pandas-dev/pandas-stubs) got 1.15x faster (66.6s -> 58.0s)

speedrun.com_global_scoreboard_webapp (https://github.com/Avasam/speedrun.com_global_scoreboard_webapp) got 1.27x faster (33.3s -> 26.2s)

@hauntsaninja hauntsaninja changed the title Speed up make_simplified_union, fix a crash Speed up make_simplified_union, fix recursive tuple crash May 6, 2023
@hauntsaninja hauntsaninja merged commit 7832e1f into python:master May 6, 2023
@hauntsaninja hauntsaninja deleted the msu branch May 6, 2023 20:22
@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented May 6, 2023

Nice, looks like the primer timings are pretty consistent between:
#15128 (comment)
#15128 (comment)
and some of the earlier commits on this PR (although those earlier commits had the regression on Materialize)

@ilevkivskyi
Copy link
Member

This PR actually introduced a recursive tuple crash

[case testAliasRecursiveUnpack]
from typing import Tuple, TypeVar, Optional

T = TypeVar("T")
S = TypeVar("S")

A = Tuple[T, S, Optional[A[T, S]]]  # Must have two non-recursive items to crash
x: A[int, str]

*_, last = x
if last is not None:
    reveal_type(last)
[builtins fixtures/tuple.pyi]

I will check maybe there is a simple fix for it.

@ilevkivskyi
Copy link
Member

It looks like there is (a bug is quite obvious). I will submit a PR shortly.

@ilevkivskyi
Copy link
Member

See #15216

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault when manipulating a recursive union
3 participants