-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mypyc] Detect always defined attributes #12600
Conversation
This way we can more easily implement data flow analyses that don't range over Values.
Improve analysis and add tests Use always defined attributes to simplify get attr operations Support conditionally defined always defined attributes Update docstring Better handle conditionally defined attributes Update docstring Comment updates Collect initializers [WIP] Faster initialization of always defined attributes Simplify generated code a little [WIP] Simplify code generated for set attr Tests are still failing! Update assert Fix properties WIP update wrapper functions Fix after rebase Attributes that are __deletable__ are never always defiend Rewrite always defined analysis to default to disallow This is safer that defaulting to allow. Allow construction of collections Improve always defined analysis Support True and False Support reference to module-level final attribute Rename mypyc.analysis.defined to mypyc.irbuild.defined Update docstring Fix type check Fix lint WIP new implementation over IR
If we know that some helper won't arbitrarily mutate heap, etc., we can perform some additional optimizations. I'm going to use this for detecting always defined attributes, but this could also be used for common subexpression elimination and other things. Also manually tag some functions that won't run arbitrary code. Many of these are actually not quite correct. I think that anything that may allocate or free memory is actually able to run arbitrary code through user-defined `__del__` methods, possibly called by the cyclic garbage collector. We optimistically ignore this, since `__del__` methods that do weird stuff seem very unlikely (and are honestly asking for trouble). If we are totally conservative, there is very little we can assume about side effects. Here's is a simple example where this can be useful: ``` @Final class C: def __init__(self) -> None" self.a: list[int] = [] self.b = 1 ``` If we can assume that `[]` doesn't call arbitrary code that might read `self.b`, we can infer that both `a` and `b` attributes are always defined, and we don't need to check whether they are defined on each get or set operation.
This comment has been minimized.
This comment has been minimized.
In particular I'd love to hear ideas about tricky edge cases that the implementation might not support correctly. I've fixed a bunch, but there may well be others that I've overlooked. This turned out to be a fairly complex problem. |
The analysis isn't super helpful with mypy, since we tend to call other methods in |
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
@@ -58,7 +58,7 @@ def assert_blobs_same(x: Any, y: Any, trail: Tuple[Any, ...]) -> None: | |||
assert x.keys() == y.keys(), "Keys mismatch at {}".format(trail) | |||
for k in x.keys(): | |||
assert_blobs_same(x[k], y[k], trail + (k,)) | |||
elif isinstance(x, Iterable) and not isinstance(x, str): | |||
elif isinstance(x, Iterable) and not isinstance(x, (str, set)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit odd that we treat sets different from other iterables (I understand treating strs differently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sets have unpredictable ordering, so we can't use the special casing for them.
mypyc/analysis/attrdefined.py
Outdated
# If an attribute we *set* may be sometimes undefined and | ||
# sometimes defined, don't consider it always defined. | ||
if isinstance(op, SetAttr) and op.obj is self_reg: | ||
attr = op.attr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use op.attr
here like above? I don't think we get much benefit, and avoiding clutting the namespace of a large-ish function would be better. (And it would be more similar to above)
mypyc/analysis/attrdefined.py
Outdated
if op.attr in maybe_undefined.before[block, i]: | ||
attrs.discard(op.attr) | ||
# If an attribute we *set* may be sometimes undefined and | ||
# sometimes defined, don't consider it always defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what makes it different from the get case?
dump_always_defined: Final = False | ||
|
||
|
||
def analyze_always_defined_attrs(class_irs: List[ClassIR]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either here, or in analyze_alays_defined_attrs_in_class
should explain which attributes are being set on the class IR
# Instance attributes that are initialized in the class body. | ||
self.attrs_with_defaults: Set[str] = set() | ||
|
||
# Attributes that are always initialized in __init__ or class body |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain where they should be set here too, I think. Or choose one of this and the other comment to put the explanation.
|
||
This can be used to mutate the copy with truthiness information. | ||
|
||
Classes compiled with mypyc don't support copy.copy(), so we need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not supporting copy.copy
is pretty rough. This will also mean that we don't support pickle
, right?
I don't think that we can reasonably break those without providing an opt-opt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I was planning to provide a way to do this later, but I can add it to this PR. We could have a per-class flag saying that this class (and any subclasses, probably) is picklable, and no attribute would be treated as always defined in the class, and thus we can always provide a way of creating an uninitialized instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it's supported by using @mypyc_attr(serializable=True)
.
@@ -128,100 +128,100 @@ def __str__(self) -> str: | |||
return 'before: %s\nafter: %s\n' % (self.before, self.after) | |||
|
|||
|
|||
GenAndKill = Tuple[Set[Value], Set[Value]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not GenAndKill = Tuple[Set[T], Set[T]]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good idea! Will do.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
from m import define_interpreted_subclass | ||
|
||
@mypyc_attr(allow_interpreted_subclasses=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a test with both allow_interpreted_subclasses
and serializable
? For example, what if they are "conflicting" like former is True but latter is False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be reasonable. allow_interpreted_subclasses
would take precedence and the class would be serializable. In the future we could reject incompatible attribute values.
self.n = n | ||
|
||
data = pickle.dumps(Cls(5)) | ||
obj = pickle.loads(data) # OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention what happens without the attr? Does it segfault or raises an exception etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good idea. I'll update this. It raises an exception about __init__
being called with too few arguments, IIRC.
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉 |
Use static analysis to find attributes that are always defined. Always defined
attributes don't require checks on each access. This makes them faster and
also reduces code size.
Attributes defined in the class body and assigned to in all code paths in
__init__
are always defined. We need to know all subclasses staticallyto determine whether
__init__
always defines an attribute in every case,including in subclasses.
The analysis looks at
__init__
methods and supports limited inter-proceduralanalysis over
super().__init__(...)
calls. Otherwise we rely on intra-proceduralanalysis to keep the analysis fast.
As a side effect,
__init__
will now always be called when constructing an object.This means that
copy.copy
(and others like it) won't be supported for nativeclasses unless
__init__
can be called without arguments.mypyc/analysis/attrdefined.py
has more details about the algorithm indocstrings.
Performance impact to selected benchmarks (with clang):
The richards result is probably an outlier.
This will also significantly help with native integers (mypyc/mypyc#837, as tracking
undefined values would otherwise require extra memory use.
Closes mypyc/mypyc#836.