Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mypyc] Detect always defined attributes #12600

Merged
merged 100 commits into from
May 17, 2022
Merged

Conversation

JukkaL
Copy link
Collaborator

@JukkaL JukkaL commented Apr 16, 2022

Use static analysis to find attributes that are always defined. Always defined
attributes don't require checks on each access. This makes them faster and
also reduces code size.

Attributes defined in the class body and assigned to in all code paths in
__init__ are always defined. We need to know all subclasses statically
to determine whether __init__ always defines an attribute in every case,
including in subclasses.

The analysis looks at __init__ methods and supports limited inter-procedural
analysis over super().__init__(...) calls. Otherwise we rely on intra-procedural
analysis to keep the analysis fast.

As a side effect, __init__ will now always be called when constructing an object.
This means that copy.copy (and others like it) won't be supported for native
classes unless __init__ can be called without arguments.

mypyc/analysis/attrdefined.py has more details about the algorithm in
docstrings.

Performance impact to selected benchmarks (with clang):

  • richards +28%
  • deltablue +10%
  • hexiom +1%

The richards result is probably an outlier.

This will also significantly help with native integers (mypyc/mypyc#837, as tracking
undefined values would otherwise require extra memory use.

Closes mypyc/mypyc#836.

JukkaL added 30 commits April 15, 2022 09:52
This way we can more easily implement data flow analyses that don't
range over Values.
Improve analysis and add tests

Use always defined attributes to simplify get attr operations

Support conditionally defined always defined attributes

Update docstring

Better handle conditionally defined attributes

Update docstring

Comment updates

Collect initializers

[WIP] Faster initialization of always defined attributes

Simplify generated code a little

[WIP] Simplify code generated for set attr

Tests are still failing!

Update assert

Fix properties

WIP update wrapper functions

Fix after rebase

Attributes that are __deletable__ are never always defiend

Rewrite always defined analysis to default to disallow

This is safer that defaulting to allow.

Allow construction of collections

Improve always defined analysis

Support True and False

Support reference to module-level final attribute

Rename mypyc.analysis.defined to mypyc.irbuild.defined

Update docstring

Fix type check

Fix lint

WIP new implementation over IR
If we know that some helper won't arbitrarily mutate heap, etc., we
can perform some additional optimizations. I'm going to use this for
detecting always defined attributes, but this could also be used for
common subexpression elimination and other things.

Also manually tag some functions that won't run arbitrary code.

Many of these are actually not quite correct. I think that anything
that may allocate or free memory is actually able to run arbitrary
code through user-defined `__del__` methods, possibly called by the
cyclic garbage collector. We optimistically ignore this, since
`__del__` methods that do weird stuff seem very unlikely (and are
honestly asking for trouble). If we are totally conservative, there is
very little we can assume about side effects.

Here's is a simple example where this can be useful:

```
@Final
class C:
    def __init__(self) -> None"
        self.a: list[int] = []
        self.b = 1
```

If we can assume that `[]` doesn't call arbitrary code that might read
`self.b`, we can infer that both `a` and `b` attributes are always
defined, and we don't need to check whether they are defined on each
get or set operation.
@github-actions

This comment has been minimized.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Apr 16, 2022

In particular I'd love to hear ideas about tricky edge cases that the implementation might not support correctly. I've fixed a bunch, but there may well be others that I've overlooked. This turned out to be a fairly complex problem.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Apr 16, 2022

The analysis isn't super helpful with mypy, since we tend to call other methods in __init__ methods, defeating the analysis. My intention is to eventually refactor some of the more widely used mypy implementation classes to better support always defined attribute analysis (in follow-up PRs).

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@@ -58,7 +58,7 @@ def assert_blobs_same(x: Any, y: Any, trail: Tuple[Any, ...]) -> None:
assert x.keys() == y.keys(), "Keys mismatch at {}".format(trail)
for k in x.keys():
assert_blobs_same(x[k], y[k], trail + (k,))
elif isinstance(x, Iterable) and not isinstance(x, str):
elif isinstance(x, Iterable) and not isinstance(x, (str, set)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit odd that we treat sets different from other iterables (I understand treating strs differently)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sets have unpredictable ordering, so we can't use the special casing for them.

@JukkaL JukkaL requested a review from msullivan April 19, 2022 18:06
# If an attribute we *set* may be sometimes undefined and
# sometimes defined, don't consider it always defined.
if isinstance(op, SetAttr) and op.obj is self_reg:
attr = op.attr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use op.attr here like above? I don't think we get much benefit, and avoiding clutting the namespace of a large-ish function would be better. (And it would be more similar to above)

if op.attr in maybe_undefined.before[block, i]:
attrs.discard(op.attr)
# If an attribute we *set* may be sometimes undefined and
# sometimes defined, don't consider it always defined.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what makes it different from the get case?

dump_always_defined: Final = False


def analyze_always_defined_attrs(class_irs: List[ClassIR]) -> None:
Copy link
Collaborator

@jhance jhance Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either here, or in analyze_alays_defined_attrs_in_class should explain which attributes are being set on the class IR

# Instance attributes that are initialized in the class body.
self.attrs_with_defaults: Set[str] = set()

# Attributes that are always initialized in __init__ or class body
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain where they should be set here too, I think. Or choose one of this and the other comment to put the explanation.


This can be used to mutate the copy with truthiness information.

Classes compiled with mypyc don't support copy.copy(), so we need
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not supporting copy.copy is pretty rough. This will also mean that we don't support pickle, right?

I don't think that we can reasonably break those without providing an opt-opt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I was planning to provide a way to do this later, but I can add it to this PR. We could have a per-class flag saying that this class (and any subclasses, probably) is picklable, and no attribute would be treated as always defined in the class, and thus we can always provide a way of creating an uninitialized instance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's supported by using @mypyc_attr(serializable=True).

@@ -128,100 +128,100 @@ def __str__(self) -> str:
return 'before: %s\nafter: %s\n' % (self.before, self.after)


GenAndKill = Tuple[Set[Value], Set[Value]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not GenAndKill = Tuple[Set[T], Set[T]]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good idea! Will do.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


from m import define_interpreted_subclass

@mypyc_attr(allow_interpreted_subclasses=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a test with both allow_interpreted_subclasses and serializable? For example, what if they are "conflicting" like former is True but latter is False.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be reasonable. allow_interpreted_subclasses would take precedence and the class would be serializable. In the future we could reject incompatible attribute values.

self.n = n

data = pickle.dumps(Cls(5))
obj = pickle.loads(data) # OK
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention what happens without the attr? Does it segfault or raises an exception etc.

Copy link
Collaborator Author

@JukkaL JukkaL May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good idea. I'll update this. It raises an exception about __init__ being called with too few arguments, IIRC.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@JukkaL JukkaL merged commit 7bd6fdd into master May 17, 2022
@JukkaL JukkaL deleted the always-defined-inheritance-5 branch May 17, 2022 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Always defined attributes
5 participants