gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497

carljm · 2023-04-13T04:09:30Z

This PR speeds up super() (by around 85%, for a simple one-level super().meth() microbenchmark) by avoiding allocation of a new single-use super() object on each use.

Microbenchmark results

With this PR:

➜ ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()'
.....................
Mean +- std dev: 70.4 ns +- 1.4 ns

Without this PR:

➜ ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()'
.....................
Mean +- std dev: 130 ns +- 1 ns

Microbenchmark code

➜ cat superbench.py
class A:
    def meth(self):
        return 1

class B(A):
    def meth(self):
        return super().meth()

b = B()

Microbenchmark numbers are the same (both pre and post) if the microbenchmark is switched to use return super(B, self).meth() instead.

super() is already special-cased in the compiler to ensure the presence of the __class__ cell needed by zero-argument super(). This extends that special-casing a bit in order to compile super().meth() as

              4 LOAD_GLOBAL              0 (super)
             14 LOAD_DEREF               1 (__class__)
             16 LOAD_FAST                0 (self)
             18 LOAD_SUPER_ATTR          5 (NULL|self + meth)
             20 CALL                     0

instead of the current:

              4 LOAD_GLOBAL              1 (NULL + super)
             14 CALL                     0
             22 LOAD_ATTR                3 (NULL|self + meth)
             42 CALL                     0

Bytecode comparison for simple attribute

And compile super().attr as

              4 LOAD_GLOBAL              0 (super)
             14 LOAD_DEREF               1 (__class__)
             16 LOAD_FAST                0 (self)
             18 LOAD_SUPER_ATTR     4 (attr)

instead of the current:

              4 LOAD_GLOBAL              1 (NULL + super)
             14 CALL                     0
             22 LOAD_ATTR                2 (attr)

The new bytecode has one more instruction, but still ends up executing much faster, because it eliminates the cost of allocating a new single-use super object each time. For zero-arg super, it also eliminates dynamically figuring out each time via frame introspection where to find the self argument and __class__ cell, even though the location of both is already known at compile time.

The LOAD_GLOBAL of super remains only in order to support existing semantics in case the name super is re-bound to some other callable besides the built-in super type.

Besides being faster, the new bytecode is preferable because it regularizes the loading of self and __class__ to use the normal LOAD_FAST and LOAD_DEREF opcodes, instead of custom code in the super object (not part of the interpreter) relying on private details of interpreter frames to load these in a bespoke way. This helps optimizers like the Cinder JIT that fully support LOAD_FAST and LOAD_DEREF but may not maintain frame locals in the same way. It also makes the bytecode more easily amenable to future optimization by a type-specializing tier 2 interpreter, because __class__ and self will now be surfaced and visible to the optimizer in the usual way, rather than hidden inside the super object.

I'll follow up with a specialization of LOAD_SUPER_ATTR for the case where we are looking up a method and a method is found (because this is a common case, and a case where the output of LOAD_SUPER_ATTR depends only on the type of self and not on the actual instance). But to simplify review, I'll do this in a separate PR. I think the benefits of this PR stand alone, even without further benefits of specialization. (ETA: the specialization is now also ready at https://github.com/carljm/cpython/compare/superopt...carljm:cpython:superopt_spec?expand=1 and increases the microbenchmark win from 85% to 2.3x.)

The frame introspection code for runtime/dynamic zero-arg super() still remains, but after this PR it would only ever be used in an odd edge case like super(*args) (if args turns out to be empty at runtime), where we can't detect at compile time whether we will have zero-arg or two-arg super().

"Odd" uses of super() (like one-argument super, use of a super object as a descriptor etc) are still supported and experience no change; the compiler will not emit the new LOAD_SUPER_ATTR opcode.

I chose to make the new opcode more general by using it for both (statically detectable) zero- and two-arg super. Optimizing zero-arg super is more important because it is more common in modern Python code, and because it also eliminates the frame introspection. But supporting two-arg super costs only one extra bit smuggled via the oparg; this seems worth it.

Real-world results and macrobenchmarks

This approach provides a speed-up of about 0.5% globally on the Instagram server real-world workload (measured recently on Python 3.10.) I can work on a macrobenchmark for the pyperformance suite that exercises super() (currently it isn't significantly exercised by any benchmark.) (ETA: benchmark is now ready at python/pyperformance#271 -- this diff improves its performance by 10%, the specialization follow-up by another 10%.)

Prior art

This PR is essentially an updated version of #24936 -- thanks to @vladima for the original inspiration for this approach. Notable differences from that PR:

I avoid turning the oparg for the new opcode into a const load, preferring to pass the needed bits of information by bit-shifting the oparg instead (following the precedent of LOAD_ATTR).
I prioritize code simplicity over performance in edge cases like when a super() attribute access raises AttributeError, which also reduces the footprint of the PR.

#30992 was an attempt to optimize super() solely using the specializing interpreter, but it was never merged because there are too many problems caused by adaptive super-instructions in the tier 1 specializing interpreter.

Issue: Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

* main: pythongh-103479: [Enum] require __new__ to be considered a data type (pythonGH-103495) pythongh-103365: [Enum] STRICT boundary corrections (pythonGH-103494) pythonGH-103488: Use return-offset, not yield-offset. (pythonGH-103502) pythongh-103088: Fix test_venv error message to avoid bytes/str warning (pythonGH-103500) pythonGH-103082: Turn on branch events for FOR_ITER instructions. (python#103507) pythongh-102978: Fix mock.patch function signatures for class and staticmethod decorators (python#103228) pythongh-103462: Ensure SelectorSocketTransport.writelines registers a writer when data is still pending (python#103463) pythongh-95299: Rework test_cppext.py to not invoke setup.py directly (python#103316)

corona10 · 2023-04-13T23:23:48Z

cc @Fidget-Spinner

carljm · 2023-04-14T00:28:05Z

https://github.com/carljm/cpython/compare/superopt...carljm:cpython:superopt_spec?expand=1 has a draft of the first specialization of LOAD_SUPER_ATTR built on top of this, specializing for the method case.

With that specialization, the ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()' microbenchmark shown above now runs in 56ns, down from 130ns originally and 70ns without the specialization. That's 2.3x better than the current main-branch speed. For reference, a version of the same benchmark that uses return A.meth(self) in place of return super().meth() runs in 48ns. So we are getting pretty close to zero-cost super method calls.

(If reviewers would prefer to just have the specialization(s) included directly in this PR and all reviewed together, let me know and I can push everything here.)

bedevere-bot · 2023-04-14T00:33:39Z

🤖 New build scheduled with the buildbot fleet by @corona10 for commit 92c943b 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

Objects/typeobject.c

Python/bytecodes.c

Objects/typeobject.c

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

markshannon · 2023-04-18T16:32:58Z

Is the microbenchmark code correct? It doesn't look like you call meth()

carljm · 2023-04-18T16:38:20Z

Is the microbenchmark code correct? It doesn't look like you call meth()

The call to b.meth() happens in the actual invocation of pyperf timeit: ./python -m pyperf timeit -s 'from superbench import b' 'b.meth()'

markshannon

There is a fair bit of branching LOAD_SUPER_ATTR which suggest that either it needs reworking or splitting up.

I've made a few suggestions as to how it can be made less branchy.
We'll see if that is sufficient.

The compiler code looks OK to me, but I'll leave it @iritkatriel to review it properly.

Lib/test/test_super.py

Objects/typeobject.c

Python/bytecodes.c

Objects/typeobject.c

carljm · 2023-04-19T00:25:09Z

@markshannon

Thanks for the review!

There is a fair bit of branching LOAD_SUPER_ATTR which suggest that either it needs reworking or splitting up.

The causes of branching are these:

Has built-in super been replaced or shadowed? This branching is unavoidable, since it can happen dynamically; we have to check at runtime.
Are we loading a method that will be called?
Is this zero-arg or two-arg super? (We only need to know this if super has been shadowed, so we can reconstruct the right call to whatever it is now.)

(2) and (3) are both known at compile time, so we could split the opcode in two along either axis (i.e. LOAD_SUPER_ATTR vs LOAD_SUPER_METHOD, or LOAD_ZERO_SUPER_ATTR vs LOAD_TWO_SUPER_ATTR). I considered both splits, and decided neither made sense: the second split would result in two separate opcodes that we'll later want to specialize to the same opcode, which is awkward, and the first split loses the parallel to how LOAD_ATTR works. (Both splits would result in code duplication.)

Your suggestion above about how to handle oparg & 2 eliminates the branching for zero-arg vs two-arg super in the shadowing case; hopefully that's enough.

Python/compile.c

* main: (24 commits) pythongh-98040: Move the Single-Phase Init Tests Out of test_imp (pythongh-102561) pythongh-83861: Fix datetime.astimezone() method (pythonGH-101545) pythongh-102856: Clean some of the PEP 701 tokenizer implementation (python#103634) pythongh-102856: Skip test_mismatched_parens in WASI builds (python#103633) pythongh-102856: Initial implementation of PEP 701 (python#102855) pythongh-103583: Add ref. dependency between multibytecodec modules (python#103589) pythongh-83004: Harden msvcrt further (python#103420) pythonGH-88342: clarify that `asyncio.as_completed` accepts generators yielding tasks (python#103626) pythongh-102778: IDLE - make sys.last_exc available in Shell after traceback (python#103314) pythongh-103582: Remove last references to `argparse.REMAINDER` from docs (python#103586) pythongh-103583: Always pass multibyte codec structs as const (python#103588) pythongh-103617: Fix compiler warning in _iomodule.c (python#103618) pythongh-103596: [Enum] do not shadow mixed-in methods/attributes (pythonGH-103600) pythonGH-100530: Change the error message for non-class class patterns (pythonGH-103576) pythongh-95299: Remove lingering setuptools reference in installer scripts (pythonGH-103613) [Doc] Fix a typo in optparse.rst (python#103504) pythongh-101100: Fix broken reference `__format__` in `string.rst` (python#103531) pythongh-95299: Stop installing setuptools as a part of ensurepip and venv (python#101039) pythonGH-103484: Docs: add linkcheck allowed redirects entries for most cases (python#103569) pythongh-67230: update whatsnew note for csv changes (python#103598) ...

carljm · 2023-04-20T09:27:37Z

@markshannon I've now addressed or replied to all comments, if you want to take another look.

* main: (53 commits) pythongh-102498 Clean up unused variables and imports in the email module (python#102482) pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244) pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365) pythongh-94300: Update datetime.strptime documentation (python#95318) pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778) pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456) pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775) Revert "Add tests for empty range equality (python#103751)" (python#103770) pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519) pythonGH-65022: Fix description of copyreg.pickle function (python#102656) pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324) pythongh-91687: modernize dataclass example typing (python#103773) pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747) pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769) pythongh-87452: Improve the Popen.returncode docs Removed unnecessary escaping of asterisks (python#103714) pythonGH-102973: Slim down Fedora packages in the dev container (python#103283) pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095) Add tests for empty range equality (python#103751) pythongh-103712: Increase the length of the type name in AttributeError messages (python#103713) ...

* main: pythongh-101517: fix line number propagation in code generated for except* (python#103550) pythongh-103780: Use patch instead of mock in asyncio unix events test (python#103782)

bedevere-bot · 2023-04-24T21:36:56Z

🤖 New build scheduled with the buildbot fleet by @carljm for commit 0de5bc6 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

* main: pythongh-100227: Only Use deepfreeze for the Main Interpreter (pythongh-103794) pythongh-103492: Clarify SyntaxWarning with literal comparison (python#103493) pythongh-101100: Fix Sphinx warnings in `argparse` module (python#103289)

bedevere-bot · 2023-04-24T21:52:08Z

🤖 New build scheduled with the buildbot fleet by @carljm for commit dbe1665 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

* main: pythongh-87729: add LOAD_SUPER_ATTR instruction for faster super() (python#103497) pythongh-103791: Make contextlib.suppress also act on exceptions within an ExceptionGroup (python#103792)

carljm added 2 commits April 12, 2023 20:12

pythongh-87729: add instruction for faster zero-arg super()

0a0ebe2

add news entry

f14a3bf

carljm requested review from brettcannon, encukou, ericsnowcurrently, ncoghlan, warsaw, tiran, markshannon and iritkatriel as code owners April 13, 2023 04:09

bedevere-bot mentioned this pull request Apr 13, 2023

Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

Closed

bedevere-bot added the awaiting core review label Apr 13, 2023

carljm mentioned this pull request Apr 13, 2023

bpo-43563 : Introduce dedicated opcodes for super calls #24936

Closed

carljm added 4 commits April 12, 2023 21:32

document LOAD_ZERO_SUPER_ATTR

5625e73

fix gdb test_wrapper_call

0775efe

optimize 2-arg super also

f229b5b

fix incompatible assignment

626999d

carljm changed the title ~~gh-87729: add instruction for faster zero-arg super()~~ gh-87729: add LOAD_SUPER_ATTR instruction for faster super() Apr 13, 2023

carljm added 2 commits April 13, 2023 09:52

fix bad first arg

92c943b

corona10 added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Apr 14, 2023

bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Apr 14, 2023

corona10 reviewed Apr 14, 2023

View reviewed changes

Objects/typeobject.c Outdated Show resolved Hide resolved

corona10 reviewed Apr 14, 2023

View reviewed changes

Python/bytecodes.c Outdated Show resolved Hide resolved

corona10 reviewed Apr 14, 2023

View reviewed changes

Objects/typeobject.c Outdated Show resolved Hide resolved

carljm and others added 2 commits April 14, 2023 08:35

Apply suggestions from code review

775ed0f

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

don't unnecessarily re-find args in error case

2077f1a

update generated cases with new comment

3a3cb74

markshannon reviewed Apr 18, 2023

View reviewed changes

Lib/test/test_super.py Show resolved Hide resolved

Lib/test/test_super.py Show resolved Hide resolved

Objects/typeobject.c Outdated Show resolved Hide resolved

Python/bytecodes.c Outdated Show resolved Hide resolved

Objects/typeobject.c Show resolved Hide resolved

iritkatriel reviewed Apr 19, 2023

View reviewed changes

Python/compile.c Show resolved Hide resolved

carljm added 4 commits April 19, 2023 15:58

simplify oparg & 2 handling

e4466a7

cleanup and clarification

f161268

move __class__ special case out of the fast path

df442c0

markshannon self-requested a review April 24, 2023 21:12

markshannon approved these changes Apr 24, 2023

View reviewed changes

Merge branch 'main' into superopt

0de5bc6

* main: pythongh-101517: fix line number propagation in code generated for except* (python#103550) pythongh-103780: Use patch instead of mock in asyncio unix events test (python#103782)

carljm enabled auto-merge (squash) April 24, 2023 21:27

carljm added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Apr 24, 2023

bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Apr 24, 2023

Merge branch 'main' into superopt

dbe1665

* main: pythongh-100227: Only Use deepfreeze for the Main Interpreter (pythongh-103794) pythongh-103492: Clarify SyntaxWarning with literal comparison (python#103493) pythongh-101100: Fix Sphinx warnings in `argparse` module (python#103289)

carljm added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Apr 24, 2023

bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Apr 24, 2023

carljm merged commit 0dc8b50 into python:main Apr 24, 2023

bedevere-bot removed the awaiting merge label Apr 24, 2023

carljm deleted the superopt branch April 28, 2023 18:34

carljm mentioned this pull request May 30, 2023

3.12.0b1 includes backwards incompatible change to operation of super() #105035

Closed

tekknolagi mentioned this pull request Jun 12, 2023

Optimize super load method tekknolagi/skybison#503

Open

cdce8p mentioned this pull request Jul 20, 2023

3.12.0b4 Backwards incompatible change with reassignment of cls.__new__ and super() #106917

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497

gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497

carljm commented Apr 13, 2023 •

edited

Loading

corona10 commented Apr 13, 2023

carljm commented Apr 14, 2023

bedevere-bot commented Apr 14, 2023

markshannon commented Apr 18, 2023

carljm commented Apr 18, 2023

markshannon left a comment

carljm commented Apr 19, 2023 •

edited

Loading

carljm commented Apr 20, 2023

bedevere-bot commented Apr 24, 2023

bedevere-bot commented Apr 24, 2023

gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497

gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497

Conversation

carljm commented Apr 13, 2023 • edited Loading

Real-world results and macrobenchmarks

Prior art

corona10 commented Apr 13, 2023

carljm commented Apr 14, 2023

bedevere-bot commented Apr 14, 2023

markshannon commented Apr 18, 2023

carljm commented Apr 18, 2023

markshannon left a comment

Choose a reason for hiding this comment

carljm commented Apr 19, 2023 • edited Loading

carljm commented Apr 20, 2023

bedevere-bot commented Apr 24, 2023

bedevere-bot commented Apr 24, 2023

carljm commented Apr 13, 2023 •

edited

Loading

carljm commented Apr 19, 2023 •

edited

Loading