-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497
Conversation
* main: pythongh-103479: [Enum] require __new__ to be considered a data type (pythonGH-103495) pythongh-103365: [Enum] STRICT boundary corrections (pythonGH-103494) pythonGH-103488: Use return-offset, not yield-offset. (pythonGH-103502) pythongh-103088: Fix test_venv error message to avoid bytes/str warning (pythonGH-103500) pythonGH-103082: Turn on branch events for FOR_ITER instructions. (python#103507) pythongh-102978: Fix mock.patch function signatures for class and staticmethod decorators (python#103228) pythongh-103462: Ensure SelectorSocketTransport.writelines registers a writer when data is still pending (python#103463) pythongh-95299: Rework test_cppext.py to not invoke setup.py directly (python#103316)
https://github.com/carljm/cpython/compare/superopt...carljm:cpython:superopt_spec?expand=1 has a draft of the first specialization of With that specialization, the (If reviewers would prefer to just have the specialization(s) included directly in this PR and all reviewed together, let me know and I can push everything here.) |
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
Is the microbenchmark code correct? It doesn't look like you call |
The call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a fair bit of branching LOAD_SUPER_ATTR
which suggest that either it needs reworking or splitting up.
I've made a few suggestions as to how it can be made less branchy.
We'll see if that is sufficient.
The compiler code looks OK to me, but I'll leave it @iritkatriel to review it properly.
Thanks for the review!
The causes of branching are these:
(2) and (3) are both known at compile time, so we could split the opcode in two along either axis (i.e. Your suggestion above about how to handle |
* main: (24 commits) pythongh-98040: Move the Single-Phase Init Tests Out of test_imp (pythongh-102561) pythongh-83861: Fix datetime.astimezone() method (pythonGH-101545) pythongh-102856: Clean some of the PEP 701 tokenizer implementation (python#103634) pythongh-102856: Skip test_mismatched_parens in WASI builds (python#103633) pythongh-102856: Initial implementation of PEP 701 (python#102855) pythongh-103583: Add ref. dependency between multibytecodec modules (python#103589) pythongh-83004: Harden msvcrt further (python#103420) pythonGH-88342: clarify that `asyncio.as_completed` accepts generators yielding tasks (python#103626) pythongh-102778: IDLE - make sys.last_exc available in Shell after traceback (python#103314) pythongh-103582: Remove last references to `argparse.REMAINDER` from docs (python#103586) pythongh-103583: Always pass multibyte codec structs as const (python#103588) pythongh-103617: Fix compiler warning in _iomodule.c (python#103618) pythongh-103596: [Enum] do not shadow mixed-in methods/attributes (pythonGH-103600) pythonGH-100530: Change the error message for non-class class patterns (pythonGH-103576) pythongh-95299: Remove lingering setuptools reference in installer scripts (pythonGH-103613) [Doc] Fix a typo in optparse.rst (python#103504) pythongh-101100: Fix broken reference `__format__` in `string.rst` (python#103531) pythongh-95299: Stop installing setuptools as a part of ensurepip and venv (python#101039) pythonGH-103484: Docs: add linkcheck allowed redirects entries for most cases (python#103569) pythongh-67230: update whatsnew note for csv changes (python#103598) ...
@markshannon I've now addressed or replied to all comments, if you want to take another look. |
* main: (53 commits) pythongh-102498 Clean up unused variables and imports in the email module (python#102482) pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244) pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365) pythongh-94300: Update datetime.strptime documentation (python#95318) pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778) pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456) pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775) Revert "Add tests for empty range equality (python#103751)" (python#103770) pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519) pythonGH-65022: Fix description of copyreg.pickle function (python#102656) pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324) pythongh-91687: modernize dataclass example typing (python#103773) pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747) pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769) pythongh-87452: Improve the Popen.returncode docs Removed unnecessary escaping of asterisks (python#103714) pythonGH-102973: Slim down Fedora packages in the dev container (python#103283) pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095) Add tests for empty range equality (python#103751) pythongh-103712: Increase the length of the type name in AttributeError messages (python#103713) ...
* main: pythongh-101517: fix line number propagation in code generated for except* (python#103550) pythongh-103780: Use patch instead of mock in asyncio unix events test (python#103782)
* main: pythongh-100227: Only Use deepfreeze for the Main Interpreter (pythongh-103794) pythongh-103492: Clarify SyntaxWarning with literal comparison (python#103493) pythongh-101100: Fix Sphinx warnings in `argparse` module (python#103289)
* main: pythongh-87729: add LOAD_SUPER_ATTR instruction for faster super() (python#103497) pythongh-103791: Make contextlib.suppress also act on exceptions within an ExceptionGroup (python#103792)
This PR speeds up
super()
(by around 85%, for a simple one-levelsuper().meth()
microbenchmark) by avoiding allocation of a new single-usesuper()
object on each use.Microbenchmark results
With this PR:
Without this PR:
Microbenchmark code
Microbenchmark numbers are the same (both pre and post) if the microbenchmark is switched to use
return super(B, self).meth()
instead.super()
is already special-cased in the compiler to ensure the presence of the__class__
cell needed by zero-argumentsuper()
. This extends that special-casing a bit in order to compilesuper().meth()
asinstead of the current:
Bytecode comparison for simple attribute
And compile
super().attr
asinstead of the current:
The new bytecode has one more instruction, but still ends up executing much faster, because it eliminates the cost of allocating a new single-use
super
object each time. For zero-arg super, it also eliminates dynamically figuring out each time via frame introspection where to find theself
argument and__class__
cell, even though the location of both is already known at compile time.The
LOAD_GLOBAL
ofsuper
remains only in order to support existing semantics in case the namesuper
is re-bound to some other callable besides the built-insuper
type.Besides being faster, the new bytecode is preferable because it regularizes the loading of
self
and__class__
to use the normalLOAD_FAST
andLOAD_DEREF
opcodes, instead of custom code in thesuper
object (not part of the interpreter) relying on private details of interpreter frames to load these in a bespoke way. This helps optimizers like the Cinder JIT that fully supportLOAD_FAST
andLOAD_DEREF
but may not maintain frame locals in the same way. It also makes the bytecode more easily amenable to future optimization by a type-specializing tier 2 interpreter, because__class__
andself
will now be surfaced and visible to the optimizer in the usual way, rather than hidden inside thesuper
object.I'll follow up with a specialization of
LOAD_SUPER_ATTR
for the case where we are looking up a method and a method is found (because this is a common case, and a case where the output ofLOAD_SUPER_ATTR
depends only on the type ofself
and not on the actual instance). But to simplify review, I'll do this in a separate PR. I think the benefits of this PR stand alone, even without further benefits of specialization. (ETA: the specialization is now also ready at https://github.com/carljm/cpython/compare/superopt...carljm:cpython:superopt_spec?expand=1 and increases the microbenchmark win from 85% to 2.3x.)The frame introspection code for runtime/dynamic zero-arg
super()
still remains, but after this PR it would only ever be used in an odd edge case likesuper(*args)
(ifargs
turns out to be empty at runtime), where we can't detect at compile time whether we will have zero-arg or two-argsuper()
."Odd" uses of
super()
(like one-argumentsuper
, use of a super object as a descriptor etc) are still supported and experience no change; the compiler will not emit the newLOAD_SUPER_ATTR
opcode.I chose to make the new opcode more general by using it for both (statically detectable) zero- and two-arg super. Optimizing zero-arg super is more important because it is more common in modern Python code, and because it also eliminates the frame introspection. But supporting two-arg super costs only one extra bit smuggled via the oparg; this seems worth it.
Real-world results and macrobenchmarks
This approach provides a speed-up of about 0.5% globally on the Instagram server real-world workload (measured recently on Python 3.10.) I can work on a macrobenchmark for the
pyperformance
suite that exercisessuper()
(currently it isn't significantly exercised by any benchmark.) (ETA: benchmark is now ready at python/pyperformance#271 -- this diff improves its performance by 10%, the specialization follow-up by another 10%.)Prior art
This PR is essentially an updated version of #24936 -- thanks to @vladima for the original inspiration for this approach. Notable differences from that PR:
LOAD_ATTR
).super()
attribute access raisesAttributeError
, which also reduces the footprint of the PR.#30992 was an attempt to optimize
super()
solely using the specializing interpreter, but it was never merged because there are too many problems caused by adaptive super-instructions in the tier 1 specializing interpreter.