Increase fraction of code executed by tier 2. #118093

markshannon · 2024-04-19T09:49:54Z

According to stats and profiling only about 40% of bytecode instructions are executed by tier 2 and the remaining 60% by tier 1.

We the expected improvements to the JIT and tier 2 optimizer we expect tier 2 (with JIT) to have a significantly faster than tier 1.
It therefore make sense to get the fraction of instructions executed by tier 2 up from 40% to nearer 90%.

To do that we need to:

Enter tier 2 at some function entry points as well as back edges. #118094
Stay in tier 2 when exiting executors in most cases Trace stitching #112354
Increase the number of micro-ops that we can handle in tier 2 #118095

Linked PRs

The text was updated successfully, but these errors were encountered:

brandtbucher · 2024-07-17T02:25:47Z

By my count we're currently hovering around 54% of code executed in tier two (our benchmarks run about 266 billion tier one instructions on normal builds and 122 billion instructions on JIT builds). I've identified a few strategies for improving this (based on stats and tracing through how we execute a bunch of the benchmarks) and will start landing PRs soon. No magic bullets here, just chipping away at things:

Add specializations for CALL_KW and CALL_FUNCTION_EX. We probably don't need to do anything too crazy here early on in terms of optimizing calls... we can start by just adding a handful of specializations that allow us to trace through them instead of ending the trace.
Add tier two support to several other instructions that are prematurely ending traces. Some of these are easy (CALL_LIST_APPEND, IMPORT_NAME, LOAD_NAME, BUILD_SET, SEND_GEN, and IMPORT_FROM), and some are harder (LOAD_ATTR_PROPERTY, BINARY_SUBSCR_GETITEM, CALL_ALLOC_AND_ENTER_INIT, RAISE_VARARGS, and BINARY_OP_INPLACE_ADD_UNICODE).
Specialize SEND_ASYNC_GEN_ANEXT, using a similar shim frame as CALL_ALLOC_AND_ENTER_INIT.
Handle underflow, either dynamically (with DYNAMIC_EXIT) or statically (by using the current stack when projecting to infer callers). A more radical idea could be to start recording traces instead of projecting them. That simplifies a lot of things, but is a big rewrite of some pretty core stuff.
Turn more DEOPT_IFs into EXIT_IFs for better handling of control flow and polymorphism. _FOR_ITER_TIER_TWO is an obvious candidate here, but there are others, too.
Allow shorter traces (it's tricky to do this while still requiring progress, but doable).
Better handling of polymorphism (our current progress requirement inhibits this, but that can be relaxed without too much trouble).
Remove invalid traces from side exits (currently we just remove them from the bytecode, and side exits not only keep the invalid trace alive and continually deopting, but also prevent new traces from taking their place).
Be better about closing loops in one trace, by allowing a single jump to occur anywhere in a trace, rather than always at the start.

My motivation for this is to make JIT improvements more pronounced. We currently spend less than 10% of our time in the JIT (vs ~25% of our time in tier one), which means that we need to improve the performance of JIT code by over 10% just to see a 1% improvement on the benchmarks. My (probably ambitious) goal is to get the fraction of code executed in tier two up to around 80% (meaning, in the neighborhood of 25%-30% of the total time spent running the benchmarks) in the next couple of weeks. Then the improvements can be easier to measure and iterate on.

brandtbucher · 2024-07-17T02:28:22Z

It's also worth noting that our stats are currently broken on benchmarks that use C extensions or spawn subprocesses. So the actual numbers may vary a bit right now, but probably aren't heavily biased one way or another.

…GH-121885)

…-122253)

… two (GH-122843)

…123140) * Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it * Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT.

…n tier two (pythonGH-122843)

pythonGH-123140) * Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it * Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT.

…_GENERAL` (GH-123212) Specialize classes without vectorcall as CALL_NON_PY_GENERAL

markshannon added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.13 bugs and security fixes labels Apr 19, 2024

markshannon mentioned this issue Apr 19, 2024

Things to do for 3.13 faster-cpython/ideas#654

Open

9 tasks

markshannon mentioned this issue May 24, 2024

Treat slices with constant parts as constants and remove BINARY_SLICE opcodes #119500

Closed

bedevere-app bot mentioned this issue Jul 17, 2024

GH-118093: Add tier two support to several instructions #121884

Merged

brandtbucher added 3.14 new features, bugs and security fixes and removed 3.13 bugs and security fixes labels Jul 17, 2024

bedevere-app bot mentioned this issue Jul 17, 2024

GH-118093: Remove invalidated executors from side exits #121885

Merged

brandtbucher added a commit that referenced this issue Jul 18, 2024

GH-118093: Add tier two support to several instructions (GH-121884)

7b36b67

brandtbucher added a commit that referenced this issue Jul 24, 2024

GH-118093: Remove invalidated executors from side exits (GH-121885)

794546f

nohlson pushed a commit to nohlson/cpython that referenced this issue Jul 24, 2024

pythonGH-118093: Remove invalidated executors from side exits (python…

91d2700

…GH-121885)

nohlson pushed a commit to nohlson/cpython that referenced this issue Jul 24, 2024

pythonGH-118093: Remove invalidated executors from side exits (python…

3f1e5bc

…GH-121885)

This was referenced Jul 25, 2024

GH-118093: Better handling of short and mid-loop traces #122252

Merged

GH-118093: Add tier two support for BINARY_OP_INPLACE_ADD_UNICODE #122253

Merged

GH-118093: Add tier two support for LOAD_ATTR_PROPERTY #122283

Merged

brandtbucher added a commit that referenced this issue Jul 25, 2024

GH-118093: Add tier two support for LOAD_ATTR_PROPERTY (GH-122283)

5f60011

brandtbucher added a commit that referenced this issue Jul 25, 2024

GH-118093: Add tier two support for BINARY_OP_INPLACE_ADD_UNICODE (GH…

d9efa45

…-122253)

brandtbucher added a commit that referenced this issue Jul 29, 2024

GH-118093: Improve handling of short and mid-loop traces (GH-122252)

7797182

bedevere-app bot mentioned this issue Aug 8, 2024

GH-118093: Handle some polymorphism before requiring progress in tier two #122843

Merged

brandtbucher added a commit that referenced this issue Aug 12, 2024

GH-118093: Handle some polymorphism before requiring progress in tier…

9621a7d

… two (GH-122843)

This was referenced Aug 14, 2024

GH-118093: Turn some DEOPT_IFs into EXIT_IFs #122998

Merged

GH-118093: Specialize CALL_KW #123006

Merged

brandtbucher added a commit that referenced this issue Aug 14, 2024

GH-118093: Turn some DEOPT_IFs into EXIT_IFs (GH-122998)

f84754b

bedevere-app bot mentioned this issue Aug 15, 2024

GH-118093: Specialize CALL_FUNCTION_EX #123034

Draft

markshannon added a commit that referenced this issue Aug 16, 2024

GH-118093: Specialize CALL_KW (GH-123006)

c13e7d9

bedevere-app bot mentioned this issue Aug 19, 2024

GH-118093: Make CALL_ALLOC_AND_ENTER_INIT suitable for tier 2. #123140

Merged

jeremyhylton pushed a commit to jeremyhylton/cpython that referenced this issue Aug 19, 2024

pythonGH-118093: Specialize CALL_KW (pythonGH-123006)

114fa0f

markshannon added a commit that referenced this issue Aug 20, 2024

GH-118093: Make CALL_ALLOC_AND_ENTER_INIT suitable for tier 2. (GH-…

bb1d303

…123140) * Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it * Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT.

bedevere-app bot mentioned this issue Aug 22, 2024

GH-118093: Specialize calls to non-vectorcall classes as CALL_NON_PY_GENERAL #123212

Merged

blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024

pythonGH-118093: Handle some polymorphism before requiring progress i…

2272a67

…n tier two (pythonGH-122843)

blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024

pythonGH-118093: Turn some DEOPT_IFs into EXIT_IFs (pythonGH-122998)

00d9f4e

blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024

pythonGH-118093: Specialize CALL_KW (pythonGH-123006)

eea2ba3

markshannon pushed a commit that referenced this issue Aug 22, 2024

GH-118093: Specialize calls to non-vectorcall classes as `CALL_NON_PY…

427b106

…_GENERAL` (GH-123212) Specialize classes without vectorcall as CALL_NON_PY_GENERAL

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase fraction of code executed by tier 2. #118093

Increase fraction of code executed by tier 2. #118093

markshannon commented Apr 19, 2024 •

edited by bedevere-app bot

Loading

brandtbucher commented Jul 17, 2024 •

edited

Loading

brandtbucher commented Jul 17, 2024

Increase fraction of code executed by tier 2. #118093

Increase fraction of code executed by tier 2. #118093

Comments

markshannon commented Apr 19, 2024 • edited by bedevere-app bot Loading

Linked PRs

brandtbucher commented Jul 17, 2024 • edited Loading

brandtbucher commented Jul 17, 2024

markshannon commented Apr 19, 2024 •

edited by bedevere-app bot

Loading

brandtbucher commented Jul 17, 2024 •

edited

Loading