Is `PREDICT()` worth it any more? #496

gvanrossum · 2022-11-15T21:48:10Z

The PREDICT() macro is already a no-op on compilers with computed GOTO. So it's only used on Windows (and a few esoteric platforms). Is it worth the bother? We should ideally benchmark this. In the past it's been claimed to have substantial benefits.

The text was updated successfully, but these errors were encountered:

brandtbucher · 2022-11-15T21:50:05Z

I've wondered the same thing. We could benchmark this on Linux pretty easily by disabling computed gotos and timing the results with and without PREDICT turned on. I imagine that would be very close to whatever the Windows speedup is (but without requiring a Windows benchmarking setup).

brandtbucher · 2022-11-22T00:08:41Z

Based on Linux builds on our benchmarking machine, disabling USE_COMPUTED_GOTOS makes things ~1% slower (I thought it would be more, but it could be that our new superinstructions do most of the heavy lifting when reducing this part of the dispatch overhead):

Slower (45):
- logging_format: 6.25 us +- 0.09 us -> 6.79 us +- 0.12 us: 1.09x slower
- logging_simple: 5.71 us +- 0.09 us -> 6.07 us +- 0.07 us: 1.06x slower
- genshi_xml: 46.1 ms +- 0.6 ms -> 48.8 ms +- 1.9 ms: 1.06x slower
- html5lib: 58.5 ms +- 2.6 ms -> 61.8 ms +- 2.7 ms: 1.06x slower
- deltablue: 3.19 ms +- 0.04 ms -> 3.37 ms +- 0.07 ms: 1.06x slower
- dulwich_log: 61.1 ms +- 0.4 ms -> 64.3 ms +- 0.8 ms: 1.05x slower
- scimark_lu: 104 ms +- 2 ms -> 110 ms +- 2 ms: 1.05x slower
- regex_compile: 128 ms +- 1 ms -> 135 ms +- 2 ms: 1.05x slower
- raytrace: 278 ms +- 4 ms -> 292 ms +- 9 ms: 1.05x slower
- deepcopy: 324 us +- 2 us -> 338 us +- 3 us: 1.05x slower
- pidigits: 189 ms +- 0 ms -> 198 ms +- 0 ms: 1.04x slower
- tornado_http: 92.9 ms +- 1.4 ms -> 97.1 ms +- 1.4 ms: 1.04x slower
- sympy_str: 282 ms +- 3 ms -> 294 ms +- 3 ms: 1.04x slower
- sqlglot_normalize: 105 ms +- 1 ms -> 109 ms +- 1 ms: 1.04x slower
- sympy_expand: 459 ms +- 5 ms -> 479 ms +- 3 ms: 1.04x slower
- unpickle: 12.9 us +- 0.1 us -> 13.4 us +- 1.4 us: 1.04x slower
- sympy_sum: 164 ms +- 2 ms -> 169 ms +- 2 ms: 1.03x slower
- gunicorn: 1.07 ms +- 0.01 ms -> 1.11 ms +- 0.01 ms: 1.03x slower
- sqlglot_optimize: 50.4 ms +- 0.2 ms -> 52.1 ms +- 0.2 ms: 1.03x slower
- sqlglot_parse: 1.32 ms +- 0.01 ms -> 1.36 ms +- 0.01 ms: 1.03x slower
- django_template: 32.3 ms +- 0.3 ms -> 33.3 ms +- 0.4 ms: 1.03x slower
- unpickle_pure_python: 201 us +- 2 us -> 207 us +- 2 us: 1.03x slower
- sqlglot_transpile: 1.61 ms +- 0.02 ms -> 1.66 ms +- 0.02 ms: 1.03x slower
- pprint_safe_repr: 672 ms +- 6 ms -> 691 ms +- 7 ms: 1.03x slower
- pprint_pformat: 1.38 sec +- 0.01 sec -> 1.41 sec +- 0.01 sec: 1.03x slower
- aiohttp: 994 us +- 9 us -> 1.02 ms +- 0.01 ms: 1.03x slower
- sympy_integrate: 20.4 ms +- 0.1 ms -> 20.9 ms +- 0.2 ms: 1.02x slower
- coroutines: 25.0 ms +- 0.2 ms -> 25.5 ms +- 0.5 ms: 1.02x slower
- scimark_sor: 103 ms +- 1 ms -> 105 ms +- 1 ms: 1.02x slower
- 2to3: 246 ms +- 1 ms -> 250 ms +- 1 ms: 1.02x slower
- chaos: 66.9 ms +- 0.8 ms -> 68.0 ms +- 0.6 ms: 1.02x slower
- scimark_fft: 309 ms +- 3 ms -> 314 ms +- 3 ms: 1.01x slower
- pickle_pure_python: 284 us +- 4 us -> 288 us +- 2 us: 1.01x slower
- deepcopy_memo: 33.7 us +- 0.4 us -> 34.2 us +- 0.3 us: 1.01x slower
- float: 73.0 ms +- 1.0 ms -> 74.0 ms +- 0.9 ms: 1.01x slower
- deepcopy_reduce: 2.90 us +- 0.04 us -> 2.94 us +- 0.04 us: 1.01x slower
- json_loads: 24.1 us +- 0.9 us -> 24.3 us +- 0.3 us: 1.01x slower
- xml_etree_process: 53.0 ms +- 0.7 ms -> 53.6 ms +- 0.4 ms: 1.01x slower
- unpickle_list: 4.92 us +- 0.10 us -> 4.97 us +- 0.04 us: 1.01x slower
- xml_etree_generate: 76.8 ms +- 0.7 ms -> 77.6 ms +- 0.7 ms: 1.01x slower
- xml_etree_iterparse: 102 ms +- 2 ms -> 103 ms +- 2 ms: 1.01x slower
- python_startup_no_site: 6.23 ms +- 0.01 ms -> 6.28 ms +- 0.01 ms: 1.01x slower
- python_startup: 8.56 ms +- 0.01 ms -> 8.62 ms +- 0.01 ms: 1.01x slower
- coverage: 98.2 ms +- 1.3 ms -> 98.9 ms +- 1.3 ms: 1.01x slower
- hexiom: 6.13 ms +- 0.03 ms -> 6.16 ms +- 0.05 ms: 1.01x slower

Faster (22):
- nbody: 94.7 ms +- 2.3 ms -> 88.6 ms +- 1.5 ms: 1.07x faster
- pycparser: 1.13 sec +- 0.02 sec -> 1.09 sec +- 0.02 sec: 1.04x faster
- pathlib: 18.0 ms +- 0.3 ms -> 17.4 ms +- 0.3 ms: 1.03x faster
- mdp: 2.65 sec +- 0.02 sec -> 2.57 sec +- 0.01 sec: 1.03x faster
- regex_v8: 22.2 ms +- 0.3 ms -> 21.6 ms +- 0.3 ms: 1.03x faster
- pickle_list: 4.16 us +- 0.06 us -> 4.06 us +- 0.04 us: 1.03x faster
- pickle: 10.3 us +- 0.1 us -> 10.0 us +- 0.1 us: 1.03x faster
- generators: 76.4 ms +- 0.7 ms -> 74.6 ms +- 0.5 ms: 1.02x faster
- genshi_text: 21.1 ms +- 0.3 ms -> 20.7 ms +- 0.3 ms: 1.02x faster
- pyflate: 402 ms +- 7 ms -> 393 ms +- 4 ms: 1.02x faster
- chameleon: 6.50 ms +- 0.07 ms -> 6.38 ms +- 0.07 ms: 1.02x faster
- scimark_sparse_mat_mult: 4.12 ms +- 0.08 ms -> 4.04 ms +- 0.05 ms: 1.02x faster
- unpack_sequence: 48.7 ns +- 0.6 ns -> 48.0 ns +- 0.8 ns: 1.01x faster
- fannkuch: 380 ms +- 2 ms -> 375 ms +- 3 ms: 1.01x faster
- mako: 9.64 ms +- 0.07 ms -> 9.53 ms +- 0.06 ms: 1.01x faster
- logging_silent: 91.3 ns +- 0.9 ns -> 90.3 ns +- 1.5 ns: 1.01x faster
- richards: 41.7 ms +- 0.6 ms -> 41.3 ms +- 1.0 ms: 1.01x faster
- go: 134 ms +- 1 ms -> 132 ms +- 1 ms: 1.01x faster
- json_dumps: 9.48 ms +- 0.14 ms -> 9.40 ms +- 0.10 ms: 1.01x faster
- regex_effbot: 3.60 ms +- 0.02 ms -> 3.58 ms +- 0.02 ms: 1.01x faster
- scimark_monte_carlo: 65.6 ms +- 0.8 ms -> 65.3 ms +- 0.8 ms: 1.00x faster
- regex_dna: 203 ms +- 0 ms -> 202 ms +- 1 ms: 1.00x faster

Benchmark hidden because not significant (15): async_tree_none, async_tree_cpu_io_mixed, async_tree_io, async_tree_memoization, crypto_pyaes, json, meteor_contest, mypy, nqueens, pickle_dict, spectral_norm, sqlite_synth, telco, thrift, xml_etree_parse

Geometric mean: 1.01x slower

When USE_COMPUTED_GOTOS is turned off, disabling PREDICT doesn't appear to have any effect:

Slower (28):
- generators: 74.6 ms +- 0.5 ms -> 84.4 ms +- 0.3 ms: 1.13x slower
- scimark_sparse_mat_mult: 4.04 ms +- 0.05 ms -> 4.27 ms +- 0.06 ms: 1.06x slower
- mdp: 2.57 sec +- 0.01 sec -> 2.68 sec +- 0.02 sec: 1.04x slower
- telco: 6.30 ms +- 0.15 ms -> 6.54 ms +- 0.18 ms: 1.04x slower
- pickle_list: 4.06 us +- 0.04 us -> 4.19 us +- 0.06 us: 1.03x slower
- float: 74.0 ms +- 0.9 ms -> 76.3 ms +- 1.1 ms: 1.03x slower
- nbody: 88.6 ms +- 1.5 ms -> 91.3 ms +- 2.8 ms: 1.03x slower
- json: 4.70 ms +- 0.11 ms -> 4.84 ms +- 0.17 ms: 1.03x slower
- pickle: 10.0 us +- 0.1 us -> 10.3 us +- 0.1 us: 1.02x slower
- json_dumps: 9.40 ms +- 0.10 ms -> 9.56 ms +- 0.14 ms: 1.02x slower
- pickle_dict: 30.8 us +- 0.1 us -> 31.2 us +- 0.1 us: 1.01x slower
- hexiom: 6.16 ms +- 0.05 ms -> 6.24 ms +- 0.04 ms: 1.01x slower
- pathlib: 17.4 ms +- 0.3 ms -> 17.6 ms +- 0.3 ms: 1.01x slower
- deepcopy_memo: 34.2 us +- 0.3 us -> 34.5 us +- 0.6 us: 1.01x slower
- go: 132 ms +- 1 ms -> 134 ms +- 1 ms: 1.01x slower
- sqlglot_transpile: 1.66 ms +- 0.02 ms -> 1.67 ms +- 0.02 ms: 1.01x slower
- sqlglot_parse: 1.36 ms +- 0.01 ms -> 1.37 ms +- 0.02 ms: 1.01x slower
- json_loads: 24.3 us +- 0.3 us -> 24.6 us +- 0.2 us: 1.01x slower
- pyflate: 393 ms +- 4 ms -> 396 ms +- 4 ms: 1.01x slower
- sympy_expand: 479 ms +- 3 ms -> 482 ms +- 5 ms: 1.01x slower
- sqlite_synth: 2.60 us +- 0.04 us -> 2.62 us +- 0.04 us: 1.01x slower
- mako: 9.53 ms +- 0.06 ms -> 9.57 ms +- 0.04 ms: 1.00x slower
- nqueens: 81.9 ms +- 0.8 ms -> 82.3 ms +- 0.7 ms: 1.00x slower
- regex_v8: 21.6 ms +- 0.3 ms -> 21.7 ms +- 0.2 ms: 1.00x slower
- unpickle_pure_python: 207 us +- 2 us -> 208 us +- 2 us: 1.00x slower
- 2to3: 250 ms +- 1 ms -> 251 ms +- 1 ms: 1.00x slower
- sqlglot_optimize: 52.1 ms +- 0.2 ms -> 52.2 ms +- 0.3 ms: 1.00x slower
- regex_dna: 202 ms +- 1 ms -> 202 ms +- 1 ms: 1.00x slower

Faster (23):
- regex_effbot: 3.58 ms +- 0.02 ms -> 3.32 ms +- 0.02 ms: 1.08x faster
- unpack_sequence: 48.0 ns +- 0.8 ns -> 44.9 ns +- 0.6 ns: 1.07x faster
- pidigits: 198 ms +- 0 ms -> 189 ms +- 0 ms: 1.04x faster
- xml_etree_process: 53.6 ms +- 0.4 ms -> 52.8 ms +- 0.4 ms: 1.01x faster
- chaos: 68.0 ms +- 0.6 ms -> 67.1 ms +- 0.7 ms: 1.01x faster
- pprint_pformat: 1.41 sec +- 0.01 sec -> 1.40 sec +- 0.01 sec: 1.01x faster
- django_template: 33.3 ms +- 0.4 ms -> 32.9 ms +- 0.3 ms: 1.01x faster
- pycparser: 1.09 sec +- 0.02 sec -> 1.07 sec +- 0.02 sec: 1.01x faster
- thrift: 755 us +- 27 us -> 746 us +- 9 us: 1.01x faster
- scimark_sor: 105 ms +- 1 ms -> 104 ms +- 1 ms: 1.01x faster
- logging_format: 6.79 us +- 0.12 us -> 6.72 us +- 0.11 us: 1.01x faster
- scimark_fft: 314 ms +- 3 ms -> 311 ms +- 3 ms: 1.01x faster
- unpickle_list: 4.97 us +- 0.04 us -> 4.92 us +- 0.12 us: 1.01x faster
- logging_simple: 6.07 us +- 0.07 us -> 6.01 us +- 0.10 us: 1.01x faster
- crypto_pyaes: 75.1 ms +- 1.3 ms -> 74.5 ms +- 0.8 ms: 1.01x faster
- pickle_pure_python: 288 us +- 2 us -> 285 us +- 4 us: 1.01x faster
- pprint_safe_repr: 691 ms +- 7 ms -> 686 ms +- 5 ms: 1.01x faster
- sympy_str: 294 ms +- 3 ms -> 292 ms +- 2 ms: 1.01x faster
- python_startup_no_site: 6.28 ms +- 0.01 ms -> 6.25 ms +- 0.01 ms: 1.00x faster
- aiohttp: 1.02 ms +- 0.01 ms -> 1.01 ms +- 0.01 ms: 1.00x faster
- sqlglot_normalize: 109 ms +- 1 ms -> 109 ms +- 1 ms: 1.00x faster
- python_startup: 8.62 ms +- 0.01 ms -> 8.59 ms +- 0.01 ms: 1.00x faster
- deepcopy: 338 us +- 3 us -> 337 us +- 3 us: 1.00x faster

Benchmark hidden because not significant (31): async_tree_none, async_tree_cpu_io_mixed, async_tree_io, async_tree_memoization, chameleon, coroutines, coverage, deepcopy_reduce, deltablue, dulwich_log, fannkuch, genshi_text, genshi_xml, gunicorn, html5lib, logging_silent, meteor_contest, mypy, raytrace, regex_compile, richards, scimark_lu, scimark_monte_carlo, spectral_norm, sympy_integrate, sympy_sum, tornado_http, unpickle, xml_etree_parse, xml_etree_iterparse, xml_etree_generate

Geometric mean: 1.00x slower

So this could be worth exploring as a cleanup opportunity (probably after observing similar MSVC results first).

gvanrossum · 2022-11-22T00:47:53Z

Agreed, this looks somewhat promising. Let's wait until we have a reliable way to benchmark this on Windows. (Didn't I read somewhere that we need 2.5% perf difference to be sure a benchmark result isn't noise?)

ericsnowcurrently · 2022-11-22T17:40:29Z

(Didn't I read somewhere that we need 2.5% perf difference to be sure a benchmark result isn't noise?)

#480

gvanrossum · 2022-11-30T18:03:31Z

Another way to evaluate the effectiveness of PREDICT() without benchmarking: look at opcode counts and successors for opcodes that have a PREDICT() in them. We should be able to tell how often the PREDICT() is effective from that.

Separately, PREDICT() in opcodes that may be specialized aren't very effective unless the specialized versions also use PREDICT(). So we should either add PREDICT() to the specialized versions, or remove it from the base opcode.

gvanrossum · 2022-11-30T23:03:41Z

LIST_APPEND: not specialized; 97% successors are the predicted JUMP_BACKWARD
SET_ADD: not specialized; 100% successors are the predicted JUMP_BACKWARD
GET_ANEXT: not specialized; doesn't show up in successor count stats
GET_AWAITABLE: not specialized; doesn't show up in successor count stats
DICT_MERGE: not specialized; 100% successors are the predicted CALL_FUNCTION_EX
MAP_ADD: not specialized; only 8.6% successors are the predicted JUMP_BACKWARD (!)
MATCH_MAPPING: not specialized; doesn't show up in successor count stats
MATCH_SEQUENCE: not specialized; doesn't show up in successor count stats
GET_YIELD_FROM_ITER: not specialized; 100% successors are the predicted LOAD_CONST
BEFORE_ASYNC_WITH: not specialized; doesn't show up in successor count stats

Conclusion: no need to worry about specializations here; half of these opcodes don't show up in the stats (because they're not used in any of the benchmarks); four of the others are clearly effective; MAP_ADD seems questionable. Perhaps we can just drop the PREDICT() calls from the 5 that don't show up (being very specialized opcodes) and from MAP_ADD?

earonesty · 2023-01-11T17:44:35Z

windows is far from esoteric tho! seems like it's still worth it on windows, which is a critical o/s

markshannon · 2023-02-28T10:54:29Z

One thing we could do, is strip the PREDICT() macros from bytecodes.c and automatically insert the prediction logic using data from the stats.
The current, manually inserted predictions aren't very good, so it should be easy to do better.

E.g. according to the stats, BEFORE_WITH is followed by POP_TOP about 80% of the time, STORE_FAST__LOAD_FAST the remaining 20%, and everything else some negligible fraction.

We could generate this code at the end of BEFORE_WITH:

if (opcode == POP_TOP) goto target_POP_TOP;
if (opcode == STORE_FAST__LOAD_FAST) goto target_STORE_FAST__LOAD_FAST;
GOTO_DISPATCH();

(The if ... goto ... will need to wrapped in a macro, for portability)

gvanrossum · 2023-02-28T16:55:16Z

Since PREDICT() only matters on the MS compiler, maybe we could just manually insert more appropriate PREDICT() calls based on the stats? Where are the most recent stats?

markshannon · 2023-02-28T16:58:19Z

I think we should automate this.
https://github.com/faster-cpython/ideas/tree/main/stats look for the one from the most recent Sunday.

markshannon · 2023-08-18T12:33:57Z

The PREDICT macros are gone.

gvanrossum mentioned this issue Nov 28, 2022

GH-98831: Typed stack effects, and more instructions converted python/cpython#99764

Merged

brandtbucher mentioned this issue Jan 12, 2023

Investigate "switched-goto" for compilers without computed gotos #537

Open

gvanrossum mentioned this issue Feb 28, 2023

Interpreter generator should emit code closer to pure C python/cpython#102305

Closed

brandtbucher mentioned this issue May 18, 2023

Remove the PREDICT macros. python/cpython#104610

Closed

markshannon closed this as completed Aug 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is `PREDICT()` worth it any more? #496

Is `PREDICT()` worth it any more? #496

gvanrossum commented Nov 15, 2022

brandtbucher commented Nov 15, 2022

brandtbucher commented Nov 22, 2022

gvanrossum commented Nov 22, 2022

ericsnowcurrently commented Nov 22, 2022

gvanrossum commented Nov 30, 2022 •

edited

Loading

gvanrossum commented Nov 30, 2022

earonesty commented Jan 11, 2023

markshannon commented Feb 28, 2023

gvanrossum commented Feb 28, 2023

markshannon commented Feb 28, 2023

markshannon commented Aug 18, 2023

Is PREDICT() worth it any more? #496

Is PREDICT() worth it any more? #496

Comments

gvanrossum commented Nov 15, 2022

brandtbucher commented Nov 15, 2022

brandtbucher commented Nov 22, 2022

gvanrossum commented Nov 22, 2022

ericsnowcurrently commented Nov 22, 2022

gvanrossum commented Nov 30, 2022 • edited Loading

gvanrossum commented Nov 30, 2022

earonesty commented Jan 11, 2023

markshannon commented Feb 28, 2023

gvanrossum commented Feb 28, 2023

markshannon commented Feb 28, 2023

markshannon commented Aug 18, 2023

Is `PREDICT()` worth it any more? #496

Is `PREDICT()` worth it any more? #496

gvanrossum commented Nov 30, 2022 •

edited

Loading