
bpo-43452: Microoptimizations to PyType_Lookup #24804

Merged
merged 3 commits into python:master on Mar 20, 2021

Conversation

@DinoV DinoV commented Mar 9, 2021

The common case going through _PyType_Lookup is a cache hit. There are a few small tweaks which can make this a little cheaper:

  1. The name's identity is used for a cache hit, and the name is kept alive by the cache. So there's no need to read the hash code of the name - instead its address can be used as the hash.

  2. There's no need to check whether the name is cacheable on the lookup either: it probably is, and if it is, it'll be in the cache.

  3. If we clear the version tag when invalidating a type, then we don't actually need to check for a valid version tag bit.

https://bugs.python.org/issue43452

@@ -32,8 +32,7 @@ class object "PyObject *" "&PyBaseObject_Type"
          & ((1 << MCACHE_SIZE_EXP) - 1))

 #define MCACHE_HASH_METHOD(type, name) \
-        MCACHE_HASH((type)->tp_version_tag, \
-                    ((PyASCIIObject *)(name))->hash)
+        MCACHE_HASH((type)->tp_version_tag, ((Py_ssize_t)(name)) >> 3)
Member

I suspect that id(s) >> 3 is a much worse hash function than hash(s).
Is the cost of an extra memory read lower than the cost of the additional collisions?

Contributor Author

I think this cache is so wildly successful that it doesn't actually matter a whole lot one way or another. It does do a little bit worse on collisions, but it doesn't seem to impact the hit rate much. This is running python -m test.regrtest and logging the stats at the end of the run:

>>3:
{'total_slots': 4096, 'occupied_slots': 4096, 'num_hits': 20931459, 'num_misses': 4292, 'num_collisions': 127068, 'num_uncacheable': 1, 'num_mro_steps': 1170327}

((PyASCIIObject *)(name))->hash:
{'total_slots': 4096, 'occupied_slots': 4096, 'num_hits': 20922787, 'num_misses': 4304, 'num_collisions': 107251, 'num_uncacheable': 1, 'num_mro_steps': 1133300}

method_cache_hits is bumped in the hit case
method_cache_collisions is bumped when adding a new entry and replacing the old one
method_cache_misses is bumped when adding a new entry and there isn't an existing one

Member

Do you have any evidence as to which is faster?
If not, then the status quo wins.

My intuition is that the better spread from using the hash is likely to give better worst-case performance, but in general there would be no measurable difference.

Contributor Author

I think, per @methane running the benchmark suite, it does look better with the object hash. I've also set up a small micro-benchmark and put it in https://bugs.python.org/issue43452, and it shows a slight win there as well. I had to write it in C to get any clear signal one way or another, though; just doing the tight loop in Python seemed to have too much noise.

@markshannon
Member

I like this in general.

I'm not convinced by (1) as it may lead to more collisions. I'd be happy to be proved wrong, if you have data.
(2) and (3) are definitely improvements.

@vstinner
Member

Can you try to write a micro-benchmark to show the speedup?

@DinoV
Contributor Author

DinoV commented Mar 10, 2021

Can you try to write a micro-benchmark to show the speedup?

I'll make an attempt at it and report back, but I think it'll be hard to get enough signal even w/ a microbenchmark. When I originally measured this against re-played Instagram traffic we saw a reduction in _PyType_Lookup from 1.7% of CPU usage to 1.6% of CPU usage. So it is a pretty small win overall, and may be dwarfed by all the other interpreter overhead in a benchmark.

@pablogsal
Member

This is great work.

I agree with @markshannon: I am sold on (2) and (3) but not quite convinced by (1). Still, I think it should be fine because I expect quite a small number of cache misses.

@DinoV
Contributor Author

DinoV commented Mar 10, 2021

This is great work.

I agree with @markshannon: I am sold on (2) and (3) but not quite convinced by (1). Still, I think it should be fine because I expect quite a small number of cache misses.

One additional tweak here could be to do >> 2 on 32-bit platforms, although I'm not sure how much that'll really matter either.

@methane
Member

methane commented Mar 11, 2021

One additional tweak here could be to do >> 2 on 32-bit platforms, although I'm not sure how much that'll really matter either.

Objects are aligned to 16 bytes on 64-bit platforms and 8 bytes on 32-bit platforms, so >>4 on 64-bit and >>3 on 32-bit would be better.

@methane
Member

methane commented Mar 11, 2021

Objects are aligned to 16 bytes on 64-bit platforms and 8 bytes on 32-bit platforms, so >>4 on 64-bit and >>3 on 32-bit would be better.

Since unicode objects are larger than the minimum alignment, more than just the low 4 bits are non-random.

>>> sys.getsizeof("a")
50
>>> (50+15)//16
4

But the lower bits of tp_version_tag are more random than the higher bits, so I think there is no need to throw away all of the non-random lower bits of the unicode pointer.

@methane
Member

methane commented Mar 11, 2021

But the lower bits of tp_version_tag are more random than the higher bits, so I think there is no need to throw away all of the non-random lower bits of the unicode pointer.

I forgot that we use only 12 bits for the hash, so all bits are important. I posted the benchmark result for >>3 on the bpo issue. I will benchmark >>4 and >>5 too.

@vstinner
Member

About the >>4 shift, see also the _Py_HashPointer() function:

    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));

@vstinner
Member

Objects are aligned to 16 bytes on 64-bit platforms and 8 bytes on 32-bit platforms, so >>4 on 64-bit and >>3 on 32-bit would be better.

Should we update _Py_HashPointer()?

@methane
Member

methane commented Mar 11, 2021

I forgot that we use only 12 bits for the hash, so all bits are important. I posted the benchmark result for >>3 on the bpo issue. I will benchmark >>4 and >>5 too.

The performance differences between >>3, >>4, and >>5 are not significant.
All of them are faster than the status quo (result).

@methane
Member

methane commented Mar 19, 2021

I think this change is significant enough to have a news entry.

@gvanrossum gvanrossum left a comment

LGTM. But a news entry would be nice.

@DinoV
Contributor Author

DinoV commented Mar 20, 2021

Can you try to write a micro-benchmark to show the speedup?

@vstinner I've added a small C micro-benchmark in https://bugs.python.org/issue43452

@vstinner
Member

obj hash: 0m6.222s
str hash: 0m6.327s
baseline: 0m6.784s

Thanks for proving that the optimization actually makes the code faster :-)

@gvanrossum gvanrossum left a comment

Okay, let’s land this. Thanks Dino for going the extra mile!

@pablogsal pablogsal merged commit ee48c7d into python:master Mar 20, 2021
@pablogsal
Member

Thanks a lot @DinoV for the fantastic work!

jab added a commit to jab/cpython that referenced this pull request Mar 20, 2021
* master: (129 commits)
  bpo-43452: Micro-optimizations to PyType_Lookup (pythonGH-24804)
  ...
@DinoV DinoV deleted the pytype_lookup branch May 31, 2024 18:23