Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-121795: Improve performance of set membership testing from set arguments #121796

Merged
merged 10 commits into from
Jul 22, 2024

Conversation

HarryLHW
Copy link
Contributor

@HarryLHW HarryLHW commented Jul 15, 2024

Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some benchmarks?

Objects/setobject.c Outdated Show resolved Hide resolved
Objects/setobject.c Outdated Show resolved Hide resolved
@HarryLHW
Copy link
Contributor Author

Can we have some benchmarks?

Yes :)

Benchmark (on an M2 Macbook Air):

Script:

import timeit

for n in (1, 10, 100, 1000, 10000):
    setup = f'''
a = set(range({n}))
b = {{frozenset(a)}}
'''
    print(timeit.timeit("a in b", setup), f'N = {n}')

main branch result:

0.3688333749996673 N = 1
0.48004929200033075 N = 10
1.781020708000142 N = 100
11.256919416999153 N = 1000
152.03752795800028 N = 10000

my branch result:

0.30569541699878755 N = 1
0.3746081659992342 N = 10
1.2535620420021587 N = 100
7.2957435000025725 N = 1000
86.63279329200304 N = 10000

There will be no performance regression on the normal case.

Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe like this? (for the rest, I'll leave it to Raymond)

Objects/setobject.c Outdated Show resolved Hide resolved
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat performance improvement, that contrary to most other such attempts, does not add to the complexity of the code; nice. AFAICS, this is good to go. I'll leave the landing of the PR to the code owner.

Objects/setobject.c Outdated Show resolved Hide resolved
Objects/setobject.c Outdated Show resolved Hide resolved
Objects/setobject.c Outdated Show resolved Hide resolved
Objects/setobject.c Outdated Show resolved Hide resolved
Objects/setobject.c Outdated Show resolved Hide resolved
@HarryLHW
Copy link
Contributor Author

This is a neat performance improvement, that contrary to most other such attempts, does not add to the complexity of the code; nice. AFAICS, this is good to go. I'll leave the landing of the PR to the code owner.

Thank you so much for your review. I have made changes to resolve the issues.

@rhettinger
Copy link
Contributor

We used to do something like this, but it caused bugs and had to be removed. See https://bugs.python.org/issue8757 and checkin 4d45c10 . IIRC there was also a reason that the set had to be made temporarily immutable during the lookup. I don't remember all the details now. Perhaps @serhiy-storchaka does.

The original set_swap_bodies variants were very fast, so it was a bummer to have to use a set copy instead. The good news is that set copies are now much faster than they were. Also, we realized that the implicit frozenset conversion promised in the docs was an almost never used feature.

@serhiy-storchaka
Copy link
Member

It was before I started contributing to CPython, so I have no memories about this case.

The approach of this PR is better than the code used before bpo-8757. It leaves the original set key unmodified, so other threads are not affected if they only read it. The original reproducer should pass this test.

But what if the set key is modified in other thread? I think this is safe if set comparison is thread-safe (and AFAIK it is).

There is a new race condition: the set key can be changed during calculating its hash (currently creating a frozenset from a set is atomic, or it can be made atomic). But I do not think this is bad. We should not guarantee the correct result in such case.

I think this change is safe. LGTM.

@rhettinger rhettinger merged commit 2408a8a into python:main Jul 22, 2024
37 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants