Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of set membership testing from set arguments #121795

Closed
HarryLHW opened this issue Jul 15, 2024 · 3 comments
Closed

Improve performance of set membership testing from set arguments #121795

HarryLHW opened this issue Jul 15, 2024 · 3 comments
Labels
performance Performance or resource usage type-feature A feature request or enhancement

Comments

@HarryLHW
Copy link
Contributor

HarryLHW commented Jul 15, 2024

Feature or enhancement

Proposal:

For set s and set elem, the operations elem in s, s.remove(elem), and s.discard(elem) are equivalent to frozenset(elem) in s, s.remove(frozenset(elem)), and s.discard(frozenset(elem)), respectively. This feature is documented in Built-in Types:

Note, the elem argument to the contains(), remove(), and discard() methods may be a set. To support searching for an equivalent frozenset, a temporary one is created from elem.

The implementation can be improved by calculating the hash value directly from the set object (as if it is a frozenset), instead of copying it into a new frozenset before calculating the hash value. The hash function for set is internal use only, and set is still unhashable.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@HarryLHW HarryLHW added the type-feature A feature request or enhancement label Jul 15, 2024
@rhettinger rhettinger self-assigned this Jul 15, 2024
@picnixz picnixz added the performance Performance or resource usage label Jul 15, 2024
@HarryLHW
Copy link
Contributor Author

Benchmark (on an M2 Macbook Air):

Script:

import timeit

for n in (1, 10, 100, 1000, 10000):
    setup = f'''
a = set(range({n}))
b = {{frozenset(a)}}
'''
    print(timeit.timeit("a in b", setup), f'N = {n}')

main branch result:

0.3688333749996673 N = 1
0.48004929200033075 N = 10
1.781020708000142 N = 100
11.256919416999153 N = 1000
152.03752795800028 N = 10000

my branch result:

0.30569541699878755 N = 1
0.3746081659992342 N = 10
1.2535620420021587 N = 100
7.2957435000025725 N = 1000
86.63279329200304 N = 10000

There will be no performance regression on the normal case.

@rhettinger
Copy link
Contributor

Sorry, I haven't been feeling well. Please get another core dev to review this.

@rhettinger rhettinger removed their assignment Jul 23, 2024
@erlend-aasland
Copy link
Contributor

Thanks for the PR, @HarryLHW! I believe this can be closed, as the PR was recently merged. Good job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants