Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Current way of caching is unsafe #4038

Open
ksagiyam opened this issue Feb 13, 2025 · 3 comments
Open

BUG: Current way of caching is unsafe #4038

ksagiyam opened this issue Feb 13, 2025 · 3 comments
Labels

Comments

@ksagiyam
Copy link
Contributor

Describe the bug

It seems that we assume that hashing is perfect in some places. Specifically, here, we generate a sequence of 32 letters+numbers using md5 and directly use that as the disk/memory cache key:

key = _as_hexdigest(*k), func.__qualname__
, but this is unsafe; we could pull the wrong compiled kernel in case of hash collision.

@ksagiyam ksagiyam added the bug label Feb 13, 2025
@connorjward
Copy link
Contributor

I think that we have always assumed that hash collisions are so rare as to effectively never happen, and hence that this is safe "enough". For our disk caching we have to generate a hash because we need a unique file name.

@ksagiyam
Copy link
Contributor Author

The behaviour can not be stochastic.

For disk caching, we could for instance mimic what the standard hash table does by also storing the pickled representors:

.cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/0/a.pickle
.cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/0/a.so
.cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/1/a.pickle
.cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/1/a.so

with which, upon hash-matching, we check equality by comparing the pickled values.

@connorjward
Copy link
Contributor

The behaviour can not be stochastic.

For disk caching, we could for instance mimic what the standard hash table does by also storing the pickled representors:

.cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/0/a.pickle .cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/0/a.so .cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/1/a.pickle .cache/pyop2/qw/ertyuiopasdfghjklzxcvbnm123456/1/a.so

with which, upon hash-matching, we check equality by comparing the pickled values.

Yeah this would work. Alternatively we could just choose a "sufficiently" strong hashing algorithm like SHA-256: https://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants