Faster hashing for Pauli operators. #13355
Comments
Fwiw, I feel like the most important story here is how slow the label computation is. It really shouldn't be expensive to compute the Pauli label, even from Python space. I threw this together quickly just now, and it's somewhere in the region of 33x faster than the current implementation:

```python
import numpy as np

# Map the (z, x) symplectic pair, read as a two-bit index, to the ASCII code of
# the corresponding single-qubit Pauli: 00 -> I, 01 -> X, 10 -> Z, 11 -> Y.
CHARS = np.array([ord("I"), ord("X"), ord("Z"), ord("Y")], dtype=np.uint8)

def label(pauli):
    index = (pauli.z << 1)
    index += pauli.x
    # Reverse so the highest-index qubit comes first in the label, then view the
    # selected character codes directly as ASCII bytes.
    ascii_label = CHARS[index[::-1]].data.tobytes()
    # Convert the internal ZX-convention phase into the group-phase prefix.
    phase_label = ("", "-i", "-", "i")[(ascii_label.count(b"Y") - pauli._phase.item()) % 4]
    return phase_label + ascii_label.decode("ascii")
```

(I couldn't entirely compare to your function because of the ragged-array thing.) Would us switching the …
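As a rough way to sanity-check a function like this (the use of `random_pauli` below is my own illustration, not part of the original comment), one could compare its output against the built-in `Pauli.to_label()` on a few random operators:

```python
from qiskit.quantum_info import random_pauli

# Compare the fast label above against the built-in conversion on a few
# random 6-qubit Paulis drawn with random group phases.
for _ in range(5):
    pauli = random_pauli(6, group_phase=True)
    print(label(pauli), pauli.to_label())
```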
The numbers I get for the second encoding are consistently smaller. That said, if you think …
A faster …
For point 2: I'm really just being conservative about the hashing thing with integers here; in the end, I suspect the collision resolution in Python's `dict` is where the risk lies.

To explain a bit more: you're absolutely right that the hash of an integer is (nearly) itself in Python (the exceptions are negative numbers, and numbers beyond 2**61 - 1, which are reduced modulo that Mersenne prime). If we construct some dictionaries with integers that are engineered to share the majority of their bits (but still all be technically unique integers), we can force the hashmap conflict resolution to work in overdrive:

```python
# Run in IPython/Jupyter: the %timeit magics time a bare loop over the keys and
# a loop that also does the dict lookup, so the difference is the lookup cost.
num = (2**20) * 2 // 3 - 2
size = 200
shifts = list(range(129))
lookup = []
for shift in shifts:
    keys = list(
        range(
            (1 << size) - ((num // 2) << shift),
            (1 << size) + ((num // 2) << shift),
            1 << shift,
        )
    )
    hashmap = dict.fromkeys(keys)
    loop = %timeit -o -n 10 for key in keys: pass
    contains = %timeit -o -n 10 for key in keys: key in hashmap
    lookup.append(contains.best - loop.best)
```

(Beware that'll take a while to run.) It's constructing dictionaries whose integer keys are evenly spaced around `1 << size` (here 2**200), with the spacing between them set by `shift`.

You can see in the output that different numbers of leading zeros at various points in the structured hash keys cause wildly different behaviour in the dict lookup, even though all the returns from `hash()` are distinct. I won't pretend to be able to explain everything about the graph, and I already spent way too long investigating haha, but the point I was trying to make is that using integers with structure to them as hash keys can be dangerous for performance. A large dict of large Paulis is likely to have structure in its keys, which makes me nervous about using a symplectic bit-based key for their hashing in Python, since there's only limited mixing-in of the higher bits. The difference between the biggest peaks and the biggest troughs in my graph is about a 20x performance hit. Python's hashing algorithm for strings is such that a small change in a string has a cascading effect, so for Paulis, which are likely to have low Hamming distances between them in practice (most Paulis have low weight), the strings just feel slightly more reliable to me.
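For reference, here is a minimal standalone sketch (not from the thread) of the two CPython hashing facts relied on above; it assumes a 64-bit CPython build, where the integer hash modulus is 2**61 - 1:

```python
import sys

# Integer hashing reduces modulo the Mersenne prime 2**61 - 1 on 64-bit builds.
print(sys.hash_info.modulus == 2**61 - 1)   # True
print(hash(5), hash(2**61 - 1 + 5))         # 5 5 -- large ints wrap around
print(hash(-1))                             # -2 (CPython reserves -1 as an error code)

# Structured integers give structured hashes: these hashes are evenly spaced.
keys = [(1 << 200) | (i << 40) for i in range(4)]
print([hash(k) - hash(keys[0]) for k in keys])  # [0, 2**40, 2*2**40, 3*2**40]

# A one-character change in a string scrambles its hash completely
# (exact values vary per process because of str hash randomisation).
print(hash("I" * 20), hash("I" * 19 + "X"))
```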
What should we add?
Currently the hash of a Pauli operator is computed as `hash(self.to_label())`. It is basically asking for the label of the Pauli operator, which is an expensive computation to run. Instead, building the hash directly from `self.x`, `self.z` and `self.phase` seems more reasonable.

I did a quick test where I do the hashing with `int.from_bytes(np.packbits([self.x, self.z] + [self.phase]))` and it was about 50 times faster.
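For concreteness, here is a hedged sketch of what such a hash could look like, built only from the public `x`, `z` and `phase` attributes of `qiskit.quantum_info.Pauli`; the function name `pauli_hash` and the exact packing are illustrative assumptions, not the actual Qiskit implementation nor the exact expression benchmarked above:

```python
import numpy as np
from qiskit.quantum_info import Pauli

def pauli_hash(pauli):
    # Illustrative sketch only: pack the symplectic (x, z) bits into bytes and
    # hash them together with the group phase and the qubit count, skipping the
    # string label entirely.
    bits = np.packbits(np.concatenate([pauli.x, pauli.z]))
    return hash((bits.tobytes(), int(pauli.phase), pauli.num_qubits))

print(pauli_hash(Pauli("-iXIZY")))
```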