Oxidize random clifford #12695
Conversation
…rices by significantly cheaper in-place prepend operations
Pull Request Test Coverage Report for Build 9775885225 (Coveralls)
LGTM. The code looks OK, but is there a way to check the randomness, or at least to check that, given the same random choices, we get the same output as the Python code? Also, how much does this code improve when ported from Python to Rust?
Pull Request Test Coverage Report for Build 10579201069 (Coveralls). Warning: this coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch, so it may include changes from outside the original pull request, including potentially unrelated coverage changes.
…s with >= 10 qubits
Nice that it's faster in the parallel version 👍🏻 👍🏻 Just curious: did you run any profiling to see which parts of the Rust code are the slowest? 🙂
use crate::getenv_use_multiple_threads;

const PARALLEL_THRESHOLD: usize = 10;
Would it be helpful to add a comment on this number? I assume it's a heuristic for when the parallel execution is actually faster? 🙂
This code for parallel execution mimics the code in pauli_exp_val.rs, though there the threshold is chosen to be 19. Indeed, I have chosen 10 based on some local experiments.
Could you add a comment explaining this, so we know it in the future?
Sure, added an explicit comment in 7f443e0.
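For readers outside the PR, the threshold pattern discussed above can be sketched in std-only Rust. This is a minimal illustration, not the PR's actual code: the real implementation dispatches via `getenv_use_multiple_threads` and a rayon-style pool, while here `negate_rows` is a hypothetical workload and the thread count is arbitrary; only the value 10 mirrors the PR's `PARALLEL_THRESHOLD`.

```rust
use std::thread;

// Heuristic crossover from the PR's local experiments: below this problem
// size, threading overhead outweighs the gains. Everything else in this
// sketch is illustrative.
const PARALLEL_THRESHOLD: usize = 10;

/// Negate a boolean matrix in place: serially below the threshold,
/// with scoped threads (one spawned per chunk of rows) at or above it.
fn negate_rows(mat: &mut [Vec<bool>], num_qubits: usize) {
    if num_qubits < PARALLEL_THRESHOLD || mat.is_empty() {
        for row in mat.iter_mut() {
            for x in row.iter_mut() {
                *x = !*x;
            }
        }
    } else {
        let n_threads = 4;
        let chunk = (mat.len() + n_threads - 1) / n_threads;
        // Scoped threads may borrow `mat` because they are joined
        // before the scope returns.
        thread::scope(|s| {
            for rows in mat.chunks_mut(chunk) {
                s.spawn(move || {
                    for row in rows {
                        for x in row.iter_mut() {
                            *x = !*x;
                        }
                    }
                });
            }
        });
    }
}

fn main() {
    let n = 12; // above the threshold, so the parallel path runs
    let mut mat = vec![vec![false; n]; n];
    negate_rows(&mut mat, n);
    assert!(mat.iter().all(|row| row.iter().all(|&x| x)));
    println!("all entries negated: ok");
}
```

The same shape (one serial branch, one chunked-parallel branch, gated on a size constant) is the pattern the comment in 7f443e0 documents.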
perm[i] = inds[k];
inds.remove(k);
Can this ever run into the problem that it's trying to access an entry that has already been removed? Maybe it's obvious that it cannot, but I don't quite see it directly 🤔
Indeed, this was not immediately clear to me either, but this is both a direct port of the now-removed Python function and a direct transcription of the pseudocode in the referenced paper.
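For reference, the removal-based sampling can be sketched in self-contained Rust. The toy LCG below stands in for the PR's `Pcg64Mcg` (an assumption for this sketch, not the PR's code); the point is that `k` is always drawn from `0..inds.len()` immediately before the access, so neither `inds[k]` nor `inds.remove(k)` can touch an already-removed entry.

```rust
// Toy linear congruential generator; stands in for Pcg64Mcg here.
struct Lcg(u64);
impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
    // Roughly uniform index in 0..n (modulo bias is fine for a sketch).
    fn gen_range(&mut self, n: usize) -> usize {
        (self.next_u64() >> 33) as usize % n
    }
}

/// Sample a random permutation of 0..n by repeatedly drawing an index
/// into the shrinking list of unused values and removing it.
fn random_permutation(n: usize, rng: &mut Lcg) -> Vec<usize> {
    let mut inds: Vec<usize> = (0..n).collect();
    let mut perm = vec![0usize; n];
    for i in 0..n {
        // k < inds.len() by construction, so the indexing and the
        // removal are always in bounds.
        let k = rng.gen_range(inds.len());
        perm[i] = inds[k];
        inds.remove(k);
    }
    perm
}

fn main() {
    let mut rng = Lcg(42);
    let perm = random_permutation(8, &mut rng);
    // Verify it really is a permutation of 0..8.
    let mut sorted = perm.clone();
    sorted.sort();
    assert_eq!(sorted, (0..8).collect::<Vec<usize>>());
    println!("{:?}", perm);
}
```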
// Compute stabilizer table
let zero = Array2::from_elem((num_qubits, num_qubits), false);
let prod1 = binary_matmul_inner(gamma1.view(), delta1.view()).unwrap();
If you're looking for more improvements, we could have a custom multiplication that takes into account that the upper-triangular part of delta is always false (or add a flag to binary_matmul_inner) 🤔
Hmm, indeed we could probably exploit this somehow. If you think it's worth it, we can treat this in a potential follow-up.
It was just a suggestion, since you mentioned that the matrix multiplications are slow. I'm fine not adding this at this point, as I'm not sure random_clifford is a bottleneck -- so up to you 🙂
I like the suggestion, but I would rather not go down this rabbit hole in this PR. As they say, "premature optimization is the root of all evil" 🙂.
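To make the suggested optimization concrete, here is a std-only sketch of a binary (GF(2)) matrix product and a variant that exploits a lower-triangular right factor by starting the inner sum at the diagonal. The names `binary_matmul` and `binary_matmul_lower_tri` and the `Vec<Vec<bool>>` representation are illustrative assumptions; the PR's real `binary_matmul_inner` operates on ndarray views.

```rust
type BMat = Vec<Vec<bool>>;

/// Dense boolean matrix product over GF(2): AND for multiply, XOR for add.
fn binary_matmul(a: &BMat, b: &BMat) -> BMat {
    let n = a.len();
    let mut c = vec![vec![false; n]; n];
    for i in 0..n {
        for j in 0..n {
            let mut acc = false;
            for k in 0..n {
                acc ^= a[i][k] & b[k][j];
            }
            c[i][j] = acc;
        }
    }
    c
}

/// Same product, but assumes b is lower-triangular (b[k][j] == false for
/// k < j), so the inner sum can start at k = j and skip roughly half
/// the work -- the kind of shortcut suggested in the review.
fn binary_matmul_lower_tri(a: &BMat, b: &BMat) -> BMat {
    let n = a.len();
    let mut c = vec![vec![false; n]; n];
    for i in 0..n {
        for j in 0..n {
            let mut acc = false;
            for k in j..n {
                acc ^= a[i][k] & b[k][j];
            }
            c[i][j] = acc;
        }
    }
    c
}

fn main() {
    let a = vec![
        vec![true, false, true],
        vec![false, true, true],
        vec![true, true, false],
    ];
    // b is lower-triangular: all entries above the diagonal are false.
    let b = vec![
        vec![true, false, false],
        vec![true, true, false],
        vec![false, true, true],
    ];
    // The skipped terms all have b[k][j] == false, so the results agree.
    assert_eq!(binary_matmul(&a, &b), binary_matmul_lower_tri(&a, &b));
    println!("lower-triangular shortcut matches full product");
}
```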
let mut gamma2: Array2<bool> = Array2::from_elem((num_qubits, num_qubits), false);
for i in 0..num_qubits {
    gamma2[[i, i]] = rng.gen();
}
fill_tril(gamma2.view_mut(), &mut rng, true);
Another idea for improvement would be to have fill_tril also generate the random diagonal, and avoid constructing the matrix just to rewrite almost all of its entries again 🙂 E.g. fn get_tril(rng: ..., symmetric: bool, random_diag: bool) -> Array2<bool>, where random_diag=false sets the diagonal to true (for the delta matrices) and random_diag=true creates the random diagonal (for the gamma matrices).
Technically speaking, only half of the entries (when symmetric is false). :) I will use the same excuse: I tried to keep the code as close as possible to the original. In any case, I don't think this makes much runtime difference (compared to the matrix multiplication between table1 and table).
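A single-pass constructor along the lines of the reviewer's suggestion could look like the following sketch. Everything here is hypothetical: `get_tril` is the name proposed in the review (not a function in the PR), the toy LCG stands in for `Pcg64Mcg`, and `Vec<Vec<bool>>` stands in for `Array2<bool>`.

```rust
// Toy LCG standing in for Pcg64Mcg (an assumption for this sketch).
struct Lcg(u64);
impl Lcg {
    fn gen_bool(&mut self) -> bool {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 63) == 1
    }
}

/// Hypothetical merged constructor: builds the matrix in one pass instead
/// of allocating it and then overwriting entries via fill_tril.
/// random_diag=false sets the diagonal to true (delta-style matrices);
/// random_diag=true randomizes the diagonal (gamma-style matrices).
fn get_tril(n: usize, rng: &mut Lcg, symmetric: bool, random_diag: bool) -> Vec<Vec<bool>> {
    let mut mat = vec![vec![false; n]; n];
    for i in 0..n {
        mat[i][i] = if random_diag { rng.gen_bool() } else { true };
        for j in 0..i {
            let v = rng.gen_bool();
            mat[i][j] = v;
            if symmetric {
                // Mirror into the upper triangle.
                mat[j][i] = v;
            }
        }
    }
    mat
}

fn main() {
    let mut rng = Lcg(7);
    let delta = get_tril(5, &mut rng, false, false);
    for i in 0..5 {
        assert!(delta[i][i]); // unit diagonal
        for j in (i + 1)..5 {
            assert!(!delta[i][j]); // strictly-upper part stays false
        }
    }
    let gamma = get_tril(5, &mut rng, true, true);
    for i in 0..5 {
        for j in 0..5 {
            assert_eq!(gamma[i][j], gamma[j][i]); // symmetric
        }
    }
    println!("delta and gamma sketches look structurally correct");
}
```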
Thanks for the review! To answer your question: I was manually instrumenting my code to print the time taken by various blocks, until I saw that the single most expensive line is the binary multiplication of …
LGTM, thanks @alexanderivrii
}

/// Add symmetric random boolean value to off diagonal entries.
fn fill_tril(mut mat: ArrayViewMut2<bool>, rng: &mut Pcg64Mcg, symmetric: bool) {
Perhaps this function should live in the linear utils? It's useful for generating a random symmetric boolean matrix, which can represent a CZ circuit. In that case, it should be called something like "random_boolean_matrix".
I don't think we should expose the two very special "random linear" functions required for this PR: one that generates random symmetric binary matrices with 0s on the diagonal, and another that generates upper-triangular random symmetric binary matrices with 1s on the diagonal.
OK, but I still think that the name fill_tril could be changed to something more meaningful (it also doesn't appear in the paper).
But... this is the name from our Python code :). OK, I will try to think of better names for this and the other functions.
LGTM. Thanks @alexanderivrii.
If @Cryoris has no further comments, then we can merge it.
LGTM, thanks for all the work! If we ever need to revisit this, I think we have some ideas in this PR to optimize the code even a bit more 🙂
Summary
Closes #12696.
This PR implements the function random_clifford in Rust. The Rust function is called random_clifford_tableau. It is a close port of the Python function, with two exceptions: (1) optimizations for low-rank matrices are not yet included, and (2) some of the numpy optimizations were replaced by the simpler approach presented in the appendix of the original paper.
UPDATE:
By far the main bottleneck of the process is the (already existing) Rust function binary_matmul_inner for multiplying binary matrices. With the older implementation of this function, the Rust code happened to be slower than the Python code, likely because numpy applies some clever optimizations. However, parallelizing binary_matmul_inner makes things better; with that change, the Rust code is about 3x faster than the Python code. Here are the detailed results for generating 50 random Cliffords for different numbers of qubits: