-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pauli_twirl_2q_gates function #13331
Conversation
This commit adds a new function twirl_circuit to apply Pauli twirling to a given circuit. This function only works with a fixed set of two qubit gates and is not a general solution. For performance this new function is written in rust and uses static lookup tables for the twirling sets around the fixed two qubit gates that are randomly sampled. There is an option to perform twirling multiple times and return a list of circuits instead of just doing it once. Ideally we'd do this in parallel, but we're currently blocked on the use of OnceCell in the PackedInstruction and the ParameterTable from parallelizing with CircuitData. We can further improve the performance of this new function by using a parallel iterator once Qiskit#13219 is resolved. The function is written in a way to make this simple in the future. Fixes Qiskit#13325 Co-authored-by: Paul Nation <nonhermitian@gmail.com>
One or more of the following people are relevant to this code:
|
I ran an asv comparison (since the benchmarks needed to change to show the improvement here it's not a direct comparison) and this is showing a improvement of the previous twirling benchmark written in python from taking 254±4ms to 6.38±0.3ms with the
The "Change" X is because asv detected the benchmark changed and the results may not be comparable. |
Pull Request Test Coverage Report for Build 11721761474Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a pretty neat speedup for something used a lot 🙂
crates/accelerate/src/twirling.rs
Outdated
if gate == twirled_gate { | ||
let qubits: Vec<Qubit> = out_circ.get_qargs(inst.qubits).to_vec(); | ||
let (twirl, twirl_phase) = twirl_set.choose(rng).unwrap(); | ||
out_circ.push_standard_gate(twirl[0], &[], &[qubits[0]])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a performance benefit to omitting identity gates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it wouldn't make much of a difference for the performance of this function. The two ways to do it would be to condition the push
call on the value of twirl[index]
which adds a bunch of branching logic around each push call or somehow tweak the static twirling sets which would be tricky because they work well because everything is fixed sizes. There is a potential benefit downstream of this function because we've potentially used less gates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On average, 25% of the gates we'd be adding to the circuit will be identities, and that's a fair amount of data to reduce by not outputting it, which I'd expect to have a measurable impact on the runtime of this function. I wouldn't imagine that the branching logic will make any huge amount of difference, because there's already a decent amount of branching and hashing work hiding within each function call here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nice thing is that id gates explicitly show the added twirling gates. Moreover, in practice one should run a 1Q gate cleanup after twirling that should take care of this e,g
post_pm = PassManager([PauliTwirling('ecr'),
Optimize1qGatesDecomposition(backend.target.operation_names)])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those kinds of nice things have a very poor effect on performance. We've just made a big deal about Qiskit performance, and if we want to maintain it, we need to keep putting performance first, and designing our interfaces around that.
We can potentially allow it to be opt-in to show the twirling gates, but it's fairly trivial to write the twirling algorithm yourself if that's what you care about seeing, and that's always going to give you the most control over the output. If what we're after is that kind of "utility" function rather than a high-performance one, I'm not sure it fits in Qiskit - we already shipped off bunches of qiskit_algorithms
and the applications libraries out of the core library because they didn't fit the "highest performance core data structures and compilation" model, and decided to do that with documentation instead. If the purpose of the function is educational, then it feels to me like it fits in that vein, and should be handled with guide-level documentation on "introduction to Pauli twirling" rather than being presented in the core library as "this is an efficient way to do it".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok if id gates are a performance blocker than so be it. It also does not change the fact that I need to run a 1Q cleanup at the end, so either way, it is more important to have a function that gives users the ability to do this easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - I commented on that in my main review. It's not really clear to me what the purpose of this function is at all from a performance perspective.
If the purpose is to run it post-compilation, then we should design the interface of the function to be smarter about twirls in the first place, so it outputs at least hardware-compatible circuits. Technically there'll still be possibilities for post-optimisation, since 1q runs of gates will likely be possible to be resynthesised, which again is something we might want to consider.
Co-authored-by: Julien Gacon <gaconju@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is going to be generally useful to people, I'm ok with adding it, but I think we need to consider the interfaces a bit more, and how it fits in with the rest of Qiskit. The benchpress version was one very specific implementation for a specific type of circuit that only works for one particular way of building the input circuit, and as long as it doesn't include quite a few of the possible features of Qiskit circuits.
In addition to the review comments, one more top-level one: I'm not really sure where in a user's compilation pipeline this is meant to be called, which also affects what I'd expect to have in the function. The twirling is adding Pauli X/Y/Z gates (obviously), which implies to me that it needs to come before compilation, but then it also makes assumptions that everything's going to be decomposed in terms of one single 2q gate, which, aside from being an awkward assumption for the future, means that we might need a better way to handle things than to output non-translated 1q gates? If we instead want to push it more in the direction of "apply this to a virtual circuit, but still one that's decomposed into 2q operations", I think we need to consider interfaces to control the known twirling sets for additional gates.
I'm concerned that we're committing to an interface acting on QuantumCircuit
that doesn't really act on either virtual or physical circuits, but somewhere in the middle. That's a transpiler pass, except that the num_twirls
thing makes that awkward, because any sort of multi-in / multi-out branching within the transpiler needs a significant rework of the pass-manager framework to become safe (speaking as somebody who already wrote such a modification once).
fn add_global_phase(py: Python, phase: &Param, other: &Param) -> PyResult<Param> { | ||
pub(crate) fn add_global_phase(py: Python, phase: &Param, other: &Param) -> PyResult<Param> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use this elsewhere, can we think about where it should live, and how we should be organising the crate? Imports like crate::dag_circuit::add_global_phase
are awkward to work with internally if you weren't involved with the PR that originally put the function there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move it somewhere, I only left it here for brevity. Where would you like me to put it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I don't know right now. It's probably part of a separate issue.
@jakelishman PT is not something that fits into our current circuit transformation pipelines. It basically has to be a middle step between two passmanagers. One to map to HW, followed by PT, and then another that does cleanup of the 1Q gates and casts back into the HW basis set (usually over a set of circuits generated by PT). This is one of the motivations for #13326 |
If it doesn't really fit in Qiskit's models right now (and I agree, it doesn't particularly), then it feels like we're trying to merge such a function into the core library that'll never really fit in. If we're asking users to start using this now as part of the core library, they're going to find it far more awkward in the future if we say "actually, don't do this anymore and instead use this transpiler pass". That'll alienate our users, who already don't like the amount of changes we make. Making the compiler multi-in, multi-out is interesting, and something we already do internally in the IBM backends (I wrote it), but it's hugely fragile in the current pass-manager system, and it's tricky to make it better without compromising performance. We can get away with it in the backends because there's far tighter control over what passes are in the pipelines and when a pass might trigger such fan-out behaviour. We'll have a chance to do that better when we move more of the pass-manager infrastructure into Rust; we'll have more opportunities to make prior state in the If this is only ever meant to be an intermediate-term thing, I'm most worried about how we'll present this narratively to users, especially when we need to shift them to something else. On another narrative note: are we intending to present twirling as something that all users will need to do, and not the domain of a primitive implementer? Adding twirling as something we do on a single circuit, outside the context of the estimator observables and potential lightcone analysis might accidentally cause primitives implementations to have to do more work and have the user do more work to reconstruct their final results, since now a user might well end up passing a much larger broadcast matrix as their estimator inputs. |
But this is equiv to kicking the can down the road to a undetermined time frame, and features that have value will likely never see the light of day, waiting for the ideal conditions for implementation. Perhaps latter a pass can just call this same function that remains exposed to users as well, who knows. Twirling has value for Qiskit outside of the IBM primitives. Note that no other provider outside of IBM has a primitives interface that supports twirling. So there is value to other HW providers, and in addition there is value for users who want to see (and perhaps save) the exact circuits they are executing. I am not sure there is any major downside on the primitives side save for perhaps extra work to be done that otherwise didn't need to be done, e.g. twirl twice if that was an automatic option. However, I am pretty sure there is an option for that. |
This commit expands the twirling_gate argument to work with strings instead of classes and also specify a list of gates instead of a single gate type. It also changes the default to `None` to twirl all supported gates in the circuit.
A later pass can never call this function, because it's defined on the wrong types of inputs. Besides, that doesn't help the user, who would still have to switch what they're doing. Yes, I am proposing step back and think through what a joined-up interface to Qiskit from the user right through to hardware looks like, rather than us adding things in that implement one overly-specific microbenchmark more efficiently. We can't really claim either performance and stability if that's how we go about additions to the core library. I'm not saying twirling isn't important, or even that we shouldn't put it in right now, I'm saying that software engineering is more than just throwing interfaces at a wall to see what sticks. That works well in the early days of open-source work, but not for a product whose guiding principles are meant to be stability, performance and utility. For example, if the purpose is that this will always be used as a post-compilation pipeline where it's immediately followed by an
If we promote that you should do your twirling from within Qiskit, this takes away important information that a primitives implementation can use to reduce the number of QPU hours and shot-loops necessary, because now the user is supplying many different There are ways around these problems, but again, they mean stepping back and thinking through a joined-up interface, in conjunction with implementers of the primitives. For example, my problem of knowing when two multi-qubit potentially parametric blocks implement the same operation can be eased by additional I do totally agree that this could have utility outside IBM primitives implementations as well, but if so, we need to really think through how those other users should use this, and how we'll design our interfaces so that users will, by default and by using the most natural forms, get the experience that is most performant and most pleasant end-to-end. |
I guess if you want to future proof it a bit then you should implement as a transpiler pass that only does a single twirl. In that case the current usage would require two pass managers still, but the input to the second would be something like |
This is just why I wanted to talk about things - if the intention of this function is to just be a short-term, low-level post- If I'm understanding right, then the two proposed use cases for this function are:
In that case, maybe we work in an interface where we have an optional argument that's either Also if that's the case, then that impacts the interface promises we should make about the output circuit w.r.t. propagating any stored The concerns about this interacting poorly in general with how efficient |
The workflow is the following:
The twirling could be a function, or a |
This commit updates the twirl_circuits logic to recurse into a control flow operation and twirl any gates that are potentially in the block.
Step three is the big question to me: is this not something that narratively we promote is what a primitives implementer might do for you? How does adding this function interact with that? Can you expand on why you have to do this manually, when this is something that IBM primitives definitely can do? Knowing your step 2 is important, because then we can make the function better - having "twirl" be a standalone step isn't necessarily even the best interface for your workflow, because you have to do a full 1q post-optimize, while we can do that more efficiently on-the-fly (since 1q optimisation is pretty simple and a peephole optimisation). I'd like to know more about what other users' use-cases for manual twirling are - you're presenting one use case, but it might not be the only reason people are twirling themselves, and I wouldn't want the interface to get in the way of that. I know that there's a few people out there who've written things approximately similar to what's here, but I don't know for what purposes they've done that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the new interface, I was just commenting my nerd-sniped bits on the performance of the new twirl finder lol
Forcing users to instantiate a Optimize1qGatesDecomposition transpiler pass was a bit heavy when the twirling function wasn't actually calling the pass directly. All the pass was being used for was to pull the constraints out and pass it to Rust. It is simpler for users to just give the target to the function directly and also simplifies the function's code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor questions, otherwise LGTM
let z_matrix = aview2(&qiskit_circuit::gate_matrix::Z_GATE); | ||
let iter_set = [IGate, XGate, YGate, ZGate]; | ||
let kron_set: [Array2<Complex64>; 16] = [ | ||
kron(&i_matrix, &i_matrix), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is redundant to check if you want to directly push it onto out_vec
🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good call. I guess the question is should we still do that if the output is otherwise empty? This function will always return 4 identity gates even if there is no other valid twirling combination.
I'm thinking we should probably error if 4 identities are all that matches so only add it to the output list if there are other matches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah an error (or at least a warning) would be good to raise to let users know the code didn't work as expected 👍🏻
let norm: f64 = diff_frob_norm_sq(twirled_matrix.view(), gate_matrix); | ||
if norm.abs() < 1e-15 { | ||
out_vec.push(([*i, *j, *k, *l], 0.)); | ||
} else if (norm - 16.).abs() < 1e-15 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like something that could give a false positive, i.e. some custom defined gate where the difference is 16 yet they differ by a relative phase or something... or is that not possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to @anedumla I now believe this is true 😬
Co-authored-by: Julien Gacon <gaconju@gmail.com>
Since Qiskit#13331 merged the nightly asv runs have been failing. This is because the asv twirling benchmark code was updated in that PR to use the new twirling function instead of the embedded equivalent. However, in the PR the name of that function was renamed from twirl_circuit to pauli_twirl_2q_gates during the development of the feature. This rename was missed in the asv benchmark, and as nothing executes the asv code in CI this went uncaught until the asv nightly runs. This commit fixes this oversight and updates the function name to the correct name so that the asv benchmarks no longer fail on import.
* Fix twirling asv benchmark Since #13331 merged the nightly asv runs have been failing. This is because the asv twirling benchmark code was updated in that PR to use the new twirling function instead of the embedded equivalent. However, in the PR the name of that function was renamed from twirl_circuit to pauli_twirl_2q_gates during the development of the feature. This rename was missed in the asv benchmark, and as nothing executes the asv code in CI this went uncaught until the asv nightly runs. This commit fixes this oversight and updates the function name to the correct name so that the asv benchmarks no longer fail on import. * Update test/benchmarks/manipulate.py
Summary
This commit adds a new function twirl_circuit to apply Pauli twirling to a given circuit. This function only works with a fixed set of two qubit gates and is not a general solution. For performance this new function is written in rust and uses static lookup tables for the twirling sets around the fixed two qubit gates that are randomly sampled. There is an option to perform twirling multiple times and return a list of circuits instead of just doing it once. Ideally we'd do this in parallel, but we're currently blocked on the use of OnceCell in the PackedInstruction and the ParameterTable from parallelizing with CircuitData. We can further improve the performance of this new function by using a parallel iterator once #13219 is resolved. The function is written in a way to make this simple in the future.
Details and comments
Fixes #13325