Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pauli_twirl_2q_gates function #13331

Merged
merged 35 commits into from
Nov 7, 2024
Merged

Conversation

mtreinish
Copy link
Member

Summary

This commit adds a new function twirl_circuit to apply Pauli twirling to a given circuit. This function only works with a fixed set of two qubit gates and is not a general solution. For performance this new function is written in rust and uses static lookup tables for the twirling sets around the fixed two qubit gates that are randomly sampled. There is an option to perform twirling multiple times and return a list of circuits instead of just doing it once. Ideally we'd do this in parallel, but we're currently blocked on the use of OnceCell in the PackedInstruction and the ParameterTable from parallelizing with CircuitData. We can further improve the performance of this new function by using a parallel iterator once #13219 is resolved. The function is written in a way to make this simple in the future.

Details and comments

Fixes #13325

This commit adds a new function twirl_circuit to apply Pauli twirling to
a given circuit. This function only works with a fixed set of two qubit
gates and is not a general solution. For performance this new function
is written in rust and uses static lookup tables for the twirling sets
around the fixed two qubit gates that are randomly sampled. There is
an option to perform twirling multiple times and return a list of
circuits instead of just doing it once. Ideally we'd do this in
parallel, but we're currently blocked on the use of OnceCell in the
PackedInstruction and the ParameterTable from parallelizing with
CircuitData. We can further improve the performance of this new
function by using a parallel iterator once Qiskit#13219 is resolved. The
function is written in a way to make this simple in the future.

Fixes Qiskit#13325

Co-authored-by: Paul Nation <nonhermitian@gmail.com>
@mtreinish mtreinish added Changelog: New Feature Include in the "Added" section of the changelog Rust This PR or issue is related to Rust code in the repository mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library labels Oct 15, 2024
@mtreinish mtreinish added this to the 1.3.0 milestone Oct 15, 2024
@mtreinish mtreinish requested a review from a team as a code owner October 15, 2024 21:47
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@mtreinish
Copy link
Member Author

I ran an asv comparison (since the benchmarks needed to change to show the improvement here it's not a direct comparison) and this is showing a improvement of the previous twirling benchmark written in python from taking 254±4ms to 6.38±0.3ms with the twirl_circuit function in this PR:

| Change   | Before [54fe09b2] <lint_incr_latest~1>   | After [33b4bde2] <twirl-for-paul>   |   Ratio | Benchmark (Parameter)                                 |
|----------|------------------------------------------|-------------------------------------|---------|-------------------------------------------------------|
| x        | 254±4ms                                  | 6.38±0.3ms                          |    0.03 | manipulate.TestCircuitManipulate.time_DTC100_twirling |

The "Change" X is because asv detected the benchmark changed and the results may not be comparable.

@coveralls
Copy link

coveralls commented Oct 15, 2024

Pull Request Test Coverage Report for Build 11721761474

Details

  • 398 of 410 (97.07%) changed or added relevant lines in 8 files are covered.
  • 10 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.06%) to 88.859%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/circuit/src/circuit_data.rs 26 29 89.66%
qiskit/circuit/twirling.py 57 60 95.0%
crates/accelerate/src/twirling.rs 289 295 97.97%
Files with Coverage Reduction New Missed Lines %
crates/accelerate/src/unitary_synthesis.rs 1 92.2%
crates/qasm2/src/expr.rs 1 94.02%
crates/qasm2/src/lex.rs 2 91.98%
crates/qasm2/src/parse.rs 6 97.62%
Totals Coverage Status
Change from base Build 11719787153: 0.06%
Covered Lines: 78194
Relevant Lines: 87998

💛 - Coveralls

Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a pretty neat speedup for something used a lot 🙂

qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
crates/accelerate/src/twirling.rs Outdated Show resolved Hide resolved
crates/accelerate/src/twirling.rs Outdated Show resolved Hide resolved
crates/circuit/src/circuit_data.rs Outdated Show resolved Hide resolved
if gate == twirled_gate {
let qubits: Vec<Qubit> = out_circ.get_qargs(inst.qubits).to_vec();
let (twirl, twirl_phase) = twirl_set.choose(rng).unwrap();
out_circ.push_standard_gate(twirl[0], &[], &[qubits[0]])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a performance benefit to omitting identity gates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it wouldn't make much of a difference for the performance of this function. The two ways to do it would be to condition the push call on the value of twirl[index] which adds a bunch of branching logic around each push call or somehow tweak the static twirling sets which would be tricky because they work well because everything is fixed sizes. There is a potential benefit downstream of this function because we've potentially used less gates

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On average, 25% of the gates we'd be adding to the circuit will be identities, and that's a fair amount of data to reduce by not outputting it, which I'd expect to have a measurable impact on the runtime of this function. I wouldn't imagine that the branching logic will make any huge amount of difference, because there's already a decent amount of branching and hashing work hiding within each function call here.

Copy link
Contributor

@nonhermitian nonhermitian Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nice thing is that id gates explicitly show the added twirling gates. Moreover, in practice one should run a 1Q gate cleanup after twirling that should take care of this e,g

post_pm = PassManager([PauliTwirling('ecr'), 
                       Optimize1qGatesDecomposition(backend.target.operation_names)])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those kinds of nice things have a very poor effect on performance. We've just made a big deal about Qiskit performance, and if we want to maintain it, we need to keep putting performance first, and designing our interfaces around that.

We can potentially allow it to be opt-in to show the twirling gates, but it's fairly trivial to write the twirling algorithm yourself if that's what you care about seeing, and that's always going to give you the most control over the output. If what we're after is that kind of "utility" function rather than a high-performance one, I'm not sure it fits in Qiskit - we already shipped off bunches of qiskit_algorithms and the applications libraries out of the core library because they didn't fit the "highest performance core data structures and compilation" model, and decided to do that with documentation instead. If the purpose of the function is educational, then it feels to me like it fits in that vein, and should be handled with guide-level documentation on "introduction to Pauli twirling" rather than being presented in the core library as "this is an efficient way to do it".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok if id gates are a performance blocker than so be it. It also does not change the fact that I need to run a 1Q cleanup at the end, so either way, it is more important to have a function that gives users the ability to do this easily.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - I commented on that in my main review. It's not really clear to me what the purpose of this function is at all from a performance perspective.

If the purpose is to run it post-compilation, then we should design the interface of the function to be smarter about twirls in the first place, so it outputs at least hardware-compatible circuits. Technically there'll still be possibilities for post-optimisation, since 1q runs of gates will likely be possible to be resynthesised, which again is something we might want to consider.

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be generally useful to people, I'm ok with adding it, but I think we need to consider the interfaces a bit more, and how it fits in with the rest of Qiskit. The benchpress version was one very specific implementation for a specific type of circuit that only works for one particular way of building the input circuit, and as long as it doesn't include quite a few of the possible features of Qiskit circuits.

In addition to the review comments, one more top-level one: I'm not really sure where in a user's compilation pipeline this is meant to be called, which also affects what I'd expect to have in the function. The twirling is adding Pauli X/Y/Z gates (obviously), which implies to me that it needs to come before compilation, but then it also makes assumptions that everything's going to be decomposed in terms of one single 2q gate, which, aside from being an awkward assumption for the future, means that we might need a better way to handle things than to output non-translated 1q gates? If we instead want to push it more in the direction of "apply this to a virtual circuit, but still one that's decomposed into 2q operations", I think we need to consider interfaces to control the known twirling sets for additional gates.

I'm concerned that we're committing to an interface acting on QuantumCircuit that doesn't really act on either virtual or physical circuits, but somewhere in the middle. That's a transpiler pass, except that the num_twirls thing makes that awkward, because any sort of multi-in / multi-out branching within the transpiler needs a significant rework of the pass-manager framework to become safe (speaking as somebody who already wrote such a modification once).

qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
fn add_global_phase(py: Python, phase: &Param, other: &Param) -> PyResult<Param> {
pub(crate) fn add_global_phase(py: Python, phase: &Param, other: &Param) -> PyResult<Param> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use this elsewhere, can we think about where it should live, and how we should be organising the crate? Imports like crate::dag_circuit::add_global_phase are awkward to work with internally if you weren't involved with the PR that originally put the function there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it somewhere, I only left it here for brevity. Where would you like me to put it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I don't know right now. It's probably part of a separate issue.

crates/accelerate/src/twirling.rs Outdated Show resolved Hide resolved
crates/circuit/src/circuit_data.rs Outdated Show resolved Hide resolved
crates/accelerate/src/twirling.rs Outdated Show resolved Hide resolved
crates/accelerate/src/twirling.rs Outdated Show resolved Hide resolved
qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
@nonhermitian
Copy link
Contributor

@jakelishman PT is not something that fits into our current circuit transformation pipelines. It basically has to be a middle step between two passmanagers. One to map to HW, followed by PT, and then another that does cleanup of the 1Q gates and casts back into the HW basis set (usually over a set of circuits generated by PT). This is one of the motivations for #13326

@jakelishman
Copy link
Member

If it doesn't really fit in Qiskit's models right now (and I agree, it doesn't particularly), then it feels like we're trying to merge such a function into the core library that'll never really fit in. If we're asking users to start using this now as part of the core library, they're going to find it far more awkward in the future if we say "actually, don't do this anymore and instead use this transpiler pass". That'll alienate our users, who already don't like the amount of changes we make.

Making the compiler multi-in, multi-out is interesting, and something we already do internally in the IBM backends (I wrote it), but it's hugely fragile in the current pass-manager system, and it's tricky to make it better without compromising performance. We can get away with it in the backends because there's far tighter control over what passes are in the pipelines and when a pass might trigger such fan-out behaviour.

We'll have a chance to do that better when we move more of the pass-manager infrastructure into Rust; we'll have more opportunities to make prior state in the PropertySet copy-on-write, rather than potentially eagerly copying large amounts of data at the point of fan-out, or being smarter about invalidating stale data in the PropertySet. Both of those would make an internal IR that's fan-out aware easier to make safe (the main concern) and performant (the next most important), but we're not there yet.


If this is only ever meant to be an intermediate-term thing, I'm most worried about how we'll present this narratively to users, especially when we need to shift them to something else.

On another narrative note: are we intending to present twirling as something that all users will need to do, and not the domain of a primitive implementer? Adding twirling as something we do on a single circuit, outside the context of the estimator observables and potential lightcone analysis might accidentally cause primitives implementations to have to do more work and have the user do more work to reconstruct their final results, since now a user might well end up passing a much larger broadcast matrix as their estimator inputs.

@nonhermitian
Copy link
Contributor

If it doesn't really fit in Qiskit's models right now (and I agree, it doesn't particularly), then it feels like we're trying to merge such a function into the core library that'll never really fit in

But this is equiv to kicking the can down the road to a undetermined time frame, and features that have value will likely never see the light of day, waiting for the ideal conditions for implementation. Perhaps latter a pass can just call this same function that remains exposed to users as well, who knows.

Twirling has value for Qiskit outside of the IBM primitives. Note that no other provider outside of IBM has a primitives interface that supports twirling. So there is value to other HW providers, and in addition there is value for users who want to see (and perhaps save) the exact circuits they are executing.

I am not sure there is any major downside on the primitives side save for perhaps extra work to be done that otherwise didn't need to be done, e.g. twirl twice if that was an automatic option. However, I am pretty sure there is an option for that.

This commit expands the twirling_gate argument to work with strings
instead of classes and also specify a list of gates instead of a single
gate type. It also changes the default to `None` to twirl all supported
gates in the circuit.
@jakelishman
Copy link
Member

A later pass can never call this function, because it's defined on the wrong types of inputs. Besides, that doesn't help the user, who would still have to switch what they're doing.

Yes, I am proposing step back and think through what a joined-up interface to Qiskit from the user right through to hardware looks like, rather than us adding things in that implement one overly-specific microbenchmark more efficiently. We can't really claim either performance and stability if that's how we go about additions to the core library. I'm not saying twirling isn't important, or even that we shouldn't put it in right now, I'm saying that software engineering is more than just throwing interfaces at a wall to see what sticks. That works well in the early days of open-source work, but not for a product whose guiding principles are meant to be stability, performance and utility.

For example, if the purpose is that this will always be used as a post-compilation pipeline where it's immediately followed by an Optimize1qGates pass, then the interface of this function probably should change in some manner that allows the twirling itself to inline the optimal 1q gate sequence for a particular backend. That will be more performant than this current implementation, because it won't need an additional transpiler pass to come later. Using the current interface over optimises for the microbenchmark, which isn't representative of the work that actually needs to be done - it elevates a subroutine of one particular (inefficient) choice of implementation to be a standard we should hit.

I am not sure there is any major downside on the primitives side save for perhaps extra work to be done that otherwise didn't need to be done.

If we promote that you should do your twirling from within Qiskit, this takes away important information that a primitives implementation can use to reduce the number of QPU hours and shot-loops necessary, because now the user is supplying many different Estimator inputs (one per twirl). That means that the same light-cone analysis now has to be done for each separate input, whereas with twirling done in the primitive implementations, it can be done before the twirling is. If the primitive is doing any other additional error learning or error mitigation, it'll likely now end up doing the same learning for each of the different input circuits, which could severely impact the number of QPU hours spent. It also means that the user has more data come back that they now have to merge together, which means more data the user has to send over the wire to us, and more data we have to send over the wire back to the user.

There are ways around these problems, but again, they mean stepping back and thinking through a joined-up interface, in conjunction with implementers of the primitives. For example, my problem of knowing when two multi-qubit potentially parametric blocks implement the same operation can be eased by additional boxes with annotations, which is something I'm currently in active discussions with the primitives teams about. However, if that's the way we go, then this function needs to be designed in such a way that it outputs those boxes, and we need to build the interface around that.

I do totally agree that this could have utility outside IBM primitives implementations as well, but if so, we need to really think through how those other users should use this, and how we'll design our interfaces so that users will, by default and by using the most natural forms, get the experience that is most performant and most pleasant end-to-end.

@nonhermitian
Copy link
Contributor

I guess if you want to future proof it a bit then you should implement as a transpiler pass that only does a single twirl. In that case the current usage would require two pass managers still, but the input to the second would be something like post_pm.run([qc]*num_twirls). Then you could probably extend it to one-in many-out in a fairly seamless manner. That would also leave the 1Q optimization separate, which is more in the spirit of the passes doing basic things

@jakelishman
Copy link
Member

This is just why I wanted to talk about things - if the intention of this function is to just be a short-term, low-level post-transpile thing, then we can still have a function like what's currently implemented, it just means we tweak the interface so it's more efficient, and we need to take care about how we teach users about it.

If I'm understanding right, then the two proposed use cases for this function are:

  • illustrate what Pauli twirling does
  • multiply twirl a post-compilation circuit, without concern for how efficiently that will be executable on a primitive (possibly because it'll be given to a non-primitive direct-access BackendV2.run, which might not be an IBM one).

In that case, maybe we work in an interface where we have an optional argument that's either None or an instance of Optimize1qDecomposition. If we're given the optimiser, the twirler can run along the 1q runs ahead of each twirled gate, then give that run and the twirling gate to the optimiser, and inline the output of that. That allows both use cases: for use case one, you don't pass the optimiser, and for use case two, you do. In the second use case, we're now more efficient than anything else proposed here, because there's no need for a follow-up passmanager at all.

Also if that's the case, then that impacts the interface promises we should make about the output circuit w.r.t. propagating any stored TranspileLayout, the register structure, and so on. Again, easy enough to do, we just needed to really hone in on what the purpose and use-case of the function is in order to know that.

The concerns about this interacting poorly in general with how efficient Estimators will want to work are still there, but that can generally be handled by documentation. That of course wouldn't be the case if this was intended to be used before input to a primitive, but that's what I want to make sure we're thinking about.

@nonhermitian
Copy link
Contributor

The workflow is the following:

  1. Transpile circuit as best as possible

  2. Take this best circuit and generate num_twirls versions that have been twirled, and then do a 1Q cleanup on the return set

  3. Execute (perhaps with things like DD added as well after twirling) using whatever execution pipeline a provider has

  4. Aggregate counts and compute desired quantities of interest.

The twirling could be a function, or a TransformationPass with the latter generating only a single twirl at the moment. Then one could combine the 1Q optimization pass in a post_pm as done above and get the desired output.

This commit updates the twirl_circuits logic to recurse into a control
flow operation and twirl any gates that are potentially in the block.
@jakelishman
Copy link
Member

Step three is the big question to me: is this not something that narratively we promote is what a primitives implementer might do for you? How does adding this function interact with that? Can you expand on why you have to do this manually, when this is something that IBM primitives definitely can do?

Knowing your step 2 is important, because then we can make the function better - having "twirl" be a standalone step isn't necessarily even the best interface for your workflow, because you have to do a full 1q post-optimize, while we can do that more efficiently on-the-fly (since 1q optimisation is pretty simple and a peephole optimisation).

I'd like to know more about what other users' use-cases for manual twirling are - you're presenting one use case, but it might not be the only reason people are twirling themselves, and I wouldn't want the interface to get in the way of that. I know that there's a few people out there who've written things approximately similar to what's here, but I don't know for what purposes they've done that.

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the new interface, I was just commenting my nerd-sniped bits on the performance of the new twirl finder lol

crates/accelerate/src/twirling.rs Outdated Show resolved Hide resolved
crates/accelerate/src/twirling.rs Outdated Show resolved Hide resolved
Forcing users to instantiate a Optimize1qGatesDecomposition transpiler
pass was a bit heavy when the twirling function wasn't actually calling
the pass directly. All the pass was being used for was to pull the
constraints out and pass it to Rust. It is simpler for users to just
give the target to the function directly and also simplifies the
function's code.
Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor questions, otherwise LGTM

crates/circuit/src/circuit_data.rs Outdated Show resolved Hide resolved
qiskit/circuit/__init__.py Outdated Show resolved Hide resolved
qiskit/circuit/twirling.py Outdated Show resolved Hide resolved
let z_matrix = aview2(&qiskit_circuit::gate_matrix::Z_GATE);
let iter_set = [IGate, XGate, YGate, ZGate];
let kron_set: [Array2<Complex64>; 16] = [
kron(&i_matrix, &i_matrix),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is redundant to check if you want to directly push it onto out_vec 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good call. I guess the question is should we still do that if the output is otherwise empty? This function will always return 4 identity gates even if there is no other valid twirling combination.

I'm thinking we should probably error if 4 identities are all that matches so only add it to the output list if there are other matches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah an error (or at least a warning) would be good to raise to let users know the code didn't work as expected 👍🏻

let norm: f64 = diff_frob_norm_sq(twirled_matrix.view(), gate_matrix);
if norm.abs() < 1e-15 {
out_vec.push(([*i, *j, *k, *l], 0.));
} else if (norm - 16.).abs() < 1e-15 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like something that could give a false positive, i.e. some custom defined gate where the difference is 16 yet they differ by a relative phase or something... or is that not possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to @anedumla I now believe this is true 😬

test/python/circuit/test_twirling.py Outdated Show resolved Hide resolved
@raynelfss raynelfss added the on hold Can not fix yet label Oct 29, 2024
@mtreinish mtreinish removed the on hold Can not fix yet label Nov 5, 2024
@mtreinish mtreinish changed the title Add twirl_circuit function Add pauli_twirl_2q_gates function Nov 5, 2024
@mtreinish mtreinish added this pull request to the merge queue Nov 7, 2024
Merged via the queue into Qiskit:main with commit 3e136d7 Nov 7, 2024
17 checks passed
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Nov 11, 2024
Since Qiskit#13331 merged the nightly asv runs have been failing. This is
because the asv twirling benchmark code was updated in that PR to use
the new twirling function instead of the embedded equivalent. However,
in the PR the name of that function was renamed from twirl_circuit to
pauli_twirl_2q_gates during the development of the feature. This rename
was missed in the asv benchmark, and as nothing executes the asv code in
CI this went uncaught until the asv nightly runs. This commit fixes this
oversight and updates the function name to the correct name so that the
asv benchmarks no longer fail on import.
github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2024
* Fix twirling asv benchmark

Since #13331 merged the nightly asv runs have been failing. This is
because the asv twirling benchmark code was updated in that PR to use
the new twirling function instead of the embedded equivalent. However,
in the PR the name of that function was renamed from twirl_circuit to
pauli_twirl_2q_gates during the development of the feature. This rename
was missed in the asv benchmark, and as nothing executes the asv code in
CI this went uncaught until the asv nightly runs. This commit fixes this
oversight and updates the function name to the correct name so that the
asv benchmarks no longer fail on import.

* Update test/benchmarks/manipulate.py
@mtreinish mtreinish deleted the twirl-for-paul branch November 16, 2024 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend Pauli twirling to end users
7 participants