Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auxiliary qubit tracking in HighLevelSynthesis #12911

Merged
merged 7 commits into from
Aug 14, 2024
Merged

Conversation

Cryoris
Copy link
Contributor

@Cryoris Cryoris commented Aug 6, 2024

Summary

Enable tracking the state of auxiliary qubits in high-level synthesis. This allows synthesis plugins to leverage additional qubits in a decomposition, leading to a potential reduction of circuit depth and number of operations (which is why I tagged it for the performance demo 🙂). This PR unblocks work on the circuit library, in particular refactoring of the MCX and MCMT gates.

Details and comments

This first implementation follows a greedy approach, where each operation is given access to all currently available auxiliary qubits. It does so by using the newly added QubitTracker, which keeps track of which qubits in the circuit/dag are "clean" (in state $|0\rangle$) or "dirty" (an unknown state). Generally, if qubits start out as clean, and as soon as it is involved in an operation, it changes to dirty (special rules apply for operations like resets, barriers or delays).

Another important change is that synthesized blocks must be synthesized again. This recursion is necessary, since unrolling or decomposing an operation, could yield other operations that must be synthesized. This slows down the code compared to the current state, but only by another iteration over the DAG, so that should be a reasonably small overhead.

Follow-ups could involve:

  • speedup the code by having the synthesis plugins return DAG circuits (will be easier once we could construct a DAG from Rust's CircuitData or an iterator over circuit instructions)
  • avoid redundant recursion if plugins can report the basis gate set of their decomposition
  • do the whole thing in Rust 🙂
The asv utility benchmark finds that this does not significantly reduce performance.
Benchmarks that have stayed the same:

| Change   | Before [1fdd527f] <main>   | After [9726d19b] <hls+aux~1>   | Ratio   | Benchmark (Parameter)                                                         |
|----------|----------------------------|--------------------------------|---------|-------------------------------------------------------------------------------|
|          | 0                          | 0                              | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                 |
|          | 0                          | 0                              | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                 |
|          | 0                          | 0                              | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                |
|          | 701±6ms                    | 756±6ms                        | 1.08    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')             |
|          | 877±9ms                    | 939±10ms                       | 1.07    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')            |
|          | 3.40±0.02s                 | 3.61±0.08s                     | 1.06    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                           |
|          | 3.33±0.01s                 | 3.51±0.08s                     | 1.05    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                            |
|          | 5.46±0s                    | 5.74±0.06s                     | 1.05    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                            |
|          | 1.02±0.02s                 | 1.07±0.01s                     | 1.05    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')             |
|          | 223±5ms                    | 231±9ms                        | 1.04    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                        |
|          | 1.38±0.01s                 | 1.43±0.01s                     | 1.04    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                         |
|          | 248±2ms                    | 256±3ms                        | 1.03    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                        |
|          | 15.3±0.01s                 | 15.7±0.1s                      | 1.03    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                          |
|          | 238±1ms                    | 243±2ms                        | 1.02    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                       |
|          | 6.81±0.03ms                | 6.93±0.2ms                     | 1.02    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
|          | 917±5ms                    | 931±6ms                        | 1.02    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                          |
|          | 73.4±0.4ms                 | 74.0±0.2ms                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                        |
|          | 73.5±0.2ms                 | 74.1±0.6ms                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                        |
|          | 73.3±0.2ms                 | 73.7±0.3ms                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                       |
|          | 6.78±0.05ms                | 6.86±0.08ms                    | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
|          | 6.86±0.08ms                | 6.90±0.05ms                    | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
|          | 73.0±0.4ms                 | 73.5±0.3ms                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
|          | 23.3±0.2ms                 | 23.5±0.2ms                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
|          | 23.3±0.09ms                | 23.7±0.3ms                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
|          | 1.78±0.01s                 | 1.80±0.01s                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                          |
|          | 14.7±0.03s                 | 14.8±0.01s                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                           |
|          | 15.8±0.01s                 | 16.1±0.03s                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                           |
|          | 3.84±0.01s                 | 3.84±0.01s                     | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                       |
|          | 395                        | 395                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                 |
|          | 397                        | 397                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                 |
|          | 397                        | 397                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                |
|          | 300                        | 300                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                |
|          | 300                        | 300                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                |
|          | 300                        | 300                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')               |
|          | 1483                       | 1483                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                   |
|          | 1488                       | 1488                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                   |
|          | 1488                       | 1488                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                  |
|          | 1954                       | 1954                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                    |
|          | 1954                       | 1954                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                    |
|          | 1954                       | 1954                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                   |
|          | 2538                       | 2538                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                     |
|          | 2538                       | 2538                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                     |
|          | 2538                       | 2538                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                    |
|          | 435                        | 435                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')      |
|          | 435                        | 435                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')      |
|          | 435                        | 435                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')     |
|          | 3.85±0.06s                 | 3.80±0.05s                     | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                      |
|          | 73.1±0.9ms                 | 72.7±0.8ms                     | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
|          | 73.0±0.5ms                 | 72.4±0.6ms                     | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
|          | 23.4±0.3ms                 | 23.2±0.1ms                     | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
|          | 3.78±0.1s                  | 3.67±0.02s                     | 0.97    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                       |

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

<\details>

@Cryoris Cryoris added this to the 1.3 beta milestone Aug 6, 2024
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Cryoris
  • @Qiskit/terra-core
  • @ajavadia

@coveralls
Copy link

coveralls commented Aug 6, 2024

Pull Request Test Coverage Report for Build 10370210258

Details

  • 177 of 193 (91.71%) changed or added relevant lines in 6 files are covered.
  • 8 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.03%) to 89.684%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/synthesis/high_level_synthesis.py 121 126 96.03%
qiskit/transpiler/passes/synthesis/qubit_tracker.py 52 63 82.54%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 2 92.48%
crates/qasm2/src/parse.rs 6 96.69%
Totals Coverage Status
Change from base Build 10357074988: 0.03%
Covered Lines: 67507
Relevant Lines: 75272

💛 - Coveralls

Copy link
Contributor

@henryzou50 henryzou50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM Great job on the implementation and the thorough testing.

However, I noticed that it fails a lint test. This should be fixed by removing the unused import in clifford_decompose_bm.py.

Additionally, there were several sections in high_level_synthesis.py that seemed to have outdated comments. It would be helpful to update these to maintain clarity.

I also left some suggestions for small improvements to qubit_tracker.py that could enhance its efficiency. These changes involve optimizing set operations and membership testing, which should help improve the performance of the qubit tracking.

Once these minor issues are addressed, it should be good to merge!

qiskit/synthesis/clifford/clifford_decompose_bm.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/synthesis/high_level_synthesis.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/synthesis/high_level_synthesis.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/synthesis/high_level_synthesis.py Outdated Show resolved Hide resolved
@@ -66,6 +66,7 @@ def transpile( # pylint: disable=too-many-return-statements
optimization_method: Optional[str] = None,
ignore_backend_supplied_default_methods: bool = False,
num_processes: Optional[int] = None,
qubits_initially_zero: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we talked about the variable naming earlier today. This works well, but if you want to try out a shorter name, I think qubits_zeroed works here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice idea, or maybe qubits_reset if we want to avoid the "zero" 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I like that suggestion. It is clear and avoids any confusion with the term "zero" :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, personally I like the current name qubits_initially_zero a bit more than qubits_reset or qubits_zeroed since it's more clear what it means.

qiskit/transpiler/passes/synthesis/qubit_tracker.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/synthesis/qubit_tracker.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/synthesis/qubit_tracker.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/synthesis/qubit_tracker.py Outdated Show resolved Hide resolved
Copy link
Contributor

@alexanderivrii alexanderivrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am half-way looking at the refactoring and I really like it! I have left a couple of interim questions and suggestions.

qiskit/transpiler/passes/synthesis/qubit_tracker.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/synthesis/qubit_tracker.py Outdated Show resolved Hide resolved
Comment on lines 370 to 375
@deprecate_arg(
"use_qubit_indices",
since="1.3",
additional_msg="The qubit indices will then always be used.",
pending=True,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still trying to figure out whether we need use_qubit_indices or not. The motivation was to provide the context whether HLS runs before or after layout/routing. In the default transpiler flow, it runs before, i.e. on the logical circuit. Note that HLS has access to target/coupling_map but (as no layout/routing has been done) it does not know the mapping between logical and physical qubits. In addition, most of the available synthesis plugins do not take the coupling map into account, e.g., all clifford synthesis methods synthesizes cliffords implicitly assuming the all-to-all connectivity. However, in some cases we might want to run HLS after layout/routing and we have a few internal and external plugins that ensure adherence to the coupling map (e.g., the token swapper plugin, or the external sat-based synthesis plugins for cliffods).

Hmm... I think the only problematic usecase would be if we wanted to run say token swapper on the logical circuit, in which case it would wrongly assume that it has to adhere to the target connectivity and introduce more logic (that would later be routed potentially introducing even more swaps).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summarizing the offline discussion: let's keep it and if use_qubit_indices=False, then we assume we cannot query the target whether a gate is supported. Should we rename this to physical_qubits as well to make it clearer that the circuit has been mapped to physical qubits already?

qiskit/transpiler/passes/synthesis/high_level_synthesis.py Outdated Show resolved Hide resolved
Copy link
Contributor

@alexanderivrii alexanderivrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Cryoris for taking over and greatly improving the PR. I have left a few additional questions related to potential edgecases which I am not sure might or might not happen.

Copy link
Contributor

@henryzou50 henryzou50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for changes in the first round of reviews. I've looked over the updates, and everything appears to be in order. Great work!

I noticed that Sasha has also left some insightful comments. Once those are addressed, I believe we'll be in a good position to merge the changes

Copy link
Contributor

@alexanderivrii alexanderivrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all of my review comments and adding tests for different cornercases (plugin mechanism returning None, having clbits).

The only remaining minor nitpick is that _synthesize_operation should return a tuple (consisting of the synthesized object and used qubits), can you please update the signature of this function accordingly?

Copy link
Contributor

@alexanderivrii alexanderivrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Julien! I am now fully happy with this PR. @henryzou50, please take another look and see if it can be merged.

Copy link
Contributor

@henryzou50 henryzou50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work and thanks for the changes :)

@henryzou50 henryzou50 added this pull request to the merge queue Aug 14, 2024
Merged via the queue into Qiskit:main with commit 11e769f Aug 14, 2024
15 checks passed
@Cryoris Cryoris deleted the hls+aux branch August 14, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog synthesis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants