Reimplement SabreSwap heuristic scoring in Rust #7977

mtreinish · 2022-04-22T14:02:58Z

Summary

This commit re-implements the core heuristic scoring of swap candidates
in the SabreSwap pass as a multithreaded Rust routine. Previously, the
heuristic scoring in sabre looped over all potential swap candidates
serially in Python and computed a heuristic score for each one to decide
which candidate to pick. This can easily be done in parallel because
there is no data dependency between scoring the different candidates. By
performing this in Rust, not only is the scoring operation faster for
each candidate, but we can also leverage multithreading to score the
candidates efficiently in parallel.
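
As a rough illustration of why the scoring parallelizes cleanly, the sketch below scores each candidate swap independently and collects the results. It is written in Python only for readability: the distance matrix, the `score_swap` helper, and the summed-distance cost are assumptions made for this example, not the Rust code in this PR.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 4-qubit line: DIST[a][b] is the hop count between physical qubits.
DIST = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]


def score_swap(swap, layout, front_layer):
    """Score one candidate swap: total distance of front-layer gates after the swap."""
    trial = list(layout)  # layout[logical] -> physical; copy so candidates stay independent
    a, b = swap  # physical qubits to exchange
    for logical, physical in enumerate(layout):
        if physical == a:
            trial[logical] = b
        elif physical == b:
            trial[logical] = a
    return sum(DIST[trial[q0]][trial[q1]] for q0, q1 in front_layer)


def score_candidates(candidates, layout, front_layer):
    """Score every candidate; no candidate reads another candidate's result."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda s: score_swap(s, layout, front_layer), candidates))
    return dict(zip(candidates, scores))


scores = score_candidates([(0, 1), (1, 2), (2, 3)], layout=[0, 1, 2, 3], front_layer=[(0, 3)])
```

Python threads will not speed up this CPU-bound loop because of the interpreter lock, which is part of the motivation for doing the scoring in Rust; the sketch only shows that the per-candidate work shares no mutable state.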

Details and comments

TODO:

Fix behavior differences with Python implementation
Benchmarking and tuning
Organize rust sabre code into module
Add release note
Rust function docs


qiskit-bot · 2022-04-22T14:03:00Z

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are requested to review this:

@Qiskit/terra-core
@kevinhartman
@mtreinish

coveralls · 2022-04-22T14:39:13Z

Pull Request Test Coverage Report for Build 2697565720

51 of 53 (96.23%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.002%) to 83.998%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
qiskit/transpiler/passes/routing/sabre_swap.py	50	52	96.15%

Totals
Change from base Build 2695829821:	0.002%
Covered Lines:	55876
Relevant Lines:	66521

💛 - Coveralls

This commit moves the sabre-specific code into a separate Rust module. We were already using a separate Python module for the sabre code; this just mirrors that in the Rust code for better organization.

jakelishman

Looks like a promising choice for Rust - swapping away from the slow Python-based Layout (though maybe we should work on fixing that) alone should give us a lot. I'm excited to see the benchmarks for the rest of the pass!

qiskit/transpiler/passes/routing/sabre_swap.py

jakelishman · 2022-04-22T14:38:15Z

src/sabre_swap.rs

+use crate::qubits_decay::QubitsDecay;
+use crate::swap_scores::SwapScores;
+
+const EXTENDED_SET_WEIGHT: f64 = 0.5;


I suspect this overrides the Python variable of the same name?

jakelishman · 2022-04-22T14:39:39Z

src/sabre_swap.rs

+        .iter()
+        .filter_map(|(k, v)| if v == min_score { Some(*k) } else { None })
+        .collect();
+    best_swaps.par_sort();


Do we need to sort best_swaps if they all have the same score anyway? I guess this is to do with keeping the same routing for the same random seed, but given the test changes, it looks like we may already be breaking something there. That said, I think the differences in the tests might be due to a possible mistake in handling the decay mentioned above.

Yeah, I only did this because it was in the python code. I assumed the sort was there mostly for repeatability because we're dependent on the insertion order otherwise. I can drop it though and see what happens, it would speed things up without needing to do this.
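
The repeatability argument for the sort can be sketched as follows: when several swaps tie for the minimum score and one is picked at random, the pick depends on the order of the tied list, so sorting first makes a fixed seed give the same answer however the scores were collected. The function name and the use of Python's `random.Random` are assumptions for illustration; the Rust code uses its own RNG.

```python
import random


def choose_best_swap(scores, seed):
    """Pick one minimum-score swap, reproducibly for a given seed."""
    min_score = min(scores.values())
    best_swaps = [swap for swap, score in scores.items() if score == min_score]
    best_swaps.sort()  # canonical order, independent of insertion or hash order
    return random.Random(seed).choice(best_swaps)


scores_a = {(0, 1): 2.0, (1, 2): 3.0, (2, 3): 2.0}
scores_b = {(2, 3): 2.0, (1, 2): 3.0, (0, 1): 2.0}  # same scores, different insertion order
pick_a = choose_best_swap(scores_a, seed=42)
pick_b = choose_best_swap(scores_b, seed=42)
```

If the container already iterates in a deterministic order, the sort is redundant and dropping it only changes which tied swap a given seed selects.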

jakelishman · 2022-04-22T14:42:35Z

src/sabre_swap.rs

+fn score_heuristic(
+    heuristic: &Heuristic,
+    layer: &[[usize; 2]],
+    extended_set: &[[usize; 2]],
+    layout: &NLayout,
+    swap_qubits: &[usize; 2],
+    dist: &ArrayView2<f64>,
+    qubits_decay: &[f64],
+) -> f64 {
+    match heuristic {
+        Heuristic::Basic => compute_cost(layer, layout, dist),
+        Heuristic::Lookahead => score_lookahead(layer, extended_set, layout, dist),
+        Heuristic::Decay => {
+            score_decay(layer, extended_set, layout, dist, swap_qubits, qubits_decay)
+        }
+    }
+}


Does Rust have function pointers / first-class functions? It feels like it ought to be more efficient to do this switch block one stack frame higher, but maybe the compiler or modern branch prediction is good enough that it doesn't matter, and the different parameters would be a nuisance.

You can do that with rust, I think the problem though is the arguments are different for each function so the type checker will complain that fn(&[[usize; 2]], &NLayout, &ArrayView2<f64>) -> f64 is different from fn(&[[usize; 2]], &[[usize; 2]], &NLayout, &ArrayView2<f64>) -> f64 and fn(&[[usize; 2]], &[[usize; 2]], &NLayout, &ArrayView2<f64>, &[f64]) -> f64. To fix this we can add unused arguments to basic and lookahead so they all match. That being said I assume the compiler is probably smart enough to optimize it away and even if it didn't yeah I assume the branch predictor will predict it correctly. Once I get this more finalized and the performance where it should be I can look at doing this as a potential follow on optimization.

I need to do some more work with Rust even if only to recalibrate my programming expectations from a pure interpreted language like Python to one that passes through an optimising compiler - it's been a while since I did any major amount of work in a language where you didn't have to do the compiler's job for it!

This commit removes an unnecessary parallel iterator over the swap scores to find the minimum and just does it serially. The threading overhead of the parallel iterator is not worth paying because the serial search is already fairly quick.

The use of an inner hashmap meant the swap candidates were being evaluated in a different order based on the hash seeding instead of the order generated from the Python side. This commit fixes this by switching the internal type to an IndexMap, which for a little overhead preserves the insertion order on iteration.

kevinhartman

Looks like a nice improvement so far.

src/nlayout.rs

src/sabre_swap/swap_scores.rs

mtreinish · 2022-07-01T00:44:35Z

I've been running some scale testing on qv circuits (with a fixed depth to limit runtime) on heavy hex coupling graphs. So far SabreLayout finished first and the data looks quite good:

While this is showing the ratio the absolute times for the 886 qubit run were 36.065 secs for rust and 12605.42 secs for the python version on main.

This was with data generated from this script:

import csv
import time

import numpy as np

from qiskit.converters import circuit_to_dag
from qiskit.transpiler import CouplingMap
from qiskit.circuit.library import QuantumVolume
from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import (
    SabreLayout,
    FullAncillaAllocation,
    EnlargeWithAncilla,
    ApplyLayout,
    SabreSwap,
    Unroll3qOrMore,
)


def bench_qv():
    with open("sabre_swap_sabre_layout.csv", "w", newline="") as csvfile:
        times_writer = csv.writer(csvfile)
        times_writer.writerow(["cmap_size", "width", "depth", "time"])
        for distance in range(3, 21, 2):
            cmap = CouplingMap.from_heavy_hex(distance)
            layout_pm = PassManager(
                [
                    Unroll3qOrMore(),
                ]
            )
            stoch_pass = SabreLayout(cmap, max_iterations=4, seed=50024)
            width = len(cmap.graph)
            depth = 10
            print(width)
            # Use the same depth that is recorded in the CSV row below.
            circuit = QuantumVolume(width, depth, seed=50024)
            circuit.measure_all()
            layout_circ = layout_pm.run(circuit)
            # Time SabreLayout itself: run the pass created above on the unrolled DAG.
            dag = circuit_to_dag(layout_circ)
            start = time.time()
            stoch_pass.run(dag)
            stop = time.time()
            run_time = stop - start
            times_writer.writerow([cmap.size(), width, depth, run_time])


if __name__ == "__main__":
    bench_qv()

I'm doing another run measuring SabreSwap after the layout measured here. I'll post the results when they finish.

mtreinish · 2022-07-01T11:32:04Z

The SabreSwap run finished this morning:

This was generated using data from a very similar script, but timing running SabreSwap like optimization level 3 does after SabreLayout:

import csv
import time

import numpy as np

from qiskit.converters import circuit_to_dag
from qiskit.transpiler import CouplingMap
from qiskit.circuit.library import QuantumVolume
from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import (
    SabreLayout,
    FullAncillaAllocation,
    EnlargeWithAncilla,
    ApplyLayout,
    SabreSwap,
    Unroll3qOrMore,
)


def bench_qv():
    with open("rabre_rwap.csv", "w", newline="") as csvfile:
        times_writer = csv.writer(csvfile)
        times_writer.writerow(["cmap_size", "width", "depth", "time"])
        for distance in range(3, 21, 2):
            cmap = CouplingMap.from_heavy_hex(distance)
            layout_pm = PassManager(
                [
                    Unroll3qOrMore(),
                    SabreLayout(cmap, max_iterations=4, seed=50024),
                    FullAncillaAllocation(cmap),
                    EnlargeWithAncilla(),
                    ApplyLayout(),
                ]
            )
            width = len(cmap.graph)
            depth = 10
            print(width)
            # Use the same depth that is recorded in the CSV row below.
            circuit = QuantumVolume(width, depth, seed=50024)
            circuit.measure_all()
            layout_circ = layout_pm.run(circuit)
            stoch_pass = SabreSwap(cmap, heuristic="decay", seed=50024)
            stoch_pass.property_set = layout_pm.property_set
            dag = circuit_to_dag(layout_circ)
            start = time.time()
            stoch_pass.run(dag)
            stop = time.time()
            run_time = stop - start
            times_writer.writerow([cmap.size(), width, depth, run_time])


if __name__ == "__main__":
    bench_qv()

I also graphed just the rust implementation run times to get a feel for the scaling:

kevinhartman

Looks good. Just a few small comments.

src/sabre_swap/edge_list.rs

src/sabre_swap/mod.rs

src/sabre_swap/qubits_decay.rs

Co-authored-by: Kevin Hartman <kevin@hart.mn>

This commit updates the sort step in the sabre algorithm to only run a parallel sort if we're not already in a parallel context. This is to prevent a potential over dispatch of work if we're trying to use multiple threads from multiple processes. At the same time the sort algorithm used is switched to the unstable variant because a stable sort isn't necessary for this application and an unstable sort has less overhead.

In Qiskit#7977 we started the process of oxidizing SabreSwap by replacing the innermost scoring heuristic loop with a Rust routine. This greatly improved the overall performance and scaling of the transpiler pass. Continuing from where that started, this commit migrates more of the pass into the Rust domain so that almost all of the pass's operations are done inside a Rust module, and all that is returned is a list of swaps to run prior to each 2q gate. This should further improve the runtime performance and scaling of the pass, because the only steps performed in Python are generating the input data structures and then replaying the circuit with SWAPs inserted at the appropriate points. We could have stuck with Qiskit#7977, as the performance of the pass was more than sufficient after it. What this commit really enables, by moving most of the pass to the Rust domain, is improving and expanding the sabre algorithm in ways that require multithreading to be implemented efficiently. So while this will bring some modest performance improvements, it is more about setting the stage for introducing variants of SabreSwap that do more thorough analysis in the future (which were previously precluded by the parallelism limitations of Python).
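
To make the division of labor concrete, here is a minimal sketch of the replay step: the compiled routine is assumed to return, for each two-qubit gate, the swaps to apply just before it, and Python walks the gates in order, inserting those swaps and updating the layout. The `swap_map` shape, the `replay` name, and the tuple-based gate representation are hypothetical; the real pass rebuilds a `DAGCircuit`.

```python
def replay(gates, swap_map, layout):
    """Rebuild a gate list on physical qubits, inserting swaps before each gate.

    gates:    list of (logical_q0, logical_q1) two-qubit gates, in routing order
    swap_map: {gate_index: [(physical_a, physical_b), ...]} (hypothetical shape)
    layout:   layout[logical] -> physical, the initial placement
    """
    layout = list(layout)  # do not mutate the caller's layout
    out = []
    for index, (q0, q1) in enumerate(gates):
        for a, b in swap_map.get(index, []):
            out.append(("swap", a, b))
            # Exchange whichever logical qubits currently sit on a and b.
            la, lb = layout.index(a), layout.index(b)
            layout[la], layout[lb] = b, a
        out.append(("cx", layout[q0], layout[q1]))
    return out, layout


routed, final_layout = replay(
    gates=[(0, 1), (0, 2)],
    swap_map={1: [(0, 1)]},  # one swap needed before the second gate
    layout=[0, 1, 2],
)
```

The sketch only produces a valid circuit if the gates are replayed in the same order the router visited them, since each swap list assumes the layout left behind by the previous gates.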

This commit updates the preset pass manager construction to use the SabreLayout and SabreSwap passes by default for optimization levels 1 and 2. With the recently merged Qiskit#7977, the performance of the sabre swap pass has improved enough to be considered for use by default at these levels. While for small numbers of target device qubits (< 30) the SabreLayout/SabreSwap passes don't quite match the runtime performance of DenseLayout/StochasticSwap, they typically have better runtime performance for larger target devices. Additionally, the runtime performance of Sabre should improve further after Qiskit#8388 is finished. More importantly, the output quality from the sabre passes is typically better, resulting in fewer swap gates being inserted. With the combination of better quality and comparable runtime performance, it makes sense to use sabre as the default for optimization levels 1 and 2. For optimization level 0, stochastic swap is still used because we want to continue to leverage TrivialLayout for that level, and to get the full quality advantages SabreSwap and SabreLayout should be used together.

In Qiskit#7977 we moved to use compiled objects for part of the SabreSwap compiler pass. However an unintended side effect of that PR was the use of Rust objects stored in instance level variables which weren't pickleable. This breaks multiprocessing at the PassManager level which expects to be able to pickle and send a SabreSwap object to the subprocess running on a circuit. This commit fixes this by making the Rust NeighborTable object pickleable and switching to storing the heuristic string at the instance level instead of the heuristic enum.

* Further oxidize sabre

* Fix most test failures

This commit fixes a small typo/logic error in the algorithm implementation that was preventing sabre from making forward progress because it wasn't correctly identifying successors for the next layer. By fixing this all the hard errors in the SabreSwap tests are fixed. The only failures left seem to be related to a different layout, which hopefully is not a correctness issue but just caused by different ordering.

* Rework circuit reconstruction to use layer order

In some tests there were subtle differences in the positioning of the 1q gates relative to inserted swaps (i.e. a 1q gate which was before the swap previously could move to after it). This was caused by different topological ordering being used between the hybrid Python sabre implementation and the mostly Rust sabre implementation. To ensure a consistent ordering after moving mostly to Rust, this changes the swap insertion loop to iterate over the circuit layers, which mirrors how the old sabre implementation worked.

* Differentiate between empty extended_set and none

* Simplify arguments passing to remove adjacency matrix storage

* Only check env variables once in rust

* Rust side NLayout.copy()

* Preserve SabreSwap execution order

This commit fixes an issue where in some cases the topological order in which the DAGCircuit is traversed is different from the topological order that sabre uses internally. The build_swap_map sabre swap function is only valid if the 2q gates are replayed in exactly the same order when rebuilding the DAGCircuit. If a 2q gate gets replayed in a different order the layout mapping will cause the circuit to diverge and potentially be invalid. This commit updates the replay logic on the Python side to detect when the topological order over the DAGCircuit differs from the sabre traversal order and attempts to correct it.

* Rework SabreDAG to include full DAGCircuit structure

Previously we attempted to have the Rust component of sabre deal solely with the 2q component of the input circuit. However, while this works for ~80% of the cases, it fails to account for ordering and interactions between non-2q gates or instructions with classical bits. To address this the sabre dag structure is modified to contain all instructions in the input circuit and structurally match the DAGCircuit's edges. This fixes most of the issues related to gate ordering that the previous implementation was encountering. It also simplifies the swap insertion/replay of the circuit on the Python side, as we now get an exact application order from the Rust code.

* Switch back to topological_op_nodes() for SabreDAG creation

* Fix lint

* Fix extended set construction

* Fix typo in application of decay rate

* Remove unused QubitsDecay class

* Remove unused EdgeList class

* Remove unnecessary SabreRNG class

* Cleanup SabreDAG docstring and comments

* Remove unused edge weights from SabreDAG

The edge weights in the SabreDAG struct were set to the qubit indices from the input DAGCircuit because the edges represent the flow of data on the qubit. However, we never actually inspect the edge weights, and all that having them present does is use extra memory. This commit changes SabreDAG to not set any weight for edges, as all we need is the source and target nodes for the algorithm to work.

* s/_bit_indices/_qubit_indices/g

* Fix sabre rust class signatures

* Use Sabre by default for optimization levels 1 and 2

* Fix pickling of SabreSwap object

* Update layout tests to match new default

This commit updates a failing layout test which was assuming that level 1 and level 2 were still running DenseLayout. The test has been updated to reflect the new default of SabreLayout.

* Fix stochastic swap specific test to use that routing method

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

This commit modifies the SabreLayout pass when run without the routing_pass argument to run primarily in Rust. This builds on top of the Rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388, and Qiskit#8572. Internally, when the routing_pass argument is not set SabreLayout will perform the full sabre algorithm, both layout selection and final swap mapping, in Rust and return the selected initial layout, the final layout, the topological sorting used to traverse the circuit, and a SwapMap for any swaps inserted. This is then used to build the output circuit in place of running separate layout and routing passes. The preset pass managers are updated to handle the new combined layout and routing mode of operation for SabreLayout. The routing stage of the preset pass managers remains intact; it will just operate as if a perfect layout was selected and skip SabreSwap because the circuit already matches the connectivity constraints. Besides operating more quickly because the heavy lifting of the algorithm runs more efficiently in a compiled language, doing this in Rust also lets us change our parallelization model for running multiple seeds in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run multiple parallel trials with different seeds, this commit adds a layout_trials argument to SabreLayout to try multiple seeds in parallel. When this is used it parallelizes at the outer layer for each layout/routing combination, and the seed with the minimal total swap count is used. So for example if you set swap_trials=5 and layout_trials=5, that will run 5 tasks in the threadpool with 5 different seeds for the outer layout run. Inside that, every time sabre swap is run (which will be multiple times as part of layout plus the final routing run) it tries 5 different seeds for each execution serially inside that parallel task.
This should hopefully further improve the quality of the transpiler output and better match expectations for users who were previously calling transpile() multiple times to emulate this behavior. Implements Qiskit#9090
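
The trial structure described above can be sketched with the standard library: layout trials run as parallel tasks, each one tries its swap seeds serially, and the trial with the fewest swaps wins. `route_once` is a placeholder that fabricates a swap count from the seeds (it is not a routing algorithm), and every name here is hypothetical rather than taken from the Rust code.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def route_once(layout_seed, swap_seed):
    """Placeholder for one layout+routing run; returns a made-up swap count."""
    return random.Random(layout_seed * 1_000 + swap_seed).randint(10, 50)


def layout_trial(layout_seed, swap_seeds):
    """One outer task: try every swap seed serially and keep the best count."""
    best = min(route_once(layout_seed, s) for s in swap_seeds)
    return best, layout_seed


def best_layout(seed, layout_trials=5, swap_trials=5):
    """Run the outer trials in parallel and return (swap_count, layout_seed)."""
    rng = random.Random(seed)
    layout_seeds = [rng.randrange(2**32) for _ in range(layout_trials)]
    swap_seeds = [rng.randrange(2**32) for _ in range(swap_trials)]
    with ThreadPoolExecutor(max_workers=layout_trials) as pool:
        results = list(pool.map(lambda s: layout_trial(s, swap_seeds), layout_seeds))
    return min(results)  # fewest swaps wins; ties fall back to the smaller seed


first = best_layout(seed=50024)
second = best_layout(seed=50024)
```

Because the number of trials is passed explicitly rather than derived from the CPU count, the same seed selects the same trial on any machine in this sketch.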

* Oxidize SabreLayout pass This commit modifies the SabreLayout pass when run without the routing_pass argument to run primarily in Rust. This builds on top of the rust version of SabreSwap previously added in #7977, #8388, and #8572. Internally, when the routing_pass argument is not set SabreLayout will perform the full sabre algorithm both layout selection and final swap mapping in rust and return the selected initial layout, the final layout, the toplogical sorting used to traverse the circuit, and a SwapMap for any swaps inserted. This is then used to build the output circuit in place of running separate layout and routing passes. The preset pass managers are updated to handle the new combined layout and routing mode of operation for SabreLayout. The routing stage to the preset pass managers remains intact, it will just operate as if a perfect layout was selected and skip SabreSwap because the circuit is already matching the connectivity constraints. Besides just operating more quickly because the heavy lifting of the algorithm operates more efficiently in a compiled language, doing this in rust also lets change our parallelization model for running multiple seed in Sabre. Just as in #8572 we added support for SabreSwap to run multiple parallel trials with different seeds this commit adds a layout_trials argument to SabreLayout to try multiple seeds in parallel. When this is used it parallelizes at the outer layer for each layout/routing combination and the total minimal swap count seed is used. So for example if you set swap_trials=5 and layout_trails=5 that will run 5 tasks in the threadpool with 5 different seeds for the outer layout run. Inside that every time sabre swap is run (which will be multiple times as part of layout plus the final routing run) it tries 5 different seeds for each execution serially inside that parallel task. 
This should hopefully further improve the quality of the transpiler output and better match expectations for users who were previously calling transpile() multiple times to emulate this behavior. Implements #9090 * Use deepcopy for coupling map copy Previously this PR was using copy() to copy the coupling map before we mutated it to be symmetric (a requirement for the sabre algorithm). However, this modification of the object was leaking out causing test failures. This commit switches it to a deepcopy to ensure there are no shared references (and a comment added to explain it's needed). * Fix failing unitary synthesis tests This PR branch modifies the default behavior of the SabreLayout pass so it is now a transformation pass that computes a layout, applies it, and then performs routing. This means when using sabre layout in a custom pass manager we no longer need to embed a layout after computing the layout. The failing unitary synthesis tests were using a custom pass manager and trying to apply the layout again after SabreLayout already did. This commit just removes this now unecessary steps from the test code. * Add release note * Run BarrierBeforeMeasurement before new SabreLayout Now that the routing stage is integrated into the SabreLayout pass we should be running the BarrierBeforeMeasurement pass prior to layout in the preset pass managers instead of before routing. The goal of the pass is to prevent the routing algorithms for accidentally reusing a qubit after a final measurement which would be invalid by inserting a barrier before the measurements to ensure all qubits are swap mapped prior to adding the measurements during routing. While this might not strictly be necessary (it didn't affect any test output) it feels like best practice to ensure we're doing this prior to potentially routing to prevent issues. 
* Improve docstrings * Set a fixed number of layout trials in preset pass managers For reproducible results with a fixed seed this commit sets a fixed number of layout_trials for the SabreLayout pass in the preset pass managers. If we did not set a fixed value than the output of the transpiler with a fixed seed will vary based on the number of physical cores that is running the compilation. To start optimization levels 0 and 1 use 5, level 2 uses 10, and level 3 uses 20 which matches the swap_trials argument we used. This is just a starting point, we can adjust these values later if needed. * Update tests for layout changes This commit updates the tests which are checking exact layouts with a fixed seed when running SabreLayout. The changes to SabreLayout breaks exact seed reproducibility from the earlier version of the pass. So we need to update these tests for their new layout assignment from the improved pass. One exception is a test which was trying to assert that transpile() preserves a swap if it's in the basis set. However, the new layout and routing output from SabreLayout for that test was resulting in all the swaps getting optimized away at optimization level 3 (resulting in 13 cx gates instead of ~4 cx gates and 5 swaps before, which would be more efficient on real hardware). So the test was removed and only run at lower optimziation levels. * Set a fixed number of layout trials in SabreLayout tests The dedicated tests for SabreLayout were not running a fixed number of trials. This was causing a different layout to be returned in tests when run across multiple systems as the number of trials defaults to the number of physical CPUs. This commit fixes the trial count to the number of cores on the local system where the layout was updated. This should fix the non-determinism in the tests causing failures in CI and on different local systems. 
* Run SabreSwap in parallel if only a single layout trial If there is only a single layout trial being run we don't have to worry about trying to do too much work in parallel at once by parallelizing the inner sabre swap execution. This commit updates the threading logic to enable running the inner sabre swap trials in parallel if there is only a single layout trial. * Remove duplicated SabreDAG creation * Correctly apply selected layout on dag nodes This commit corrects a bug in the PR branch that was caused by applying the selected initial layout in a trial to the swapped order node list. This was causing unexpected results when applying the circuit because the intent was to apply it only to the original input not the reversed input. * Remove unnecessary clone from serial layout trials In the case we're evaluating the layout trials serially instead of in a parallel iterator we don't need to clone the dag nodes list. This is because nothing will be modifying it in parallel, so we don't need a thread local copy. Each call to layout_trial() will keep the dag nodes vector intact (see previous commit for fixing this) so it can just be passed by reference if there are no parallel threads involved. * Fix seed setup when no user seed specified This commit fixes an issue prevent seed randomization when no seed is specified. On subsequent uses of a pass SabreLayout would not randomize the seed between runs because it was setting the seed to instance state. This commit fixes this issue by relying on initializing the RNG from entropy each time run() is called if no user specified seed is provided. * Start from trivial layout for routing stage This commit fixes the routing run to run from a trivial layout instead of the initial layout. By the time we do final routing for a trial we've already applied the selected initial layout to the SabreDAG. So the correct layout to use for running final swap mapping is a trivial layout where logical bit 0 is at physical bit 0. 
Using initial layout twice means we end up mapping more than is needed resulting in incorrect results. * Revert "Correctly apply selected layout on dag nodes" This change was incorrect, the output was already in the correct order and this was causing the behavior it strived to fix. This commit reverts the addition of the extra mem::swap() call to fix things. This reverts commit d98ef6c. * Deduplicate NLayout trivial layout creation This commit deduplicates the trivial layout generation for the NLayout class. Previously there were a few places both in rust and python that sabre layout was manually generating a trivial NLayout object. THis commit adds a static method to the NLayout class that allows both Python and Rust to easily create a new trivial NLayout object instead of manually creating the object. * Fix fixed layout tests after updates Since more recent commits fixed a few bugs in the behavior of the SabreLayout pass, the previously updated fixed layout tests were no longer correct. This commit updates the tests which were now failing because the layout changed again after fixing bugs in the new pass code. * Try nesting parallelism in the sabres Looking at profiles for running the new SabreLayout pass, as expected the runtime of the rust SabreSwap routines is dominating. This is because we've basically serialized the sabre swap routines and are running multiple seed trials. As an experiment this commit sets the inner SabreSwap routines to run in parallel too. Since the rayon algorithm uses a work stealing algorithm this hopefully shouldn't cause too much extra overhead, especially because the layout trials are quite fast. This ideally means we're just scheduling each sabre swap trial in a big parallel work queue and rayon does the rest of the magic to figure out how to execute things. Initial testing is showing an improvement for large circuits and a more modest improvement for more modest circuits. 
* Add skip_routing argument to preserve custom user provided routing

This commit adds a new argument, skip_routing, to the SabreLayout constructor. The intent of this new option is to enable mixing custom routing_method user arguments with SabreLayout in its new accelerated mode of operation. In the earlier commits, no matter what users specified, the preset pass manager construction would use SabreSwap for routing as it was run internally as part of layout. This meant doing something like: transpile(qc, backend, routing_method='stochastic') would really run SabreSwap, which is clearly not the user intent. To provide the layout benefits with multiple seed trials, this new argument allows disabling the application of the routing found. This comes with a runtime penalty because effectively we end up running routing twice and only using one of the results. But for custom user provided methods or plugins this seems like a reasonable tradeoff.

* Fix typo in docstring

* Update random seed usage in rust code

In #9132 we updated the random seed parameters in the Rust code for sabre swap to make the seed optional and default to initializing from entropy if it's not specified. This commit updates the usage to account for this change on main.

* s/retworkx/rustworkx/g

* Add alternate constructor for NLayout from a logic_to_phys vec

This commit adds a new constructor method to the NLayout class that builds an NLayout object from just a logic_to_phys Vec. This constructor can be accessed from either Rust or Python (although it's not as efficient from Python). This is used to simplify some of the SabreLayout Rust code that was doing this inline manually.

* Move layout embedding into a method

This commit moves the code the optimized SabreLayout pass was using to embed the found layout from the Rust code into a method. This will make it easier to refactor later if a more efficient pass manager path is added.
* Simplify pass logic and update comments

This commit removes an unnecessary else branch in the SabreLayout.run() code to make it slightly easier to read. At the same time some comments are updated to better explain the logic of the code.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
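
For readers following the "Fix seed setup when no user seed specified" item above, here is a minimal Python sketch of the pattern it describes. It is only an illustration under assumptions: `LayoutPassSketch` is a hypothetical stand-in, not the real SabreLayout class, and the standard-library `random` module stands in for whatever RNG the pass actually uses. The point is that the user's seed is stored untouched, and a fresh seed is drawn from entropy inside `run()` only when none was given.

```python
import random
from typing import Optional


class LayoutPassSketch:
    """Hypothetical stand-in for a pass object; not the real SabreLayout."""

    def __init__(self, seed: Optional[int] = None):
        # Remember only what the user asked for. run() must never
        # overwrite this, or later runs would silently reuse one seed.
        self.seed = seed

    def run(self) -> int:
        if self.seed is None:
            # No user seed: draw a fresh one from OS entropy on every call.
            rng = random.Random(random.SystemRandom().getrandbits(64))
        else:
            # User seed: every run is reproducible.
            rng = random.Random(self.seed)
        # Stand-in for "the result of the pass".
        return rng.getrandbits(32)
```

With a fixed seed, repeated `run()` calls on the same instance return the same value; with no seed, the instance's `seed` attribute stays `None` after running, so each call is randomized independently.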


mtreinish added the performance label Apr 22, 2022

mtreinish added this to the 0.21 milestone Apr 22, 2022

mtreinish requested a review from a team as a code owner April 22, 2022 14:02

mtreinish added 2 commits April 22, 2022 10:42

Make sabre_swap a separate Rust module

8095c24

This commit moves the sabre-specific code into a separate Rust module. We were already using a separate Python module for the sabre code; this just mirrors that in the Rust code for better organization.

Fix lint

6f80e93

jakelishman reviewed Apr 22, 2022


mtreinish added 11 commits April 22, 2022 12:48

Remove unnecessary parallel iteration

3e0fad6

This commit removes an unnecessary parallel iterator over the swap scores to find the minimum and just does it serially. The threading overhead of the parallel iterator is not worth paying because the serial scan is already fairly quick.
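
As a rough illustration of the serial scan this commit describes, the sketch below finds the minimum-score swap candidates in a single pass. It is written in Python for brevity rather than Rust, and the names (`best_swaps`, the `(swap, score)` tuple layout) are hypothetical, not taken from the PR. Returning all tied candidates, rather than just one, is an assumption made here so a caller could break ties however it likes.

```python
from typing import List, Tuple

Swap = Tuple[int, int]


def best_swaps(scored: List[Tuple[Swap, float]]) -> List[Swap]:
    """Return every swap that shares the minimum score, in one serial pass."""
    min_score = float("inf")
    best: List[Swap] = []
    for swap, score in scored:
        if score < min_score:
            # Strictly better score: restart the list of winners.
            min_score = score
            best = [swap]
        elif score == min_score:
            # Tie with the current minimum: keep this candidate too.
            best.append(swap)
    return best
```

A single linear pass like this does a comparison or two per candidate, which is why handing it to a thread pool tends to cost more than it saves.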

Revert change to DECAY_RESET_INTERVAL behavior

b76284e

Avoid Bit._index

eef49df

Merge remote-tracking branch 'origin/main' into RabreRwap

e39181a

Add __str__ definition for DEBUG logs

5ae5572

Cleanup greedy swap path

ccce496

Work with virtual indices in obtain swap

40285e7

Simplify decay reset() method

f3e5599

Fix lint

1b8c082

Fix typo

556b128

kevinhartman reviewed Apr 22, 2022


src/nlayout.rs (outdated, resolved)

src/nlayout.rs (outdated, resolved)

src/sabre_swap/swap_scores.rs (outdated, resolved)

mtreinish added the Rust This PR or issue is related to Rust code in the repository label Apr 25, 2022

mtreinish added 5 commits April 26, 2022 07:36

Rename nlayout methods

9640ed1

Update docstrings for SwapScores type

ce0ca51

Merge remote-tracking branch 'origin/main' into RabreRwap

4b608db

Use correct swap method for _undo_operations()

f78547e

Merge remote-tracking branch 'origin/main' into RabreRwap

4bb0ac8

jakelishman self-assigned this May 11, 2022

mtreinish mentioned this pull request May 12, 2022

Fix SabreSwap with classically conditioned gates #8041

Merged

Merge branch 'main' into RabreRwap

3c732bf

mtreinish changed the title ~~Reimplement SabreSwap heuristic scoring in multithreaded Rust~~ Reimplement SabreSwap heuristic scoring in Rust Jun 30, 2022

Use int32 for max default rng seed for windows compat

39c303b
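
The commit title does not say why int32 was needed. One plausible reading, stated here as an assumption, is that some numeric code on Windows uses a 32-bit default integer, so a default seed drawn from a wider range could overflow there. A minimal sketch of capping the default seed, using only the Python standard library and hypothetical names (`INT32_MAX`, `default_seed`):

```python
import random

# Largest value of a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def default_seed() -> int:
    """Draw a default seed from OS entropy that fits in a signed int32."""
    # randrange excludes the upper bound, so the result is in [0, INT32_MAX).
    return random.SystemRandom().randrange(0, INT32_MAX)
```

Any seed produced this way can be passed across a boundary that expects a 32-bit signed integer without truncation.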

mtreinish mentioned this pull request Jul 8, 2022

Move compose_u3 into Rust #8307

Merged

kevinhartman reviewed Jul 18, 2022


src/sabre_swap/edge_list.rs (outdated, resolved)

src/sabre_swap/mod.rs (outdated, resolved)

src/sabre_swap/qubits_decay.rs (outdated, resolved)

mtreinish and others added 4 commits July 18, 2022 15:45

Fix bounds check on custom sequence type's __getitem__

9ef8dfd

Co-authored-by: Kevin Hartman <kevin@hart.mn>
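
To show why a bounds check in `__getitem__` matters for a custom sequence type, here is a small Python sketch. The class name `ScoreSequenceSketch` is hypothetical and the PR's actual Rust types are not reproduced. When a class defines `__getitem__` but not `__iter__`, Python iterates by calling `__getitem__` with 0, 1, 2, ... and stops only when `IndexError` is raised, so an off-by-one in the check (for example accepting `idx == len`) breaks iteration. Rejecting negative indices is a simplification made in this sketch.

```python
class ScoreSequenceSketch:
    """Hypothetical read-only sequence wrapper, for illustration only."""

    def __init__(self, scores):
        self._scores = list(scores)

    def __len__(self):
        return len(self._scores)

    def __getitem__(self, idx: int):
        # Valid indices are 0 .. len-1. idx == len must raise IndexError,
        # otherwise the legacy iteration protocol never terminates cleanly.
        if not 0 <= idx < len(self._scores):
            raise IndexError("index out of range")
        return self._scores[idx]
```

With the check in place, `list(ScoreSequenceSketch([1.0, 2.0]))` yields the two stored values and then stops, because the call with index 2 raises `IndexError`.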

Merge remote-tracking branch 'origin/main' into RabreRwap

235e9a6

Merge branch 'main' into RabreRwap

f90b4d6

mtreinish requested a review from kevinhartman July 19, 2022 15:21

kevinhartman approved these changes Jul 19, 2022


kevinhartman added the automerge label Jul 19, 2022

mergify bot merged commit 6dd0d69 into Qiskit:main Jul 19, 2022

mtreinish deleted the RabreRwap branch July 19, 2022 16:49

mtreinish mentioned this pull request Jul 21, 2022

Further oxidize sabre #8388

Merged

2 tasks

mtreinish mentioned this pull request Aug 16, 2022

Use Sabre by default for optimization levels 1 and 2 #8552

Merged

mtreinish mentioned this pull request Nov 10, 2022

Oxidize SabreLayout pass #9116

Merged

4 tasks

coveralls commented Apr 22, 2022

Pull Request Test Coverage Report for Build 2697565720
