From 02a193919c66aaa6520144252104fc90ee326634 Mon Sep 17 00:00:00 2001 From: Jake Lishman Date: Wed, 9 Nov 2022 21:54:38 +0000 Subject: [PATCH] Calculate relative, not absolute, scores in SabreSwap (#9012) * Calculate relative, not absolute, scores in SabreSwap This is a significant performance improvement for very wide (100q+) circuits. The previous `SabreSwap` algorithm would score each candidate swap by applying the swap to the current layout, iterating through every element in the front layer and extended sets, summing the total distances of the 2q gates, and then undoing the swap. However, in the front layer, a given swap can affect at most two 2q operations. This new form instead scores each swap by calculating its total distance relative to the case where the swap is not made. This means that the basic score is formed from only one or two gates, no matter how many are in the front layer. This is an algorithmic complexity improvement in the scoring for volumetric circuits with respect to the number of qubits. Such a circuit with `n` qubits has `n / 2` gates in its front layer at all times, and so (assuming a coupling map that expands by a constant connectivity factor per qubit, like heavy hex) `k * n` swaps to score. This means that choosing the best swap has quadratic complexity with the original Sabre scoring algorithm. With this new algorithm, the score for a given swap is calculated in constant time, so choosing the best swap is instead linear. In practice, I did not see the full asymptotic improvement at the scales I tested, but the speed-ups were still significant - a 1081q heavy-hex quantum-volume circuit at depth 5 was swap-mapped on my machine in 25s with this commit compared to 100s before it.
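As an illustrative sketch of the new scoring (hypothetical helper names, not the code in this patch - the real implementation is `FrontLayer::score` in `layer.rs` below), the relative score of a candidate swap of virtual qubits `a` and `b` needs only the at-most-two front-layer gates that touch `a` or `b`:

    // Sketch only: `partner(q)` is assumed to return the other qubit of the
    // front-layer gate acting on `q` (if any), `phys[q]` is the physical qubit
    // currently assigned to virtual qubit `q`, and `dist` is the coupling-map
    // distance matrix.
    fn relative_score(
        a: usize,
        b: usize,
        partner: impl Fn(usize) -> Option<usize>,
        phys: &[usize],
        dist: &[Vec<f64>],
    ) -> f64 {
        let mut total = 0.0;
        if let Some(c) = partner(a) {
            // After the swap, `a` occupies `b`'s old physical qubit.
            total += dist[phys[b]][phys[c]] - dist[phys[a]][phys[c]];
        }
        if let Some(c) = partner(b) {
            // ... and `b` occupies `a`'s old physical qubit.
            total += dist[phys[a]][phys[c]] - dist[phys[b]][phys[c]];
        }
        total
    }

Every gate that touches neither `a` nor `b` contributes identically to all candidates, so ranking swaps by this delta produces the same order as ranking them by the full absolute cost.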
The principal change is the structs `FrontLayer` and `ExtendedSet`, which combine constant-time hash-map insertions, lookups and removals with vectors to enable constant-time lookup of the affected qubits. `FrontLayer` now only ever holds currently unroutable 2q operations; routable operations are immediately placed into the output structures as soon as a new 2q gate is routed. This routing is also done by exploiting that a swap can only affect two previously unroutable gates in the front layer; we just walk forwards along the outbound edges from those two nodes, adding any operations that become routable. This avoids another linear scan through the whole front layer after each swap, although in practice this has less of a speed-up effect, because that scan already wasn't quadratic. In theory, this change does not affect how any swap is scored relative to any other with the same front layer and extended set (though the scores used in the comparison do change). In order to precisely match the current implementation (and ensure reproducibility from a given seed), this tracks the insertion order of nodes into the front layer, including after removals. This commit completely modifies the internals of the Rust components of Sabre, although the actual algorithm is largely unchanged, aside from the scoring difference. Various parts of the implementation do change for efficiency, though. This commit maintains RNG compatibility with the previous Rust implementation in most cases. It is possible in some circuits for floating-point differences to cause different output, when several swaps are at the minimum score to within plus/minus 1 ULP. This happens in both the old and new forms of the implementation, but _which_ of the minimal swaps gets the minus-1-ULP score varies between them, and consequently affects the swap choice. In fairly extensive testing, this appears to be the only mechanism for differences; I've verified that the release-valve mechanism and predecessor-requirement tracking function identically to before. The resultant scores - relative for "basic" and "lookahead", absolute for "decay" - are in practice within 2 ULP of the old algorithm's. In maintaining RNG compatibility, this commit leaves several further speed-ups on the table. There is additional memory usage and tracking to maintain some required iteration orders, and some reordering checks that are not strictly necessary any more. Further, the sorting stages at the ends of the swap-choosing functions (to maintain repeatability) can be made redundant now, since some hash-set iteration (which is effectively an uncontrolled randomisation per run) is no longer required. These will be addressed in follow-ups. * Fix lint Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> --- .../transpiler/passes/routing/sabre_swap.py | 3 +- src/sabre_swap/layer.rs | 295 +++++++++ src/sabre_swap/mod.rs | 593 +++++++++--------- src/sabre_swap/sabre_dag.rs | 36 +- test/python/transpiler/test_sabre_swap.py | 32 + 5 files changed, 657 insertions(+), 302 deletions(-) create mode 100644 src/sabre_swap/layer.rs diff --git a/qiskit/transpiler/passes/routing/sabre_swap.py b/qiskit/transpiler/passes/routing/sabre_swap.py index 64fa06282866..8f1bdf4b676b 100644 --- a/qiskit/transpiler/passes/routing/sabre_swap.py +++ b/qiskit/transpiler/passes/routing/sabre_swap.py @@ -223,8 +223,7 @@ def run(self, dag): cargs, ) ) - front_layer = np.asarray([x._node_id for x in dag.front_layer()], dtype=np.uintp) - sabre_dag = SabreDAG(len(dag.qubits), len(dag.clbits), dag_list, front_layer) + sabre_dag = SabreDAG(len(dag.qubits), len(dag.clbits), dag_list) swap_map, gate_order = build_swap_map( len(dag.qubits), sabre_dag, diff --git a/src/sabre_swap/layer.rs b/src/sabre_swap/layer.rs new file mode 100644 index 000000000000..ccde01159bc0 --- /dev/null +++ b/src/sabre_swap/layer.rs @@ -0,0 +1,295 @@ +// This code is part of Qiskit. +// +// (C) Copyright IBM 2022 +// +// This code is licensed under the Apache License, Version 2.0. You may +// obtain a copy of this license in the LICENSE.txt file in the root directory +// of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. +// +// Any modifications or derivative works of this code must retain this +// copyright notice, and modified files need to carry a notice indicating +// that they have been altered from the originals. + +use hashbrown::HashMap; +use ndarray::prelude::*; +use retworkx_core::petgraph::prelude::*; + +use crate::nlayout::NLayout; + +/// A container for the current non-routable parts of the front layer. This only ever holds +/// two-qubit gates; the only reason a 0q- or 1q operation can be unroutable is because it has an +/// unsatisfied 2q predecessor, which disqualifies it from being in the front layer. +pub struct FrontLayer { + /// Map of the (index to the) node to the qubits it acts on. + nodes: HashMap<NodeIndex, [usize; 2]>, + /// Map of each qubit to the node that acts on it and the other qubit that node acts on, if this + /// qubit is active (otherwise `None`). + qubits: Vec<Option<(NodeIndex, usize)>>, + /// Tracking the insertion order of nodes, so iteration can always go through them in a + /// deterministic order. This is important for reproducibility from a set seed - when building + /// up the extended set with a fixed, finite size, the iteration order through the nodes of the + /// front layer is important.
We need to maintain the insertion order even with removals from + /// the layer. + iteration_order: Vec<Option<NodeIndex>>, + /// The index of the first populated entry in the `iteration_order`. If the iteration order is + /// empty, this will be 0. + iteration_start: usize, + /// The index one past the last populated entry in the `iteration_order`. If the iteration + /// order is empty, this will be 0. + iteration_end: usize, +} + +impl FrontLayer { + pub fn new(num_qubits: usize) -> Self { + FrontLayer { + // This is the maximum capacity of the front layer, since each qubit must be one of a + // pair, and can only have one gate in the layer. + nodes: HashMap::with_capacity(num_qubits / 2), + qubits: vec![None; num_qubits], + iteration_order: vec![None; num_qubits], + iteration_start: 0, + iteration_end: 0, + } + } + + /// Add a node into the front layer, with the two qubits it operates on. This usually has + /// constant-time complexity, except if the iteration-order buffer is full. + pub fn insert(&mut self, index: NodeIndex, qubits: [usize; 2]) { + let [a, b] = qubits; + self.qubits[a] = Some((index, b)); + self.qubits[b] = Some((index, a)); + self.nodes.insert(index, qubits); + + self.iteration_order[self.iteration_end] = Some(index); + self.iteration_end += 1; + if self.iteration_end == self.iteration_order.len() { + // Condense items back to the start of the vector. + let mut ptr = 0; + for i in self.iteration_start..self.iteration_end { + if let Some(value) = self.iteration_order[i] { + self.iteration_order[i] = None; + self.iteration_order[ptr] = Some(value); + ptr += 1; + } + } + self.iteration_start = 0; + self.iteration_end = ptr; + } + } + + /// Remove a node from the front layer. + pub fn remove(&mut self, index: &NodeIndex) { + let [q0, q1] = self.nodes.remove(index).unwrap(); + self.qubits[q0] = None; + self.qubits[q1] = None; + + // If the element was at the start of the iteration order, advance the pointer. + match self.iteration_order[self.iteration_start] { + Some(a) if a == *index => { + self.iteration_order[self.iteration_start] = None; + if self.iteration_start + 1 == self.iteration_end { + self.iteration_start = 0; + self.iteration_end = 0; + } + while self.iteration_start < self.iteration_end + && self.iteration_order[self.iteration_start].is_none() + { + self.iteration_start += 1; + } + } + _ => (), + } + // Search through and remove the element. We leave a gap and preserve the insertion order. + for i in (self.iteration_start + 1)..self.iteration_end { + match self.iteration_order[i] { + Some(a) if a == *index => { + self.iteration_order[i] = None; + break; + } + _ => (), + } + } + } + + /// Query whether a qubit has an active node. + #[inline] + pub fn is_active(&self, qubit: usize) -> bool { + self.qubits[qubit].is_some() + } + + /// Calculate the score _difference_ caused by this swap, compared to not making the swap. + #[inline] + pub fn score(&self, swap: [usize; 2], layout: &NLayout, dist: &ArrayView2<f64>) -> f64 { + if self.is_empty() { + return 0.0; + } + // At most there can be two affected gates in the front layer (one on each qubit in the + // swap), since any gate whose closest path passes through the swapped qubit link has its + // "virtual-qubit path" order changed, but not the total weight. In theory, we should + // never consider the same gate in both `if let` branches, because if we did, the gate would + // already be routable. It doesn't matter, though, because the two distances would be + // equal anyway, and so would not affect the score.
+ let [a, b] = swap; + let mut total = 0.0; + if let Some((_, c)) = self.qubits[a] { + let p_c = layout.logic_to_phys[c]; + total += dist[[layout.logic_to_phys[b], p_c]] - dist[[layout.logic_to_phys[a], p_c]] + } + if let Some((_, c)) = self.qubits[b] { + let p_c = layout.logic_to_phys[c]; + total += dist[[layout.logic_to_phys[a], p_c]] - dist[[layout.logic_to_phys[b], p_c]] + } + total / self.nodes.len() as f64 + } + + /// Calculate the total absolute score of the current front layer on the given layout. + pub fn total_score(&self, layout: &NLayout, dist: &ArrayView2<f64>) -> f64 { + if self.is_empty() { + return 0.0; + } + self.iter() + .map(|(_, &[l_a, l_b])| dist[[layout.logic_to_phys[l_a], layout.logic_to_phys[l_b]]]) + .sum::<f64>() + / self.nodes.len() as f64 + } + + /// Populate a vector of nodes that would be routable if the given swap was applied to a layout. This + /// mutates `routable` to avoid heap allocations in the main logic loop. + pub fn routable_after( + &self, + routable: &mut Vec<NodeIndex>, + swap: &[usize; 2], + layout: &NLayout, + coupling: &DiGraph<(), ()>, + ) { + let [a, b] = *swap; + if let Some((node, c)) = self.qubits[a] { + if coupling.contains_edge( + NodeIndex::new(layout.logic_to_phys[b]), + NodeIndex::new(layout.logic_to_phys[c]), + ) { + routable.push(node); + } + } + if let Some((node, c)) = self.qubits[b] { + if coupling.contains_edge( + NodeIndex::new(layout.logic_to_phys[a]), + NodeIndex::new(layout.logic_to_phys[c]), + ) { + routable.push(node); + } + } + } + + /// True if there are no nodes in the current layer. + #[inline] + pub fn is_empty(&self) -> bool { + self.nodes.is_empty() + } + + /// Iterator over the nodes and the pair of qubits they act on. + pub fn iter(&self) -> impl Iterator<Item = (&NodeIndex, &[usize; 2])> { + (&self.iteration_order)[self.iteration_start..self.iteration_end] + .iter() + .filter_map(move |node_opt| node_opt.as_ref().map(|node| (node, &self.nodes[node]))) + } + + /// Iterator over the nodes. + pub fn iter_nodes(&self) -> impl Iterator<Item = &NodeIndex> { + (&self.iteration_order)[self.iteration_start..self.iteration_end] + .iter() + .filter_map(|node_opt| node_opt.as_ref()) + } + + /// Iterator over the qubits that have active nodes on them. + pub fn iter_active(&self) -> impl Iterator<Item = &usize> { + (&self.iteration_order)[self.iteration_start..self.iteration_end] + .iter() + .filter_map(move |node_opt| node_opt.as_ref().map(|node| &self.nodes[node])) + .flatten() + } +} + +/// This is largely similar to the `FrontLayer` struct, but does not need to track the insertion +/// order of the nodes, and can have more than one node on each active qubit. This does not have a +/// `remove` method (and its data structures aren't optimised for fast removal), since the extended +/// set is built from scratch each time a new gate is routed. +pub struct ExtendedSet { + nodes: HashMap<NodeIndex, [usize; 2]>, + qubits: Vec<Vec<usize>>, +} + +impl ExtendedSet { + pub fn new(num_qubits: usize, max_size: usize) -> Self { + ExtendedSet { + nodes: HashMap::with_capacity(max_size), + qubits: vec![Vec::new(); num_qubits], + } + } + + /// Add a node and its active qubits to the extended set. + pub fn insert(&mut self, index: NodeIndex, qubits: &[usize; 2]) -> bool { + let [a, b] = *qubits; + if self.nodes.insert(index, *qubits).is_none() { + self.qubits[a].push(b); + self.qubits[b].push(a); + true + } else { + false + } + } + + /// Calculate the score of applying the given swap, relative to not applying it.
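+ /// Only gates in the set that touch one of the swapped qubits contribute; as in `FrontLayer::score`, the total is normalised by the number of nodes in the set.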
+ pub fn score(&self, swap: [usize; 2], layout: &NLayout, dist: &ArrayView2<f64>) -> f64 { + if self.nodes.is_empty() { + return 0.0; + } + let [l_a, l_b] = swap; + let p_a = layout.logic_to_phys[l_a]; + let p_b = layout.logic_to_phys[l_b]; + let mut total = 0.0; + for &l_other in self.qubits[l_a].iter() { + // If the other qubit is also active then the score won't have changed, but since the + // distance is absolute, we'd double count rather than ignore if we didn't skip it. + if l_other == l_b { + continue; + } + let p_other = layout.logic_to_phys[l_other]; + total += dist[[p_b, p_other]] - dist[[p_a, p_other]]; + } + for &l_other in self.qubits[l_b].iter() { + if l_other == l_a { + continue; + } + let p_other = layout.logic_to_phys[l_other]; + total += dist[[p_a, p_other]] - dist[[p_b, p_other]]; + } + total / self.nodes.len() as f64 + } + + /// Calculate the total absolute score of this set of nodes over the given layout. + pub fn total_score(&self, layout: &NLayout, dist: &ArrayView2<f64>) -> f64 { + if self.nodes.is_empty() { + return 0.0; + } + self.nodes + .iter() + .map(|(_, &[l_a, l_b])| dist[[layout.logic_to_phys[l_a], layout.logic_to_phys[l_b]]]) + .sum::<f64>() + / self.nodes.len() as f64 + } + + /// Clear all nodes from the extended set. + pub fn clear(&mut self) { + for &[a, b] in self.nodes.values() { + self.qubits[a].clear(); + self.qubits[b].clear(); + } + self.nodes.clear() + } + + /// Number of nodes in the set. + pub fn len(&self) -> usize { + self.nodes.len() + } +} diff --git a/src/sabre_swap/mod.rs b/src/sabre_swap/mod.rs index 1798a0dd0e39..f25e7acb06a6 100644 --- a/src/sabre_swap/mod.rs +++ b/src/sabre_swap/mod.rs @@ -12,13 +12,14 @@ #![allow(clippy::too_many_arguments)] +pub mod layer; pub mod neighbor_table; pub mod sabre_dag; pub mod swap_map; use std::cmp::Ordering; -use hashbrown::{HashMap, HashSet}; +use hashbrown::HashMap; use ndarray::prelude::*; use numpy::IntoPyArray; use numpy::PyReadonlyArray2; @@ -37,6 +38,7 @@ use retworkx_core::shortest_path::dijkstra; use crate::getenv_use_multiple_threads; use crate::nlayout::NLayout; +use layer::{ExtendedSet, FrontLayer}; use neighbor_table::NeighborTable; use sabre_dag::SabreDAG; use swap_map::SwapMap; @@ -65,72 +67,62 @@ struct TrialResult { /// on hardware and the physical qubits in that neighborhood. Every SWAP /// on virtual qubits that corresponds to one of those physical couplings /// is a candidate SWAP. -/// -/// Candidate swaps are sorted so SWAP(i,j) and SWAP(j,i) are not duplicated.
-fn obtain_swaps( - front_layer: &[[usize; 2]], - neighbors: &NeighborTable, - layout: &NLayout, -) -> HashSet<[usize; 2]> { - // This will likely under allocate as it's a function of the number of - // neighbors for the qubits in the layer too, but this is basically a - // minimum allocation assuming each qubit has only 1 unique neighbor - let mut candidate_swaps: HashSet<[usize; 2]> = HashSet::with_capacity(2 * front_layer.len()); - for node in front_layer { - for v in node { - let physical = layout.logic_to_phys[*v]; - for neighbor in &neighbors.neighbors[physical] { - let virtual_neighbor = layout.phys_to_logic[*neighbor]; - let swap: [usize; 2] = if &virtual_neighbor > v { - [*v, virtual_neighbor] +fn obtain_swaps<'a>( + front_layer: &'a FrontLayer, + neighbors: &'a NeighborTable, + layout: &'a NLayout, +) -> impl Iterator<Item = [usize; 2]> + 'a { + front_layer.iter_active().flat_map(move |&v| { + neighbors.neighbors[layout.logic_to_phys[v]] + .iter() + .filter_map(move |&neighbor| { + let virtual_neighbor = layout.phys_to_logic[neighbor]; + if virtual_neighbor > v || !front_layer.is_active(virtual_neighbor) { + // This normalisation is only necessary so that the swap-sorting stage later + // produces output equal to the previous version of this algorithm; it can be + // removed when we break that matching. It isn't needed for determinism. + if v < virtual_neighbor { + Some([v, virtual_neighbor]) + } else { + Some([virtual_neighbor, v]) + } } else { - [virtual_neighbor, *v] - }; - candidate_swaps.insert(swap); - } - } - } - candidate_swaps + None + } + }) + }) } -fn obtain_extended_set( +/// Fill the given `extended_set` with the next nodes that would be reachable after the front layer +/// (and themselves). This uses `required_predecessors` as scratch space for efficiency, but +/// returns it to the same state as the input on return. +fn populate_extended_set( + extended_set: &mut ExtendedSet, dag: &SabreDAG, - front_layer: &[NodeIndex], + front_layer: &FrontLayer, required_predecessors: &mut [u32], -) -> Vec<[usize; 2]> { - let mut extended_set: Vec<[usize; 2]> = Vec::new(); - let mut decremented: Vec<usize> = Vec::new(); - let mut tmp_front_layer: Vec<NodeIndex> = front_layer.to_vec(); - let mut done: bool = false; - while !tmp_front_layer.is_empty() && !done { - let mut new_tmp_front_layer = Vec::new(); - for node in tmp_front_layer { - for edge in dag.dag.edges(node) { - let successor_index = edge.target(); - let successor = successor_index.index(); - decremented.push(successor); - required_predecessors[successor] -= 1; - if required_predecessors[successor] == 0 { - new_tmp_front_layer.push(successor_index); - let node_weight = dag.dag.node_weight(successor_index).unwrap(); - let qargs = &node_weight.1; - if qargs.len() == 2 { - let extended_set_edges: [usize; 2] = [qargs[0], qargs[1]]; - extended_set.push(extended_set_edges); - } +) { + let mut to_visit = front_layer.iter_nodes().copied().collect::<Vec<_>>(); + let mut decremented: HashMap<usize, u32> = HashMap::new(); + let mut i = 0; + while i < to_visit.len() && extended_set.len() < EXTENDED_SET_SIZE { + for edge in dag.dag.edges_directed(to_visit[i], Direction::Outgoing) { + let successor_node = edge.target(); + let successor_index = successor_node.index(); + *decremented.entry(successor_index).or_insert(0) += 1; + required_predecessors[successor_index] -= 1; + if required_predecessors[successor_index] == 0 { + if let [a, b] = dag.dag[successor_node].1[..]
{ + extended_set.insert(successor_node, &[a, b]); } - } - if extended_set.len() >= EXTENDED_SET_SIZE { - done = true; - break; + to_visit.push(successor_node); } } - tmp_front_layer = new_tmp_front_layer; + i += 1; } - for node in decremented { - required_predecessors[node] += 1; + for (node, amount) in decremented.iter() { + required_predecessors[*node] += *amount; } - extended_set } fn cmap_from_neighor_table(neighbor_table: &NeighborTable) -> DiGraph<(), ()> { @@ -162,7 +154,7 @@ pub fn build_swap_map( layout: &mut NLayout, num_trials: usize, ) -> (SwapMap, PyObject) { - let run_in_parallel = getenv_use_multiple_threads(); + let run_in_parallel = getenv_use_multiple_threads() && num_trials > 1; let dist = distance_matrix.as_array(); let coupling_graph: DiGraph<(), ()> = cmap_from_neighor_table(neighbor_table); let outer_rng = Pcg64Mcg::seed_from_u64(seed); @@ -235,12 +227,11 @@ fn swap_map_trial( mut layout: NLayout, ) -> TrialResult { let max_iterations_without_progress = 10 * neighbor_table.neighbors.len(); - let mut gate_order: Vec<usize> = Vec::with_capacity(dag.dag.node_count()); - let mut ops_since_progress: Vec<[usize; 2]> = Vec::new(); let mut out_map: HashMap<usize, Vec<[usize; 2]>> = HashMap::new(); - let mut front_layer: Vec<NodeIndex> = dag.first_layer.clone(); + let mut gate_order = Vec::with_capacity(dag.dag.node_count()); + let mut front_layer = FrontLayer::new(num_qubits); + let mut extended_set = ExtendedSet::new(num_qubits, EXTENDED_SET_SIZE); let mut required_predecessors: Vec<u32> = vec![0; dag.dag.node_count()]; - let mut extended_set: Option<Vec<[usize; 2]>> = None; let mut num_search_steps: u8 = 0; let mut qubits_decay: Vec<f64> = vec![1.; num_qubits]; let mut rng = Pcg64Mcg::seed_from_u64(seed); @@ -250,158 +241,81 @@ required_predecessors[edge.target().index()] += 1; } } + route_reachable_nodes( + &dag.first_layer, + dag, + &layout, + coupling_graph, + &mut gate_order, + &mut front_layer, + &mut required_predecessors, + ); + populate_extended_set( + &mut extended_set, + dag, + &front_layer, + &mut required_predecessors, + ); + // Main logic loop; the front layer only becomes empty when all nodes have been routed. At + // each iteration of this loop, we route either one or two gates. + let mut routable_nodes = Vec::<NodeIndex>::with_capacity(2); while !front_layer.is_empty() { - let mut execute_gate_list: Vec<NodeIndex> = Vec::new(); - // Remove as many immediately applicable gates as possible - let mut new_front_layer: Vec<NodeIndex> = Vec::new(); - for node in front_layer { - let node_weight = dag.dag.node_weight(node).unwrap(); - let qargs = &node_weight.1; - if qargs.len() == 2 { - let physical_qargs: [usize; 2] = [ - layout.logic_to_phys[qargs[0]], - layout.logic_to_phys[qargs[1]], - ]; - if coupling_graph - .find_edge( - NodeIndex::new(physical_qargs[0]), - NodeIndex::new(physical_qargs[1]), - ) - .is_none() - { - new_front_layer.push(node); - } else { - execute_gate_list.push(node); - } + let mut current_swaps: Vec<[usize; 2]> = Vec::new(); + // Swap-mapping loop. This is the main part of the algorithm, which we repeat until we + // either successfully route a node, or exceed the maximum number of attempts.
+ while routable_nodes.is_empty() && current_swaps.len() <= max_iterations_without_progress { + let best_swap = choose_best_swap( + &front_layer, + &extended_set, + &layout, + neighbor_table, + dist, + &qubits_decay, + heuristic, + &mut rng, + ); + front_layer.routable_after(&mut routable_nodes, &best_swap, &layout, coupling_graph); + current_swaps.push(best_swap); + layout.swap_logical(best_swap[0], best_swap[1]); + num_search_steps += 1; + if num_search_steps >= DECAY_RESET_INTERVAL { + qubits_decay.fill(1.); + num_search_steps = 0; } else { - execute_gate_list.push(node); + qubits_decay[best_swap[0]] += DECAY_RATE; + qubits_decay[best_swap[1]] += DECAY_RATE; } } - front_layer = new_front_layer.clone(); - - // Backtrack to the last time we made progress, then greedily insert swaps to route - // the gate with the smallest distance between its arguments. This is a release - // valve for the algorithm to avoid infinite loops only, and should generally not - // come into play for most circuits. - if execute_gate_list.is_empty() - && ops_since_progress.len() > max_iterations_without_progress - { - // If we're stuck in a loop without making progress, first undo the swaps: - ops_since_progress - .drain(..) - .rev() - .for_each(|swap| layout.swap_logical(swap[0], swap[1])); - // Then pick the closest pair in the current layer - let target_qubits = front_layer - .iter() - .map(|n| { - let node_weight = dag.dag.node_weight(*n).unwrap(); - let qargs = &node_weight.1; - [qargs[0], qargs[1]] - }) - .min_by(|qargs_a, qargs_b| { - let dist_a = dist[[ - layout.logic_to_phys[qargs_a[0]], - layout.logic_to_phys[qargs_a[1]], - ]]; - let dist_b = dist[[ - layout.logic_to_phys[qargs_b[0]], - layout.logic_to_phys[qargs_b[1]], - ]]; - dist_a.partial_cmp(&dist_b).unwrap_or(Ordering::Equal) - }) - .unwrap(); - // Find the shortest path between the target qubits - let mut shortest_paths: DictMap<NodeIndex, Vec<NodeIndex>> = DictMap::new(); - let u = layout.logic_to_phys[target_qubits[0]]; - let v = layout.logic_to_phys[target_qubits[1]]; - (dijkstra( - &coupling_graph, - NodeIndex::<u32>::new(u), - Some(NodeIndex::<u32>::new(v)), - |_| Ok(1.), - Some(&mut shortest_paths), - ) as PyResult<Vec<Option<f64>>>) - .unwrap(); - let shortest_path: Vec<usize> = shortest_paths - .get(&NodeIndex::new(v)) - .unwrap() - .iter() - .map(|n| n.index()) - .collect(); - // Insert greedy swaps along that shortest path - let split: usize = shortest_path.len() / 2; - let forwards = &shortest_path[1..split]; - let backwards = &shortest_path[split..shortest_path.len() - 1]; - let mut greedy_swaps: Vec<[usize; 2]> = Vec::with_capacity(split); - for swap in forwards { - let logical_swap_bit = layout.phys_to_logic[*swap]; - greedy_swaps.push([target_qubits[0], logical_swap_bit]); - layout.swap_logical(target_qubits[0], logical_swap_bit); - } - backwards.iter().rev().for_each(|swap| { - let logical_swap_bit = layout.phys_to_logic[*swap]; - greedy_swaps.push([target_qubits[1], logical_swap_bit]); - layout.swap_logical(target_qubits[1], logical_swap_bit); - }); - ops_since_progress = greedy_swaps; - continue; - } - if !execute_gate_list.is_empty() { - for node in execute_gate_list { - let node_weight = dag.dag.node_weight(node).unwrap(); - gate_order.push(node_weight.0); - let out_swaps: Vec<[usize; 2]> = ops_since_progress.drain(..).collect(); - if !out_swaps.is_empty() { - out_map.insert(dag.dag.node_weight(node).unwrap().0, out_swaps); - } - for edge in dag.dag.edges(node) { - let successor = edge.target().index(); - required_predecessors[successor] -= 1; - if required_predecessors[successor] == 0 { -
front_layer.push(edge.target()); - } - } + // If we exceeded the number of allowed attempts without successfully routing a node, we + // reset back to the state we were in last time we routed a node, then find the node in the + // front layer whose qubits are the closest in the coupling map, and greedily insert swaps + // to make the node routable. We could clone the layout each time we route a gate, but + // this path is only an escape mechanism for the algorithm getting stuck, so it should + // ideally never be taken, and it doesn't matter if it's not the speediest---it's better to + // keep the other path faster. + if routable_nodes.is_empty() { + undo_swaps(&mut current_swaps, &mut layout); + let (node, qubits) = closest_operation(&front_layer, &layout, dist); + swaps_to_route(&mut current_swaps, &qubits, &layout, coupling_graph); + for &[a, b] in current_swaps.iter() { + layout.swap_logical(a, b); } - qubits_decay.fill_with(|| 1.); - extended_set = None; - continue; + routable_nodes.push(node); } - let first_layer: Vec<[usize; 2]> = front_layer - .iter() - .map(|n| { - let node_weight = dag.dag.node_weight(*n).unwrap(); - let qargs = &node_weight.1; - [qargs[0], qargs[1]] - }) - .collect(); - if extended_set.is_none() { - extended_set = Some(obtain_extended_set( - dag, - &front_layer, - &mut required_predecessors, - )); - } - - let best_swap = sabre_score_heuristic( - &first_layer, - &mut layout, - neighbor_table, - extended_set.as_ref().unwrap(), - dist, - &qubits_decay, - heuristic, - &mut rng, + update_route( + &routable_nodes, + current_swaps, + dag, + &layout, + coupling_graph, + &mut gate_order, + &mut out_map, + &mut front_layer, + &mut extended_set, + &mut required_predecessors, ); - num_search_steps += 1; - if num_search_steps >= DECAY_RESET_INTERVAL { - qubits_decay.fill_with(|| 1.); - num_search_steps = 0; - } else { - qubits_decay[best_swap[0]] += DECAY_RATE; - qubits_decay[best_swap[1]] += DECAY_RATE; - } - ops_since_progress.push(best_swap); + qubits_decay.fill(1.); + routable_nodes.clear(); } TrialResult { out_map, @@ -410,99 +324,216 @@ } } -fn sabre_score_heuristic( - layer: &[[usize; 2]], - layout: &mut NLayout, - neighbor_table: &NeighborTable, - extended_set: &[[usize; 2]], - dist: &ArrayView2<f64>, - qubits_decay: &[f64], - heuristic: &Heuristic, - rng: &mut Pcg64Mcg, -) -> [usize; 2] { - // Run in parallel only if we're not already in a multiprocessing context - // unless force threads is set. - let candidate_swaps = obtain_swaps(layer, neighbor_table, layout); - let mut min_score = f64::MAX; - let mut best_swaps: Vec<[usize; 2]> = Vec::new(); - for swap_qubits in candidate_swaps { - layout.swap_logical(swap_qubits[0], swap_qubits[1]); - let score = score_heuristic( - heuristic, - layer, - extended_set, - layout, - &swap_qubits, - dist, - qubits_decay, - ); - if score < min_score { - min_score = score; - best_swaps.clear(); - best_swaps.push(swap_qubits); - } else if score == min_score { - best_swaps.push(swap_qubits); +/// Update the system state as the given `nodes` are added to the routing order, preceded by the +/// given `swaps`. This involves updating the output values `gate_order` and `out_map`, but also +/// the tracking objects `front_layer`, `extended_set` and `required_predecessors` by removing the +/// routed nodes and adding any now-reachable ones.
+fn update_route( + nodes: &[NodeIndex], + swaps: Vec<[usize; 2]>, + dag: &SabreDAG, + layout: &NLayout, + coupling: &DiGraph<(), ()>, + gate_order: &mut Vec<usize>, + out_map: &mut HashMap<usize, Vec<[usize; 2]>>, + front_layer: &mut FrontLayer, + extended_set: &mut ExtendedSet, + required_predecessors: &mut [u32], +) { + // The first node gets the swaps attached. We don't add to the `gate_order` here because + // `route_reachable_nodes` is responsible for that part. + let py_node = dag.dag[nodes[0]].0; + out_map.insert(py_node, swaps); + for node in nodes { + front_layer.remove(node); + } + route_reachable_nodes( + nodes, + dag, + layout, + coupling, + gate_order, + front_layer, + required_predecessors, + ); + // Ideally we'd know how to mutate the extended set directly, but since its limited size ties + // its construction strongly to the iteration order through the front layer, it's not easy to + // do better than just emptying it and rebuilding. + extended_set.clear(); + populate_extended_set(extended_set, dag, front_layer, required_predecessors); +} + +/// Search forwards in the `dag` from all the nodes in `to_visit`, adding them to the `gate_order` +/// or the current `front_layer` as appropriate, and continue inspecting gates until there are +/// none left whose required predecessors are all satisfied. +/// +/// The nodes in `to_visit` should all already have no further required predecessors. +fn route_reachable_nodes( + to_visit: &[NodeIndex], + dag: &SabreDAG, + layout: &NLayout, + coupling: &DiGraph<(), ()>, + gate_order: &mut Vec<usize>, + front_layer: &mut FrontLayer, + required_predecessors: &mut [u32], +) { + let mut to_visit = to_visit.to_vec(); + let mut i = 0; + // Iterate through `to_visit`, except we often push new nodes onto the end of it. + while i < to_visit.len() { + let node = to_visit[i]; + i += 1; + let (py_node, qubits) = &dag.dag[node]; + match qubits[..] { + [a, b] + if !coupling.contains_edge( + NodeIndex::new(layout.logic_to_phys[a]), + NodeIndex::new(layout.logic_to_phys[b]), + ) => + { + front_layer.insert(node, [a, b]); + } + _ => { + gate_order.push(*py_node); + for edge in dag.dag.edges_directed(node, Direction::Outgoing) { + let successor_node = edge.target(); + let successor_index = successor_node.index(); + required_predecessors[successor_index] -= 1; + if required_predecessors[successor_index] == 0 { + to_visit.push(successor_node); + } + } + } } - layout.swap_logical(swap_qubits[0], swap_qubits[1]); } - best_swaps.sort_unstable(); - let best_swap = *best_swaps.choose(rng).unwrap(); - layout.swap_logical(best_swap[0], best_swap[1]); - best_swap } -#[inline] -fn compute_cost(layer: &[[usize; 2]], layout: &NLayout, dist: &ArrayView2<f64>) -> f64 { - layer - .iter() - .map(|gate| dist[[layout.logic_to_phys[gate[0]], layout.logic_to_phys[gate[1]]]]) - .sum() +/// Walk through the swaps in the given vector, undoing them on the layout and removing them. +fn undo_swaps(swaps: &mut Vec<[usize; 2]>, layout: &mut NLayout) { + swaps + .drain(..) + .rev() + .for_each(|swap| layout.swap_logical(swap[0], swap[1])); } -fn score_lookahead( - layer: &[[usize; 2]], - extended_set: &[[usize; 2]], +/// Find the node index and its associated virtual qubits that is currently the closest to being +/// routable, in terms of the number of swaps required. +fn closest_operation( + front_layer: &FrontLayer, layout: &NLayout, dist: &ArrayView2<f64>, -) -> f64 { - let mut first_cost = compute_cost(layer, layout, dist); - first_cost /= layer.len() as f64; - let second_cost = if extended_set.is_empty() { - 0.
- } else { - compute_cost(extended_set, layout, dist) / extended_set.len() as f64 - }; - first_cost + EXTENDED_SET_WEIGHT * second_cost +) -> (NodeIndex, [usize; 2]) { + let (&node, qubits) = front_layer + .iter() + .map(|(node, qubits)| { + ( + node, + [ + layout.logic_to_phys[qubits[0]], + layout.logic_to_phys[qubits[1]], + ], + ) + }) + .min_by(|(_, qubits_a), (_, qubits_b)| { + dist[*qubits_a] + .partial_cmp(&dist[*qubits_b]) + .unwrap_or(Ordering::Equal) + }) + .unwrap(); + ( + node, + [ + layout.phys_to_logic[qubits[0]], + layout.phys_to_logic[qubits[1]], + ], + ) } -fn score_decay( - layer: &[[usize; 2]], - extended_set: &[[usize; 2]], +/// Add the minimal set of swaps to the `swaps` vector that bring the two `qubits` together so that +/// a 2q gate on them could be routed. +fn swaps_to_route( + swaps: &mut Vec<[usize; 2]>, + qubits: &[usize; 2], layout: &NLayout, - dist: &ArrayView2<f64>, - swap_qubits: &[usize; 2], - qubits_decay: &[f64], -) -> f64 { - let total_cost = score_lookahead(layer, extended_set, layout, dist); - qubits_decay[swap_qubits[0]].max(qubits_decay[swap_qubits[1]]) * total_cost + coupling_graph: &DiGraph<(), ()>, +) { + let mut shortest_paths: DictMap<NodeIndex, Vec<NodeIndex>> = DictMap::new(); + let u = layout.logic_to_phys[qubits[0]]; + let v = layout.logic_to_phys[qubits[1]]; + (dijkstra( + coupling_graph, + NodeIndex::<u32>::new(u), + Some(NodeIndex::<u32>::new(v)), + |_| Ok(1.), + Some(&mut shortest_paths), + ) as PyResult<Vec<Option<f64>>>) + .unwrap(); + let shortest_path: Vec<usize> = shortest_paths + .get(&NodeIndex::new(v)) + .unwrap() + .iter() + .map(|n| n.index()) + .collect(); + // Insert greedy swaps along that shortest path + let split: usize = shortest_path.len() / 2; + let forwards = &shortest_path[1..split]; + let backwards = &shortest_path[split..shortest_path.len() - 1]; + swaps.reserve(shortest_path.len() - 2); + for swap in forwards { + swaps.push([qubits[0], layout.phys_to_logic[*swap]]); + } + for swap in backwards.iter().rev() { + swaps.push([qubits[1], layout.phys_to_logic[*swap]]); + } } -fn score_heuristic( - heuristic: &Heuristic, - layer: &[[usize; 2]], - extended_set: &[[usize; 2]], +/// Return the swap of two virtual qubits that produces the best score of all possible swaps. +fn choose_best_swap( + layer: &FrontLayer, + extended_set: &ExtendedSet, layout: &NLayout, - swap_qubits: &[usize; 2], + neighbor_table: &NeighborTable, dist: &ArrayView2<f64>, qubits_decay: &[f64], -) -> f64 { - match heuristic { - Heuristic::Basic => compute_cost(layer, layout, dist), - Heuristic::Lookahead => score_lookahead(layer, extended_set, layout, dist), + heuristic: &Heuristic, + rng: &mut Pcg64Mcg, +) -> [usize; 2] { + let mut min_score = f64::MAX; + let mut best_swaps: Vec<[usize; 2]> = Vec::new(); + // The decay heuristic is the only one that actually needs the absolute score.
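+ // For the "basic" and "lookahead" heuristics, the relative deltas alone rank the candidate swaps identically, so the common absolute part of the score can be skipped.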
+ let absolute_score = match heuristic { Heuristic::Decay => { - score_decay(layer, extended_set, layout, dist, swap_qubits, qubits_decay) + layer.total_score(layout, dist) + + EXTENDED_SET_WEIGHT * extended_set.total_score(layout, dist) } + _ => 0.0, + }; + for swap in obtain_swaps(layer, neighbor_table, layout) { + let score = match heuristic { + Heuristic::Basic => layer.score(swap, layout, dist), + Heuristic::Lookahead => { + layer.score(swap, layout, dist) + + EXTENDED_SET_WEIGHT * extended_set.score(swap, layout, dist) + } + Heuristic::Decay => { + qubits_decay[swap[0]].max(qubits_decay[swap[1]]) + * (absolute_score + + layer.score(swap, layout, dist) + + EXTENDED_SET_WEIGHT * extended_set.score(swap, layout, dist)) + } + }; + if score < min_score { + min_score = score; + best_swaps.clear(); + best_swaps.push(swap); + } else if score == min_score { + best_swaps.push(swap); + } + } + if best_swaps.len() > 1 { + best_swaps.sort_unstable(); } + *best_swaps.choose(rng).unwrap() } #[pymodule] diff --git a/src/sabre_swap/sabre_dag.rs b/src/sabre_swap/sabre_dag.rs index 480dbb00323e..7bec6b9a6ac8 100644 --- a/src/sabre_swap/sabre_dag.rs +++ b/src/sabre_swap/sabre_dag.rs @@ -10,8 +10,7 @@ // copyright notice, and modified files need to carry a notice indicating // that they have been altered from the originals. -use hashbrown::{HashMap, HashSet}; -use numpy::PyReadonlyArray1; +use hashbrown::HashSet; use pyo3::prelude::*; use retworkx_core::petgraph::prelude::*; @@ -20,7 +19,7 @@ use retworkx_core::petgraph::prelude::*; /// DAGCircuit, but the contents of the node are a tuple of DAGCircuit node ids, /// a list of qargs and a list of cargs #[pyclass(module = "qiskit._accelerate.sabre_swap")] -#[pyo3(text_signature = "(num_qubits, num_clbits, nodes, front_layer, /)")] +#[pyo3(text_signature = "(num_qubits, num_clbits, nodes, /)")] #[derive(Clone, Debug)] pub struct SabreDAG { pub dag: DiGraph<(usize, Vec<usize>), ()>, @@ -34,36 +33,35 @@ impl SabreDAG { num_qubits: usize, num_clbits: usize, nodes: Vec<(usize, Vec<usize>, HashSet<usize>)>, - front_layer: PyReadonlyArray1<usize>, ) -> PyResult<Self> { - let mut qubit_pos: Vec<usize> = vec![usize::MAX; num_qubits]; - let mut clbit_pos: Vec<usize> = vec![usize::MAX; num_clbits]; - let mut reverse_index_map: HashMap<usize, NodeIndex> = HashMap::with_capacity(nodes.len()); + let mut qubit_pos: Vec<Option<NodeIndex>> = vec![None; num_qubits]; + let mut clbit_pos: Vec<Option<NodeIndex>> = vec![None; num_clbits]; let mut dag: DiGraph<(usize, Vec<usize>), ()> = Graph::with_capacity(nodes.len(), 2 * nodes.len()); + let mut first_layer = Vec::<NodeIndex>::new(); for node in &nodes { let qargs = &node.1; let cargs = &node.2; let gate_index = dag.add_node((node.0, qargs.clone())); - reverse_index_map.insert(node.0, gate_index); + let mut is_front = true; for x in qargs { - if qubit_pos[*x] != usize::MAX { - dag.add_edge(NodeIndex::new(qubit_pos[*x]), gate_index, ()); + if let Some(predecessor) = qubit_pos[*x] { + is_front = false; + dag.add_edge(predecessor, gate_index, ()); } - qubit_pos[*x] = gate_index.index(); + qubit_pos[*x] = Some(gate_index); } for x in cargs { - if clbit_pos[*x] != usize::MAX { - dag.add_edge(NodeIndex::new(clbit_pos[*x]), gate_index, ()); + if let Some(predecessor) = clbit_pos[*x] { + is_front = false; + dag.add_edge(predecessor, gate_index, ()); } - clbit_pos[*x] = gate_index.index(); + clbit_pos[*x] = Some(gate_index); + } + if is_front { + first_layer.push(gate_index); } } - let first_layer = front_layer - .as_slice()?
- .iter() - .map(|x| reverse_index_map[x]) - .collect(); Ok(SabreDAG { dag, first_layer }) } } diff --git a/test/python/transpiler/test_sabre_swap.py b/test/python/transpiler/test_sabre_swap.py index 44c369fbe154..0851789de49d 100644 --- a/test/python/transpiler/test_sabre_swap.py +++ b/test/python/transpiler/test_sabre_swap.py @@ -17,6 +17,7 @@ import ddt from qiskit.circuit.library import CCXGate, HGate, Measure, SwapGate +from qiskit.converters import circuit_to_dag from qiskit.transpiler.passes import SabreSwap, TrivialLayout from qiskit.transpiler import CouplingMap, PassManager from qiskit import ClassicalRegister, QuantumRegister, QuantumCircuit @@ -307,6 +308,37 @@ def test_conditional_measurement(self): result = SabreSwap(CouplingMap.from_line(3), seed=12345)(qc) self.assertEqual(result, expected) + @ddt.data("basic", "lookahead", "decay") + def test_deterministic(self, heuristic): + """Test that the output of the SabreSwap pass is deterministic for a given random seed.""" + width = 40 + + # The actual circuit is unimportant, we just need one with lots of scoring degeneracy. + qc = QuantumCircuit(width) + for i in range(width // 2): + qc.cx(i, i + (width // 2)) + for i in range(0, width, 2): + qc.cx(i, i + 1) + dag = circuit_to_dag(qc) + + coupling = CouplingMap.from_line(width) + pass_0 = SabreSwap(coupling, heuristic, seed=0, trials=1) + pass_1 = SabreSwap(coupling, heuristic, seed=1, trials=1) + dag_0 = pass_0.run(dag) + dag_1 = pass_1.run(dag) + + # This deliberately avoids using a topological order, because that introduces an opportunity + # for the re-ordering to sort the swaps back into a canonical order. + def normalize_nodes(dag): + return [(node.op.name, node.qargs, node.cargs) for node in dag.op_nodes()] + + # A sanity check for the test - if unequal seeds don't produce different outputs for this + # degenerate circuit, then the test probably needs fixing (or Sabre is ignoring the seed). + self.assertNotEqual(normalize_nodes(dag_0), normalize_nodes(dag_1)) + + # Check that a re-run with the same seed produces the same circuit in the exact same order. + self.assertEqual(normalize_nodes(dag_0), normalize_nodes(pass_0.run(dag))) + if __name__ == "__main__": unittest.main()