Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use physical qubits internally within Sabre #10782

Merged
merged 4 commits into from
Sep 20, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
170 changes: 105 additions & 65 deletions crates/accelerate/src/sabre_swap/layer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,17 +15,24 @@ use indexmap::IndexMap;
use ndarray::prelude::*;
use rustworkx_core::petgraph::prelude::*;

use crate::nlayout::{NLayout, VirtualQubit};
use crate::nlayout::PhysicalQubit;

/// A container for the current non-routable parts of the front layer. This only ever holds
/// two-qubit gates; the only reason a 0q- or 1q operation can be unroutable is because it has an
/// unsatisfied 2q predecessor, which disqualifies it from being in the front layer.
///
/// It would be more algorithmically natural for this struct to work in terms of virtual qubits,
/// because then a swap insertion would not change the data contained. However, for each swap we
/// insert, we score tens or hundreds, yet the subsequent update only affects two qubits. This
/// makes it more efficient to do everything in terms of physical qubits, so the conversion between
/// physical and virtual qubits via the layout happens once per inserted swap and on layer
/// extension, not for every swap trialled.
pub struct FrontLayer {
/// Map of the (index to the) node to the qubits it acts on.
nodes: IndexMap<NodeIndex, [VirtualQubit; 2], ahash::RandomState>,
nodes: IndexMap<NodeIndex, [PhysicalQubit; 2], ahash::RandomState>,
/// Map of each qubit to the node that acts on it and the other qubit that node acts on, if this
/// qubit is active (otherwise `None`).
qubits: Vec<Option<(NodeIndex, VirtualQubit)>>,
qubits: Vec<Option<(NodeIndex, PhysicalQubit)>>,
}

impl FrontLayer {
Expand All @@ -42,7 +49,7 @@ impl FrontLayer {
}

/// Add a node into the front layer, with the two qubits it operates on.
pub fn insert(&mut self, index: NodeIndex, qubits: [VirtualQubit; 2]) {
pub fn insert(&mut self, index: NodeIndex, qubits: [PhysicalQubit; 2]) {
let [a, b] = qubits;
self.qubits[a.index()] = Some((index, b));
self.qubits[b.index()] = Some((index, a));
Expand All @@ -51,20 +58,20 @@ impl FrontLayer {

/// Remove a node from the front layer.
pub fn remove(&mut self, index: &NodeIndex) {
let [q0, q1] = self.nodes.remove(index).unwrap();
self.qubits[q0.index()] = None;
self.qubits[q1.index()] = None;
let [a, b] = self.nodes.remove(index).unwrap();
self.qubits[a.index()] = None;
self.qubits[b.index()] = None;
}

/// Query whether a qubit has an active node.
#[inline]
pub fn is_active(&self, qubit: VirtualQubit) -> bool {
pub fn is_active(&self, qubit: PhysicalQubit) -> bool {
self.qubits[qubit.index()].is_some()
}

/// Calculate the score _difference_ caused by this swap, compared to not making the swap.
#[inline]
pub fn score(&self, swap: [VirtualQubit; 2], layout: &NLayout, dist: &ArrayView2<f64>) -> f64 {
pub fn score(&self, swap: [PhysicalQubit; 2], dist: &ArrayView2<f64>) -> f64 {
if self.is_empty() {
return 0.0;
}
Expand All @@ -77,25 +84,21 @@ impl FrontLayer {
let [a, b] = swap;
let mut total = 0.0;
if let Some((_, c)) = self.qubits[a.index()] {
let p_c = c.to_phys(layout);
total += dist[[b.to_phys(layout).index(), p_c.index()]]
- dist[[a.to_phys(layout).index(), p_c.index()]]
total += dist[[b.index(), c.index()]] - dist[[a.index(), c.index()]]
}
if let Some((_, c)) = self.qubits[b.index()] {
let p_c = c.to_phys(layout);
total += dist[[a.to_phys(layout).index(), p_c.index()]]
- dist[[b.to_phys(layout).index(), p_c.index()]]
total += dist[[a.index(), c.index()]] - dist[[b.index(), c.index()]]
}
total / self.nodes.len() as f64
}

/// Calculate the total absolute of the current front layer on the given layer.
pub fn total_score(&self, layout: &NLayout, dist: &ArrayView2<f64>) -> f64 {
pub fn total_score(&self, dist: &ArrayView2<f64>) -> f64 {
if self.is_empty() {
return 0.0;
}
self.iter()
.map(|(_, &[a, b])| dist[[a.to_phys(layout).index(), b.to_phys(layout).index()]])
.map(|(_, &[a, b])| dist[[a.index(), b.index()]])
.sum::<f64>()
/ self.nodes.len() as f64
}
Expand All @@ -105,37 +108,54 @@ impl FrontLayer {
pub fn routable_after(
&self,
routable: &mut Vec<NodeIndex>,
swap: &[VirtualQubit; 2],
layout: &NLayout,
swap: &[PhysicalQubit; 2],
coupling: &DiGraph<(), ()>,
) {
let [a, b] = *swap;
if let Some((node, c)) = self.qubits[a.index()] {
if coupling.contains_edge(
NodeIndex::new(b.to_phys(layout).index()),
NodeIndex::new(c.to_phys(layout).index()),
) {
if coupling.contains_edge(NodeIndex::new(b.index()), NodeIndex::new(c.index())) {
routable.push(node);
}
}
if let Some((node, c)) = self.qubits[b.index()] {
if coupling.contains_edge(
NodeIndex::new(a.to_phys(layout).index()),
NodeIndex::new(c.to_phys(layout).index()),
) {
if coupling.contains_edge(NodeIndex::new(a.index()), NodeIndex::new(c.index())) {
routable.push(node);
}
}
}

/// Apply a physical swap to the current layout data structure.
pub fn apply_swap(&mut self, swap: [PhysicalQubit; 2]) {
let [a, b] = swap;
match (self.qubits[a.index()], self.qubits[b.index()]) {
(Some((index1, _)), Some((index2, _))) if index1 == index2 => {
let entry = self.nodes.get_mut(&index1).unwrap();
*entry = [entry[1], entry[0]];
return;
}
_ => {}
}
if let Some((index, c)) = self.qubits[a.index()] {
self.qubits[c.index()] = Some((index, b));
let entry = self.nodes.get_mut(&index).unwrap();
*entry = if *entry == [a, c] { [b, c] } else { [c, b] };
}
if let Some((index, c)) = self.qubits[b.index()] {
self.qubits[c.index()] = Some((index, a));
let entry = self.nodes.get_mut(&index).unwrap();
*entry = if *entry == [b, c] { [a, c] } else { [c, a] };
}
self.qubits.swap(a.index(), b.index());
}

/// True if there are no nodes in the current layer.
#[inline]
pub fn is_empty(&self) -> bool {
self.nodes.is_empty()
}

/// Iterator over the nodes and the pair of qubits they act on.
pub fn iter(&self) -> impl Iterator<Item = (&NodeIndex, &[VirtualQubit; 2])> {
pub fn iter(&self) -> impl Iterator<Item = (&NodeIndex, &[PhysicalQubit; 2])> {
self.nodes.iter()
}

Expand All @@ -145,90 +165,110 @@ impl FrontLayer {
}

/// Iterator over the qubits that have active nodes on them.
pub fn iter_active(&self) -> impl Iterator<Item = &VirtualQubit> {
pub fn iter_active(&self) -> impl Iterator<Item = &PhysicalQubit> {
self.nodes.values().flatten()
}
}

/// This is largely similar to the `FrontLayer` struct but can have more than one node on each active
/// qubit. This does not have `remove` method (and its data structures aren't optimised for fast
/// removal), since the extended set is built from scratch each time a new gate is routed.
/// This structure is currently reconstructed after each gate is routed, so there's no need to
/// worry about tracking gate indices or anything like that. We track length manually just to
/// avoid a summation.
pub struct ExtendedSet {
nodes: IndexMap<NodeIndex, [VirtualQubit; 2], ahash::RandomState>,
qubits: Vec<Vec<VirtualQubit>>,
qubits: Vec<Vec<PhysicalQubit>>,
len: usize,
}

impl ExtendedSet {
pub fn new(num_qubits: u32, max_size: usize) -> Self {
pub fn new(num_qubits: u32) -> Self {
ExtendedSet {
nodes: IndexMap::with_capacity_and_hasher(max_size, ahash::RandomState::default()),
qubits: vec![Vec::new(); num_qubits as usize],
len: 0,
}
}

/// Add a node and its active qubits to the extended set.
pub fn insert(&mut self, index: NodeIndex, qubits: &[VirtualQubit; 2]) -> bool {
let [a, b] = *qubits;
if self.nodes.insert(index, *qubits).is_none() {
self.qubits[a.index()].push(b);
self.qubits[b.index()].push(a);
true
} else {
false
}
pub fn push(&mut self, qubits: [PhysicalQubit; 2]) {
let [a, b] = qubits;
self.qubits[a.index()].push(b);
self.qubits[b.index()].push(a);
self.len += 1;
}

/// Calculate the score of applying the given swap, relative to not applying it.
pub fn score(&self, swap: [VirtualQubit; 2], layout: &NLayout, dist: &ArrayView2<f64>) -> f64 {
if self.nodes.is_empty() {
pub fn score(&self, swap: [PhysicalQubit; 2], dist: &ArrayView2<f64>) -> f64 {
if self.len == 0 {
return 0.0;
}
let [a, b] = swap;
let p_a = a.to_phys(layout);
let p_b = b.to_phys(layout);
let mut total = 0.0;
for other in self.qubits[a.index()].iter() {
// If the other qubit is also active then the score won't have changed, but since the
// distance is absolute, we'd double count rather than ignore if we didn't skip it.
if *other == b {
continue;
}
let p_other = other.to_phys(layout);
total += dist[[p_b.index(), p_other.index()]] - dist[[p_a.index(), p_other.index()]];
total += dist[[b.index(), other.index()]] - dist[[a.index(), other.index()]];
}
for other in self.qubits[b.index()].iter() {
if *other == a {
continue;
}
let p_other = other.to_phys(layout);
total += dist[[p_a.index(), p_other.index()]] - dist[[p_b.index(), p_other.index()]];
total += dist[[a.index(), other.index()]] - dist[[b.index(), other.index()]];
}
total / self.nodes.len() as f64
total / self.len as f64
}

/// Calculate the total absolute score of this set of nodes over the given layout.
pub fn total_score(&self, layout: &NLayout, dist: &ArrayView2<f64>) -> f64 {
if self.nodes.is_empty() {
pub fn total_score(&self, dist: &ArrayView2<f64>) -> f64 {
if self.len == 0 {
return 0.0;
}
self.nodes
.values()
.map(|&[a, b]| dist[[a.to_phys(layout).index(), b.to_phys(layout).index()]])
self.qubits
.iter()
.enumerate()
.map(move |(a_index, others)| {
others
.iter()
.map(|b| {
let b_index = b.index();
if a_index <= b_index {
dist[[a_index, b_index]]
jakelishman marked this conversation as resolved.
Show resolved Hide resolved
} else {
0.0
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an advantage to doing this vs a filter_map?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think about it - the original code used map because it had an iterator that directly ran over nodes, so didn't need to worry about the double counting. I'll swap it to filter_map if you prefer, and another possibility is making the outer map a flat_map and removing the internal sum?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it was more an efficiency question, I didn't know if adding a bunch of 0.0s was going to be faster than using filter_map or something. But, yeah using a filter_map, flat_map, and a single sum seems the more natural way to do this with iterators.

Copy link
Member Author

@jakelishman jakelishman Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naively, I don't think you can't represent Option<f64> without a discriminant (assuming you care about propagating the bit pattern payload of qNaNs?) but given LLVM will have the full context of the ensuing operations, it can probably unpick everything into sensible code. I had a go in https://godbolt.org/z/Y531qhn9r, but I think at that point it'll be branch prediction and memory-access patterns that dictate the speed more than anything else. I'll try locally and just make it a flat_map+filter_map assuming it's not visible.

If this code turns out to be something we need to micro-optimise, the next thing I'd try would be enforcing the Vec<Vec<others>> stuff to always allocate the other qubits in groups of 4 or 8, and then get the compiler to emit always-vectorised code output (using the self-index in the outer vec to represent "missing" slots in the inner vec, to abuse dist[[a, a]] == 0.0), and then multiply the final result by 0.5 to remove the double-count effect. But that would 100% be a different PR lol.

edit: that vectorisation wouldn't survive a conversion of the extended set into a sequence of layers, so probably not worth trying at all tbh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important thing for performance here was just to remove the branching; the floating-point additions are probably mostly pipelined at more than a 2x rate compared to the cycle count of the operation (especially if the compiler's able to find some SIMD vectorisation). I also made this a flat-map - I didn't see much of a difference from that, but it looks a shade more natural now anyway.

Depending on where we go with the extended set in the future, I might look into rearranging the data structure to allow explicit SIMD vectorisation in all the calculations, but I'll leave that til after we know what we're going to do with a potential layer structure on the extended set.

Done in 0475b3b.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that a further re-organisation of the data structure to store more in order to have a direct iterator over each gate exactly once without branching might help as well, but I want to wait on that til we've looked at what we're going to do about building up the extended set as a delta, because this PR still offers a speed-up over the status quo, and I don't want to get into premature optimisation that might become obsolete as soon as we add the extra requirement that it must be possible to remove single gates from the ExtendedSet.

})
.sum::<f64>()
})
.sum::<f64>()
/ self.nodes.len() as f64
/ self.len as f64
}

/// Clear all nodes from the extended set.
pub fn clear(&mut self) {
for &[a, b] in self.nodes.values() {
self.qubits[a.index()].clear();
self.qubits[b.index()].clear();
for others in self.qubits.iter_mut() {
others.clear()
}
self.nodes.clear()
self.len = 0;
}

/// Number of nodes in the set.
pub fn len(&self) -> usize {
self.nodes.len()
self.len
}

/// Apply a physical swap to the current layout data structure.
pub fn apply_swap(&mut self, swap: [PhysicalQubit; 2]) {
let [a, b] = swap;
for other in self.qubits[a.index()].iter_mut() {
if *other == b {
*other = a
}
}
for other in self.qubits[b.index()].iter_mut() {
if *other == a {
*other = b
}
}
self.qubits.swap(a.index(), b.index());
}
}
Loading