
Calculate relative, not absolute, scores in SabreSwap #9012

Merged
merged 4 commits into Qiskit:main from jakelishman:faster-sabre-swap on Nov 9, 2022

Conversation

jakelishman
Member

Summary

This is a significant performance improvement for very wide (100q+) circuits. Technically it's an asymptotic improvement in the scoring calculation for volumetric circuits, though in practice other constant factors, especially in Python, seem to mask that somewhat; alternatively, there may be another quadratic component I hadn't thought about, since the output number of swaps seems like a potential place an implicit O(n^2) dependency could appear.

For a depth-5 QV 1081 circuit on a heavy-hex lattice, this version of Sabre routed the problem in 25s on my machine, compared to 100s on main. In relatively informal testing, I didn't see any notable performance increase or decrease anywhere in our current Sabre benchmark suite; I think other overhead masks any effect there, and those circuits aren't large enough for it to matter anyway.

This is the first of a series of PRs I hope to make about Sabre in Terra. The purpose of this commit is to demonstrate that this improved scoring mechanism does not need to affect the quality of the output at all: the commit maintains RNG compatibility with the existing Sabre implementation despite completely overhauling the internal logic and implementing the new scoring.

In further PRs, I intend to relax the RNG-compatibility requirement, because there are a lot of speed-ups currently left on the table. I haven't made too much effort to optimise things within that constraint, because it'll be easier once the restriction is relaxed.

Details and comments

The previous SabreSwap algorithm would score each candidate swap by applying the swap to the current layout, iterating through every element in the front layer and extended set to sum the total distances of the 2q gates, and then undoing the swap. However, in the front layer, a given swap can affect at most two 2q operations. This new form instead scores each swap by calculating its total distance relative to what it would be if the swap had not been made. This means that the basic score is formed from only one or two gates, no matter how many are in the front layer.
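As a rough illustration of the relative-scoring idea, the score delta for a swap needs only the front-layer gates on the two swapped qubits. This is a minimal, hypothetical sketch (not the PR's actual code; all names are made up for illustration):

// Hypothetical sketch: score a swap relative to not making it, touching only
// the (at most two) front-layer gates on the swapped physical qubits.
fn relative_score(
    swap: (usize, usize),
    gate_on_qubit: &[Option<(usize, usize)>], // front-layer 2q gate per physical qubit
    dist: &[Vec<usize>],                      // coupling-map distance matrix
) -> isize {
    let first = gate_on_qubit[swap.0];
    // A gate acting on both swapped qubits must only be counted once.
    let second = if gate_on_qubit[swap.1] == first {
        None
    } else {
        gate_on_qubit[swap.1]
    };
    let mut delta = 0;
    for (a, b) in first.into_iter().chain(second) {
        let (a2, b2) = (remap(a, swap), remap(b, swap));
        delta += dist[a2][b2] as isize - dist[a][b] as isize;
    }
    delta
}

fn remap(q: usize, swap: (usize, usize)) -> usize {
    if q == swap.0 { swap.1 } else if q == swap.1 { swap.0 } else { q }
}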

This is an algorithmic-complexity improvement in the scoring for volumetric circuits with respect to the number of qubits. Such a circuit with n qubits has n / 2 gates in its front layer at all times, and so (assuming a coupling map that expands by a constant connectivity factor per qubit, like heavy hex) there are k * n swaps to score. The old scorer touched all n / 2 front-layer gates for each of those candidates, so choosing the best swap had quadratic complexity; with this new algorithm, the score for a given swap is calculated in constant time, so choosing the best is instead linear. For example, at 1081 qubits the old scorer sums roughly 540 gate distances per candidate swap, where the new one sums at most two. In practice I did not see the full extent of that improvement at the scales I tested, but I did see significant gains: a 1081q heavy-hex quantum-volume circuit at depth 5 was swap-mapped on my machine in 25s with this commit, compared to 100s before it.

The principal change is the new structs FrontLayer and ExtendedSet, which combine constant-time hash-set insertions, lookups and removals with vectors that enable constant-time lookup of the affected qubits. FrontLayer now only ever holds currently unroutable 2q operations; operations are placed into the output structures as soon as they become routable. This routing exploits the fact that a swap can only affect two previously unroutable gates in the front layer: we just walk forwards along the outbound edges from those two nodes, adding any operations that become routable. This avoids another linear scan through the whole front layer after each swap, although in practice it has less of a speed-up effect, because that scan already wasn't quadratic.
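A minimal sketch of the data-structure shape being described (hypothetical and heavily simplified, not the real FrontLayer): a hash map gives O(1) insertion and removal by node, while a per-qubit table gives O(1) lookup of the single unroutable gate on any physical qubit.

use std::collections::HashMap;

// Each physical qubit has at most one unroutable front-layer gate on it, so a
// flat per-qubit table makes "which gates does this swap affect?" an O(1) query.
struct FrontLayer {
    nodes: HashMap<usize, (usize, usize)>, // DAG node -> its two physical qubits
    qubits: Vec<Option<(usize, usize)>>,   // physical qubit -> (node, other qubit)
}

impl FrontLayer {
    fn insert(&mut self, node: usize, (a, b): (usize, usize)) {
        self.nodes.insert(node, (a, b));
        self.qubits[a] = Some((node, b));
        self.qubits[b] = Some((node, a));
    }

    fn remove(&mut self, node: usize) {
        if let Some((a, b)) = self.nodes.remove(&node) {
            self.qubits[a] = None;
            self.qubits[b] = None;
        }
    }
}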

In theory, this change does not affect how any swap is scored relative to any other with the same front layer and extended set (though the scores used in the comparison do change). In order to precisely match the current implementation (and ensure reproducibility from a given seed), this tracks the insertion order of nodes into the front layer, including after removals.

This commit completely modifies the internals of the Rust components of Sabre, although the actual algorithm is largely unchanged, aside from the scoring difference. Various parts of the implementation do change for efficiency, though.

This commit maintains RNG compatibility with the previous Rust implementation in most cases. In some circuits it is possible for floating-point differences to cause different output, when several swaps are at the minimum score to within 1 ULP. This happens in both the old and new forms of the implementation, but which of the minimal swaps gets the minus-1-ULP score varies between them, and consequently affects the swap choice. In fairly extensive testing, this appears to be the only mechanism for differences; I've verified that the release-valve mechanism and predecessor-requirement tracking function identically to before. The resultant scores (relative for "basic" and "lookahead", absolute for "decay") are in practice within 2 ULP of the old algorithm's.
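The floating-point effect is easy to reproduce standalone. This little demo (not Sabre code) shows the same values summed in different orders differing by exactly 1 ULP, which is the kind of difference that can flip a tie between minimal-score swaps:

fn main() {
    // Summing identical values in a different order can change the last bit.
    let xs = [0.1_f64, 0.2, 0.3];
    let forward: f64 = xs.iter().sum();        // 0.6000000000000001
    let backward: f64 = xs.iter().rev().sum(); // 0.6
    assert_ne!(forward.to_bits(), backward.to_bits());
    println!("differ by {} ULP", forward.to_bits() - backward.to_bits()); // 1
}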

In maintaining RNG compatibility, this commit leaves several further speed-ups on the table. There is additional memory usage and tracking to maintain some required iteration orders, and some reordering checks that are no longer strictly necessary. Further, the sorting stages at the ends of the swap-choosing functions (there to maintain repeatability) can now be made redundant, since the hash-set iteration that made them necessary (effectively an uncontrolled randomisation per run) is no longer required. These will be addressed in follow-ups.


I haven't written a changelog entry for this right now; I'll do it as/when I make another PR that breaks RNG compatibility.

@jakelishman jakelishman added performance Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels Oct 27, 2022
@jakelishman jakelishman added this to the 0.23.0 milestone Oct 27, 2022
@jakelishman jakelishman requested a review from a team as a code owner October 27, 2022 16:48
@qiskit-bot
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are requested to review this:

@coveralls

coveralls commented Oct 27, 2022

Pull Request Test Coverage Report for Build 3431683015

  • 466 of 469 (99.36%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 84.599%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
src/sabre_swap/layer.rs | 188 | 191 | 98.43%

Totals (change from base Build 3431646392): +0.06%
Covered Lines: 62663
Relevant Lines: 74071

💛 - Coveralls

@alexanderivrii
Contributor

@jakelishman, the performance speedup on large benchmarks is very impressive. However, I believe that it would no longer be possible to run Sabre on DAGDependency-like graphs. Would it be possible to add this functionality without sacrificing the obtained performance speedup? Or would it make more sense to have two slightly different versions of Sabre, one very optimized that assumes that the DAG is a DAGCircuit, and one less optimized that allows more general graphs? What do you think?

@jakelishman
Member Author

jakelishman commented Oct 31, 2022

You can get essentially the same speedup on DAGDependency as well, subject to the number of gates used in the score increasing a bit, but you'd still only need to use the gates that the swap affects directly. Mostly we'd just need to change the FrontLayer data structure a little so that the front layer holds a Vec for each qubit slot in the layer. Most of the rest stays the same, I think.

That can all be done in a unified manner in the algorithm here. We'd perhaps want to use SmallVec instead of the standard Vec in that case to keep the majority of our data local to the FrontLayer struct (for better cache access), since making everything a Vec would add a layer of indirection into the heap, but that's the sort of thing we'd need to profile to verify anyway.
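For reference, the generalisation being discussed might look something like this (a hypothetical sketch assuming the smallvec crate, not a committed design); each qubit slot becomes a small vector so several front-layer gates can share a qubit, while short slots stay inline for cache locality:

use smallvec::SmallVec;

struct GeneralFrontLayer {
    // physical qubit -> every front-layer gate (node, other qubit) touching it;
    // up to 4 entries are stored inline before spilling to the heap.
    qubits: Vec<SmallVec<[(usize, usize); 4]>>,
}

impl GeneralFrontLayer {
    // Scoring stays local: only the gates in the two swapped qubits' slots
    // need to be re-examined, however large the front layer is.
    fn affected_gates(&self, swap: (usize, usize)) -> impl Iterator<Item = &(usize, usize)> + '_ {
        self.qubits[swap.0].iter().chain(self.qubits[swap.1].iter())
    }
}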

@kevinhartman
Contributor

LGTM. I'm a big fan of the front layer and extended set classes and associated methods. It makes the core of the algorithm a lot easier to read.

I'm not sure I like the use of required_predecessors as scratch space in populate_extended_set, since it requires mutability that otherwise looks not to be needed. It's purely a preference for contract readability, but if it helps performance, I'm not opposed.

I'll hold off on auto-merge in case anyone else wants to review.

(I spent a few hours reviewing this and couldn't find any bugs. I thought I might find an issue around gates with unsatisfied classical dependencies making it into the front layer, but that looks to be handled nicely by the tracking of required_predecessors 🙂).

Nice work!

/ self.nodes.len() as f64
}

/// Populate a of nodes that would be routable if the given swap was applied to a layout. This

Suggested change
/// Populate a of nodes that would be routable if the given swap was applied to a layout. This
/// Populate a vector of nodes that would be routable if the given swap was applied to a layout. This

@jakelishman
Member Author

If I remember correctly, the previous algorithm also used required_predecessors as scratch space when calculating the extended set. My intent in a couple of PRs' time is to modify the algorithm in a way that lets us build up the extended set as a delta on the previous one, rather than needing a complete reconstruction as we currently do, which may remove some of that scratch usage as well. I need to write that and test it first, though.

@mtreinish
Member

Overall this looks great! Nothing really looked wrong with the logic to me, and the code is a lot more structured now too, which is a big win. There were a few places where you used loops where I would have gone with an iterator instead, but that typically doesn't make much of a difference in practice, so I didn't mention it (except in one place). I left a few inline comments, but they're mostly my musings as I was reading through the code more than anything else. I don't think any of them are particularly actionable, so I'm going to tag automerge. Thanks for all the work on this.

Comment on lines +94 to +98
while self.iteration_start < self.iteration_end
&& self.iteration_order[self.iteration_start].is_none()
{
self.iteration_start += 1;
}
Member

A really tiny nit micro-optimization benchmarking question here, which we should really just disregard as idle musing, especially as I know this block will be removed in the subsequent PR. This just looked a bit odd to me as a way to find the index of the first non-None element in a slice. It works fine, but I wonder how it would compare performance-wise to something like (untested, so there are probably typos or mistakes):

self.iteration_start = match self.iteration_order[self.iteration_start..].iter().position(|x| x.is_some()) {
    Some(relative_index) => relative_index + self.iteration_start,
    None => self.iteration_end,
};

I expect this will be slower in the non-match case, as it will traverse the full vec instead of terminating at iteration_end (although you could probably work around that with enumerate() and some other logic). It really is a question of how much the compiler can optimize with an iterator instead of a while loop, I guess.

Member Author

The answer to most questions about why I used loops and not iterators is going to be "because I didn't know there was an iterator method that does it", haha. I think in your example, though, we could just limit the slice to self.iteration_start..self.iteration_end to get the early-exit behaviour too?
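That slice-limited variant might look like the following (an untested sketch, in the same spirit as the snippet above):

// Restrict the scan to iteration_start..iteration_end so the early exit of
// the original while loop is preserved.
self.iteration_start = self.iteration_order[self.iteration_start..self.iteration_end]
    .iter()
    .position(|x| x.is_some())
    .map_or(self.iteration_end, |relative| self.iteration_start + relative);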

let mut decremented: HashMap<usize, u32> = HashMap::new();
let mut i = 0;
while i < to_visit.len() && extended_set.len() < EXTENDED_SET_SIZE {
for edge in dag.dag.edges_directed(to_visit[i], Direction::Outgoing) {
Member

fwiw, this would have been fine as edges() too, because for a directed graph this is the default direction behaviour in petgraph (which is what the before version did). But being explicit is probably more clear.
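That is, for a directed petgraph graph the two spellings below should behave identically (the variable names mirror the quoted snippet; the explicit one just documents the intent):

// petgraph's `edges(a)` on a directed graph iterates outgoing edges only,
// so it matches `edges_directed(a, Direction::Outgoing)`.
for edge in dag.dag.edges(to_visit[i]) { /* ... */ }
for edge in dag.dag.edges_directed(to_visit[i], Direction::Outgoing) { /* ... */ }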

Member Author

Ah ok that's good to know thanks.

@@ -162,7 +154,7 @@ pub fn build_swap_map(
layout: &mut NLayout,
num_trials: usize,
) -> (SwapMap, PyObject) {
let run_in_parallel = getenv_use_multiple_threads();
let run_in_parallel = getenv_use_multiple_threads() && num_trials > 1;
Member

Ah that's a good catch, I guess we were potentially creating an unnecessary thread pool before for the single trial case.

Member Author

Oh, I didn't actually mean to commit this as part of the same patch; I was just using it while doing a little bit of profiling, so I could avoid a bunch of threading calls in the output (then I realised I could just force it with the environment...). Since it's in, it's probably best to just leave it now.

qubits_decay[best_swap[1]] += DECAY_RATE;
}
ops_since_progress.push(best_swap);
qubits_decay.fill(1.);
Member

Fwiw, I think there was a reason I used fill_with() instead of fill() for this (probably some random micro-optimization to save a few nanoseconds that I read about somewhere a while ago), but I can't remember exactly why, and this will work fine (and if there was a performance difference, it will be imperceptible).

Member Author

tbh, the code went through a few iterations, and I'd probably lost track that you'd originally used fill_with by the time I wrote the fill version. As long as there's no meaningful difference, I guess it doesn't matter (and for types that implement Copy, I'd be surprised if fill_with could be more efficient?).
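For reference, the two spellings under discussion (both are standard slice methods); for a Copy value like f64 they produce identical results, and fill is the simpler choice:

qubits_decay.fill(1.);           // writes a copied value into every element
qubits_decay.fill_with(|| 1.);   // calls the closure once per element instead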

front_layer: &mut FrontLayer,
required_predecessors: &mut [u32],
) {
let mut to_visit = to_visit.to_vec();
Member

I'm wondering if we can get away without copying here somehow. I haven't thought it through in depth, but my gut instinct was that we could potentially write this function without having to append to the input node list like this. But I'm probably just overlooking something.

Not that this really matters, and I doubt this has any real impact on performance, especially since the routable nodes list is never going to be that long.

Member Author

Even in the worst case, we could just write a small struct that borrows the input Vec, only allocates new heap space on push, and redirects the indexing to the right places in it. I didn't want to get too deep into these optimisations quite yet.
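A hypothetical sketch of that idea (not code from this PR): a small wrapper that borrows the input slice, redirects indexing, and only touches the heap if something is actually pushed:

struct BorrowedQueue<'a, T> {
    prefix: &'a [T],  // the borrowed input; never copied
    overflow: Vec<T>, // heap space, allocated only when pushed to
}

impl<'a, T> BorrowedQueue<'a, T> {
    fn len(&self) -> usize {
        self.prefix.len() + self.overflow.len()
    }

    fn get(&self, index: usize) -> &T {
        // Indices fall through from the borrowed prefix into the overflow.
        if index < self.prefix.len() {
            &self.prefix[index]
        } else {
            &self.overflow[index - self.prefix.len()]
        }
    }

    fn push(&mut self, value: T) {
        self.overflow.push(value);
    }
}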

Comment on lines +61 to +62
if is_front {
first_layer.push(gate_index);
Member

I'm glad we're able to do this inline now. In #8388 I originally had something similar to generate the first layer as we iterated, but it ended up producing a slightly different (though still valid) result in some cases, so I reverted to just using the DAGCircuit method for that. This will make further usage of this data structure from inside Rust (for other callers of sabre swap) easier.

Member Author

Looking back, I'm not 100% certain why I swapped everything to Option here, but this was one place I checked to ensure we built up everything in the same order. I think I originally started massively rewriting the SabreDAG struct as well, then realised that it wasn't necessary, so I reduced it to (mostly) just the modifications I needed.

@mtreinish
Member

I wasn't sure what you wanted to do about the changelog tag; I was inclined to tag it as a new-feature changelog entry. Even if you add the release note on the follow-on commit, I think it would be good to have a line that says "Calculate relative, not absolute, scores in SabreSwap" on the release page. But I'll defer to you on whether you want to tag it as a feature or none.

@jakelishman jakelishman added the Changelog: New Feature Include in the "Added" section of the changelog label Nov 9, 2022
@jakelishman
Member Author

I've tagged it "new feature" at least. I suppose the labels are decoupled from the public-facing documentation, since the labels are for generating the PR-based changelog rather than the public one; I hadn't really thought about that before.

@mergify mergify bot merged commit 02a1939 into Qiskit:main Nov 9, 2022
@jakelishman jakelishman deleted the faster-sabre-swap branch November 14, 2022 21:18
Cryoris pushed a commit to Cryoris/qiskit-terra that referenced this pull request Jan 12, 2023
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Feb 9, 2023
mergify bot pushed a commit that referenced this pull request Feb 15, 2023
* Remove Sabre's manual insertion-order iteration and unnecessary sorts (#9560)

The manual insertion-order tracking was initially added in 02a1939 (gh-9012) during a complete rewrite of the scoring algorithms for Sabre, to demonstrate that the new algorithm could be made fully RNG compatible. It maintained an iteration order identical to the previous implementation's once the front-layer data structure was swapped from a raw list to being hash-based.

The new data structure doesn't require the manual tracking to be reproducible, as long as the iteration order is independent of the hash seed. This swaps the relevant places over to `IndexMap` to be able to remove a lot of the manual tracking. In casual testing, this didn't appear to have much effect on performance, but the resulting code is much simpler.

The sorts (and deliberate canonicalisation of the swaps) were necessary to match RNG compatibility in the pre-relative-score Sabre, but since the swaps are now iterated through in a deterministic order and guaranteed to be generated only once each (the previous version used a hashset to remove duplicates), neither step is necessary any more.

For a QV circuit of depth 5 at 1081 qubits on heavy hex, this is worth
almost a 2x speedup in routing performance.

* Fix RNG-related tests

* Add RNG-change note