Use physical qubits internally within Sabre #10782
Is there an advantage to doing this vs a `filter_map`?
I didn't think about it - the original code used `map` because it had an iterator that ran directly over nodes, so it didn't need to worry about the double counting. I'll swap it to `filter_map` if you prefer; another possibility is making the outer `map` a `flat_map` and removing the internal `sum`?
I guess it was more an efficiency question - I didn't know whether adding a bunch of `0.0`s was going to be faster than using `filter_map` or something. But yeah, using a `filter_map`, a `flat_map`, and a single `sum` seems the more natural way to do this with iterators.
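The two shapes under discussion can be sketched roughly like this (the function names and the `Vec<Vec<Option<usize>>>` layout are illustrative stand-ins, not the actual Sabre data structures):

```rust
// Branchless-in-the-sum style: missing slots contribute an explicit 0.0.
fn score_branchy(rows: &[Vec<Option<usize>>], dist: &[f64]) -> f64 {
    rows.iter()
        .map(|row| {
            row.iter()
                .map(|slot| match slot {
                    Some(ix) => dist[*ix],
                    None => 0.0, // pad with zeros instead of skipping
                })
                .sum::<f64>()
        })
        .sum()
}

// flat_map + filter_map + a single sum: skip the missing slots entirely.
fn score_flat(rows: &[Vec<Option<usize>>], dist: &[f64]) -> f64 {
    rows.iter()
        .flat_map(|row| row.iter().filter_map(|slot| slot.map(|ix| dist[ix])))
        .sum()
}

fn main() {
    let dist = vec![0.0, 1.5, 2.5];
    let rows = vec![vec![Some(1), None, Some(2)], vec![None, Some(0)]];
    assert_eq!(score_branchy(&rows, &dist), score_flat(&rows, &dist));
    println!("{}", score_flat(&rows, &dist));
}
```

Both produce the same total; the question in the thread is whether summing explicit `0.0`s beats the extra closure machinery, which in practice comes down to what LLVM emits for each.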
Naively, I don't think you can represent `Option<f64>` without a discriminant (assuming you care about propagating the bit-pattern payload of qNaNs?), but given LLVM will have the full context of the ensuing operations, it can probably unpick everything into sensible code. I had a go in https://godbolt.org/z/Y531qhn9r, but I think at that point it'll be branch prediction and memory-access patterns that dictate the speed more than anything else. I'll try locally and just make it a `flat_map` + `filter_map` assuming the difference isn't visible.

If this code turns out to be something we need to micro-optimise, the next thing I'd try would be forcing the `Vec<Vec<others>>` stuff to always allocate the other qubits in groups of 4 or 8, and then getting the compiler to emit always-vectorised output (using the self-index in the outer vec to represent "missing" slots in the inner vec, to abuse `dist[[a, a]] == 0.0`), then multiplying the final result by `0.5` to remove the double-count effect. But that would 100% be a different PR lol.

edit: that vectorisation wouldn't survive a conversion of the extended set into a sequence of layers, so probably not worth trying at all tbh.
The most important thing for performance here was just to remove the branching; the floating-point additions are probably mostly pipelined at more than a 2x rate compared to the cycle count of the operation (especially if the compiler's able to find some SIMD vectorisation). I also made this a `flat_map` - I didn't see much of a difference from that, but it looks a shade more natural now anyway.
Depending on where we go with the extended set in the future, I might look into rearranging the data structure to allow explicit SIMD vectorisation in all the calculations, but I'll leave that til after we know what we're going to do with a potential layer structure on the extended set.
Done in 0475b3b.
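The group-of-4/8 padding idea from earlier in the thread could look something like this sketch (illustrative names only; `lane` stands in for the target SIMD width, and the padding slots reuse the qubit's own index so they contribute `dist[a][a] == 0.0`):

```rust
// Sum distances over every (qubit, neighbour) pair, padding each neighbour
// list to a multiple of `lane` with the qubit's own index so the loop body
// is branch-free, then halve the total to undo the double counting.
fn padded_score(neighbours: &[Vec<usize>], dist: &[Vec<f64>], lane: usize) -> f64 {
    let total: f64 = neighbours
        .iter()
        .enumerate()
        .map(|(a, others)| {
            // Round the length up to a multiple of the lane width.
            let padded_len = (others.len() + lane - 1) / lane * lane;
            (0..padded_len)
                // Missing slots fall back to `a`, i.e. dist[a][a] == 0.0.
                .map(|i| dist[a][*others.get(i).unwrap_or(&a)])
                .sum::<f64>()
        })
        .sum();
    0.5 * total // each pair was counted from both ends
}

fn main() {
    let dist = vec![
        vec![0.0, 1.0, 2.0],
        vec![1.0, 0.0, 3.0],
        vec![2.0, 3.0, 0.0],
    ];
    let neighbours = vec![vec![1, 2], vec![0, 2], vec![0, 1]];
    // Pairs (0,1), (0,2), (1,2) visited from both ends: (1 + 2 + 3) = 6.0.
    assert_eq!(padded_score(&neighbours, &dist, 4), 6.0);
    println!("{}", padded_score(&neighbours, &dist, 4));
}
```

With a fixed padded width the inner loop has a constant trip count and no data-dependent branches, which is exactly the shape autovectorisers like; whether that survives a later conversion of the extended set into layers is the open question from the thread.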
It's possible that a further reorganisation of the data structure - storing more, so that we have a direct iterator over each gate exactly once without branching - might help as well. But I want to wait on that til we've looked at what we're going to do about building up the extended set as a delta: this PR still offers a speed-up over the status quo, and I don't want to get into premature optimisation that might become obsolete as soon as we add the extra requirement that it must be possible to remove single gates from the `ExtendedSet`.