Set max_trials for VF2Layout in preset pass managers. #10054
By setting max_trials, we limit the number of layouts enumerated and scored when iterating through vf2_mapping(). This is necessary for scoring to complete in a reasonable amount of time for circuits with many connected components on larger (e.g. 400 qubit) devices. These limits were chosen using a fake 400 qubit device, using 200 connected components, where each component is a single CX gate. Because layout scoring scales linearly with the number of qubits in the circuit, 250,000 (O3) takes about a minute, 25,000 (O2) takes about 6 seconds, and 2,500 (O1) takes less than a second.
This looks correct for VF2Layout, but I think we also need to set the limit on VF2PostLayout.
The maximum number of trials evaluated when searching for the best
layout using :class:`.VF2Layout` is now limited in
:func:`qiskit.transpiler.preset_passmanagers.level_1_pass_manager`,
:func:`qiskit.transpiler.preset_passmanagers.level_2_pass_manager`,
and
:func:`qiskit.transpiler.preset_passmanagers.level_3_pass_manager`
to ``2,500``, ``25,000``, and ``250,000``, respectively. Previously,
all possible layouts were evaluated. To perform a full search as
before, manually run :class:`.VF2PostLayout` over the transpiled circuit,
in strict mode, specifying ``0`` for ``max_trials``.
This needs a little tweaking, both to explain the rationale behind why we're making this change and to elaborate on exactly how to restore the previous behavior, with an example. But I'll just do this manually in the release notes roundup as we prepare 0.24.0.
Makes sense to add an example, though does the rationale belong in the upgrade text section?
Well for things like this where we're making a potentially breaking change I always stress the importance of the "why" so we're explaining to users why we're potentially breaking them. It just provides context to users so hopefully they understand the rationale behind why we're changing the behavior of the code and aren't as angry with us :) . In this case we have a good justification because if we didn't truncate the search we would be spending a near infinite amount of time evaluating different layouts.
I added a bit more context in 9c0e5e8. Perhaps we should still add a code example in the release notes roundup as you mentioned.
@@ -133,6 +133,7 @@ def _vf2_match_not_found(property_set):
call_limit=int(5e4), # Set call limit to ~100ms with rustworkx 0.10.2
properties=backend_properties,
target=target,
max_trials=int(2500), # Limits layout scoring to < 600ms on ~400 qubit devices
What about VF2PostLayout, shouldn't we be setting this there too?
Added in 9c0e5e8 😄
Overall this LGTM, the only thing that is out of scope for something we can backport is the deprecation of the old limit function. We have to wait until 0.25.0 to properly deprecate that and start the timer towards removal. Other than that I think this is good to go. Thanks for the quick update.
@deprecate_func(
additional_msg="Instead, use ``get_vf2_limits``.",
since="0.24.0",
)
This is a bit too quick. We can't deprecate it until after we've included the alternative in a release (see https://qiskit.org/documentation/deprecation_policy.html#removing-a-feature), so can you remove the decorator and the release note for this? We can do this for 0.25.0, but not for this PR, which we'll need to backport to 0.24.0. (Alternatively, you can set pending=True on the decorator and remove the release note to downgrade the warning emitted to PendingDeprecationWarning.)
Ah that's right, thanks for the reminder. I removed the decorator as well as the release note in 5ef7a94.
LGTM, thanks for doing this.
elif optimization_level == 3:
limits = VF2Limits(
int(3e7), # Set call limit to ~60 sec with rustworkx 0.10.2
250000, # Limits layout scoring to < 60 sec on ~400 qubit devices
Removing the need to count zeros is good. Maybe 25000 is near the readability threshold, but I'd still opt for making it more readable. More so for 250000. Some options are:
int(2.5e5)
250_000
int(2.5e4)
25_000
It's not obvious to me whether the LHS or RHS of 30_000_000 == int(3e7) is more readable. Some researchers measure comprehension speed with clocks, but I couldn't find a study on this specific case. I guess in this file it makes sense to use one of either underscores or int uniformly.
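As a quick check of the spellings listed in this comment, the following sketch (plain Python, independent of Qiskit) confirms that they denote the same integers. The dictionary name is only illustrative and does not come from the PR.

```python
# Each pair holds two spellings of the same limit discussed in the comment above.
# The dictionary name is illustrative only.
spellings = {
    "O2 max_trials": (int(2.5e4), 25_000),
    "O3 max_trials": (int(2.5e5), 250_000),
    "O3 call_limit": (int(3e7), 30_000_000),
}

for name, (scientific, underscored) in spellings.items():
    # Both spellings compare equal and are plain ints.
    assert scientific == underscored
    assert type(scientific) is int and type(underscored) is int

# A bare scientific-notation literal is a float, which is why the int(...) wrapper
# appears wherever an integer limit is expected.
bare_literal_type = type(2.5e5)
```

Underscore separators need Python 3.6 or newer, so the choice between the two forms is purely about readability on any currently supported interpreter.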
In this case I personally don't view it as a big deal; the number here is just an arbitrary limit that was found by experimentation, and the actual numeric value isn't really relevant. My personal preference would be for 2.5e5, as that is how I reason about this in my head, because it's more about the order of magnitude between each optimization level. But I can personally see that just as well with how it's written now (not so much by the absolute zero count but by the relative spacing differences). What I view as far more important here is the comment explaining where the value came from, because what we're really setting is a wall time limit as a second order effect.
I removed this from the queue because I made a comment and wanted to look at it a bit more, maybe a short hour. If it is urgent to merge quickly, you can add it again.
Please don't do this in the future unless it's a bug in the code or something we shouldn't be merging. Removing it from the merge queue causes a queue flush and resets everything enqueued behind it (luckily in this case it's just a single PR, so it's an hour delay for merging this). I think a small question about style choices like how we represent large numbers in code really isn't worth that interruption, especially since it's simple to change in a followup if there is consensus against what was already merged.
I noticed there was only one item in the queue.
EDIT: This is resolved in the following comment. With this PR, this function It should probably be removed. Or, if it's in the API, deprecated.
See the earlier discussion here: #10054 (comment). Since it's a public API, the intent is to deprecate it in 0.25.0. But we need to release the alternative first before we can mark the old function as deprecated.
* Set max_trials for VF2Layout in preset pass managers. By setting max_trials, we limit the number of layouts enumerated and scored when iterating through vf2_mapping(). This is necessary for scoring to complete in a reasonable amount of time for circuits with many connected components on larger (e.g. 400 qubit) devices. These limits were chosen using a fake 400 qubit device, using 200 connected components, where each component is a single CX gate. Because layout scoring scales linearly with the number of qubits in the circuit, 250,000 (O3) takes about a minute, 25,000 (O2) takes about 6 seconds, and 2,500 (O1) takes less than a second.
* Address review comments.
* Return tuple of None instead for finer control and a better interface.
* Add deprecation notice to release note.
* Remove deprecation until 0.25.0.
* Remove unused import.
(cherry picked from commit f13b1ed)
…10061)
* Set max_trials for VF2Layout in preset pass managers. By setting max_trials, we limit the number of layouts enumerated and scored when iterating through vf2_mapping(). This is necessary for scoring to complete in a reasonable amount of time for circuits with many connected components on larger (e.g. 400 qubit) devices. These limits were chosen using a fake 400 qubit device, using 200 connected components, where each component is a single CX gate. Because layout scoring scales linearly with the number of qubits in the circuit, 250,000 (O3) takes about a minute, 25,000 (O2) takes about 6 seconds, and 2,500 (O1) takes less than a second.
* Address review comments.
* Return tuple of None instead for finer control and a better interface.
* Add deprecation notice to release note.
* Remove deprecation until 0.25.0.
* Remove unused import.
(cherry picked from commit f13b1ed)
Co-authored-by: Kevin Hartman <kevin@hart.mn>
Summary
By setting max_trials, we limit the number of layouts enumerated and scored when iterating through vf2_mapping().
This is necessary for scoring to complete in a reasonable amount of time for circuits with many connected components on larger (e.g. 400 qubit) devices.
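The idea of capping the number of scored layouts can be sketched in plain Python. This is not Qiskit's implementation: `best_layout`, `toy_score`, the made-up error values, and the use of `itertools.permutations` as a stand-in for the mapping iterator are all assumptions made for illustration. The sketch treats `None` or a non-positive `max_trials` as "no cap", which is likewise an assumption.

```python
from itertools import islice, permutations


def best_layout(candidates, score, max_trials=None):
    """Score candidates lazily and return (best_candidate, trials_scored).

    In this sketch, max_trials of None or <= 0 means no cap (an assumption).
    """
    if max_trials is not None and max_trials > 0:
        # Stop pulling from the candidate iterator after max_trials items.
        candidates = islice(candidates, max_trials)
    best, best_score, trials = None, None, 0
    for candidate in candidates:
        trials += 1
        current = score(candidate)
        if best_score is None or current < best_score:
            best, best_score = candidate, current
    return best, trials


# Made-up per-qubit error rates in units of 1e-3 (integers avoid float ties).
error = [50, 10, 40, 20, 30, 60]


def toy_score(layout):
    """Lower is better: total error of the physical qubits used."""
    return sum(error[q] for q in layout)


# Toy search space: every placement of 3 virtual qubits onto 6 physical qubits.
capped, capped_trials = best_layout(permutations(range(6), 3), toy_score, max_trials=10)
full, full_trials = best_layout(permutations(range(6), 3), toy_score)

print(capped, capped_trials)  # best layout among the first 10 candidates
print(full, full_trials)      # best layout over all 120 candidates
```

In this toy case the capped search stops after 10 of the 120 candidates and settles for a layout with a higher total error than the full search finds, which is the trade-off a trial limit accepts in exchange for bounded run time.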
Details and comments
These limits were chosen using a fake 400 qubit device, using 200 connected components, where each component is a single CX gate.
Because layout scoring scales linearly with the number of qubits in the circuit, 250,000 (O3) takes about a minute, 25,000 (O2) takes about 6 seconds, and 2,500 (O1) takes less than a second.
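The three timings can be cross-checked with a little arithmetic. The sketch below takes the reported O3 figure (250,000 trials in about 60 seconds) as its single reference point and assumes scoring time is proportional to the number of trials for this fixed benchmark circuit. The description does not state that proportionality explicitly, so treat it as an assumption that the reported numbers happen to be consistent with. The function and constant names are illustrative.

```python
# Reference point reported in the description: ~60 s for 250,000 trials (O3).
REFERENCE_TRIALS = 250_000
REFERENCE_SECONDS = 60.0

# Trial limits per optimization level, as listed in the description.
MAX_TRIALS = {1: 2_500, 2: 25_000, 3: 250_000}


def estimated_seconds(max_trials):
    """Linear extrapolation from the single reference point (an assumption)."""
    return REFERENCE_SECONDS * max_trials / REFERENCE_TRIALS


estimates = {level: estimated_seconds(trials) for level, trials in MAX_TRIALS.items()}

for level, seconds in estimates.items():
    print(f"O{level}: {MAX_TRIALS[level]:>7,} trials -> ~{seconds:.1f} s")
```

Under that assumption the estimate is roughly 0.6 s for O1 and 6 s for O2, in line with "less than a second" and "about 6 seconds" above. Real timings will depend on the device, the circuit, and the rustworkx version.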
By adding these limits, we may miss out on better layouts beyond the options we evaluate. To mitigate this, a separate PR will follow that introduces presorting of the interaction graph and the coupling map graph which coerces VF2 to place the heaviest / most error inducing portions of the interaction graph onto the best quality qubits, leading to better layouts being found with fewer trials.
Related to #9834, which can be resolved once the follow-on PR lands.