
Set max_trials for VF2Layout in preset pass managers. #10054

Merged
6 commits merged into Qiskit:main on May 2, 2023

Conversation

kevinhartman (Contributor)

Summary

By setting max_trials, we limit the number of layouts enumerated and scored when iterating through vf2_mapping().

This is necessary for scoring to complete in a reasonable amount of time for circuits with many connected components on larger (e.g. 400 qubit) devices.

Details and comments

These limits were chosen using a fake 400 qubit device and a circuit with 200 connected components, where each component is a single CX gate.

Because layout scoring scales linearly with the number of qubits in the circuit, 250,000 trials (O3) take about a minute, 25,000 (O2) take about 6 seconds, and 2,500 (O1) take less than a second.

By adding these limits, we may miss out on better layouts beyond the options we evaluate. To mitigate this, a separate PR will follow that introduces presorting of the interaction graph and the coupling map graph, which coerces VF2 to place the heaviest / most error-inducing portions of the interaction graph onto the best-quality qubits, leading to better layouts being found with fewer trials.

Related to #9834, which can be resolved once the follow-on PR lands.
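For context, a minimal sketch of what these limits look like when running VF2Layout directly rather than through a preset pass manager. The circuit, coupling map, seed, and limit values below are illustrative only, not taken from this PR's benchmarks.

from qiskit import QuantumCircuit
from qiskit.transpiler import CouplingMap, PassManager
from qiskit.transpiler.passes import VF2Layout

# Two disconnected CX components, mirroring the benchmark described above.
circuit = QuantumCircuit(4)
circuit.cx(0, 1)
circuit.cx(2, 3)

coupling_map = CouplingMap.from_line(10)
layout_pass = VF2Layout(
    coupling_map=coupling_map,
    seed=42,
    call_limit=int(5e4),  # bounds the VF2 subgraph-isomorphism search itself
    max_trials=2_500,     # bounds how many complete layouts are enumerated and scored
)
PassManager([layout_pass]).run(circuit)
# If a matching layout is found, it is recorded in the pass manager's property set.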

@kevinhartman kevinhartman added this to the 0.24.0 milestone May 1, 2023
@kevinhartman kevinhartman requested a review from a team as a code owner May 1, 2023 20:29
qiskit-bot (Collaborator)

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are requested to review this:

  • @Qiskit/terra-core

@mtreinish mtreinish added the priority: high, Changelog: API Change, and Changelog: Bugfix labels May 1, 2023
mtreinish (Member) left a comment

This looks correct for VF2Layout, but I think we also need to set the limit on VF2PostLayout

Comment on lines 4 to 13
The maximum number of trials evaluated when searching for the best
layout using :class:`.VF2Layout` is now limited in
:func:`qiskit.transpiler.preset_passmanagers.level_1_pass_manager`,
:func:`qiskit.transpiler.preset_passmanagers.level_2_pass_manager`,
and
:func:`qiskit.transpiler.preset_passmanagers.level_3_pass_manager`
to ``2,500``, ``25,000``, and ``250,000``, respectively. Previously,
all possible layouts were evaluated. To perform a full search as
before, manually run :class:`.VF2PostLayout` over the transpiled circuit,
in strict mode, specifying ``0`` for ``max_trials``.
Member

This needs a little tweaking, both to explain the rationale behind why we're making this change and to elaborate on exactly how to restore the previous behavior with an example. But I'll just do this manually in the release notes roundup as we prepare 0.24.0.

kevinhartman (Contributor, author)

Makes sense to add an example, though does the rationale belong in the upgrade text section?

Member

Well, for things like this where we're making a potentially breaking change, I always stress the importance of the "why", so we're explaining to users why we're potentially breaking them. It just provides context so that hopefully they understand the rationale behind the behavior change and aren't as angry with us :) . In this case we have a good justification: if we didn't truncate the search, we would spend a near-infinite amount of time evaluating different layouts.

kevinhartman (Contributor, author)

I added a bit more context in 9c0e5e8. Perhaps we should still add a code example in the release notes roundup as you mentioned.
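For reference, a rough sketch of what such an example might look like. The fake backend, the reading of "strict mode" as strict_direction=True, and max_trials=0 meaning "no limit" are assumptions here, not text from this PR.

from qiskit import QuantumCircuit, transpile
from qiskit.providers.fake_provider import FakeWashingtonV2
from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import VF2PostLayout

backend = FakeWashingtonV2()  # illustrative 127-qubit fake backend
circuit = QuantumCircuit(2)
circuit.cx(0, 1)
transpiled = transpile(circuit, backend, optimization_level=1)

# max_trials=0 lifts the new cap so every candidate layout is scored again, as in
# releases before this change; strict_direction=True is assumed to be what the
# release note calls "strict mode".
post_layout = VF2PostLayout(target=backend.target, strict_direction=True, max_trials=0)
PassManager([post_layout]).run(transpiled)
# Any improved layout is left in the pass manager's property set; applying it to
# the circuit is a separate step.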

@@ -133,6 +133,7 @@ def _vf2_match_not_found(property_set):
call_limit=int(5e4), # Set call limit to ~100ms with rustworkx 0.10.2
properties=backend_properties,
target=target,
max_trials=int(2500), # Limits layout scoring to < 600ms on ~400 qubit devices
Member

What about VF2PostLayout, shouldn't we be setting this there too?

kevinhartman (Contributor, author)

Added in 9c0e5e8 😄

qiskit/transpiler/preset_passmanagers/level1.py (outdated review comments, resolved)
qiskit/transpiler/preset_passmanagers/level2.py (outdated review comments, resolved)
qiskit/transpiler/preset_passmanagers/level3.py (outdated review comments, resolved)
@mtreinish mtreinish added the stable backport potential label May 1, 2023
@mtreinish mtreinish self-assigned this May 1, 2023

coveralls commented May 1, 2023

Pull Request Test Coverage Report for Build 4862652824

  • 17 of 17 (100.0%) changed or added relevant lines in 2 files are covered.
  • 31 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.04%) to 85.892%

Files with coverage reduction (new missed lines, %):

  • crates/qasm2/src/lex.rs: 2 missed, 91.39%
  • qiskit/extensions/quantum_initializer/squ.py: 2 missed, 80.0%
  • qiskit/transpiler/preset_passmanagers/common.py: 9 missed, 93.09%
  • crates/qasm2/src/parse.rs: 18 missed, 96.65%

Totals:

  • Change from base Build 4829590605: -0.04%
  • Covered Lines: 71038
  • Relevant Lines: 82706

💛 - Coveralls

mtreinish (Member) left a comment

Overall this LGTM. The only thing that is out of scope for something we can backport is the deprecation of the old limit function: we have to wait until 0.25.0 to properly deprecate that and start the timer towards removal. Other than that I think this is good to go. Thanks for the quick update.

Comment on lines 550 to 553
@deprecate_func(
additional_msg="Instead, use ``get_vf2_limits``.",
since="0.24.0",
)
Member

This is a bit too quick; we can't deprecate it until after we've included the alternative in a release, see: https://qiskit.org/documentation/deprecation_policy.html#removing-a-feature. So can you remove the decorator and the release note for this? We can do this for 0.25.0, but not for this PR, which we'll need to backport to 0.24.0. (Alternatively, you can set pending=True on the decorator and remove the release note to downgrade the warning emitted to PendingDeprecationWarning.)

kevinhartman (Contributor, author)

Ah that's right, thanks for the reminder. I removed the decorator as well as the release note in 5ef7a94.
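For completeness, a small sketch of the pending=True alternative mentioned above; the helper's name and signature are taken loosely from the linked common.py line and the decorator shown earlier, so they may not match the code exactly.

from qiskit.utils import deprecate_func

@deprecate_func(
    additional_msg="Instead, use ``get_vf2_limits``.",
    since="0.24.0",
    pending=True,  # emit only PendingDeprecationWarning until the replacement ships in 0.25.0
)
def get_vf2_call_limit(optimization_level):
    """Old helper kept for compatibility (name assumed from the discussion)."""
    ...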

mtreinish previously approved these changes May 2, 2023
mtreinish (Member) left a comment

LGTM, thanks for doing this.

@mtreinish mtreinish enabled auto-merge May 2, 2023 14:18
@mtreinish mtreinish added this pull request to the merge queue May 2, 2023
elif optimization_level == 3:
limits = VF2Limits(
int(3e7), # Set call limit to ~60 sec with rustworkx 0.10.2
250000, # Limits layout scoring to < 60 sec on ~400 qubit devices
jlapeyre (Contributor) commented May 2, 2023

Removing the need to count zeros is good. Maybe 25000 is near the readability threshold, but I'd still opt for making it more readable, more so for 250000. Some options are:

int(2.5e5)
250_000

int(2.5e4)
25_000

It's not obvious to me whether the LHS or RHS of 30_000_000 == int(3e7) is more readable. Some researchers measure comprehension speed with clocks, but I couldn't find a study on this specific case. I guess in this file it makes sense to uniformly use either underscores or int().
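For illustration, all of these spellings produce the same int in Python, so the choice is purely about readability:

assert 250_000 == int(2.5e5) == 250000
assert 25_000 == int(2.5e4) == 25000
assert 30_000_000 == int(3e7)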

mtreinish (Member) commented May 2, 2023

In this case I personally don't view it as a big deal; the number here is just an arbitrary limit found by experimentation, so the actual numeric value isn't really relevant. My personal preference would be for 2.5e5, as that is how I reason about this in my head: it's more about the order of magnitude between each optimization level. But I can see that just as well with how it's written now (not so much by the absolute zero count as by the relative spacing differences). What I view as far more important here is the comment explaining where the value came from, because what we're really setting is a wall-time limit as a second-order effect.

@jlapeyre jlapeyre removed this pull request from the merge queue due to a manual request May 2, 2023
jlapeyre (Contributor) commented May 2, 2023

I removed this from the queue because I made a comment and wanted to look at it a bit more, maybe a short hour. If it is urgent to merge quickly, you can add it again.

@mtreinish mtreinish added this pull request to the merge queue May 2, 2023
mtreinish (Member)

> I removed this from the queue because I made a comment and wanted to look at it a bit more, maybe a short hour. If it is urgent to merge quickly, you can add it again.

Please don't do this in the future unless it's a bug in the code or something we shouldn't be merging. Removing a PR from the merge queue causes a queue flush and resets everything enqueued behind it (luckily in this case it's just a single PR, so it's an hour's delay for merging this). I think a small question about style choices, like how we represent large numbers in code, really isn't worth that interruption, especially since it's simple to change in a follow-up if there is consensus against what already merged.

jlapeyre (Contributor) commented May 2, 2023

I noticed there was only one item in the queue.

jlapeyre (Contributor) commented May 2, 2023

EDIT: This is resolved in the following comment.

With this PR, this function
https://github.com/Qiskit/qiskit-terra/blob/112bd6ea29f8605aea79a191ea9e0c40fb1ba5ea/qiskit/transpiler/preset_passmanagers/common.py#L542
is no longer called.

It should probably be removed or, if it's part of the public API, deprecated.

mtreinish (Member)

> With this PR, this function
> https://github.com/Qiskit/qiskit-terra/blob/112bd6ea29f8605aea79a191ea9e0c40fb1ba5ea/qiskit/transpiler/preset_passmanagers/common.py#L542
> is no longer called.
>
> It should probably be removed or, if it's part of the public API, deprecated.

See the earlier discussion here: #10054 (comment). Since it's a public API, the intent is to deprecate it in 0.25.0, but we need to release the alternative first before we can mark the old function as deprecated.

Merged via the queue into Qiskit:main with commit f13b1ed May 2, 2023
mergify bot pushed a commit that referenced this pull request May 2, 2023
* Set max_trials for VF2Layout in preset pass managers.


* Address review comments.

* Return tuple of None instead for finer control and a better interface.

* Add deprecation notice to release note.

* Remove deprecation until 0.25.0.

* Remove unused import.

(cherry picked from commit f13b1ed)
mtreinish pushed a commit that referenced this pull request May 2, 2023
…10061)

(cherry picked from commit f13b1ed)

Co-authored-by: Kevin Hartman <kevin@hart.mn>
king-p3nguin pushed a commit to king-p3nguin/qiskit-terra that referenced this pull request May 22, 2023