
Added target ability for StabilizerState probabilities_dict performance enhancement #11947

Conversation

@DerekMiner (Contributor) commented Mar 4, 2024

Summary

fixes #11425

The focus of this enhancement is to improve the performance of the probabilities_dict method of StabilizerState when a user does not need results for all possible outcomes. After discussion, it made the most sense to add a new method, probabilities_dict_from_bitstrings, that accepts this target list. The user can pass a string or a list of strings as the targets to be calculated. The method no longer needs to compute all 2^n results, only at most one result per target.

Details and comments

Using targets gives a large performance benefit as the number of qubits increases. Previously, every possible outcome of the probabilities_dict method had to be calculated, and that count grows exponentially with the number of qubits. Now, if a user needs only a handful of targets, the new method probabilities_dict_from_bitstrings() performs only the calculations necessary to arrive at those results.

For example, in my testing with 12 qubits, up to 4096 (2^12) outcomes had to be calculated, which requires at most 2^(n+1) - 1 = 8191 iterations through the StabilizerState _get_probabilities helper method (the full binary measurement tree over 12 qubits has 2^13 - 1 nodes), even if a user only wants the probabilities for a small list of target values.

A bool parameter, use_caching, was added to probabilities_dict_from_bitstrings() to let the user enable or disable caching of previously computed probability measurements and quantum states. Caching provides a more strategic starting point when measuring multiple targets: qubits already measured along a previously traversed path can be reused, speeding up the calculation of the next target's probability. The user can disable it if they do not want to rely on caching or want to compare the performance difference.
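
For illustration, here is a minimal usage sketch of the proposed method under the signature described in this PR (the parameter names targets and use_caching follow this draft and may still change; a reviewer later suggests outcome_bitstring):

    from qiskit import QuantumCircuit
    from qiskit.quantum_info import StabilizerState

    # Small Clifford circuit whose stabilizer state we want to query.
    qc = QuantumCircuit(3)
    qc.h(0)
    qc.cx(0, 1)
    qc.cx(1, 2)
    state = StabilizerState(qc)

    # Existing behaviour: every one of the 2^n outcomes is computed.
    all_probs = state.probabilities_dict()

    # Proposed behaviour: only the requested bitstrings are computed.
    some_probs = state.probabilities_dict_from_bitstrings(
        targets=["000", "111"],  # a single string is also accepted
        use_caching=True,        # default; reuses partially traversed paths
    )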

With this improvement, if a user passes, for example, 2 targets with caching disabled, at most 12 iterations per target are needed, for a total of only 24 iterations through the StabilizerState _get_probabilities method.

This is only about 0.15% (12 / 8191) of the previous number of iterations for 1 target, and about 0.29% (24 / 8191) for 2 targets.

Caching is enabled by default, which improves performance even further: qubits already measured along the same path for an earlier target can be reused for the next targets.

For 2 targets this leads to, at worst, (12 + 12) qubit measurements when traversing very different targets such as:
"000000111111" and "111111000000"

or, at best, (12 + 1) qubit measurements when traversing targets very close to one another, such as:
"111111111111" and "011111111111"

The closer the branches of the target inputs are, the better the method performs. With this enhancement, only the branches required to calculate the probabilities of the targets are traversed, with almost no repeated measurements.
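
As an illustration of this counting argument only (a toy model, not the PR's implementation), assume measurement proceeds from the rightmost bit and the cache lets the second target reuse the longest shared suffix of the first traversal:

    # Toy estimate of how many qubit measurements two targets need when the
    # cache can reuse the longest shared suffix of the first traversal.
    def toy_measurement_count(first: str, second: str) -> int:
        shared = 0
        for a, b in zip(reversed(first), reversed(second)):
            if a != b:
                break
            shared += 1
        return len(first) + (len(second) - shared)

    # Very different targets: no shared path, 12 + 12 = 24 measurements.
    print(toy_measurement_count("000000111111", "111111000000"))  # 24

    # Very close targets: 11 measurements reused, 12 + 1 = 13 measurements.
    print(toy_measurement_count("111111111111", "011111111111"))  # 13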

I added tests specifically to demonstrate the performance increase when using targets: they compare the time taken by targeted versus non-targeted calls to show that the targeted path is significantly faster, and they also verify the results. For 12 qubits with 2 close targets, enabling caching took around 50% of the time needed to get both results without caching. I also added a test showing that caching improves performance because fewer qubits need to be measured along the code path compared with calculating targets without caching.

Caching would not have been needed if only a single target were passed in, but if a user wants a list of target results it can help significantly by providing a better starting point and lowering the number of repeated measurements.

Added the new method probabilities_dict_from_bitstrings and moved the logic from probabilities_dict into it, with the ability to pass a target; probabilities_dict now passes a target of None.
…bability for qubit

Added timing to the tests to determine whether the version with a specific target runs faster than the non-targeted version (which must calculate all 2^n possibilities) for the stabilizer state test test_probablities_dict_single_qubit. When given a target, branches that were not requested can be skipped.
simplified the method probabilities_dict_from_bitstrings, less checking was needed, simpler handling of no target value passed in
In StabilizerState, added the ability to cache previously calculated branches so they do not get recalculated when computing target values; this gives a better starting point instead of starting from the initial probability value of 1.0.
When creating the outcome array for a probability, it is only built if the cache key is not found.
…disabling cache

Added more performance optimization for branch calculations that use caching and targets: deterministic node values are now also stored, which gives a slight performance increase, and cache values are stored and retrieved in more efficient ways. The performance gain with a large number of qubits is huge; with a very small number of qubits the slight caching overhead makes it about the same. Increased test coverage for using targets with probabilities.
implemented more accurate tests that verify probability caching
For some calculations the cached results were stored in the wrong place, which could cause incorrect results for certain test cases with 3+ qubits.
overhead was too high compared to calculating
Fixed a test failure that occurred randomly: the stabilizer state itself was not being cached, which would sometimes produce a failure. After also caching the object to restore when a calculation starts at a node further down the branch, this failure no longer occurs. The code still needs some optimization and cleanup, but it is functioning.
DerekMiner and others added 4 commits March 11, 2024 07:26
Removed the use of the typing module for type hinting, since Python is moving away from using this module for type hints; also fixed some comment issues in the stabilizerstate, test_probabilitycache, test_stabilizerstate, and probabilitycache files.
@DerekMiner (Contributor Author):

Officially done with changes for this unless reviewers would like anything changed; fixed up the last of the documentation issues I could find.

DerekMiner and others added 5 commits March 11, 2024 13:45
…ptable-object)

Fixed tests failing pylint when lint checking is performed. This worked locally without failing, but it fails when run in the CI/CD process, so from __future__ import annotations was added to test_probabilitycache and test_stabilizerstate to avoid the failure, as the new tests include type hints.
…ed cache calls

Fixed a small error in my comments for test_stabilizerstate.py and explicitly set the use_caching parameter in the test calls to probabilities_dict_from_bitstrings, even though it is the default, to make it clear in the tests. Simplified adding values to ProbabilityCache, simplified the code in StabilizerState._get_probabilities() that sets cache values and retrieves the state when using the cache, and made cache_key in probabilitycache "private" by renaming the method to _cache_key.
@DerekMiner (Contributor Author):

It seems that when you step away from your code for a few days and look at it again, you see ways to simplify things. I simplified some logic in the StabilizerState _get_probabilities method and redid some of the comments.

DerekMiner and others added 7 commits March 15, 2024 10:48
Significant code refactoring in stabilizerstate. I was able to condense the use of the cache to a smaller area of the _get_probabilities helper method so it is easier to work with and clearer. I removed single-use helper methods I had created that could be condensed into simpler code, such as _branches_to_measure, _retrieve_deterministic_probability, and _is_qubit_deterministic; since they were only used once, it didn't make sense to keep them around. The _get_probabilities helper is much simpler and clearer, closer to the original with fewer tweaks, and performs the same. Most of the changed code at this point is tests; most of this commit was removing unnecessary code and refactoring.
There was a check to make sure a key could be found that was removed in the refactoring. Once in a while a target cannot be found in the cache, because the lookup searches for a key by increasing the number of X placeholders from left to right; there are situations where the X sits in the middle, such as 0X1, which will not be found (see the sketch after these commit notes). The probability cache algorithm could be changed to use a BST, which would very rarely help performance; the fix is to check whether a key can be found and only use the cache if that is the case.
…ance

There is a slight overhead when recursively iterating through the _get_probabilities helper if it has to check whether a cached value should be retrieved. Realistically, caching is only used to get a better starting point, so retrieval only needs to happen once. Cache insertion remains in the _get_probabilities helper to build up the cache, but the logic there was significantly simplified, and now a single cache retrieval is performed for each target via the probabilities_dict_from_bitstrings method.
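
For illustration only, here is a sketch of the lookup behaviour described in the commit note above, assuming cache keys are bitstrings in which 'X' marks not-yet-measured qubits (the key format and the helper name candidate_keys are assumptions, not code from this PR):

    # Candidate cache keys for a target, generated by replacing characters
    # with 'X' from the left, which is the lookup order described above.
    def candidate_keys(target: str) -> list[str]:
        return [("X" * i) + target[i:] for i in range(1, len(target) + 1)]

    print(candidate_keys("001"))            # ['X01', 'XX1', 'XXX']

    # A cached key with 'X' in the middle, such as '0X1', is never generated,
    # so the lookup must fall back to not using the cache for that target.
    print("0X1" in candidate_keys("001"))   # False
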
@ShellyGarion (Member):

It seems that when you step away from your code for a few days and look at it again, you see ways to simplify things. I simplified some logic in the StabilizerState _get_probabilities method and redid some of the comments.

There is no need to merge this with main every time; it will be done automatically when this PR is merged.
Just let us know when you have finished updating it and it is ready for review.

@DerekMiner (Contributor Author):

It seems that when you step away from your code for a few days and look at it again, you see ways to simplify things. I simplified some logic in the StabilizerState _get_probabilities method and redid some of the comments.

There is no need to merge this with main every time; it will be done automatically when this PR is merged. Just let us know when you have finished updating it and it is ready for review.

Sounds good, I am officially done with code changes at this point and it is ready for review.

I didn't expect to make so many changes after converting it out of draft; in the future I will keep it in draft until I am 100% ready.

@ShellyGarion (Member) left a comment

@DerekMiner - thank you very much for your contribution to Qiskit.
This is just preliminary feedback, as this PR turned out to be more complicated than I originally expected.

Initially I thought that only one outcome would be retrieved, in which case the code would be much simpler, as no caching is needed.
I agree that this way one can retrieve several outcomes more efficiently using caching, but I would need the help of other Qiskit core team members to review the caching code.

Generally, it's not easy to check the correctness of this code.
The tests cover StabilizerState on a small number of qubits. It's worth thinking about tests with a large number of qubits and checking that they produce the expected results.
For example, a QuantumCircuit with H gates on all qubits, large states of the form |0...0>+|1...1>, or other more complicated Clifford circuits.
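
As a sketch of the kind of large-n test being suggested here (the targeted method name and its parameters follow this PR's draft and are assumptions):

    from qiskit import QuantumCircuit
    from qiskit.quantum_info import StabilizerState

    n = 100

    # GHZ-like state (|0...0> + |1...1>)/sqrt(2): only the all-zeros and
    # all-ones outcomes have non-zero probability, each equal to 0.5.
    qc = QuantumCircuit(n)
    qc.h(0)
    for i in range(n - 1):
        qc.cx(i, i + 1)
    state = StabilizerState(qc)

    # Only the requested bitstrings are evaluated; the full probabilities_dict
    # would not scale to circuits (e.g. H on every qubit) with 2^n outcomes.
    probs = state.probabilities_dict_from_bitstrings(targets=["0" * n, "1" * n])
    assert abs(probs["0" * n] - 0.5) < 1e-10
    assert abs(probs["1" * n] - 0.5) < 1e-10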

self,
qargs: None | list = None,
decimals: None | int = None,
targets: dict[str, any] | list[str] | str | None = None,
Member:

I think that the targets parameter is a bit confusing, since there is also the transpiler's Target.
I would suggest finding a different name. Perhaps outcome_bitstring?
(this is relevant to the entire PR)

Contributor Author:

That makes sense; I agree that outcome_bitstring is clearer.

from qiskit.quantum_info.states.quantum_state import QuantumState


class ProbabilityCache:
Member:

I wonder if some of the functionality here already appears in ProbDistribution?

Contributor Author:

It looks like this could possibly be used if this PR continues down the route of allowing multiple target bitstrings; functionality could be added to find the measured bitstring closest to what is stored in the dict here.

@DerekMiner (Contributor Author) commented Apr 3, 2024

Thanks for the feedback @ShellyGarion !

@DerekMiner - thank you very much for your contribution to Qiskit. This is just preliminary feedback, as this PR turned out to be more complicated than I originally expected.

Initially I thought that only one outcome would be retrieved, in which case the code would be much simpler, as no caching is needed. I agree that this way one can retrieve several outcomes more efficiently using caching, but I would need the help of other Qiskit core team members to review the caching code.

Looks like I misunderstood this part of the issue. There are two routes we could take here. We could continue with what I have and make some of the changes you mentioned, which would require further review from others and possibly some other changes.
Or I could remove the caching ability (remove the ProbabilityCache class and the use of caching in the get_probability_with_targets method) and change the method to accept only a single target string. That would simplify things a bit. If users wanted multiple targets they could call the method themselves multiple times; they would no longer get the benefit of caching, but I am not sure how much of a use case there is for measuring a subset of target bitstrings versus just a single target bitstring. This change would only require removing some of the code that was written and would simplify some of the tests.

Generally, it's not easy to check the correctness of this code. The tests cover StabilizerState on a small number of qubits. It's worth thinking about tests with a large number of qubits and checking that they produce the expected results. For example, a QuantumCircuit with H gates on all qubits, large states of the form |0...0>+|1...1>, or other more complicated Clifford circuits.

I agree, the more tests the better! In the test class for this new functionality, test/python/quantum_info/states/test_stabilizerstate.py, I looked through how many qubits were currently being tested; every test I looked at used between 1 and 5 qubits. I did add a test with 12 qubits, which is much larger than what existed and which passes, but we could explore going higher than that and using H gates in the tests.

Before making changes in either direction, I think it would be good to get feedback on whether I should simplify this PR to a single target the user can pass, or continue on the path of the current code with multiple targets and improve it.

@ShellyGarion ShellyGarion removed the request for review from ikkoham April 3, 2024 15:37
@DerekMiner (Contributor Author) commented Apr 3, 2024

@ginsparg, as the original creator of this issue, it would be good to get your feedback here.

Do you ever have a case where you want measurements for a small subset of target bitstrings (more than 1, but not all of them)? Or, when you do not want all of the probabilities, is it only ever a single target bitstring that you want measured?

@ShellyGarion (Member):

The main use-case that I can think of is an n-qubit StabilizerState when n is large (say, n=100). In this case, the StabilizerState representation is quadratic in the number of qubits, while obtaining all the outcomes can in some cases grow as 2^n (e.g. when there is an H gate on all qubits). Hence, it makes sense to take only one outcome, or several outcomes whose number grows at most polynomially in n (it doesn't make sense to take half of the 2^n outcomes, for example). I'm not sure how these outcomes are spread and whether caching is beneficial in this case.

I would suggest making a simpler PR that allows taking only one outcome (you can continue this PR or close it and open a new one). Later, if we think there is a use case, perhaps you can open another PR that uses caching.

@DerekMiner (Contributor Author):

The main use-case that I can think of is an n-qubit StabilizerState when n is large (say, n=100). In this case, the StabilizerState representation is quadratic in the number of qubits, while obtaining all the outcomes can in some cases grow as 2^n (e.g. when there is an H gate on all qubits). Hence, it makes sense to take only one outcome, or several outcomes whose number grows at most polynomially in n (it doesn't make sense to take half of the 2^n outcomes, for example). I'm not sure how these outcomes are spread and whether caching is beneficial in this case.

I would suggest making a simpler PR that allows taking only one outcome (you can continue this PR or close it and open a new one). Later, if we think there is a use case, perhaps you can open another PR that uses caching.

Sounds great, I will close this PR and create a new one once I have the changes

@DerekMiner DerekMiner closed this Apr 4, 2024
@DerekMiner (Contributor Author):

@ShellyGarion, what are your thoughts on the performance measuring I added? Should I continue verifying that measuring a single outcome bitstring performs faster than measuring all possible outcomes?

As this is a performance enhancement, it helps show that fewer measurements are being performed and that the performance benefit is real.
I could simplify it so that I just check that the single-target outcome is always faster than getting all the measurements, which simplifies the check. Or do you not recommend having this check at all?

There are issues with certain time-measurement methods in the Windows and macOS automated test environments used by the testing process. After running the tests dozens of times, I concluded that, at least in the automated test environment, the only reliable way to get accurate measurements was to choose the timing method based on the OS:

    @staticmethod
    def _perf_time_type_based_on_os():
        """Get a timestamp (in nanoseconds) for performance checking, chosen
        based on the operating system.

        Returns:
            int: timestamp to use when comparing performance
        """
        # Requires `import sys` and `import time` at module level.
        if sys.platform == "win32":
            # time.thread_time_ns() sometimes returned the same value on
            # Windows even after time had elapsed, so use perf_counter_ns().
            return time.perf_counter_ns()
        return time.thread_time_ns()

When using time.thread_time_ns() in the Windows tests, it would sometimes return the exact same value even after time had elapsed.

@ShellyGarion (Member):

@DerekMiner - thanks for opening a new PR. I don't think we need these performance tests in the case of a single bitstring.

Labels: Community PR (PRs from contributors that are not 'members' of the Qiskit repo)
Projects: Status: Done
Development

Successfully merging this pull request may close these issues.

qiskit.quantum_info.StabilizerState needs get_probability() method
5 participants