This repository has been archived by the owner on Dec 7, 2021. It is now read-only.

Improve the execution time of computing mean and var #787

Closed
AzizNgoueya opened this issue Jan 24, 2020 · 17 comments

@AzizNgoueya

What is the expected enhancement?

Right now, computing the mean and variance using a WeightedPauliOperator and results on 25 qubits takes a long time, even for one iteration (1038 seconds). I tried to parallelize, but it seems that parallelization is deactivated on Windows according to this code: https://github.com/Qiskit/qiskit-terra/blob/792e1b7f866b9f3685566341fa6b4b54d5ba33e9/qiskit/tools/parallel.py#L115.
Is there a reason for this?

Also, if we look at this code https://github.com/Qiskit/qiskit-aqua/blob/44a94674e9d3937f277fb19112885fa6073048c4/qiskit/aqua/operators/weighted_pauli_operator.py#L774, the length of the list of WeightedPauliOperator and result counts is greater than 1 if there is more than one Pauli basis. This is important because the parallel_map function can then process the list in several batches. However, in my code I have only one basis, but it is big, so it is executed in one batch and takes a very long time.

@woodsp-ibm
Member

woodsp-ibm commented Jan 24, 2020

Parallelization is not done under Windows because of differences in how it is done across the OS platforms; because of those differences it could not be used there.

basis is described in the WeightedPauliOperator constructor

            basis (list[tuple(object, [int])], optional): the grouping basis, each element is a
                                                          tuple composed of the basis
                                                          and the indices to paulis which are
                                                          belonged to that group.
                                                          e.g., if tpb basis is used, the object
                                                          will be a pauli.
                                                          By default, the group is equal to
                                                          non-grouping, each pauli is its own basis.

By default there is no grouping of the Paulis and the expectation value of each is computed separately. VQE however, by default, will convert a WeightedPauliOperator to a TPBGroupedWeightedPauliOperator, which will group them by Tensor Product Basis.
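To make the idea of grouping by Tensor Product Basis concrete, here is a minimal sketch in plain Python. It is a greedy illustration only, not Aqua's actual grouping code; the function names and the string representation of Paulis are assumptions made for this example. Two Paulis share a basis when, on every qubit, they are equal or one of them is the identity.

```python
def compatible(pauli, basis):
    """True if pauli can be measured in the given tensor product basis."""
    return all(p == 'I' or b == 'I' or p == b for p, b in zip(pauli, basis))


def merge(pauli, basis):
    """Extend the basis with the non-identity factors of pauli."""
    return ''.join(b if p == 'I' else p for p, b in zip(pauli, basis))


def group_by_tpb(paulis):
    """Greedily assign each Pauli label to the first compatible group.

    Returns a list of (basis, indices) tuples, mirroring the shape of the
    'basis' argument described in the docstring quoted above.
    """
    groups = []  # each entry is [basis_label, [indices into paulis]]
    for idx, pauli in enumerate(paulis):
        for group in groups:
            if compatible(pauli, group[0]):
                group[0] = merge(pauli, group[0])
                group[1].append(idx)
                break
        else:
            # No existing group fits, so this Pauli starts a new one.
            groups.append([pauli, [idx]])
    return [(basis, indices) for basis, indices in groups]


paulis = ['ZZ', 'ZI', 'IZ', 'XX', 'XI', 'YZ']
groups = group_by_tpb(paulis)
```

In this toy input, six Paulis collapse into three groups, so three circuits would be needed instead of six.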

Performance is something we are continually looking to improve. One thing you can try to improve performance is to use the Aer qasm_simulator and set shots=1 (yes, a single shot). This will then go via the use_simulator_snapshot_mode path instead. You should find this faster.

We are also looking again at the operators, and in particular at the way expectation and evolution are used. Performance, as I stated, is always a concern for us and we are always looking for improvements. Here is the Qiskit/RFCs#8 description of this upcoming effort.

@AzizNgoueya
Author

Thanks for your reply, I didn't notice that the WeightedPauliOperator was converted to a TPBGroupedWeightedPauliOperator. And when I set shots=1, I can see an improvement. But in my case this operator has length 1, so the parallel_map function will not distribute this list across different CPUs, even on a Linux machine, according to this line: https://github.com/Qiskit/qiskit-terra/blob/4990730a74ab7ae5fa72e494f143cd815ab1e8ae/qiskit/tools/parallel.py#L104.
Also, I tried dividing the TPBGroupedWeightedPauliOperator into multiple batches for computing the mean and variance (i.e. list1 of paulis + measurements, list2 of paulis + measurements, ...), and I get results faster than when using one list of TPBGroupedWeightedPauliOperator (75 seconds against 291). I don't know if it's a good way to manipulate this type of operator, but I got the same results for the mean and not for the variance (I assume this is due to the manner in which the variance is calculated over multiple batches). Could you tell me if I am wrong?
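The behaviour described here, a map that runs serially when it is handed a single item, plus the workaround of splitting one large group into batches, can be sketched as follows. This is a hypothetical stand-in and not Qiskit's parallel_map: the names simple_parallel_map and chunk are invented for the example, and a thread pool is used only to keep the sketch portable, whereas a real implementation would more likely use processes.

```python
from concurrent.futures import ThreadPoolExecutor


def simple_parallel_map(task, values, num_workers=4):
    """Apply task to every value, in parallel only when it can help."""
    values = list(values)
    # With zero or one item there is nothing to distribute: run serially.
    if len(values) <= 1 or num_workers <= 1:
        return [task(v) for v in values]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(task, values))


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# One group holding 225 Pauli indices: a single item, so no parallelism.
one_group = [list(range(225))]
serial_result = simple_parallel_map(len, one_group)

# The same indices split into batches of 45: five items to distribute.
batches = chunk(list(range(225)), 45)
batched_result = simple_parallel_map(len, batches)
```

Batching like this is only safe for quantities that add up across batches, such as the mean; the variance does not add up in the same way, as reported in this comment.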

@woodsp-ibm
Member

When you say your operator has a length of 1, do you mean the number of Paulis in it? In the WeightedPauliOperator each Pauli string gets run separately as a circuit. In the TPBGrouped operator they are grouped by Tensor Product Basis and each group is run in a single go. If you call print_details() on the TPB-grouped operator you can see the number of groups and which Pauli(s) are in each group. In this case the parallel map should be able to split by groups.
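As a rough illustration of why one measurement can serve a whole group, the sketch below evaluates several Paulis from a single counts dictionary. It assumes the circuit has already been rotated into the group's basis, and that character i of each bitstring lines up with character i of the Pauli label (a simplification that ignores Qiskit's own bit-ordering convention). The function names and the counts are made up for the example.

```python
def pauli_eigenvalue(pauli, bitstring):
    """Return +1 or -1: parity of the measured 1s on non-identity positions."""
    ones = sum(1 for p, b in zip(pauli, bitstring) if p != 'I' and b == '1')
    return -1 if ones % 2 else 1


def mean_and_variance(pauli, counts):
    """Estimate the mean and single-shot variance of a Pauli from counts."""
    shots = sum(counts.values())
    mean = sum(pauli_eigenvalue(pauli, bits) * n
               for bits, n in counts.items()) / shots
    # Outcomes are +1 or -1, so the second moment is 1 and var = 1 - mean^2.
    return mean, 1.0 - mean ** 2


# Hypothetical counts from one circuit run with 1024 shots.
counts = {'00': 600, '11': 400, '01': 24}
zz_mean, zz_var = mean_and_variance('ZZ', counts)
zi_mean, zi_var = mean_and_variance('ZI', counts)
```

Both ZZ and ZI are evaluated from the same counts, which is the saving that grouping provides.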

For parallelized operation, the Aer simulator also has backend options for running in parallel; see https://qiskit.org/documentation/api/qiskit.providers.aer.backends.QasmSimulator.html

@AzizNgoueya
Author

Oh, it's clear now, thanks. The TPBGrouped Paulis used in my notebook end up in a single group of 225 Paulis, which explains why the parallelization is not performed.

image

Also, separating these TPBGrouped Paulis into batches (as I mentioned above) is not a good approach, because the calculation of certain covariance terms is omitted.
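The omitted covariance can be seen in a small numerical sketch. The per-shot eigenvalues and weights below are invented for illustration; the point is only that the mean of a weighted sum adds up across batches, while the variance picks up an extra term 2·Cov(batch1, batch2) when the batches come from the same shots.

```python
from statistics import mean, pvariance


def weighted_sum(samples, weights, indices):
    """Per-shot value of sum_i weights[i] * P_i over the chosen Paulis."""
    return [sum(weights[i] * shot[i] for i in indices) for shot in samples]


# Each tuple holds the +1/-1 outcomes of three Paulis in one shared shot.
samples = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (1, 1, 1)]
weights = [0.5, 0.25, 1.0]

full = weighted_sum(samples, weights, [0, 1, 2])
batch1 = weighted_sum(samples, weights, [0, 1])
batch2 = weighted_sum(samples, weights, [2])

# The mean is additive across batches.
mean_full = mean(full)
mean_split = mean(batch1) + mean(batch2)

# The variance is not: summing per-batch variances drops the covariance.
var_full = pvariance(full)
var_split = pvariance(batch1) + pvariance(batch2)

m1, m2 = mean(batch1), mean(batch2)
covariance = mean([(a - m1) * (b - m2) for a, b in zip(batch1, batch2)])
```

Here var_full equals var_split + 2 * covariance, so the batched variance is only correct when the cross-batch covariance is added back or happens to be zero.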

I will see how to do the parallelization directly using the Aer simulator.

@yaelbh
Contributor

yaelbh commented Jan 28, 2020

Aer simulators parallelize by circuits and shots. However there is no parallelization over operator components in the expectation value calculation. @chriseclectic do you think it is worth pursuing, in the statevector, stabilizer, and MPS simulators? VQE instances have many thousands of components.

@AzizNgoueya
Author

I received a request from a client on this issue: https://github.ibm.com/IBM-Q-General/client-support/issues/123. Do you have any idea what I can tell them for now to improve their execution time (reducing the number of shots, for example), and whether an improvement in the calculation of the expectation value can be expected soon from the operator redesign?

@yaelbh
Contributor

yaelbh commented Jan 28, 2020

How much improvement do you see when you set the number of shots to 1? I believe it would lead to significant improvement, and this is what I'd suggest to the clients.

@AzizNgoueya
Author

With shots=1 each step takes approximately 45 seconds (circuit creation + transpilation + measurement + energy evaluation), which is much faster than with 1024 shots. But I'm wondering whether the minimum energy is reached in a reasonable number of steps.

@yaelbh
Contributor

yaelbh commented Jan 28, 2020

What do you mean by "step"? In VQE, circuit creation and transpilation are done only once. Then there are many (300 in the client's snippet) iterations of measurement and energy evaluation.

@AzizNgoueya
Author

Sorry, I meant the time between two energy evaluations. I use logging, and I notice that for one step you mentioned above there are several energy evaluations, is that right? And before each energy evaluation the log shows a circuit transpilation process.

@yaelbh
Contributor

yaelbh commented Jan 28, 2020

I guess that the transpilations that you see in the log are just the binding of parameters, which sets real values into the parameterized circuit, thus transforming it into a concrete circuit that can be executed. But it doesn't perform a transpilation in the sense of optimizing the circuit.
I'll try to see if the run time can be reduced.

@woodsp-ibm
Member

@AzizNgoueya just checking: are you using the newest official release of Aqua, i.e. 0.6.2, which was released in the middle of last December? (You gave no indication of the version when creating this issue, and the client support issue linked above shows 0.6.0.) Only the newest version has the parameterized circuit support in Aqua, which speeds up the process by only having to transpile once.

@AzizNgoueya
Author

OK, I used version 0.6.1 of Aqua; I will try with 0.6.2.

@woodsp-ibm
Member

If you upgrade qiskit (pip install -U qiskit), that should get you the latest versions of everything, so you have the newest Aer simulator and Terra too.

@AzizNgoueya
Author

@woodsp-ibm the execution time is better with version 0.6.2 of Aqua when setting shots=1. @yaelbh made me realize that with this option it is Aer that handles the calculation of the expectation value. Thanks
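A one-qubit sketch may help show why letting the simulator compute the expectation value differs from estimating it from shots. This is an illustration under simplifying assumptions, not Aer's implementation: the state is cos(θ/2)|0> + sin(θ/2)|1>, the observable is Z, and the function names are invented for the example.

```python
import math
import random


def exact_z_expectation(theta):
    """<Z> computed directly from the state amplitudes (no sampling)."""
    return math.cos(theta / 2) ** 2 - math.sin(theta / 2) ** 2


def sampled_z_expectation(theta, shots, seed=0):
    """<Z> estimated by drawing `shots` measurement outcomes."""
    rng = random.Random(seed)
    p_one = math.sin(theta / 2) ** 2  # probability of measuring |1>
    total = 0
    for _ in range(shots):
        total += -1 if rng.random() < p_one else 1
    return total / shots


theta = 1.0
exact = exact_z_expectation(theta)
estimate = sampled_z_expectation(theta, shots=1024)
```

The exact value carries no sampling noise and needs no repeated shots, whereas the sampled estimate has a statistical error that shrinks only as the number of shots grows.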

@woodsp-ibm
Member

@AzizNgoueya Great, that's good to hear. How much did things improve for you in the end out of curiosity?

@AzizNgoueya
Author

@woodsp-ibm the energy evaluation takes approximately 100 seconds with 25 qubits (against more than 1000 seconds with 1024 shots). It's an incredible improvement.
