Improve the execution time of computing mean and var #787

AzizNgoueya · 2020-01-24T11:09:02Z

What is the expected enhancement?

Right now, computing the mean and variance using a WeightedPauliOperator and results on 25 qubits takes a long time, even for a single iteration (1038 seconds). I tried to parallelize it, but parallelization seems to be deactivated on Windows according to this code: https://github.com/Qiskit/qiskit-terra/blob/792e1b7f866b9f3685566341fa6b4b54d5ba33e9/qiskit/tools/parallel.py#L115.
Is there a reason for this?

Also, if we look at this code https://github.com/Qiskit/qiskit-aqua/blob/44a94674e9d3937f277fb19112885fa6073048c4/qiskit/aqua/operators/weighted_pauli_operator.py#L774, the list of WeightedPauliOperator and result counts has more than one element only if there is more than one Pauli basis. This is important because the parallel_map function can then process the list in several batches. However, in my code I have only one basis, but it is big, so it is executed in one batch and takes a very long time.

woodsp-ibm · 2020-01-24T12:34:11Z

Parallelization is not done under Windows because of differences in how it is implemented across OS platforms, which meant it could not be used there.
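As a rough illustration of the behaviour discussed in this thread (serial execution on Windows, and serial execution when there is only one item to process), here is a minimal sketch. The helper names `should_parallelize` and `map_tasks` are hypothetical, and a thread pool stands in for the process pool; this is not the actual qiskit-terra source.

```python
import platform
from concurrent.futures import ThreadPoolExecutor


def should_parallelize(num_tasks, num_workers, system=None):
    """Hypothetical helper mirroring the two conditions raised in this thread."""
    system = platform.system() if system is None else system
    # A single task cannot be split, and Windows falls back to serial execution.
    return num_tasks > 1 and num_workers > 1 and system != "Windows"


def map_tasks(task, values, num_workers=4, system=None):
    """Apply task to every value, using a pool only when it can help."""
    if not should_parallelize(len(values), num_workers, system):
        return [task(v) for v in values]  # serial fallback
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(task, values))
```

With a single large group, `should_parallelize(1, 8, "Linux")` is false, so the work runs in one batch regardless of the platform.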

The basis is described in the WeightedPauliOperator constructor:

            basis (list[tuple(object, [int])], optional): the grouping basis, each element is a
                                                          tuple composed of the basis
                                                          and the indices to paulis which are
                                                          belonged to that group.
                                                          e.g., if tpb basis is used, the object
                                                          will be a pauli.
                                                          By default, the group is equal to
                                                          non-grouping, each pauli is its own basis.

By default there is no grouping of the Paulis and the expectation value of each is computed separately. VQE, however, will by default convert a WeightedPauliOperator to a TPBGroupedWeightedPauliOperator, which groups them by Tensor Product Basis.
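To make the idea of grouping by Tensor Product Basis concrete, here is a minimal sketch of a greedy grouping over Pauli strings such as "ZZI". It assumes two Paulis can share a group when, on every qubit, they are equal or one of them is the identity. The function names are hypothetical and this is not necessarily the algorithm Aqua uses; the output only imitates the (basis, indices) shape from the docstring above.

```python
def merge_basis(basis, pauli):
    """Return the merged measurement basis, or None if the two clash on a qubit."""
    merged = []
    for b, p in zip(basis, pauli):
        if b == "I":
            merged.append(p)
        elif p == "I" or p == b:
            merged.append(b)
        else:
            return None  # e.g. X versus Z on the same qubit
    return "".join(merged)


def group_by_tpb(paulis):
    """Greedily place each Pauli string in the first compatible group."""
    groups = []  # each entry is [basis, [indices of member Paulis]]
    for idx, pauli in enumerate(paulis):
        for group in groups:
            merged = merge_basis(group[0], pauli)
            if merged is not None:
                group[0] = merged
                group[1].append(idx)
                break
        else:
            groups.append([pauli, [idx]])
    return [(basis, indices) for basis, indices in groups]


groups = group_by_tpb(["ZZI", "IZZ", "ZIZ", "XXI", "IXX"])
# groups == [("ZZZ", [0, 1, 2]), ("XXX", [3, 4])]
```

If every Pauli in an operator is mutually compatible, this kind of grouping yields a single group, which leaves nothing to split across workers.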

Performance is something we are continually looking to improve. One thing you can try to improve performance is to use the Aer qasm_simulator and set shots=1 (yes, a single shot). This will then go via the use_simulator_snapshot_mode path instead. You should find this faster.
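As a toy illustration of why computing an expectation value directly from the state can replace sampling, the sketch below compares the exact expectation of a Z...Z Pauli string with an estimate from sampled shots. The probabilities and function names are made up for the example, and this is not how Aer implements its snapshot mode internally.

```python
import random


def z_parity(bitstring):
    """Eigenvalue (+1 or -1) of a Z...Z Pauli string on a basis state."""
    return -1 if bitstring.count("1") % 2 else 1


def exact_expectation(probabilities):
    """Expectation value computed directly from the outcome probabilities."""
    return sum(p * z_parity(b) for b, p in probabilities.items())


def sampled_expectation(probabilities, shots, seed=0):
    """Estimate the same quantity from a finite number of sampled shots."""
    rng = random.Random(seed)
    outcomes = rng.choices(
        list(probabilities), weights=list(probabilities.values()), k=shots
    )
    return sum(z_parity(b) for b in outcomes) / shots


# Hypothetical two-qubit outcome distribution
probs = {"00": 0.5, "11": 0.3, "01": 0.2}
exact = exact_expectation(probs)            # 0.5 + 0.3 - 0.2 = 0.6
estimate = sampled_expectation(probs, shots=1024)
```

The exact value has no sampling noise, whereas the estimate fluctuates around it and only tightens as the number of shots grows.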

We are also looking again at the operators, and in particular at the way expectation and evolution are used. Performance, as I stated, is always a concern for us and we are always looking for improvements. Here is the Qiskit/RFCs#8 description of this upcoming effort.

AzizNgoueya · 2020-01-27T10:56:44Z

Thanks for your reply. I didn't notice that the WeightedPauliOperator was converted to a TPBGroupedWeightedPauliOperator. And when I set shots=1, I can see an improvement. But in my case this operator has length 1, so the parallel_map function will not distribute this list across different CPUs, even on a Linux machine, according to this line: https://github.com/Qiskit/qiskit-terra/blob/4990730a74ab7ae5fa72e494f143cd815ab1e8ae/qiskit/tools/parallel.py#L104.
Also, I tried to divide the TPBGroupedWeightedPauliOperator into multiple batches for computing the mean and variance (i.e. list1 of Paulis + measurements, list2 of Paulis + measurements, ...), and I get results faster than when using one list of TPBGroupedWeightedPauliOperator (75 seconds against 291). I don't know if it's a good way to manipulate this type of operator, but I got the same results for the mean and not for the variance (I assume this is due to the way the variance is calculated over multiple batches). Could you tell me if I am wrong?

woodsp-ibm · 2020-01-27T16:51:02Z

When you say your operator has a length of 1, do you mean the number of Paulis in it? In the WeightedPauliOperator each Pauli string gets run separately as a circuit. In the TPBGrouped operator they are grouped by Tensor Product Basis and each group is run in a single go. If you call print_details() on the TPB operator you can see the number of groups and which Pauli(s) are in each group. In this case the parallel map should be able to split by groups.

For parallelized operation, the Aer simulator also has backend options for running in parallel; see https://qiskit.org/documentation/api/qiskit.providers.aer.backends.QasmSimulator.html
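As a sketch of what such backend options might look like, the dictionary below uses option names from the Aer QasmSimulator documentation linked above. The values are illustrative only, and how the dictionary is passed on (for example through a backend_options argument of the Aqua QuantumInstance) is an assumption that should be checked against the installed version.

```python
# Illustrative values only; tune them to the machine and workload.
backend_options = {
    # Upper bound on CPU threads the simulator may use (0 lets Aer decide).
    "max_parallel_threads": 8,
    # How many circuits (experiments) may be simulated concurrently.
    "max_parallel_experiments": 4,
    # How many shots of one circuit may run concurrently (1 disables it).
    "max_parallel_shots": 1,
}

# Assumed usage with the Aqua 0.6-era API (not verified here):
#   QuantumInstance(backend, shots=1024, backend_options=backend_options)
```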

AzizNgoueya · 2020-01-27T17:30:48Z

Oh, it's clear now, thanks. The TPB-grouped Paulis used in my notebook end up in a single group of 225 Paulis, which explains why the parallelization is not performed.

Also, separating these TPB-grouped Paulis into batches (as I mentioned above) is not a good approach, because the calculation of certain covariances is omitted.
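The effect of the omitted covariance can be checked on a small example. The sketch below uses made-up per-shot eigenvalues for two Pauli terms measured on the same shots: the means add up across batches, but the variance of the sum also needs the cross term, since Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B). The data and helper name are hypothetical.

```python
from statistics import mean, pvariance


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical per-shot eigenvalues of two Pauli terms from the same shots
a = [1, 1, -1, 1, -1, -1, 1, 1]
b = [1, -1, -1, 1, -1, 1, 1, 1]
total = [x + y for x, y in zip(a, b)]

mean_split = mean(a) + mean(b)              # 0.5, same as mean(total)
var_split = pvariance(a) + pvariance(b)     # 1.875, cross term missing
var_full = pvariance(total)                 # 2.75
cross_term = 2 * covariance(a, b)           # 0.875 = var_full - var_split
```

Evaluating the batches separately therefore reproduces the mean but underestimates or overestimates the variance whenever the terms are correlated.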

I will see how to do the parallelization using the Aer simulator directly.

yaelbh · 2020-01-28T07:49:02Z

Aer simulators parallelize by circuits and shots. However there is no parallelization over operator components in the expectation value calculation. @chriseclectic do you think it is worth pursuing, in the statevector, stabilizer, and MPS simulators? VQE instances have many thousands of components.

AzizNgoueya · 2020-01-28T09:31:50Z

I received a request from a client on this issue: https://github.ibm.com/IBM-Q-General/client-support/issues/123. Do you have any idea what I can tell them for now to improve their execution time (reducing the number of shots, for example), and whether an improvement in the calculation of the expectation value can be expected soon from the operator redesign?

yaelbh · 2020-01-28T09:41:34Z

How much improvement do you see when you set the number of shots to 1? I believe it would lead to significant improvement, and this is what I'd suggest to the clients.

AzizNgoueya · 2020-01-28T10:20:35Z

With shots=1 each step takes approximately 45 seconds (circuit creation + transpilation + measurement + energy evaluation), which is much faster than with 1024 shots. But I'm wondering whether the minimum energy is reached in a reasonable number of steps.

yaelbh · 2020-01-28T10:54:56Z

What do you mean by "step"? In VQE, circuit creation and transpilation are done only once. Then there are many (300 in the client's snippet) iterations of measurement and energy evaluation.

AzizNgoueya · 2020-01-28T13:54:42Z

Sorry, I meant the time between two energy evaluations. I use logging and I notice that for one step as you described above there are several energy evaluations. Is that right? And before each energy evaluation the log shows a circuit transpilation process.

yaelbh · 2020-01-28T14:14:39Z

I guess that the transpilations that you see in the log are just the binding of parameters, which sets real values into the parameterized circuit, thus transforming it into a real circuit that can be executed. But it doesn't perform a transpilation in the sense of optimizing the circuit.
I'll try to see if run time can be reduced.

woodsp-ibm · 2020-01-28T14:28:14Z

@AzizNgoueya just checking: are you using the newest official release of Aqua, i.e. 0.6.2, which was released in the middle of last December? (You gave no indication of version when creating this issue, and the client support issue linked above shows 0.6.0.) Only the newest version has the parameterized circuit support in Aqua, which speeds up the process by only having to transpile once.
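The benefit of parameterized circuits can be pictured with a toy model: the expensive step happens once, and each optimizer iteration only binds numeric values. The class and function below are hypothetical stand-ins, not Qiskit objects, and the energy function is arbitrary.

```python
import math


class ParameterizedTemplate:
    """Hypothetical stand-in for a parameterized circuit."""

    compile_calls = 0  # counts how often the expensive step runs

    def __init__(self, num_params):
        # Pretend this is circuit construction plus transpilation.
        ParameterizedTemplate.compile_calls += 1
        self.num_params = num_params

    def bind(self, values):
        """Cheap step: substitute numeric values for the parameters."""
        if len(values) != self.num_params:
            raise ValueError("wrong number of parameter values")
        return tuple(values)


def toy_energy(bound):
    """Arbitrary function standing in for the energy evaluation."""
    return sum(math.cos(theta) for theta in bound)


template = ParameterizedTemplate(num_params=2)   # done once
energies = [
    toy_energy(template.bind([0.1 * i, 0.2 * i]))  # done every iteration
    for i in range(300)
]
```

Over 300 iterations the expensive step is counted only once, which is the pattern the log entries about parameter binding reflect.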

AzizNgoueya · 2020-01-28T14:47:43Z

OK, I used version 0.6.1 of Aqua; I will try with 0.6.2.

woodsp-ibm · 2020-01-28T14:49:41Z

If you upgrade qiskit (pip install -U qiskit), that should get you the latest versions of everything, so you have the newest Aer simulator and Terra too.

AzizNgoueya · 2020-01-31T10:53:45Z

@woodsp-ibm the execution time is better with version 0.6.2 of Aqua when setting shots=1. @yaelbh made me realize that with this option it is Aer that computes the expectation value. Thanks.

woodsp-ibm · 2020-01-31T13:44:38Z

@AzizNgoueya Great, that's good to hear. How much did things improve for you in the end out of curiosity?

AzizNgoueya · 2020-01-31T14:35:22Z

@woodsp-ibm the energy evaluation takes approximately 100 seconds with 25 qubits (against more than 1000 seconds with 1024 shots). It's an incredible improvement.

AzizNgoueya closed this as completed Jan 31, 2020
