Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove threadprivate memaloc in omp #1392

Merged
merged 1 commit into from
Dec 1, 2021

Conversation

hhorii
Copy link
Collaborator

@hhorii hhorii commented Nov 26, 2021

Summary

#1384 is not enough to fix all the cases.
This fix will resolve issues related to allocate temporal memory in avx.

Details and comments

The fix of #1384 is not enough to run following codes.

from qiskit import transpile
from qiskit.circuit.library import QFT
from qiskit.providers.aer import AerSimulator

circ = QFT(20)
circ.measure_all()

simulator = AerSimulator()

result = simulator.run([transpile(circ, simulator, basis_gates=['u3', 'cx'])] * 4,
                       **{'_parallel_experiments': 2,
                          '_parallel_state_update': 2,
                          'shots': 10}).result()

Because of the structure of codes, temporal memory is allocated and freed in the other OMP blocks in avx codes.
There is no guarantee to use the same set of threads to run these blocks, especially when nested omp is enabled.

This PR changes memory allocation to malloc redundant spaces. Though memory is consumed slightly more, memory is allocated and freed always-correctly.

@hhorii hhorii force-pushed the avx_remove_tmp_mem_aloc branch from cae9d0a to 1363940 Compare November 26, 2021 11:08
Comment on lines 1201 to 1195
auto input_data = _load_diagonal_input(in_vec, double_tls, i, qregs, qregs_size, q0_mask);
auto input_data = _load_diagonal_input(in_vec, double_tmp, i, qregs, qregs_size, q0_mask);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the OpenMP case here, how are we ensuring that there's no thread collision in the pointer? I see that double_tmp is allocated space for 2 * omp_max_num_threads complex doubles, but _load_diagonal_input seems to access this pointer at 0 offset. Should the lambda get its thread number outside of the loop, and access double_tmp at an offset of 2 * omp_get_thread_num() (or 0 if not OpenMP)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks... This is my mistake. The reason why memory is allocated with thradprivate was to avoid conflict. Allocating two double or four float are necessary for each thread. double_tmp and float_tmp must be accessed via indexers.

@hhorii hhorii force-pushed the avx_remove_tmp_mem_aloc branch 2 times, most recently from aa10c9c to cddd49c Compare November 29, 2021 06:40
@hhorii hhorii force-pushed the avx_remove_tmp_mem_aloc branch from cddd49c to b11cf37 Compare November 29, 2021 08:12
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. I had wondered if it were possible to allocate double_tmp on the stack instead of on the heap to avoid the unpleasant call to malloc, but I'm guessing the issue is that omp_get_num_threads() is only known at run-time?

@hhorii
Copy link
Collaborator Author

hhorii commented Nov 29, 2021

@jakelishman Thank you so much. Results of omp_get_max_threads() and omp_get_thread_num() depend on OMP settings and thread-scheduling. I think alloca() is not necessary because double_tmp will be always on LLC.

@chriseclectic chriseclectic merged commit 8ac51d8 into Qiskit:main Dec 1, 2021
hhorii added a commit to hhorii/qiskit-aer that referenced this pull request Jan 25, 2022
hhorii added a commit that referenced this pull request Feb 9, 2022
hhorii added a commit to hhorii/qiskit-aer that referenced this pull request Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants