remove threadprivate memaloc in omp #1392
- auto input_data = _load_diagonal_input(in_vec, double_tls, i, qregs, qregs_size, q0_mask);
+ auto input_data = _load_diagonal_input(in_vec, double_tmp, i, qregs, qregs_size, q0_mask);
In the OpenMP case here, how are we ensuring that there's no thread collision in the pointer? I see that double_tmp is allocated space for 2 * omp_max_num_threads complex doubles, but _load_diagonal_input seems to access this pointer at 0 offset. Should the lambda get its thread number outside of the loop, and access double_tmp at an offset of 2 * omp_get_thread_num() (or 0 if not OpenMP)?
Thanks... This is my mistake. The memory was originally allocated with threadprivate precisely to avoid this conflict. Each thread needs space for two double or four float values, so double_tmp and float_tmp must be accessed through a per-thread index.
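The per-thread indexing discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `max_threads`, `thread_index`, and `norm_sum` are hypothetical names, and the only real OpenMP calls used are `omp_get_max_threads()` and `omp_get_thread_num()`. The buffer holds two complex doubles per thread, and each iteration only touches the slice that belongs to the thread running it.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helpers: fall back to a single "thread" without OpenMP.
inline int max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

inline int thread_index() {
#ifdef _OPENMP
  return omp_get_thread_num();
#else
  return 0;
#endif
}

// Sums |z|^2 over the input, staging each value in per-thread scratch space.
double norm_sum(const std::vector<std::complex<double>>& in) {
  const std::size_t slots = 2;  // two complex doubles per thread
  std::vector<std::complex<double>> double_tmp(slots * max_threads());
  const long n = static_cast<long>(in.size());
  double total = 0.0;

#pragma omp parallel for reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    // Offset by the thread number so no two threads share a slice.
    std::complex<double>* slot = double_tmp.data() + slots * thread_index();
    slot[0] = in[i];
    slot[1] = std::conj(in[i]);
    total += (slot[0] * slot[1]).real();
  }
  return total;
}
```

Because the buffer is local to the call and sized from `omp_get_max_threads()`, every thread in the loop's team has its own slice. If the offset were dropped, all threads would write to slice 0, which is the collision raised in the review.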
This looks correct to me. I had wondered if it were possible to allocate double_tmp on the stack instead of on the heap to avoid the unpleasant call to malloc, but I'm guessing the issue is that omp_get_num_threads() is only known at run-time?
@jakelishman Thank you so much. Results of |
Summary
#1384 is not enough to fix all the cases.
This fix resolves the remaining issues with allocating temporary memory in the AVX code.
Details and comments
The fix in #1384 is not enough to run the following code.
Because of how the AVX code is structured, temporary memory is allocated in one OpenMP block and freed in another.
There is no guarantee that the same set of threads runs both blocks, especially when nested OpenMP is enabled.
This PR changes the allocation to a single malloc with redundant space, enough for every thread. Although this uses slightly more memory, the memory is always allocated and freed correctly.
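As a rough sketch of the allocation pattern described above (an illustration under assumptions, not the PR's actual code): the calling thread mallocs one buffer with a slice for every possible thread, two separate parallel regions reuse it, and the same calling thread frees it afterwards. The names `max_threads`, `thread_index`, and `double_then_sum` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helpers: fall back to a single "thread" without OpenMP.
inline int max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

inline int thread_index() {
#ifdef _OPENMP
  return omp_get_thread_num();
#else
  return 0;
#endif
}

// Doubles every element in one parallel region, then sums in a second one.
double double_then_sum(std::vector<double>& values) {
  const int slots = 2;  // two doubles of scratch per thread
  // Allocated once, outside any parallel region, for all possible threads.
  double* scratch = static_cast<double*>(
      std::malloc(sizeof(double) * slots * max_threads()));
  if (scratch == nullptr) throw std::bad_alloc();
  const long n = static_cast<long>(values.size());

#pragma omp parallel for
  for (long i = 0; i < n; ++i) {
    double* slot = scratch + slots * thread_index();
    slot[0] = values[i];
    slot[1] = 2.0;
    values[i] = slot[0] * slot[1];
  }

  double total = 0.0;
  // A different set of threads may run this region; the buffer is still valid.
#pragma omp parallel for reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    double* slot = scratch + slots * thread_index();
    slot[0] = values[i];
    total += slot[0];
  }

  std::free(scratch);  // freed once, by the thread that allocated it
  return total;
}
```

The design choice here is that ownership of the buffer never depends on which worker threads happen to execute a given region, which is what goes wrong when threadprivate storage is allocated in one block and freed in another.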