OpenMP-parallel MPI send #310
Merged: danieljprice merged 102 commits into danieljprice:master from ADACS-Australia:parallel-send on Feb 17, 2023
Conversation
Force-pushed from d7b5944 to 5978bce:
- with this change only, omp threads will still send sequentially, but using separate buffers
- amend previous commit: typo

Commit messages:
- error occurs on github runners, may be caused by mcmodel=medium
- this causes a section type conflict with threadprivate saved variables, and has been unnecessary since dynamic memory allocation was implemented
- to avoid conflict between SAVE attribute and automatic arrays
- to allow the critical omp section to be applied to the recv part only
- counters must be passed through as arguments to be threadsafe
- and remove unused arguments from check_send_finished_force
- amend previous commit
- to avoid future namespace clashes when new variables are introduced
- otherwise ifort throws a runtime error during dens and force when using debug flags

Force-pushed from f389c5a to afe4e47:
- they are unnecessary because the routines are not omp parallel
- this allows cells to be received even when the working cell is purely local, unblocking other MPI threads
- but still requires critical sections because the stack is shared
- because data host is gone
s-neilson pushed a commit to s-neilson/phantom that referenced this pull request on Mar 18, 2023: OpenMP-parallel MPI send
Type of PR:
modification to existing code
Description:
Profiling of MPI runs has shown that the OMP critical sections used for sending cells over MPI are a bottleneck: threads wait a significant amount of time for other threads to finish sending.
This PR implements an individual send/receive buffer for each OMP thread in the form of a threadprivate variable, allowing each thread to call MPI send/receive independently. The receive stacks are still shared, but OMP atomic operations are used to write to them instead of critical sections, which significantly improves performance.
A consequence is that multiple cells may be waiting to be received from a given MPI task, so the receive method is modified to loop over all waiting receives rather than receiving just one cell. The receive loop is contained in an OMP single section to prevent threads from attempting to receive back-to-back, since the first thread will have processed all of the pending receives.
The mcmodel=medium flag needs to be removed because it causes a section type conflict with the GCC compiler. This flag has not been necessary since dynamic memory allocation was implemented.

Testing:
Test suite.
Testing on more than one node is difficult to do automatically because the CI can't run multi-node MPI jobs. This was tested by hand on OzSTAR using 2 nodes, 8 MPI tasks per node, and 16 threads per task, aligned with the 8 NUMA nodes per compute node. In principle, if it works on one node, multiple nodes should be fine because the only difference is the transport layer, which should be independent of the code.
Did you run the bots? yes/no