OpenMP-parallel MPI send #310

Merged
102 commits merged into danieljprice:master on Feb 17, 2023

Conversation

@conradtchan (Collaborator) commented on Aug 1, 2022

Type of PR:
modification to existing code

Description:

Profiling of MPI runs has shown that the OMP critical sections for sending cells over MPI are a bottleneck: threads spend a significant amount of time waiting for other threads to finish sending.

This PR implements an individual send/receive buffer for each OMP thread in the form of a threadprivate variable, allowing each thread to call the MPI send/receive independently. The receive stacks are still shared, but OMP atomic operations are used to write to them instead of critical sections, which significantly improves performance.
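A minimal sketch of the pattern is below, assuming MPI has been initialised with MPI_THREAD_MULTIPLE so that each thread may call MPI independently. The module, buffer names (`xsendbuf`, `recvstack`, `nrecv`), sizes, tag and the `push_received_cell` helper are illustrative, not Phantom's actual code:

```fortran
module mpi_thread_buffers
 use mpi
 implicit none

 integer, parameter :: maxcellsize = 1024

 ! one private copy of the send buffer per OpenMP thread, so each thread
 ! can post its own MPI send without an omp critical section
 real :: xsendbuf(maxcellsize)
!$omp threadprivate(xsendbuf)

 ! shared receive stack and its counter
 integer :: nrecv = 0
 real    :: recvstack(maxcellsize,1000)

contains

 subroutine send_cell(dest,ncols,request,ierr)
  integer, intent(in)  :: dest,ncols
  integer, intent(out) :: request,ierr

  ! non-blocking send from this thread's private buffer; the caller is
  ! responsible for waiting on the returned request before reusing it
  call MPI_Isend(xsendbuf,ncols,MPI_REAL,dest,0,MPI_COMM_WORLD,request,ierr)
 end subroutine send_cell

 subroutine push_received_cell(cell,ncols)
  integer, intent(in) :: ncols
  real,    intent(in) :: cell(ncols)
  integer :: islot

  ! reserve a slot on the shared stack with an atomic capture rather than
  ! serialising all threads through an omp critical section
  !$omp atomic capture
  nrecv = nrecv + 1
  islot = nrecv
  !$omp end atomic

  recvstack(1:ncols,islot) = cell(1:ncols)
 end subroutine push_received_cell

end module mpi_thread_buffers
```

The atomic capture hands each thread a unique slot index, so the subsequent copy into the shared stack needs no locking.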

A consequence is that multiple cells may be waiting to be received from a given MPI task, so the receive method is modified to loop over all waiting receives rather than receiving just one cell. The receive method is contained in an OMP single section to prevent threads from attempting back-to-back receives, since the first thread will already have processed all of the pending receives.
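A hedged sketch of that drain loop, reusing the illustrative module above: MPI_Iprobe/MPI_Get_count are one way to express "loop over all waiting receives" and may differ from the actual implementation.

```fortran
subroutine recv_all_pending(ierr)
 use mpi
 use mpi_thread_buffers, only:maxcellsize,push_received_cell
 implicit none
 integer, intent(out) :: ierr
 integer :: status(MPI_STATUS_SIZE)
 integer :: ncols
 logical :: flag
 real    :: cellbuf(maxcellsize)

 ! only one thread drains the queue; by the time it is done, all pending
 ! receives have been processed, so the other threads need not probe again
!$omp single
 do
    ! check whether any cell is waiting from any MPI task
    call MPI_Iprobe(MPI_ANY_SOURCE,0,MPI_COMM_WORLD,flag,status,ierr)
    if (.not.flag) exit

    ! receive it and push it onto the shared receive stack
    call MPI_Get_count(status,MPI_REAL,ncols,ierr)
    call MPI_Recv(cellbuf,ncols,MPI_REAL,status(MPI_SOURCE),0, &
                  MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
    call push_received_cell(cellbuf,ncols)
 enddo
!$omp end single
end subroutine recv_all_pending
```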

The mcmodel=medium flag needs to be removed because it causes a section type conflict with the GCC compiler. This flag has not been necessary since dynamic memory allocation was implemented.

Testing:
Test suite.

Testing on more than one node is difficult to do automatically because the CI can't run multi-node MPI jobs. This was tested by hand on OzSTAR using 2 nodes, 8 MPI tasks per node, and 16 threads per task, aligned with the 8 NUMA domains per compute node. In principle, if it works on one node, multiple nodes should be fine because the only difference is the transport layer, which should be independent of the code.

Did you run the bots? yes/no

with this change only, omp threads will still send sequentially, but using separate buffers
amend previous commit: typo
error occurs on github runners, may be caused by mcmodel=medium
this causes a section type conflict with threadprivate saved variables, and has been unnecessary since dynamic memory allocation was implemented
to avoid conflict between SAVE attribute and automatic arrays
to allow critical omp section to be applied on the recv part only
counters must be passed through as arguments to be threadsafe (see the sketch after this list)
and remove unused arguments from check_send_finished_force
to avoid future namespace clashes when new variables are introduced
otherwise ifort throws a runtime error during dens and force when using debug flags
they are unnecessary because the routines are not omp parallel
this allows cells to be received even when the working cell is purely local, unblocking other MPI threads
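As referenced above, here is a minimal illustration (hypothetical routine names, not Phantom's) of why counters are passed through as arguments rather than kept as SAVE'd locals:

```fortran
! a local variable with the SAVE attribute has a single copy shared by all
! OpenMP threads, so incrementing it from a parallel region is a data race
subroutine count_sends_racy(nsend)
 integer, intent(out) :: nsend
 integer, save :: ncount = 0   ! one shared copy: not threadsafe

 ncount = ncount + 1
 nsend  = ncount
end subroutine count_sends_racy

! passing the counter through as a dummy argument lets each thread own its
! own copy in the caller, so no locking or atomics are needed
subroutine count_sends_safe(ncount,nsend)
 integer, intent(inout) :: ncount   ! caller supplies a thread-private counter
 integer, intent(out)   :: nsend

 ncount = ncount + 1
 nsend  = ncount
end subroutine count_sends_safe
```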
@danieljprice danieljprice marked this pull request as ready for review January 31, 2023 22:55
@conradtchan conradtchan reopened this Feb 17, 2023
@danieljprice danieljprice merged commit 036a67d into danieljprice:master Feb 17, 2023
s-neilson pushed a commit to s-neilson/phantom that referenced this pull request Mar 18, 2023