OpenMP-parallel MPI send #310

Merged
102 commits merged into danieljprice:master on Feb 17, 2023

Conversation

@conradtchan (Collaborator) commented on Aug 1, 2022

Type of PR:
modification to existing code

Description:

Profiling of MPI runs has shown that the OMP critical sections for sending cells over MPI are a bottleneck: threads spend a significant amount of time waiting for other threads to finish sending.

This PR implements an individual send/receive buffer for each OMP thread in the form of a threadprivate variable, allowing each thread to call the MPI send/receive independently. The receive stacks are still shared, but OMP atomic operations are used to write to them instead of critical sections, which significantly improves performance.
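A minimal sketch of the pattern is below, assuming MPI has been initialised with MPI_THREAD_MULTIPLE so that each thread may call MPI independently. The module, buffer names (`xsendbuf`, `recvstack`, `nrecv`), sizes, tag and the `push_received_cell` helper are illustrative, not Phantom's actual code:

```fortran
module mpi_thread_buffers
 use mpi
 implicit none

 integer, parameter :: maxcellsize = 1024

 ! one private copy of the send buffer per OpenMP thread, so each thread
 ! can post its own MPI send without an omp critical section
 real :: xsendbuf(maxcellsize)
!$omp threadprivate(xsendbuf)

 ! shared receive stack and its counter
 integer :: nrecv = 0
 real    :: recvstack(maxcellsize,1000)

contains

 subroutine send_cell(dest,ncols,request,ierr)
  integer, intent(in)  :: dest,ncols
  integer, intent(out) :: request,ierr

  ! non-blocking send from this thread's private buffer; the caller is
  ! responsible for waiting on the returned request before reusing it
  call MPI_Isend(xsendbuf,ncols,MPI_REAL,dest,0,MPI_COMM_WORLD,request,ierr)
 end subroutine send_cell

 subroutine push_received_cell(cell,ncols)
  integer, intent(in) :: ncols
  real,    intent(in) :: cell(ncols)
  integer :: islot

  ! reserve a slot on the shared stack with an atomic capture rather than
  ! serialising all threads through an omp critical section
  !$omp atomic capture
  nrecv = nrecv + 1
  islot = nrecv
  !$omp end atomic

  recvstack(1:ncols,islot) = cell(1:ncols)
 end subroutine push_received_cell

end module mpi_thread_buffers
```

The atomic capture hands each thread a unique slot index, so the subsequent copy into the shared stack needs no locking.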

A consequence is that multiple cells may be waiting to be received from a given MPI task, so the receive method is modified to loop over all waiting receives rather than receiving just one cell. The receive method is contained in an OMP single section to prevent threads from attempting back-to-back receives, since the first thread will already have processed all of the pending receives.
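A hedged sketch of that drain loop, reusing the illustrative module above: MPI_Iprobe/MPI_Get_count are one way to express "loop over all waiting receives" and may differ from the actual implementation.

```fortran
subroutine recv_all_pending(ierr)
 use mpi
 use mpi_thread_buffers, only:maxcellsize,push_received_cell
 implicit none
 integer, intent(out) :: ierr
 integer :: status(MPI_STATUS_SIZE)
 integer :: ncols
 logical :: flag
 real    :: cellbuf(maxcellsize)

 ! only one thread drains the queue; by the time it is done, all pending
 ! receives have been processed, so the other threads need not probe again
!$omp single
 do
    ! check whether any cell is waiting from any MPI task
    call MPI_Iprobe(MPI_ANY_SOURCE,0,MPI_COMM_WORLD,flag,status,ierr)
    if (.not.flag) exit

    ! receive it and push it onto the shared receive stack
    call MPI_Get_count(status,MPI_REAL,ncols,ierr)
    call MPI_Recv(cellbuf,ncols,MPI_REAL,status(MPI_SOURCE),0, &
                  MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
    call push_received_cell(cellbuf,ncols)
 enddo
!$omp end single
end subroutine recv_all_pending
```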

The mcmodel=medium flag needs to be removed because it causes a section type conflict with the GCC compiler. This flag has not been necessary since dynamic memory allocation was implemented.

Testing:
Test suite.

Testing on more than one node is difficult to do automatically because the CI can't run multi-node MPI jobs. This was tested by hand on OzSTAR using 2 nodes, 8 MPI tasks per node, and 16 threads per task, aligned with the 8 NUMA domains per compute node. In principle, if it works on one node, multiple nodes should be fine because the only difference is the transport layer, which should be independent of the code.

Did you run the bots? yes/no

with this change only, omp threads will still send sequentially, but using separate buffers
amend previous commit: typo
error occurs on github runners, may be caused by mcmodel=medium
this causes a section type conflict with threadprivate saved variables, and has been unnecessary since dynamic memory allocation was implemented
to avoid conflict between SAVE attribute and automatic arrays
to allow critical omp section to be applied on the recv part only
counters must be passed through as arguments to be threadsafe (see the sketch after this list)
and remove unused arguments from check_send_finished_force
to avoid future namespace clashes when new variables are introduced
otherwise ifort throws a runtime error during dens and force when using debug flags
they are unnecessary because the routines are not omp parallel
this allows cells to be received even when the working cell is purely local, unblocking other MPI threads
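As referenced above, here is a minimal illustration (hypothetical routine names, not Phantom's) of why counters are passed through as arguments rather than kept as SAVE'd locals:

```fortran
! a local variable with the SAVE attribute has a single copy shared by all
! OpenMP threads, so incrementing it from a parallel region is a data race
subroutine count_sends_racy(nsend)
 integer, intent(out) :: nsend
 integer, save :: ncount = 0   ! one shared copy: not threadsafe

 ncount = ncount + 1
 nsend  = ncount
end subroutine count_sends_racy

! passing the counter through as a dummy argument lets each thread own its
! own copy in the caller, so no locking or atomics are needed
subroutine count_sends_safe(ncount,nsend)
 integer, intent(inout) :: ncount   ! caller supplies a thread-private counter
 integer, intent(out)   :: nsend

 ncount = ncount + 1
 nsend  = ncount
end subroutine count_sends_safe
```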
@danieljprice danieljprice marked this pull request as ready for review January 31, 2023 22:55
@conradtchan conradtchan reopened this Feb 17, 2023
@danieljprice danieljprice merged commit 036a67d into danieljprice:master Feb 17, 2023
s-neilson pushed a commit to s-neilson/phantom that referenced this pull request Mar 18, 2023