Type of PR:
modification to existing code
Description:
In the existing implementation of `balancedomains`, `recv_part` is called after checking whether a particle belongs to a task, to check whether a particle is waiting to be received. Checking whether a particle belongs to a task is an integer comparison, which is fast relative to performing `MPI_Test` within `recv_part`. These repeated `MPI_Test` calls present significant overhead and do not need to be made so frequently. The problem is made worse when `MPI_THREAD_MULTIPLE` is enabled (namely for upcoming optimisation #310), because MPI calls then carry an additional small overhead.

In this PR, the loop is changed to call `recv_part` only every `check_interval` iterations. `check_interval` is dynamically adjusted based on the frequency of actual receives; that frequency depends on the number of cells that have remote neighbours, which is problem dependent. `check_interval` is initialised as "check every 2 loop iterations", but is adjusted to poll at no more than 4x the frequency of actual receives. `check_interval` is a saved variable across calls of `balancedomains`
, to improve rate convergence, relying on the assumption that the particle distribution does not change drastically between calls.

Testing:
Performance benchmarks were performed for 2 problems, timing the `balancedomains` routine in seconds and as a percentage of total run time:

- Disc setup with 2M particles, 1 node, 4 MPI tasks, 4 OMP threads per task
- Common envelope setup (self-gravity) with 2M particles, 2 nodes, 8 MPI tasks per node, 4 OMP threads per task
The performance improvement for the self-gravity problem is less significant because a larger fraction of particles need to be sent, which raises the polling frequency. At the maximum polling frequency, the new implementation is equivalent to the old one.
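For illustration, the adaptive schedule described above might be sketched as follows. This is a Python toy model, not the Fortran implementation: `simulate_polling`, `ready_iters` and the exact adjustment formula are illustrative assumptions; the real logic lives in `balancedomains` and polls via `MPI_Test` inside `recv_part`.

```python
def simulate_polling(ready_iters, init_interval=2, max_ratio=4):
    """Toy model of the adaptive check_interval schedule.

    ready_iters: iterations at which a receive would complete
    (a stand-in for MPI_Test succeeding inside recv_part).
    Returns (number of polls performed, total loop iterations).
    """
    pending = sorted(ready_iters)
    check_interval = init_interval  # saved across calls in the real code
    it = polls = received = 0
    while pending:  # loop until every expected particle has arrived
        if it % check_interval == 0:  # poll only every check_interval iterations
            polls += 1
            # consume any receives that have completed by now
            while pending and pending[0] <= it:
                pending.pop(0)
                received += 1
            if received:
                # cap polling at roughly max_ratio x the observed receive
                # frequency: the mean gap between receives so far is
                # (it + 1) / received iterations
                check_interval = max(1, (it + 1) // (received * max_ratio))
        it += 1
    return polls, it

# With receives arriving every 50 iterations, far fewer polls than loop
# iterations are needed to collect every particle.
polls, iters = simulate_polling(range(0, 1000, 50))
```

The floor of `check_interval = 1` corresponds to the old behaviour of polling on every iteration, consistent with the note above that the maximum polling frequency matches the old implementation.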
Did you run the bots? yes