
Particle balance optimisation #316

Merged: 4 commits merged into danieljprice:master on Aug 18, 2022

Conversation

conradtchan
Collaborator

Type of PR:
modification to existing code

Description:
In the existing implementation of balancedomains, after checking whether a particle belongs to a task, recv_part is called to check whether a particle is waiting to be received. Checking whether a particle belongs to a task is an integer comparison, which is fast relative to the MPI_Test performed inside recv_part.

These repeated MPI_Test calls add significant overhead and do not need to be made so frequently. The problem is worse when MPI_THREAD_MULTIPLE is enabled (needed for the upcoming optimisation #310), because each MPI call then carries an additional small overhead.

In this PR, the loop is changed to call recv_part only every check_interval iterations. check_interval is dynamically adjusted based on the frequency of actual receives; that frequency depends on the number of cells with remote neighbours, which is problem dependent. check_interval is initialised to "check every 2 loop iterations", but is then adjusted so that polling occurs at no more than 4x the frequency of actual receives. check_interval is a saved variable across calls to balancedomains, to speed up convergence of the polling rate, relying on the assumption that the particle distribution does not change drastically between calls.
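The adaptive polling described above can be sketched roughly as follows. This is a hypothetical illustration in Python, not the actual Phantom Fortran source: the names adjust_interval, polls, and receives are assumptions made for the example, and the doubling/halving adjustment rule is one plausible way to keep polling at no more than 4x the receive frequency while never dropping the interval below 1 (which corresponds to the old poll-every-iteration behaviour).

```python
def adjust_interval(check_interval, polls, receives, max_ratio=4):
    """Return an updated polling interval.

    polls    -- how many times recv_part was called this pass
    receives -- how many of those polls actually received a particle

    If we polled far more often than we received (more than max_ratio
    polls per receive), back off by doubling the interval so the
    expensive MPI_Test is called less frequently. Otherwise, poll more
    often by halving the interval, but never below 1: an interval of 1
    polls on every iteration, equivalent to the old implementation.
    """
    if receives == 0 or polls > max_ratio * receives:
        return check_interval * 2
    return max(1, check_interval // 2)


# Example: 100 polls but only 1 receive -> polling is wasteful, back off.
print(adjust_interval(2, polls=100, receives=1))   # interval grows to 4

# Example: every poll received a particle -> poll more aggressively.
print(adjust_interval(8, polls=4, receives=4))     # interval shrinks to 4

# Example: interval already at 1 with frequent receives -> stays at 1,
# matching the old poll-every-iteration behaviour.
print(adjust_interval(1, polls=10, receives=10))   # stays at 1
```

In this sketch, the saved value of check_interval carries over to the next balancedomains call, so the interval converges over repeated calls rather than restarting from its initial value each time.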

Testing:
Performance benchmarks were performed for 2 problems, timing the balancedomains routine in seconds, and as a percentage of total run time.

Disc setup with 2M particles, 1 node, 4 MPI tasks, 4 OMP threads per task

old: 16.81s (18.03%)
new: 0.62s (0.79%)
~27x speedup

Common envelope setup (self-gravity) with 2M particles, 2 nodes, 8 MPI tasks per node, 4 OMP threads per task

old: 0.07min (0.90%)
new: 0.01min (0.10%)
~9x speedup

The performance improvement for the self-gravity problem is less significant because a larger fraction of particles need to be sent, which increases the polling frequency. At the maximum polling frequency, the behaviour is equivalent to the old implementation.

Did you run the bots? yes

Only call it when a particle is being sent. This prevents it from being called for every particle, which gets expensive.
This commit is unrelated to this PR; it results from running the bots, which resolved a formatting issue caused by 9fb018a because the bots were not run for that commit.
@conradtchan conradtchan merged commit 55b3096 into danieljprice:master Aug 18, 2022
@conradtchan conradtchan deleted the balance-optimisation branch August 18, 2022 23:20
s-neilson pushed a commit to s-neilson/phantom that referenced this pull request Mar 18, 2023