forked from danieljprice/phantom
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
MPI and OpenMP memory allocation optimisations (danieljprice#262)
Type of PR: modification to existing code Description: This PR combines several changes to memory allocation: - Rename `mpi_stack.F90` to `mpi_memory.F90` to reflect the fact that the structure is not strictly a memory stack - Move mpi memory allocation from initial into the main memory allocation subroutine - Set the MPI "stacksize" dynamically, based on `npart` and `nprocs` - Remove `remote_export` from the cell derived types, and adjust send/recv subroutines to match - this allows the `maxprocs` parameter to be removed - Remove `maxprocs` parameter and allocate arrays dynamically using `nprocs` - Automatically expand `stacksize` as required - Add check for if global node refinment exceeds ncellsmax (which is an issue for large numbers of MPI tasks) - Allocate a larger array for the global tree (using `ncellsmaxglobal` instead of `ncellsmax`) - Move dynamic memory size calculation inside the `allocate_memory` subroutine - Tidy up some unused imports - Fix setdisc test failure due to uninitialised (but unused) variables Resolve danieljprice#260 The crash is caused by large `threadprivate` arrays being allocated for certain numbers of MPI tasks and OpenMP threads. The root cause has not been identified, and is potentially a compiler bug. Two possible solutions: 1. Dynamically allocate a separate `xyzcache` array for each thread (or one large array that is split between tasks). This solution essentially makes the memory allocation a manual/explicit process, since `threadprivate` _already_ performs a "dynamic" allocation at runtime based on `OMP_NUM_THREADS`. After discussion with @danieljprice, this solution was not preferred because: - The cache size is fixed per thread (though the total memory allocation still varies with the number of threads) - Performance may be affected. However, a test of the polar benchmark showed no change in performance for 1 and 2 node hybrid jobs. Interestingly, the new version produced slightly faster run times, but not with any significance. - Additional code complexity 2. Reduce the cache size to work around the bug. Performance may not be affected, based on discussion with @danieljprice: _xyzcache does not have to be 50,000 long. Typically the number of “trial” neighbours is more like a few hundred, so something like 10,000 or even 5,000 for xyzcache length is already plenty_ Option 2 has been chosen, with cache size reduced to 10,000. This works around the issue for 2 nodes, but the problem may still exist at other task+thread counts or other systems. This can be revisited if a more robust solution is required. Testing: - Benchmark suite to assess performance impact - Test suite for correctness Did you run the bots? yes
- Loading branch information
1 parent
2f455fe
commit f42bab7
Showing
23 changed files
with
530 additions
and
470 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.