Enable oversubscribing in OpenMPI 2.X #3335
Centralize the deduction of the TEST_NP and MPIEXEC_OVERSUBSCRIBE default values in the dedicated unit_test module.
The OpenMPI -oversubscribe feature was backported to the 2.X series. Without the -oversubscribe flag, tests that run under OpenMPI 2.X on hyperthreaded CPUs (more hardware threads than physical cores) may exit early with an exit code of 0, so these tests fail silently.
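The version check described above can be sketched in Python. This is only an illustration of the logic, not the project's CMake code: the function name `oversubscribe_flag` and its arguments are hypothetical, and returning no flag for OpenMPI versions older than 2.0 or for other MPI implementations is an assumption made for this sketch.

```python
def oversubscribe_flag(mpi_vendor: str, mpi_version: str) -> str:
    """Return the mpiexec flag that allows oversubscription, or '' if none.

    Hypothetical helper: the vendor/version strings are assumed to be
    available from the build system's MPI detection.
    """
    if mpi_vendor.lower() != "openmpi":
        # Assumption: other MPI implementations are left untouched.
        return ""
    major = int(mpi_version.split(".")[0])
    # The flag is available from the 2.X series onwards.
    return "-oversubscribe" if major >= 2 else ""


if __name__ == "__main__":
    for version in ("1.10.7", "2.0.0", "3.1.4"):
        print(version, repr(oversubscribe_flag("OpenMPI", version)))
```

Parsing only the major version keeps the check robust against patch-level suffixes in the version string.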
When no default value is provided for TEST_NP, the value `nproc / 2 + 1` is used instead, which can be large on hyperthreaded CPUs. This causes problems in parallel tests that don't scale well to a large number of cores (e.g. when there aren't enough particles) or to a large odd number of cores (e.g. LB or P3M).
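To make the size of that fallback concrete, here is a small arithmetic sketch. The helper name `fallback_test_np` is hypothetical, and integer division is assumed for `nproc / 2`.

```python
def fallback_test_np(nproc: int) -> int:
    """Fallback number of MPI ranks: nproc / 2 + 1 (integer division assumed)."""
    return nproc // 2 + 1


if __name__ == "__main__":
    # Logical CPU counts typical of hyperthreaded machines.
    for nproc in (4, 8, 16, 64):
        print(f"nproc={nproc:3d} -> TEST_NP={fallback_test_np(nproc)}")
```

For these power-of-two thread counts the fallback is always odd (3, 5, 9, 33), which is the case the description flags as problematic for LB and P3M.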
Codecov Report
@@          Coverage Diff          @@
##          python   #3335   +/-   ##
=====================================
- Coverage     86%     86%    -1%
=====================================
  Files        538     538
  Lines      25350   25350
=====================================
- Hits       21834   21833     -1
- Misses      3516    3517     +1
If you've checked that OpenMPI 2.0.0 already had this option, then this is fine.
Yes, I checked the source code. I've also now compiled 2.0.0 on Fedora 31 and the |
bors r=fweik
3334: Update io.rst r=jngrad a=jonaslandsgesell
link to h5md plugin for VMD

3335: Enable oversubscribing in OpenMPI 2.X r=fweik a=jngrad
Fixes #3333

3336: Fallthrough and remove duplicate line r=jngrad a=hirschsn
Description of changes:
- Remove a duplicated line in `topology_check_resort` by falling through as done for other cell systems in this function.

PR Checklist
------------
- [ ] Tests?
- [ ] Interface
- [ ] Core
- [ ] Docs?

Co-authored-by: Jonas Landsgesell <jonaslandsgesell@users.noreply.github.com>
Co-authored-by: Jean-Noël Grad <jgrad@icp.uni-stuttgart.de>
Co-authored-by: Steffen Hirschmann <steffen.hirschmann@ipvs.uni-stuttgart.de>
Build succeeded
Fixes #3333