Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nexus: Set workstation thread binding to none #5240

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

prckent
Copy link
Contributor

@prckent prckent commented Nov 22, 2024

Proposed changes

The Nexus workstation configuration, e.g. ws16, implicitly assumes codes are built with OpenMPI for MPI runs. When also used with threading, the threads will be wrongly bound for our uses, giving poor performance. The binding is the default behavior of all recent OpenMPI releases and is easily verified in a system with hyperthreading enabled where the per process CPU% will show as 50% instead of the expected 100%, or in a 16 "core" run, 16 cores will not be used. Performance is consequently halved. Added --bind-to none, similar to what has already been done in some of the Supercomputer job definitions.

Open to changing to more optimal settings or moving the location of the setting, but for everyday workstation runs I think this is close enough and simple.

Noticed while testing #5214.

What type(s) of changes does this code introduce?

  • Bugfix

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

nitrogen2, GCC14 + OpenMPI, standard nightly configuration.

Checklist

  • Yes. This PR is up to date with current the current state of 'develop'
  • NA. Code added or changed in the PR has been clang-formatted
  • NA. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • No. Documentation has been added (if appropriate)

@ye-luo
Copy link
Contributor

ye-luo commented Nov 22, 2024

Is it possible to test if it is OpenMPI in use and add the binding option if the answer is yes?

@prckent
Copy link
Contributor Author

prckent commented Nov 22, 2024

A fair question for @jtkrogel

My take is that since the current workstation class assumes you are using some flavor of MPI, we could have different variants for different MPI implementations (OpenMPI, MPICH, even no MPI [but set threads properly]).

Hopefully this current change can be merged quickly if there is not a near ready-to-go better route. I'll note though that improvements to the Workstation class are not academic -- if we are to run the Nexus examples in the nightlies or even CI, efficiency will be key.

Copy link
Contributor

@ye-luo ye-luo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that both OpenMPI and MPICH accept --bind-to none. Unclear about MVAPICH.

@ye-luo
Copy link
Contributor

ye-luo commented Nov 27, 2024

Test this please

@ye-luo
Copy link
Contributor

ye-luo commented Nov 27, 2024

@prckent nexus tests failing on sulfur

@prckent
Copy link
Contributor Author

prckent commented Nov 27, 2024

But why those sulfur configurations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants