-
Notifications
You must be signed in to change notification settings - Fork 878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v2.x: Fix MPI_FINALIZED_HANG #5217
Conversation
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be invoked during an attribute destruction callback (e.g., during the destruction of keyvals on MPI_COMM_SELF during the very beginning of MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false". Prior to this commit, we hung in FINALIZED if it were invoked during a COMM_SELF attribute destruction callback in FINALIZE. See open-mpi#5084. This commit converts the MPI_INITIALIZED / MPI_FINALIZED infrastructure to use a single enum (ompi_mpi_state, set atomically) to represent the state of MPI: - not initialized - init started - init completed - finalize started - finalize past COMM_SELF destruction - finalize completed The "finalize past COMM_SELF destruction" state is what allows us to return "false" from MPI_FINALIZED before COMM_SELF has been fully destroyed / all attribute callbacks have been invoked. Since this state is checked at nearly every MPI API call (to see if we're outside of the INIT/FINALIZE epoch), care was taken to use atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and ompi_mpi_finalize(), but performance-critical code paths can simply read the variable without needing to use a slow call to an opal_atomic_*() function. Thanks to @AndrewGaspar for reporting the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 35438ae)
@artpol84 @jladd-mlnx Can Mellanox please review at least the OSHMEM parts of this PR? I had to port this from master, and it's a little different. I'd appreciate if someone could actually test that I didn't break OSHMEM. Thanks! |
@@ -70,8 +71,10 @@ int oshmem_shmem_finalize(void) | |||
/* Should be called first because ompi_mpi_finalize makes orte and opal finalization */ | |||
ret = _shmem_finalize(); | |||
|
|||
if ((OSHMEM_SUCCESS == ret) && ompi_mpi_initialized | |||
&& !ompi_mpi_finalized) { | |||
int32_t state = ompi_mpi_state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsquyres I see that atomics are used to change ompi_mpi_state, don't we need to use them when accessing it as well?
Or consideration is made that this will always be the same thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment starting at https://github.com/open-mpi/ompi/pull/5217/files#diff-b109613962ca7154bf7d060024f15fe2R196.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
oshmem/runtime/oshmem_shmem_init.c
Outdated
@@ -146,7 +146,7 @@ int oshmem_shmem_init(int argc, char **argv, int requested, int *provided) | |||
int ret = OSHMEM_SUCCESS; | |||
|
|||
if (!oshmem_shmem_initialized) { | |||
if (!ompi_mpi_initialized && !ompi_mpi_finalized) { | |||
if (ompi_mpi_state < OMPI_MPI_STATE_INIT_COMPLETED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not comparing to a not-initialized state here?
If initialization has started already it supposed to be started in some other thread. And doing ompi_init here will launch a parallel init which is not desired.
I'd expect to have a while look polling on a state (with proper memory barriers / atomics) to wait for the started init to complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new test is simply doing what the old test did (because ompi_mpi_initialized
was not set until the end of ompi_mpi_init()
). I can change it to be == OMPI_MPI_STATE_INIT_NOT_INITIALIZED
, since the new variable offers finer-grained control.
If you want it to check for _STATE_INIT_STARTED
and have it spin for a while, that should probably be a separate PR.
@jsquyres so this PR will be discarded? |
No, the commits from #5234 will be added to this one (i.e., the commits on this PR are already on master). I just have not done it yet. |
There was a race condition in 35438ae: if multiple threads invoked ompi_mpi_init() simultaneously (which could happen from both MPI and OSHMEM), the code did not catch this condition -- Bad Things would happen. Now use an atomic cmp/set to ensure that only one thread is able to advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED. Additionally, change the prototype of ompi_mpi_init() so that oshmem_init() can safely invoke ompi_mpi_init() multiple times (as long as MPI_FINALIZE has not started) without displaying an error. If multiple threads invoke oshmem_init() simultaneously, one of them will actually do the initialization, and the rest will loop waiting for it to complete. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 67ba8da)
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
bot:ibm:retest |
@hppritcha Good to go. |
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false".
Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE. See
#5084.
This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:
The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.
Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to set the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.
Thanks to @AndrewGaspar for reporting the issue.
Signed-off-by: Jeff Squyres jsquyres@cisco.com
(cherry picked from commit 35438ae)
Refs #5084