v2.x: Fix MPI_FINALIZED_HANG #5217

jsquyres · 2018-06-03T18:17:31Z

Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE. See
#5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

not initialized
init started
init completed
finalize started
finalize past COMM_SELF destruction
finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to set the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres jsquyres@cisco.com
(cherry picked from commit 35438ae)

Refs #5084

@AndrewGaspar

Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be invoked during an attribute destruction callback (e.g., during the destruction of keyvals on MPI_COMM_SELF during the very beginning of MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false". Prior to this commit, we hung in FINALIZED if it were invoked during a COMM_SELF attribute destruction callback in FINALIZE. See open-mpi#5084. This commit converts the MPI_INITIALIZED / MPI_FINALIZED infrastructure to use a single enum (ompi_mpi_state, set atomically) to represent the state of MPI: - not initialized - init started - init completed - finalize started - finalize past COMM_SELF destruction - finalize completed The "finalize past COMM_SELF destruction" state is what allows us to return "false" from MPI_FINALIZED before COMM_SELF has been fully destroyed / all attribute callbacks have been invoked. Since this state is checked at nearly every MPI API call (to see if we're outside of the INIT/FINALIZE epoch), care was taken to use atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and ompi_mpi_finalize(), but performance-critical code paths can simply read the variable without needing to use a slow call to an opal_atomic_*() function. Thanks to @AndrewGaspar for reporting the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 35438ae)

jsquyres · 2018-06-04T18:14:18Z

@artpol84 @jladd-mlnx Can Mellanox please review at least the OSHMEM parts of this PR? I had to port this from master, and it's a little different. I'd appreciate if someone could actually test that I didn't break OSHMEM. Thanks!

artpol84 · 2018-06-05T22:21:39Z

oshmem/runtime/oshmem_shmem_finalize.c

@@ -70,8 +71,10 @@ int oshmem_shmem_finalize(void)
        /* Should be called first because ompi_mpi_finalize makes orte and opal finalization */
        ret = _shmem_finalize();

-        if ((OSHMEM_SUCCESS == ret) && ompi_mpi_initialized
-                && !ompi_mpi_finalized) {
+        int32_t state = ompi_mpi_state;


@jsquyres I see that atomics are used to change ompi_mpi_state, don't we need to use them when accessing it as well?
Or consideration is made that this will always be the same thread?

See the comment starting at https://github.com/open-mpi/ompi/pull/5217/files#diff-b109613962ca7154bf7d060024f15fe2R196.

artpol84 · 2018-06-05T22:30:21Z

oshmem/runtime/oshmem_shmem_init.c

@@ -146,7 +146,7 @@ int oshmem_shmem_init(int argc, char **argv, int requested, int *provided)
    int ret = OSHMEM_SUCCESS;

    if (!oshmem_shmem_initialized) {
-        if (!ompi_mpi_initialized && !ompi_mpi_finalized) {
+        if (ompi_mpi_state < OMPI_MPI_STATE_INIT_COMPLETED) {


Why not comparing to a not-initialized state here?
If initialization has started already it supposed to be started in some other thread. And doing ompi_init here will launch a parallel init which is not desired.
I'd expect to have a while look polling on a state (with proper memory barriers / atomics) to wait for the started init to complete.

The new test is simply doing what the old test did (because ompi_mpi_initialized was not set until the end of ompi_mpi_init()). I can change it to be == OMPI_MPI_STATE_INIT_NOT_INITIALIZED, since the new variable offers finer-grained control.

If you want it to check for _STATE_INIT_STARTED and have it spin for a while, that should probably be a separate PR.

jsquyres · 2018-06-06T01:18:40Z

Potentially awaiting #5234.

@artpol84 #5234 ended up fixing the oshmem_shmem_init() race condition completely -- it made me realize that there was a race when multiple threads invoked ompi_mpi_init() simultaneously.

artpol84 · 2018-06-06T21:28:11Z

@jsquyres so this PR will be discarded?

jsquyres · 2018-06-06T23:02:28Z

No, the commits from #5234 will be added to this one (i.e., the commits on this PR are already on master). I just have not done it yet.

There was a race condition in 35438ae: if multiple threads invoked ompi_mpi_init() simultaneously (which could happen from both MPI and OSHMEM), the code did not catch this condition -- Bad Things would happen. Now use an atomic cmp/set to ensure that only one thread is able to advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED. Additionally, change the prototype of ompi_mpi_init() so that oshmem_init() can safely invoke ompi_mpi_init() multiple times (as long as MPI_FINALIZE has not started) without displaying an error. If multiple threads invoke oshmem_init() simultaneously, one of them will actually do the initialization, and the rest will loop waiting for it to complete. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 67ba8da)

(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)

jsquyres · 2018-06-07T19:31:37Z

@artpol84 @bosilca Ready for review.

jsquyres · 2018-06-08T13:47:24Z

bot:ibm:retest

jsquyres · 2018-06-08T14:13:27Z

@hppritcha Good to go.

jsquyres added the bug label Jun 3, 2018

jsquyres modified the milestones: v2.0.5, v2.1.4 Jun 3, 2018

jsquyres requested a review from bosilca June 3, 2018 18:55

bosilca approved these changes Jun 3, 2018

View reviewed changes

hppritcha added the NEWS label Jun 4, 2018

hppritcha assigned artpol84 and jladd-mlnx Jun 4, 2018

artpol84 reviewed Jun 5, 2018

View reviewed changes

jsquyres added the ⚠️ WIP-DNM! label Jun 6, 2018

jsquyres added 2 commits June 7, 2018 12:29

jsquyres removed the ⚠️ WIP-DNM! label Jun 7, 2018

artpol84 approved these changes Jun 8, 2018

View reviewed changes

jsquyres added the RM approved label Jun 8, 2018

jsquyres merged commit 1a2a4dd into open-mpi:v2.x Jun 12, 2018

jsquyres deleted the pr/v2.x/fix-finalized-hang branch June 12, 2018 15:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2.x: Fix MPI_FINALIZED_HANG #5217

v2.x: Fix MPI_FINALIZED_HANG #5217

jsquyres commented Jun 3, 2018

jsquyres commented Jun 4, 2018

artpol84 Jun 5, 2018

jsquyres Jun 5, 2018

artpol84 Jun 8, 2018

artpol84 Jun 5, 2018

jsquyres Jun 5, 2018

jsquyres commented Jun 6, 2018

artpol84 commented Jun 6, 2018

jsquyres commented Jun 6, 2018 •

edited

Loading

jsquyres commented Jun 7, 2018

jsquyres commented Jun 8, 2018

jsquyres commented Jun 8, 2018

v2.x: Fix MPI_FINALIZED_HANG #5217

v2.x: Fix MPI_FINALIZED_HANG #5217

Conversation

jsquyres commented Jun 3, 2018

jsquyres commented Jun 4, 2018

artpol84 Jun 5, 2018

Choose a reason for hiding this comment

jsquyres Jun 5, 2018

Choose a reason for hiding this comment

artpol84 Jun 8, 2018

Choose a reason for hiding this comment

artpol84 Jun 5, 2018

Choose a reason for hiding this comment

jsquyres Jun 5, 2018

Choose a reason for hiding this comment

jsquyres commented Jun 6, 2018

artpol84 commented Jun 6, 2018

jsquyres commented Jun 6, 2018 • edited Loading

jsquyres commented Jun 7, 2018

jsquyres commented Jun 8, 2018

jsquyres commented Jun 8, 2018

jsquyres commented Jun 6, 2018 •

edited

Loading