
v2.x: Fix MPI_FINALIZED hang #5217

Merged
jsquyres merged 3 commits into open-mpi:v2.x from pr/v2.x/fix-finalized-hang on Jun 12, 2018

Conversation

@jsquyres (Member) commented Jun 3, 2018

Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE. See
#5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

  • not initialized
  • init started
  • init completed
  • finalize started
  • finalize past COMM_SELF destruction
  • finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to set the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 35438ae)

Refs #5084
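
A minimal C11 sketch of the state machine and read/write discipline described above (an illustration only, not the actual Open MPI code, which uses the opal_atomic_*() wrappers and the global ompi_mpi_state; the enumerator and helper names below are hypothetical):

```c
/* Minimal C11 sketch of the MPI-state tracking described above -- NOT the
 * actual Open MPI implementation.  The states follow the list in the PR
 * description; the variable and helper names are illustrative. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    MPI_STATE_NOT_INITIALIZED = 0,
    MPI_STATE_INIT_STARTED,
    MPI_STATE_INIT_COMPLETED,
    MPI_STATE_FINALIZE_STARTED,
    MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT,
    MPI_STATE_FINALIZE_COMPLETED
} mpi_state_t;

static _Atomic int32_t mpi_state = MPI_STATE_NOT_INITIALIZED;

/* Only init/finalize change the state, and they publish it with an atomic store. */
static void mpi_state_set(mpi_state_t s)
{
    atomic_store_explicit(&mpi_state, s, memory_order_release);
}

/* Hot paths (nearly every MPI API entry point) only read the value; a relaxed
 * load compiles to a plain load on common architectures, so there is no
 * per-call atomic cost. */
static bool mpi_finalized_flag(void)
{
    int32_t s = atomic_load_explicit(&mpi_state, memory_order_relaxed);
    /* Still "false" while COMM_SELF attribute callbacks run at the start of
     * MPI_FINALIZE; it becomes "true" only after COMM_SELF destruction. */
    return s >= MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

Because the states only ever advance, both MPI_FINALIZED and the "are we inside the INIT/FINALIZE epoch?" checks reduce to a single integer comparison against the plainly-read value.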

@jsquyres jsquyres added the bug label Jun 3, 2018
@jsquyres jsquyres modified the milestones: v2.0.5, v2.1.4 Jun 3, 2018
@jsquyres jsquyres requested a review from bosilca June 3, 2018 18:55
@jsquyres (Member, Author) commented Jun 4, 2018

@artpol84 @jladd-mlnx Can Mellanox please review at least the OSHMEM parts of this PR? I had to port this from master, and it's a little different. I'd appreciate if someone could actually test that I didn't break OSHMEM. Thanks!

```diff
@@ -70,8 +71,10 @@ int oshmem_shmem_finalize(void)
     /* Should be called first because ompi_mpi_finalize makes orte and opal finalization */
     ret = _shmem_finalize();

-    if ((OSHMEM_SUCCESS == ret) && ompi_mpi_initialized
-        && !ompi_mpi_finalized) {
+    int32_t state = ompi_mpi_state;
```
Contributor:
@jsquyres I see that atomics are used to change ompi_mpi_state; don't we need to use them when reading it as well? Or is the assumption that it will always be accessed from the same thread?

Contributor:
👍

```diff
@@ -146,7 +146,7 @@ int oshmem_shmem_init(int argc, char **argv, int requested, int *provided)
     int ret = OSHMEM_SUCCESS;

     if (!oshmem_shmem_initialized) {
-        if (!ompi_mpi_initialized && !ompi_mpi_finalized) {
+        if (ompi_mpi_state < OMPI_MPI_STATE_INIT_COMPLETED) {
```
Contributor:
Why not compare against the not-initialized state here? If initialization has already started, it was presumably started by some other thread, and calling ompi_mpi_init here would launch a parallel init, which is not desired. I'd expect a while loop polling on the state (with proper memory barriers / atomics) to wait for the in-progress init to complete.

Member Author:

The new test is simply doing what the old test did (because ompi_mpi_initialized was not set until the end of ompi_mpi_init()). I can change it to be == OMPI_MPI_STATE_INIT_NOT_INITIALIZED, since the new variable offers finer-grained control.

If you want it to check for _STATE_INIT_STARTED and have it spin for a while, that should probably be a separate PR.

@jsquyres (Member, Author) commented Jun 6, 2018

Potentially awaiting #5234.

@artpol84 #5234 ended up fixing the oshmem_shmem_init() race condition completely -- it made me realize that there was a race when multiple threads invoked ompi_mpi_init() simultaneously.

@artpol84 (Contributor) commented Jun 6, 2018

@jsquyres so this PR will be discarded?

@jsquyres (Member, Author) commented Jun 6, 2018

No, the commits from #5234 will be added to this one (i.e., the commits on this PR are already on master). I just have not done it yet.

jsquyres added 2 commits June 7, 2018 12:29
There was a race condition in 35438ae: if multiple threads invoked
ompi_mpi_init() simultaneously (which could happen from both MPI and
OSHMEM), the code did not catch this condition -- Bad Things would
happen.

Now use an atomic cmp/set to ensure that only one thread is able to
advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED.

Additionally, change the prototype of ompi_mpi_init() so that
oshmem_init() can safely invoke ompi_mpi_init() multiple times (as
long as MPI_FINALIZE has not started) without displaying an error.  If
multiple threads invoke oshmem_init() simultaneously, one of them will
actually do the initialization, and the rest will loop waiting for it
to complete.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 67ba8da)
(Note: this commit would have been squashed with the prior commit, but
open-mpi#5092 was accidentally merged
before the 2 commits were squashed together)

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 9b9cb5f)
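
A self-contained C11 sketch of the cmp/set guard and wait loop described in the first of the two commit messages above (illustrative only; the real change lives in ompi_mpi_init()/oshmem_init() and uses the opal atomics and OMPI_MPI_STATE_* values, while the names below are hypothetical):

```c
/* Hedged sketch of the "one thread wins the init" pattern -- not the actual
 * Open MPI code.  Simplified states and names for illustration. */
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

enum { NOT_INITIALIZED = 0, INIT_STARTED, INIT_COMPLETED };

static _Atomic int32_t init_state = NOT_INITIALIZED;

static void init_once(void)
{
    int32_t expected = NOT_INITIALIZED;

    /* Atomic cmp/set: exactly one thread advances NOT_INITIALIZED -> INIT_STARTED. */
    if (atomic_compare_exchange_strong(&init_state, &expected, INIT_STARTED)) {
        /* ... the winning thread performs the real one-time initialization ... */
        atomic_store_explicit(&init_state, INIT_COMPLETED, memory_order_release);
        return;
    }

    /* Losing threads (e.g., a concurrent oshmem_init()) do not start a
     * parallel init; they wait for the winner to publish completion. */
    while (atomic_load_explicit(&init_state, memory_order_acquire) < INIT_COMPLETED) {
        sched_yield();
    }
}
```

The key point is that losing threads never run the initialization path themselves; they only poll until the winning thread publishes the completed state.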
@jsquyres (Member, Author) commented Jun 7, 2018

@artpol84 @bosilca Ready for review.

@jsquyres (Member, Author) commented Jun 8, 2018

bot:ibm:retest

@jsquyres (Member, Author) commented Jun 8, 2018

@hppritcha Good to go.

@jsquyres jsquyres merged commit 1a2a4dd into open-mpi:v2.x Jun 12, 2018
@jsquyres jsquyres deleted the pr/v2.x/fix-finalized-hang branch June 12, 2018 15:18