Incorrect usage of non blocking MPI collectives #17

lukasm91 · 2024-02-29T10:11:44Z

We found an issue in the usage of collectives, for example:

https://github.com/ecmwf-ifs/fiat/blob/main/src/fiat/mpl/internal/mpl_alltoallv_mod.F90#L252

The variables IRECVDISPL and ISENDDISPL are local to the routine and do not outlive its return. However, they must remain valid until the collective completes; see the MPI 3.1 standard (https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node126.htm):

"Once initiated, all associated send buffers and buffers associated with input arguments (such as arrays of counts, displacements, or datatypes in the vector versions of the collectives) should not be modified, and all associated receive buffers should not be accessed, until the collective operation completes."

Other routines such as MPL_ALLGATHERV have the same issue.

With the latest HPC-X version, this causes a segfault. A quick workaround is to disable non-blocking communication for collectives. A proper fix is probably to declare these displacement arrays as POINTER rather than ALLOCATABLE, and to require the optional displacement arguments whenever non-blocking communication is used.
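
To make the lifetime problem concrete, here is a toy Python model of it. It makes no MPI calls, and every name in it (`ToyRequest`, `start_with_local_displs`, `start_with_caller_displs`) is hypothetical. It assumes the library reads the displacement array only at completion time, which the standard permits, and it simulates the release of a local array by overwriting it before the routine returns.

```python
# Toy model only: no MPI calls. All names here are hypothetical.

def exclusive_prefix_sum(counts):
    """Displacements for contiguous blocks: [0, c0, c0+c1, ...]."""
    displs, offset = [], 0
    for c in counts:
        displs.append(offset)
        offset += c
    return displs


class ToyRequest:
    """Stands in for a request handle. As an MPI library is allowed to do,
    it keeps references to its input arrays instead of copying them."""

    def __init__(self, sendbuf, counts, displs):
        self.sendbuf, self.counts, self.displs = sendbuf, counts, displs

    def wait(self):
        # The displacements are only read here, at completion time.
        return [self.sendbuf[d:d + c] for c, d in zip(self.counts, self.displs)]


def start_with_local_displs(sendbuf, counts):
    """Buggy pattern: displacements live in storage owned by this routine."""
    displs = exclusive_prefix_sum(counts)
    request = ToyRequest(sendbuf, counts, displs)
    # Simulate the local array being released and its memory reused on return.
    displs[:] = [0] * len(displs)
    return request


def start_with_caller_displs(sendbuf, counts, displs):
    """Safe pattern: the caller owns displs and keeps it intact until wait()."""
    return ToyRequest(sendbuf, counts, displs)


sendbuf = [10, 20, 21, 30, 31, 32]
counts = [1, 2, 3]

broken = start_with_local_displs(sendbuf, counts).wait()
caller_displs = exclusive_prefix_sum(counts)
correct = start_with_caller_displs(sendbuf, counts, caller_displs).wait()
```

In this model `correct` contains the three intended blocks, while `broken` slices the send buffer at the wrong offsets. With a real MPI library the outcome of the buggy pattern is undefined, which is consistent with the segfault reported above.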

wdeconinck · 2024-02-29T10:50:40Z

Thanks @lukasm91 for this report! So I understand now that in most MPI implementations MPI_IALLTOALLV internally calls the blocking version, MPI_ALLTOALLV, which is why this problem went under the radar for the past decades. It makes sense that the (optional) arguments KRECVDISPL and KSENDDISPL should stay alive until MPI_WAIT has been called.
I would therefore suggest aborting if the optional arguments are not present (a breaking change for IFS/Arpege).
This will require changes in ifs-source to make sure they are indeed always passed.

An alternative, backward-compatible fix (perhaps temporary, but these things stick) is to call the blocking MPI_ALLTOALLV explicitly when the arguments are missing. That would just hide the problem, though, and we'd all mistakenly think we're using the non-blocking version.
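
The two options can be sketched as a small decision function. This is a hypothetical Python illustration, not the fiat code: `choose_alltoallv_mode` and its `strict` flag are made-up names, and only the argument names KSENDDISPL and KRECVDISPL come from the discussion above.

```python
# Hypothetical policy sketch in Python; this is not the fiat code.

def choose_alltoallv_mode(nonblocking, displs_present, strict=True):
    """Return which path a wrapper would take.

    strict=True  -> abort when a non-blocking call lacks caller-owned
                    displacement arrays (the breaking change).
    strict=False -> silently fall back to the blocking collective
                    (the backward-compatible alternative).
    """
    if not nonblocking:
        return "blocking"
    if displs_present:
        return "nonblocking"
    if strict:
        raise ValueError(
            "non-blocking call requires KSENDDISPL and KRECVDISPL")
    return "blocking"
```

With `strict=False`, a caller that asked for a non-blocking collective gets `"blocking"` back without any error, which is exactly the hidden behaviour the second option would introduce.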

Two cents @ioanhadade @marsdeno ?

ioanhadade · 2024-02-29T11:09:41Z

Good catch. We need to work on a fix, as we have started using non-blocking collectives in some places and plan to use them more. I believe one (igather) was introduced recently in opdis.F90. In most cases we pass local arrays from the calling routine as arguments to the MPL collective, which means that if the wait for completion is not done in the same scope (e.g., the calling routine) but somewhere later, those arguments have already gone out of scope.

Thanks @lukasm91.

lukasm91 · 2024-11-01T06:50:07Z

FYI: Alexey from BSC also ran into this when running IFS. For the moment we just disable UCC (OMPI_MCA_coll_ucc_enable=0).

a-v-medvedev · 2024-11-04T10:14:30Z

Thanks @lukasm91 for highlighting this. I ran into this issue while working on the DestinE/ClimateDT code to make it run properly on the MareNostrum5 machine (we use fiat-1.2.0 there for now).

From my previous experience, I'd say this incorrect-semantics issue with MPI_IAlltoallv() is quite severe: many MPI implementations are now moving to more advanced algorithms for non-blocking collectives, so this will lead to random, occasional, or persistent crashes (depending on how lucky you are) on many new machines and runtime environments.

I'd suggest implementing an urgent hotfix ASAP. I don't think I have permission to create branches in this project, so instead of a pull request I'm attaching a variant of a hotfix patch here: hotfix-alltoall-issue-github#17.patch.gz. This is just one idea for a fix (although I tested it in my environment recently); another kind of fix could be built on this or a similar idea. Any comments and corrections are welcome.
@wdeconinck

ioanhadade · 2024-11-04T11:00:37Z

@a-v-medvedev can you fork fiat, add your patch on top of develop/master, and then raise a PR to develop?

wdeconinck · 2024-11-05T15:32:39Z

The hotfix in #29 (thanks @a-v-medvedev) is sufficient to urgently patch the crashing behaviour.
A proper fix will be worked on internally as a priority. This issue should not be closed until that proper fix is in place.

a-v-medvedev mentioned this issue on Nov 4, 2024:

[HOTFIX] quick fix for occasional crashes in MPL_ALLTOALLV #29 (Merged)
