impi/2021.9.0-intel-compilers-2023.1.0 sanity check fails when using RPATH due to missing libfabric #20295
Comments
I encountered the same issue. Installing libfabric/1.18.0-GCCcore-12.3.0 resolved it. I then added export I_MPI_OFI_LIBRARY_INTERNAL=0 to my .bashrc file and ran the command: mpiexec.hydra -env I_MPI_DEBUG=1 -np 8 ./a.out The result was as follows:
easybuilders/easybuild-easyblocks#2910 looks relevant here, perhaps? |
Edited and removed most parts of this message, as most of it was not relevant / correct. I suspect the real issue is that we're trying to compile something in the sanity check, but the RPATH wrappers are not available here (anymore)? Adding a |
Alternatively, I guess we could adjust the easyblock and add the |
So the Alliance put compiler configurations in place beside the Intel compilers. The same problem is being discussed on Slack (https://easybuild.slack.com/archives/C34UA1HT7/p1726141669066889)
The configurations are only enough to use the compat layer though (so not relevant here). The only way to address this particular problem is to modify the ELF headers of the libraries (or indeed to use our compiler wrappers when compiling).
From EESSI/docs#175 (comment) , a hack would indeed be to set |
I think this could be fixed in the EasyBlock for |
Hmm, having the underlying compilers wrapped should actually be enough. The problem is it is shipping its own Without |
Wait a minute, I don't see |
The setting I_MPI_DEBUG=1 can be used to debug issues with mpirun:
mpiexec.hydra -env I_MPI_DEBUG=1 -np 8 ./a.out
[0] MPI startup(): Intel(R) MPI Library, Version 2021.9 Build 20230307 (id: d82b3071db)
It is evident that libfabric is being utilized. When no additional libfabric module is provided, the Intel MPI Library defaults to using its own embedded libfabric. This appears to be the source of the error. I recommended adding the libfabric/1.18.0-GCCcore-12.3.0 module to the IMPI EasyBuild file.
@yuke-li filed an issue in our internal ticket system, so I had another look. The core issue in the first comment by @cgross95 is the use of RUNPATH instead of RPATH. RUNPATH, unlike RPATH, isn't transitive, so it does not apply to the
it'll work fine! Alternatively you can patchelf
Now I don't know where the RUNPATH comes from here, as the EB RPATH wrappers specifically use
The external vs. internal libfabric question is orthogonal to this issue. The advantage of the internal libfabric is that it can use UCX and can benefit from better performance with Mellanox hardware. External libfabric includes a UCX provider again since version 1.18.0 (
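To make the RPATH vs. RUNPATH distinction above easier to check on a given binary, here is a minimal sketch of a helper that classifies `readelf -d` output. The function name `classify_dynamic_tags` is hypothetical, not part of EasyBuild or binutils. It relies on the lookup rules documented for the glibc dynamic loader: DT_RUNPATH is consulted only for an object's direct dependencies, and DT_RPATH is ignored when DT_RUNPATH is present.

```shell
# classify_dynamic_tags (hypothetical helper name)
# Reads the output of `readelf -d <file>` on stdin and prints which search-path
# tag the dynamic loader would honour: RUNPATH, RPATH, or none.
classify_dynamic_tags() {
  awk '
    /\(RPATH\)/   { rpath = 1 }    # DT_RPATH entry seen
    /\(RUNPATH\)/ { runpath = 1 }  # DT_RUNPATH entry seen
    END {
      # DT_RUNPATH takes precedence: DT_RPATH is ignored when both exist.
      if (runpath)    print "RUNPATH"
      else if (rpath) print "RPATH"
      else            print "none"
    }'
}
```

It could be used as `readelf -d .../mpi_test | classify_dynamic_tags`. If it prints RUNPATH, libraries loaded indirectly (for example by the MPI library rather than by the executable itself) will not be searched for in the executable's run path. With GNU ld, `-Wl,--disable-new-dtags` asks the linker to emit DT_RPATH instead.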
I'm trying to build impi/2021.9.0-intel-compilers-2023.1.0 and it fails during the sanity check, I believe due to building with RPATH enabled and libfabric not being found.
The sanity check fails after a small test has been built with mpicc -cc=icx ... -o mpi_test. The compilation succeeds, but running with mpirun -n 8 .../mpi_test fails with not much help:
Some digging (reproducing the install environment and running with I_MPI_DEBUG=30 mpirun -v -n 2 .../mpi_test) shows me:
I originally thought there might be a problem with mpicc -cc=icx not using icx with an RPATH wrapper, since readelf -d .../mpi_test shows
I tried forcing a compilation of the new copy with the wrapper which gave me
but had the same problem (even though the runpath includes the path to the libfabric libs).
Eventually running with
succeeds with no errors.
So I'm not sure why the executable is not picking up the libfabric libraries when compiled with RPATH. Any help would be greatly appreciated! As a side note, this is being done on top of EESSI, so if there's anything relevant there that I can share, please let me know.
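When a run path appears to include the libfabric directory but the library is still not found, it can help to confirm that the library file really is present in one of the listed directories. That separates a missing file from a lookup-semantics problem. Below is a small sketch, assuming a colon-separated path list copied from the readelf output. The function name `find_lib_in_path` is hypothetical.

```shell
# find_lib_in_path LIBNAME PATHLIST (hypothetical helper name)
# Prints each directory in the colon-separated PATHLIST that contains a file
# whose name starts with LIBNAME (e.g. libfabric.so, libfabric.so.1).
# The body runs in a subshell so the IFS change does not leak to the caller.
find_lib_in_path() (
  lib=$1
  IFS=:
  for dir in $2; do
    [ -n "$dir" ] || continue          # skip empty path entries
    for f in "$dir"/"$lib"*; do
      if [ -e "$f" ]; then             # glob matched an existing file
        printf '%s\n' "$dir"
        break
      fi
    done
  done
)
```

For example, `find_lib_in_path libfabric.so "$LD_LIBRARY_PATH"` lists the directories on LD_LIBRARY_PATH that provide libfabric. The same call works with the bracketed path list shown by readelf. If the directory is listed and the run still fails, the file is present and the cause is more likely how the loader applies the run path.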