MPI jobs fail with intel toolchains after upgrade of EL8 Linux from 8.5 to 8.6 #15651
Comments
Quoting some discussion we've had on this in Slack:
|
Yikes... @OleHolmNielsen Have you been in touch with Intel support on this? @rscohn2 Any thoughts on this? |
I didn't know that this issue is related to the updated RHEL 8.6 kernel, so I didn't contact Intel support yet. I've never been in touch with Intel compiler/libraries support before, so if someone else knows how to do that, could you kindly open an issue with them? |
We ran into a silent hang issue several years ago too, details in hpcugent/vsc-mympirun#74 Any luck w.r.t. getting output when using |
It seems (although there is nothing about it in the kernel release notes) that the NUMA info has changed within the kernel. |
@boegel mpiexec.hydra does not know the -d parameter:
It does know --debug, but the only thing you see is the called command:
|
Looking at https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/bug-mpiexec-segmentation-fault/m-p/1183364, you can influence this with
(Ha, look who is posting the last comment in that link) |
using impi 2021.6.0, everything is working:
|
@ocaisa that doesn't change anything, looping around the same function |
Ahh, yes, I forgot: we have had an issue open with Intel (case# 05472393) regarding the problem. Their first comment was, as usual, "are you trying the newest version?" |
Is there any chance that Red Hat will accept a bug report for the older IntelMPI versions not working? |
@OleHolmNielsen kernel updates that break userspace are frowned upon, so you can try to open a bug report with Red Hat. They will at some point ask you for what they need, or point you to the release notes that say what has changed that broke this. They will probably blame Intel (and it sounds like Intel already fixed it, but doesn't want to backport it). |
@stdweird Yes, but how do we get any error messages from mpiexec.hydra which can be reported to Red Hat? |
@OleHolmNielsen the error you need to report is that an application has been hanging since the upgrade to RHEL 8.6. You can already add what was said here (i.e. it works on 8.5, and pstack points to the ipl thingie, so they can have some idea in what direction to look). |
@stdweird Thanks for the info. I have made this test: $ module load iimpi/2021b
Now I can execute pstack on the process PID: $ pstack 717906 Do we agree that this is the issue which I should report to Red Hat? Thanks, |
@OleHolmNielsen the issue to report to Red Hat is that your application is hanging after an upgrade. RH has no knowledge about Intel MPI itself (and they will most likely not provide a solution, only an explanation). |
I have created an issue in the Red Hat Bugzilla: |
AFAIK, you can add anyone (with their email) to the report, so that they can also read it... |
If anyone would like their e-mail address to be added to the Red Hat bug 2095281, you can ask me to do it. |
If there's a regression in the RHEL kernel topology information, you may want to compare the output of lstopo before and after the upgrade. |
@bgoglin I took an EL85 node and copied the output of lstopo to a file. Then I upgraded the node to EL86 and rebooted. The EL86 lstopo output is 100% identical to that of EL85. |
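The before/after comparison described here can be sketched in shell. This is a minimal illustration, not what was actually run in this thread: the helper name `compare_topology` and the file names in the comments are hypothetical, and it assumes the two lstopo dumps were saved as plain text.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved topology dumps.
# Prints "identical" or "different"; returns non-zero when they differ.
# Usage: compare_topology BEFORE_FILE AFTER_FILE
compare_topology() {
    if diff -u "$1" "$2" >/dev/null; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Assumed workflow (file names are placeholders):
#   lstopo --of console > lstopo-el85.txt   # on the EL 8.5 kernel
#   lstopo --of console > lstopo-el86.txt   # after upgrade and reboot
#   compare_topology lstopo-el85.txt lstopo-el86.txt
```

An "identical" result only says that the topology as seen by hwloc is unchanged; it does not rule out other differences in how the kernel exposes the underlying sysfs files.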
The Intel MPI Release Notes at https://www.intel.com/content/www/us/en/developer/articles/release-notes/mpi-library-release-notes-linux.html don't mention any bugs related to mpiexec.hydra; there's only a terse "Bug fixes" line. |
I received a response in Red Hat bug 2095281:
So the conclusion is that Intel MPI prior to 2021.6 is buggy. We cannot use older Intel MPI versions on EL 8.6 kernels then :-( If no workaround is found, it seems that all EB modules iimpi/* prior to 2021.6 have to be discarded after we upgrade from EL 8.5 to 8.6. |
Or the |
Should only be done on a per-site initiative I think. |
For the record: When I load the module iimpi/2021b on an EL 8.6 node running kernel 4.18.0-372.9.1.el8.x86_64, the mpiexec.hydra enters an infinite loop while reading /sys/devices/system/node/node0/cpulist as seen by strace: $ strace -f -e file mpiexec.hydra --version After rebooting the node with the EL 8.5 kernel 4.18.0-348.23.1.el8_5.x86_64 the mpiexec.hydra works correctly. I've now built the EB module iimpi/2022.05 which contains the latest Intel MPI module: $ ml
Running this module on the EL 8.6 node running kernel 4.18.0-372.9.1.el8.x86_64 the mpiexec.hydra works correctly (as observed by others): $ mpiexec.hydra --version |
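For checking many nodes or modules, the "does `mpiexec.hydra --version` return or loop forever" test can be wrapped in a time limit. The following is only a sketch: the helper name `check_hang` is hypothetical, it assumes GNU coreutils `timeout`, and it sends SIGKILL because the looping mpiexec.hydra is reported in this issue not to respond to SIGTERM.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and classify the result.
# Prints "hang" if the limit was hit, "ok" on exit status 0, "failed" otherwise.
# Usage: check_hang SECONDS COMMAND [ARGS...]
check_hang() {
    limit=$1
    shift
    # SIGKILL cannot be caught or ignored, unlike the default SIGTERM.
    timeout -s KILL "$limit" "$@" >/dev/null 2>&1
    status=$?
    case "$status" in
        # GNU timeout exits with 124 on a timeout, or 137 (128+9) when KILL was sent.
        124|137) echo "hang" ;;
        0)       echo "ok" ;;
        *)       echo "failed" ;;
    esac
}
```

With an Intel MPI module loaded, a call such as `check_hang 30 mpiexec.hydra --version` would be expected to print `hang` on an affected kernel and `ok` otherwise; the 30-second limit is an arbitrary choice.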
One additional piece of information concerns the Intel MKL library: I've built the latest EB module imkl/2022.1.0, which includes an HPL benchmark executable .../modules/software/imkl/2022.1.0/mkl/2022.1.0/benchmarks/linpack/xlinpack_xeon64 Running the MKL 2022.1.0 xlinpack_xeon64 executable also results in multiple copies of mpiexec.hydra in infinite loops, just like with Intel MPI prior to 2021.6. I think there exists a newer MKL 2022.2.0, but I don't know how to make an EB module with it for testing - can anyone help? |
I see 2022.1.0 on https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#inpage-nav-9-7 This is the easyconfig that you've tested: https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/i/imkl/imkl-2022.1.0.eb - to update it you would change:
with the relevant source url and source for the offline Linux installer. |
I have built the intel/2022a toolchain with EB 4.6.0, and I can confirm that with the new module impi/2021.6.0-intel-compilers-2022.1.0 the above issue with all previous Intel MPI versions has been resolved:
Of course, we still face an issue with all software modules that use the Intel MPI module prior to 2021.6.0 being broken on EL8 systems running the latest kernel. |
We got some feedback from Intel: The issue was analyzed and the root cause was found. In RHEL 8.6 and other OSes with recent kernel versions, system files are reported to have a size of 0 bytes. In previous kernel versions ftell was reporting size == blocksize != 0. Using size==0 led to a memory leak with the known consequences. I have written a small workaround library that can be used with LD_PRELOAD. This lib will use an "adapted" version of ftell for the startup of IMPI. Once the program is started there should be no issue. It is also possible to switch off LD_PRELOAD for the user MPI program. If this form of workaround is acceptable and you are willing to test it, I can attach it to this issue. The preferred approach is, however, to use the newest version of IMPI. |
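The mismatch described in this feedback (a reported size of 0 for a file that still has content) can be made visible with a small shell sketch. The helper name `size_vs_content` is hypothetical, and GNU coreutils `stat` is assumed; this only illustrates the symptom and is unrelated to Intel's LD_PRELOAD library.

```shell
#!/bin/sh
# Hypothetical helper: print the size the kernel reports for a file next to
# the number of bytes that can actually be read from it.
# Usage: size_vs_content FILE
size_vs_content() {
    # Size as reported by stat(2).
    reported=$(stat -c %s "$1")
    # Bytes actually read; piping through cat avoids any size-based shortcut in wc.
    actual=$(cat "$1" | wc -c | tr -d ' ')
    echo "reported=$reported actual=$actual"
}
```

Running `size_vs_content /sys/devices/system/node/node0/cpulist` on an affected kernel should, if the description above is accurate, show `reported=0` together with a non-zero `actual` value. Code that sizes its read buffer from the reported size is the kind of consumer that would be tripped up by this.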
@daRecall That's... interesting. Is that a deliberate change, perhaps related to security or something?
I would certainly like to see this, if only to learn more about the underlying issue... Is this library available publicly somewhere?
Both "mangling" existing |
Is this related to the size of the cpulist and cpumap files? If so, there is a kernel fix available: torvalds/linux@7ee951a Ah, I see that cpulist was mentioned in #15651 (comment). So testing this kernel patch seems worth a try. |
Red Hat has issued a Knowledgebase article about this, "Intel MPI version 2019 hangs while reading cpulist": Issue: Running even a simple mpirun on Intel's MPI version 2019 hangs after reading /sys/devices/system/node/node0/cpulist The issue may be fixed in rhel-8 with kernel-4.18.0-414.el8 (not yet available). The latest available kernel is kernel-4.18.0-372.19.1.el8_6.x86_64. |
@OleHolmNielsen that is probably the 8.7 kernel. Is there any indication they will backport it to 8.6? (Probably a separate BZ ticket will be created for that, but the original BZ you mentioned is not accessible, so I can't check.) |
@stdweird You are very likely right about the 8.7 kernel. I have asked once again in https://bugzilla.redhat.com/show_bug.cgi?id=2095281 plus https://bugzilla.redhat.com/show_bug.cgi?id=2089715 if the fix will become available in an 8.6 kernel. |
@OleHolmNielsen thanks a lot for tracking this! |
I received a reply from Red Hat in BZ case https://bugzilla.redhat.com/show_bug.cgi?id=2089715 as follows: the fix for RHEL-8.6 is handled in bz#2112030 (private) and will be released |
@OleHolmNielsen excellent news |
@daRecall can you still attach the LD_PRELOAD workaround? We're mainly using Open MPI, and people programming MPI can easily use the newest version, but it's tougher with commercial packages such as Ansys and Star CCM+ that ship with particular older Intel MPI versions and are very intertwined with it. |
@OleHolmNielsen kernel-4.18.0-372.26.1.el8_6.x86_64 is out, containing the fix |
Yes, the kernel fix for RHEL 8.6 is out, see https://access.redhat.com/errata/RHSA-2022:6460 To verify whether the patch has been applied or not, list this file: $ ls -l /sys/devices/system/node/node0/cpulist The file size must be >0. |
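The same check can be scripted across all NUMA nodes. This is a minimal sketch with a hypothetical helper name (`report_zero_size`), assuming GNU coreutils `stat`; it only automates the "file size must be >0" criterion stated above.

```shell
#!/bin/sh
# Hypothetical helper: list every given file whose reported size is 0.
# Returns non-zero if any file reports size 0 or cannot be inspected.
# Usage: report_zero_size FILE...
report_zero_size() {
    rc=0
    for f in "$@"; do
        if ! size=$(stat -c %s "$f" 2>/dev/null); then
            echo "unreadable: $f"
            rc=1
        elif [ "$size" -eq 0 ]; then
            echo "zero-size: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `report_zero_size /sys/devices/system/node/node*/cpulist` prints nothing and returns 0 on a kernel that contains the fix, and lists the affected files otherwise.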
The EL 8.6 kernel I've upgraded an EL 8.6 server and the
and tested all our Intel toolchains on this system:
As you can see, the Intel MPI is now working correctly again :-)) It was OK on EL 8.5, but broken on EL 8.6 until the above listed kernel was released. |
@OleHolmNielsen Thanks a lot for the update, very happy to see that this problem has been resolved properly... I guess we can close this issue then, since i) the issue is resolved by updating to a sufficiently recent kernel, ii) there's nothing to do on the EasyBuild side for this? |
@boegel: I agree with you that the issue has been resolved by Red Hat delivering an RHEL 8.6 kernel update with an appropriate fix. |
I'm testing the upgrade of our compute nodes from AlmaLinux 8.5 to 8.6 (a RHEL 8 clone similar to Rocky Linux).
We have found that all MPI codes built with either of the Intel toolchains intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade. The codes also fail on login nodes, so the Slurm queue system is not involved.
The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6, however.
My simple test uses the attached trivial MPI Hello World code running on a single node:
Now the mpirun command enters an infinite loop (running many minutes) and we see these processes with "ps":
The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill it with 9/SIGKILL. I've tried to enable debugging output with
but nothing gets printed from this.
Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and mpiexec.hydra? Can you suggest how I may debug this issue?
OS information: