The AlmaLinux 8.10 HPC Image includes optimizations and recommended configurations for performance, consistency, and reliability. The image ships with the following HPC tools and libraries:
- Mellanox OFED 24.07-0.6.1
- Pre-configured IPoIB (IP-over-InfiniBand)
- Popular InfiniBand-based MPI Libraries:
  - HPC-X v2.18
  - Intel MPI 2021.13.1
  - MVAPICH2 2.3.7-1
  - OpenMPI 5.0.5
- Communication Runtimes:
  - Libfabric
  - OpenUCX
  - NCCL 2.22.3-1
  - NCCL RDMA SHARP Plugin
  - PMIx 4.2.9-1
- Optimized Libraries:
  - AMD Optimizing C/C++ and Fortran Compilers (AOCC) 4.2.0
  - Intel MKL 2024.2.1.105
- GPU Drivers:
  - NVIDIA GPU Driver 560.35.03
  - NV Peer Memory (GPUDirect RDMA)
  - NVIDIA Fabric Manager
  - CUDA 12.6
  - GDRCopy 2.5-1 (GitHub master)
  - Data Center GPU Manager (DCGM) 3.3.7
- Azure HPC Diagnostics Tool
- SKU-based Customizations:
  - Topology files
  - NCCL configuration
- Moby 27.0.3-1.el8
- Docker 27.0.3-1
- Azure Managed Lustre 2.15.4-42-gd6d405d
- Moneo v0.3.5
- Azure HPC Health Checks v0.4.2
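On these images the MPI stacks listed above are typically exposed through environment modules. A minimal sketch of discovering and using one of them follows; the exact module name (`mpi/hpcx` here) is an assumption and may differ between image versions:

```shell
# List the MPI modules available on the image.
module avail

# Load HPC-X (module name is an assumption) and run a quick sanity check
# across two local ranks.
module load mpi/hpcx
mpirun -np 2 hostname
```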
Deploying HPC VM Images:
The HPC VM images are available from the Azure Marketplace and can be deployed through a variety of vehicles (CycleCloud, Batch, ARM templates, etc.). The AzureHPC scripts provide an easy way to quickly deploy an HPC cluster from an HPC VM image.
The AlmaLinux 8.10 Image is available in the marketplace under the following URNs (publisher:offer:sku:version):
almalinux:almalinux-hpc:8_10-hpc-gen2:8.10.2024101801
almalinux:almalinux-hpc:8-hpc-gen2:8.10.2024101801
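As one deployment path, the Azure CLI can create a VM directly from one of these marketplace identifiers. In the sketch below, the resource group, VM name, region, and size are illustrative placeholders:

```shell
# Create an HPC VM from the AlmaLinux 8.10 HPC marketplace image.
# Resource group, VM name, location, and size are placeholders.
az group create --name my-hpc-rg --location eastus
az vm create \
  --resource-group my-hpc-rg \
  --name my-hpc-vm \
  --size Standard_HB120rs_v3 \
  --image almalinux:almalinux-hpc:8_10-hpc-gen2:8.10.2024101801 \
  --admin-username azureuser \
  --generate-ssh-keys
```

For InfiniBand connectivity, pick an RDMA-capable size (such as the HB- or ND-series) when choosing `--size`.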
What's changed:
- Updated kernel to 4.18.0-553.16.1.el8_10.x86_64
- Updated NVIDIA GPU Driver to 560.35.03
- Updated GDRCopy to master-branch commit 1366e20d140c5638fcaa6c72b373ac69f7ab2532, which fixes a GPU driver compatibility issue.
- Fixed an issue with the HCOLL path in the OpenMPI module
Known issues:
- Intel MPI osu_bcast performance is poor at larger message sizes. If this affects your workload, pull the latest master from the UCX repository, build and install it, and prepend the new UCX library location to LD_LIBRARY_PATH so it takes precedence over the bundled UCX.
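The UCX workaround above can be sketched as follows; the install prefix `/opt/ucx-master` is an assumption:

```shell
# Build the latest UCX from master and point the dynamic linker at it.
# /opt/ucx-master is an assumed install prefix.
git clone https://github.com/openucx/ucx.git
cd ucx
./autogen.sh
./configure --prefix=/opt/ucx-master
make -j"$(nproc)" && sudo make install

# Put the new UCX libraries ahead of the image's bundled copy.
export LD_LIBRARY_PATH=/opt/ucx-master/lib:$LD_LIBRARY_PATH
```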
- NDv2 & NCv3 SKUs: the NVIDIA GPU driver now installs the Open Linux kernel modules, which do not support the V100 GPUs in NDv2 and NCv3 VMs. If you need these SKUs, either use a previous image or build a custom image with our scripts, overriding the driver installation with NVIDIA's proprietary kernel modules.