
Azure meeting 2023 04 17

Kenneth Hoste edited this page May 4, 2023 · 1 revision

EESSI/Azure sync meeting 2023-04-17

attending:

  • Laura
  • Martin Brandt (SURF)
  • Alan O'Cais (CECAM)
  • Hugo Meiland (Azure)
  • Kenneth Hoste (HPC-UGent)
  • Davide Vanzo (Azure)
  • Update on OCRE project grant (by Ivar?)
    • Ivar was not present, so no update
  • Status update on sponsored credits
    • Azure spend is still low; AWS spend has increased significantly due to more arch-specific builds
    • Hugo: There are some tricks that can make this happen (CPU-specific partitions)
      • Azure are disabling Haswell today!
  • Optimized software installations in place for aarch64/altra (Arm Ampere Altra instances in Azure)
    • partial: some software installations are still missing due to build problems, so this is work in progress
      • Have a couple of failing dependencies due to an update in the compat layer (not related to Altra)
      • ARM is important for EESSI because it's getting more attention in the HPC space
        • see upcoming NVIDIA Grace CPUs for example, or A64FX in Fugaku (Japan), Deucalion (EuroHPC, Portugal)
  • MultiXscale EuroHPC project is now on the rails
  • Primary goals for EESSI in coming weeks/months
    • New EESSI pilot version (2023.04)
      • slowly getting past problems that presented themselves in bootstrapping Gentoo Prefix
        • Will manually do it for now, and fix automation problems later
        • Plan to have this in place by end of month
    • Start using build-and-deploy bot for all software builds (software layer + compat layer)
    • Extend software stack
      • more recent toolchains
      • apps relevant for MultiXscale (ESPResSo, waLBerla, LAMMPS), and other EuroHPC CoEs (GROMACS, OpenFOAM, ...)
      • eye catchers like AlphaFold, OpenFold, ...
    • NVIDIA GPU support
      • see https://github.com/EESSI/software-layer/pull/172
      • not knowing which GPU driver version is available implies possible need for CUDA compat libraries
        • compat libraries are installed into /opt/eessi, which gets symlinked into EESSI software stack
        • write permissions to /opt/eessi needed
      • Hugo has a more direct way to figure out GPU driver version, without relying on nvidia-smi
      • Actual problem is that we need a way to deal with a range of potential GPU driver versions
      • We also use nvidia-smi to check which CUDA versions are compatible with current installation
      • Current approach doesn't work in a container - we'll need some help to figure that out
    • EESSI test suite
      • https://github.com/EESSI/test-suite, using ReFrame
      • focus on portability of test suite + performance
      • currently only a GROMACS test, which will serve as a blueprint for other tests
      • working on tests for OSU Microbenchmarks, TensorFlow, ...
    • Add support for customising EESSI initialisation (including enabling tracking support)
      • different variants of EESSI init script can be provided
      • sites (like Azure) providing EESSI can opt-in to using an init script that does usage tracking (by leveraging "variant symlink" feature in CVMFS)
      • /cvmfs/pilot.eessi-hpc.org/latest/init/bash -> /cvmfs/pilot.eessi-hpc.org/latest/init/bash_tracking_azure
      • specific init script can also set additional environment variables required in Azure environment (related to interconnect for example, etc.)
    • Tutorials
      • EESSI introduction at EasyBuild-EESSI UK workshop next week (27-28 April 2023)
      • EESSI introductory tutorial at HPC Knowledge Meeting 2023 in Barcelona (17-18 May 2023 - https://hpckp.org/annual-meeting)
      • "CernVM-FS Best Practices" online tutorial
        • fall 2023
        • in collaboration with CernVM-FS developers (hopefully)
        • focus on use of CernVM-FS on HPC systems
        • we hope that EESSI configuration is included in CernVM-FS release by then (one step less to deploy EESSI)
  • ISC'23
    • Booth talk/demo at MS Azure booth by Elisabeth (HPCNow!)
    • MultiXscale will have presence in EuroHPC booth (poster, presentation time) - also Elisabeth (HPCNow!)
  • Other
    • Hugo did a talk at HPC Advisory Council 2023, in which EESSI was mentioned
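
Regarding the NVIDIA GPU support item above: one possible way to find the GPU driver version without nvidia-smi is to read what the NVIDIA kernel module exposes on Linux in /sys/module/nvidia/version or /proc/driver/nvidia/version. The Python sketch below only illustrates that idea. It is not necessarily the approach Hugo has in mind, the function names are hypothetical, and the minimum driver version for a given CUDA release has to be supplied by the caller. It does not solve the container case.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Files exposed by the NVIDIA kernel module on Linux when it is loaded.
SYSFS_VERSION = Path("/sys/module/nvidia/version")
PROC_VERSION = Path("/proc/driver/nvidia/version")

# A dotted version with two or three numeric components, e.g. 525.85.12
_VERSION_RE = re.compile(r"\b(\d+)\.(\d+)(?:\.(\d+))?\b")


def parse_driver_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version number found in text, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def detect_driver_version() -> Optional[Tuple[int, ...]]:
    """Try the known locations in turn; None means no driver was found."""
    for path in (SYSFS_VERSION, PROC_VERSION):
        try:
            version = parse_driver_version(path.read_text())
        except OSError:
            # File missing or unreadable (no driver loaded, restricted container, ...)
            continue
        if version is not None:
            return version
    return None


def needs_compat_libs(driver: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """True if the driver is older than the minimum required by a CUDA release."""
    return driver < minimum
```

A caller would combine the two, for example `needs_compat_libs(detect_driver_version() or (0,), minimum)`, where `minimum` is looked up per CUDA version. Comparing version tuples rather than strings is what lets a single check cover a whole range of driver versions.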
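
Regarding the init script customisation item above: a CVMFS variant symlink has a target of the form `$(VAR)` or `$(VAR:-default)`, which the client fills in from its own configuration. The Python sketch below is a toy model of that resolution, assuming the target syntax as described in the CernVM-FS documentation. It is not CernVM-FS code, and the variable name `EESSI_INIT_BASH` and the `bash_default` file name are hypothetical.

```python
import re
from typing import Mapping

# Matches $(NAME) and $(NAME:-default) placeholders in a symlink target.
_VARIANT_RE = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)(?::-([^)]*))?\)")


def resolve_variant_target(target: str, client_config: Mapping[str, str]) -> str:
    """Expand variant placeholders in a symlink target using client settings."""

    def expand(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        # A client-side value wins; otherwise use the default stored in the
        # symlink itself, or an empty string if no default was given.
        return client_config.get(name, default or "")

    return _VARIANT_RE.sub(expand, target)


INIT_DIR = "/cvmfs/pilot.eessi-hpc.org/latest/init"

# Hypothetical target of the init/bash symlink, with a default fallback.
SYMLINK_TARGET = "$(EESSI_INIT_BASH:-" + INIT_DIR + "/bash_default)"

# Hypothetical client configuration on a site that opts in to usage tracking.
AZURE_CLIENT_CONFIG = {"EESSI_INIT_BASH": INIT_DIR + "/bash_tracking_azure"}
```

With these definitions, `resolve_variant_target(SYMLINK_TARGET, AZURE_CLIENT_CONFIG)` gives the `bash_tracking_azure` path, while an empty client configuration falls back to the default script. Sites that do not opt in therefore need no configuration change.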
