Azure meeting 2023 04 17
Kenneth Hoste edited this page May 4, 2023 · 1 revision
attending:
- Laura
- Martin Brandt (SURF)
- Alan O'Cais (CECAM)
- Hugo Meiland (Azure)
- Kenneth Hoste (HPC-UGent)
- Davide Vanzo (Azure)
- Update on OCRE project grant (by Ivar?)
- Ivar not here, so no real update
- Status update on sponsored credits
- see slide 15 in https://raw.githubusercontent.com/EESSI/meetings/main/meetings/EESSI_meeting_20230406.pdf
- ~1.3k on sponsored credits spent in Mar'23
- increasing trend last couple of months (but still low usage)
- Spend is still low; AWS spend, by contrast, has increased significantly due to more arch-specific builds
- Hugo: There are some tricks that can make this happen (CPU-specific partitions)
- Azure are disabling Haswell today!
- Optimized software installations in place for aarch64/altra (Arm Ampere Altra instances in Azure)
- partial, some software installations are still missing due to build problems, so work-in-progress
- Have a couple of failing dependencies due to an update in the compat layer (not related to Altra)
- ARM is important for EESSI because it's getting more attention in the HPC space
- see upcoming NVIDIA Grace CPUs for example, or A64FX in Fugaku (Japan), Deucalion (EuroHPC, Portugal)
- MultiXscale EuroHPC project is now on the rails
- Kickoff meeting was on 20-23 March'23
- See https://www.multixscale.eu/wp-content/uploads/2023/04/MultiXscale-Kick-off-meeting_Press-Release_vf.pdf
- Poster at EuroHPC Summit, mentioning Azure as sponsor: https://www.multixscale.eu/wp-content/uploads/2023/03/32-Poster-MultiXscale.pdf
- We also got involved with CASTIEL2 coordination project for EuroHPC CoEs and NCCs
- Primary goals for EESSI in coming weeks/months
- New EESSI pilot version (2023.04)
- slowly getting past problems that presented themselves in bootstrapping Gentoo Prefix
- Will manually do it for now, and fix automation problems later
- Plan to have this in place by end of month
- Start using build-and-deploy bot for all software builds (software layer + compat layer)
- see https://github.com/EESSI/eessi-bot-software-layer
- we believe it's ready, lots of tests by NESSI
- Extend software stack
- more recent toolchains
- apps relevant for MultiXscale (ESPResSo, waLBerla, LAMMPS), and other EuroHPC CoEs (GROMACS, OpenFOAM, ...)
- eye catchers like AlphaFold, OpenFold, ...
- NVIDIA GPU support
- see https://github.com/EESSI/software-layer/pull/172
- not knowing which GPU driver version is available implies possible need for CUDA compat libraries
- compat libraries are installed into /opt/eessi, which gets symlinked into EESSI software stack
- write permissions to /opt/eessi needed
- Hugo has a more direct way to figure out GPU driver version, without relying on nvidia-smi
- Actual problem is that we need a way to deal with a range of potential GPU driver versions
- We also use nvidia-smi to check which CUDA versions are compatible with current installation
- Current approach doesn't work in a container - we'll need some help to figure that out
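
The notes do not record how Hugo's approach works. As one possible illustration of checking the driver version without `nvidia-smi`, the sketch below assumes a Linux host where the NVIDIA kernel module exposes `/proc/driver/nvidia/version`. The function names and the idea of passing in a minimum driver version are hypothetical and not taken from the EESSI scripts; the minimum driver for a given CUDA release has to be looked up in NVIDIA's documentation.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumption: on Linux hosts with the NVIDIA kernel module loaded,
# this file contains a line starting with "NVRM version:".
PROC_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the dotted driver version from the 'NVRM version:' line."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"\b(\d+(?:\.\d+)+)\b", line)
            if match:
                return tuple(int(part) for part in match.group(1).split("."))
    return None


def read_driver_version(path: Path = PROC_VERSION_FILE) -> Optional[Tuple[int, ...]]:
    """Return the host driver version, or None if it cannot be determined."""
    try:
        return parse_driver_version(path.read_text())
    except OSError:
        # No NVIDIA driver loaded, or the file is not visible (e.g. in a container).
        return None


def needs_compat_libraries(driver: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """True if the host driver is older than what the CUDA installation requires."""
    return driver < minimum


if __name__ == "__main__":
    # Placeholder minimum version; check NVIDIA's CUDA release notes for real values.
    minimum_for_cuda = (525, 60, 13)
    driver = read_driver_version()
    if driver is None:
        print("Could not determine the NVIDIA driver version")
    else:
        print("CUDA compat libraries needed:", needs_compat_libraries(driver, minimum_for_cuda))
```

This does not solve the container case mentioned above: if the file is not mapped into the container, `read_driver_version` simply returns `None`.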
- EESSI test suite
- https://github.com/EESSI/test-suite, using ReFrame
- focus on portability of test suite + performance
- currently only a GROMACS test, which will serve as a blueprint for other tests
- working on tests for OSU Microbenchmarks, TensorFlow, ...
- Add support for customising EESSI initialisation (including enabling tracking support)
- different variants of EESSI init script can be provided
- sites (like Azure) providing EESSI can opt-in to using an init script that does usage tracking (by leveraging "variant symlink" feature in CVMFS)
- /cvmfs/pilot.eessi-hpc.org/latest/init/bash -> /cvmfs/pilot.eessi-hpc.org/latest/init/bash_tracking_azure
- specific init script can also set additional environment variables required in Azure environment (related to interconnect for example, etc.)
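
To illustrate the opt-in mechanism: a CernVM-FS variant symlink has a target of the form `$(VARIABLE:-default)`, which the client resolves from a variable in its own configuration. The sketch below only emulates that substitution to show the idea; it is not CernVM-FS code. The variable name `EESSI_INIT_BASH` and the default target `bash_default` are hypothetical, chosen for illustration.

```python
import re
from typing import Dict

# Matches $(NAME) or $(NAME:-default) inside a symlink target.
_VARIANT = re.compile(
    r"\$\((?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^)]*))?\)"
)


def resolve_variant_symlink(target: str, client_env: Dict[str, str]) -> str:
    """Emulate how a variant symlink target is resolved on a client."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group("name")
        default = match.group("default") or ""
        # A site that sets the variable opts in; everyone else gets the default.
        return client_env.get(name, default)

    return _VARIANT.sub(substitute, target)


if __name__ == "__main__":
    # Hypothetical symlink target for .../init/bash
    target = "$(EESSI_INIT_BASH:-bash_default)"
    print(resolve_variant_symlink(target, {}))
    print(resolve_variant_symlink(target, {"EESSI_INIT_BASH": "bash_tracking_azure"}))
```

With this kind of setup the repository content stays identical for all sites, and only the client configuration decides which init script the `bash` symlink points to.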
- Tutorials
- EESSI introduction at EasyBuild-EESSI UK workshop next week (27+28 April 2023)
- EESSI introductory tutorial at HPC Knowledge Meeting 2023 in Barcelona (17-18 May 2023 - https://hpckp.org/annual-meeting)
- "CernVM-FS Best Practices" online tutorial
- fall 2023
- in collaboration with CernVM-FS developers (hopefully)
- focus on use of CernVM-FS on HPC systems
- we hope that EESSI configuration is included in CernVM-FS release by then (one step less to deploy EESSI)
- ISC'23
- Booth talk/demo at MS Azure booth by Elisabeth (HPCNow!)
- MultiXscale will have presence in EuroHPC booth (poster, presentation time) - also Elisabeth (HPCNow!)
- Other
- Hugo gave a talk at HPC Advisory Council 2023, in which EESSI was mentioned
- https://github.com/EESSI/meetings/wiki/Azure-meeting-2023-02-17
- https://github.com/EESSI/meetings/wiki/Azure-meeting-2023-01-20
- https://github.com/EESSI/meetings/wiki/Azure-meeting-Dec-9-2022
- ...