Sync meeting on EESSI software layer (2023 05 08)
Kenneth Hoste edited this page May 9, 2023 · 1 revision
(copied from https://hackmd.io/58wNKPaCSSS97D7U-5KB4g)
This document collects information about the EESSI/2023.04 software layer: which toolchains and software packages it should include, how to build the software, and which issues were encountered and how they were resolved. It also serves to coordinate building this version of the software layer and keeps track of the meetings and discussions around it.
Attending: Thomas, Kenneth, Alan, Lara, Jure, Bob, Richard
- first: ingest compat layer
- staging PRs for the x86_64 and aarch64 2023.04 compat layer are ready to merge
- toolchains
- problem with GCC/10.3 (+ partially for 11.3 / 12.2) fixed by updating easyblock
- see https://github.com/easybuilders/easybuild-easyblocks/pull/2921
- GCC 11.3 / 12.2 build, but fail some sanity checks
- GCC 9.3 is a bigger problem, we can drop that for now
- => so use toolchains with GCC/10.3 or more recent
- OpenBLAS, generic CPU target
- requires Terje's easyblock PR
- should be replaced by a smaller PR that we can actually merge
- initial list of software
- keep same modules? only reason would be that we can still use the same versions as before
- TensorFlow (incl. SciPy-bundle), R, OpenFOAM, GROMACS, WRF
- MultiXscale: ESPResSo, LAMMPS
- other: AlphaFold, bioinfo workflow (see BioHackathon)
- whatever EasyBuild has
- starting with most recent toolchains
- check LICENSE (not so easy with dependencies)
- how about shipping sources?
- GPU support
- nothing special needed here, CUDA + UCX-CUDA deps
- CUDA compute capabilities: NVIDIA P100/V100/A100
- focusing on more recent toolchains (like 2022a/2022b)
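As an illustration of the GPU targets listed above, here is a minimal Python sketch that maps the named GPUs to their CUDA compute capabilities and builds a matching value for EasyBuild's `--cuda-compute-capabilities` option. The helper name `cuda_cc_option` is hypothetical, and nothing in the notes says this is how the bot or the build scripts will set the option.

```python
# CUDA compute capabilities of the GPUs named in the notes.
COMPUTE_CAPABILITY = {
    "P100": "6.0",
    "V100": "7.0",
    "A100": "8.0",
}


def cuda_cc_option(gpus):
    """Return an EasyBuild --cuda-compute-capabilities option for the given GPUs.

    Hypothetical helper for illustration only.
    """
    capabilities = {COMPUTE_CAPABILITY[gpu] for gpu in gpus}
    # Sort numerically (major, minor) rather than as plain strings.
    ordered = sorted(capabilities, key=lambda cc: tuple(int(p) for p in cc.split(".")))
    return "--cuda-compute-capabilities=" + ",".join(ordered)


print(cuda_cc_option(["P100", "V100", "A100"]))
# prints: --cuda-compute-capabilities=6.0,7.0,8.0
```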
- CPU targets
- x86_64: haswell, skylake, zen2 (Rome), zen3 (Milan), zen4 (Genoa)
- aarch64: graviton2/graviton3, ampere/altra, a64fx
- generic x86_64 + aarch64
- no ppc64le (dead end) or risc-v (too early)
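The CPU targets above form a small build matrix. The sketch below enumerates it in Python; the labels are copied from the notes and are not necessarily the directory names used in the EESSI repository, and "ampere/altra" is assumed here to mean a single target, labelled `altra`.

```python
# CPU targets from the notes, grouped per architecture family.
# "generic" is the fallback build; labels are illustrative placeholders.
CPU_TARGETS = {
    "x86_64": ["generic", "haswell", "skylake", "zen2", "zen3", "zen4"],
    "aarch64": ["generic", "graviton2", "graviton3", "altra", "a64fx"],
}


def build_targets():
    """Return every architecture/CPU combination as an 'arch/cpu' string."""
    return [f"{arch}/{cpu}" for arch, cpus in CPU_TARGETS.items() for cpu in cpus]


for target in build_targets():
    print(target)
```

Under these assumptions every software package has to be built 11 times (6 x86_64 targets and 5 aarch64 targets), which is part of why all builds should go through the bot.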
- how to build software (using the bot)
- only do builds/deploys using bot, no manual builds/deploys!
- what if a package does not build for one or more architectures?
- ok to add
- add template module that explains why module is not there
- idea: bot creates an issue for failing builds (in a separate repo)
- use CVMFS hook to add module files for missing software packages
- make submission resource requirements a parameter for bot: build
- set up missing bot instances (a64fx, altra)
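The template module idea above could look roughly like the following sketch, which renders a Lua module file that raises an Lmod error explaining why the software is missing. The function name, the message wording, and the optional issue link are all hypothetical; the notes do not specify the format of the template, and the example package and reason below are made up.

```python
def placeholder_module(name, version, target, reason, issue_url=None):
    """Render a Lua module file that only explains why a package is missing.

    Hypothetical sketch: loading the module triggers LmodError with the
    explanation instead of changing the environment. The text is placed in
    Lua long strings, so it must not contain ']]'.
    """
    message = f"{name}/{version} is not available for {target}: {reason}"
    if issue_url:
        message += f" (see {issue_url})"
    lines = [
        f"-- placeholder module for {name}/{version} on {target}",
        f"help([[{name}/{version} is not available for {target}.]])",
        f"LmodError([[{message}]])",
    ]
    return "\n".join(lines) + "\n"


# Made-up example values, for illustration only.
print(placeholder_module("ExampleApp", "1.0", "aarch64/a64fx", "build fails on this target"))
```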
- who is interested/has time to contribute?
- add software: make a PR + trigger bot to build/deploy
- figure out and fix build problems
- get bot PRs merged
- implement additional features in the bot
- expose build logs in PR
- let bot manage checklist for build targets
- install placeholder module for stuff that doesn't work for specific targets (like aarch64, etc.)
- handle inclusion of sources
- Bob, Kenneth, Lara, Alan, Richard, Thomas
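One of the proposed bot features, a checklist for build targets, could be rendered as a markdown task list in a PR comment. This is a minimal sketch under that assumption; the function name and the shape of the status data are hypothetical and do not reflect the bot's current implementation.

```python
def render_checklist(build_status):
    """Render a markdown task list from a mapping of target -> build succeeded."""
    lines = []
    for target, succeeded in build_status.items():
        mark = "x" if succeeded else " "
        lines.append(f"- [{mark}] {target}")
    return "\n".join(lines)


# Made-up status values, for illustration only.
status = {
    "x86_64/generic": True,
    "x86_64/zen2": True,
    "aarch64/a64fx": False,
}
print(render_checklist(status))
```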
- documentation for using the bot (using this hackmd first)
- what is our goal before summer 2023?
- parity with EESSI/2021.12 + ESPResSo + GPU support (GROMACS)
- how to coordinate the work? regular sync meetings, possibly focused days/sprints or even a hackathon
- next sync: Tuesday, May 16, 09:00 CEST
- future work
- organise view on large stack
- next steps:
- (Kenneth) ingest compat layers
- (Thomas+Kenneth) document how to use the bot
- (Thomas+Kenneth) need GH account & reconfigure bot to trigger builds
- idea: let bot work based on groups rather than individual accounts
- get full toolchains in place