meeting 2024 04 04
Kenneth Hoste edited this page Jun 6, 2024 · 3 revisions
- date & time: Thu 4 Apr 2024 - 14:00 CEST (12:00 UTC)
- (every first Thursday of the month)
- venue: (online, see mail for meeting link, or ask in Slack)
- agenda:
- Quick introduction by new people
- EESSI-related meetings and events in last month
- Progress update per EESSI layer
- Update on EESSI production repository software.eessi.io
- Update on EESSI test suite + build-and-deploy bot
- EESSI support portal
- AWS/Azure sponsorship update
- Update on MultiXscale EuroHPC project
- Upcoming/recent events: EuroHPC Summit + EasyBuild User Meeting 2024 + ISC’24
- Q&A
(by Bob, Kenneth)
- Craig Gross: Research Consultant, Michigan State University
- making EESSI available on very heterogeneous cluster (incl. Grace Hopper, using about five EESSI CPU targets)
- rebuilding everything on top of EESSI
(see slides)
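On a heterogeneous cluster like the one described above, EESSI normally auto-detects the CPU target per node, but a specific target can also be pinned. A minimal sketch, assuming the documented `EESSI_SOFTWARE_SUBDIR_OVERRIDE` variable and the standard init script (the zen4 path is just an example target):

```shell
# Pin the EESSI CPU target before initialising the environment
# (x86_64/amd/zen4 is an example; pick the target matching the node type)
export EESSI_SOFTWARE_SUBDIR_OVERRIDE="x86_64/amd/zen4"
# source /cvmfs/software.eessi.io/versions/2023.06/init/bash  # requires a CernVM-FS client
echo "$EESSI_SOFTWARE_SUBDIR_OVERRIDE"
```

Setting this per node type (e.g. in a Slurm prolog) is one way to cover several CPU targets on one cluster.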
- "CernVM-FS + EESSI tutorial for EuroHPC hosting entities" is a nice getting-started tutorial for people who want to make EESSI available on their cluster/infrastructure
- worth looking into making oneAPI available
(see slides)
- maybe an opportunity to work together with Canadian CernVM-FS experts on a public Ansible role for CernVM-FS clients/servers/proxies?
- (Hugo) is exposing EESSI directly through S3/Azure blob storage possible, or is a CernVM-FS "frontend" required?
- 50-60 regions in Azure, would be nice to only do this via blob
- client can directly access repository contents via S3/Azure blob
- would be good to document this use case
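CernVM-FS clients fetch repository data over plain HTTP, so pointing a client directly at object storage essentially comes down to the server URL. A sketch of a client-side config snippet, with a hypothetical Azure blob endpoint (account and container names are placeholders):

```
# /etc/cvmfs/config.d/software.eessi.io.local (sketch; endpoint is a placeholder)
CVMFS_SERVER_URL="https://ACCOUNT.blob.core.windows.net/CONTAINER/cvmfs/@fqrn@"
```

`@fqrn@` is CernVM-FS's standard substitution for the fully qualified repository name.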
- riscv.eessi.io is a development repository; we retain the freedom to remove stuff from it...
(see slides)
- (Hugo) interest in notes on getting StarFive VisionFive 2 set up
- Thomas has some notes on this that he can share
- good place to ask for help is #riscv channel in EESSI Slack
(see slides)
- we should start actively discouraging use of pilot.eessi-hpc.org
- init script should refuse to set up environment unless you're actively opting into using it
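The opt-in behaviour mentioned above could look roughly like this in the init script; the `EESSI_PILOT_OPT_IN` variable name is hypothetical, not an existing knob:

```shell
# Hypothetical opt-in guard for the pilot init script: refuse to set up the
# environment unless the user explicitly opted in via EESSI_PILOT_OPT_IN
eessi_pilot_guard() {
  if [ "${EESSI_PILOT_OPT_IN:-0}" != "1" ]; then
    echo "pilot.eessi-hpc.org is deprecated; set EESSI_PILOT_OPT_IN=1 to opt in" >&2
    return 1
  fi
}

# demo: the guard fails without the opt-in, and succeeds with it
EESSI_PILOT_OPT_IN=0; eessi_pilot_guard 2>/dev/null || echo "refused"
EESSI_PILOT_OPT_IN=1; eessi_pilot_guard && echo "ok"
```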
- ingesting rebuilt software requires manual intervention
- old installation has to be removed first
- this has to be done in the same transaction
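On the Stratum 0, the same-transaction requirement could be sketched as below; the repository name, install path, and tarball argument are placeholders, and this is an illustration of the flow rather than the actual ingestion tooling:

```shell
# Sketch: remove the old installation and ingest the rebuilt one inside a
# single CernVM-FS transaction (runs on the Stratum 0; paths are placeholders)
reingest_rebuild() {
  repo="software.eessi.io"
  cvmfs_server transaction "$repo" || return 1
  rm -rf "/cvmfs/$repo/versions/2023.06/software/OLD_INSTALL_DIR"
  tar -C "/cvmfs/$repo" -xzf "$1" || { cvmfs_server abort -f "$repo"; return 1; }
  cvmfs_server publish "$repo"
}
```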
- possible use case of the site-specific hooks:
- disable libfabric on systems that are hitting an OpenMPI bug with new versions of OFED
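Such a hook could, for instance, steer Open MPI away from its libfabric (ofi) components on affected hosts. The MCA settings below use standard Open MPI exclusion syntax, but which components to exclude is site-specific and an assumption here:

```shell
# Sketch: tell Open MPI to skip its libfabric (ofi) components on hosts that
# hit the OFED-related bug; "^name" is Open MPI's MCA exclusion syntax
export OMPI_MCA_mtl="^ofi"
export OMPI_MCA_btl="^ofi"
echo "$OMPI_MCA_mtl $OMPI_MCA_btl"
```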
- what do we do with software that doesn't allow you to build fat binaries supporting different CUDA compute capabilities?
- can we detect GPU architecture, supported CUDA compute capabilities?
- support for installing CUDA compatibility libraries when required is not in place yet
- (Hugo) what is the minimal driver version that you need for CUDA software?
- this is not entirely clear, but the compatibility libraries are definitely useful here. We know how to do this, but the scripts are not available (yet); may be a nice hackathon task.
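Detecting the compute capability at runtime might be done along these lines; the `compute_cap` query field of `nvidia-smi` requires a reasonably recent driver, so treat this as a sketch:

```shell
# Sketch: report the CUDA compute capabilities of the GPUs in this node,
# falling back gracefully when no NVIDIA driver is present
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader
else
  echo "no NVIDIA driver found"
fi
```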
- the --from-commit option has two advantages over --from-pr:
- it's more reproducible (and secure): commits can never change
- it does not use the GitHub API (it only downloads the tarball for the specified commit): no more issues with hitting the GitHub API rate limits
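The rate-limit point follows from how commit archives are served: they are plain HTTPS downloads, with no API call involved. A sketch with placeholder values (OWNER/REPO/COMMIT are not real identifiers):

```shell
# Sketch: the source archive for a pinned commit is a plain download, so
# fetching it does not count against GitHub API rate limits
owner="OWNER"; repo="REPO"; commit="COMMIT"
url="https://github.com/${owner}/${repo}/archive/${commit}.tar.gz"
echo "$url"
# fetch with: curl -sL -o "${repo}-${commit}.tar.gz" "$url"
```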
(see slides)
- the bot should now be fully independent of what it's building
- (Alan) in the development repo we / the bot can be a bit more loose with respect to the policies
- e.g. not required that all builds for all CPU targets have succeeded
(see slides)
(see slides)
- sync server is still missing on the status page
- requires some additional work, because it's using S3 and doesn't have all the JSON files that the scraper is looking for
- overview of available software is generated using a script, so it should be easy to automatically update it (e.g. using a GitHub Action)
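Automating that could be a scheduled workflow along these lines; the script path, schedule, and commit details below are placeholders, not an existing workflow:

```yaml
# .github/workflows/update-overview.yml (sketch; script path is a placeholder)
name: update software overview
on:
  schedule:
    - cron: '0 6 * * *'   # daily
  workflow_dispatch:
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/generate_overview.sh   # placeholder for the existing script
      - run: |
          git config user.name "github-actions"
          git config user.email "actions@github.com"
          git commit -am "update software overview" || echo "no changes"
          git push
```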
(see slides)
(see slides)
(see slides)
- A new Slurm cluster has been spun up on Azure, and we will start using it for doing Zen 4 builds.
- we will probably set up a separate branch to catch up with all the missing software installations
- NESSI has experience with adding new CPU targets
(see slides)
(see slides)
- Next meeting: Thu 2 May 2024 at 14:00 CEST (12:00 UTC)