meeting Nov 4 2021
Bob Dröge edited this page Nov 4, 2021
- date & time: Thu November 4th 2021 - 2pm CET (13:00 UTC)
- (every first Thursday of the month)
- venue: (online, see mail for meeting link, or ask in Slack)
- agenda:
- Quick introduction by new people
- EESSI-related meetings in last month
- 2021.06 version of pilot repository
- Testing of pilot version 2021.06
- Progress update per EESSI layer
- Infrastructure status and updates
- AWS/Azure sponsorship update
- Update on EESSI journal paper
- EESSI risk analysis
- Upcoming events
- Q&A
- (by Alan, Bob)
- Quick introduction by new people
- Ahmad from SURF
- Making EESSI available by default in the cloud
- Plan is to have their own repo for SURF
- Michael Hubner from UniBonn
- Bartosz Kostrzewa also from UniBonn
- Plan to contribute on behalf of HPC-NRW
- Michael will be doing a lot of the work
- Interested in CUDA compatibility and testing infrastructure
- Hugo Meiland from Azure
- EESSI-related meetings in last month
- Oct. 12 CernVM-FS coordination
- New release, 2.9, soon
- Discussion on IP vs. DNS entries in client configuration
- What happens with a DNS outage?
- Both ways have pros and cons
- Better not to have both, as you should only have 5-10 Stratum 1 servers
- Their main Stratum 1 serves tens of TB a month
- Can we have private Stratum 1 in our own config?
- Yes, this is a good idea
- Should document this (some information already in the wiki, probably needs updating)
- Ahmad willing to help with this
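The private Stratum 1 idea above can be sketched as a CernVM-FS client configuration fragment. The hostname below is a placeholder, not a real EESSI server:

```shell
# Illustrative fragment for /etc/cvmfs/domain.d/eessi-hpc.org.local
# (hostname is a placeholder). Putting the site-local Stratum 1 first
# makes clients prefer it; the public servers remain as fallback.
CVMFS_SERVER_URL="http://cvmfs-s1.example.org/cvmfs/@fqrn@;$CVMFS_SERVER_URL"
```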
- 2021.06 version of pilot repository
- 2021.06 is the default (2021.03 is gone)
- Also includes a Zen 3 stack
- Next pilot version
- NVidia/CUDA support
- Some thought and effort gone into this already
- Alan has already built CUDA software on top of EESSI
- Script required to put drivers in the right place so that they are picked up by EESSI
- This process can be useful to also install CUDA and driver compatibility libraries
- Linker or compiler wrappers
- More software
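The driver-placement step above could look roughly like the sketch below; the target directory, library names, and the use of plain symlinks are illustrative assumptions, not the actual EESSI mechanism:

```shell
# Hypothetical sketch: symlink the host's NVIDIA driver libraries into a
# directory the EESSI stack could search, so CUDA applications find them.
host_libs=/usr/lib64                 # where host driver libraries often live
target="$PWD/host-drivers"           # illustrative target directory
mkdir -p "$target"
for lib in "$host_libs"/libcuda.so* "$host_libs"/libnvidia-ml.so*; do
    if [ -e "$lib" ]; then
        ln -sf "$lib" "$target/"
    fi
done
echo "driver links (if any) placed in $target"
```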
- Testing of pilot version 2021.06
- Hugo has been doing a bit of this recently
- Installation works just fine
- Potential points for end-user improvements
- Initialisation script is good, but you don't get this until you list the directory
- Building WRF
- Some warnings from OpenMPI, which were being picked up by EB and causing the build to fail
- Fixed in 4.1 so newer toolchain will probably fix this
- Some improvements to archspec to detect the interconnect which could be leveraged by EESSI
- Needed some hand-holding to get things started, would be great to get this documented/scripted
- What about the Intel compiler?
- Maybe we can treat that like the CUDA idea?
- Azure has good contacts, could try to make this smoother
- Azure are allowed to deliver CUDA in their open images
- Thomas tested installing some R packages
- Using the packages failed "GLIBC_2.33 not found"
- Because we don't have a linker wrapper
- Compute Canada does this
- Using a linker wrapper is a sizeable change in behaviour, worth a dedicated meeting!
- Needed to fix some SELinux stuff to get EESSI to run (but that might be image specific)
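For reference, the initialisation script mentioned above lives in the version directory of the pilot repository; a typical session looks like this (requires a mounted CVMFS client and an Lmod-capable shell):

```shell
# List the available pilot versions, then source the init script.
ls /cvmfs/pilot.eessi-hpc.org/versions/
source /cvmfs/pilot.eessi-hpc.org/versions/2021.06/init/bash
module avail   # EESSI software stack is now on the module path
```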
- Progress update per EESSI layer
- filesystem layer
- Another Stratum 1 on FENIX, but not in client configuration yet
- `deb`/`yum` repos created for client configuration packages
- Can also use this for CVMFS client packages for POWER/Arm (which we currently have to build)
- Fixed issue with automated ingestion
- compatibility layer
- No security updates
- Removed 2021.03 tests from GitHub actions
- software layer
- Some improvements to the build script
- Make sure temporary directory is available
- The `upper` directory needs to have support for extended attributes; the script now warns if this is not the case
- Zen3 stack on AMD Milan in Azure
- Some changes in archspec that require changes in our detection scripts
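The extended-attribute requirement mentioned above can be checked up front. A minimal sketch (the attribute name is arbitrary; `setfattr` comes from the `attr` package):

```shell
# Check whether a directory's filesystem supports user extended attributes,
# as required for the overlay "upper" directory used by the build script.
dir=$(mktemp -d)
if setfattr -n user.eessi_test -v ok "$dir" 2>/dev/null; then
    echo "xattr support OK in $dir"
else
    echo "WARNING: no xattr support (or setfattr missing) in $dir"
fi
rm -rf "$dir"
```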
- Infrastructure status and updates
- Yum and apt repositories now available at
- Node is completely ephemeral (pulls from GitHub on creation and updates every hour)
- No meta package for deb (yet)
- Note that the config repo does not require the `cvmfs` package, we should probably look at that
- If CernVM-FS don't want to distribute packages for certain archs, we could do that though
- Could also be used to create an EESSI init package that puts something in `/usr/local/bin`
- AWS/Azure sponsorship update
- AWS
- Spent about $861 of AWS credits
- ~$7k spent
- $18k remaining, credits expire at the end of January!
- Azure
- Spent about €550
- Update on EESSI journal paper
- No major updates
- Deadline approaching quickly, progress being made
- If you want to read it, let us know!
- EESSI risk analysis (Thomas)
- Started looking at this for NESSI (Norwegian project)
- First assessment of risk plus initial feedback collected
- Long list
- Nothing sensitive, can share this (just ask)
- Lots of the risks relate primarily to CernVM-FS
- FENIX
- Got resources last year at the second attempt
- This year we rehashed last year's proposal
- Swift storage only became available a few weeks ago
- If it gets approved we will get access from the 1st of January
- Upcoming events
- EESSI talk at SC21 during HPC System Testing BoF
- Computing Insight UK in December
- Waiting for decision
- Compute Canada will present at the Packaging conference (EESSI gets a mention)
- Q&A
- Resources on AWS
- Can we use those to emulate ARM/Power?
- AWS has ARM
- No POWER though; could use QEMU
- We have access to Power VMs in the US
- Alan: Could repeat the scaling tests that we did for GROMACS for the paper
- Create a big Magic Castle cluster which includes EFA fabric support for this
- Will send out a Doodle for a meeting about the next pilot version