Azure meeting Nov 19 2021

Kenneth Hoste edited this page Nov 19, 2021 · 1 revision

EESSI/Azure/SURF sync meeting 2021-11-19

Agenda

Attendees

  • Bob Dröge (RUG)
  • Kenneth Hoste (HPC-UGent)
  • Hugo Meiland (MS Azure)
  • Davide Vanzo (MS Azure)
  • Martin Brandt (SURF)
  • Alan O'Cais (JSC)

Notes

Credits

  • close to 1,000 euro worth of credits spent so far (Sept - mid Nov'21)
    • Stratum-1: ~42%
    • GitHub Action runners: ~40%
      • were shut down recently by Bob
        • Martin: seems like they're still accumulating some cost (only stopped, not destroyed or deallocated)
        • Bob: would be nice to still keep them around to spin up easily
          • => deallocating is the best option, only a small cost for the detached disk
      • mostly used to build containers via QEMU for aarch64/ppc64le
    • Zen3 build node: ~13%
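
As a rough illustration of what these shares mean in absolute terms, the percentages above can be turned into approximate euro amounts. This is only a sketch: the 1,000 euro total is the upper bound quoted above ("close to 1,000 euro"), the shares are the approximate values from the notes, and the "other (not itemized)" entry is simply the remainder, not a figure from the meeting.

```python
# Approximate cost breakdown, based on the rounded figures in the notes.
# TOTAL_EUR is an upper bound ("close to 1,000 euro"), not an exact invoice.
TOTAL_EUR = 1000

# Shares in percent, as quoted in the notes (all approximate).
SHARES_PCT = {
    "Stratum-1": 42,
    "GitHub Action runners": 40,
    "Zen3 build node": 13,
}


def breakdown(total_eur, shares_pct):
    """Return approximate cost per item, plus whatever is not itemized."""
    costs = {name: total_eur * pct // 100 for name, pct in shares_pct.items()}
    # The quoted shares do not add up to 100%; the remainder is kept
    # under a separate, hypothetical label.
    costs["other (not itemized)"] = total_eur - sum(costs.values())
    return costs


if __name__ == "__main__":
    for name, eur in breakdown(TOTAL_EUR, SHARES_PCT).items():
        print(f"{name}: ~{eur} euro")
```

The three listed items account for about 95% of the spend, so roughly 5% is not itemized in the notes.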

AMD zen3 software stack

  • installation for zen3 is in place and part of the 2021.06 pilot
  • would be interesting to look into Milan-X, and maybe build a separate stack
    • Hugo: in current preview, can only provide access by pinning ALL instances in that subscription to Milan-X in specific region
    • other option could be to give SSH access to a VM spun up by Hugo/Davide
    • could become separate task for Dec'21 hackathon

Plans for 2021.12 pilot

  • One of the main points is to support GPUs, in particular NVIDIA
    • AMD support requires some more work in EasyBuild
    • Will use both AWS and Azure for access to GPU nodes
  • Better support for and documentation on building your own software on top of EESSI
  • Can also be used for the hackathon

Hackathon

  • https://hackmd.io/L763hQgRS5Shn04rAbVmWA
  • Split into two weeks, one in Dec and one in Jan
  • Three sessions per week:
    • kickoff on Monday morning
    • sync meeting on Wednesday morning
    • show+tell on Friday afternoon
  • People can pick what they would like to work on
  • AWS credits are about to expire, so it's a good opportunity to put them to good use
  • Hugo: make sure to get your quota up in AWS/Azure, especially for GPU nodes

CitC

  • good progress on adding support for Azure in CitC
  • could be another task for the hackathon?

Stratum-1 network traffic

  • definitely within limits
  • total 2GB received, 10GB sent (Oct 20-today)
  • probably mostly due to GitHub runners + Hugo's testing in Azure
  • Davide: setting up squids in the future?
    • maybe based on monitoring stats
    • spin up squids on demand based on load

Including EESSI in default Azure HPC images

  • migration to Lmod is under way
  • next up is to include CernVM-FS
  • work on a variable symlink to pick up a specific subtree of EESSI would be helpful here too
  • eessi-init command to more easily get started with EESSI
    • which just sources the init script
    • options to pick up a specific CPU arch
    • could be an RPM/.deb package, Python package (to let users install it themselves)
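
The eessi-init idea could be sketched roughly as follows, in Python since a Python package was one of the packaging options mentioned. This is not an existing tool, only a minimal illustration under stated assumptions: the function name `eessi_init_commands` is hypothetical, the repository path `/cvmfs/pilot.eessi-hpc.org` and the `init/bash` script location are assumed from the pilot setup, and `EESSI_SOFTWARE_SUBDIR_OVERRIDE` is assumed to be the variable the init script reads to select a CPU architecture. The function only generates the shell commands; a real command would still need a way to have the user's shell evaluate them.

```python
import shlex

# Assumption: location of the EESSI pilot repository on CernVM-FS.
CVMFS_REPO = "/cvmfs/pilot.eessi-hpc.org"


def eessi_init_commands(version="latest", cpu_arch=None):
    """Return the shell commands that would initialise an EESSI environment.

    version:  pilot version directory (assumed layout: <repo>/<version>/init/bash)
    cpu_arch: optional software subdirectory, e.g. "x86_64/amd/zen3"
              (assumed to be picked up via EESSI_SOFTWARE_SUBDIR_OVERRIDE)
    """
    lines = []
    if cpu_arch is not None:
        # Option to pick a specific CPU architecture instead of auto-detection.
        lines.append(
            "export EESSI_SOFTWARE_SUBDIR_OVERRIDE=" + shlex.quote(cpu_arch)
        )
    init_script = f"{CVMFS_REPO}/{version}/init/bash"
    # The command itself "just sources the init script".
    lines.append("source " + shlex.quote(init_script))
    return "\n".join(lines)


if __name__ == "__main__":
    print(eessi_init_commands(cpu_arch="x86_64/amd/zen3"))
```

Because a child process cannot change the environment of the calling shell, a wrapper along these lines would typically print the commands for the shell to evaluate, or be shipped as a shell function instead.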

Next meeting

  • overlaps with EESSI hackathon
  • let's give priority to that and catch up via other channels if needed
  • next EESSI/Azure/SURF meeting in Jan'22