
Azure meeting 2024 09 18

Kenneth Hoste edited this page Sep 18, 2024 · 1 revision

EESSI/Azure sync meeting 2024-09-18

Attendees

  • Laura Parry, Davide Vanzo (MS)
  • Kenneth Hoste (HPC-UGent, EESSI)
  • Alan O'Cais (CECAM, EESSI)
  • Martin Brandt (SURF)

Notes

  • current status on EESSI (end of Aug'24):
    • ~388 different software projects installed
    • ~730 software installations per CPU target (multiple software versions/configurations)
    • ~6,500 software installations in total (across all CPU targets)
    • actively looking into builds of CUDA software for NVIDIA GPUs
      • weeks not months away!
  • sponsored credits usage
    • significant increase in recent months: in Aug'24 ~3.2k EUR
    • used for:
      • bot build clusters (Slurm cluster via Magic Castle)
      • Stratum-1 mirror server for EESSI in us-east
      • GitHub runners for building container images (when needed)
  • 2 Slurm clusters for EESSI in Azure (created with Magic Castle)
    • one for building for x86_64/amd/zen4 CPU target in EESSI (AMD Genoa)
    • one for dev.eessi.io use case
  • which SKU to use to always get AMD Rome (zen2)?
    • for AMD Genoa (zen4), we use Standard_HB176-24rs_v4
    • is there an architecture mapping available somehow/somewhere?
      • yes: Standard_HB120rs_v2 for AMD Rome (zen2)
      • HBv3 is Milan
      • all HPC specific SKUs (HB,HC) guarantee a single CPU uarch
      • HC is guaranteed to be Skylake (no Icelake)
  • Sometimes seeing failures to bring up Zen4 nodes
  • Alan was looking into splitting up things more fine-grained
  • Martin: how will we proceed beyond 2024?
    • ~35k of the ~50k budget allocated for EESSI has been spent so far
    • Martin will also ask Ivar @ SURF
    • send input to Laura to transform into a compelling business case
  • Supercomputing'24
    • attended by HPCNow! (Elisabeth & co)
  • ideas/contributions for EESSI blog are welcome: https://www.eessi.io/docs/blog
  • some vague plans to do a scaling study with OpenFOAM
    • large multi-node runs, evaluate impact of providing OpenFOAM via CernVM-FS (OS jitter effect?)
    • could also run performance scaling experiments in an Azure cluster
    • using inputs available via https://exafoam.eu/benchmarks
  • next call: Mon 21 Oct 2024 16:30 CEST
    • clashes with EuroHPC User Day
    • (Laura) reschedule to Thu or Fri 24-25 Oct
    • Mon 18 Nov clashes with SC24, will also need to reschedule
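
The SKU-to-CPU mapping discussed in the notes above could be captured in a small lookup helper. The following is only a minimal sketch: the family labels (such as "HBv4"), the function names, and the regular expression for parsing SKU names are illustrative assumptions, not an official Azure API. The table holds only what was mentioned in the meeting, plus the fact that Milan is a Zen 3 part.

```python
import re

# Mapping as mentioned in the meeting notes; the keys are illustrative
# family labels (assumption), not identifiers from an Azure API.
SKU_FAMILY_TO_UARCH = {
    "HBv2": "AMD Rome (zen2)",
    "HBv3": "AMD Milan (zen3)",
    "HBv4": "AMD Genoa (zen4)",
    "HC": "Intel Skylake",
}

# Assumed SKU name shape: Standard_<series><cores>[-<constrained cores>][suffix][_v<N>]
_SKU_PATTERN = re.compile(
    r"Standard_([A-Z]+)[0-9]+(?:-[0-9]+)?[a-z]*(?:_v([0-9]+))?"
)


def sku_family(sku: str) -> str:
    """Derive a family label such as 'HBv4' from 'Standard_HB176-24rs_v4'."""
    match = _SKU_PATTERN.fullmatch(sku)
    if match is None:
        raise ValueError(f"unrecognised SKU name: {sku!r}")
    series, version = match.groups()
    return f"{series}v{version}" if version else series


def cpu_uarch(sku: str) -> str:
    """Look up the CPU microarchitecture recorded for the SKU's family."""
    family = sku_family(sku)
    if family not in SKU_FAMILY_TO_UARCH:
        raise KeyError(f"no CPU microarchitecture recorded for family {family!r}")
    return SKU_FAMILY_TO_UARCH[family]


if __name__ == "__main__":
    for name in ("Standard_HB120rs_v2", "Standard_HB176-24rs_v4"):
        print(name, "->", cpu_uarch(name))
```

Families not mentioned in the meeting raise a `KeyError` rather than returning a guess, so an unknown SKU cannot silently be assigned to the wrong CPU target.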

Notes of previous meetings
