Brainstorm meeting on software deployment & testing, Nov 24th 2021


Attendees

  • Caspar van Leeuwen (SURF)
  • Kenneth Hoste (HPC-UGent)
  • Ahmad Hesam (SURF)
  • Alan O'Cais (JSC)
  • Bob Dröge (RUG)
  • Frank Everdij (TU Delft)
  • Hugo Meiland (Microsoft)
  • Jure Pečar (EMBL)
  • Sabry Razick (Univ. of Oslo)
  • Jörg Sassmannshausen (GSTT-NHS, UK)
  • Sébastien Moretti (SIB, Lausanne)
  • Terje Kvernes (Univ. of Oslo)
  • Victor Holanda (CSCS)
  • Thomas Röblitz (Univ. of Bern)
  • Guilherme Amadio (CERN, Gentoo)

Notes

slides available at https://github.com/EESSI/meetings/blob/main/meetings/EESSI_brainstorm_software_deployment_testing_2021-11-24.pdf

Software tests

  • writing our own tests vs. using the test suite that comes with the software

    • test suite is often not reliable enough (see PyTorch)
    • we aim for one good test (or a handful) to assess whether the installation works
      • the goal is not to cover all possible application features
  • deployment pipeline

    • testing on build node
      • mostly smoke tests (certainly single-node)
        • essentially: rerun EasyBuild sanity checks (eb --sanity-check-only)
        • this should boil down to a single parameterized test implementation in ReFrame! (a sketch follows after this list)
          • ReFrame can talk to EasyBuild as a library (from easybuild import ...)
        • do we try to get smoke tests contributed to EasyBuild, so we can just leverage them in EESSI?
        • where's the line for tests done during sanity check in EasyBuild?
          • 10min is rather long, but there are examples in that range (OpenFOAM, WRF)
          • most commands run during sanity check are things like --help, --version, python -c 'import example'
      • also small application tests
        • rule of thumb: single-node, runs in < 10min
      • launch with a couple of different containers (different OSes)
        • build/install in CentOS, test in Ubuntu container
      • test logs should be available publicly so people can debug/fix problems
      • can we give contributors access to VMs to debug problems we run into?
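
The "single parameterized test" idea above could look roughly as follows in ReFrame; this is a minimal sketch, not a settled design: the easyconfig names, the build tag, and the expected success marker in the EasyBuild output are all assumptions.

```python
# Hypothetical sketch: one parameterized ReFrame test that re-runs the
# EasyBuild sanity check (eb --sanity-check-only) for each installation;
# the easyconfig names below are made-up examples.
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class EasyBuildSanityCheck(rfm.RunOnlyRegressionTest):
    # one generated test per easyconfig (hypothetical list)
    easyconfig = parameter(['GROMACS-2021-foss-2021a.eb', 'R-4.1.0-foss-2021a.eb'])
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'eb'
    tags = {'build'}  # meant to run on the build node only

    @run_after('init')
    def set_options(self):
        # --sanity-check-only skips the fetch/configure/build/install steps
        self.executable_opts = [self.easyconfig, '--sanity-check-only']

    @sanity_function
    def check_passed(self):
        # assumed success marker in the EasyBuild output
        return sn.assert_found(r'COMPLETED: Installation ended successfully',
                               self.stdout)
```

A tighter integration could use EasyBuild's Python API instead of shelling out to eb, as noted above ("ReFrame can talk to EasyBuild as a library").
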
  • policies

    • do we make providing a test a strict requirement?
      • we can only assess whether the installation works if there's a (good) test
      • where do we draw the line (user-facing apps vs dependencies, libraries, build tools, ...)
    • overview of supported architectures and software
      • what works where, and what not (+ pointer to issue with details on the problem)
  • tagging of tests

    • tags are used to filter which tests should be run (see the sketch after this list)
    • at CSCS: migrating to daily/weekly/monthly
    • different groups of tags:
      • build: tests meant to run on the build node
      • daily/weekly/monthly
      • gpu-nvidia, gpu-amd, multi-node, single-node, single-core
      • CPU family: arch:x86_64, arch:aarch64, arch:ppc64le
        • what does it actually mean to tag a test with x86_64?
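
Combining one tag from each group on a test might look like this; a minimal sketch where the test body is just a placeholder and the tag names follow the groups above:

```python
# Sketch: combining tags from the different groups on a placeholder test.
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class TaggedPlaceholderTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'true'  # placeholder command that always succeeds
    # one tag per group: schedule, topology, CPU family
    tags = {'daily', 'single-node', 'arch:x86_64'}

    @sanity_function
    def always_ok(self):
        return sn.assert_true(True)
```

Running reframe -t daily -t 'arch:x86_64' -r would then select only tests carrying both tags (multiple -t options are combined with AND).
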
  • How can systems contribute to testing?

    • We develop a GitHub App that can run on a login node/VM
      • waits for events, e.g. approved PRs
    • We have a login node in the cloud
      • CitC setup for aarch64
  • GitHub App to react to "events" (pull requests, etc.); a minimal sketch follows below
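
A minimal sketch of such an event handler, assuming a Flask-based webhook endpoint; the submit_build_job helper is hypothetical, and webhook signature verification is omitted for brevity:

```python
# Hypothetical sketch: a webhook endpoint for a GitHub App running on a
# login node/VM, reacting to approved pull request reviews.
from flask import Flask, request

app = Flask(__name__)


def submit_build_job(pr):
    """Placeholder: submit a build/test job for this PR to the local cluster."""
    print(f"would submit a job for PR #{pr['number']}")


@app.route('/', methods=['POST'])
def handle_event():
    event = request.headers.get('X-GitHub-Event')
    payload = request.get_json()
    # 'pull_request_review' events with an approved review trigger a build
    if event == 'pull_request_review':
        if payload.get('review', {}).get('state') == 'approved':
            submit_build_job(payload['pull_request'])
    return '', 204


if __name__ == '__main__':
    app.run(port=3000)
```
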

  • TODO before hackathon

    • set up resources
      • CitC cluster in AWS
      • Magic Castle cluster in Azure
      • CPU-only Slurm VM cluster at CSCS (Victor)
      • GPU nodes in AWS/Azure
    • break down the work into smaller tasks
      • implementing tests: EB sanity check, GROMACS, TensorFlow, OpenFOAM, R, compat layer
      • define tags
      • GitHub Apps for build node, running tests
      • create issues for these tasks
    • flesh out template code for GitHub Apps (Kenneth, Bob)
      • empty Python functions to fill in
    • prepare a good example/template test for GROMACS (Caspar, Victor); a rough sketch is included at the end of these notes
    • prepare a presentation on how to write portable tests (Victor or Vasileios)
    • who works on what during EESSI hackathon:
      • tests: Victor (Jan'22), Caspar, Hugo (larger tests), Frank (Jan'22)
      • GitHub Apps: Kenneth, Bob
      • CitC on Azure: Hugo
      • Jure: archiving of installations into container, etc.
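
As a possible starting point for the GROMACS example/template test mentioned above, a rough portable ReFrame sketch; the module name, input file (benchmark.tpr), task count, and output pattern are assumptions:

```python
# Hypothetical sketch of a portable GROMACS test.
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class GROMACSPortableTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    modules = ['GROMACS']  # assumed module name
    executable = 'gmx_mpi'
    # assumed input file; -maxh caps the runtime to stay within the
    # "small application test" rule of thumb
    executable_opts = ['mdrun', '-s', 'benchmark.tpr', '-maxh', '0.1']
    num_tasks = 4
    tags = {'daily', 'single-node'}

    @sanity_function
    def simulation_finished(self):
        # GROMACS prints a performance summary on successful completion
        return sn.assert_found(r'Performance:', self.stderr)
```
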
