Brainstorm meeting software deployment Nov 24th 2021

Kenneth Hoste edited this page Nov 26, 2021

Attendees:
- Caspar van Leeuwen (SURF)
- Kenneth Hoste (HPC-UGent)
- Ahmad Hesam (SURF)
- Alan O'Cais (JSC)
- Bob Dröge (RUG)
- Frank Everdij (TU Delft)
- Hugo Meiland (Microsoft)
- Jure Pečar (EMBL)
- Sabry Razick (Univ. of Oslo)
- Jörg Sassmannshausen (GSTT-NHS, UK)
- Sébastien Moretti (SIB, Lausanne)
- Terje Kvernes (Univ. of Oslo)
- Victor Holanda (CSCS)
- Thomas Röblitz (Univ. of Bern)
- Guilherme Amadio (CERN, Gentoo)
Slides available at https://github.com/EESSI/meetings/blob/main/meetings/EESSI_brainstorm_software_deployment_testing_2021-11-24.pdf
- writing our own tests vs the test suite that comes with the software
  - test suite is often not reliable enough (see PyTorch)
  - we aim for a good test (or a handful) to assess whether the installation works
    - not all possible application features
- deployment pipeline
  - testing on build node
    - mostly smoke tests (certainly single-node)
      - essentially: rerun EasyBuild sanity checks (`eb --sanity-check-only`) - this should boil down to a single parameterized test implementation in ReFrame!
        - ReFrame can talk to EasyBuild as a library (`from easybuild import ...`)
        - do we try and get smoke tests contributed to EasyBuild, so we can just leverage them in EESSI?
        - where's the line for tests done during the sanity check in EasyBuild?
          - 10 min is rather long, but there are examples in that range (OpenFOAM, WRF)
          - most commands run during the sanity check are things like `--help`, `--version`, `python -c 'import example'`
      - also small application tests
        - rule of thumb: single-node, runs in < 10 min
      - launch with a couple of different containers (different OSes)
        - build/install in CentOS, test in an Ubuntu container
      - test logs should be available publicly so people can debug/fix problems
        - can we give contributors access to VMs to debug problems we run into?
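The smoke-test idea above (short `--help`/`--version`/import-style checks, bounded by the 10-minute rule of thumb) could be driven by a small runner. A minimal stand-alone sketch — the command list and the `run_smoke_test` helper are illustrative, not part of any actual EESSI tooling; the Python interpreter stands in for a real application:

```python
import subprocess
import sys


def run_smoke_test(cmd, timeout=600):
    """Run a single smoke-test command; return True if it exits cleanly.

    The default 600 s timeout mirrors the "runs in < 10 min" rule of thumb
    from the notes.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Typical sanity-check style commands; here the interpreter itself is a
# stand-in for an installed application (hypothetical examples).
smoke_tests = [
    [sys.executable, "--version"],
    [sys.executable, "-c", "import math"],  # stand-in for "import example"
]

if __name__ == "__main__":
    for cmd in smoke_tests:
        print(" ".join(cmd), "->", "OK" if run_smoke_test(cmd) else "FAILED")
```

Keeping each check as one short external command is what makes it feasible to rerun the same tests inside different OS containers, as discussed above.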
- policies
  - do we make providing a test a strict requirement?
    - we can only assess whether the installation works if there's a (good) test
    - where do we draw the line (user-facing apps vs dependencies, libraries, build tools, ...)?
  - overview of supported architectures and software
    - what works where, and what doesn't (+ pointer to issue with details on the problem)
- tagging of tests
  - used to filter the tests: what should be run
    - at CSCS: migrating to `daily`/`weekly`/`monthly`
  - different groups of tags:
    - `build` to run on build node
    - `daily`/`weekly`/`monthly`
    - `gpu-nvidia`, `gpu-amd`, `multi-node`, `single-node`, `single-core`
    - CPU family: `arch:x86_64`, `arch:aarch64`, `arch:ppc64le`
      - what does it actually mean to tag a test with `x86_64`?
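Filtering by tag groups like these boils down to set intersection: a test runs when its tag set covers everything the invocation asks for. A hypothetical sketch — the registry contents and the `select_tests` helper are made up for illustration; only the tag names come from the notes:

```python
def select_tests(tests, required_tags):
    """Return names of tests whose tag set contains all required tags.

    `tests` maps a test name to its set of tags, drawn from groups like
    build, daily/weekly/monthly, gpu-*, single-/multi-node, arch:*.
    """
    required = set(required_tags)
    return [name for name, tags in tests.items() if required <= tags]


# Hypothetical test registry using the tag groups from the notes.
tests = {
    "gromacs_small": {"daily", "single-node", "arch:x86_64"},
    "tensorflow_gpu": {"weekly", "gpu-nvidia", "multi-node", "arch:x86_64"},
    "compat_layer": {"build", "single-core", "arch:aarch64"},
}

if __name__ == "__main__":
    # e.g. everything that should run daily on an x86_64 system
    print(select_tests(tests, ["daily", "arch:x86_64"]))
```

Note this sidesteps the open question above: under this scheme, tagging a test `arch:x86_64` only means "eligible to run where x86_64 is requested", nothing more.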
- How can systems contribute to testing?
  - We develop a GitHub App that can run on a login node/VM
    - Waits for events, e.g. approved PRs
  - We have a login node in the cloud
    - CitC setup for aarch64
- GitHub App to react to "events" (pull requests, etc.)
  - based on @boegel's work for the `boegelbot` test bot for EasyBuild - https://github.com/boegel/boegelbot/blob/main/app/app.py
  - separate apps for:
    - build node
    - running ReFrame
  - break down into smaller tests
    - run EasyBuild to install
    - create tarball
    - run ReFrame
    - stage tarball
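The event handling and the smaller pipeline steps listed above could be sketched roughly as follows. This is an assumed structure, not the actual app's code: the function names and the dry-run command strings are placeholders (though `pull_request_review` with `review.state == "approved"` is the real GitHub webhook shape for PR approvals, and `eb --from-pr` is a real EasyBuild option):

```python
def handle_event(event_type, payload):
    """Dispatch an incoming GitHub webhook event to the right handler."""
    handlers = {
        "pull_request_review": handle_pull_request_review,
    }
    handler = handlers.get(event_type)
    if handler is None:
        return "ignored"
    return handler(payload)


def handle_pull_request_review(payload):
    """React only to approved pull requests, as discussed in the notes."""
    if payload.get("review", {}).get("state") != "approved":
        return "ignored"
    return build_pipeline(payload["pull_request"]["number"])


def build_pipeline(pr_number):
    """The smaller steps from the notes, as a dry-run command plan
    (placeholder paths/hosts, not actual EESSI infrastructure)."""
    return [
        f"eb --from-pr {pr_number} --robot",            # run EasyBuild to install
        "tar czfv installation.tar.gz software/",       # create tarball
        "reframe -r",                                   # run ReFrame
        "scp installation.tar.gz stratum0:/incoming/",  # stage tarball
    ]
```

Returning a command plan instead of executing it directly keeps each step small and independently testable, which matches the "break down into smaller tests" point above.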
- TODO before hackathon
  - set up resources
    - CitC cluster in AWS
    - Magic Castle cluster in Azure
    - CPU-only Slurm VM cluster at CSCS (Victor)
    - GPU nodes in AWS/Azure
  - break down into smaller tasks
    - implementing tests: EB sanity check, GROMACS, TensorFlow, OpenFOAM, R, compat layer
    - define tags
    - GitHub Apps for build node, running tests
    - create issues for these tasks
  - flesh out template code for GitHub Apps (Kenneth, Bob)
    - empty Python functions to fill in
  - prepare a good example/template test for GROMACS (Caspar, Victor)
  - prepare a presentation on how to write portable tests (Victor or Vasileios)
  - who works on what during the EESSI hackathon:
    - tests: Victor (Jan '22), Caspar, Hugo (larger tests), Frank (Jan '22)
    - GitHub Apps: Kenneth, Bob
    - CitC on Azure: Hugo
    - Jure: archiving of installations into container, etc.
  - hosting of input files for tests in CernVM-FS?
  - testing for compat layer:
    - Jörg's repository with containers for software installations
    - ReFrame test suite