Updated CI driving scripts. #93

Merged
merged 23 commits into from
Dec 6, 2023
23 commits
35815e8
Updated CI driving scripts. Take config out of the scripts that drive…
Oct 9, 2023
4d207a0
Altered README instructions since cronjobs don't read
Oct 9, 2023
fb7ec70
Make the version of the EESSI test suite to checkout configurable
Oct 9, 2023
032354a
Correct error in printing TEMPDIR
Oct 9, 2023
d8ab990
Added description for EESSI_CI_TESTSUITE_VERSION config item
Oct 9, 2023
b179d2d
Fixed environment variable name printed by error
Oct 10, 2023
ba9e969
We cannot just select a different version in CI, it HAS to match what…
Oct 10, 2023
09193a3
Make version of EESSI that is used by the configs configurable at run…
Oct 10, 2023
30cb4fe
Corrected typo
Oct 12, 2023
e128508
Added aws_mc configuration for ReFrame and CI
Oct 19, 2023
7717d77
Updated cluster name
Oct 20, 2023
f12cba8
Added a common EESSI init function, that returns the path to be sourc…
Oct 20, 2023
9a0fb3d
Pip upgrade pip, to resolve https://github.com/EESSI/test-suite/issue…
Oct 20, 2023
03c45ef
Define defaults in the script, since a lot of it overlaps between co…
Oct 20, 2023
187522f
Updated config files to use common initialization of the EESSI enviro…
Oct 20, 2023
827381c
Export ReFrame variables, otherwise they are not correctly picked up…
Oct 20, 2023
49b401c
forgot to remove this part, it is replaced by common_eessi_init()
Oct 20, 2023
c6b3ebc
Added Snellius CI cnofig
Nov 1, 2023
56285b7
Updated to newer reframe, for better autodetect support
Nov 10, 2023
2a5303e
Added Karolina ReFrame config, and CI config
casparvl Nov 22, 2023
15d6123
Fixing the same catch-22 as we had with aws_mc, the config first need…
casparvl Nov 23, 2023
235f580
Updated TAGS to REFRAME_ARGS, as this was also changed in run_reframe.sh
casparvl Nov 27, 2023
a6ee21f
Updated node names for Karolina
casparvl Nov 28, 2023
51 changes: 51 additions & 0 deletions CI/README.md
@@ -0,0 +1,51 @@
# Setting up EESSI test suite CI

To set up regular runs for the EESSI test suite on a system, four things are needed:

1. The variable `EESSI_CI_SYSTEM_NAME` needs to be set in the environment
2. A local checkout of the `CI` subdirectory of the EESSI test suite repository needs to be present
3. The EESSI test suite repository needs to contain a file `CI/${EESSI_CI_SYSTEM_NAME}/ci_config.sh` with the configuration for the CI on that system
4. The `run_reframe_wrapper.sh` script needs to be added to your `crontab`

## Checking out the CI folder from the EESSI test-suite
You can clone the full EESSI test suite
```
git clone https://github.com/EESSI/test-suite.git
```
Or do a sparse checkout
```
git clone -n --depth=1 --filter=tree:0 https://github.com/EESSI/test-suite.git
cd test-suite
git sparse-checkout set --no-cone CI
git checkout
```
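With the checkout in place, a short preflight check can catch a missing variable or config file before the cronjob runs into it. The sketch below is a hypothetical helper (`check_ci_prereqs` and its return codes are not part of this PR); it only assumes the layout described above, with `run_reframe_wrapper.sh` and `${EESSI_CI_SYSTEM_NAME}/ci_config.sh` inside the `CI` directory.

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR: verifies the prerequisites listed above.
# Argument: path to the local checkout of the CI directory.
check_ci_prereqs() {
    local ci_dir="$1"
    # 1. The system name must be set in the environment
    if [ -z "${EESSI_CI_SYSTEM_NAME:-}" ]; then
        echo "EESSI_CI_SYSTEM_NAME is not set" >&2
        return 1
    fi
    # 2./3. The checkout must contain a ci_config.sh for this system
    if [ ! -f "${ci_dir}/${EESSI_CI_SYSTEM_NAME}/ci_config.sh" ]; then
        echo "Missing ${ci_dir}/${EESSI_CI_SYSTEM_NAME}/ci_config.sh" >&2
        return 2
    fi
    # 4. The wrapper that the crontab entry will call must be present
    if [ ! -f "${ci_dir}/run_reframe_wrapper.sh" ]; then
        echo "Missing ${ci_dir}/run_reframe_wrapper.sh" >&2
        return 3
    fi
    return 0
}
```

For example, `EESSI_CI_SYSTEM_NAME=aws_mc check_ci_prereqs "${HOME}/test-suite/CI"` returns 0 only when all three checks pass.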

## Creating a CI configuration file
If you are adding CI on a new system, first, pick a name for that system (we'll refer to this as `EESSI_CI_SYSTEM_NAME`). The CI config should then be in `CI/${EESSI_CI_SYSTEM_NAME}/ci_config.sh`. You can use the example in `CI/aws_mc/ci_config.sh`, and adapt it to your needs.
It should define:
- `TEMPDIR` (optional): the temporary directory in which the CI pipeline can check out repositories and install ReFrame. Default: `$(mktemp --directory --tmpdir=/tmp -t rfm.XXXXXXXXXX)`.
- `REFRAME_ARGS` (optional): additional arguments to pass to the `reframe` command. Typically, you'll use this to specify `--tag` arguments to run a subset of tests. Default: `"--tag CI --tag 1_node"`.
- `REFRAME_VERSION` (mandatory): the version of ReFrame you'd like to use to drive the EESSI test suite in the CI pipeline.
- `REFRAME_URL` (optional): the URL that will be used to `git clone` the ReFrame repository (in order to provide the `hpctestlib`). Typically this points to the official repository, but you may want to use another URL from a fork for development purposes. Default: `https://github.com/reframe-hpc/reframe.git`.
- `REFRAME_BRANCH` (optional): the branch name to be cloned for the ReFrame repository (in order to provide the `hpctestlib`). Typically this points to the branch corresponding with `${REFRAME_VERSION}`, unless you want to run from a feature branch for development purposes. Default: `v${REFRAME_VERSION}`.
- `EESSI_VERSION` (optional): the version of the EESSI software stack you would like to be loaded & tested in the CI pipeline. Default: `latest`.
- `EESSI_TESTSUITE_URL` (optional): the URL that will be used to `git clone` the `EESSI/test-suite` repository. Typically this points to the official repository, but you may want to use another URL from a fork for development purposes. Default: `https://github.com/EESSI/test-suite.git`.
- `EESSI_TESTSUITE_BRANCH` (optional): the branch or tag of the EESSI test-suite repository you want to use in the CI pipeline. Default: `v0.1.0`.
- `RFM_CONFIG_FILES` (optional): the location of the ReFrame configuration file to be used for this system. Default: `${TEMPDIR}/test-suite/config/${EESSI_CI_SYSTEM_NAME}.py`.
- `RFM_CHECK_SEARCH_PATH` (optional): the search path where ReFrame should search for tests to run in this CI pipeline. Default: `${TEMPDIR}/test-suite/eessi/testsuite/tests/`.
- `RFM_CHECK_SEARCH_RECURSIVE` (optional): whether ReFrame should search `RFM_CHECK_SEARCH_PATH` recursively. Default: `1`.
- `RFM_PREFIX` (optional): the prefix in which ReFrame stores all the files. Default: `${HOME}/reframe_CI_runs`.
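
As an illustration of the items above, a minimal `ci_config.sh` could look like the sketch below. The system name `my_cluster` and the `RFM_PREFIX` value are made up for this example; the other values mirror the configs added in this PR. Since `run_reframe.sh` only exports the `RFM_*` variables when it falls back to its own defaults, the sketch exports `RFM_PREFIX` itself.

```shell
# CI/my_cluster/ci_config.sh (hypothetical system name, illustrative values)
REFRAME_ARGS="--tag CI --tag 1_node"
REFRAME_VERSION=4.4.1   # ReFrame version that will be pip-installed
EESSI_VERSION=latest    # same value run_reframe.sh would fall back to
# Exported explicitly, because run_reframe.sh only exports RFM_* variables
# for which it has to apply its own default.
export RFM_PREFIX="${HOME}/reframe_CI_runs_my_cluster"
```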

## Creating the `crontab` entry and specifying `EESSI_CI_SYSTEM_NAME`
The exact `crontab` line depends on how often you want to run the tests and on where `run_reframe_wrapper.sh` is located. We also define `EESSI_CI_SYSTEM_NAME` in this entry, as cronjobs don't normally read your `.bashrc` (and thus we need a different way of specifying this environment variable).
Assuming you checked out the EESSI test suite repository in your home dir:
```
echo "0 0 * * SUN EESSI_CI_SYSTEM_NAME=aws_citc ${HOME}/test-suite/CI/run_reframe_wrapper.sh" | crontab -
```
This creates a cronjob that runs weekly, at midnight on Sundays. Note that piping into `crontab -` replaces your entire existing crontab. See the crontab manual for other schedules.
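
If the account already has other cronjobs, the new entry can be appended to the existing crontab instead. A minimal sketch, assuming a hypothetical helper named `make_cron_entry` (not part of this PR) that only formats the line:

```shell
#!/bin/bash
# Hypothetical helper: prints a crontab line that sets EESSI_CI_SYSTEM_NAME
# and calls the wrapper script.
# Arguments: <schedule> <system name> <path to run_reframe_wrapper.sh>
make_cron_entry() {
    local schedule="$1" system_name="$2" wrapper="$3"
    printf '%s EESSI_CI_SYSTEM_NAME=%s %s\n' "${schedule}" "${system_name}" "${wrapper}"
}

# Usage (commented out so that pasting this block does not modify your crontab):
# (crontab -l 2>/dev/null; make_cron_entry "0 0 * * SUN" aws_citc "${HOME}/test-suite/CI/run_reframe_wrapper.sh") | crontab -
```

The subshell first prints the current crontab (if any) and then the new line, so the combined output becomes the new crontab.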

## Output of the CI pipeline
The whole point of the `run_reframe_wrapper.sh` script is to easily get the stdout and stderr from your `run_reframe.sh` in a time-stamped logfile. By default, these are stored in `${HOME}/EESSI_CI_LOGS`. This can be changed by setting the environment variable `EESSI_CI_LOGDIR`. Again, you'd have to set this when creating your `crontab` file, e.g.
```
echo "0 0 * * SUN EESSI_CI_SYSTEM_NAME=aws_citc EESSI_CI_LOGDIR=${HOME}/my_custom_logdir ${HOME}/test-suite/CI/run_reframe_wrapper.sh" | crontab -
```
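To inspect the most recent run, the newest logfile can be looked up by name: the wrapper names logs `rfm_<YYYYMMDD>_<HHMMSS>.log`, so sorting the names alphabetically also sorts them by time. The function below is a sketch with a hypothetical name (`latest_ci_log`), not something this PR provides.

```shell
#!/bin/bash
# Hypothetical helper: prints the path of the newest CI logfile.
# Argument (optional): log directory; defaults to EESSI_CI_LOGDIR,
# then to ${HOME}/EESSI_CI_LOGS, as the wrapper does.
latest_ci_log() {
    local logdir="${1:-${EESSI_CI_LOGDIR:-${HOME}/EESSI_CI_LOGS}}"
    local newest="" f
    # Glob results are sorted, so the last match has the latest datestamp
    for f in "${logdir}"/rfm_*.log; do
        [ -e "${f}" ] || continue   # skip the unexpanded pattern if nothing matches
        newest="${f}"
    done
    if [ -z "${newest}" ]; then
        return 1
    fi
    printf '%s\n' "${newest}"
}
```

For example, `less "$(latest_ci_log)"` opens the log of the last run.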
3 changes: 3 additions & 0 deletions CI/aws_citc/ci_config.sh
@@ -0,0 +1,3 @@
# Configurable items
REFRAME_ARGS="--tag CI --tag 1_node|2_nodes"
REFRAME_VERSION=4.4.1 # ReFrame version that will be pip-installed to drive the test suite
7 changes: 7 additions & 0 deletions CI/aws_mc/ci_config.sh
@@ -0,0 +1,7 @@
# Configurable items
REFRAME_ARGS="--tag CI --tag 1_node|2_nodes"
Review comment (Collaborator): There are not enough resources on aws_mc to do the 2_nodes tests.
REFRAME_VERSION=4.4.1 # ReFrame version that will be pip-installed to drive the test suite
# Latest release does not contain the `aws_mc.py` ReFrame config yet
# The custom EESSI_TESTSUITE_URL and EESSI_TESTSUITE_BRANCH can be removed in a follow-up PR
EESSI_TESTSUITE_URL='https://github.com/casparvl/test-suite.git'
EESSI_TESTSUITE_BRANCH='CI'
7 changes: 7 additions & 0 deletions CI/it4i_karolina/ci_config.sh
@@ -0,0 +1,7 @@
# Configurable items
REFRAME_ARGS="--tag CI --tag 1_node|2_nodes"
REFRAME_VERSION=4.4.1 # ReFrame version that will be pip-installed to drive the test suite
# Latest release does not contain the `it4i_karolina.py` ReFrame config yet
# The custom EESSI_TESTSUITE_URL and EESSI_TESTSUITE_BRANCH can be removed in a follow-up PR
EESSI_TESTSUITE_URL='https://github.com/casparvl/test-suite.git'
EESSI_TESTSUITE_BRANCH='CI'
3 changes: 3 additions & 0 deletions CI/izum_vega/ci_config.sh
@@ -0,0 +1,3 @@
# Configurable items
REFRAME_ARGS="--tag CI --tag 1_node|2_nodes"
REFRAME_VERSION=4.4.1 # ReFrame version that will be pip-installed to drive the test suite
117 changes: 117 additions & 0 deletions CI/run_reframe.sh
@@ -0,0 +1,117 @@
#!/bin/bash
# Author: Caspar van Leeuwen
# Description: This script can be used to do regular runs of the ReFrame test suite, e.g. from a cronjob.
# Setup instructions: make sure you have your github access key configured in your .ssh/config
# i.e. configure an entry with HostName github.com and IdentityFile pointing to the ssh key registered with Github

# Get directory of the current script
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Check if EESSI_CI_SYSTEM_NAME is defined
if [ -z "${EESSI_CI_SYSTEM_NAME}" ]; then
    echo "You have to define the EESSI_CI_SYSTEM_NAME environment variable in order to run the EESSI test suite CI" >&2
    exit 1
fi

# Check if the CI_CONFIG file exists
CI_CONFIG="${SCRIPT_DIR}/${EESSI_CI_SYSTEM_NAME}/ci_config.sh"
if [ ! -f "${CI_CONFIG}" ]; then
    echo "File ${CI_CONFIG} does not exist. Please check your EESSI_CI_SYSTEM_NAME (${EESSI_CI_SYSTEM_NAME}) and make sure the directory in which the current script resides (${SCRIPT_DIR}) contains a subdirectory with that name, and a CI configuration file (ci_config.sh) inside." >&2
    exit 1
fi

# Set the CI configuration for this system
source "${CI_CONFIG}"

# Set default configuration
if [ -z "${TEMPDIR}" ]; then
    TEMPDIR=$(mktemp --directory --tmpdir=/tmp -t rfm.XXXXXXXXXX)
fi
if [ -z "${REFRAME_ARGS}" ]; then
    REFRAME_ARGS="--tag CI --tag 1_node"
fi
if [ -z "${REFRAME_URL}" ]; then
    REFRAME_URL='https://github.com/reframe-hpc/reframe.git'
fi
if [ -z "${REFRAME_BRANCH}" ]; then
    REFRAME_BRANCH="v${REFRAME_VERSION}"
fi
if [ -z "${EESSI_TESTSUITE_URL}" ]; then
    EESSI_TESTSUITE_URL='https://github.com/EESSI/test-suite.git'
fi
if [ -z "${EESSI_TESTSUITE_BRANCH}" ]; then
    EESSI_TESTSUITE_BRANCH='v0.1.0'
fi
if [ -z "${EESSI_VERSION}" ]; then
    EESSI_VERSION='latest'
fi
if [ -z "${RFM_CONFIG_FILES}" ]; then
    export RFM_CONFIG_FILES="${TEMPDIR}/test-suite/config/${EESSI_CI_SYSTEM_NAME}.py"
fi
if [ -z "${RFM_CHECK_SEARCH_PATH}" ]; then
    export RFM_CHECK_SEARCH_PATH="${TEMPDIR}/test-suite/eessi/testsuite/tests/"
fi
if [ -z "${RFM_CHECK_SEARCH_RECURSIVE}" ]; then
    export RFM_CHECK_SEARCH_RECURSIVE=1
fi
if [ -z "${RFM_PREFIX}" ]; then
    export RFM_PREFIX="${HOME}/reframe_CI_runs"
fi

# Create virtualenv for ReFrame using system python
python3 -m venv "${TEMPDIR}"/reframe_venv
source "${TEMPDIR}"/reframe_venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install reframe-hpc=="${REFRAME_VERSION}"

# Clone reframe repo to have the hpctestlib:
git clone "${REFRAME_URL}" --branch "${REFRAME_BRANCH}" "${TEMPDIR}"/reframe
export PYTHONPATH="${PYTHONPATH}":"${TEMPDIR}"/reframe

# Clone test suite repo
git clone "${EESSI_TESTSUITE_URL}" --branch "${EESSI_TESTSUITE_BRANCH}" "${TEMPDIR}"/test-suite
export PYTHONPATH="${PYTHONPATH}":"${TEMPDIR}"/test-suite/

# Start the EESSI environment
unset MODULEPATH
if [ "${EESSI_VERSION}" = 'latest' ]; then
    eessi_init_path=/cvmfs/pilot.eessi-hpc.org/latest/init/bash
else
    eessi_init_path=/cvmfs/pilot.eessi-hpc.org/versions/"${EESSI_VERSION}"/init/bash
fi
source "${eessi_init_path}"

# Needed in order to make sure the reframe from our TEMPDIR is first on the PATH,
# prior to the one shipped with the 2021.12 compat layer
# Probably no longer needed with newer compat layer that doesn't include ReFrame
deactivate
source "${TEMPDIR}"/reframe_venv/bin/activate

# Print ReFrame config
echo "Starting CI run with the following settings:"
echo ""
echo "TEMPDIR: ${TEMPDIR}"
echo "PYTHONPATH: ${PYTHONPATH}"
echo "EESSI test suite URL: ${EESSI_TESTSUITE_URL}"
echo "EESSI test suite branch: ${EESSI_TESTSUITE_BRANCH}"
echo "HPCtestlib from ReFrame URL: ${REFRAME_URL}"
echo "HPCtestlib from ReFrame branch: ${REFRAME_BRANCH}"
echo "ReFrame executable: $(which reframe)"
echo "ReFrame version: $(reframe --version)"
echo "ReFrame config file: ${RFM_CONFIG_FILES}"
echo "ReFrame check search path: ${RFM_CHECK_SEARCH_PATH}"
echo "ReFrame check search recursive: ${RFM_CHECK_SEARCH_RECURSIVE}"
echo "ReFrame prefix: ${RFM_PREFIX}"
echo "ReFrame args: ${REFRAME_ARGS}"
echo ""

# List tests
echo "Listing tests:"
reframe ${REFRAME_ARGS} --list

# Run
echo "Run tests:"
reframe ${REFRAME_ARGS} --run

# Cleanup
rm -rf "${TEMPDIR}"
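Note on the defaults in `run_reframe.sh`: the chain of `if [ -z ... ]` checks could also be written with bash parameter expansion. The sketch below is one possible shape, not what the PR implements; `apply_ci_defaults` is a hypothetical name, and it assumes `TEMPDIR`, `EESSI_CI_SYSTEM_NAME` and `REFRAME_VERSION` are already set. One deliberate difference is that the `RFM_*` variables are always exported here, also when `ci_config.sh` assigned them without `export`.

```shell
#!/bin/bash
# Sketch only: apply_ci_defaults is a hypothetical name, not a function from the PR.
apply_ci_defaults() {
    # ':=' assigns the default when the variable is unset or empty,
    # which matches the '[ -z ... ]' checks in run_reframe.sh.
    : "${REFRAME_ARGS:=--tag CI --tag 1_node}"
    : "${REFRAME_URL:=https://github.com/reframe-hpc/reframe.git}"
    : "${REFRAME_BRANCH:=v${REFRAME_VERSION}}"
    : "${EESSI_TESTSUITE_URL:=https://github.com/EESSI/test-suite.git}"
    : "${EESSI_TESTSUITE_BRANCH:=v0.1.0}"
    : "${EESSI_VERSION:=latest}"
    # Always exported, so values set without 'export' in ci_config.sh reach ReFrame too.
    export RFM_CONFIG_FILES="${RFM_CONFIG_FILES:-${TEMPDIR}/test-suite/config/${EESSI_CI_SYSTEM_NAME}.py}"
    export RFM_CHECK_SEARCH_PATH="${RFM_CHECK_SEARCH_PATH:-${TEMPDIR}/test-suite/eessi/testsuite/tests/}"
    export RFM_CHECK_SEARCH_RECURSIVE="${RFM_CHECK_SEARCH_RECURSIVE:-1}"
    export RFM_PREFIX="${RFM_PREFIX:-${HOME}/reframe_CI_runs}"
}
```

Called right after `source "${CI_CONFIG}"` (and after `TEMPDIR` has been created), this would fill in anything the system's config left unset.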
22 changes: 22 additions & 0 deletions CI/run_reframe_wrapper.sh
@@ -0,0 +1,22 @@
#!/bin/bash
# Author: Caspar van Leeuwen
# Description: wraps the run_reframe.sh script so that all stdout and stderr can easily be collected in a logfile
# which has a datestamp in the name.

# logfile
if [ -n "${EESSI_CI_LOGDIR}" ]; then
    LOGDIR="${EESSI_CI_LOGDIR}"
else
    LOGDIR="${HOME}/EESSI_CI_LOGS"
fi
mkdir -p "${LOGDIR}"

datestamp=$(date +%Y%m%d_%H%M%S)
LOGFILE="${LOGDIR}/rfm_${datestamp}.log"
touch "${LOGFILE}"

# Get directory of the current script
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Execute run_reframe.sh, which should be in the same directory as the current script
"${SCRIPT_DIR}/run_reframe.sh" > "${LOGFILE}" 2>&1
3 changes: 3 additions & 0 deletions CI/surf_snellius/ci_config.sh
@@ -0,0 +1,3 @@
# Configurable items
REFRAME_ARGS="--tag CI --tag 1_node|2_nodes"
REFRAME_VERSION=4.4.1 # ReFrame version that will be pip-installed to drive the test suite
4 changes: 2 additions & 2 deletions config/aws_citc.py
@@ -12,7 +12,7 @@

 import os

-from eessi.testsuite.common_config import common_logging_config
+from eessi.testsuite.common_config import common_logging_config, common_eessi_init
 from eessi.testsuite.constants import FEATURES

 # This config will write all staging, output and logging to subdirs under this prefix
@@ -134,7 +134,7 @@
         FEATURES['CPU']
     ],
     'prepare_cmds': [
-        'source /cvmfs/pilot.eessi-hpc.org/latest/init/bash',
+        'source %s' % common_eessi_init(),
         # Required when using srun as launcher with --export=NONE in partition access, in order to ensure job
         # steps inherit environment. It doesn't hurt to define this even if srun is not used
         'export SLURM_EXPORT_ENV=ALL'
110 changes: 110 additions & 0 deletions config/aws_mc.py
@@ -0,0 +1,110 @@
# WARNING: for CPU autodetect to work correctly you need to
# 1. Either use ReFrame >= 4.3.3 or temporarily change the 'launcher' for each partition to srun
# 2. Either use ReFrame >= 4.3.3 or run from a clone of the ReFrame repository

# Without this, the autodetect job fails because
# 1. A missing mpirun command
# 2. An incorrect directory structure is assumed when preparing the stagedir for the autodetect job

# Related issues
# 1. https://github.com/reframe-hpc/reframe/issues/2926
# 2. https://github.com/reframe-hpc/reframe/issues/2914

import os

from eessi.testsuite.common_config import common_logging_config, common_eessi_init
from eessi.testsuite.constants import FEATURES

# This config will write all staging, output and logging to subdirs under this prefix
# Override with RFM_PREFIX environment variable
reframe_prefix = os.path.join(os.environ['HOME'], 'reframe_runs')

# AWS Magic Castle site configuration
site_configuration = {
    'systems': [
        {
            'name': 'Magic_Castle',
            'descr': 'Magic Castle build and test environment on AWS',
            'modules_system': 'lmod',
            'hostnames': ['login*', '*-node'],
            'prefix': reframe_prefix,
            'partitions': [
                {
                    'name': 'x86_64-generic-16c-30gb',
                    'access': ['--partition=x86-64-generic-node', '--export=NONE'],
                    'descr': 'Generic (Haswell), 16 cores, 30 GB',
                },
                {
                    'name': 'x86_64-haswell-16c-30gb',
                    'access': ['--partition=x86-64-intel-haswell-node', '--export=NONE'],
                    'descr': 'Haswell, 16 cores, 30 GB',
                },
                {
                    'name': 'x86_64-skylake-16c-30gb',
                    'access': ['--partition=x86-64-intel-skylake-node', '--export=NONE'],
                    'descr': 'Skylake, 16 cores, 30 GB',
                },
                {
                    'name': 'x86_64-zen2-16c-30gb',
                    'access': ['--partition=x86-64-amd-zen2-node', '--export=NONE'],
                    'descr': 'Zen2, 16 cores, 30 GB',
                },
                {
                    'name': 'x86_64-zen3-16c-30gb',
                    'access': ['--partition=x86-64-amd-zen3-node', '--export=NONE'],
                    'descr': 'Zen3, 16 cores, 30 GiB',
                },
                {
                    'name': 'aarch64-generic-16c-32gb',
                    'access': ['--partition=aarch64-generic-node', '--export=NONE'],
                    'descr': 'Generic (Neoverse N1), 16 cores, 32 GB',
                },
                {
                    'name': 'aarch64-neoverse-V1-16c-32gb',
                    'access': ['--partition=aarch64-neoverse-v1-node', '--export=NONE'],
                    'descr': 'Neoverse V1, 16 cores, 32 GB',
                },
                {
                    'name': 'aarch64-neoverse-N1-16c-32gb',
                    'access': ['--partition=aarch64-neoverse-n1-node', '--export=NONE'],
                    'descr': 'Neoverse N1, 16 cores, 32 GiB',
                },
            ]
        },
    ],
    'environments': [
        {
            'name': 'default',
            'cc': 'cc',
            'cxx': '',
            'ftn': '',
        },
    ],
    'logging': common_logging_config(reframe_prefix),
    'general': [
        {
            # Enable automatic detection of CPU architecture for each partition
            # See https://reframe-hpc.readthedocs.io/en/stable/configure.html#auto-detecting-processor-information
            'remote_detect': True,
        }
    ],
}

# Add default things to each partition:
partition_defaults = {
    'scheduler': 'slurm',
    'launcher': 'mpirun',
    'environs': ['default'],
    'features': [
        FEATURES['CPU']
    ],
    'prepare_cmds': [
        'source %s' % common_eessi_init(),
        # Required when using srun as launcher with --export=NONE in partition access, in order to ensure job
        # steps inherit environment. It doesn't hurt to define this even if srun is not used
        'export SLURM_EXPORT_ENV=ALL'
    ],
}
for system in site_configuration['systems']:
    for partition in system['partitions']:
        partition.update(partition_defaults)