
use reframe features + fixes for various issues #11

Merged
merged 31 commits into from
Mar 7, 2023
6352623
use reframe features to select valid_systems
Dec 6, 2022
4767211
add support for specifying (a list of) modules
Dec 8, 2022
d78e2dc
add support for specifying tasks per node
Dec 8, 2022
45531b2
fix code style for eessi_utils/utils.py
Dec 12, 2022
1cac4d4
add support for specifying valid_systems
Dec 12, 2022
1ea9e18
set omp_num_threads equal to cpus_per_task
Dec 12, 2022
1c7043c
add support for setting custom environment variables
Dec 13, 2022
9d71af9
simplify valid systems logic
Dec 14, 2022
c0d2333
fix code style for eessi_utils/hooks.py
Dec 14, 2022
cc73421
also add support for specifying num_cpus_per_task
Dec 14, 2022
c4b2858
rearrange valid_systems filtering logic
Dec 14, 2022
0f77a7a
support testing non-gpu jobs on gpu nodes
Feb 12, 2023
aa7a9a7
Merge branch 'main' into features
Feb 12, 2023
33324af
use env_vars rather than variables for Reframe 4
Feb 12, 2023
7853ff1
update readme
Feb 12, 2023
2d93a78
improve variable name
Feb 12, 2023
4f9ae42
update readme
Feb 12, 2023
55088e9
Updated the logic that selects the default valid partition, module an…
Feb 14, 2023
dca8ff8
Removed comments
Feb 14, 2023
c767151
Merge pull request #1 from casparvl/updated_defaults_pr11
smoors Feb 17, 2023
b1e4c86
replace 'builtin' prog env with 'default' to avoid reframe-4 warning
Feb 18, 2023
cabf751
add custom variables module_regex_select, module_regex_skip, run_mode
Feb 18, 2023
24f22d9
revert commit adding custom variables
Feb 23, 2023
c986198
remove backup files
Feb 23, 2023
ded8cc8
properly handle gpus-per-node
Feb 23, 2023
65fcb42
update readme
Feb 23, 2023
0d2554b
limit gpus per node to the maximum available
Feb 25, 2023
3835a87
scale number of cpus per node with number of GPUs requested
Feb 28, 2023
01378f0
properly scale num_tasks_per_node and num_cpus_per_task for cpu jobs
Mar 3, 2023
fa889ec
fix typo
Mar 4, 2023
0972cc1
make resource assignment more future proof against new features
Mar 4, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
__pycache__/
67 changes: 50 additions & 17 deletions README.md
@@ -1,9 +1,9 @@
# test-suite
A portable test suite for software installations, using ReFrame

## Getting started (@casparvl, committed 2022-12-06)
## Getting started

- install ReFrame >=3.11, <4
- install ReFrame >=4.0

- clone the test suite

@@ -18,11 +18,10 @@ git clone git@github.com:EESSI/test-suite.git
- run the tests

the example below runs a gromacs simulation using GROMACS modules available in the system,
in combination with all available system:partitions as defined in the site config file,
but skips CUDA modules in non-GPU nodes, and skips non-CUDA modules in GPU nodes
in combination with all available system:partitions as defined in the site config file

```
module load ReFrame/3.12.0
module load ReFrame/4.0.1

eessiroot=<path_to_test-suite>
eessihome=$eessiroot/eessi/reframe
@@ -34,16 +33,50 @@ PYTHONPATH=$PYTHONPATH:$EBROOTREFRAME:$eessihome reframe \
-r --performance-report
```

## Improvements in PR #11 (2022-12-14)

- features to filter out CUDA modules in non-GPU nodes and non-CUDA modules in GPU nodes
- requires adding `features` `cpu` and/or `gpu` to the partitions in the site config file
- support for specifying modules
- via `--setvar modules=<modulename>`
- support for specifying systems:partitions
- via `--setvar valid_systems=<comma-separated-list>`
- support for overriding tasks, cpus
- via `--setvar num_tasks_per_node=<x>` and/or `--setvar num_cpus_per_task=<y>`
- support for setting additional environment variables
- via `--setvar variables=<envar>:<value>`
## Configuring GPU/non-GPU partitions in your site config file

- running GPU jobs in GPU nodes
- add feature `gpu` to the GPU partitions

- running non-GPU jobs in non-GPU nodes
- add feature `cpu` to the non-GPU partitions

- running GPU jobs and non-GPU jobs on GPU nodes
- add both features `cpu` and `gpu` to the GPU partitions
```
'features': ['cpu', 'gpu'],
```

- setting the number of GPUs per node `<x>` for a partition:
```
'access': ['-p <partition_name>'],
'devices': [
{'type': 'gpu', 'num_devices': <x>}
],
```
- requesting GPUs per node for a partition:
```
'resources': [
{
'name': '_rfm_gpu',
'options': ['--gpus-per-node={num_gpus_per_node}'],
}
],
```
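
Putting these pieces together, the feature-based partition selection can be sketched in plain Python. This is only a model of the matching rule (ReFrame resolves `valid_systems` constraints like `+gpu` internally), and the partition names and feature sets below are hypothetical:

```python
# Illustrative sketch of how '+cpu' / '+gpu' feature constraints select
# partitions; ReFrame performs the real resolution internally.
partition_features = {
    'example:cpu': {'cpu'},          # non-GPU partition
    'example:gpu': {'cpu', 'gpu'},   # GPU partition that also runs CPU jobs
}

def matching_partitions(required):
    """Return partitions whose feature set includes all required features."""
    return sorted(name for name, feats in partition_features.items()
                  if required <= feats)

print(matching_partitions({'gpu'}))  # → ['example:gpu']
print(matching_partitions({'cpu'}))  # → ['example:cpu', 'example:gpu']
```

Adding both `cpu` and `gpu` to a GPU partition is what lets CPU-only test variants land there as well.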

## Changing the default test behavior on the command line

- specifying modules
- `--setvar modules=<modulename>`

- specifying systems:partitions
- `--setvar valid_systems=<comma-separated-list>`

- overriding tasks, cpus, gpus
- `--setvar num_tasks_per_node=<x>`
- `--setvar num_cpus_per_task=<y>`
- `--setvar num_gpus_per_node=<x>`

- setting additional environment variables
- `--setvar env_vars=<envar>:<value>`
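
The effect of overriding GPUs per node can be sketched with a simplified model of the one-task-per-GPU assignment. The function name and formulas here are assumptions for illustration (the real logic lives in `eessi_utils.hooks`, and the node values come from the site config):

```python
# Simplified model of one-task-per-GPU resource assignment: one task per
# requested GPU, with CPUs scaled to the fraction of GPUs requested.
def one_task_per_gpu(node_cpus, node_gpus, requested_gpus=None):
    # never request more GPUs than the node actually has
    gpus = min(requested_gpus, node_gpus) if requested_gpus else node_gpus
    tasks_per_node = gpus
    # scale the CPU share with the number of GPUs requested
    cpus_per_task = node_cpus * gpus // node_gpus // tasks_per_node
    return tasks_per_node, cpus_per_task

# GPU partition from the example config: 72 cores, 4 GPUs per node
print(one_task_per_gpu(72, 4))                    # → (4, 18)
print(one_task_per_gpu(72, 4, requested_gpus=2))  # → (2, 18)
```

Requesting half the GPUs thus also claims only half of the node's cores, matching the "scale number of cpus per node with number of GPUs requested" commit.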

24 changes: 16 additions & 8 deletions eessi/reframe/config/settings_example.py
@@ -17,49 +17,57 @@
'scheduler': 'slurm',
'launcher': 'mpirun',
'access': ['-p cpu'],
'environs': ['builtin'],
'environs': ['default'],
'max_jobs': 4,
'processor': {
'num_cpus': 128,
'num_sockets': 2,
'num_cpus_per_socket': 64,
'arch': 'znver2',
},
'features': ['cpu'],
'descr': 'CPU partition'
},
{
'name': 'gpu',
'scheduler': 'slurm',
'launcher': 'mpirun',
'access': ['-p gpu'],
'environs': ['builtin'],
'environs': ['default'],
'max_jobs': 4,
'processor': {
'num_cpus': 72,
'num_sockets': 2,
'num_cpus_per_socket': 36,
'arch': 'icelake',
},
'resources': [
{
'name': '_rfm_gpu',
'options': ['--gpus-per-node={num_gpus_per_node}'],
}
],
'devices': [
{
'type': 'gpu',
'num_devices': 4,
}
],
'features': ['cpu', 'gpu'],
'descr': 'GPU partition'
},
]
},
],
]
},
],
'environments': [
{
'name': 'builtin',
'name': 'default',
'cc': 'cc',
'cxx': '',
'ftn': '',
},
],
'logging': [
],
'logging': [
{
'level': 'debug',
'handlers': [
94 changes: 56 additions & 38 deletions eessi/reframe/eessi-checks/applications/gromacs_check.py
@@ -4,12 +4,25 @@
# SPDX-License-Identifier: BSD-3-Clause

import reframe as rfm
from reframe.utility import find_modules
import reframe.core.runtime as rt
from reframe.utility import OrderedSet

from hpctestlib.sciapps.gromacs.benchmarks import gromacs_check
import eessi_utils.hooks as hooks
import eessi_utils.utils as utils


def my_find_modules(substr):
"""Return all modules in the current system that contain ``substr`` in their name."""
if not isinstance(substr, str):
raise TypeError("'substr' argument must be a string")

ms = rt.runtime().modules_system
modules = OrderedSet(ms.available_modules(substr))
for m in modules:
yield m


@rfm.simple_test
class GROMACS_EESSI(gromacs_check):

@@ -18,22 +31,42 @@ class GROMACS_EESSI(gromacs_check):
('n_small', 2),
('n_medium', 8),
('n_large', 16)])
module_info = parameter(find_modules('GROMACS', environ_mapping={r'.*': 'builtin'}))

omp_num_threads = 1
executable_opts += ['-dlb yes', '-ntomp %s' % omp_num_threads, '-npme -1']
variables = {
'OMP_NUM_THREADS': '%s' % omp_num_threads,
}
module_name = parameter(my_find_modules('GROMACS'))
valid_prog_environs = ['default']
valid_systems = []

time_limit = '30m'

@run_after('init')
def apply_module_info(self):
self.s, self.e, self.m = self.module_info
self.valid_systems = [self.s]
self.modules = [self.m]
self.valid_prog_environs = [self.e]
def filter_tests(self):
# filter valid_systems, unless specified with --setvar valid_systems=<comma-separated-list>
if not self.valid_systems:
is_cuda_module = utils.is_cuda_required_module(self.module_name)
valid_systems = ''

if is_cuda_module and self.nb_impl == 'gpu':
# CUDA modules that compute non-bonded interactions on the GPU require partitions with the 'gpu' feature
valid_systems = '+gpu'

elif self.nb_impl == 'cpu':
# Non-bonded interactions on the CPU require partitions with 'cpu' feature
# Note: making 'cpu' an explicit feature allows e.g. skipping CPU-based tests on GPU partitions
valid_systems = '+cpu'

elif not is_cuda_module and self.nb_impl == 'gpu':
# Invalid combination: a module without GPU support cannot compute non-bonded interactions on GPU
valid_systems = ''

if valid_systems:
self.valid_systems = [valid_systems]

# skip this test if the module is not among a list of manually specified modules
# modules can be specified with --setvar modules=<comma-separated-list>
if self.modules and self.module_name not in self.modules:
self.valid_systems = []

self.modules = [self.module_name]

@run_after('init')
def set_test_scale(self):
@@ -49,32 +82,17 @@ def set_test_purpose(self):
if self.benchmark_info[0] == 'HECBioSim/hEGFRDimer':
self.tags.add('CI')

# Skip testing for when nb_impl=gpu and this is not a GPU node
@run_after('setup')
def skip_nb_impl_gpu_on_cpu_nodes(self):
self.skip_if(
(self.nb_impl == 'gpu' and not utils.is_gpu_present(self)),
"Skipping test variant with non-bonded interactions on GPUs, as this partition (%s) does not have GPU nodes" % self.current_partition.name
)

# Skip testing when nb_impl=gpu and this is not a GPU build of GROMACS
# Assign default values for num_tasks, num_tasks_per_node, num_cpus_per_task, and num_gpus_per_node,
# based on current partition's num_cpus and gpus
# when running nb_impl on CPU, we request one task per CPU
# when running nb_impl on GPU, we request one task per GPU
@run_after('setup')
def skip_nb_impl_gpu_on_non_cuda_builds(self):
self.skip_if(
(self.nb_impl == 'gpu' and not utils.is_cuda_required(self)),
"Skipping test variant with non-bonded interactions on GPUs, as this GROMACS was not built with GPU support"
)

# Skip testing GPU-based modules on CPU-based nodes
@run_after('setup')
def skip_gpu_test_on_cpu_nodes(self):
hooks.skip_gpu_test_on_cpu_nodes(self)
def set_num_tasks(self):
hooks.assign_one_task_per_feature(test=self, feature=self.nb_impl)

# Assign num_tasks, num_tasks_per_node and num_cpus_per_task automatically based on current partition's num_cpus and gpus
# Only when running nb_impl on GPU do we want one task per GPU
@run_after('setup')
def set_num_tasks(self):
if(self.nb_impl == 'gpu'):
hooks.assign_one_task_per_gpu(test = self, num_nodes = self.num_nodes)
else:
hooks.assign_one_task_per_cpu(test = self, num_nodes = self.num_nodes)
def set_omp_num_threads(self):
omp_num_threads = self.num_cpus_per_task
# set both OMP_NUM_THREADS and -ntomp explicitly to avoid conflicting values
self.executable_opts += ['-dlb yes', f'-ntomp {omp_num_threads}', '-npme -1']
self.env_vars['OMP_NUM_THREADS'] = f'{omp_num_threads}'
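
The `filter_tests` hook in this diff reduces to a small decision table. As a standalone sketch (the `+gpu`/`+cpu` strings are ReFrame feature constraints on `valid_systems`):

```python
# Standalone restatement of the valid_systems decision in filter_tests:
# which partition features a test variant requires, given the module type
# and where non-bonded interactions run.
def pick_valid_systems(is_cuda_module, nb_impl):
    if is_cuda_module and nb_impl == 'gpu':
        return ['+gpu']   # GPU-enabled module computing non-bonded on GPU
    if nb_impl == 'cpu':
        return ['+cpu']   # any module can compute non-bonded on CPU
    return []             # non-CUDA module + nb_impl=gpu: invalid, filter out

print(pick_valid_systems(True, 'gpu'))    # → ['+gpu']
print(pick_valid_systems(False, 'cpu'))   # → ['+cpu']
print(pick_valid_systems(False, 'gpu'))   # → []
```

An empty list leaves `valid_systems` unset, so the invalid combination is filtered out at the init stage instead of being skipped later at setup, as the replaced `skip_if` hooks did.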