Add new hook to filter invalid scales based on features set in the config file #111

Merged on Feb 13, 2024 (16 commits)

Conversation

casparvl
Collaborator

@casparvl casparvl commented Feb 1, 2024

This avoids issues on Snellius GPU, where partially allocating multiple nodes is not allowed by the Slurm configuration.

@casparvl casparvl marked this pull request as draft February 1, 2024 17:04
Collaborator Author

casparvl commented Feb 1, 2024

It's ready to be reviewed/tested, but not merged, since I haven't changed all the other configs yet (they should all get a + list(SCALES.keys()) on their features). Whoever tests this can add that to the config of their system manually, and/or filter like I did for the Snellius GPU partition to see if that works properly for you.
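
For anyone trying this on their own system, here is a minimal sketch of what the config change amounts to. The SCALES dictionary below is a trimmed, hypothetical stand-in for the one in eessi.testsuite.constants, and the partition name and the 'cpu' feature are made up for illustration.

```python
# Hypothetical, trimmed stand-in for SCALES from eessi.testsuite.constants;
# the real dictionary has more entries, and its values are not reproduced here.
SCALES = {
    '1_node': {},
    '2_nodes': {},
    '1_cpn_2_nodes': {},
}

# Illustrative partition entry: every scale name is advertised as a feature,
# so by default no scale is filtered out for this partition.
partition = {
    'name': 'example_partition',  # made-up name
    'features': ['cpu'] + list(SCALES.keys()),
}

print(partition['features'])
```

A partition that cannot run a given scale would simply leave that scale name out of its features list.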

Collaborator Author

casparvl commented Feb 1, 2024

Ah, and we should also include it in the config for the CI (which is why it is now failing :))

Collaborator Author

casparvl commented Feb 2, 2024

ok, I updated the features for all configs, including for github-actions. Let's see if the CI passes now again.

Collaborator

@smoors smoors left a comment

just tested this for the gromacs test, and it seems to work nicely.

however, the way we are currently using valid_systems is not very logical, and prone to mistakes, because we use [] both for the begin state (no filtering yet) and the end state (this test should not run).

i propose to change it as follows:

  • we start out with setting valid_systems to ['*'] in the class attribute of the test i.e. no filtering.
  • in the hooks whenever valid_systems is [] we know the test should be filtered out (not run).
  • in the hooks, if valid_systems is [*] and we want to filter, we replace it with [<filter>]
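
The three-state convention proposed in this list can be sketched roughly as follows. The function names are hypothetical and operate on a plain list rather than on a ReFrame test object; this illustrates the idea and is not the code in hooks.py.

```python
def describe(valid_systems):
    """Classify valid_systems under the proposed convention (illustrative only)."""
    if valid_systems == ['*']:
        return 'unfiltered'    # begin state: valid for any system
    if valid_systems == []:
        return 'filtered out'  # end state: the test should not run
    return 'filtered'          # some hook has narrowed it down


def apply_filter(valid_systems, new_filter):
    """Replace the begin state ['*'] with [new_filter]; leave [] untouched."""
    if valid_systems == []:
        return []
    if valid_systems == ['*']:
        return [new_filter]
    # Combining with an already-present filter is out of scope for this sketch.
    return valid_systems


print(describe(['*']), '->', apply_filter(['*'], '+gpu'))
print(describe([]), '->', apply_filter([], '+gpu'))
```

The point of the convention is that the begin state and the end state no longer share the same value, so a hook can tell them apart.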

Collaborator Author

casparvl commented Feb 3, 2024

in the hooks, if valid_systems is [*] and we want to filter, we replace it with [<filter>]

Yeah I actually thought about something like this as well, and then thought "What if someone is stupid enough to set this as system name?". But I agree, it's probably a better way to set it to a non-empty value. And note to self: I think the elegant way to do this is to set it to a constant that we define in eessi.testsuite.constants. That way we can name the constant something sensible (like FILTER) and make the value of that constant something that no one would ever set as system name ("FILTER=thisystemnamewillalwaysbefiltered" or whatever :D)

Collaborator

smoors commented Feb 3, 2024

Yeah I actually thought about something like this as well, and then thought "What if someone is stupid enough to set this as system name?". But I agree, it's probably a better way to set it to a non-empty value. And note to self: I think the elegant way to do this is to set it to a constant that we define in eessi.testsuite.constants. That way we can name the constant something sensible (like FILTER) and make the value of that constant something that no one would ever set as system name ("FILTER=thisystemnamewillalwaysbefiltered" or whatever :D)

i'm not sure i understand your point.

what i meant is to change this line to valid_systems = ['*']:

['*'] has a special meaning in reframe: The test is valid for any system, which is exactly what it should be initially because we didn't do any filtering yet.
also, for system names Only alphanumeric characters, dashes (-) and underscores (_) are allowed.
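
As a quick illustration of the naming rule quoted above, the sketch below checks a few candidate strings. The regular expression is my own rendering of that sentence and is not copied from ReFrame.

```python
import re

# Assumed rendering of the rule: only alphanumerics, dashes and underscores.
SYSTEM_NAME = re.compile(r'[A-Za-z0-9_-]+')


def could_be_system_name(name):
    """Return True if name satisfies the naming rule as rendered above."""
    return SYSTEM_NAME.fullmatch(name) is not None


print(could_be_system_name('snellius'))                           # True
print(could_be_system_name('*'))                                  # False
print(could_be_system_name('thisystemnamewillalwaysbefiltered'))  # True
```

So '*' can never collide with a real system name, whereas a purely alphanumeric sentinel is a legal name and is only unlikely to be chosen in practice.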

Collaborator Author

casparvl commented Feb 5, 2024

My point was mostly about your third point:

in the hooks, if valid_systems is [*] and we want to filter, we replace it with [<filter>]

I.e. instead of setting it to empty [] to filter, set it to a value that will never ever be a valid system name (as in my example). But, now I also understood what you meant with

we start out with setting valid_systems to ['*'] in the class attribute of the test i.e. no filtering.

I.e. you want to actually change the 'default' valid_systems as set in the class attribute for all tests.

I'm actually fine with both changes. I'll give a try to implementing it :)

Collaborator

smoors commented Feb 5, 2024

I.e. instead of setting it to empty [] to filter, set it to a value that will never ever be a valid system name (as in my example).

ok, now i understand. we could actually do both to avoid possible confusion :)

Caspar van Leeuwen added 5 commits February 5, 2024 15:58
… empty, explicitely set it equal to this constant. That way we know (and can test in other hooks) that it was explicitely filtered out as an invalid test by one of our hooks
Collaborator Author

casparvl commented Feb 5, 2024

@smoors how about this? I'm not sure changing the default really helps much, it just gives me one more thing to check on (which I now generalized in the internal helper function _set_or_append_valid_systems). The reason you need to overwrite '*' in these cases btw is that appending a feature to it leads to an invalid system name. I.e. [* +gpu] does not match any system and all my tests were filtered.
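
A rough sketch of the behaviour such a helper needs, based only on what is described in this thread. It works on plain lists, the function name drops the leading underscore to signal that it is not the real helper, and both the handling of an empty list and the error for multi-entry lists are assumptions rather than the merged implementation.

```python
def set_or_append_valid_systems(current, spec):
    """Return an updated valid_systems list (illustrative sketch only).

    current: the existing valid_systems list
    spec:    a feature requirement such as '+gpu'
    """
    if current == []:
        # Assumption: an empty list means "do not run" and is respected.
        return []
    if current == ['*']:
        # Overwrite the default: '* +gpu' would not match any system.
        return [spec]
    if len(current) == 1:
        # Add a further requirement to the existing single entry.
        return [f'{current[0]} {spec}']
    # Assumption: several entries are ambiguous, so refuse to guess.
    raise ValueError(f'cannot append {spec!r} to {current!r}')


print(set_or_append_valid_systems(['*'], '+gpu'))         # ['+gpu']
print(set_or_append_valid_systems(['+gpu'], '+2_nodes'))  # ['+gpu +2_nodes']
```

Keeping this logic in one function means the special handling of '*' lives in a single place instead of being repeated in every hook.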

In any case, the correct set of tests gets generated for me. For the sake of testing, I excluded 1_cpn_2_nodes from the valid scales for the rome partition in my ReFrame config:

valid_scales_snellius_rome = [s for s in SCALES if s not in ['1_cpn_2_nodes', '1_cpn_4_nodes']]
...
'partitions': [
    {
        'name': 'rome',
        ...
        'features': [
            FEATURES[CPU],
        ] + valid_scales_snellius_rome,
...

I then get:

$ reframe -n GROMACS.*2021a$ --tag CI --checkpath eessi/testsuite/tests/apps/gromacs.py --tag 2_nodes --run
[ RUN      ] GROMACS_EESSI %benchmark_info=HECBioSim/Crambin %nb_impl=cpu %scale=2_nodes %module_name=GROMACS/2021.3-foss-2021a /d597cff4 @snellius:rome+default
[ RUN      ] GROMACS_EESSI %benchmark_info=HECBioSim/Crambin %nb_impl=cpu %scale=2_nodes %module_name=GROMACS/2021.3-foss-2021a /d597cff4 @snellius:genoa+default
[ RUN      ] GROMACS_EESSI %benchmark_info=HECBioSim/Crambin %nb_impl=cpu %scale=1_cpn_2_nodes %module_name=GROMACS/2021.3-foss-2021a /f4194106 @snellius:genoa+default

As you can see, the 1_cpn_2_nodes scale is still generated for genoa (since I didn't exclude it there), but not for rome.
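
The filtering seen in this output can be emulated with a simplified matcher. This is a toy model of feature matching, not ReFrame's implementation; the feature lists are trimmed to the two scales shown, and the '+cpu' token is illustrative.

```python
def partition_matches(spec, features):
    """Toy model: every '+feature' token in spec must be in the partition's features."""
    required = [token[1:] for token in spec.split() if token.startswith('+')]
    return all(feature in features for feature in required)


# Trimmed feature lists: rome does not advertise 1_cpn_2_nodes, genoa does.
partitions = {
    'rome': ['cpu', '2_nodes'],
    'genoa': ['cpu', '2_nodes', '1_cpn_2_nodes'],
}

selected = []
for scale in ('2_nodes', '1_cpn_2_nodes'):
    for name, features in partitions.items():
        if partition_matches(f'+cpu +{scale}', features):
            selected.append((scale, name))

print(selected)
```

Under these assumptions the selection contains 2_nodes on both partitions and 1_cpn_2_nodes on genoa only, matching the three test cases listed above.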

Comment on lines 299 to 300
if len(test.valid_systems) == 0:
    test.valid_systems = [valid_systems]
Collaborator

valid_systems is a required field, which means that it is always set, and we are now setting it to ['*'] in the class attribute. this means we can never have 0 items, and we can remove the first if condition.

Collaborator Author

I wasn't sure whether it could be set to [] on the command line, but it seems not (at least I wasn't able to do it). Nevertheless: while we now have [*] as default valid systems by convention, this is something we set in the test class itself. If someone creates a test class with [] as default (which can easily happen), at least the current hook still works.

I guess it depends on how 'hard' we want to enforce ['*'] as default. If we really want to enforce that, having this hook fail could be one way to figure it out. But then the == 0 case would just have to lead to a hard error or something.

What do you think, do we want to be that strict? Or do you prefer to keep the current version, which would also make the [] default work?

Collaborator

i’m thinking of the case where someone sets it to [] somewhere outside the class attributes, meaning that the test should not run due to some condition. in that case we should respect that, and not override it or fail. so i think it should always be ['*'] in the class attributes, but we should not enforce it in this hook.

i don’t think it is a big problem if someone sets it accidentally to [] in the class attribute, because in that case no test will run, so it’s clear something is wrong with the test.

if we really want to enforce it, a solution could be to set it to ['*'] in a class that inherits from RegressionMixin and make sure every test class inherits from both RunOnlyRegressionTest and the custom RegressionMixin child class, but of course then the question is how to enforce that, and maybe it makes things overly complex.
see here for an example: https://reframe-hpc.readthedocs.io/en/stable/tutorial_advanced.html#grouping-parameter-packs

Collaborator Author

i don’t think it is a big problem if someone sets it accidentally to [] in the class attribute, because in that case no test will run, so it’s clear something is wrong with the test.

That's a good point actually

if we really want to enforce it, a solution could be to set it to ['*'] in a class that inherits from RegressionMixin and make sure every test class inherits from both RunOnlyRegressionTest and the custom RegressionMixin child class, but of course then the question is how to enforce that, and maybe it makes things overly complex.
see here for an example: https://reframe-hpc.readthedocs.io/en/stable/tutorial_advanced.html#grouping-parameter-packs

Yeah, I also thought maybe we should use Mixin classes more, after hearing e.g. Vasileios mention he also did that for his test library. I guess it might avoid more code duplication compared to the hooks and allows you to do default stuff like this. Enforcing it could be reasonably simple in the CI of the test suite by parsing the test file (it has to contain something like class classname(..., my_mixin_class)). Well, not something I'm going to dive into now, but it might make things cleaner. If we want to move in that direction, we should do it before we have a million tests...
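
The CI check mentioned here could look roughly like the sketch below, which parses a test file with Python's ast module and reports classes that do not list the mixin among their bases. The mixin name and the example source are placeholders, and ast.unparse requires Python 3.9 or newer.

```python
import ast


def classes_missing_mixin(source, mixin_name):
    """Return the names of classes in source whose bases do not include mixin_name."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Compare on the last dotted component, so 'pkg.my_mixin_class' also counts.
            bases = {ast.unparse(base).split('.')[-1] for base in node.bases}
            if mixin_name not in bases:
                missing.append(node.name)
    return missing


# Placeholder test file contents; the source is only parsed, never executed.
example = '''
class GoodTest(rfm.RunOnlyRegressionTest, my_mixin_class):
    pass

class BadTest(rfm.RunOnlyRegressionTest):
    pass
'''

print(classes_missing_mixin(example, 'my_mixin_class'))  # ['BadTest']
```

A real check would need to exempt helper classes and the mixin itself, and it only inspects direct bases, so a class inheriting the mixin indirectly would be flagged as well.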

Collaborator

smoors commented Feb 5, 2024

I'm not sure changing the default really helps much

that's true, but my motivation for this change is to avoid future mistakes, because it's now much more logical.

btw, i do like the addition of the helper function, as we have only 1 place to check for.

casparvl and others added 2 commits February 13, 2024 15:00
Co-authored-by: Sam Moors <smoors@users.noreply.github.com>
@casparvl casparvl marked this pull request as ready for review February 13, 2024 15:35
Collaborator

@smoors smoors left a comment

lgtm!

@smoors smoors merged commit d516f05 into EESSI:main Feb 13, 2024
9 checks passed
Contributor

boegel commented Feb 13, 2024

@casparvl Seems worthwhile to update the EESSI docs page on the test suite accordingly?

casparvl pushed a commit to casparvl/docs that referenced this pull request Feb 14, 2024
@casparvl
Collaborator Author

Good point, done in EESSI/docs#156
