Access configuration objects before the setup phase #3323

Open
gkaf89 opened this issue Nov 24, 2024 · 2 comments

@gkaf89

gkaf89 commented Nov 24, 2024

When defining tests, it can be useful to have access to the contents of the site configuration file. Consider, for instance, the following site configuration.

site_configuration = { 
    'systems': [
        {
            'name': 'aion',
            'hostnames': [r'aion-[0-9]{4}'],
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'batch',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'access': ['--partition=batch', '--qos=normal'],
                    'max_jobs': 8,
                    'environs': ['builtin', 'foss2023b'],
                    'extras' : {
                        'admissible_omp_num_threads' : [1, 2, 4, 8, 16],
                    },
                },
            ],
        },
    ]
}

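We want to write a performance test for some software, parameterized over the number of OpenMP threads:

import reframe as rfm
from reframe.core.builtins import parameter

class performance_test(rfm.RunOnlyRegressionTest):
    # Desired, but does not work: there is no current_partition in the class
    # body; self.current_partition is only set during the setup phase
    num_omp_threads = parameter(current_partition.extras['admissible_omp_num_threads'])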

As far as I understand, the parameters are expanded before the configuration file is read, and the resulting tests are then filtered using the contents of the configuration file. Could we somehow use the contents of the configuration file earlier, for instance by setting a callback in the parameter definition?

@vkarak
Contributor

vkarak commented Nov 26, 2024

"As far as I understand, the parameters are expanded before reading the configuration file, and the resulting tests are filtered with the contents of the configuration file."

Nope, the configuration is the very first thing to be resolved, before any tests are even loaded. You can access the actual partition/environment combinations in the parameter definition, but currently only through an internal interface. The plan is to expose this and add examples to the documentation. This is how you can achieve your goal:

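import reframe as rfm
from reframe.core.builtins import parameter, run_after
from reframe.core.runtime import valid_sysenv_comb


def admissible_omp_num_threads(valid_systems, valid_prog_environs):
    # valid_sysenv_comb() maps each valid partition to its valid environments;
    # yield one (num_threads, partition) pair per admissible thread count
    for part in valid_sysenv_comb(valid_systems, valid_prog_environs):
        for num_threads in part.extras.get('admissible_omp_num_threads', []):
            yield num_threads, part


class performance_test(rfm.RunOnlyRegressionTest):
    valid_systems = ['...']
    valid_prog_environs = ['...']
    num_omp_threads = parameter(
        admissible_omp_num_threads(valid_systems, valid_prog_environs),
        fmt=lambda x: x[0]
    )

    @run_after('init')
    def restrict_valid_systems(self):
        # Run each variant only on the partition its thread count came from;
        # the thread count itself is self.num_omp_threads[0]
        self.valid_systems = [self.num_omp_threads[1].fullname]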

The valid_sysenv_comb() function interprets the partition/environment constraints and returns all the valid combinations for this test.
However, since the extras value will likely differ across the valid partitions, you need to store the partition alongside each value and, in a post-init hook, restrict each test variant to its corresponding partition.
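To make the pairing and restriction steps concrete outside of ReFrame, here is a minimal pure-Python sketch. It assumes a hypothetical `Partition` stand-in with only `fullname` and `extras` attributes (not ReFrame's own partition class), uses the flat `admissible_omp_num_threads` extras key from the first configuration, and takes the thread counts for the two partitions from the configuration shown later in this thread. The hypothetical `restrict` function plays the role of the post-init hook.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    # Hypothetical stand-in for a ReFrame system partition; only the
    # attributes used by the pattern are modelled.
    fullname: str
    extras: dict = field(default_factory=dict)


def admissible_omp_num_threads(partitions):
    # Yield one (value, partition) pair per admissible thread count, so
    # that every parameter value remembers which partition it came from.
    for part in partitions:
        for nthreads in part.extras.get('admissible_omp_num_threads', []):
            yield nthreads, part


def restrict(param_value):
    # Mimics the post-init hook: a variant runs only on the partition
    # its parameter value was taken from.
    nthreads, part = param_value
    return nthreads, [part.fullname]


partitions = [
    Partition('aion:batch', {'admissible_omp_num_threads': [1, 2, 4, 8, 16]}),
    Partition('iris:batch', {'admissible_omp_num_threads': [1, 7, 14]}),
]

variants = [restrict(v) for v in admissible_omp_num_threads(partitions)]
for nthreads, valid_systems in variants:
    print(nthreads, valid_systems)
```

Each partition contributes one variant per admissible value, and every variant carries exactly one valid system, so a thread count listed only for one partition is never scheduled on another.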

Since this is a recurring pattern (e.g., parameterising a test over other system information, such as the number of sockets or GPUs), it's something we would like to expose in an easier way.
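Parameterising over other system information mostly comes down to looking up a different entry in the partition extras. Below is a minimal sketch of such a lookup, assuming the extras are plain nested dictionaries like the ones used in this issue. `extras_values` is a hypothetical helper, not part of ReFrame; it accepts a dotted key path and turns scalar entries into single-value lists so they can feed a parameter just like list entries.

```python
def extras_values(extras, path):
    # Look up a dotted key path such as 'admissible_setups.omp_num_threads'
    # in a partition's extras; any missing key yields an empty list.
    node = extras
    for key in path.split('.'):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]

    # Scalars (e.g. 'sockets_per_node': 8) become single-value lists so
    # they can be used as parameter values in the same way as lists.
    return list(node) if isinstance(node, (list, tuple)) else [node]


aion_extras = {
    'sockets_per_node': 8,
    'cores_per_socket': 16,
    'admissible_setups': {
        'omp_num_threads': [1, 2, 4, 8, 16],
        'num_nodes': [1, 2, 4, 8, 16],
    },
}

print(extras_values(aion_extras, 'admissible_setups.omp_num_threads'))  # [1, 2, 4, 8, 16]
print(extras_values(aion_extras, 'sockets_per_node'))  # [8]
print(extras_values(aion_extras, 'num_gpus'))  # []
```

In a generator like admissible_omp_num_threads, the `part.extras.get(...)` call could then be replaced by `extras_values(part.extras, path)`, with `path` passed in as an extra argument.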

@gkaf89
Author

gkaf89 commented Nov 29, 2024

Thanks for the pointers!

The need to account for the partition complicates the process, but the valid_sysenv_comb function exposes all the necessary information. I am not sure how the process could be simplified. Here is an example of how I used the interface exposed by valid_sysenv_comb.

1. The system configuration

site_configuration = {
    'general': [
        {
            'use_login_shell': True,
        }
    ],
    'systems': [
        {
            'name': 'aion',
            'descr': 'Aion cluster',
            'hostnames': [r'aion-[0-9]{4}'],
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'batch',
                    'descr': 'Aion batch partition',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'access': ['--partition=batch', '--qos=normal'],
                    'max_jobs':  8,
                    'environs': ['builtin', 'foss2023b'],
                    'extras' : {
                        'sockets_per_node' : 8,
                        'cores_per_socket' : 16,
                        'admissible_setups' : {
                          'omp_num_threads' : [1, 2, 4, 8, 16],
                          'num_nodes' : [1, 2, 4, 8, 16],
                        },
                    },
                },
            ],
        },
        {
            'name': 'iris',
            'descr': 'Iris cluster',
            'hostnames': [r'iris-[0-9]{3}'],
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'batch',
                    'descr': 'Iris batch partition',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'access': ['--partition=batch', '--qos=normal'],
                    'max_jobs':  8,
                    'environs': ['builtin', 'foss2023b'],
                    'extras' : {
                        'sockets_per_node' : 2,
                        'cores_per_socket' : 14,
                        'admissible_setups' : {
                          'omp_num_threads' : [1, 7, 14],
                          'num_nodes' : [1, 2, 4, 8, 16],
                        },
                    },
                },
            ],
        },
    ],
    ...
}

2. The tests

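import reframe as rfm
from reframe.core.builtins import parameter, run_after
from reframe.core.runtime import valid_sysenv_comb

class PartitionExtraProperty: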
  def __init__(self, part, val):
    self.partition = part
    self.value = val

  def __str__(self):
    return f"{self.value}"

def parametrize_system_partition_property(
    valid_systems,
    valid_prog_environs,
    get_system_partition_property
  ):

  partition_extra_properties = []

  for part in valid_sysenv_comb(valid_systems, valid_prog_environs):
    prop = get_system_partition_property(part)
    partition_extra_properties.append( PartitionExtraProperty(part.name, prop) )

  return partition_extra_properties

def expand_partition_property_list(partition_extra_properties_list, reduce_list):
  # Yield one PartitionExtraProperty per value kept by reduce_list
  for partition_extra_property in partition_extra_properties_list:
    partition = partition_extra_property.partition
    value_list = partition_extra_property.value
    reduced_list = reduce_list(value_list)
    for prop in reduced_list:
      yield PartitionExtraProperty( partition, prop)

def get_admissible_omp_num_threads(partition):
  # Fall back to an empty dict so partitions without 'admissible_setups' yield no values
  return partition.extras.get('admissible_setups', {}).get('omp_num_threads', [])

def get_admissible_num_nodes(partition):
  return partition.extras.get('admissible_setups', {}).get('num_nodes', [])

class performance_test(rfm.RunOnlyRegressionTest):
  valid_systems = ['*']
  valid_prog_environs = ['+openmp +mpi']

  test_case = parameter()
  test_type = parameter()
  
  num_nodes = parameter()
  cpus_per_task = parameter()

  partition_num_nodes = parametrize_system_partition_property(
    valid_systems,
    valid_prog_environs,
    get_admissible_num_nodes
  )
  partition_cpus_per_task =  parametrize_system_partition_property(
    valid_systems,
    valid_prog_environs,
    get_admissible_omp_num_threads
  )

  @run_after('init')
  def restrict_valid_systems(self):
    valid_partitions = { self.num_nodes.partition } & { self.cpus_per_task.partition }
    self.valid_systems = [ f'*:{partition}' for partition in valid_partitions ]

    self.num_nodes = self.num_nodes.value
    self.cpus_per_task = self.cpus_per_task.value
...

@rfm.simple_test
class problem_size_scaling_test(performance_test):
  test_type = parameter( ['opt', 'dmc', 'vmc'] )
  test_case = parameter( ['W1', 'W5', 'W10', 'W15', 'W20', 'W25', 'W30'] )

  num_nodes = parameter(
    expand_partition_property_list(
      performance_test.partition_num_nodes,
      lambda x : x
    )
  )
  cpus_per_task = parameter(
    expand_partition_property_list(
      performance_test.partition_cpus_per_task,
      lambda x : [max(x)]
    )
  )

@rfm.simple_test
class ompmpi_ratio_test(performance_test):
  test_type = parameter( ['vmc'] )
  test_case = parameter( ['W1', 'W5', 'W10', 'W15', 'W20', 'W25', 'W30'] )

  num_nodes = parameter(
    expand_partition_property_list(
      performance_test.partition_num_nodes,
      lambda x : x
    )
  )
  cpus_per_task = parameter(
    expand_partition_property_list(
      performance_test.partition_cpus_per_task,
      lambda x : x
    )
  )
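To see why the set intersection in restrict_valid_systems filters out mismatched combinations, here is a pure-Python sketch that mimics it outside ReFrame. `Prop`, `expand` and `restrict` are hypothetical stand-ins for PartitionExtraProperty, expand_partition_property_list and the post-init hook, and the partition names and values are made up for illustration only.

```python
from itertools import product
from typing import NamedTuple


class Prop(NamedTuple):
    # Simplified stand-in for PartitionExtraProperty: (partition, value).
    partition: str
    value: int


def expand(values_per_partition, reduce_list):
    # Mirrors expand_partition_property_list: one Prop per reduced value.
    for partition, values in values_per_partition.items():
        for value in reduce_list(values):
            yield Prop(partition, value)


def restrict(num_nodes, cpus_per_task):
    # Mirrors restrict_valid_systems: a variant is valid only on the
    # partition shared by both of its parameter values.
    shared = {num_nodes.partition} & {cpus_per_task.partition}
    return [f'*:{p}' for p in sorted(shared)]


# Hypothetical partitions and admissible values, for illustration only.
nodes = list(expand({'part_a': [1, 2], 'part_b': [1]}, lambda x: x))
cpus = list(expand({'part_a': [4, 8], 'part_b': [7, 14]}, lambda x: [max(x)]))

# Parameters are combined as a Cartesian product of their values ...
variants = [(n, c, restrict(n, c)) for n, c in product(nodes, cpus)]

# ... and variants whose values come from different partitions end up
# with an empty valid_systems list.
runnable = [(n.value, c.value, vs) for n, c, vs in variants if vs]
for row in runnable:
    print(row)
```

The full cross product is still generated, so the number of discarded variants grows with the number of partitions. Storing `part.name` rather than the full name should be enough here, presumably because valid_sysenv_comb only considers the partitions of the current system.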

Notes

For the test parameters, I am abusing the system a bit: in restrict_valid_systems, I reset the value of each parameter to drop the partition information and keep only the value of interest. I noticed that with this method, setting the fmt argument of parameter results in errors; it seems that fmt is called both before and after the @run_after('init') hook, so it would have to handle both formats. Instead of handling multiple types in fmt, I chose to create the class PartitionExtraProperty, whose __str__ method prints the parameter value.
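If fmt is indeed called on both shapes of the value, a formatter that normalizes them is an alternative to relying on __str__. A minimal sketch, assuming the parameter holds either a tagged object with a `.value` attribute (as PartitionExtraProperty does) or the bare value after the post-init hook has replaced it; `fmt_partition_value` is a hypothetical name.

```python
class PartitionExtraProperty:
    # Same shape as the class in the tests above: a value tagged with
    # the partition it belongs to.
    def __init__(self, part, val):
        self.partition = part
        self.value = val


def fmt_partition_value(x):
    # Hypothetical fmt callable that accepts both shapes: a tagged
    # PartitionExtraProperty (before the post-init hook rewrites the
    # parameter) or the bare value (after it).
    return str(getattr(x, 'value', x))


print(fmt_partition_value(PartitionExtraProperty('batch', 16)))  # 16
print(fmt_partition_value(16))  # 16
```

It would be passed as, e.g., `parameter(..., fmt=fmt_partition_value)`, keeping the formatting logic in one place rather than in each value class.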

Projects
Status: Todo
Development

No branches or pull requests

2 participants