Sync meeting on EESSI test suite (2024 02 01)

Kenneth Hoste edited this page Feb 1, 2024 · 1 revision

EESSI test suite sync meetings

Planning

  • every 2 weeks on Thursday at 14:00 CE(S)T
  • next meetings:
    • Thu 1 Feb'24 14:00 CET
    • Thu 15 Feb'24 14:00 CET
    • Thu 29 Feb'24 14:00 CET
    • Thu 14 Mar'24 14:00 CET

Meeting (2024-02-01)

  • test-suite on Hortense

    • Issue 2970 was opened in ReFrame on 24 Aug by @vkarak.
      • Will be addressed by @boegel and Lara.
    • Linked to the following issue 68 in the test-suite.
      • Lara currently submits for every partition; this will be discussed further.
      • Lara will clarify in this issue that passing --partition to sbatch on the command line solved our problem, and will ask whether ReFrame can do this (based on the --access configuration) and/or whether it could be configured separately.
  • OSU tests

  • CUDA modules on pure CPU partitions #101

    • Isn't that test-specific? GROMACS works fine, because it does a dlopen in the code path. Executables that are dynamically linked to CUDA cannot be run on non-GPU nodes.
    • Yes, but the easiest is just to say 'we never use CUDA modules on pure CPU nodes'. And it's a very clear rule.
      • Satish will implement this and roll back the LD_LIBRARY_PATH change in the OSU test
  • bot now picks up on bot/test.sh and bot/check-test.sh scripts in target repo

    • How do we proceed? Who?
      • We will just run TF & OSU on 1 node (or 2 cores in the case of OSU point-to-point), irrespective of which software was installed
      • Next step will be to filter on relevant tests that are related to the actual change in the software-layer PR.
      • Caspar will have a look at this
  • Test suite doc improvements from Xin

    • Needs another review? => Lara will have another look
  • Filter out incompatible scales #100

    • Good idea. Any idea how? Who can do it?
    • Caspar will take a stab at this
  • Discuss ReFrame meeting yesterday

    • Two options:
      • Option 1: We use the perflog mechanism that's already there, and add a field to indicate if the result should be used as reference.
        • Challenge: what if you upgrade your system? You'll have to alter the field that indicates if results are used as reference and put all those to
      • Option 2: Have ReFrame export/add performance numbers from a run to a database (e.g. passing reframe --export-references=<my_sql_database>), together with the test hash + system + partition. Then, have ReFrame read those performance numbers (or an average) from a query on that SQL database (reframe --use-reference=<my_sql_database>)
  • Kenneth will create an issue to update the common_config so that it picks up on EESSI_CVMFS_REPO to select the right repository (based on the current environment)

  • Satish is working on OpenFOAM + ESPResSo

    • Have a look at the example for fixtures, it also contains examples of how to reuse the stage dir in the dependent tests
  • Sam will look into the httpjson perflog handler

    • docs: see logging.handlers_perflog.type
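
Option 2 from the ReFrame discussion above (store performance numbers keyed by test hash, system and partition, then read back an average as the reference) could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table layout, the column names and the helper functions are assumptions made for the example, and the --export-references / --use-reference options mentioned in the notes are proposals, not existing ReFrame flags.

```python
import sqlite3


def open_reference_db(path=":memory:"):
    """Open (or create) a database holding performance reference values.

    The schema is hypothetical; it only mirrors the fields named in the notes.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS perf_refs (
               test_hash      TEXT NOT NULL,
               system_name    TEXT NOT NULL,
               partition_name TEXT NOT NULL,
               perf_var       TEXT NOT NULL,
               value          REAL NOT NULL
           )"""
    )
    return conn


def export_reference(conn, test_hash, system, partition, perf_var, value):
    """Record one measured performance value from a run."""
    conn.execute(
        "INSERT INTO perf_refs VALUES (?, ?, ?, ?, ?)",
        (test_hash, system, partition, perf_var, value),
    )
    conn.commit()


def lookup_reference(conn, test_hash, system, partition, perf_var):
    """Return the average of all stored values, or None if nothing is stored."""
    row = conn.execute(
        """SELECT AVG(value) FROM perf_refs
           WHERE test_hash = ? AND system_name = ?
             AND partition_name = ? AND perf_var = ?""",
        (test_hash, system, partition, perf_var),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    # Placeholder identifiers, not real test hashes or partition names.
    conn = open_reference_db()
    export_reference(conn, "abc123", "example_system", "cpu", "perf", 100.0)
    export_reference(conn, "abc123", "example_system", "cpu", "perf", 110.0)
    print(lookup_reference(conn, "abc123", "example_system", "cpu", "perf"))
```

Keying on the test hash means that a changed test definition starts with no stored values, so lookup_reference returns None until new runs have been exported.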

TODO before 0.2.0 release:

  • common_config update to use EESSI_CVMFS_REPO, so that it can be used with software.eessi.io
  • GROMACS PR to software-layer
  • Remove LD_LIBRARY_PATH from the OSU test and implement a hook for filtering CUDA-module-based tests on pure CPU nodes
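
The rule agreed on in the meeting ('we never use CUDA modules on pure CPU nodes') boils down to a small filter. The sketch below shows only the decision logic in plain Python; it is not the actual test-suite hook. The function names, the name-based CUDA detection and the example module names are all assumptions for illustration.

```python
def uses_cuda(module_name: str) -> bool:
    """Heuristic (assumption): treat a module as CUDA-enabled
    if 'CUDA' appears anywhere in its name or version suffix."""
    return "cuda" in module_name.lower()


def keep_test(module_name: str, partition_has_gpu: bool) -> bool:
    """Return True if a test for this module should run on the partition.

    CUDA modules are only kept on partitions that have GPUs;
    non-CUDA modules are kept everywhere.
    """
    return partition_has_gpu or not uses_cuda(module_name)


if __name__ == "__main__":
    # Made-up module names, used only to exercise the filter.
    for module in ("ExampleApp/1.0-foss-2023a", "ExampleApp/1.0-foss-2023a-CUDA-12.1.1"):
        for has_gpu in (False, True):
            print(module, "gpu" if has_gpu else "cpu", keep_test(module, has_gpu))
```

A name-based check is the simplest option but may miss modules that pull in CUDA only as a dependency; a real hook would probably need a more reliable signal.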

Meeting (2024-01-18)

  • test-suite on Hortense

    • Issue 2970 was opened in ReFrame on 24 Aug by @vkarak.
      • Will be addressed by @boegel and Lara.
    • Linked to the following issue 68 in the test-suite.
      • Lara currently submits for every partition; this will be discussed further.
  • Merged #96 which adds --mem to configuration files

    • This was done for each partition separately, but can be done in common for all partitions.
      • Caspar: Agreed, should be in some common config. Who will do it? How?
        • Maybe an options: eessi.testsuite.common_config.get_common_options() entry would be enough?
  • OSU tests

    • Sam reviewed, comments need to be checked by Satish
      • https://github.com/EESSI/test-suite/pull/54#discussion_r1451741808 .
      • Running CUDA modules on the pure CPU nodes using stubs:
        • Currently, a pure CPU test generated for a CUDA module will fail on CPU nodes.
        • For now, remove the CPU tests for CUDA modules.
        • The GROMACS CUDA module runs on CPU devices without complaining, whereas OSU crashes.
        • Should we not allow running CUDA modules on CPU nodes at all?
        • Currently not a blocker, but open an issue.
      • 32 GB of memory for point to point tests is too much.
        • Contact the OSU developers to check this, and to ask for better error reporting.
        • Currently not a blocker, but open an issue.
        • Play with this option: -M, --mem-limit SIZE set per process maximum memory consumption to SIZE bytes
      • Install CUDA OSU module, talk to Snellius system admins and get an update on Caspar's request.
    • Lara tested on Hortense CPU, had issues on GPU but those seemed not specific to OSU.
    • Merge now including collectives and figure out the problems later.
    • Hand the test-suite to other partners.
  • bot now picks up on bot/test.sh and bot/check-test.sh scripts in target repo

    • currently as part of the build phase, in build environment
    • bot is ready but not doing anything for now: OSU and TensorFlow are good candidates.
    • GROMACS tests have been failing.
  • Xin tested docs to see if it was clear how to run (tested on Snellius)

  • MultiXscale deliverable finished and is online.

  • goals for next weeks

    • Sam/Satish: finish OSU PR
    • Sam
      • CUDA samples
      • maybe port over test from VUB test suite to EESSI test suite
    • Kenneth:
      • maybe look into GROMACS CI test
    • Xin:
      • docs
      • ESPResSo test
    • Satish
    • fix GROMACS CI test when there are too many cores
      • skip if there are too many cores available per node
      • print message that there are too many cores available, give useful suggestion
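
The proposed fix for the GROMACS CI test (skip when a node has too many cores, and say why) could look roughly like the sketch below. It shows only the check itself; the function name, the max_cores parameter and the wording of the suggestion are assumptions for the example, and the actual limit for the GROMACS test is not stated in these notes.

```python
def check_core_count(cores_per_node: int, max_cores: int):
    """Decide whether a test should be skipped because the node is too large.

    Returns a (skip, message) tuple; message is empty when the test can run.
    max_cores is a hypothetical, test-specific limit.
    """
    if cores_per_node > max_cores:
        message = (
            f"Skipping test: {cores_per_node} cores available per node, "
            f"but this test supports at most {max_cores}. "
            "Suggestion: run at a smaller scale or use a larger input."
        )
        return True, message
    return False, ""


if __name__ == "__main__":
    # Arbitrary numbers, chosen only to trigger the skip path.
    skip, message = check_core_count(cores_per_node=256, max_cores=128)
    if skip:
        print(message)
```

In a ReFrame test, the returned pair could presumably be handed to the test's skip_if method, so that the run is reported as skipped rather than failed.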

Previous meetings
