
Sync meeting on EESSI test suite (2023-11-08)


EESSI test suite sync meetings

Planning

  • every 2 weeks on Thursday at 14:00 CE(S)T
  • next meetings:
    • Wed 22 Nov'23 10:00 CET: OK for all
    • Wed 6 Dec'23 15:30 CET: OK for all
    • Wed 20 Dec'23 14:00 CET: OK for those who want to be there
    • Wed 3 Jan'24 15:30 CET: unclear for some, to be confirmed
    • Thu 18 Jan'24 14:00 CET

Meeting (2023-11-08)

  • OSU test (PR #54)
    • Sam reviewed it; Satish still needs to take the review comments into account
    • The biggest blocker is still the memory issue. We agreed last time to go for the option of using job options, as described here. We will ask for --mem, i.e. total memory, because that is supported on any job scheduler (a rough sketch of what this could look like in a ReFrame test is included at the end of these notes).
  • Updated CI driving scripts (PR #93)
    • To do: Caspar will update REFRAME_VERSION to 4.3.3 in all the ci-config.sh files
  • CPU autodetection fails because "pip install reframe-hpc==4.3.3" fails (ReFrame issue #3023)
    • will be fixed in upcoming ReFrame release (it's currently on the 4.5 milestone)
  • add scales 1_cpn_2_nodes and 1_cpn_4_nodes (PR #94)
    • someone should test this and make sure it works => Satish
    • the job script generated by ReFrame can be inspected with a dry run (--dry-run)
  • how can we collect/provide/dynamically determine performance reference numbers so that the tests can also be used for performance regression testing?
    • step-by-step
      • come up with a structure for storing/retrieving reference performance numbers (+ upper/lower bound thresholds) for a particular system
        • incl. relevant metadata of the system (CPU, storage, network, ...)
        • just use ReFrame perf logging for this, configured to store the perf log data like we want it to
      • provide an automated way to harvest initial reference perf numbers from recent runs of the test suite
        • create a function that produces a reference performance value plus upper and lower thresholds over all entries for a unique combination of test hash, system, and $EESSI_TESTSUITE_PERF_DATA_LABEL (based on some statistics: average, standard deviation, etc.); see the sketch below the layout example
      • check if this could become a feature in ReFrame itself
      • nice to have: automatically collect initial perf refs if none are available
        • based on similarity of current system with systems for which data is available
        • that's likely quite difficult to do...

Example of the proposed performance data label and directory layout for storing reference numbers:

export EESSI_TESTSUITE_PERF_DATA_LABEL='eessi-2023.06-nov2023'

eessi/testsuite/tests/apps/tensorflow

eessi/testsuite/perf_data/apps/tensorflow/
    README.txt
    hashes.txt => mapping of hashes to test parameters
    hortense/
        rome/
            eessi-2023.06-nov2023.csv
                test_hash,perf_var,perf_value,perf_lower_thresh,perf_upper_thresh
                /deadb33f,ns_day,100,95,105
        milan/
            eessi-2023.06-nov2023.csv
    vega/
    hydra/
    snellius/
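
As a rough sketch of the harvesting step described above (hypothetical code, not part of the test suite): given measured performance values for a single system/partition and a single $EESSI_TESTSUITE_PERF_DATA_LABEL, derive a reference value plus lower/upper thresholds from the mean and standard deviation, and write them out in the CSV layout shown above. All function and field names are made up for illustration.

import csv
import statistics
from collections import defaultdict

# Arbitrary choice for illustration: allow 2 standard deviations around the reference
NUM_STDDEV = 2


def harvest_perf_refs(measurements, num_stddev=NUM_STDDEV):
    """Derive reference performance numbers from raw measurements.

    'measurements' is an iterable of dicts with keys 'test_hash', 'perf_var'
    and 'perf_value', all collected on one system/partition under one
    $EESSI_TESTSUITE_PERF_DATA_LABEL. Returns one row per (test_hash, perf_var)
    combination, matching the CSV columns sketched above.
    """
    grouped = defaultdict(list)
    for entry in measurements:
        grouped[(entry['test_hash'], entry['perf_var'])].append(float(entry['perf_value']))

    rows = []
    for (test_hash, perf_var), values in sorted(grouped.items()):
        mean = statistics.mean(values)
        stddev = statistics.stdev(values) if len(values) > 1 else 0.0
        rows.append({
            'test_hash': test_hash,
            'perf_var': perf_var,
            'perf_value': round(mean, 2),
            'perf_lower_thresh': round(mean - num_stddev * stddev, 2),
            'perf_upper_thresh': round(mean + num_stddev * stddev, 2),
        })
    return rows


def write_perf_refs(rows, csv_path):
    """Write harvested rows to e.g. hortense/rome/eessi-2023.06-nov2023.csv"""
    fieldnames = ['test_hash', 'perf_var', 'perf_value', 'perf_lower_thresh', 'perf_upper_thresh']
    with open(csv_path, 'w', newline='') as fp:
        writer = csv.DictWriter(fp, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def to_reframe_reference(row, system_partition, unit=None):
    """Convert one CSV row (absolute bounds) into a ReFrame 'reference' entry.

    ReFrame reference tuples use relative thresholds, e.g. (100, -0.05, 0.05, 'ns/day')
    for a 95-105 acceptance window; 'system_partition' would be e.g. 'hortense:rome'.
    """
    ref = float(row['perf_value'])
    lower = (float(row['perf_lower_thresh']) - ref) / ref  # e.g. -0.05
    upper = (float(row['perf_upper_thresh']) - ref) / ref  # e.g. +0.05
    return {system_partition: {row['perf_var']: (ref, lower, upper, unit)}}

How the absolute thresholds in the CSV map onto ReFrame's relative ones is something to settle when this gets integrated; the conversion above just illustrates that both forms carry the same information.
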
  • Should we try and set up a meeting with the ReFrame developers on this perf data logging/harvesting idea?
    • Kenneth can contact Vasileios on this via ReFrame Slack
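
For the memory request discussed under the OSU test item above, a minimal sketch of what passing --mem via job options could look like in a ReFrame test (hypothetical test and hook names, and an arbitrary 4G value, purely for illustration; the actual implementation for PR #54 may look different):

import reframe as rfm
import reframe.utility.sanity as sn
from reframe.core.builtins import run_before, sanity_function


@rfm.simple_test
class mem_request_example(rfm.RunOnlyRegressionTest):
    """Hypothetical minimal test that only illustrates requesting total memory."""
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'true'

    @run_before('run')
    def request_total_memory(self):
        # Pass --mem (total memory, as agreed in the meeting) to the scheduler;
        # the 4G value is just a placeholder.
        self.job.options += ['--mem=4G']

    @sanity_function
    def assert_ran(self):
        return sn.assert_true(True)

ReFrame's extra_resources mechanism (with per-partition 'resources' entries in the site configuration) would be another way to express this, but the sketch above follows the 'job options' route that was agreed on.
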


Previous meetings
