Sync meeting on EESSI test suite (2024 02 01)
- every 2 weeks on Thursday at 14:00 CE(S)T
- next meetings:
- Thu 1 Feb'24 14:00 CET
- Thu 15 Feb'24 14:00 CET
- Thu 29 Feb'24 14:00 CET
- Thu 14 Mar'24 14:00 CET
-
test-suite on Hortense
- Issue 2970 was opened in ReFrame 24 Aug by @vkarak.
- Will be addressed by @boegel and Lara.
- Linked to the following issue 68 in the test-suite.
- Lara currently submits for every partition; this will be discussed further.
- Lara will clarify in this issue that passing `--partition` to `sbatch` on the command line solved our issue, and will ask whether ReFrame can do this itself (based on the `--access` configuration) and/or whether it could be configured separately.
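For context on the `access` setting mentioned above, a minimal sketch of where scheduler access options sit in a ReFrame site configuration. The system, partition and environment names are placeholders for illustration, not the actual Hortense configuration.

```python
# Hypothetical fragment of a ReFrame site configuration.
# All names below are placeholders, not the real Hortense setup.
site_configuration = {
    'systems': [
        {
            'name': 'example_system',
            'hostnames': ['.*'],
            'partitions': [
                {
                    'name': 'cpu_example',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    # Scheduler options that ReFrame attaches to jobs
                    # submitted to this partition.
                    'access': ['--partition=cpu_example'],
                    'environs': ['default'],
                },
            ],
        },
    ],
}
```

Whether these options can also be passed on the `sbatch` command line is exactly the open question raised in the issue.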
-
OSU tests
- It's merged!
- Three updates to be made:
- `LD_LIBRARY_PATH` removed (see below) [Satish]
- Docs [Lara]
- Changing name of the test: OSU_Microbenchmark_EESSI [Kenneth], TensorFlow_EESSI [Kenneth] => https://github.com/EESSI/test-suite/pull/108
-
CUDA modules on pure CPU partitions #101
- Isn't that test specific? GROMACS works fine, because it does a dlopen in the code path. Executables that are dynamically linked to CUDA cannot be run on non-GPU nodes.
- Yes, but the easiest is just to say 'we never use CUDA modules on pure CPU nodes', and it's a very clear rule.
- Satish will implement this and roll back the `LD_LIBRARY_PATH` change in the OSU test
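The rule "never use CUDA modules on pure CPU nodes" could be sketched as a small filter like the one below. This is not the actual hook: the function name, the idea of detecting CUDA modules by a `CUDA` substring in the module name, and the `gpu` feature label are all assumptions made for illustration.

```python
from typing import Iterable


def should_skip_cuda_on_cpu(module_name: str, partition_features: Iterable[str]) -> bool:
    """Return True when a CUDA module is paired with a partition that has no GPUs.

    Assumptions (hypothetical): CUDA builds carry 'CUDA' in the module name,
    and GPU partitions advertise a 'gpu' feature.
    """
    is_cuda_module = 'cuda' in module_name.lower()
    has_gpu = 'gpu' in set(partition_features)
    return is_cuda_module and not has_gpu
```

A test hook could call such a function for each (module, partition) combination and drop the combination when it returns `True`.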
-
bot now picks up on `bot/test.sh` and `bot/check-test.sh` scripts in target repo
- How do we proceed? Who?
- We will just run TF & OSU on 1 node (or 2 cores in case of OSU point-to-point), irrespective of which software was installed
- Next step will be to filter on relevant tests that are related to the actual change in the software-layer PR.
- Caspar will have a look at this
-
Test suite doc improvements from Xin here
- Needs another review? => Lara will have another look
-
Filter out incompatible scales #100
- Good idea. Any idea how? Who can do it?
- Caspar will take a stab at this
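One possible shape for such a filter, assuming "incompatible" means that a scale asks for more nodes or more cores per node than a partition offers (the notes do not spell this out). The scale table and field names below are hypothetical and do not mirror the test suite's real definitions.

```python
# Hypothetical scale table: None for cores means "use the full node".
EXAMPLE_SCALES = {
    '1_core': {'num_nodes': 1, 'num_cpus_per_node': 1},
    '4_cores': {'num_nodes': 1, 'num_cpus_per_node': 4},
    '1_node': {'num_nodes': 1, 'num_cpus_per_node': None},
    '2_nodes': {'num_nodes': 2, 'num_cpus_per_node': None},
    '16_nodes': {'num_nodes': 16, 'num_cpus_per_node': None},
}


def compatible_scales(scales, max_nodes, cores_per_node):
    """Return the names of scales that fit within the partition limits."""
    keep = []
    for name, request in scales.items():
        if request['num_nodes'] > max_nodes:
            continue  # asks for more nodes than the partition has
        cores = request['num_cpus_per_node']
        if cores is not None and cores > cores_per_node:
            continue  # asks for more cores than a node has
        keep.append(name)
    return keep
```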
-
Discuss ReFrame meeting yesterday
- Two options:
- Option 1: We use the perflog mechanism that's already there, and add a field to indicate if the result should be used as reference.
- Challenge: what if you upgrade your system? You'll have to alter the field that indicates if results are used as reference and put all those to
- Option 2: Have ReFrame export/add performance numbers from a run to a database (e.g. passing `reframe --export-references=<my_sql_database>`), together with the test hash + system + partition. Then, have ReFrame read those performance numbers (or an average) from a query on that SQL database (`reframe --use-reference=<my_sql_database>`)
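To make Option 2 more concrete, a rough sketch of the store-and-query idea using Python's built-in `sqlite3`. The command-line flags above are proposals from the meeting, not existing ReFrame options; the table layout, column names and function names here are likewise assumptions.

```python
import sqlite3


def _ensure_table(conn):
    # Hypothetical schema: one row per performance value from a run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS perf_refs ("
        "test_hash TEXT, system_name TEXT, partition_name TEXT, "
        "perf_var TEXT, value REAL)"
    )


def export_reference(conn, test_hash, system, partition, perf_var, value):
    """Store one performance number, keyed by test hash + system + partition."""
    _ensure_table(conn)
    conn.execute(
        "INSERT INTO perf_refs VALUES (?, ?, ?, ?, ?)",
        (test_hash, system, partition, perf_var, value),
    )
    conn.commit()


def lookup_reference(conn, test_hash, system, partition, perf_var):
    """Return the average stored value, or None if nothing was recorded."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT AVG(value) FROM perf_refs "
        "WHERE test_hash = ? AND system_name = ? "
        "AND partition_name = ? AND perf_var = ?",
        (test_hash, system, partition, perf_var),
    ).fetchone()
    return row[0]
```

Keying on the test hash means that a changed test definition starts with an empty reference set, which is one way to sidestep stale references.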
-
Kenneth will create an issue to update the common_config so that it picks up on `EESSI_CVMFS_REPO` to select the right repository (based on the current environment)
-
Satish is working on OpenFOAM + ESPResSo
- Have a look at the example for fixtures, it also contains examples of how to reuse the stage dir in the dependent tests
-
Sam will look into the httpjson perflog handler
- docs: see `logging.handlers_perflog.type`
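As a starting point, a sketch of what an `httpjson` entry under `handlers_perflog` could look like, using the handler fields described in the ReFrame configuration reference (`type`, `url`, `level`, `extras`). The URL and the `extras` values are placeholders.

```python
# Hypothetical perflog handler entry; URL and extras are placeholders.
httpjson_handler = {
    'type': 'httpjson',
    'url': 'http://httpjson-server.example.org:12345/rfm',
    'level': 'info',
    # Extra key/value pairs added to each JSON record that is sent.
    'extras': {'facility': 'reframe'},
}

# Where the handler would sit in a ReFrame logging configuration.
logging_config = {
    'logging': [
        {'handlers_perflog': [httpjson_handler]},
    ],
}
```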
TODO before 0.2.0 release:
- common_config update to use `EESSI_CVMFS_REPO`, so that it can be used with `software.eessi.io`
- GROMACS PR to software-layer
- OSU test LD_LIBRARY_PATH removed and implement hook for filtering CUDA module based tests on pure CPU nodes
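For the `common_config` item in the list above, picking up `EESSI_CVMFS_REPO` could be as simple as the sketch below. The function name and the fallback value are assumptions; the notes do not say what should happen when the variable is unset.

```python
import os

# Assumed fallback for illustration only.
DEFAULT_CVMFS_REPO = '/cvmfs/software.eessi.io'


def get_cvmfs_repo(environ=None):
    """Return the repository path from EESSI_CVMFS_REPO, or the fallback."""
    if environ is None:
        environ = os.environ
    return environ.get('EESSI_CVMFS_REPO', DEFAULT_CVMFS_REPO)
```

Taking the environment as an optional argument keeps the helper easy to test without touching the real environment.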
-
test-suite on Hortense
-
Merged #96 which adds `--mem` to configuration files
- This was done for each partition and can be done commonly for all partitions.
- Caspar: Agreed, should be in some common config. Who will do it? How?
- Maybe an `options: eessi.testsuite.common_config.get_common_options()` would be enough?
-
OSU tests
- Sam reviewed, comments need to be checked by Satish
- https://github.com/EESSI/test-suite/pull/54#discussion_r1451741808 .
- Running CUDA modules on the pure CPU nodes using stubs:
- Currently, a pure `cpu` test generated for a CUDA module will fail on `cpu` nodes.
- For now, remove the `cpu` tests from CUDA modules.
- The GROMACS CUDA module runs on CPU devices without complaining, whereas OSU crashes.
- Should we not allow running CUDA modules on CPU nodes at all?
- Currently not a blocker, but open an issue.
- 32 GB of memory for point to point tests is too much.
- Contact OSU for checking this and also better error reporting.
- Currently not a blocker, but open an issue.
- Play with this option: `-M, --mem-limit SIZE` (set per process maximum memory consumption to SIZE bytes)
- Install CUDA OSU module, talk to Snellius system admins and get an update on Caspar's request.
- Lara tested on Hortense CPU, had issues on GPU but those seemed not specific to OSU.
- Merge now including collectives and figure out the problems later.
- Hand the test-suite to other partners.
-
bot now picks up on `bot/test.sh` and `bot/check-test.sh` scripts in target repo
- currently as part of the build phase, in build environment
- bot is ready but not doing anything for now: OSU and TensorFlow are good candidates.
- GROMACS tests have been failing.
-
Xin tested docs to see if it was clear how to run (tested on Snellius)
- Some issues, you have to be quite careful in what you do
- Xin will create issue / PR with suggestions for improvement
- PR is opened and WIP.
-
MultiXscale deliverable finished and is online.
-
goals for next weeks
- Sam/Satish: finish OSU PR
- Sam
- CUDA samples
- maybe port over test from VUB test suite to EESSI test suite
- Kenneth:
- maybe look into GROMACS CI test
- Xin:
- docs
- ESPResSo test
- Satish
- ESPResSo test along with Xin.
- OpenFOAM test (https://github.com/eessi/eessi-demo)
- fix GROMACS CI test when there are too many cores
- skip if there are too many cores available per node
- print message that there are too many cores available, give useful suggestion
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2024-01-18)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-12-06)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-11-22)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-11-08)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-10-19)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-10-04)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-09-20)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-09-06)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-08-25)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-08-09)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-software-testing-(27%E2%80%9007%E2%80%902023)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-06-28)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-06-15)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-05-31)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-05-17)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-04-20)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-03-30)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-03-10) (incl. 2023-02-23)
- https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-test-suite-(2023-02-09)