
Sync meeting on EESSI test suite (2024 03 28)

Kenneth Hoste edited this page Mar 28, 2024 · 1 revision

EESSI test suite sync meetings

Planning

  • every 2 weeks on Thursday at 14:00 CE(S)T
  • next meetings:
    • Thu 18th April 2024 14:00 CEST
      • clash with EESSI/AWS sync on ISC'24 for Kenneth (+ Lara?)
      • => rescheduled to 13:00 CEST
    • Thu 2nd May 2024
      • clash with EESSI update meeting
      • => rescheduled to Fri 3 May 2024 13:00 CEST
    • Thu 16 May 2024
      • clash with ISC'24 (for Lara+Kenneth)
      • => rescheduled to Thu 23 May 2024 14:00 CEST
    • Thu 30 May 2024 => cancelled
    • Thu 13 June 2024 => clash with AWS/EESSI sync for Kenneth?

Meeting (2024-03-28)

attending: Caspar, Sam, Lara, Kenneth

  • ESPResSo test

    • Satish + Xin are in touch with Jean-Noël
    • due M18 in MultiXscale
  • Issue on Karolina with daily runs:

    • Reason: file not found error: [Errno 2] No such file or directory: 'sbatch': 'sbatch'
    • Not sure how I didn't notice, but it may never have worked on Karolina...
    • The issue is that the cron job runs in a non-login shell, so sbatch is not on $PATH
    • Working solution: source /etc/bashrc first, i.e.:
      ```
      0 0 * * * source /etc/bashrc; EESSI_CI_SYSTEM_NAME=it4i_karolina REFRAME_ARGS="--tag 1_node|2_nodes" /home/it4i-casparl/EESSI_CI/CI/run_reframe_wrapper.sh
      0 0 * * SUN source /etc/bashrc; EESSI_CI_SYSTEM_NAME=it4i_karolina REFRAME_ARGS="--tag 1_node|2_nodes|4_nodes|8_nodes|16_nodes" /home/it4i-casparl/EESSI_CI/CI/run_reframe_wrapper.sh
      ```
    • Another option could be to use bash -l -c "..." in the cron job, so the command runs in a login shell
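The following is a minimal Python sketch of how the two crontab variants could be generated. Its main purpose is to get the quoting right for the bash -l -c option. The command contains both double quotes and a `|`, so it is safest to single-quote it as a whole with shlex.quote.

Some details are assumptions or depend on the site:

* The helper name cron_line is hypothetical.
* Whether a login shell actually puts sbatch on $PATH depends on how Karolina's /etc/profile is set up.
* cron runs commands with /bin/sh unless SHELL is set in the crontab. `source` therefore only works where /bin/sh is bash.
* crontab also treats an unescaped `%` in a command as a newline. That character does not occur in these commands.

```python
import shlex

# Command from the daily Karolina crontab entry above.
DAILY_CMD = (
    'EESSI_CI_SYSTEM_NAME=it4i_karolina REFRAME_ARGS="--tag 1_node|2_nodes" '
    "/home/it4i-casparl/EESSI_CI/CI/run_reframe_wrapper.sh"
)


def cron_line(schedule: str, command: str, login_shell: bool = False) -> str:
    """Build a crontab line (hypothetical helper) that loads the system environment.

    login_shell=False: prefix `source /etc/bashrc;` (the working solution).
    login_shell=True:  wrap the command in `bash -l -c '...'` (the alternative).
    """
    if login_shell:
        # shlex.quote wraps the command in single quotes, so the inner double
        # quotes and the '|' in REFRAME_ARGS reach `bash -c` unchanged.
        return f"{schedule} bash -l -c {shlex.quote(command)}"
    return f"{schedule} source /etc/bashrc; {command}"


if __name__ == "__main__":
    print(cron_line("0 0 * * *", DAILY_CMD))
    print(cron_line("0 0 * * *", DAILY_CMD, login_shell=True))
```

The login-shell variant avoids depending on /bin/sh understanding `source`. It does, however, pull in everything the login profile sets up, which may include more than is needed.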
  • First PR from someone who was not involved in the initial EESSI test suite design (but who is a MultiXscale partner): #128

    • Only a few changes were needed to make it run, which is encouraging: it shows our hooks generalize reasonably well
    • Test for Quantum ESPRESSO, built on top of the test that was added to the ReFrame hpctestlib (see ReFrame PR #3134)
      • implies that we need a new ReFrame release that includes this
    • Shows the need for good documentation on writing a (portable) test
      • incl. info on available hooks, how to use them, etc.
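Documenting the available hooks could start from their docstrings. Below is a minimal sketch of a one-line-per-hook reference built with nothing but Python's inspect module.

The demo_hooks module and example_hook are hypothetical stand-ins, not the actual EESSI test suite hooks. In practice a documentation generator would probably do this job. The idea is the same either way: if the docstrings are kept accurate, the reference can be generated.

```python
import inspect
import types


def hook_reference(module: types.ModuleType) -> str:
    """Render one line per public function in `module`: name, signature, docstring summary."""
    lines = []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(func) or "(undocumented)"
        lines.append(f"{name}{inspect.signature(func)}: {doc.splitlines()[0]}")
    return "\n".join(lines)


# Hypothetical stand-in for a hooks module, built in memory for the example.
demo_hooks = types.ModuleType("demo_hooks")
exec(
    "def example_hook(test, compute_unit='cpu'):\n"
    "    '''Hypothetical hook: set task counts for the given compute unit.'''\n"
    "\n"
    "def _private_helper():\n"
    "    pass\n",
    demo_hooks.__dict__,
)

if __name__ == "__main__":
    print(hook_reference(demo_hooks))
```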
  • Started PyTorch PR #130

    • Started with a torchvision-based test
    • Currently relying on $SLURM_* environment variables; can we avoid that somehow?
    • Failing because certain environment variables need to be set: how do I set them in a launcher-agnostic way?
      • => question for ReFrame?
        • see ReFrame Slack
      • => TODO (Caspar): open an issue on this
    • TODO: make a torch-only basic test (or expand the torchvision test with one predefined model?)
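One possible way to avoid hard-coding $SLURM_* in the test is to map the launcher's rank information onto the variables that torch.distributed's env:// initialisation reads, RANK and WORLD_SIZE. LOCAL_RANK is the variable torchrun sets for per-node device selection.

Below is a minimal sketch. It assumes the launcher is one of these:

* Slurm's srun
* Open MPI's mpirun
* an MPICH/Hydra-style mpirun that sets PMI_RANK/PMI_SIZE

Some limits apply:

* The function name is hypothetical.
* MASTER_ADDR/MASTER_PORT still have to be set separately.
* Whether this fits ReFrame's launcher abstraction is exactly the open question above.

```python
from collections.abc import Mapping

# (rank, world size, local rank) variable names per launcher. MPI launchers are
# checked before Slurm because SLURM_* variables may be inherited from the batch
# job when mpirun is used inside an allocation.
LAUNCHER_VARS = [
    ("OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_LOCAL_RANK"),  # Open MPI
    ("PMI_RANK", "PMI_SIZE", None),  # MPICH/Hydra-style; no local rank used here
    ("SLURM_PROCID", "SLURM_NTASKS", "SLURM_LOCALID"),  # srun
]


def torch_dist_env(env: Mapping[str, str]) -> dict:
    """Return RANK/WORLD_SIZE (and LOCAL_RANK if known) from the first launcher found in `env`."""
    for rank_var, size_var, local_var in LAUNCHER_VARS:
        if rank_var in env and size_var in env:
            result = {"RANK": env[rank_var], "WORLD_SIZE": env[size_var]}
            if local_var is not None and local_var in env:
                result["LOCAL_RANK"] = env[local_var]
            return result
    raise RuntimeError("no known launcher environment variables found")
```

In the test script this could be used as os.environ.update(torch_dist_env(os.environ)) before calling torch.distributed.init_process_group with init_method="env://".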
  • Tiny update to the README on how to test PRs: #129

  • Satish: OpenFOAM test

    • Status? (unclear)
    • Lots of single-node benchmarks for OpenFOAM are available from https://exafoam.eu/benchmarks
      • a deliverable with more details (incl. performance references) should be available soon
      • The 'waterdrop' benchmark is very scalable: the grid size can be adjusted. It probably also scales to multiple nodes.
      • Would be good to use, since it shows collaboration between MultiXscale and other CoEs
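The grid size of the waterdrop case can be adjusted, so the same case could in principle serve for weak-scaling runs on 1, 2, 4, ... nodes. Below is a small sketch of the arithmetic behind that.

The sketch rests on two assumptions:

* Refinement is uniform in all `dims` dimensions, so refining by a factor r multiplies the cell count by r**dims.
* The target number of cells per core is fixed.

The base cell count, cores per node and cells per core below are placeholders, not numbers from the exaFOAM benchmarks.

```python
def refinement_factor(base_cells: int, cores: int, cells_per_core: int, dims: int = 3) -> float:
    """Per-dimension factor r such that base_cells * r**dims == cores * cells_per_core."""
    target_cells = cores * cells_per_core
    return (target_cells / base_cells) ** (1.0 / dims)


if __name__ == "__main__":
    # Placeholder numbers: 1M-cell base mesh, 128 cores per node, 50k cells per core.
    for nodes in (1, 2, 4, 8, 16):
        r = refinement_factor(1_000_000, nodes * 128, 50_000)
        print(f"{nodes:>2} node(s): refine each dimension by {r:.2f}")
```

In practice the mesh resolution is an integer cell count per direction, so r would be rounded. The cells per core will then only be approximately constant.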
  • ESPResSo

    • More interaction with Jean-Noël as application expert; meeting with Xin, Satish and Caspar right after this one to make a start.
  • LAMMPS

    • Added the necessary files and example scripts to a LAMMPS directory (see Lara's branch)
      • open a PR to make it easier for others to give feedback?
  • Test the httpjson log handler in ReFrame (Sam)
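To test the httpjson handler without a real backend, a throwaway local endpoint that records the JSON bodies it receives may be enough. The handler's url in the perflog handlers of the ReFrame configuration would then point at this endpoint while running a test.

Below is a minimal sketch using only the Python standard library. It assumes the handler sends each record as a JSON body in an HTTP POST. The port and all names are arbitrary choices.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # JSON records received so far


class CaptureHandler(BaseHTTPRequestHandler):
    """Accept POSTed JSON records, store and print them, and reply 200 OK."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record = json.loads(self.rfile.read(length))
        RECEIVED.append(record)
        print(json.dumps(record, indent=2))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def start_receiver(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port=0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_record(url: str, record: dict) -> int:
    """POST one JSON record to `url` as a smoke test; returns the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 8000), CaptureHandler)
    print("Listening on http://127.0.0.1:8000 (Ctrl+C to stop)")
    server.serve_forever()
```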

  • CP2K (Sam)

    • Status? no updates
  • Create a "How to contribute to the test suite" page in the EESSI documentation.

    • A few examples, including one with MPI.
    • Document our hooks
      • API style (cfr. EasyBuild)
    • Probably also a tutorial on how to implement a portable test
      • starting very simple (single-core hello world?)
      • gradually build up to more complex/specific things
        • multi-core/node (how many cores/nodes?)
        • supporting both CPU and GPU
    • Who will do this?
      • Kenneth can help, but not before mid May
      • Organise a 1h "hacking session" on this via HackMD, incl. Satish => ask Satish to organise it
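A tutorial on portable tests has to answer "how many cores/nodes?" and "CPU or GPU?". That step can be illustrated without ReFrame. A scale tag, such as the 1_node/2_nodes tags in the cron jobs above, translates into a node count and a task count once the cores or GPUs per node are known.

Below is a minimal sketch of that idea. The following are all hypothetical choices, not how the EESSI hooks are actually implemented:

* the function name
* the tag parsing
* one task per core for CPU tests
* one task per GPU for GPU tests

```python
import re


def tasks_for_scale(scale: str, cores_per_node: int, gpus_per_node: int = 0,
                    compute_unit: str = "cpu") -> tuple[int, int]:
    """Map a scale tag such as '1_node' or '4_nodes' to (num_nodes, num_tasks).

    Hypothetical tutorial helper: one task per core for CPU tests,
    one task per GPU for GPU tests.
    """
    match = re.fullmatch(r"(\d+)_nodes?", scale)
    if match is None:
        raise ValueError(f"unsupported scale tag: {scale!r}")
    if compute_unit == "cpu":
        tasks_per_node = cores_per_node
    elif compute_unit == "gpu":
        if gpus_per_node < 1:
            raise ValueError("GPU test requested on nodes without GPUs")
        tasks_per_node = gpus_per_node
    else:
        raise ValueError(f"unknown compute unit: {compute_unit!r}")
    num_nodes = int(match.group(1))
    return num_nodes, num_nodes * tasks_per_node


if __name__ == "__main__":
    for tag in ("1_node", "2_nodes", "4_nodes"):
        print(tag, tasks_for_scale(tag, cores_per_node=128))
```

A real test would also need sub-node scales, such as a single core. This sketch deliberately rejects those.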

Next steps?

...


Previous meetings
