EESSI hackathon list of potential tasks (Nov'21)

Kenneth Hoste edited this page Dec 17, 2021 · 1 revision

[01] Nice overview of EESSI software stack

task lead: ??? people working on this: ...

  • table included in documentation at https://eessi.github.io/docs
  • columns: supported CPU targets
  • rows: software
  • automatically generated based on available modules
  • regularly updated (GitHub Actions workflow?)
  • separate page per EESSI version
  • maybe even dynamic page?
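
One possible shape for the generation step, as a minimal sketch: it assumes the list of available modules per CPU target has already been collected somehow (for example by walking the module tree), and only shows how to pivot that into a software-by-target table. The function names, the target labels, and the software versions in the sample are hypothetical placeholders, not taken from the actual EESSI repository.

```python
from collections import defaultdict


def build_overview(modules_by_target):
    """Pivot {target: [module, ...]} into (sorted targets, {module: set(targets)})."""
    targets = sorted(modules_by_target)
    availability = defaultdict(set)
    for target, modules in modules_by_target.items():
        for module in modules:
            availability[module].add(target)
    return targets, dict(availability)


def render_markdown(targets, availability):
    """Render one row per module and one column per CPU target."""
    header = "| software | " + " | ".join(targets) + " |"
    separator = "|" + " --- |" * (len(targets) + 1)
    lines = [header, separator]
    for module in sorted(availability):
        cells = ["yes" if t in availability[module] else "no" for t in targets]
        lines.append("| " + module + " | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical sample input; real data would come from the available modules.
sample = {
    "x86_64/generic": ["GROMACS/2020.1", "OpenFOAM/8"],
    "aarch64/generic": ["GROMACS/2020.1"],
}
targets, availability = build_overview(sample)
table = render_markdown(targets, availability)
print(table)
```

A scheduled job could run a script like this per EESSI version and commit the resulting page to the documentation, which would cover the "regularly updated" and "separate page per EESSI version" points.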

[02] Installing software on top of EESSI

task lead: ??? people working on this: Martin Errenst, ...

  • clear step-by-step documentation on how to install software on top of EESSI
  • attention points:
    • ensure compat layer is used (don't link to host libraries)
    • RPATH linking
  • different use cases:
    • using EasyBuild (--sysroot, --rpath, ...)
    • manually
  • partly to highlight potential points for improvement
    • providing script or module file to automate/hide away some manual/tedious stuff
    • include wrapper scripts with compiler installations?
  • relevant issues:

[03] Workflow to propose additions to EESSI software stack

task lead: ??? (Bob? Kenneth?) people working on this: ... max: 5 people (who are familiar with Python)

  • Requirements?
    • EasyBuild support
    • tests
  • Via pull request to EESSI/software-layer repository
    • Format? (easyconfigs, easystack, ...)
  • Semi-automatic building of missing software installations + modules
    • only when requirements are met (tests!)
    • after approval by human reviewer
    • fully autonomous build cycle (no human interaction, no startprefix)
  • Automated ingestion into EESSI repository when PR is merged
    • via EESSI/staging repository
  • relevant issues:

[04] Expand EESSI software stack

task lead: ??? people working on this: ...

  • Look for "interesting" software to include in EESSI
  • Work with software developers interested in EESSI?
  • Evaluate compatibility with different CPU targets (x86_64, aarch64, ppc64le)
  • Test installation in EESSI
    • Collect info on problems that pop up
    • Try to fix them (PRs to EasyBuild)
  • examples:
    • PyTorch
    • NWChem
    • bioinformatics pipelines
  • relevant issues:

[05] GPU support

task lead: ??? people working on this: ...

[06] EESSI test suite

task lead: ??? people working on this: ...

  • kickstart (or expand) EESSI test suite
  • using ReFrame
  • different types: smoke tests, app tests, ...
  • separate repository?
  • cf. Caspar's PRs

[07] Monitoring

task lead: ??? people working on this: ...

  • start setting up proper monitoring + alerting
  • daily running of smoke tests, send alerts on failures
  • regularly (hourly?) checking on Stratum-1 health
    • still reachable
    • actively synchronising with Stratum-0
    • serving correct revision of EESSI repository
    • outgoing network traffic (bandwidth)
    • latency?
  • performance monitoring
    • requires baseline first
    • single-node, multi-node
  • relevant issues:
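
For the "serving correct revision" and "actively synchronising" checks, a rough sketch of one approach: compare the revision number in the repository manifest (`.cvmfspublished`) served by a Stratum-1 with the one served by the Stratum-0. The sketch assumes the manifest layout in which each line starts with a one-letter key (`S` for the revision, `N` for the repository name, `T` for the publish timestamp) and the signature follows a `--` line. The helper names and the sample manifests are made up for illustration; fetching the manifests over HTTP is left out.

```python
def parse_manifest(raw: bytes) -> dict:
    """Parse the text part of a .cvmfspublished manifest into {key letter: value}."""
    # Everything after the "--" line is the signature, which may be binary.
    text_part = raw.split(b"\n--\n", 1)[0].decode("ascii", errors="replace")
    fields = {}
    for line in text_part.splitlines():
        if line:
            fields[line[0]] = line[1:]
    return fields


def revision_lag(stratum0_raw: bytes, stratum1_raw: bytes) -> int:
    """Number of revisions the Stratum-1 is behind the Stratum-0."""
    stratum0_revision = int(parse_manifest(stratum0_raw)["S"])
    stratum1_revision = int(parse_manifest(stratum1_raw)["S"])
    return stratum0_revision - stratum1_revision


# Hypothetical manifests, shortened to the fields used here.
sample_stratum0 = b"Cabc123\nD240\nS105\nNexample.repo.org\nT1637000000\n--\ndeadbeef\n\x00\x01"
sample_stratum1 = b"Cabc122\nD240\nS103\nNexample.repo.org\nT1636990000\n--\ndeadbeee\n\x00\x02"

lag = revision_lag(sample_stratum0, sample_stratum1)
print(f"Stratum-1 is {lag} revision(s) behind")
```

An hourly job could raise an alert when the lag stays above some threshold for several consecutive checks; what threshold is acceptable is an open question here.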

[08] Setting up a (private) Stratum-1

task lead: ??? people working on this: ...

  • Evaluate Ansible playbook
  • Review & extend documentation
  • Cluster setup: proxy, caching, node configuration, ...

[09] Risk analysis (continued)

task lead: ??? people working on this: ...

  • Document risks of adopting EESSI
  • Also outline mitigations, etc.

[10] Performance evaluation

task lead: ??? people working on this: ...

  • Compare running applications provided by EESSI against other options (conda, containers, system installations, ...)
    • ease of adoption, required effort
    • performance
  • Using EESSI for large-scale runs
    • start-up performance
    • jitter?

[11] Distribute building of software across multiple nodes

task lead: ??? people working on this: ...

[12] EasyBuild issues & PRs related to EESSI

task lead: ??? people working on this: ...

[13] Document resources available to EESSI

task lead: ??? people working on this: ...

[14] More user-friendly documentation on accessing EESSI

task lead: ??? people working on this: @wpoely86, ...

  • using pre-built CernVM-FS packages
  • on different Linux distros
  • with containers vs native
  • with alien cache or direct or using proxy
  • troubleshooting tips
  • different use cases
    • end user on personal workstation or Linux VM => install CernVM-FS for native access
    • end user on HPC system (no admin privileges)
      • => use container
      • also set up alien cache for multi-node jobs
    • HPC support team
      • install CernVM-FS
      • set up private Stratum-1
      • worker node/cache configuration

[15] Set up autoscaling self-hosted GitHub Runners

task lead: ??? people working on this: ...

[16] Export a version of the EESSI stack to a tarball and/or container image

task lead: ??? people working on this: jpecar, ...

  • Develop script(s) to dump an existing EESSI stack into a tarball and/or container image
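
As a starting point, a minimal sketch of the tarball half of this task using Python's standard `tarfile` module. It assumes the stack is already mounted and readable as a normal directory tree; the function name `export_stack` and the demo directory are hypothetical, and a real script would point at the actual EESSI version directory instead.

```python
import tarfile
import tempfile
from pathlib import Path


def export_stack(stack_dir, tarball_path):
    """Pack stack_dir (recursively) into a gzip-compressed tarball."""
    stack_dir = Path(stack_dir)
    with tarfile.open(str(tarball_path), "w:gz") as archive:
        # Symlinks are stored as symlinks (tarfile's default), not followed.
        archive.add(str(stack_dir), arcname=stack_dir.name)
    return Path(tarball_path)


# Demo on a throwaway directory tree instead of a real EESSI stack.
with tempfile.TemporaryDirectory() as tmp:
    demo_stack = Path(tmp) / "demo-stack"
    (demo_stack / "software").mkdir(parents=True)
    (demo_stack / "software" / "hello.txt").write_text("hello\n")
    tarball = export_stack(demo_stack, Path(tmp) / "demo-stack.tar.gz")
    with tarfile.open(str(tarball)) as archive:
        exported_names = sorted(archive.getnames())

print(exported_names)
```

Whether a single tarball is practical for a full stack (size, absolute paths baked into installations) would need to be checked as part of the task.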