EESSI hackathon list of potential tasks (Nov'21)
task lead: ??? people working on this: ...
- table included in documentation at https://eessi.github.io/docs
- columns: supported CPU targets
- rows: software
- automatically generated based on available modules (see the sketch after this list)
- regularly updated (GitHub Actions workflow?)
- separate page per EESSI version
- maybe even dynamic page?
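A minimal sketch of how such a table could be generated, assuming the module tree layout of the EESSI pilot repository (repository path and layout are assumptions):

```bash
#!/bin/bash
# Minimal sketch: list available modules per CPU target by scanning the
# module trees of one EESSI version (repository path and layout assumed)
EESSI_ROOT=/cvmfs/pilot.eessi-hpc.org/versions/2021.06/software/linux
find "${EESSI_ROOT}" -type d -path '*/modules/all' | while read -r moddir; do
    target=${moddir#"${EESSI_ROOT}"/}        # e.g. x86_64/intel/haswell/modules/all
    echo "== CPU target: ${target%/modules/all} =="
    find "${moddir}" -name '*.lua' | sed -e "s|${moddir}/||" -e 's|\.lua$||' | sort
done
```

Pivoting this output into a software-by-target table (and committing it to the docs from a GitHub Actions workflow) is then straightforward.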
task lead: ??? people working on this: Martin Errenst, ...
- clear step-by-step documentation on how to install software on top of EESSI
- attention points:
- ensure compat layer is used (don't link to host libraries)
- RPATH linking
- different use cases:
- using EasyBuild (--sysroot, --rpath, ...) - see the sketch after this list
- manually
- partially to highlight potential points for improvement
- providing script or module file to automate/hide away some manual/tedious stuff
- include wrapper scripts with compiler installations?
- relevant issues:
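A minimal sketch of the EasyBuild use case, assuming the environment set up by the EESSI init script (the init path, EESSI_EPREFIX, and the easyconfig name are assumptions):

```bash
# Minimal sketch: configure EasyBuild to install on top of EESSI
# (init script path and variable names are assumptions)
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module load EasyBuild
export EASYBUILD_SYSROOT=${EESSI_EPREFIX}      # build against the compat layer
export EASYBUILD_RPATH=1                       # RPATH linking, no host libraries
export EASYBUILD_PREFIX=${HOME}/eessi-extra    # local installation prefix
eb --robot SomeSoftware-1.0-foss-2021a.eb      # hypothetical easyconfig
module use ${HOME}/eessi-extra/modules/all     # pick up the new module
```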
task lead: ??? (Bob? Kenneth?) people working on this: ... max: 5 people (who are familiar with Python)
- Requirements?
- EasyBuild support
- tests
- Via pull request to EESSI/software-layer repository
- Format? (easyconfigs, easystack, ...)
- Semi-automatic building of missing software installations + modules
- only when requirements are met (tests!)
- after approval by human reviewer
- fully autonomous build cycle (no human interaction, no startprefix)
- Automated ingestion into EESSI repository when PR is merged
- via EESSI/staging repository
- relevant issues:
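A rough sketch of what the contribution flow could look like, assuming an easystack-style file in the software-layer repository (the file name and easyconfig are assumptions):

```bash
# Rough sketch of the contribution flow (file/branch names are assumptions)
git clone https://github.com/EESSI/software-layer.git
cd software-layer
git checkout -b add_SomeSoftware
# request a missing installation, e.g. via an easystack-style file
echo '  - SomeSoftware-1.0-foss-2021a.eb' >> eessi-2021.06.yml
git commit -am 'add SomeSoftware 1.0'
# open a pull request; after tests pass and a human reviewer approves,
# the (semi-)automatic build and ingestion can be triggered
```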
task lead: ??? people working on this: ...
- Look for "interesting" software to include into EESSI
- Work with software developers interested in EESSI?
- Evaluate compatibility with different CPU targets (x86_64, aarch64, ppc64le)
- Test installation in EESSI
- Collect info on problems that pop up
- Try to fix them (PRs to EasyBuild)
- examples:
- PyTorch
- NWChem
- bioinformatics pipelines
- relevant issues:
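A first compatibility check for a candidate package could be as simple as this (the init path and easyconfig name are assumptions):

```bash
# Sketch: quick check of a candidate package against EESSI (paths assumed)
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module avail PyTorch       # already provided by EESSI?
module load EasyBuild
eb --search NWChem         # existing easyconfigs to start from
eb --robot --dry-run NWChem-7.0.2-intel-2021a.eb   # hypothetical easyconfig
```

Repeating this on x86_64, aarch64, and ppc64le hosts surfaces the target-specific problems this task wants to collect.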
task lead: ??? people working on this: ...
- Evaluate options for providing GPU software
- Without including CUDA itself in EESSI (for now)
- Consider AMD GPUs here (or in a separate task?)
- TensorFlow, GROMACS
- relevant issues:
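One option to evaluate, sketched here under assumptions (host library path, module name): keep CUDA itself on the host and merely expose the driver libraries to EESSI-provided applications:

```bash
# Sketch: expose host CUDA libraries to an EESSI-provided application
# (host library path is an assumption; the right mechanism is exactly
# what this task should evaluate)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
module load TensorFlow
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```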
task lead: ??? people working on this: ...
- kickstart (or expand) EESSI test suite
- using ReFrame
- different types: smoke tests, app tests, ...
- separate repository?
- cfr. Caspar's PRs
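Running such a suite could eventually look like this (configuration and test locations are assumptions):

```bash
# Sketch: run (hypothetical) EESSI smoke tests with ReFrame
pip install reframe-hpc
reframe -C config/eessi_settings.py -c tests/ --tag smoke -r
```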
task lead: ??? people working on this: ...
- start setting up proper monitoring + alerting
- daily running of smoke tests, send alerts on failures
- regularly (hourly?) checking on Stratum-1 health
- still reachable
- actively synchronising with Stratum-0
- serving correct revision of EESSI repository
- outgoing network traffic (bandwidth)
- latency?
- performance monitoring
- requires baseline first
- single-node, multi-node
- relevant issues:
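A basic Stratum-1 health probe could build on the repository manifest; the hostname below is a placeholder:

```bash
# Sketch: basic Stratum-1 health check (hostname is a placeholder)
S1=http://cvmfs-s1.example.org/cvmfs/pilot.eessi-hpc.org
if ! curl -sf "${S1}/.cvmfspublished" -o /tmp/manifest; then
    echo "ALERT: Stratum-1 not reachable" >&2
    exit 1
fi
# the manifest's 'S' line holds the revision being served; comparing it
# against the Stratum-0 detects stalled synchronisation
sed -n '/^--$/q; s/^S//p' /tmp/manifest
```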
task lead: ??? people working on this: ...
- Evaluate Ansible playbook
- Review & extend documentation
- Cluster setup: proxy, caching, node configuration, ...
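Getting started could look like this (playbook and inventory names are assumptions; check the filesystem-layer repository):

```bash
# Sketch: deploy CernVM-FS clients with the EESSI filesystem-layer playbooks
# (playbook/inventory names are assumptions)
git clone https://github.com/EESSI/filesystem-layer.git
cd filesystem-layer
ansible-playbook -b -i inventory/hosts client.yml
```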
task lead: ??? people working on this: ...
- Document risks of adopting EESSI
- Also outline mitigations, etc.
task lead: ??? people working on this: ...
- Compare running applications provided by EESSI with other options (conda, containers, system installations, ...)
- ease of adoption, required effort
- performance
- Using EESSI for large-scale runs
- start-up performance
- jitter?
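For start-up performance, a first-order cold-vs-warm-cache timing already gives a baseline (init path and application are assumptions):

```bash
# Sketch: first-order start-up timing, cold vs warm CVMFS client cache
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module load GROMACS
time gmx --version    # first run: cold cache, dominated by CVMFS fetches
time gmx --version    # second run: warm cache
```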
task lead: ??? people working on this: ...
task lead: ??? people working on this: ...
- see:
task lead: ??? people working on this: ...
- including:
- AWS
- Azure
- Fenix
- ...
- how to access, who to contact, ...
- see https://github.com/EESSI/docs/issues/61
task lead: ??? people working on this: @wpoely86, ...
- using pre-built CernVM-FS packages
- on different Linux distros
- with containers vs native
- with alien cache or direct or using proxy
- troubleshooting tips
- different use cases
- end user on personal workstation or Linux VM => install CernVM-FS for native access (see the sketch after this list)
- end user on HPC system (no admin privileges) => use container
- also set up alien cache for multi-node jobs
- HPC support team
- install CernVM-FS
- set up private Stratum-1
- worker node/cache configuration
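For the native-access use case, the setup on an RPM-based distro could look roughly like this (package URLs and config values are assumptions; verify against the CernVM-FS and EESSI docs):

```bash
# Sketch: native CernVM-FS client setup for EESSI on an RPM-based distro
# (package URLs and config values are assumptions)
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs
sudo yum install -y https://github.com/EESSI/filesystem-layer/releases/download/latest/cvmfs-config-eessi-latest.noarch.rpm
printf 'CVMFS_CLIENT_PROFILE=single\nCVMFS_QUOTA_LIMIT=10000\n' | sudo tee /etc/cvmfs/default.local
sudo cvmfs_config setup
ls /cvmfs/pilot.eessi-hpc.org    # repository should now be accessible
```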
task lead: ??? people working on this: ...
- Automatically start/stop GitHub Runners based on the available workload, see:
- Can be done with VMs using Terraform:
- Or, using an autoscaling Kubernetes cluster:
- Other useful links:
task lead: ??? people working on this: jpecar, ...
- Develop script(s) to dump an existing EESSI stack into a tarball and/or container image
- See: https://github.com/EESSI/filesystem-layer/issues/102
- One tarball/image per microarchitecture, including the required compatibility layer
- A variant symlink in the CVMFS repository will allow users to point to the location where they extracted the tarball
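A starting point for such a script, with repository layout and version/target values as assumptions:

```bash
# Sketch: pack one EESSI CPU target plus its compat layer into a tarball
# (repository layout is an assumption; see the issue above for discussion)
VERSION=2021.06
ARCH=x86_64/intel/haswell
tar czf "eessi-${VERSION}-${ARCH//\//-}.tar.gz" \
    -C /cvmfs/pilot.eessi-hpc.org \
    "versions/${VERSION}/compat/linux/x86_64" \
    "versions/${VERSION}/software/linux/${ARCH}"
```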