EESSI hackathon list of potential tasks (Nov'21)
task lead: ??? people working on this: ...
- table included in documentation at https://eessi.github.io/docs
- columns: supported CPU targets
- rows: software
- automatically generated based on available modules (see the sketch after this list)
- regularly updated (GitHub Actions workflow?)
- separate page per EESSI version
- maybe even dynamic page?
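A minimal sketch of how such a table could be generated, assuming the module tree layout of the EESSI pilot repository (repository path and layout are assumptions):

```bash
#!/bin/bash
# Minimal sketch: list available modules per CPU target by scanning the
# module trees of one EESSI version (repository path and layout assumed)
EESSI_ROOT=/cvmfs/pilot.eessi-hpc.org/versions/2021.06/software/linux
find "${EESSI_ROOT}" -type d -path '*/modules/all' | while read -r moddir; do
    target=${moddir#"${EESSI_ROOT}"/}        # e.g. x86_64/intel/haswell/modules/all
    echo "== CPU target: ${target%/modules/all} =="
    find "${moddir}" -name '*.lua' | sed -e "s|${moddir}/||" -e 's|\.lua$||' | sort
done
```

Pivoting this output into a software-by-target table (and committing it to the docs from a GitHub Actions workflow) is then straightforward.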
task lead: ??? people working on this: Martin Errenst, ...
- clear step-by-step documentation on how to install software on top of EESSI
- attention points:
- ensure compat layer is used (don't link to host libraries)
- RPATH linking
- different use cases:
- using EasyBuild (--sysroot, --rpath, ...) - see the sketch after this list
- manually
- partially to highlight potential points for improvement
- providing script or module file to automate/hide away some manual/tedious stuff
- include wrapper scripts with compiler installations?
- relevant issues:
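A minimal sketch of the EasyBuild use case, assuming the environment set up by the EESSI init script (the init path, EESSI_EPREFIX, and the easyconfig name are assumptions):

```bash
# Minimal sketch: configure EasyBuild to install on top of EESSI
# (init script path and variable names are assumptions)
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module load EasyBuild
export EASYBUILD_SYSROOT=${EESSI_EPREFIX}      # build against the compat layer
export EASYBUILD_RPATH=1                       # RPATH linking, no host libraries
export EASYBUILD_PREFIX=${HOME}/eessi-extra    # local installation prefix
eb --robot SomeSoftware-1.0-foss-2021a.eb      # hypothetical easyconfig
module use ${HOME}/eessi-extra/modules/all     # pick up the new module
```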
task lead: ??? (Bob? Kenneth?) people working on this: ... max: 5 people (who are familiar with Python)
- Requirements?
- EasyBuild support
- tests
- Via pull request to EESSI/software-layer repository
- Format? (easyconfigs, easystack, ...)
- Semi-automatic building of missing software installations + modules
- only when requirements are met (tests!)
- after approval by human reviewer
- fully autonomous build cycle (no human interaction, no startprefix)
- Automated ingestion into EESSI repository when PR is merged
- via EESSI/staging repository
- relevant issues:
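A rough sketch of what the contribution flow could look like, assuming an easystack-style file in the software-layer repository (the file name and easyconfig are assumptions):

```bash
# Rough sketch of the contribution flow (file/branch names are assumptions)
git clone https://github.com/EESSI/software-layer.git
cd software-layer
git checkout -b add_SomeSoftware
# request a missing installation, e.g. via an easystack-style file
echo '  - SomeSoftware-1.0-foss-2021a.eb' >> eessi-2021.06.yml
git commit -am 'add SomeSoftware 1.0'
# open a pull request; after tests pass and a human reviewer approves,
# the (semi-)automatic build and ingestion can be triggered
```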
task lead: ??? people working on this: ...
- Look for "interesting" software to include into EESSI
- Work with software developers interested in EESSI?
- Evaluate compatibility with different CPU targets (x86_64, aarch64, ppc64le)
- Test installation in EESSI
- Collect info on problems that pop up
- Try to fix them (PRs to EasyBuild)
- examples:
- PyTorch
- NWChem
- bioinformatics pipelines
- relevant issues:
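A first compatibility check for a candidate package could be as simple as this (the init path and easyconfig name are assumptions):

```bash
# Sketch: quick check of a candidate package against EESSI (paths assumed)
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module avail PyTorch       # already provided by EESSI?
module load EasyBuild
eb --search NWChem         # existing easyconfigs to start from
eb --robot --dry-run NWChem-7.0.2-intel-2021a.eb   # hypothetical easyconfig
```

Repeating this on x86_64, aarch64, and ppc64le hosts surfaces the target-specific problems this task wants to collect.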
task lead: ??? people working on this: ...
- Evaluate options for providing GPU software
- Without including CUDA itself in EESSI (for now)
- Consider AMD GPUs here (or in a separate task?)
- TensorFlow, GROMACS
- relevant issues:
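One option to evaluate, sketched here under assumptions (host library path, module name): keep CUDA itself on the host and merely expose the driver libraries to EESSI-provided applications:

```bash
# Sketch: expose host CUDA libraries to an EESSI-provided application
# (host library path is an assumption; the right mechanism is exactly
# what this task should evaluate)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
module load TensorFlow
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```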
task lead: ??? people working on this: ...
- kickstart (or expand) EESSI test suite
- using ReFrame
- different types: smoke tests, app tests, ...
- separate repository?
- cfr. Caspar's PRs
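Running such a suite could eventually look like this (configuration and test locations are assumptions):

```bash
# Sketch: run (hypothetical) EESSI smoke tests with ReFrame
pip install reframe-hpc
reframe -C config/eessi_settings.py -c tests/ --tag smoke -r
```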
task lead: ??? people working on this: ...
- start setting up proper monitoring + alerting
- daily running of smoke tests, send alerts on failures
- regularly (hourly?) checking on Stratum-1 health
- still reachable
- actively synchronising with Stratum-0
- serving correct revision of EESSI repository
- outgoing network traffic (bandwidth)
- latency?
- performance monitoring
- requires baseline first
- single-node, multi-node
- relevant issues:
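A basic Stratum-1 health probe could build on the repository manifest; the hostname below is a placeholder:

```bash
# Sketch: basic Stratum-1 health check (hostname is a placeholder)
S1=http://cvmfs-s1.example.org/cvmfs/pilot.eessi-hpc.org
if ! curl -sf "${S1}/.cvmfspublished" -o /tmp/manifest; then
    echo "ALERT: Stratum-1 not reachable" >&2
    exit 1
fi
# the manifest's 'S' line holds the revision being served; comparing it
# against the Stratum-0 detects stalled synchronisation
sed -n '/^--$/q; s/^S//p' /tmp/manifest
```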
task lead: ??? people working on this: ...
- Evaluate Ansible playbook
- Review & extend documentation
- Cluster setup: proxy, caching, node configuration, ...
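Getting started could look like this (playbook and inventory names are assumptions; check the filesystem-layer repository):

```bash
# Sketch: deploy CernVM-FS clients with the EESSI filesystem-layer playbooks
# (playbook/inventory names are assumptions)
git clone https://github.com/EESSI/filesystem-layer.git
cd filesystem-layer
ansible-playbook -b -i inventory/hosts client.yml
```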
task lead: ??? people working on this: ...
- Document risks of adopting EESSI
- Also outline mitigations, etc.
task lead: ??? people working on this: ...
- Compare running applications provided by EESSI with other options (conda, containers, system installations, ...)
- ease of adoption, required effort
- performance
- Using EESSI for large-scale runs
- start-up performance
- jitter?
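For start-up performance, a first-order cold-vs-warm-cache timing already gives a baseline (init path and application are assumptions):

```bash
# Sketch: first-order start-up timing, cold vs warm CVMFS client cache
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module load GROMACS
time gmx --version    # first run: cold cache, dominated by CVMFS fetches
time gmx --version    # second run: warm cache
```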
task lead: ??? people working on this: ...
task lead: ??? people working on this: ...
- see:
task lead: ??? people working on this: ...
- including:
- AWS
- Azure
- Fenix
- ...
- how to access, who to contact, ...
- see https://github.com/EESSI/docs/issues/61
task lead: ??? people working on this: @wpoely86, ...
- using pre-built CernVM-FS packages
- on different Linux distros
- with containers vs native
- with alien cache or direct or using proxy
- troubleshooting tips
- different use cases
- end user on personal workstation or Linux VM => install CernVM-FS for native access (see the sketch after this list)
- end user on HPC system (no admin privileges) => use container
- also set up alien cache for multi-node jobs
- HPC support team
- install CernVM-FS
- set up private Stratum-1
- worker node/cache configuration
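For the native-access use case, the setup on an RPM-based distro could look roughly like this (package URLs and config values are assumptions; verify against the CernVM-FS and EESSI docs):

```bash
# Sketch: native CernVM-FS client setup for EESSI on an RPM-based distro
# (package URLs and config values are assumptions)
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs
sudo yum install -y https://github.com/EESSI/filesystem-layer/releases/download/latest/cvmfs-config-eessi-latest.noarch.rpm
printf 'CVMFS_CLIENT_PROFILE=single\nCVMFS_QUOTA_LIMIT=10000\n' | sudo tee /etc/cvmfs/default.local
sudo cvmfs_config setup
ls /cvmfs/pilot.eessi-hpc.org    # repository should now be accessible
```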
task lead: ??? people working on this: ...
- Automatically start/stop GitHub Runners based on the available workload, see:
- Can be done with VMs using Terraform:
- Or, using an autoscaling Kubernetes cluster:
- Other useful links:
task lead: ??? people working on this: jpecar, ...
- Develop script(s) to dump an existing EESSI stack into a tarball and/or container image
- See: https://github.com/EESSI/filesystem-layer/issues/102
- One tarball/image per microarchitecture, including the required compatibility layer
- A variant symlink in the CVMFS repository will allow users to point to the location where they extracted the tarball
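A starting point for such a script, with repository layout and version/target values as assumptions:

```bash
# Sketch: pack one EESSI CPU target plus its compat layer into a tarball
# (repository layout is an assumption; see the issue above for discussion)
VERSION=2021.06
ARCH=x86_64/intel/haswell
tar czf "eessi-${VERSION}-${ARCH//\//-}.tar.gz" \
    -C /cvmfs/pilot.eessi-hpc.org \
    "versions/${VERSION}/compat/linux/x86_64" \
    "versions/${VERSION}/software/linux/${ARCH}"
```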