User Software Management on the UL HPC Platform based on RESIF v3 and streamline Easybuild. BEWARE: public export in progress, some files missing until this operation is completed.

IMPORTANT: Public export toward this repository IN PROGRESS, some files will be missing until this operation is completed and this banner disappears

  _   _ _       _   _ ____   ____   ____  _____ ____ ___ _____   _____  ___
 | | | | |     | | | |  _ \ / ___| |  _ \| ____/ ___|_ _|  ___| |___ / / _ \
 | | | | |     | |_| | |_) | |     | |_) |  _| \___ \| || |_      |_ \| | | |
 | |_| | |___  |  _  |  __/| |___  |  _ <| |___ ___) | ||  _|    ___) | |_| |
  \___/|_____| |_| |_|_|    \____| |_| \_\_____|____/___|_|     |____(_)___/

   Copyright (c) 2020-2021 UL HPC Team <hpc-team@uni.lu>

User Software Management for Uni.lu HPC Facility based on the RESIF 3.0 framework

This is the public repository exposing the main scripts, concepts and documentation, in order to facilitate their dissemination.

The paper describing the RESIF 3 concepts and architecture was presented at ACM PEARC'21 [1] -- doi | orbilu

[1] S. Varrette, E. Kieffer, F. Pinel, E. Krishnasamy, S. Peter, H. Cartiaux, and X. Besseron, "RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility", in ACM Practice and Experience in Advanced Research Computing (PEARC'21), Virtual Event, 2021.

BibTex entry

  ```bibtex
  @InProceedings{VKPKPCB_PEARC21,
    author =       {S. Varrette and E. Kieffer and F. Pinel and E. Krishnasamy and S. Peter and H. Cartiaux and X. Besseron},
    title =        {{RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility}},
    booktitle =    {ACM Practice and Experience in Advanced Research Computing (PEARC'21)},
    year =         {2021},
    publisher =    {Association for Computing Machinery (ACM)},
    OPTseries =    {{PEARC'21}},
    month =        {July},
    address =      {Virtual Event},
    doi =          {10.1145/3437359.3465600},
    isbn =         {978-1-4503-8292-2/21/07},
    url =          {https://pearc.acm.org/pearc21/},
  }
  ```

Installation / Repository Setup

To clone this repository, proceed as follows (adapt accordingly):

$ mkdir -p ~/git/github.com/ULHPC
$ cd ~/git/github.com/ULHPC
$ git clone https://github.com/ULHPC/sw.git

/!\ IMPORTANT: Once cloned and after following Preliminaries guidelines (LMod install, virtualenv setup, Github token etc. -- see docs/setup.md and docs/contributing/setup-github-integration.md), initiate your local copy of the repository by running:

$ cd ~/git/github.com/ULHPC/sw
$ make setup

Post setup checks (laptop)

From that point, you should be able to load Easybuild from your laptop, and all the following commands should succeed:

### On your laptop
source settings/default.sh
eb --version
eb --show-config
# check that you are able to interact/update the ULHPC fork copy
make fork-easyconfigs-update
# check that you can interact with github
eb --check-github   # All checks PASSed!
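
The checks above can also be run in one go. The following is a minimal sketch, assuming a hypothetical helper named run_checks that is not part of this repository. It relies only on each command's exit status, and whether eb --check-github returns a non-zero status when a check fails is not verified here.

```shell
# Hypothetical helper (not provided by the repository): run each argument as a
# command in a subshell, print PASS or FAIL for it, and return 1 if any failed.
run_checks() {
    local failed=0 cmd
    for cmd in "$@"; do
        if ( eval "$cmd" ) >/dev/null 2>&1; then
            printf 'PASS  %s\n' "$cmd"
        else
            printf 'FAIL  %s\n' "$cmd"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, after `source settings/default.sh`:
#   run_checks 'eb --version' 'eb --show-config' \
#              'make fork-easyconfigs-update' 'eb --check-github'
```

The output of each command is discarded, so rerun a failing command by hand to see its messages.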

Post setup checks (on supercomputer login node)

When repeating the setup on the cluster, you are ready if all the following commands succeed:

### On iris, this will go on a broadwell node
./scripts/get-interactive-job
source settings/${ULHPC_CLUSTER}.sh
# enable SSH agent
eval "$(ssh-agent)"
ssh-add ~/.ssh/id_rsa

eb --version
eb --show-config
# check that you are able to interact/update the ULHPC fork copy
make fork-easyconfigs-update
# check that you can interact with github
eb --check-github   # ONLY new-pr and update-pr should FAIL
# reason is that most probably you don't want the ssh key on the cluster authorized
# to push on ULHPC fork
# kill the SSH agent once you have finished
eval "$(ssh-agent -k)"

Documentation

See docs/.

The documentation for this project is handled by mkdocs. You may wish to generate the docs locally:

  • Install mkdocs
  • Preview your documentation from the project root by running mkdocs serve and visit with your favorite browser the URL http://localhost:8000
    • Alternatively, you can run make doc at the root of the repository.
  • (optionally) build the full documentation locally (in the site/ directory) by running mkdocs build.

Software set organization

See docs/swsets.md.

A software set holds a categorised list of software, defined as a module bundle for the ULHPC environment. Each bundle lists its dependencies, and the bundles are organised hierarchically under easyconfigs/u as follows:

├── ULHPC/[...]-<version>.eb    #### === Default global bundle for 'regular' nodes ===
│   ├── ULHPC-toolchains/[...]-<version>.eb ### Toolchains, compilers, debuggers, programming languages...
│   ├── ULHPC-bio/[...]-<version>.eb        ### Bioinformatics, biology and biomedical
│   ├── ULHPC-cs/[...]-<version>.eb         ### Computational science, including:
│   └── [...]
└── ULHPC-gpu/[...]-<version>.eb #### === Specific GPU versions compiled under {foss,intel}cuda toolchains ===

See easyconfigs/u/ULHPC*
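
To see which bundle easyconfigs exist in a local clone, something like the following may help. It is a sketch only: list_ulhpc_bundles is a hypothetical helper, and it assumes each bundle lives one directory below the given root, as the tree above suggests.

```shell
# Hypothetical helper: list the ULHPC bundle easyconfigs found directly under
# <root>/ULHPC*/ (root defaults to easyconfigs/u). Paths are printed relative
# to the root so that a parent directory named ULHPC cannot cause false matches.
list_ulhpc_bundles() {
    local root="${1:-easyconfigs/u}"
    ( cd "$root" && find . -mindepth 2 -maxdepth 2 -type f -path './ULHPC*/*.eb' | sort )
}

# Usage, from the repository root:
#   list_ulhpc_bundles
```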

A strong versioning policy is enforced, which fixes the core component versions of the bundles.

User Software builds

Slurm launchers are provided under scripts/ to facilitate software builds.

/!\ IMPORTANT: RESIF 3 supports 3 operation modes, listed below:

| Operation Mode              | Architecture    | Launcher script                                                         |
|-----------------------------|-----------------|-------------------------------------------------------------------------|
| Easybuild bootstrap/update  | *               | setup.sh                                                                |
| Home/Testing builds         | default         | [sbatch] ./scripts/[<version>]/launcher-test-build-cpu.sh               |
|                             | CPU non-default | [sbatch] ./scripts/[<version>]/launcher-test-build-cpu-<arch>.sh        |
|                             | GPU optimized   | [sbatch] ./scripts/[<version>]/launcher-test-build-gpu.sh               |
| Production <version> builds | default         | [sbatch] ./scripts/prod/launcher-resif-prod-build-cpu.sh -v <version>   |
|                             | CPU non-default | [sbatch] ./scripts/prod/launcher-resif-prod-build-cpu-<arch>.sh -v <version> |
|                             | GPU optimized   | [sbatch] ./scripts/prod/launcher-resif-prod-build-gpu.sh -v <version>   |

See docs/build.md for more details
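
The naming pattern of the launcher scripts can be illustrated with a small shell function. This is a sketch under assumptions: launcher_cmd is a hypothetical helper that only prints the command line, the version 2020b and the arch skylake in the usage lines are example values, and docs/build.md remains the reference for the actual scripts and their options.

```shell
# Hypothetical helper: print the launcher command for a mode and target.
#   mode:    test | prod
#   target:  cpu | gpu | cpu-<arch>   (for instance cpu-skylake, an assumption)
#   version: optional for test builds, required for prod builds
launcher_cmd() {
    local mode="$1" target="$2" version="${3:-}"
    case "$mode" in
        test)
            # the <version>/ sub-directory is only added when a version is given
            printf './scripts/%slauncher-test-build-%s.sh\n' \
                "${version:+$version/}" "$target"
            ;;
        prod)
            [ -n "$version" ] || { echo "prod builds need a version" >&2; return 1; }
            printf './scripts/prod/launcher-resif-prod-build-%s.sh -v %s\n' \
                "$target" "$version"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

# Usage (prints the command, does not submit anything):
#   launcher_cmd test gpu 2020b
#   launcher_cmd prod cpu-skylake 2020b
```

Printing the command rather than running it keeps the sketch harmless, which matters for the production launchers.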

Note: for convenience, a GNU screen configuration file config/screenrc is provided to quickly bootstrap the appropriate tabs:

screen -c config/screenrc
# 'SW' tab meant for git / sync operations. To enable the ssh agent:
#    eval "$(ssh-agent)"
#    ssh-add ~/.ssh/id_rsa
#    make up
#    make fork-easyconfigs-update
# 'broadwell' tab for associated build. Ex interactive job:
#    ./scripts/get-interactive-job
# 'skylake' tab for associated build. Ex interactive job:
#    ./scripts/get-interactive-job-skylake
# 'gpu' tab for associated build. Ex interactive job:
#    ./scripts/get-interactive-job-gpu
# 'epyc' tab  for aion builds
#    ssh aion
#    ./scripts/get-interactive-job

Don't forget to kill your ssh agent when you have finished: eval "$(ssh-agent -k)"

In all cases, production builds MUST be run as resif using the launcher scripts under scripts/prod/*. Software and modules will be installed in that case under /opt/apps/resif ($LOCAL_RESIF_ROOT_DIR) -- See Technical Docs. You MUST BE VERY CAREFUL when running these scripts as they alter the production environment.

The final organization of the software and modules is depicted below:

Workflow and ULHPC Bundle Development guidelines

You first need to review the expected Git workflow.

To add new software to one of the ULHPC bundle modules, you need to find, and possibly adapt, an existing Easyconfig file. You can search for such files using either eb -S <pattern>, or the provided script ./scripts/suggest-easyconfigs [-v <version>] <pattern>, which:

  1. searches for Easyconfigs matching the proposed pattern, sorted by increasing version (sort -V)
  2. checks whether any of those easyconfigs is available for the target toolchain, as that is probably the one you should use
  3. suggests a single easyconfig to try (most recent version)

Example:

$> ./scripts/suggest-easyconfigs -h
$> ./scripts/suggest-easyconfigs PAPI
=> Searching Easyconfigs matching pattern 'PAPI'
PAPI-5.4.3-foss-2016a.eb
PAPI-5.5.1-GCCcore-6.3.0.eb
PAPI-5.5.1-GCCcore-6.4.0.eb
PAPI-5.6.0-GCCcore-6.4.0.eb
PAPI-5.7.0-GCCcore-7.3.0.eb
PAPI-5.7.0-GCCcore-8.2.0.eb
PAPI-6.0.0-GCCcore-8.3.0.eb
Total:        7 entries

... potential exact match for 2019b toolchain
PAPI-6.0.0-GCCcore-8.3.0.eb
 --> suggesting 'PAPI-6.0.0-GCCcore-8.3.0.eb'
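
The three steps performed by the script can be sketched in a few lines of shell. This is not the script's actual implementation: suggest_easyconfig is a hypothetical function that reads file names from stdin and takes the toolchain string as it appears in the file name (for instance GCCcore-8.3.0), whereas the real script works from a toolchain version such as 2019b.

```shell
# Hypothetical sketch of the sort / filter / suggest steps.
#   stdin: easyconfig file names, one per line
#   $1:    toolchain string expected at the end of the file name
suggest_easyconfig() {
    local toolchain="$1" sorted matches
    # step 1: sort by increasing version (sort -V, GNU coreutils)
    sorted="$(sort -V)"
    # step 2: keep only the files built for the target toolchain
    matches="$(printf '%s\n' "$sorted" | grep -F -- "-${toolchain}.eb" || true)"
    [ -n "$matches" ] || return 1
    # step 3: suggest the most recent version among the matches
    printf '%s\n' "$matches" | tail -n 1
}

# Usage:
#   ls easyconfigs/p/PAPI | suggest_easyconfig GCCcore-8.3.0
```

The function returns a non-zero status when no file matches the toolchain, so a caller can fall back to adapting an easyconfig written for another toolchain.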

See also docs/workflow.md for more details.

Submitting working Easyconfigs to easybuilders

See docs/contributing/

To avoid the explosion of custom easyconfigs seen in the past, a key objective of this project is to keep the number of custom easyconfigs to a strict minimum, and thus to submit as many easyconfigs as possible to the community for integration in the official easybuilders/easybuild-easyconfigs repository. A set of helper scripts is provided to facilitate this operation. Typical workflow:

# Create a new pull request (typically on your laptop)
./scripts/PR-create -n easyconfigs/<letter>/<software>/<filename>.eb    # Dry-run
./scripts/PR-create easyconfigs/<letter>/<software>/<filename>.eb
# Complete it with a successful test report ON IRIS/AION
sbatch ./scripts/PR-rebuild-upload-test-report.sh <ID>

# (optionally) Update/complete the pull request with new version/additional EB files
eb --update-pr <ID> <file>.eb --pr-commit-msg "<message>" # use native easybuild command here
#  Update your local easyconfigs from remote PR commits
./scripts/update-from-PR [-n] <ID>

# Repo cleanup upon merged pull-request
./scripts/PR-close [-n] <ID>

Issues / Feature request

You can submit bugs, issues and feature requests using the ULHPC/sw Project Tracker.
