Releases: stackhpc/ansible-role-openhpc
v0.17.0
v0.16.0
Fix openhpc_job_maxtime default
What's Changed
In previous releases the default maximum job lifetime set by `openhpc_job_maxtime` was intended to be 24 hours and was documented as such. However, due to Ansible/Jinja type conversion, this became 60 days on the actual running system. This release explicitly sets the `openhpc_job_maxtime` default to 60 days for backward compatibility and updates the documentation.
See #136 for full details of the bug.
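When overriding the default, quoting the value keeps it a string all the way through to `slurm.conf`. The sketch below assumes Slurm's `days-hours` time format. The comment about base-60 parsing describes a general YAML 1.1 hazard and is an assumption about the mechanism, not a summary of #136.

```yaml
# Illustrative override, e.g. in group_vars (file location is hypothetical).
# Quoted so YAML/Jinja cannot turn it into a number.
openhpc_job_maxtime: '1-0'   # 1 day 0 hours, i.e. the originally intended 24 hours

# Hazard being avoided (assumption): an unquoted 24:00:00 may be read by a
# YAML 1.1 parser as the base-60 integer 86400, and Slurm interprets a bare
# number for MaxTime as minutes. 86400 minutes is 60 days.
```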
Full Changelog: v0.13.0...v0.15.0
v0.13.0
v0.12.0
Slurmdbd startup improvements and support for any partition configuration options
What's Changed
- Slurm startup should be more reliable: slurmdbd restarts now wait for the relevant port to become available, because the systemd unit returns as soon as the binary starts rather than when the service is ready. (#129)
- Any desired partition parameters can now be set; see `openhpc_slurm_partitions.partition_params`. (#130)
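The restart-then-wait pattern from the first item can be sketched with standard Ansible modules. This is not the role's actual task code; the port number assumes Slurm's default `DbdPort`, and the timeout is illustrative.

```yaml
# Sketch only: not the role's actual tasks.
- name: Restart slurmdbd
  ansible.builtin.systemd:
    name: slurmdbd
    state: restarted

- name: Wait for slurmdbd to accept connections
  ansible.builtin.wait_for:
    port: 6819      # assumed: Slurm's default DbdPort
    timeout: 60     # seconds; illustrative value
```

For the second item, a partition definition might pass extra parameters as a mapping. The partition name and the chosen parameters are hypothetical, and the exact structure should be checked against the role README.

```yaml
openhpc_slurm_partitions:
  - name: compute              # hypothetical partition name
    partition_params:
      OverSubscribe: EXCLUSIVE
      DefMemPerCPU: 4096       # MB per CPU; illustrative value
```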
Full Changelog: v0.10.0...v0.11.0
Support autoscale
This release enables the definition of nodes not controlled by this role, using a new attribute `extra_nodes` on role variable `openhpc_slurm_partitions`. This supports e.g. defining autoscaling nodes (using additional logic outside this role).
It also:
- Improves the templating of slurm.conf so that nodes are specified by pattern rather than individually. This will result in much shorter configuration files for large clusters and improve slurmctld startup time.
- Clarifies role variable documentation in the main README.md.
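As a rough illustration of the `extra_nodes` attribute, the sketch below defines a small range of nodes that the role does not manage. The node names, sizes and the use of `State: CLOUD` are assumptions for illustration; the keys are intended to mirror Slurm's node configuration parameters, and the README for the release in use should be consulted for the exact structure.

```yaml
openhpc_slurm_partitions:
  - name: compute                  # hypothetical partition name
    extra_nodes:
      - NodeName: burst-[0-3]      # hypothetical autoscaling nodes
        State: CLOUD               # assumed: nodes created on demand by external logic
        CPUs: 4                    # illustrative value
        RealMemory: 15000          # MB; illustrative value
```

Creating and destroying such nodes remains the job of logic outside this role.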
Support Rocky Linux / slurm.conf parameters / slurm state location
New features:
- Add support for Rocky Linux 8.5 (as well as CentOS 7.9). CentOS 8.x is no longer tested.
- Supply additional parameters to `slurm.conf` - see role variable `openhpc_config`.
- Specify directory to save Slurm state (e.g. to put this on a persistent volume) - see role variable `openhpc_state_save_location`.
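A minimal sketch of the two new variables, assuming `openhpc_config` takes a mapping of `slurm.conf` parameter names to values. The parameters, values and path shown are illustrative only.

```yaml
# Extra slurm.conf parameters (assumed to be a simple name: value mapping).
openhpc_config:
  SlurmctldDebug: debug
  SlurmctldTimeout: 300

# Keep controller state on a persistent volume (hypothetical mount point).
openhpc_state_save_location: /mnt/persistent/slurm/state
```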
Major fixes:
- Fix errors in logfile due to login node config: #115
Support OpenHPC v2.1
Support for OpenHPC v2.1 (released 6th April 2021).
As packages for OpenHPC v2.0 and v2.1 are provided by the same OpenHPC repos, this role now creates an OpenHPC v2.1 node when using CentOS 8.x (see docs for role var `openhpc_release_repo`). However, the Slurm versions provided by OpenHPC v2.0 and v2.1 (v20.02.5 and v20.11.3) are not compatible. Therefore all new builds of nodes using CentOS 8.x should use this release of this role, and it will be necessary to upgrade entire clusters at once.
Changes:
- Optional role variable `openhpc_munge_key_path`, which specified a path to a munge key, has been replaced by `openhpc_munge_key`, which specifies its content instead.
- Accounting storage is now disabled by default, as Slurm 20.11 does not support the previous default storage type `accounting_storage/filetxt` (see docs for role var `openhpc_slurm_accounting_storage_type`). This means `sacct` returns no information. Either set up the Slurm database daemon or configure job accounting. The latter is simpler to enable but only captures limited information about job completion (viewable via `sacct -c`).
- Ensures default job completion logfile is writable.
- Fixes owner/group on slurmdbd configuration file (only an issue for Slurm v20.11.3).
- Adds molecule tests for job accounting and node deletion.
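The variable changes above might look like this in practice. The vaulted variable name is hypothetical, and how the key content must be encoded is not stated here. The value `accounting_storage/slurmdbd` is a standard Slurm storage type, but whether this release of the role accepts it exactly like this is an assumption.

```yaml
# Munge key supplied as content rather than as a path.
openhpc_munge_key: "{{ vault_openhpc_munge_key }}"   # hypothetical vaulted variable

# Re-enable accounting through the Slurm database daemon (assumed value).
# The daemon itself must also be deployed and configured, which is not shown.
openhpc_slurm_accounting_storage_type: accounting_storage/slurmdbd
```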
Add support for configless mode and slurmdbd
All changes should be backwards-compatible with v0.6.0. Major enhancements are:
- Role itself now installs appropriate OpenHPC release repo depending on OS version, rather than this being a prerequisite.
- When using OpenHPC v2, adds support for Slurm's "configless" mode, where `slurm.conf` is present only on the control node (see parameter `openhpc_slurm_configless`).
- Can optionally configure `slurmdbd` and accounting to provide enhanced accounting/`sacct` functionality (see "Accounting" in README).
- Flexibility added to support image-based approaches to deployment, e.g. can now configure only a control node or only a compute node, or configure but not start services.
- Cluster munge key can optionally be user-supplied.
- Node `RealMemory` parameter in `slurm.conf` now defaults to an Ansible-derived value instead of 1MB (see parameters `ram_mb` and `openhpc_ram_multiplier`).
- Testing in CI massively expanded (see directory `molecule/`).
- Adds `slurm-libpmi-ohpc` package by default to support use of `srun` with Intel MPI (see Slurm docs).
- Can skip installing the module system (see parameter `openhpc_module_system_install`).
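Several of the parameters named in this list are plain role variables. The following sketch shows how they might be set together; the values are illustrative, and the defaults should be taken from the README rather than from this example.

```yaml
# Use Slurm's configless mode (needs OpenHPC v2).
openhpc_slurm_configless: true

# Fraction of Ansible-reported memory used for RealMemory (illustrative value).
openhpc_ram_multiplier: 0.9

# Skip installing the module system.
openhpc_module_system_install: false
```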