AWS ParallelCluster v2.9.0

@demartinofra demartinofra released this 11 Sep 18:52

We're excited to announce the release of AWS ParallelCluster 2.9.0.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for multiple queues and multiple instance types with the Slurm scheduler.
  • Extend NICE DCV support to ARM instances.
  • Extend support for disabling hyperthreading to instances (like *.metal) that don't support CpuOptions in LaunchTemplate.
  • Enable support for NFS 4 for the filesystems shared from the head node.
  • Add CLI utility to convert configuration files that use the Slurm scheduler to the new format required for multiple queues configuration.
  • Add script wrapper to support Torque-like commands with the Slurm scheduler.
  • Remove dependency on cfn-init in the compute node bootstrap process to avoid throttling and delays caused by CloudFormation when a large number of compute nodes join the cluster.
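
To give a feel for the multiple queues layout mentioned above, here is a minimal sketch that parses an illustrative configuration with Python's standard configparser module. The section and parameter names (queue_settings, compute_resource_settings, instance_type, min_count, max_count) follow the 2.9 configuration format as we understand it. The queue names, resource labels, and instance types are made up for illustration, so check the official documentation before relying on them.

```python
import configparser

# Illustrative multi-queue configuration. Queue names, resource labels and
# instance types are hypothetical and only show the section layout.
SAMPLE_CONFIG = """
[cluster default]
scheduler = slurm
queue_settings = compute, highmem

[queue compute]
compute_resource_settings = i1, i2

[queue highmem]
compute_resource_settings = i3

[compute_resource i1]
instance_type = c5.large
max_count = 10

[compute_resource i2]
instance_type = c5.xlarge
max_count = 5

[compute_resource i3]
instance_type = r5.xlarge
min_count = 1
max_count = 4
"""


def split_list(value):
    """Split a comma-separated setting into a list of stripped names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def queues_to_instance_types(config_text, cluster="default"):
    """Map each queue name to the instance types of its compute resources."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    result = {}
    for queue in split_list(parser[f"cluster {cluster}"]["queue_settings"]):
        resources = split_list(parser[f"queue {queue}"]["compute_resource_settings"])
        result[queue] = [
            parser[f"compute_resource {name}"]["instance_type"] for name in resources
        ]
    return result


if __name__ == "__main__":
    for queue, types in queues_to_instance_types(SAMPLE_CONFIG).items():
        print(queue, "->", ", ".join(types))
```

Each queue points at one or more compute resources, and each compute resource carries a single instance type, which is how one queue can mix several instance types.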

CHANGES

  • Introduce new configuration sections and parameters to support multiple queues and multiple instance types.
  • Optimize scaling logic with Slurm scheduler, no longer based on Auto Scaling groups.
  • A Route53 private hosted zone is now created together with the cluster and used in DNS resolution inside cluster nodes when using Slurm scheduler.
  • Upgrade EFA installer to version 1.9.5:
    • EFA configuration: efa-config-1.4 (from efa-config-1.3)
    • EFA profile: efa-profile-1.0.0
    • EFA kernel module: efa-1.6.0 (no change)
    • RDMA core: rdma-core-28.amzn0 (no change)
    • Libfabric: libfabric-1.10.1amazon1.1 (no change)
    • Open MPI: openmpi40-aws-4.0.3 (no change)
  • Upgrade Slurm to version 20.02.4.
  • Apply the following changes to Slurm configuration:
    • Assign a range of 10 ports to Slurmctld to improve performance with large clusters
    • Configure cloud scheduling logic
    • Set ReconfigFlags=KeepPartState
    • Set MessageTimeout=60
    • Set TaskPlugin=task/affinity,task/cgroup together with TaskAffinity=no and ConstrainCores=yes in cgroup.conf
  • Upgrade NICE DCV to version 2020.1-9012.
  • Use private IP instead of master node hostname when mounting shared NFS drives.
  • Add new log streams to CloudWatch: chef-client, clustermgtd, computemgtd, slurm_resume, slurm_suspend.
  • Add support for queue names in pre/post install scripts.
  • Use PAY_PER_REQUEST billing mode for the DynamoDB table in GovCloud regions.
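
The Slurm configuration changes listed above can be pictured with the following sketch, which parses illustrative slurm.conf and cgroup.conf fragments and reads back the listed settings. The setting names and values come from the list above, except for the port numbers: the notes only say that a range of 10 ports is assigned, so the 6820-6829 range is an assumption. The helper functions are hypothetical and not part of ParallelCluster or Slurm.

```python
# Illustrative fragments. The concrete port numbers are an assumption; the
# release notes only state that Slurmctld gets a range of 10 ports.
SLURM_CONF = """
SlurmctldPort=6820-6829
ReconfigFlags=KeepPartState
MessageTimeout=60
TaskPlugin=task/affinity,task/cgroup
"""

CGROUP_CONF = """
TaskAffinity=no
ConstrainCores=yes
"""


def parse_conf(text):
    """Parse simple Key=Value lines, ignoring blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def port_range_size(value):
    """Return how many ports a 'first-last' (or single-port) value covers."""
    first, _, last = value.partition("-")
    return int(last or first) - int(first) + 1


if __name__ == "__main__":
    slurm = parse_conf(SLURM_CONF)
    cgroup = parse_conf(CGROUP_CONF)
    print("Slurmctld ports:", port_range_size(slurm["SlurmctldPort"]))
    print("Task plugins:", slurm["TaskPlugin"].split(","))
    print("cgroup settings:", cgroup)
```

A check like this is only a reading aid for the settings above. On a real cluster, the files generated by ParallelCluster are the reference.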

BUG FIXES

  • Solve dpkg lock issue with Ubuntu that prevented custom AMI creation in some cases.
  • Add/improve sanity checks for some configuration parameters.
  • Prevent ignored changes from being reported in pcluster update output.
  • Fix incompatibility issues with Python 2.7 for pcluster update.
  • Fix SNS Topic Subscriptions not being deleted with the cluster's CloudFormation stack.

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster issue tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192