(3.9.0‐current) Cluster creation fails on Rocky 9.4

Giacomo Marciani edited this page Sep 24, 2024 · 1 revision

The issue

When using a custom AMI based on Rocky 9.4, cluster creation fails because the Munge service cannot start. It reports an error message like the one below:

munged: Error: Keyfile is insecure: group-writable permissions without sticky bit set on "/etc"

This is caused by a known issue in the vanilla Rocky 9.4 AMI, where the permissions of the /etc directory were changed from 0755 to 0777.
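To confirm whether a node or AMI is affected, a check along the following lines may help. It is a minimal sketch that approximates the condition described in the error message (group- or world-writable without the sticky bit); the function name is_insecure_dir is hypothetical, and the script assumes GNU find and stat, as shipped with Rocky.

```shell
#!/bin/bash
# is_insecure_dir DIR (hypothetical helper): succeeds if DIR is group- or
# world-writable and does not have the sticky bit set.
is_insecure_dir() {
    # -perm /022  : any of the group-write or other-write bits is set
    # ! -perm -1000 : the sticky bit is not set
    [ -n "$(find "$1" -maxdepth 0 -perm /022 ! -perm -1000)" ]
}

if is_insecure_dir /etc; then
    echo "/etc is insecure: mode $(stat -c '%a' /etc)"
else
    echo "/etc looks fine: mode $(stat -c '%a' /etc)"
fi
```

On an affected image the reported mode is expected to be 777 rather than 755.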

Affected versions (OSes, schedulers)

All versions of ParallelCluster 3.9.x and later using Rocky 9.4 are impacted.

Mitigation

We strongly recommend using Rocky 9.3 with an updated kernel until the issue in Rocky 9.4 is resolved.

If you must use Rocky 9.4, restore the correct /etc permissions using one of the following two approaches:

  1. Run the command sudo chmod 0755 /etc as part of an OnNodeStart custom action executed on all cluster nodes.
  2. Run the command sudo chmod 0755 /etc on the vanilla Rocky 9.4 AMI and use the resulting AMI as the ParentImage in your build-image configuration file.
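
For the first approach, the OnNodeStart script could look like the sketch below. This is an illustration under stated assumptions, not an official script: the function name restore_mode and the TARGET_DIR variable are hypothetical, and sudo is omitted on the assumption that custom actions already run with root privileges (add it back if that does not hold in your setup). The script changes the mode only when it differs from 755, so it is safe to run on nodes that are not affected.

```shell
#!/bin/bash
# Hypothetical OnNodeStart script: restore safe permissions on /etc.
set -euo pipefail

# restore_mode DIR MODE: chmod DIR to MODE only if its current mode differs.
restore_mode() {
    local dir="$1" expected="$2" current
    current="$(stat -c '%a' "$dir")"
    if [ "$current" != "$expected" ]; then
        echo "Fixing ${dir}: mode ${current} -> ${expected}"
        chmod "$expected" "$dir"
    fi
}

# TARGET_DIR is overridable only to allow a trial run on a scratch directory.
restore_mode "${TARGET_DIR:-/etc}" 755
```

The script would be uploaded to a location the nodes can read and referenced from the OnNodeStart custom action of the head node and of every compute queue, so that it runs on all cluster nodes.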