
(2.11.7 and earlier) Cluster creation fails with awsbatch scheduler

chenwany edited this page Nov 16, 2022 · 3 revisions

The issue

Since 2022-07-28, creating a ParallelCluster cluster with the awsbatch scheduler fails with the following error message:

Cluster creation failed.  Failed events:
  - AWS::CloudFormation::Stack AWSBatchStack Embedded stack arn:aws:cloudformation:eu-south-1:12345678:stack/parallelcluster-clustername-AWSBatchStack-xxx was not successfully created: The following resource(s) failed to create: [ManageDockerImagesFunction, SendBuildNotificationFunction]. 
    - AWS::CloudFormation::Stack parallelcluster-clustername-AWSBatchStack-xxx The following resource(s) failed to create: [ManageDockerImagesFunction, SendBuildNotificationFunction]. 
    - AWS::Lambda::Function ManageDockerImagesFunction Resource handler returned message: "The runtime parameter of python3.6 is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (python3.9) while creating or updating functions. (Service: Lambda, Status Code: 400, Request ID: xxx)" (RequestToken: xxx, HandlerErrorCode: InvalidRequest)

In addition, a cluster update that changes iam_lambda_role for the awsbatch scheduler will no longer be supported after 2022-08-28.

This is because ParallelCluster 2.11.7 and earlier build awsbatch support with Lambda functions that use the Python 3.6 runtime. According to the Lambda runtime support policy, new Lambda functions can no longer be created with the Python 3.6 runtime starting July 18, 2022, and existing functions that use it can no longer be updated starting August 17, 2022. End of support does not affect the execution of existing functions.

Affected versions

  • Cluster creation with awsbatch scheduler with ParallelCluster 2.11.7 and earlier
  • Cluster update that changes the iam_lambda_role for the awsbatch scheduler with ParallelCluster 2.10.1 through 2.11.7

Mitigation

The recommended solution is to upgrade to the latest version of ParallelCluster 3, which does not contain the issue. If you cannot upgrade, the following workaround applies to ParallelCluster 2.11.7: add this setting to the pcluster config file, under the [cluster ...] section:

template_url = https://us-east-1-aws-parallelcluster.s3.amazonaws.com/patches/2.11.7/aws-parallelcluster-2.11.7.cfn.json
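If the config file is managed by a script, the same edit can be applied programmatically. The following is a minimal sketch, not part of ParallelCluster: the function name add_template_url is hypothetical, and it assumes the config file is plain INI that Python's configparser can read. Note that configparser drops comments when it rewrites a file, so back up the config first.

```python
import configparser

# URL of the patched 2.11.7 template, as given in the workaround above.
PATCHED_TEMPLATE_URL = (
    "https://us-east-1-aws-parallelcluster.s3.amazonaws.com/"
    "patches/2.11.7/aws-parallelcluster-2.11.7.cfn.json"
)


def add_template_url(config_path, url=PATCHED_TEMPLATE_URL):
    """Set template_url in every [cluster ...] section of a pcluster 2 config.

    Hypothetical helper for illustration. Returns the names of the
    sections that were changed. Comments in the file are not preserved.
    """
    # Disable interpolation so values containing '%' are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)

    changed = []
    for section in parser.sections():
        # Cluster sections are named "cluster <label>", e.g. "cluster default".
        if section.startswith("cluster "):
            parser[section]["template_url"] = url
            changed.append(section)

    with open(config_path, "w") as handle:
        parser.write(handle)
    return changed
```

For example, add_template_url("/path/to/pcluster/config") would rewrite that file in place; the path here is a placeholder for wherever your config lives.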

If you are updating an existing cluster, do not change iam_lambda_role: changing it triggers an update of the Lambda function, which fails.

Note: for the 'aws-us-gov' or 'aws-cn' partitions, download the template above and upload it to a bucket in the partition in use. Then set 'template_url' to the URL of the uploaded object.
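The download-and-upload step in the note can be sketched as follows. This is an illustration under stated assumptions, not an official procedure: the helper names object_url and copy_template, the object key, and the local file name are hypothetical. It also assumes boto3 is installed, credentials for the target partition are configured, the bucket already exists, and the uploaded object is readable by CloudFormation. The URL is built in the virtual-hosted S3 style, which uses the amazonaws.com.cn suffix in China regions.

```python
import urllib.request

# Patched template from the workaround above.
TEMPLATE_URL = (
    "https://us-east-1-aws-parallelcluster.s3.amazonaws.com/"
    "patches/2.11.7/aws-parallelcluster-2.11.7.cfn.json"
)
# Hypothetical object key and local file name; any values work.
TEMPLATE_KEY = "patches/2.11.7/aws-parallelcluster-2.11.7.cfn.json"
LOCAL_FILE = "aws-parallelcluster-2.11.7.cfn.json"


def object_url(bucket, key, region):
    """Build a virtual-hosted-style S3 URL for the given region."""
    # China regions (cn-north-1, cn-northwest-1) use a different DNS suffix.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"https://{bucket}.s3.{region}.{suffix}/{key}"


def copy_template(bucket, region, key=TEMPLATE_KEY, local_path=LOCAL_FILE):
    """Download the patched template and upload it to a bucket you own.

    Returns the URL to use as template_url.
    """
    import boto3  # third-party; imported here so object_url works without it

    urllib.request.urlretrieve(TEMPLATE_URL, local_path)
    boto3.client("s3", region_name=region).upload_file(local_path, bucket, key)
    return object_url(bucket, key, region)
```

Calling copy_template("my-bucket", "us-gov-west-1") with a bucket of your own (the name here is a placeholder) returns the value to put in template_url.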

The issue is fixed in ParallelCluster 2.11.8.
