
(3.0.0 3.1.4) ParallelCluster API Stack Upgrade Fails for ECR resources

eantonin edited this page Nov 16, 2022 · 2 revisions

The issue

The ParallelCluster API can be upgraded to a newer version by following the instructions in the ParallelCluster API Docs.

Upgrading the CloudFormation stack for the ParallelCluster API from one version to another within the affected range fails with the following error messages for these resource IDs:

  • EcrImagesRemover:
    UPDATE_FAILED: Received response status [FAILED] from custom resource. Message returned: See the details in CloudWatch Log Stream: ...
  • EcrImage:
    UPDATE_FAILED: Resource creation cancelled
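
To check from the command line whether a failed update shows these errors, the stack events can be filtered for UPDATE_FAILED entries. This is a minimal sketch: the function names are hypothetical, and it assumes the two resource IDs above appear as logical resource IDs in the stack events.

```shell
# Hypothetical helpers; they are not part of ParallelCluster.

failed_update_events() {
    # $1 = region, $2 = API stack name
    # Prints one line per failed event: <LogicalResourceId> <TAB> <reason>
    aws cloudformation describe-stack-events \
        --region "$1" \
        --stack-name "$2" \
        --query "StackEvents[?ResourceStatus=='UPDATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
        --output text
}

matches_known_issue() {
    # Reads event lines on stdin; succeeds if a line starts with one of
    # the two resource IDs reported for this issue.
    grep -Eq '^(EcrImagesRemover|EcrImage)[[:space:]]'
}
```

For example, `failed_update_events "${REGION}" "${API_STACK_NAME}" | matches_known_issue && echo "matches this issue"`. A match only suggests this issue; the CloudWatch log stream named in the error message holds the details.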

This prevents customers from upgrading their ParallelCluster API stacks from one version to another.

The update fails because the IAM role of the EcrImageDeletionLambda has a policy that only matches the ImagePipeline resource of the current API stack. As a result, the Lambda cannot use the ImagePipeline resource of the new API stack.

Affected versions

  • ParallelCluster API: 3.0.0 to 3.1.5

Mitigation

Option 1: Recreate the ParallelCluster API Stack (Recommended)

Remove the existing API by deleting the corresponding AWS CloudFormation stack and deploy the new API as shown in the ParallelCluster API Docs.
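
The delete-and-redeploy sequence can be sketched with the AWS CLI as below. The function names are hypothetical, and the sketch assumes that the new stack is created from the same template URL and with the same capabilities as the update command used elsewhere on this page. Refer to the ParallelCluster API Docs for the authoritative deployment steps and any stack parameters.

```shell
# Hypothetical helpers; they are not part of ParallelCluster.

template_url() {
    # $1 = region, $2 = target ParallelCluster version
    printf 'https://%s-aws-parallelcluster.s3.%s.amazonaws.com/parallelcluster/%s/api/parallelcluster-api.yaml\n' "$1" "$1" "$2"
}

recreate_api_stack() {
    # $1 = region, $2 = API stack name, $3 = target version
    # Each step runs only if the previous one succeeded.
    aws cloudformation delete-stack --region "$1" --stack-name "$2" &&
    aws cloudformation wait stack-delete-complete --region "$1" --stack-name "$2" &&
    aws cloudformation create-stack \
        --region "$1" \
        --stack-name "$2" \
        --template-url "$(template_url "$1" "$3")" \
        --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND &&
    aws cloudformation wait stack-create-complete --region "$1" --stack-name "$2"
}
```

Usage, with example values: `recreate_api_stack us-east-1 my-api-stack 3.2.0`.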

Option 2: Modifying policies used by ImageBuilderInstanceRole and EcrImageDeletionLambdaRole

The deployment of the ParallelCluster API stack creates an IAM role with a name matching -EcrImageDeletionLambda-.

To fix the update process, modify this IAM role:

  • From the IAM console, identify the IAM role with a name matching -EcrImageDeletionLambda-
    • Attach the following inline policy to the role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "imagebuilder:ListImagePipelineImages",
            "Resource": "arn:aws:imagebuilder:<REGION>:<AWS_ACCOUNT_ID>:image-pipeline/ecrimagepipeline-*",
            "Effect": "Allow"
        }
    ]
}

<AWS_ACCOUNT_ID>: ID of the AWS account where the ParallelCluster API is running

<REGION>: Region where the ParallelCluster API is running
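
The same change can also be made with the AWS CLI instead of the console. The following is a sketch under stated assumptions: the function names and the policy name AllowListImagePipelineImages are hypothetical, and the role lookup simply searches for role names containing EcrImageDeletionLambda, so it may return more than one role if several API stacks exist in the account.

```shell
# Hypothetical helpers; the policy name below is an arbitrary choice.

find_deletion_lambda_role() {
    # Prints the names of all IAM roles containing "EcrImageDeletionLambda".
    aws iam list-roles \
        --query "Roles[?contains(RoleName, 'EcrImageDeletionLambda')].RoleName" \
        --output text
}

render_policy() {
    # $1 = region, $2 = AWS account ID
    # Prints the inline policy with the placeholders filled in.
    cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "imagebuilder:ListImagePipelineImages",
            "Resource": "arn:aws:imagebuilder:$1:$2:image-pipeline/ecrimagepipeline-*",
            "Effect": "Allow"
        }
    ]
}
EOF
}

attach_policy() {
    # $1 = role name, $2 = region, $3 = AWS account ID
    aws iam put-role-policy \
        --role-name "$1" \
        --policy-name "AllowListImagePipelineImages" \
        --policy-document "$(render_policy "$2" "$3")"
}
```

The account ID can be read with `aws sts get-caller-identity --query Account --output text`. Check the output of `find_deletion_lambda_role` before passing a role name to `attach_policy`.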

After the policy is attached, retry the stack update:

REGION=<region>
API_STACK_NAME=<stack-name> # Must match the name of the existing API stack
VERSION=<target_version>

aws cloudformation update-stack \
    --region "${REGION}" \
    --stack-name "${API_STACK_NAME}" \
    --template-url "https://${REGION}-aws-parallelcluster.s3.${REGION}.amazonaws.com/parallelcluster/${VERSION}/api/parallelcluster-api.yaml" \
    --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND
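
The update-stack call returns before the update finishes. One way to confirm the outcome is to wait for completion and print the final stack status on failure. This is a minimal sketch, and the function name is hypothetical.

```shell
# Hypothetical helper; it is not part of ParallelCluster.

wait_for_update() {
    # $1 = region, $2 = API stack name
    if aws cloudformation wait stack-update-complete --region "$1" --stack-name "$2"; then
        echo "update complete"
    else
        # The waiter failed: print the final stack status for diagnosis.
        aws cloudformation describe-stacks \
            --region "$1" \
            --stack-name "$2" \
            --query "Stacks[0].StackStatus" \
            --output text
        return 1
    fi
}
```

Usage: `wait_for_update "${REGION}" "${API_STACK_NAME}"`.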