
(3.0.0-3.1.3) build-image creates invalid images when using aws-cdk.aws-imagebuilder 1.153.0

Enrico Usai edited this page Apr 25, 2022 · 2 revisions

The issue

ParallelCluster depends on the aws-cdk Python libraries. They are installed automatically as dependencies when you install the aws-parallelcluster package from PyPI.

Version 1.153.0 of aws-cdk changed the internal behaviour of the aws-cdk.aws-imagebuilder library. As a result, the image build process executed by the pcluster build-image command creates invalid images.

To check whether you are affected, inspect the installed version of the aws-cdk.aws-imagebuilder library in the environment where the ParallelCluster CLI is installed:

$ pip freeze | grep imagebuilder
aws-cdk.aws-imagebuilder==1.153.0
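If you prefer to run the same check from Python (for example in a setup script), it can be sketched with the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch, not part of ParallelCluster: the helper names are hypothetical, and it assumes importlib.metadata can resolve the distribution name aws-cdk.aws-imagebuilder in your environment. Name normalization differs across Python versions, so fall back to pip freeze if the helper returns None.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution name as published on PyPI.
IMAGEBUILDER_DIST = "aws-cdk.aws-imagebuilder"
# The only release that generates an empty AmiDistributionConfiguration.
AFFECTED_VERSION = "1.153.0"


def installed_imagebuilder_version() -> Optional[str]:
    """Return the installed aws-cdk.aws-imagebuilder version, or None if it cannot be found."""
    try:
        return version(IMAGEBUILDER_DIST)
    except PackageNotFoundError:
        return None


def is_affected(installed: Optional[str]) -> bool:
    """True only for the broken release; < 1.153.0 and >= 1.153.1 work as expected."""
    return installed == AFFECTED_VERSION


if __name__ == "__main__":
    found = installed_imagebuilder_version()
    if found is None:
        print(f"{IMAGEBUILDER_DIST} not found; check with 'pip freeze' instead")
    elif is_affected(found):
        print(f"{IMAGEBUILDER_DIST}=={found}: affected, see Mitigation")
    else:
        print(f"{IMAGEBUILDER_DIST}=={found}: not affected")
```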

With version 1.153.0 of the library, pcluster build-image returns no error and the AMI is created, but the image does not appear in the list of ParallelCluster images because internal information is missing (e.g. the tags for the ParallelCluster version, image id, etc.).

This means the image will not appear in the output of the pcluster describe-image and pcluster list-images commands, and you cannot delete it with the pcluster delete-image command.

$ pcluster build-image -c image-config.yaml -i test
{
  "image": {
    "imageId": "test",
    "imageBuildStatus": "BUILD_IN_PROGRESS",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:eu-west-1:xxx:stack/test/e5d0c250-c232-11ec-bc7a-0661d2293f0d",
    "region": "eu-west-1",
    "version": "3.1.3"
  }
}

# After about 1h the CFN stack will be deleted but the image will not be available
$ pcluster describe-image -i test
{
  "message": "No image or stack associated with ParallelCluster image id: test."
}

Note: If you created any AMIs with this command while version 1.153.0 of the aws-imagebuilder library was installed, you need to delete them manually from the AWS console, because pcluster delete-image cannot find them.

Affected versions (OSes, schedulers)

All versions of ParallelCluster >= 3.0.0 are affected when using aws-cdk.aws-imagebuilder==1.153.0.

Mitigation

All versions of the aws-cdk.aws-imagebuilder library < 1.153.0 or >= 1.153.1 work as expected. You can fix the issue by either downgrading or upgrading all the aws-cdk Python libraries together; the following steps upgrade them:

  1. Confirm that the installed version of the aws-cdk.aws-imagebuilder library is the affected one (1.153.0):
$ pip freeze | grep imagebuilder
aws-cdk.aws-imagebuilder==1.153.0
  2. Create a new requirements.txt file with the following content:
aws-cdk.core>=1.153.1
aws-cdk.aws-batch>=1.153.1
aws-cdk.aws-cloudwatch>=1.153.1
aws-cdk.aws-codebuild>=1.153.1
aws-cdk.aws-dynamodb>=1.153.1
aws-cdk.aws-ec2>=1.153.1
aws-cdk.aws-efs>=1.153.1
aws-cdk.aws-events>=1.153.1
aws-cdk.aws-fsx>=1.153.1
aws-cdk.aws-imagebuilder>=1.153.1
aws-cdk.aws-iam>=1.153.1
aws-cdk.aws-lambda>=1.153.1
aws-cdk.aws-logs>=1.153.1
aws-cdk.aws-route53>=1.153.1
aws-cdk.aws-ssm>=1.153.1
aws-cdk.aws-sqs>=1.153.1
aws-cdk.aws-cloudformation>=1.153.1
  3. Install the aws-cdk libraries from the new requirements file:
$ pip install -r requirements.txt
  4. Verify that a fixed version (1.153.1 or later) of the library is installed:
$ pip freeze | grep imagebuilder
aws-cdk.aws-imagebuilder==1.153.1
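Instead of writing the package list by hand, you could derive it from the packages actually installed, so that every aws-cdk library moves to the fixed release together. The sketch below is a hypothetical helper, not a ParallelCluster tool: it reads pip freeze output and emits one name>=1.153.1 line per installed aws-cdk package. It assumes all aws-cdk v1 packages are released in lockstep (so 1.153.1 exists for each of them) and ignores lines that are not plain name==version pins.

```python
import subprocess
import sys
from typing import List

FIXED_VERSION = "1.153.1"


def cdk_requirements(freeze_output: str, minimum: str = FIXED_VERSION) -> List[str]:
    """Build 'name>=minimum' lines for every aws-cdk package found in pip freeze output."""
    names = set()
    for line in freeze_output.splitlines():
        line = line.strip()
        # Only handle simple pins such as 'aws-cdk.aws-imagebuilder==1.153.0'.
        if "==" not in line:
            continue
        name = line.split("==", 1)[0]
        # pip treats '_' and '-' as equivalent, e.g. 'aws_cdk.aws-lambda'.
        if name.lower().replace("_", "-").startswith("aws-cdk."):
            names.add(name)
    return [f"{name}>={minimum}" for name in sorted(names)]


if __name__ == "__main__":
    # Run pip from the same interpreter so the right environment is inspected.
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open("requirements.txt", "w") as handle:
        handle.write("\n".join(cdk_requirements(frozen)) + "\n")
    print("Wrote requirements.txt; install it with: pip install -r requirements.txt")
```

Generating the list from the environment avoids missing a transitive aws-cdk dependency that a hand-written list might omit; review the generated file before installing it.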

You can now run the pcluster build-image command, and the created image will be available as expected:

$ pcluster build-image -c image-config.yaml -i test-working

# After about 1h the image will be created and can be described successfully
$ pcluster describe-image -i test-working
{
  "imageConfiguration": {
    "url": "..."
  },
  "imageId": "test-working",
  "creationTime": "2022-04-22T18:52:26.000Z",
  "imageBuildStatus": "BUILD_COMPLETE",
  "region": "eu-west-1",
  ...

Error details

The build-image command uses the CDK library to generate the CloudFormation template with all the resources required for the build process. With aws-cdk.aws-imagebuilder==1.153.0, the AmiDistributionConfiguration field in the generated template is an empty dictionary, whereas it should contain important information such as the tags that the ParallelCluster CLI commands use to manage the created image:

  ...
  DistributionConfiguration:
    DependsOn:
    - DeleteStackFunctionExecutionRole
    Properties:
      Distributions:
      - AmiDistributionConfiguration: {}
        Region:
          Ref: AWS::Region
      ...

To check whether the template generated by the build-image process is affected, retrieve it with the AWS CLI while the build stack still exists. If AmiDistributionConfiguration is an empty dictionary, the build process will create a broken image.

$ pcluster build-image -c image-config.yaml -i test

$ aws cloudformation get-template --stack-name test --output text | grep AmiDistributionConfiguration -A 3
      - AmiDistributionConfiguration: {}
        Region:
          Ref: AWS::Region
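The same check can be sketched in Python on an already parsed template. The function name is hypothetical, and the code assumes the distribution resource has the standard CloudFormation type AWS::ImageBuilder::DistributionConfiguration (the excerpt above omits the Type line). Loading the YAML text returned by get-template into a dict (for example with a YAML parser such as PyYAML) is left to the caller.

```python
from typing import Any, Dict, List

DISTRIBUTION_TYPE = "AWS::ImageBuilder::DistributionConfiguration"


def broken_distributions(template: Dict[str, Any]) -> List[str]:
    """Return logical IDs of distribution resources with an empty or missing AmiDistributionConfiguration."""
    broken = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != DISTRIBUTION_TYPE:
            continue
        distributions = resource.get("Properties", {}).get("Distributions", [])
        # An empty dict (as generated with aws-cdk.aws-imagebuilder 1.153.0) is
        # treated as broken; this sketch treats a missing key the same way.
        if any(not dist.get("AmiDistributionConfiguration") for dist in distributions):
            broken.append(logical_id)
    return broken


# Minimal template mirroring the broken excerpt above.
example = {
    "Resources": {
        "DistributionConfiguration": {
            "Type": DISTRIBUTION_TYPE,
            "Properties": {
                "Distributions": [
                    {"AmiDistributionConfiguration": {}, "Region": {"Ref": "AWS::Region"}}
                ]
            },
        }
    }
}
print(broken_distributions(example))  # ['DistributionConfiguration']
```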

With a working version of the library, AmiDistributionConfiguration contains the image tags (AmiTags) and the name of the image:

$ aws cloudformation get-template --stack-name test-working --output text | grep AmiDistributionConfiguration -A 18
      - AmiDistributionConfiguration:
          AmiTags:
            parallelcluster:build_config: s3://parallelcluster-xxx-v1-do-not-delete/parallelcluster/3.1.3/images/test-working-0iwk3tixcvrltuop/configs/image-config.yaml
            parallelcluster:build_log:
              Fn::Join:
              - ''
              - - 'arn:'
                - Ref: AWS::Partition
                - ':logs:eu-west-1:'
                - Ref: AWS::AccountId
                - :log-group:/aws/imagebuilder/ParallelClusterImage-test-working
            parallelcluster:image_id: test-working
            parallelcluster:image_name: test-alinux2
            parallelcluster:s3_bucket: parallelcluster-xxx-v1-do-not-delete
            parallelcluster:s3_image_dir: parallelcluster/3.1.3/images/test-working-0iwk3tixcvrltuop
            parallelcluster:version: 3.1.3
          Name: test-alinux2 {{ imagebuilder:buildDate }}
        Region:
          Ref: AWS::Region
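For a finer-grained check, you could compare the AmiTags of a generated template with the tag keys shown in the working example above. This is a sketch: the function name is hypothetical, and the expected key set simply mirrors the working template for ParallelCluster 3.1.3; which of these tags the CLI strictly requires is an assumption.

```python
from typing import Any, Dict, Set

# Tag keys present in the working 3.1.3 template shown above (assumed set).
EXPECTED_TAG_KEYS = {
    "parallelcluster:build_config",
    "parallelcluster:build_log",
    "parallelcluster:image_id",
    "parallelcluster:image_name",
    "parallelcluster:s3_bucket",
    "parallelcluster:s3_image_dir",
    "parallelcluster:version",
}


def missing_tag_keys(ami_distribution_configuration: Dict[str, Any]) -> Set[str]:
    """Return the expected ParallelCluster tag keys absent from an AmiDistributionConfiguration."""
    tags = ami_distribution_configuration.get("AmiTags", {})
    return EXPECTED_TAG_KEYS - set(tags)


broken = {}  # what aws-cdk.aws-imagebuilder 1.153.0 generates
working = {
    # Placeholder values; only the keys matter for this check.
    "AmiTags": {key: "..." for key in EXPECTED_TAG_KEYS},
    "Name": "test-alinux2 {{ imagebuilder:buildDate }}",
}
print(len(missing_tag_keys(broken)))  # 7
print(missing_tag_keys(working))  # set()
```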