Skip to content

(3.3.0 3.4.1) Custom AMI creation fails on Ubuntu 20.04 during MySQL packages installation

Enrico Usai edited this page Jan 30, 2023 · 2 revisions

The issue

The process of creating a custom AMI, by using the the pcluster build-image command, fails in ParallelCluster 3.3.0 - 3.4.1, when using an Ubuntu 20.04 x86_64 as ParentImage in the image build configuration file.

The pcluster build-image command will fail after about 1 hour (see BUILD_FAILED as imageBuildStatus).

$ pcluster build-image -c image-config.yaml -i ubuntu20

$ pcluster describe-image -i ubuntu20
{
  ...
  "imageId": "ubuntu20",
  "imagebuilderImageStatus": "FAILED",
  "imagebuilderImageStatusReason": "SSM execution 'e4c5e892-35e3-43bb-be52-d367e708a806' failed for image arn: 'arn:aws:imagebuilder:us-east-1:xxx:image/parallelclusterimage-ubuntu20/3.1.4/1' with status = 'Failed' in state = 'BUILDING' and failure message = 'Document arn:aws:imagebuilder:us-east-1:xxx:component/parallelclusterimage-9d99aa00-9e27-11ed-ad2d-0e24e64524eb/3.1.4/1 failed!'",
  "imageBuildStatus": "BUILD_FAILED",
  "cloudformationStackStatus": "CREATE_FAILED",
  ...

The reason is that the installation of the MySQL packages, during the execution of the install_mysql_client recipe, part of the official ParallelCluster cookbook, is failing with the following error message:

#20 598.8 The following packages will be upgraded:
#20 598.8   mysql-common
#20 598.8 The following packages will be DOWNGRADED:
#20 598.8   libmysqlclient-dev libmysqlclient21
#20 598.8 1 upgraded, 1 newly installed, 2 downgraded, 0 to remove and 0 not upgraded.
#20 598.8 STDERR: WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
#20 598.8 
#20 598.8 E: Packages were downgraded and -y was used without --allow-downgrades.

The custom AMI creation process started to fail on January 25th, because an updated version of MySQL packages (8.0.32) has been released in the Ubuntu OS repository. This package is installed in the system by the initial phases of the build AMI process and then the install_mysql_client code tries to install the old 8.0.31 version on top of the updated version, causing the installation steps to fail.

Affected versions

  • ParallelCluster versions released after the introduction of the Slurm Accounting feature are affected: 3.3.0 - 3.4.1.
  • Affected OS is Ubuntu 20.04 x86_64.

Mitigation

We prepared a patched cookbook and uploaded it in a public bucket. To fix the build image process you need to use it in the image configuration file, in the DevSettings / Cookbook parameter.

Note that the patched cookbook is compatible only with the latest version of ParallelCluster: 3.4.1. If you’re using an old version we recommend you to move to the latest one.

Create a build image configuration file as the following and use it as input for the build-image command.

Image:
  Name: <your_image_name>
Build:
  InstanceType: <your_instance_type> 
  ParentImage: <your_parent_image_id>
DevSettings:
  Cookbook:
    ChefCookbook: https://us-east-1-aws-parallelcluster.s3.amazonaws.com/patches/3.4.1/aws-parallelcluster-cookbook-3.4.1.tgz

The pcluster build-image command will work as expected and after about 1 hour the created image will be available as expected.

$ pcluster build-image -c patched-image-config.yaml -i ubuntu20-working

# After about 1h the image will be created and can be described successfully
$ pcluster describe-image -i ubuntu20-working
{
  ...
  "imageId": "ubuntu20-working",
  "imageBuildStatus": "BUILD_COMPLETE",
  ...

Error details

When using the default cookbook, it is possible to see the details of the failure by looking at the logs:

$ pcluster list-image-log-streams -i ubuntu20 -r us-east-1
{
  "logStreams": [
    {
      "logStreamArn": "arn:aws:logs:us-east-1:xxx:log-group:/aws/imagebuilder/ParallelClusterImage-ubuntu20:log-stream:3.1.4/1",
      "logStreamName": "3.4.1/1",
      ...
    }
  ]
}

$ pcluster get-image-log-events -i ubuntu20 --log-stream-name 3.4.1/1 --limit 15
{
  ...
  "events": [
    {
      "message": "Stdout: The following packages will be DOWNGRADED:",
      "timestamp": "2023-01-27T11:20:16.887Z"
    },
    {
      "message": "Stdout:   libmysqlclient-dev libmysqlclient21",
      "timestamp": "2023-01-27T11:20:16.887Z"
    },
    {
      "message": "Stdout: 1 upgraded, 1 newly installed, 2 downgraded, 0 to remove and 0 not upgraded.",
      "timestamp": "2023-01-27T11:20:16.887Z"
    },
    ...
    {
      "message": "Stdout: E: Packages were downgraded and -y was used without --allow-downgrades.",
      "timestamp": "2023-01-27T11:20:16.887Z"
    },
    ...
    {
      "message": "TOE has completed execution with failure - Execution failed!",
      "timestamp": "2023-01-27T11:20:17.984Z"
    }
  ]
}

The install_mysql_client recipe tries to install the 8.0.31 version on top of an updated version, without using the --allow-downgrades flag. In this case apt throws an exit code 100. In the recipe we are calling apt install -y from a bash block, with set -e configured, which causes the whole block to fail if any command returns a non-zero status code.

This wasn’t an issue in past because the version installed in the system, coming from OS repository, was aligned with the version installed by the install_mysql_client recipe. On Jan, 25 a new version of MySQL was released in the OS repositories (8.0.32) causing the build image process to fail.

Clone this wiki locally