-
Notifications
You must be signed in to change notification settings - Fork 312
(3.0.0 3.1.2) build image stack deletion failed after image successfully created
Starting from 2021-03-03 the build-image
command may fail during stack deletion after image created successfully.
From the result of pcluster describe-image --image-id <your-image-id>
, imageBuildStatus
is in BUILD_COMPLETE
state and image state
is AVAILABLE
, but the CloudFormation stack deletion failed with the following error messages:
Resource handler returned message:
"User: arn:aws:sts::1234567:assumed-role/ParallelClusterImageCleanup-xxxx-xxxx-xxxx/ParallelClusterImage-xxxxx is not authorized to perform:
imagebuilder:GetImage on resource: arn:aws:imagebuilder:us-east-1:1234567:image/parallelclusterimage-build-image/3.1.2/1
(Service: Imagebuilder, Status Code: 403, Request ID: xxx)"
(RequestToken: xxxxx, HandlerErrorCode: GeneralServiceException)
Resource handler returned message:
"User: arn:aws:sts::1234567:xxxx is not authorized to perform:
imagebuilder:CancelImageCreation on resource: arn:aws:imagebuilder:sa-east-1:1234567:image/parallelclusterimage-integ-tests-build-image-xxx/3.1.2/1
(Service: Imagebuilder, Status Code: 403, Request ID: xxx)"
(RequestToken: xxxx, HandlerErrorCode: GeneralServiceException)
You can verify whether you are impacted by the issue by executing the following command:
pcluster get-image-stack-events --image-id <your-image-id> --region <image-build-region> --query "events[0]"
Then you can check from the output if the resourceStatus
and resourceStatusReason
match the following:
"resourceStatusReason": "The following resource(s) failed to delete: [ParallelClusterImage]. "
"resourceStatus": "DELETE_FAILED",
If you are not impacted by the issue, the following output is expected:
{
"message": "CloudFormation Stack for Image build-image does not exist."
}
The root cause of the problem is that the new version of EC2 Image Builder service, which is used by ParallelCluster build-image
command, requires two new additional IAM policies imagebuilder:GetImage
and imagebuilder:CancelImageCreation
.
This issue affects ParallelCluster versions from 3.0.0 to 3.1.2 when using build-image
functionality.
There are two options to mitigate to problem. Option 1 can be used to manually delete a stack in DELETE_FAILED
. Option 2 can be used if you want to use build-image
command in the future but want to avoid delete the stack by hand after the image build successfully.
Use pcluster describe-image
command to make sure the image is already created:
>pcluster describe-image --image-id <your-image-id> --region <image-region>
{
"imageConfiguration": {
"url":...
},
"imageId": "your-image-id",
"creationTime": "2022-03-06T21:20:52.000Z",
"imageBuildStatus": "BUILD_COMPLETE",
"region": "us-east-1",
"ec2AmiInfo": {
"amiName": "...",
"amiId": "ami-xxx",
"description": "...",
"state": "AVAILABLE",
"tags": [...],
"architecture": "x86_64"
},
"version": "3.1.2"
}
After seeing imageBuildStatus
is in BUILD_COMPLETE
and image state
is AVAILABLE
, you can manually delete the stack by using command:
aws cloudformation delete-stack --stack-name <your-image-id>
Or you can go to AWS CloudFormation console, find the stack named with image_id
and delete it.
This option can be used if you want to use build-image
command in the future and you want to avoid to delete the stack by hand after the image build successfully.
You can use the CleanupLambdaRole
from ParallelCluster documentation. The CleanupLambdaRole
should contains policies imagebuilder:GetImage
and imagebuilder:CancelImageCreation
.
After creating the custom CleanupLambdaRole
, add the role to build-image
configuration:
Build:
Iam:
CleanupLambdaRole: <custom cleanup_lambda_role>