(aws-logs): log retention failing cdk deployment with OperationAbortedException #17546
Comments
I've added more logging to the
This resulted in:
I have the exact same issue with
Could the speed of the machine CDK runs on cause this to manifest? We have
Confirming that #17688 also fixes the problem for me. I hope it gets merged soon!
Fixes: #17546

This adds to the fix in #16083, which addressed the issue where the LogRetention Lambda can be executed concurrently, creating a race condition in which multiple invocations try to create or modify the same log group. The previous fix handled the case where this occurs during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem occurring when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the previous fix to that method.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
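The approach described in this PR, treating `OperationAbortedException` as retryable when changing the retention policy, can be sketched as a small retry wrapper. This is a minimal illustration, not the code from the actual fix: the helper names `withAbortRetry` and `isOperationAborted`, the retry count and the backoff delays are all hypothetical, and the error check assumes the AWS SDK exposes the service error code on `code` (SDK v2) or `name` (SDK v3).

```typescript
// Assumed error shape: SDK v2 sets `code`, SDK v3 sets `name` (checked both, as an assumption).
interface AwsLikeError {
  code?: string;
  name?: string;
}

// Hypothetical helper: true if the error looks like a CloudWatch Logs OperationAbortedException.
function isOperationAborted(err: unknown): boolean {
  const e = err as AwsLikeError;
  return e?.code === 'OperationAbortedException' || e?.name === 'OperationAbortedException';
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: re-runs `op` while it fails with OperationAbortedException,
// allowing up to `maxRetries` additional attempts and doubling the delay each time.
// Any other error, or running out of retries, rethrows the original error.
async function withAbortRetry<T>(op: () => Promise<T>, maxRetries = 5, baseDelayMs = 100): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (!isOperationAborted(err) || attempt >= maxRetries) {
        throw err;
      }
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

In the Lambda, the retention update would then be wrapped, for example `await withAbortRetry(() => setRetentionPolicy(logGroupName, retentionInDays))`, where that argument list is assumed for illustration. Errors other than `OperationAbortedException` still fail fast, so genuine misconfigurations are not masked by retries.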
Still occurs as of
What is the problem?
The `LogRetention` custom resource is causing CDK deployments to fail because of a race condition between the log group that `LogRetention` is trying to create and the log group being created (and having its retention period set) for the `LogRetention` Lambda itself. An issue was previously opened for this and a fix was completed and released, but many people report that the problem persists. The previous issue is here: #15709
Reproduction Steps
This is difficult to reproduce consistently, but the more a CDK app makes use of the `LogRetention` custom resource, the more likely it is to happen. The RFDK integration tests (RFDK is a CDK construct library) deploy multiple stacks in parallel and have seen the failure on 7 of the last 10 runs.
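Why parallel deployments make the failure more likely can be illustrated with a toy model. This is a sketch under loud assumptions: `FakeLogService`, `countFailures` and the log group name are hypothetical, and the rule that any overlapping call on the same log group is rejected is a simplification chosen for illustration, not the documented behavior of CloudWatch Logs.

```typescript
// Hypothetical in-memory stand-in for CloudWatch Logs (not the real service): a call holds
// a per-log-group "lock" while it runs, and a second call on the same log group during that
// window fails with OperationAbortedException, mimicking the conflict described above.
class FakeLogService {
  private inFlight = new Set<string>();
  private latencyMs: number;

  constructor(latencyMs: number) {
    this.latencyMs = latencyMs;
  }

  async putRetentionPolicy(logGroupName: string, retentionInDays: number): Promise<number> {
    if (this.inFlight.has(logGroupName)) {
      throw { code: 'OperationAbortedException', message: `conflicting operation on ${logGroupName}` };
    }
    this.inFlight.add(logGroupName);
    try {
      await new Promise<void>((resolve) => setTimeout(resolve, this.latencyMs));
      return retentionInDays;
    } finally {
      this.inFlight.delete(logGroupName);
    }
  }
}

// Runs `n` invocations against the same log group, either all at once (like stacks deployed
// in parallel) or one after another, and returns how many of them failed.
async function countFailures(n: number, parallel: boolean): Promise<number> {
  const svc = new FakeLogService(10);
  const invoke = () =>
    svc.putRetentionPolicy('/aws/lambda/example-log-retention', 7).then(
      () => false, // succeeded
      () => true, // failed
    );
  if (parallel) {
    const failed = await Promise.all(Array.from({ length: n }, invoke));
    return failed.filter(Boolean).length;
  }
  let failures = 0;
  for (let i = 0; i < n; i++) {
    if (await invoke()) failures++;
  }
  return failures;
}
```

In this model, `countFailures(5, true)` reports 4 failures (every call after the first overlaps with it), while `countFailures(5, false)` reports none, which matches the intuition that more concurrent `LogRetention` invocations mean more collisions.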
What did you expect to happen?
We were expecting the CDK app to deploy successfully.
What actually happened?
The CDK app failed with the following error:
CDK CLI Version
1.129.0
Framework Version
No response
Node.js Version
14.17.1
OS
Amazon Linux 2
Language
TypeScript
Language Version
TypeScript ~4.4.4
Other information
I modified the LogRetention Lambda to include a few more log statements and was able to reproduce the error.
This is my modified `createLogGroupSafe` function:

This produced the following log:
From this we can see that:

- A `ResourceAlreadyExistsException` was raised, likely when trying to create the log group for the Lambda itself, which is expected if the Lambda has run before.
- An `OperationAbortedException` was raised without hitting the `Caught error` logging line I added or flowing through any of the retry logic.

I believe this logging indicates that the `OperationAbortedException` might not be coming from `createLogGroupSafe` and might instead be coming from the `setRetentionPolicy` call that happens afterward.
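One way to confirm which call the `OperationAbortedException` comes from is to tag each step of the handler with its name. The sketch below is illustrative only: `traced` and `handler` are hypothetical, and the two steps are stubs that simulate the behavior described above (log group creation completing, the retention update being aborted) rather than the actual `createLogGroupSafe` and `setRetentionPolicy` implementations.

```typescript
// Hypothetical helper: runs one named step, records start/done/failed entries in `log`,
// and on failure records the step name with the error code before rethrowing the original error.
async function traced<T>(step: string, log: string[], fn: () => Promise<T>): Promise<T> {
  log.push(`start ${step}`);
  try {
    const result = await fn();
    log.push(`done ${step}`);
    return result;
  } catch (err) {
    const code = (err as { code?: string })?.code ?? 'unknown';
    log.push(`failed ${step}: ${code}`);
    throw err;
  }
}

// Stub handler: both steps are simulated, not the real Lambda code.
async function handler(log: string[]): Promise<void> {
  await traced('createLogGroupSafe', log, async () => {
    // Simulates creation succeeding (e.g. ResourceAlreadyExistsException handled internally).
  });
  await traced('setRetentionPolicy', log, async () => {
    // Simulates a concurrent invocation aborting the retention update.
    throw { code: 'OperationAbortedException' };
  });
}
```

Running `handler` leaves `failed setRetentionPolicy: OperationAbortedException` as the last log entry, after `done createLogGroupSafe`, which is the kind of attribution the extra logging above was aiming for.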