-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(logs): log retention fails with OperationAbortedException #16083
Conversation
Fixes #15709 Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.
Title does not follow the guidelines of Conventional Commits. Please adjust title before merge. |
try { | ||
const region = process.env.AWS_REGION; | ||
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions); | ||
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See in the old code, if CreateLogGroupSafe throws an error, setRetentionPolicy is skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe()
throws an error setRetentionPolicy()
is skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old code:
CreateLogGroupSafe encounters OperationAbortedException, throws exception, setRetentionPolicy is skipped, the error is eaten by catch all.
Result: retention policy for this function is not set and no error. This is a bug in the old code I'm talking about.
New code:
CreateLogGroupSafe encounters OperationAbortedException, retries, if really fails after several attempts, then the error is floated up the stack.
Result: either log group is created with limited retention, or an exception is thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this pull request. Please see some of my comments below.
try { | ||
const region = process.env.AWS_REGION; | ||
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions); | ||
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe()
throws an error setRetentionPolicy()
is skipped.
Fixes #15709 Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.
Pull request has been modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes. Looks good to me.
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
Fixes: #17546 This adds to the fix in #16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group. The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
) Fixes: aws#17546 This adds to the fix in aws#16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group. The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Fixes: #15709
When creating a lambda with log retention, CDK actually creates 2 lambda functions. The second lambda function alters log retention of the log group of the first lambda and the retention of its own log group.
Because log group creation is asynchronous, the log retention lambda tries to pre-create both log groups to guarantee it has an object to work on.
If a normal lambda execution also creates the related log group at the same time, an "OperationAbortedException:... Please retry" error is returned.
The existing code handles this situation for log retention lambda but not for the first lambda.
This fix adds the retry pattern to the general log group creation code.
Also existing code had a bug: if OperationAbortedException is hit, the error is hidden but the retention policy is skipped and not actually applied. This fix addresses this bug as well.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license