fix(logs): log retention fails with OperationAbortedException #16083

ddl-denis-parnovskiy · 2021-08-17T02:09:40Z

When creating a lambda with log retention, CDK actually creates 2 lambda functions. The second lambda function alters log retention of the log group of the first lambda and the retention of its own log group.

Because log group creation is asynchronous, the log retention lambda tries to pre-create both log groups to guarantee it has an object to work on.

If a normal lambda execution also creates the related log group at the same time, an "OperationAbortedException:... Please retry" error is returned.

The existing code handles this situation for log retention lambda but not for the first lambda.

This fix adds the retry pattern to the general log group creation code.

Also existing code had a bug: if OperationAbortedException is hit, the error is hidden but the retention policy is skipped and not actually applied. This fix addresses this bug as well.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Fixes #15709 Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.

gitpod-io · 2021-08-17T02:09:42Z

mergify · 2021-08-17T02:10:13Z

Title does not follow the guidelines of Conventional Commits. Please adjust title before merge.

ddl-denis-parnovskiy · 2021-08-17T02:10:37Z

packages/@aws-cdk/aws-logs/lib/log-retention-provider/index.ts

-        try {
-          const region = process.env.AWS_REGION;
-          await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
-          await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);


See in the old code, if CreateLogGroupSafe throws an error, setRetentionPolicy is skipped.

What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe() throws an error setRetentionPolicy() is skipped.

Old code:
CreateLogGroupSafe encounters OperationAbortedException, throws exception, setRetentionPolicy is skipped, the error is eaten by catch all.
Result: retention policy for this function is not set and no error. This is a bug in the old code I'm talking about.

New code:
CreateLogGroupSafe encounters OperationAbortedException, retries, if really fails after several attempts, then the error is floated up the stack.
Result: either log group is created with limited retention, or an exception is thrown.

nija-at

Thanks for submitting this pull request. Please see some of my comments below.

packages/@aws-cdk/aws-logs/lib/log-retention-provider/index.ts

nija-at · 2021-09-06T16:36:05Z

packages/@aws-cdk/aws-logs/lib/log-retention-provider/index.ts

-        try {
-          const region = process.env.AWS_REGION;
-          await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
-          await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);


What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe() throws an error setRetentionPolicy() is skipped.

packages/@aws-cdk/aws-logs/lib/log-retention-provider/index.ts

Fixes #15709 Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.

Pull request has been modified.

nija-at

Thanks for the fixes. Looks good to me.

mergify · 2021-09-10T13:23:08Z

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

aws-cdk-automation · 2021-09-10T14:12:45Z

AWS CodeBuild CI Report

CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
Commit ID: 64d2bb7
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

mergify · 2021-09-10T14:13:12Z

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

Fixes: #17546 This adds to the fix in #16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group. The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

) Fixes: aws#17546 This adds to the fix in aws#16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group. The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

ddl-denis-parnovskiy commented Aug 17, 2021

View reviewed changes

ddl-denis-parnovskiy changed the title ~~Fix for aws#15709 race when creating log groups~~ fix: aws#15709 race when creating log groups Aug 17, 2021

Merge branch 'master' into fix-15709

e70f491

peterwoodworth changed the title ~~fix: aws#15709 race when creating log groups~~ fix: race when creating log groups Aug 19, 2021

peterwoodworth assigned rix0rrr Aug 19, 2021

peterwoodworth added @aws-cdk/aws-logs Related to Amazon CloudWatch Logs effort/small Small work item – less than a day of effort p1 labels Aug 19, 2021

peterwoodworth changed the title ~~fix: race when creating log groups~~ fix(logs): race when creating log groups Aug 19, 2021

nija-at previously requested changes Sep 6, 2021

View reviewed changes

nija-at assigned nija-at and unassigned rix0rrr Sep 6, 2021

nija-at changed the title ~~fix(logs): race when creating log groups~~ fix(logs): log retention fails with OperationAbortedException Sep 6, 2021

nija-at approved these changes Sep 10, 2021

View reviewed changes

Merge branch 'master' into fix-15709

64d2bb7

mergify bot merged commit 3e9f04d into aws:master Sep 10, 2021

ddl-denis-parnovskiy deleted the fix-15709 branch September 11, 2021 00:20

horsmand mentioned this pull request Nov 24, 2021

fix(logs): log retention fails with OperationAbortedException #17688

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(logs): log retention fails with OperationAbortedException #16083

fix(logs): log retention fails with OperationAbortedException #16083

ddl-denis-parnovskiy commented Aug 17, 2021 •

edited

Loading

gitpod-io bot commented Aug 17, 2021

mergify bot commented Aug 17, 2021

ddl-denis-parnovskiy Aug 17, 2021

nija-at Sep 6, 2021

ddl-denis-parnovskiy Sep 8, 2021

nija-at left a comment

nija-at Sep 6, 2021

nija-at left a comment

mergify bot commented Sep 10, 2021

aws-cdk-automation commented Sep 10, 2021

mergify bot commented Sep 10, 2021

fix(logs): log retention fails with OperationAbortedException #16083

fix(logs): log retention fails with OperationAbortedException #16083

Conversation

ddl-denis-parnovskiy commented Aug 17, 2021 • edited Loading

gitpod-io bot commented Aug 17, 2021

mergify bot commented Aug 17, 2021

ddl-denis-parnovskiy Aug 17, 2021

Choose a reason for hiding this comment

nija-at Sep 6, 2021

Choose a reason for hiding this comment

ddl-denis-parnovskiy Sep 8, 2021

Choose a reason for hiding this comment

nija-at left a comment

Choose a reason for hiding this comment

nija-at Sep 6, 2021

Choose a reason for hiding this comment

nija-at left a comment

Choose a reason for hiding this comment

mergify bot commented Sep 10, 2021

aws-cdk-automation commented Sep 10, 2021

AWS CodeBuild CI Report

mergify bot commented Sep 10, 2021

ddl-denis-parnovskiy commented Aug 17, 2021 •

edited

Loading