Skip to content

Commit

Permalink
fix(logs): log retention fails with OperationAbortedException (#16083)
Browse files Browse the repository at this point in the history
Fixes: #15709

When creating a lambda with log retention, CDK actually creates 2 lambda functions. The second lambda function alters log retention of the log group of the first lambda and the retention of its own log group.

Because log group creation is asynchronous, the log retention lambda tries to pre-create both log groups to guarantee it has an object to work on.

If a normal lambda execution also creates the related log group at the same time, an "OperationAbortedException:... Please retry" error is returned.

The existing code handles this situation for log retention lambda but not for the first lambda. 

This fix adds the retry pattern to the general log group creation code.

Also existing code had a bug: if OperationAbortedException is hit, the error is hidden but the retention policy is skipped and not actually applied. This fix addresses this bug as well.


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
  • Loading branch information
ddl-denis-parnovskiy authored Sep 10, 2021
1 parent deaac4a commit 3e9f04d
Show file tree
Hide file tree
Showing 2 changed files with 138 additions and 24 deletions.
64 changes: 42 additions & 22 deletions packages/@aws-cdk/aws-logs/lib/log-retention-provider/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,39 @@ interface SdkRetryOptions {
* @param options CloudWatch API SDK options.
*/
async function createLogGroupSafe(logGroupName: string, region?: string, options?: SdkRetryOptions) {
try { // Try to create the log group
const cloudwatchlogs = new AWS.CloudWatchLogs({ apiVersion: '2014-03-28', region, ...options });
await cloudwatchlogs.createLogGroup({ logGroupName }).promise();
} catch (e) {
if (e.code !== 'ResourceAlreadyExistsException') {
throw e;
// If we set the log retention for a lambda, then due to the async nature of
// Lambda logging there could be a race condition when the same log group is
// already being created by the lambda execution. This can sometime result in
// an error "OperationAbortedException: A conflicting operation is currently
// in progress...Please try again."
// To avoid an error, we do as requested and try again.
let retryCount = options?.maxRetries == undefined ? 10 : options.maxRetries;
const delay = options?.retryOptions?.base == undefined ? 10 : options.retryOptions.base;
do {
try {
const cloudwatchlogs = new AWS.CloudWatchLogs({ apiVersion: '2014-03-28', region, ...options });
await cloudwatchlogs.createLogGroup({ logGroupName }).promise();
return;
} catch (error) {
if (error.code === 'ResourceAlreadyExistsException') {
// The log group is already created by the lambda execution
return;
}
if (error.code === 'OperationAbortedException') {
if (retryCount > 0) {
retryCount--;
await new Promise(resolve => setTimeout(resolve, delay));
continue;
} else {
// The log group is still being created by another execution but we are out of retries
throw new Error('Out of attempts to create a logGroup');
}
}
// Any other error
console.error(error);
throw error;
}
}
} while (true); // exit happens on retry count check
}

/**
Expand Down Expand Up @@ -64,21 +89,16 @@ export async function handler(event: AWSLambda.CloudFormationCustomResourceEvent
await setRetentionPolicy(logGroupName, logGroupRegion, retryOptions, parseInt(event.ResourceProperties.RetentionInDays, 10));

if (event.RequestType === 'Create') {
// Set a retention policy of 1 day on the logs of this function. The log
// group for this function should already exist at this stage because we
// already logged the event but due to the async nature of Lambda logging
// there could be a race condition. So we also try to create the log group
// of this function first. If multiple LogRetention constructs are present
// in the stack, they will try to act on this function's log group at the
// same time. This can sometime result in an OperationAbortedException. To
// avoid this and because this operation is not critical we catch all errors.
try {
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
} catch (e) {
console.log(e);
}
// Set a retention policy of 1 day on the logs of this very function.
// Due to the async nature of the log group creation, the log group for this function might
// still be not created yet at this point. Therefore we attempt to create it.
// In case it is being created, createLogGroupSafe will handle the conflic.
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
// If createLogGroupSafe fails, the log group is not created even after multiple attempts
// In this case we have nothing to set the retention policy on but an exception will skip
// the next line.
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
}
}

Expand Down
98 changes: 96 additions & 2 deletions packages/@aws-cdk/aws-logs/test/test.log-retention-provider.ts
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,14 @@ function createRequest(type: string) {
.reply(200);
}

class MyError extends Error {
code: string;
constructor(message: string, code: string) {
super(message);
this.code = code;
}
}

export = {
'tearDown'(callback: any) {
AWS.restore();
Expand Down Expand Up @@ -231,10 +239,60 @@ export = {
test.done();
},

async 'does not fail when operations on provider log group fail'(test: Test) {
async 'does not if when operations on provider log group fails'(test: Test) {
let attempt = 2;
const createLogGroupFake = (params: AWSSDK.CloudWatchLogs.CreateLogGroupRequest) => {
if (params.logGroupName === '/aws/lambda/provider') {
return Promise.reject(new Error('OperationAbortedException'));
if (attempt > 0) {
attempt--;
return Promise.reject(new MyError(
'A conflicting operation is currently in progress against this resource. Please try again.',
'OperationAbortedException'));
} else {
return Promise.resolve({});
}
}
return Promise.resolve({});
};

const putRetentionPolicyFake = sinon.fake.resolves({});
const deleteRetentionPolicyFake = sinon.fake.resolves({});

AWS.mock('CloudWatchLogs', 'createLogGroup', createLogGroupFake);
AWS.mock('CloudWatchLogs', 'putRetentionPolicy', putRetentionPolicyFake);
AWS.mock('CloudWatchLogs', 'deleteRetentionPolicy', deleteRetentionPolicyFake);

const event = {
...eventCommon,
RequestType: 'Create',
ResourceProperties: {
ServiceToken: 'token',
RetentionInDays: '30',
LogGroupName: 'group',
},
};

const request = createRequest('SUCCESS');

await provider.handler(event as AWSLambda.CloudFormationCustomResourceCreateEvent, context);

test.equal(request.isDone(), true);

test.done();
},

async 'does not fail if operations on CDK lambda log group fails twice'(test: Test) {
let attempt = 2;
const createLogGroupFake = (params: AWSSDK.CloudWatchLogs.CreateLogGroupRequest) => {
if (params.logGroupName === 'group') {
if (attempt > 0) {
attempt--;
return Promise.reject(new MyError(
'A conflicting operation is currently in progress against this resource. Please try again.',
'OperationAbortedException'));
} else {
return Promise.resolve({});
}
}
return Promise.resolve({});
};
Expand Down Expand Up @@ -265,6 +323,42 @@ export = {
test.done();
},

async 'does fail if operations on CDK lambda log group fails indefinitely'(test: Test) {
const createLogGroupFake = (params: AWSSDK.CloudWatchLogs.CreateLogGroupRequest) => {
if (params.logGroupName === 'group') {
return Promise.reject(new MyError(
'A conflicting operation is currently in progress against this resource. Please try again.',
'OperationAbortedException'));
}
return Promise.resolve({});
};

const putRetentionPolicyFake = sinon.fake.resolves({});
const deleteRetentionPolicyFake = sinon.fake.resolves({});

AWS.mock('CloudWatchLogs', 'createLogGroup', createLogGroupFake);
AWS.mock('CloudWatchLogs', 'putRetentionPolicy', putRetentionPolicyFake);
AWS.mock('CloudWatchLogs', 'deleteRetentionPolicy', deleteRetentionPolicyFake);

const event = {
...eventCommon,
RequestType: 'Create',
ResourceProperties: {
ServiceToken: 'token',
RetentionInDays: '30',
LogGroupName: 'group',
},
};

const request = createRequest('FAILED');

await provider.handler(event as AWSLambda.CloudFormationCustomResourceCreateEvent, context);

test.equal(request.isDone(), true);

test.done();
},

async 'response data contains the log group name'(test: Test) {
AWS.mock('CloudWatchLogs', 'createLogGroup', sinon.fake.resolves({}));
AWS.mock('CloudWatchLogs', 'putRetentionPolicy', sinon.fake.resolves({}));
Expand Down

0 comments on commit 3e9f04d

Please sign in to comment.