fix(eks): nodegroup synthesis fails when configured with an AMI type that is not compatible to the default instance type (#12441)

> Note: both issues here were introduced in #11962

## Problem 1

When creating a `Nodegroup` without passing instance types, we currently default to `t3.medium`:

https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L294

This default is then used to calculate the expected AMI type, and assert that the configured AMI type is indeed as expected:

https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L302-L304

However, a user might configure instance types on the launch template and an AMI type on the nodegroup. In this scenario, we still use the default instance type to perform the validation, which fails if the AMI type is not compatible with it.
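
The failure mode can be reproduced with a small self-contained model. This is only a sketch: `amiTypeFor`, `oldValidate`, `NodegroupInput`, and the family lists are hypothetical stand-ins for the logic in `managed-nodegroup.ts`, and instance types are modeled as plain strings rather than `InstanceType` objects.

```typescript
// Simplified model of the pre-fix validation. All names here are
// illustrative assumptions, not the actual aws-eks API.
type AmiType = 'AL2_x86_64' | 'AL2_x86_64_GPU' | 'AL2_ARM_64';

// Assumed (incomplete) family lists, for illustration only.
const ARM_FAMILIES = ['c6g', 'm6g', 'r6g'];
const GPU_FAMILIES = ['p2', 'p3', 'g4dn'];

function amiTypeFor(instanceType: string): AmiType {
  const family = instanceType.substring(0, instanceType.indexOf('.'));
  if (ARM_FAMILIES.indexOf(family) >= 0) return 'AL2_ARM_64';
  if (GPU_FAMILIES.indexOf(family) >= 0) return 'AL2_x86_64_GPU';
  return 'AL2_x86_64';
}

interface NodegroupInput {
  instanceTypes?: string[];            // instance types set on the nodegroup itself
  amiType?: AmiType;                   // AMI type set on the nodegroup
  launchTemplateInstanceType?: string; // set on the launch template; the validation never reads it
}

function oldValidate(props: NodegroupInput): void {
  // The default applies whenever the nodegroup has no instance types,
  // even if the launch template specifies one.
  const instanceTypes = props.instanceTypes ?? ['t3.medium'];
  for (const instanceType of instanceTypes) {
    const expected = amiTypeFor(instanceType);
    if (props.amiType && props.amiType !== expected) {
      throw new Error(`amiType ${props.amiType} does not match ${expected} (derived from ${instanceType})`);
    }
  }
}

// An ARM instance type on the launch template plus an ARM AMI type on the
// nodegroup is a consistent configuration, yet the old validation rejects it.
try {
  oldValidate({ launchTemplateInstanceType: 'm6g.large', amiType: 'AL2_ARM_64' });
  console.log('accepted');
} catch (e) {
  console.log(`rejected: ${(e as Error).message}`);
}
```

In this model the rejection comes purely from the `t3.medium` fallback, which is the behavior described above.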

To make things worse, we don't actually use the default instance type at all, apart from the validation:

https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L329-L330

In fact, this default was only introduced in this [PR](#11962), which also added the problematic validation.

### Solution

Drop the default instance type altogether, as it was before. The new validation only takes place if the user explicitly configured both `instanceTypes` and `amiType` on the nodegroup. Since the default value was never actually used, this doesn't incur any behavior change.
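
A minimal sketch of the validation after this change, under the same kind of simplifying assumptions: instance types are plain strings, and `amiTypeFor`, `expectedAmiTypeFor`, and `validate` are hypothetical names that only model the logic, not the real CDK functions.

```typescript
// Simplified model of the post-fix validation. Names and family lists are
// illustrative assumptions, not the actual aws-eks API.
type AmiType = 'AL2_x86_64' | 'AL2_x86_64_GPU' | 'AL2_ARM_64';

const ARM_FAMILIES = ['c6g', 'm6g', 'r6g']; // assumed, incomplete
const GPU_FAMILIES = ['p2', 'p3', 'g4dn'];  // assumed, incomplete

function amiTypeFor(instanceType: string): AmiType {
  const family = instanceType.substring(0, instanceType.indexOf('.'));
  if (ARM_FAMILIES.indexOf(family) >= 0) return 'AL2_ARM_64';
  if (GPU_FAMILIES.indexOf(family) >= 0) return 'AL2_x86_64_GPU';
  return 'AL2_x86_64';
}

// Produces a single AMI type or throws, because instance types of
// different CPU architectures cannot share one nodegroup.
function expectedAmiTypeFor(instanceTypes: string[]): AmiType {
  const [first, ...rest] = Array.from(new Set(instanceTypes.map(t => amiTypeFor(t))));
  if (first === undefined || rest.length > 0) {
    throw new Error('instance types must resolve to exactly one AMI type');
  }
  return first;
}

function validate(props: { instanceTypes?: string[]; amiType?: AmiType }): AmiType | undefined {
  // No explicit instance types: nothing to validate against, no default applied.
  if (!props.instanceTypes || props.instanceTypes.length === 0) {
    return undefined;
  }
  const expected = expectedAmiTypeFor(props.instanceTypes);
  if (props.amiType && props.amiType !== expected) {
    throw new Error(`amiType ${props.amiType} does not match ${expected}`);
  }
  return expected;
}

// The configuration that used to fail now passes through untouched.
console.log(validate({ amiType: 'AL2_ARM_64' }));
// Explicit instance types still yield an expected AMI type.
console.log(validate({ instanceTypes: ['m6g.large', 'c6g.xlarge'] }));
```

The key design point is the early return: without explicit instance types there is no expected AMI type, so an explicit `amiType` cannot conflict with anything.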

## Problem 2

When a launch template is used, we currently ignore the value of `amiType` explicitly passed by the user:

https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L324-L325

This means that users who configured a launch template without a custom AMI, and passed an `amiType` to the nodegroup, now end up with no AMI specification at all. The nodegroup then defaults to whatever EKS chooses, which might not be what the user had in mind.

There's no good reason to do this. We should either throw a validation error if both are used, or pass the explicit value nevertheless, even though it might cause problems.

### Solution

When a user explicitly passes an AMI type, just use it and assume the user knows what they are doing. When a user does not explicitly pass it, only apply the default if a launch template is not used.

> If we apply the default in the presence of a launch template, a user would have no way to opt out of it if they also have a custom AMI in the launch template.

This change means that users who previously "relied" on this override might now experience a deployment failure if they are using a custom AMI in the launch template. Those users can resolve the problem by removing the `amiType` property from the nodegroup (since it wasn't used, it's not needed). I don't imagine many such users exist, since this behavior is new and it doesn't make much sense to configure both a custom AMI and an `amiType`.
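
The resolution rule for the `AmiType` property, before and after this change, can be sketched as two small functions. This is a hedged model rather than the CDK code: `ResolveInput`, `resolveBefore`, and `resolveAfter` are hypothetical names, and `expectedAmiType` stands for whatever value was derived from instance types.

```typescript
// Simplified model of how the nodegroup's AmiType property is resolved.
// Names are illustrative assumptions, not the aws-eks API.
type AmiType = 'AL2_x86_64' | 'AL2_x86_64_GPU' | 'AL2_ARM_64';

interface ResolveInput {
  amiType?: AmiType;         // explicitly passed by the user
  expectedAmiType?: AmiType; // derived from instance types, if any
  hasLaunchTemplate: boolean;
}

// Before: any launch template suppressed the AMI type, explicit or not.
function resolveBefore(input: ResolveInput): AmiType | undefined {
  return input.hasLaunchTemplate ? undefined : input.expectedAmiType;
}

// After: an explicit value always wins; the derived value is applied
// only when no launch template is configured.
function resolveAfter(input: ResolveInput): AmiType | undefined {
  return input.hasLaunchTemplate
    ? input.amiType
    : input.amiType ?? input.expectedAmiType;
}

const explicitWithTemplate: ResolveInput = {
  amiType: 'AL2_x86_64',
  expectedAmiType: 'AL2_x86_64',
  hasLaunchTemplate: true,
};
console.log(resolveBefore(explicitWithTemplate)); // explicit value is dropped
console.log(resolveAfter(explicitWithTemplate));  // explicit value is kept
```

With a custom AMI in the launch template, the second call is the case that can now fail at deployment, which is why removing `amiType` from the nodegroup resolves it.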

--------------------

Fixes #12389

BREAKING CHANGE: Explicitly passing `amiType` to nodegroups will now take effect even if a launch template is configured as well. If your launch template contains a custom AMI, this will cause a deployment failure. To resolve, remove the explicit `amiType` from the nodegroup configuration.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
iliapolo committed Jan 12, 2021
1 parent 422dc8e commit 5f6f0f9
Showing 2 changed files with 118 additions and 22 deletions.
54 changes: 32 additions & 22 deletions packages/@aws-cdk/aws-eks/lib/managed-nodegroup.ts
@@ -226,10 +226,6 @@ export interface NodegroupProps extends NodegroupOptions {
  * The Nodegroup resource class
  */
 export class Nodegroup extends Resource implements INodegroup {
-  /**
-   * Default instanceTypes
-   */
-  public static readonly DEFAULT_INSTANCE_TYPES = [new InstanceType('t3.medium')];
   /**
    * Import the Nodegroup from attributes
    */
@@ -291,16 +287,17 @@ export class Nodegroup extends Resource implements INodegroup {
     if (props.instanceType) {
       Annotations.of(this).addWarning('"instanceType" is deprecated and will be removed in the next major version. please use "instanceTypes" instead');
     }
-    const instanceTypes = props.instanceTypes ?? (props.instanceType ? [props.instanceType] : Nodegroup.DEFAULT_INSTANCE_TYPES);
-    // get unique AMI types from instanceTypes
-    const uniqAmiTypes = getAmiTypes(instanceTypes);
-    // uniqAmiTypes.length should be at least 1
-    if (uniqAmiTypes.length > 1) {
-      throw new Error('instanceTypes of different CPU architectures is not allowed');
-    }
-    const determinedAmiType = uniqAmiTypes[0];
-    if (props.amiType && props.amiType !== determinedAmiType) {
-      throw new Error(`The specified AMI does not match the instance types architecture, either specify ${determinedAmiType} or dont specify any`);
+    const instanceTypes = props.instanceTypes ?? (props.instanceType ? [props.instanceType] : undefined);
+    let expectedAmiType = undefined;
+
+    if (instanceTypes && instanceTypes.length > 0) {
+      // if the user explicitly configured instance types, we can calculate the expected ami type.
+      expectedAmiType = getAmiType(instanceTypes);
+
+      // if the user explicitly configured an ami type, make sure its the same as the expected one.
+      if (props.amiType && props.amiType !== expectedAmiType) {
+        throw new Error(`The specified AMI does not match the instance types architecture, either specify ${expectedAmiType} or dont specify any`);
+      }
     }

     if (!props.nodeRole) {
@@ -321,13 +318,18 @@ export class Nodegroup extends Resource implements INodegroup {
       nodegroupName: props.nodegroupName,
       nodeRole: this.role.roleArn,
       subnets: this.cluster.vpc.selectSubnets(props.subnets).subnetIds,
-      // AmyType is not allowed by CFN when specifying an image id in your launch template.
-      amiType: props.launchTemplateSpec === undefined ? determinedAmiType : undefined,
+
+      // if a launch template is configured, we cannot apply a default since it
+      // might exist in the launch template as well, causing a deployment failure.
+      amiType: props.launchTemplateSpec !== undefined ? props.amiType : (props.amiType ?? expectedAmiType),
+
       capacityType: props.capacityType ? props.capacityType.valueOf() : undefined,
       diskSize: props.diskSize,
       forceUpdateEnabled: props.forceUpdate ?? true,
-      instanceTypes: props.instanceTypes ? props.instanceTypes.map(t => t.toString()) :
-        props.instanceType ? [props.instanceType.toString()] : undefined,
+
+      // note that we don't check if a launch template is configured here (even though it might configure instance types as well)
+      // because this doesn't have a default value, meaning the user had to explicitly configure this.
+      instanceTypes: instanceTypes?.map(t => t.toString()),
       labels: props.labels,
       releaseVersion: props.releaseVersion,
       remoteAccess: props.remoteAccess ? {
@@ -392,8 +394,16 @@ function getAmiTypeForInstanceType(instanceType: InstanceType) {
     NodegroupAmiType.AL2_X86_64;
 }

-function getAmiTypes(instanceType: InstanceType[]) {
-  const amiTypes = instanceType.map(i =>getAmiTypeForInstanceType(i));
-  // retuen unique AMI types
-  return [...new Set(amiTypes)];
+// this function examines the CPU architecture of every instance type and determines
+// what ami type is compatible for all of them. it either throws or produces a single value because
+// instance types of different CPU architectures are not supported.
+function getAmiType(instanceTypes: InstanceType[]) {
+  const amiTypes = new Set(instanceTypes.map(i => getAmiTypeForInstanceType(i)));
+  if (amiTypes.size == 0) { // protective code, the current implementation will never result in this.
+    throw new Error(`Cannot determine any ami type compatible with instance types: ${instanceTypes.map(i => i.toString()).join(',')}`);
+  }
+  if (amiTypes.size > 1) {
+    throw new Error('instanceTypes of different CPU architectures is not allowed');
+  }
+  return amiTypes.values().next().value;
 }
86 changes: 86 additions & 0 deletions packages/@aws-cdk/aws-eks/test/test.nodegroup.ts
@@ -10,6 +10,92 @@ import { testFixture } from './util';
 const CLUSTER_VERSION = eks.KubernetesVersion.V1_18;

 export = {
+
+  'default ami type is not applied when launch template is configured'(test: Test) {
+
+    // GIVEN
+    const { stack, vpc } = testFixture();
+
+    const launchTemplate = new ec2.CfnLaunchTemplate(stack, 'LaunchTemplate', {
+      launchTemplateData: {
+        instanceType: ec2.InstanceType.of(ec2.InstanceClass.C5, ec2.InstanceSize.MEDIUM).toString(),
+      },
+    });
+
+    // WHEN
+    const cluster = new eks.Cluster(stack, 'Cluster', {
+      vpc,
+      defaultCapacity: 0,
+      version: CLUSTER_VERSION,
+    });
+    new eks.Nodegroup(stack, 'Nodegroup', {
+      cluster,
+      instanceTypes: [ec2.InstanceType.of(ec2.InstanceClass.C5, ec2.InstanceSize.LARGE)],
+      launchTemplateSpec: {
+        id: launchTemplate.ref,
+        version: launchTemplate.attrLatestVersionNumber,
+      },
+    });
+
+    // THEN
+    test.equal(expect(stack).value.Resources.Nodegroup62B4B2C1.Properties.AmiType, undefined);
+    test.done();
+  },
+
+  'explicit ami type is applied even when launch template is configured'(test: Test) {
+
+    // GIVEN
+    const { stack, vpc } = testFixture();
+
+    const launchTemplate = new ec2.CfnLaunchTemplate(stack, 'LaunchTemplate', {
+      launchTemplateData: {
+        instanceType: ec2.InstanceType.of(ec2.InstanceClass.C5, ec2.InstanceSize.MEDIUM).toString(),
+      },
+    });
+
+    // WHEN
+    const cluster = new eks.Cluster(stack, 'Cluster', {
+      vpc,
+      defaultCapacity: 0,
+      version: CLUSTER_VERSION,
+    });
+    new eks.Nodegroup(stack, 'Nodegroup', {
+      cluster,
+      amiType: eks.NodegroupAmiType.AL2_X86_64,
+      launchTemplateSpec: {
+        id: launchTemplate.ref,
+        version: launchTemplate.attrLatestVersionNumber,
+      },
+    });
+
+    // THEN
+    test.equal(expect(stack).value.Resources.Nodegroup62B4B2C1.Properties.AmiType, 'AL2_x86_64');
+    test.done();
+  },
+
+  'ami type is taken as is when no instance types are configured'(test: Test) {
+
+    // GIVEN
+    const { stack, vpc } = testFixture();
+
+    // WHEN
+    const cluster = new eks.Cluster(stack, 'Cluster', {
+      vpc,
+      defaultCapacity: 0,
+      version: CLUSTER_VERSION,
+    });
+    new eks.Nodegroup(stack, 'Nodegroup', {
+      cluster,
+      amiType: eks.NodegroupAmiType.AL2_X86_64_GPU,
+    });
+
+    // THEN
+    expect(stack).to(haveResourceLike('AWS::EKS::Nodegroup', {
+      AmiType: 'AL2_x86_64_GPU',
+    }));
+    test.done();
+  },
+
   'create nodegroup correctly'(test: Test) {
     // GIVEN
     const { stack, vpc } = testFixture();