Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blueprints for EKS cluster and workers #18

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

mromaszewicz
Copy link

Simple blueprints to create an EKS cluster, and to create workers
in an AutoScalngGroup which connect to that master.

Simple blueprints to create an EKS cluster, and to create workers
in an AutoScalngGroup which connect to that master.
Copy link
Member

@phobologic phobologic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a great start!

]
)
)
return GetAtt(eks_service_role_id, "Arn")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also probably want to include some outputs for the created role. It may not be used now, but always a good idea. These days I tend to create outputs for just about every Ref and GetAtt that a resource provides, just to avoid any issues in the future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is output later - whether or not existing role is passed in, or a new one is created, it's an output in create_stack.

t.add_resource(
eks.Cluster(
"EksCluster",
Name=eks_name_tag,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it's usually a good idea to avoid Names in Cloudformation, due to limitaitons with Name Types in Cloudformation (they can't be updated with "requires replacement" updates without changing the Name: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-name.html)

I'm not 100% certain this is a Name Type, but I'm pretty sure it is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a Name type. It's required for an EKS cluster, can't skip it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? It doesn't look like it's required (I don't think name types are ever required): https://github.com/cloudtools/troposphere/blob/master/troposphere/eks.py#L20

def create_iam_role(self):
eks_service_role_id = "EksServiceRole"
t = self.template
t.add_resource(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A newer pattern in troposphere that we're making more use of in stacker is to assign the creation of resources to python variables, then use the builtin troposphere Ref and GetAtt methods. An example for this:

role = t.add_resource(Role(...))
return role.GetAtt("Arn")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# Output the ClusterName and RoleArn, which are useful as inputs for
# EKS worker nodes.
t.add_output(Output("ClusterName", Value=eks_name_tag))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add more outputs for the Cluster & Role. Usually we create the outputs in the same method where the resources are created, just to keep code close together. Might be difficult with the Role, though I think there are some good examples in the codebase already.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this exports everything that's needed, you can get everything via GetAtt(clustername, thing). I did the Role here on purpose, so that I can pass it in or create it. I can make it output in two places - once just after creation, and once after reusing an existing one, but I chose to spread things out and not duplicate. Your call, which do you prefer?


def create_max_pods_per_node_mapping(t):
mapping = {}
for instance, max_pods in max_pods_per_instance.iteritems():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: As we've moved to python3 support in stacker, we should probably start thinking about that in stacker_blueprints as well. One thing I know we'll need to do is stop using iteritems and instead use items. Might as well start here :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

def create_node_instance_role(self):
t = self.template

# The user data below relies on this map being present.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems out of place here, since it's not used in this method directly - though maybe not necessary per previous comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. t is used to create an Output below

InstanceType=variables["InstanceType"],
KeyName=variables["KeyName"],
SecurityGroups=[variables["WorkerSecurityGroupId"]],
UserData=Base64(Join("", user_data)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method get_launch_config_userdata already returns a Base64 and Join'd string, right? Not sure this will work here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch

desired_instances = min_instances

# Create the AutoScalingGroup which will manage our instances. It's
# easy to change the worker count be tweaking the limits in here once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: be -> by

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"type": str,
"description": "The name of the cluster for workers to join."
},
"WorkerSecurityGroupId": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop the Worker part of these variables. They're already attached to the Worker blueprint, and it makes it easier if you can specify generic terms like Subnets and SecurityGroup (which I believe are the variables we use elsewhere).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"default": -1,
},
"WorkerSubnets": {
"type": str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not use a list type? It seems like that's what is expected when it's used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I frequently pass a stack output here from another stack, and maybe I did something wrong, but I can't seem to create list outputs from a stack which I can reference here.

marcinromaszewicz and others added 2 commits July 16, 2018 22:15
@phobologic
Copy link
Member

Hey @mromaszewicz - I did a quick run through this to update it to use some of the concepts we've been pulling into newer blueprints (things like properties for variables, using Sub) and removed the Mapping. Let me know what you think.

@mromaszewicz
Copy link
Author

Looks good overall. I had the mapping so that the instance userdata that the stack created was the same as Amazon's CFN script for EKS workers.

I just checked - you can indeed launch a Cluster without a Name, and AWS assigns a random name - the name must be there. I think the stack should permit specifying the Name, at least optionally via argument, because downstream from the cluster, lots of things depend on the Name, not the Cluster ID or ARN. For example, creating the role which allows workers to join the cluster is one, but that's no biggie, you can use that as a reference from the stack. However, when you create your own kubeconfig, it would be nice to have a cluster named something like "prod-Cluster", instead of "Cluster-EKS1234567". I've got four clusters, and without meaningful names, it gets messy.

@phobologic
Copy link
Member

phobologic commented Jul 22, 2018

So the allure of naming things is really dangerous - it's bitten me in the past a ton of times, so much so that I really think the benefits are not worth it.

You'll find that not using naming isn't all that difficult in the long run - the clusters aren't named totally randomly, they'll be named based on: <namespace>-<stack_name>-<resource_name><random>, so you'll find it's pretty easy to tell which is which.

Honestly, it's just the extension of the old "treat your servers like cattle, not pets" philosophy. I think in the public stacker_blueprints repo I want to avoid dangerous settings - so I think I'd prefer to keep the Name out of this (which we've done in most newer Blueprints, since it is so dangerous).

re: The mapping - the userdata, on the box, should end up the same. It'll just look different in cloudformation (it'll have a single value, rather than a map that you then have to go lookup). I've found with stacker (and other cloudformation tools) that it's actually worth getting away from some of the old cloudformation constructs (Parameters, for example), since they aren't actually used as they were intended (ie: meant to be filled in manually so you can re-use a cloudformation template - stacker doesn't need that, since it can already re-use the template via it's blueprints).

@phobologic
Copy link
Member

BTW, an example of not using naming - at my new company, we setup ECS clusters with this. With these settings:

namespace: modsy-dev
stack_name: wrc-cluster
resource: Cluster

And our name ended up being: modsy-dev-wrc-cluster-Cluster-1IRSA68PZZ08K

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants