This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Feature: Node pools #46

Closed
8 tasks done
mumoshu opened this issue Nov 8, 2016 · 32 comments

Comments

@mumoshu
Contributor

mumoshu commented Nov 8, 2016

Succeeds coreos/coreos-kubernetes#667


I'd like to add GKE's "Node Pools" like feature to kube-aws.

With that feature, we can differentiate the following things per pool without maintaining multiple etcd clusters and Kubernetes control planes:

  • instance type
  • storage size
  • instance profile
  • security group
  • min/max spot price
  • auto scaling group or spot fleet
  • node labels

It will eventually give us more granular control over:

  • security
    • in combination with node selectors, we can
      1. restrict nodes which are able to communicate with databases containing sensitive information and
      2. schedule only necessary pods onto the nodes
  • cost effectiveness
    • in combination with node selectors, we can
      1. make pool(s) backed by spot instances and label nodes with e.g. spot=true
      2. explicitly schedule pods which can be spawned/killed more aggressively onto those nodes
  • scalability
  • etc
    • If you are aware of one, let us know!

Beware that until the taints-and-tolerations feature is implemented in Kubernetes, we'd need to give pods specific node selectors to prevent them from being scheduled onto undesirable nodes.

Edit: Taints and tolerations are partially supported in Kubernetes 1.4.x and were introduced into kube-aws in #132
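
For example, with a spot-backed pool whose nodes are labeled spot=true as above, a pod that can tolerate aggressive termination could be pinned to those nodes with a node selector. This is only an illustrative manifest; the spot=true label is the hypothetical example label from this issue, not something kube-aws applies for you:

# Illustrative pod manifest: schedule only onto nodes labeled spot=true
# (a hypothetical label applied by a spot-backed node pool).
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  nodeSelector:
    spot: "true"
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "echo running on a spot node && sleep 3600"]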

TODOs (possibly done in separate pull requests)

Non-TODOs

  • Multi-region deployment, a.k.a. deploying node pools in other region(s)
@mumoshu
Contributor Author

mumoshu commented Nov 8, 2016

For now, I'm going to build a POC like I've described in coreos/coreos-kubernetes#667 (comment)

Any requests, comments, or proposals are welcome!

@pieterlange
Contributor

If we implement node pools by firing off new CloudFormation stacks, we could generate separate stack templates and userdata for each node pool.

We should still maintain a "main" stack for cross referencing resources.

@cknowles
Contributor

cknowles commented Nov 8, 2016

Sounds great and definitely would provide a lot of flexibility. I think it's a similar concept to kops instance groups? Part of me thinks it would be good to combine efforts as kops has a ticket for coreos support too.

@mumoshu mumoshu added this to the v0.9.2 milestone Nov 9, 2016
@mumoshu mumoshu changed the title Node pools Proposal: Node pools Nov 10, 2016
@mumoshu
Contributor Author

mumoshu commented Nov 11, 2016

@c-knowles Thanks, I had never noticed kops already had that!
Definitely looking forward to seeing how it's currently implemented/working and to collaborating whenever possible.

@mumoshu
Contributor Author

mumoshu commented Nov 11, 2016

Added notes about node labels, cluster-autoscaler, and spot pricing to organize what I should include in the POC.

@mumoshu
Contributor Author

mumoshu commented Nov 16, 2016

Regarding the POC, I've ended up with this:

master...mumoshu:node-pools

If you're bored of just waiting for this to land, please feel free to build the kube-aws command from the mumoshu:node-pools branch and try the new kube-aws nodepool (init|render|up) --pool-name mypool command 😃

# requires `./cluster.yaml` to exist to inherit various variables from the original/base k8s cluster (including dnsServiceIP, apiServerEndpoint, etcdEndpoints, and the cfn stack name for importing exported outputs like the worker security group id, vpc id, route table id, etc.)
kube-aws nodepool init --pool-name mypool \
    --availability-zone ${KUBE_AWS_AVAILABILITY_ZONE} \
    --key-name ${KUBE_AWS_KEY_NAME} \
    --kms-key-arn ${KUBE_AWS_KMS_KEY_ARN}

kube-aws nodepool render stack --pool-name mypool

# requires the base cluster to be up and exporting the required outputs (the worker security group id at least; also the vpc id and route table id if omitted in `node-pools/mypool/cluster.yaml`)
kube-aws nodepool up --pool-name mypool --s3-uri ${KUBE_AWS_S3_URI}

ref: https://github.com/mumoshu/kube-aws/blob/cb58f10fa0da61cf88eb3cbfc463957d2b37f9d0/e2e/run#L176-L186

TODOs

  • Remove code duplication, more
  • Do some refactoring
  • Squash commits and then split them into a more meaningful sequence of commits (possibly allowing me to send multiple, separate pull requests for more reviewer-happiness 😆 )
  • Unit tests

@mumoshu
Contributor Author

mumoshu commented Nov 17, 2016

Several notes on the current implementation:

Usage

The kube-aws nodepool (init|render|up) --pool-name mypool commands assume you run kube-aws from the same working directory where you've run kube-aws (init|render|up).

$ mkdir mycluster
$ cd mycluster
$ kube-aws init --cluster-name mycluster ...
$ kube-aws render credentials
$ kube-aws render stack
$ kube-aws up ...
$ kube-aws nodepool init --pool-name mypool ...
$ kube-aws nodepool render stack --pool-name mypool ...
$ kube-aws nodepool up --pool-name mypool ...

Designs

  • We now have separate cluster.yaml, stack-template.json, user-data/cloud-init-worker for each node pool.
  • I tried to make cluster.yaml for node pools look familiar. You'll see the worker* config keys (e.g. workerInstanceType, workerRootVolumeSize, workerRootVolumeType) in both the main cluster's and the node pools' cluster.yaml, rather than unprefixed keys (instanceType, rootVolumeSize, rootVolumeType) appearing only in the node pools' cluster.yaml.
    • The end result is a bit verbose, but the cluster.yamls for the main cluster and node pools stay consistent and familiar looking (see the sketch after this list).
  • Worker nodes in the main cluster and in node pools are assigned to the same security group (the one with the logical name SecurityGroupWorker in the CloudFormation stacks) so that etcd nodes in the main cluster can communicate with newly added worker nodes from node pools, etc.
    • CloudFormation's Export/Fn::ImportValue functions are used to achieve this, by the way.
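
As a sketch of that familiar-looking layout, a node pool's cluster.yaml might contain something like the following, using the POC-era baseClusterName naming described further down in this comment. The exact keys and defaults are whatever the node-pools branch defines; the values here are examples only:

# Sketch of node-pools/mypool/cluster.yaml (POC naming; example values).
clusterName: mypool              # the pool's own name in the POC naming
baseClusterName: mycluster       # the main cluster this pool belongs to
workerCount: 2
workerInstanceType: m3.medium
workerRootVolumeSize: 30
workerRootVolumeType: gp2
# vpcId / routeTableId can be omitted to import them from the main
# cluster's stack via Fn::ImportValue (see "Implementation details").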

Implementation details

  • A CloudFormation stack for a node pool works in two ways:
    • If you've omitted vpcId and/or routeTableId in a node pool's cluster.yaml, kube-aws nodepool up will generate a cfn stack template containing Fn::ImportValue calls to import the required values from the main cluster's cfn stack.
    • If you've specified vpcId and/or routeTableId, kube-aws nodepool up doesn't import values from the main stack; it just embeds the values as-is instead of emitting Fn::ImportValue. This is achieved by computing RouteTableRef and VPCRef in Go code and referencing them from the node pools' stack-template.json (refs: go code and stack-template.json; see the sketch after this list).
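
To illustrate the two modes, here's a hedged sketch in CloudFormation YAML (the real stack-template.json is JSON, and the resource and export names below are hypothetical):

# When vpcId is omitted, the generated node pool template imports it
# from the main cluster's stack:
Resources:
  Subnet0:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.10.0/24
      VpcId:
        Fn::ImportValue: mycluster-VPC   # hypothetical export name
# When vpcId is set in the node pool's cluster.yaml, the literal id is
# embedded instead:
#       VpcId: vpc-0123abcd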

Implications

  • This won't let us cleanly create kube-aws node pools connected to non-kube-aws main clusters. Allowing that would be out of scope for this issue, but I wanted to note it.

File tree

The whole file tree for a main cluster named mycluster with a node pool named mypool would look like:

$ tree mycluster/
mycluster/
├── cluster.yaml
├── credentials
│   ├── admin-key.pem
│   ├── admin.pem
│   ├── apiserver-key.pem
│   ├── apiserver.pem
│   ├── ca-key.pem
│   ├── ca.pem
│   ├── etcd-client-key.pem
│   ├── etcd-client.pem
│   ├── etcd-key.pem
│   ├── etcd.pem
│   ├── worker-key.pem
│   └── worker.pem
├── kubeconfig
├── nodepools
│   └── mypool
│       ├── cluster.yaml
│       ├── stack-template.json
│       └── userdata
│           └── cloud-config-worker
├── stack-template.json
└── userdata
    ├── cloud-config-controller
    ├── cloud-config-etcd
    └── cloud-config-worker

5 directories, 21 files

Btw, I named it baseCluster instead of mainCluster in the current node-pools branch, as can be seen in the baseClusterName: mycluster config var in node-pools/mypool/cluster.yaml.
If you have any idea of better naming, please let me know 🙏

@mumoshu mumoshu mentioned this issue Nov 17, 2016
@cknowles
Contributor

Relevant kops pull for autoscaler - kubernetes/kops#914

@mumoshu
Contributor Author

mumoshu commented Nov 17, 2016

@c-knowles Thanks for the info!

I guess the cluster-autoscaler can be deployed like that in kube-aws too, once node pools have landed and we ensure the "exactly 1 AZ for each ASG" rule with it.
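
For context, here's a hedged sketch of how the upstream cluster-autoscaler is typically pointed at one single-AZ ASG per node pool via its --nodes flag. This is not something kube-aws generates; the namespace, image tag, and ASG name are illustrative:

# Hedged sketch: one single-AZ ASG per node pool, addressed as min:max:asg-name.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
        - name: cluster-autoscaler
          image: gcr.io/google_containers/cluster-autoscaler:v0.4.0  # illustrative tag
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --nodes=1:10:mycluster-mypool-workers   # hypothetical ASG name
          env:
            - name: AWS_REGION
              value: us-west-1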

@cknowles
Contributor

Yeah I think the AZ/ASG relationship seems to be the most important part for autoscaling, in its current form at least.

@mumoshu
Contributor Author

mumoshu commented Nov 21, 2016

For anyone interested, I've just rebased the branch onto current master and finished squashing while redoing commits in meaningful units.

@mumoshu
Contributor Author

mumoshu commented Nov 21, 2016

To put separate node labels on each node pool, I'd like the kubelet to do the labeling on startup, so that we can avoid races between the kubelet starting to accept pods and the node being labeled.
I believe such races can occasionally result in pods being scheduled onto undesired nodes.

Fortunately, this is supported via --node-labels key=value according to the doc (or key:value according to kubernetes#19498?).
Unfortunately, it is an alpha feature according to the doc.

I'm going to accept relying on this alpha feature for now, but let me know if that's something anyone wants me to avoid at this stage 😃

A related reading: kubernetes/kubernetes#16090
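
To make that concrete, here's a rough, hypothetical cloud-config fragment showing a kubelet started with --node-labels. The actual cloud-config-worker generated by kube-aws is structured quite differently (the kubelet runs with many more flags), so this only sketches the flag itself, and the label keys/values are made up:

#cloud-config
coreos:
  units:
    - name: kubelet.service
      command: start
      content: |
        [Service]
        # --node-labels is an alpha kubelet flag; labels are applied at
        # registration time, before the node starts accepting pods.
        ExecStart=/usr/bin/kubelet \
          --node-labels=nodePool=mypool,spot=true \
          --register-node=true
        Restart=always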

@mumoshu
Contributor Author

mumoshu commented Nov 22, 2016

Just a forecast: maybe I'll merge part of my node pools branch (the refactoring part) to master once we've released v0.9.1 final.
v0.9.1 is planned to be cut after #82 is merged.

@mumoshu
Contributor Author

mumoshu commented Nov 25, 2016

I'm satisfied with the refactoring but now it needs to be rebased again 😃

@mumoshu
Contributor Author

mumoshu commented Nov 25, 2016

Btw I've moved several things including node labels to "Later TODOs".
For now, I'd like to concentrate on refactoring and adding a solid base to develop the node pools feature further.

@mumoshu
Contributor Author

mumoshu commented Nov 25, 2016

Squashed.

@mumoshu
Contributor Author

mumoshu commented Nov 28, 2016

All the refactoring commits merged into master.
I'm now running e2e while preparing the pull request for the node pools feature.

@mumoshu
Contributor Author

mumoshu commented Nov 28, 2016

Final brush up is in progress before submitting the pull request.

Here are the final notes (i.e. the commit message) for the PR's commit.


Notes on the initial implementation of Node Pools.

Usage

The kube-aws node-pools (init|render|up) --node-pool-name mypool commands assume you run kube-aws from the same working directory where you've run kube-aws (init|render|up), so that cluster.yaml and the other assets exist.

$ mkdir mycluster
$ cd mycluster
$ kube-aws init --cluster-name mycluster ...
$ kube-aws render credentials
$ kube-aws render stack
$ kube-aws up ...
$ kube-aws node-pools init --node-pool-name mypool ...
$ kube-aws node-pools render stack --node-pool-name mypool ...
$ kube-aws node-pools up --node-pool-name mypool ...

Design

  • A node pool is a group of worker nodes.
  • Worker nodes in the same node pool have the same instance type, volume type, and volume size.
  • A cluster is not a node pool.
  • A cluster may be called a "main" cluster if we need to be very explicit.
  • A cluster contains one or more etcd nodes, one or more controller nodes, and one or more worker nodes, as before.
  • From now on, a cluster can also contain zero or more node pools.
  • Vanilla kube-aws sub-commands work for a cluster, while kube-aws node-pools sub-commands work for a node pool.

Sources of inspiration

  • GKE/the gcloud command
    • Both kube-aws and gcloud use the notion of <root cmd> node-pools for subcommands, instead of nodepool, nodepools, or node-pool.

Specifications

  • One CloudFormation stack for each main cluster, and one stack for each node pool.
  • We now have a separate cluster.yaml, stack-template.json, and user-data/cloud-init-worker for each node pool.
  • I tried to make cluster.yaml for node pools look familiar. You'll see the worker* config keys (e.g. workerInstanceType, workerRootVolumeSize, workerRootVolumeType) in both the main cluster's and the node pools' cluster.yaml, rather than unprefixed keys (instanceType, rootVolumeSize, rootVolumeType) appearing only in the node pools' cluster.yaml.
    • The end result is a bit verbose, but the cluster.yamls stay consistent and familiar looking for clusters themselves and their addition - node pools.
  • Worker nodes in the main cluster and in node pools are assigned to the same security group (the one with the logical name SecurityGroupWorker in the CloudFormation stacks) so that etcd nodes in the main cluster can communicate with newly added worker nodes from node pools, and controller nodes in the main cluster can communicate with those worker nodes (see the sketch after this list).
  • A CloudFormation stack for a node pool works in two ways:
    • If you've omitted vpcId and/or routeTableId in a node pool's cluster.yaml, kube-aws node-pools up will generate a cfn stack template containing Fn::ImportValue calls to import the required values from the main cluster's cfn stack.
    • If you've specified vpcId and/or routeTableId, kube-aws node-pools up doesn't import values from the main stack; it just embeds the values as-is instead of emitting Fn::ImportValue. This is achieved by computing RouteTableRef and VPCRef in Go code and referencing them from the node pools' stack-template.json (refs: go code and stack-template.json).
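
To make the security group sharing concrete, here's a hedged sketch of the export/import pairing, written in CloudFormation YAML for readability (the real templates are JSON, and the export name is hypothetical):

# Main cluster stack: export the worker security group id.
Outputs:
  WorkerSecurityGroup:
    Value: { Ref: SecurityGroupWorker }
    Export:
      Name: mycluster-WorkerSecurityGroup
# Node pool stack: attach the pool's worker nodes to that same group, e.g.
#   SecurityGroupIds:
#     - Fn::ImportValue: mycluster-WorkerSecurityGroup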

Implications

  • This won't let us cleanly create kube-aws node pools connected to non-kube-aws main clusters. Allowing that would be out of scope for this issue, but I wanted to note it.

File tree

The whole file tree for a main cluster named mycluster with a node pool named mypool would look like:

$ tree mycluster/
mycluster/
├── cluster.yaml
├── credentials
│   ├── admin-key.pem
│   ├── admin.pem
│   ├── apiserver-key.pem
│   ├── apiserver.pem
│   ├── ca-key.pem
│   ├── ca.pem
│   ├── etcd-client-key.pem
│   ├── etcd-client.pem
│   ├── etcd-key.pem
│   ├── etcd.pem
│   ├── worker-key.pem
│   └── worker.pem
├── kubeconfig
├── node-pools (NEW!)
│   └── mypool
│       ├── cluster.yaml
│       ├── stack-template.json
│       └── userdata
│           └── cloud-config-worker
├── stack-template.json
└── userdata
    ├── cloud-config-controller
    ├── cloud-config-etcd
    └── cloud-config-worker

5 directories, 21 files

Notes for POC users/testers

During the POC phase, what is now nodePoolName in a node pool's cluster.yaml was called clusterName, and what is now clusterName was called baseClusterName.

See notes on the POC: #46 (comment)

They're now named nodePoolName and clusterName respectively, to make it explicit that clusterName is the same variable as in the main cluster's cluster.yaml, and to reflect the design that differentiates a cluster from a node pool. A cluster is not a node pool, and vice versa.
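
As a quick sketch of the renamed keys (example values only):

# node-pools/mypool/cluster.yaml after the rename (example values).
clusterName: mycluster    # same variable/value as in the main cluster's cluster.yaml
nodePoolName: mypool      # was `clusterName` during the POC phase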

@mumoshu
Contributor Author

mumoshu commented Nov 30, 2016

#106 is merged and will be included in the second RC of kube-aws v0.9.2.

I'm still wondering whether (and how) I should explicitly label Node Pools as an experimental feature.

Maybe the commands could be changed to kube-aws experimental node-pools (init|render|up) before v0.9.2 graduates from its RCs, if needed?

Hey @pieterlange @c-knowles @camilb @nicr9 @danielfm @sdouche, do you have any opinion on whether/how these commands should be explicitly presented to our users as experimental?
Any input is welcome 🙇

@danielfm
Contributor

@mumoshu IMO adding commands to kube-aws experimental and then moving them back to kube-aws just to flag them as experimental makes the UI kinda "unstable", which is not good from the UI point of view.

Maybe we can just use confirmation prompts (like "This is an experimental feature. Confirm? [y/n]") or just execute the command and print a warning message at the end, something like that.

What do you think?

@pieterlange
Contributor

Agreed with @danielfm. Simply printing a warning message is sufficient, we shouldn't be changing cmdline options for this purpose.

@mumoshu
Contributor Author

mumoshu commented Dec 1, 2016

@danielfm @pieterlange Thanks for your feedback!

I'm now convinced that we should not introduce the experimental subcommand for that.

As for which alternative to use, I'd rather "start" with a warning message and no prompt, so that we don't prevent our users from scripting around kube-aws while still achieving the original goal of notifying them about the experimental feature.
It would take a bit of effort to automatically answer y to a kube-aws prompt using e.g. the expect command 😉
I'll send a pull request to add a warning message shortly.

That said, I'll be open to future pull requests covering both:

  • adding a y/n prompt
  • a -f option to automatically answer y

@cknowles
Contributor

cknowles commented Dec 1, 2016

+1. If someone wanted to use this feature in an existing CI pipeline that uses kube-aws, they'd need a way to skip any prompts.

@mumoshu mumoshu changed the title Proposal: Node pools Feature: Node pools Dec 2, 2016
@sdouche

sdouche commented Dec 2, 2016

Hi @mumoshu
IMHO a simple warning in the cluster configuration file is enough.

@mumoshu
Contributor Author

mumoshu commented Dec 6, 2016

@sdouche Thanks for your feedback! Then I'll start with just a warning in the very beginning of cluster.yaml for now.

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 6, 2016
So explain the fact in the very beginning of cluster.yaml for node pools according to what we've discussed in kubernetes-retired#46 (comment)

ref kubernetes-retired#46
@sdouche

sdouche commented Dec 6, 2016

@mumoshu I'm eager to test that!

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 7, 2016
… to node pools

The following experimental features are added to node pools via this change:

* awsEnvironment
* waitSignal
* nodeLabel
* loadBalancer

ref kubernetes-retired#46
@mumoshu
Contributor Author

mumoshu commented Dec 8, 2016

@c-knowles I've just noticed that I've missed not only kube-aws nodepool validate but also kube-aws nodepool destroy 😉

Added to the TODOs list in this issue's description.

@mumoshu
Contributor Author

mumoshu commented Dec 9, 2016

@sdouche FYI, v0.9.2-rc.4 is available with the experimental node pools feature!

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 14, 2016
This complements Node Pools(kubernetes-retired#46) and Spot Fleet support(kubernetes-retired#112)

The former `experimental.nodeLabel` configuration key is renamed to `experimental.awsNodeLabels` to avoid a collision with the newly added `experimental.nodeLabels` and for consistency with `experimental.awsEnvironment`.
@mumoshu
Contributor Author

mumoshu commented Dec 14, 2016

It seems I've finished all the TODOs.

@mumoshu
Contributor Author

mumoshu commented Dec 19, 2016

@mumoshu
Contributor Author

mumoshu commented Dec 19, 2016

Closing this issue, as the initial iterations to bring in the feature have finished.

@mumoshu mumoshu closed this as completed Dec 19, 2016
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018
This complements Node Pools(kubernetes-retired#46) and Spot Fleet support(kubernetes-retired#112)

The former `experimental.nodeLabel` configuration key is renamed to `experimental.awsNodeLabels` to avoid a collision with the newly added `experimental.nodeLabels` and for consistency with `experimental.awsEnvironment`.
davidmccormick pushed a commit to HotelsDotCom/kube-aws that referenced this issue Jul 18, 2018
…/add-featureGates-to-controllers to hcom-flavour

* commit 'e15df3b5365543ad931a2ab267758e36790be709':
  RUN-1111 Add in ExpandPersistentVolumes as part of feature-gates