This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Feature: Node pools #46

Closed
8 tasks done
mumoshu opened this issue Nov 8, 2016 · 32 comments

Comments

@mumoshu
Contributor

mumoshu commented Nov 8, 2016

Succeeds coreos/coreos-kubernetes#667


I'd like to add GKE's "Node Pools" like feature to kube-aws.

With that feature, we can differentiate the following things per pool without maintaining multiple etcd clusters and Kubernetes control planes:

  • instance type
  • storage size
  • instance profile
  • security group
  • min/max spot price
  • auto scaling group or spot fleet
  • node labels

It will eventually give us more granular control over:

  • security
    • in combination with node selectors, we can
      1. restrict nodes which are able to communicate with databases containing sensitive information and
      2. schedule only necessary pods onto the nodes
  • cost effectiveness
    • in combination with node selectors, we can
      1. make pool(s) backed by spot instances and label nodes with e.g. spot=true
      2. explicitly schedule pods which can be spawned/killed more aggressively onto those nodes
  • scalability
  • etc
    • If you are aware of one, let us know!

Beware that until the taints-and-tolerations feature is implemented in Kubernetes, we'd need to give pods specific node selectors to prevent them from being scheduled onto undesirable nodes.

Edit: Taints and tolerations are partially supported in Kubernetes 1.4.x and were introduced into kube-aws in #132
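
For example, with a spot-backed pool whose nodes are labeled spot=true as above, a pod that can tolerate aggressive termination could be pinned to those nodes with a node selector. This is only an illustrative manifest; the spot=true label is the hypothetical example label from this issue, not something kube-aws applies for you:

# Illustrative pod manifest: schedule only onto nodes labeled spot=true
# (a hypothetical label applied by a spot-backed node pool).
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  nodeSelector:
    spot: "true"
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "echo running on a spot node && sleep 3600"]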

TODOs (possibly done in separate pull requests)

Non-TODOs

  • Multi-region deployment, a.k.a. deploying node pools in other region(s)
@mumoshu
Contributor Author

mumoshu commented Nov 8, 2016

For now, I'm going to build a POC like I've described in coreos/coreos-kubernetes#667 (comment)

Any requests, comments, or proposals are welcome!

@pieterlange
Contributor

If we implement node pools by firing off new CloudFormation stacks, we could generate separate stack templates and userdata for each node pool.

We should still maintain a "main" stack for cross referencing resources.

@cknowles
Contributor

cknowles commented Nov 8, 2016

Sounds great and definitely would provide a lot of flexibility. I think it's a similar concept to kops instance groups? Part of me thinks it would be good to combine efforts as kops has a ticket for coreos support too.

@mumoshu mumoshu added this to the v0.9.2 milestone Nov 9, 2016
@mumoshu mumoshu changed the title Node pools Proposal: Node pools Nov 10, 2016
@mumoshu
Contributor Author

mumoshu commented Nov 11, 2016

@c-knowles Thanks, I had never noticed kops already had that!
Definitely looking forward to seeing how it's currently implemented/working and to collaborating whenever possible.

@mumoshu
Contributor Author

mumoshu commented Nov 11, 2016

Added notes about node labels, cluster-autoscaler, and spot pricing to organize what I should include in the POC.

@mumoshu
Contributor Author

mumoshu commented Nov 16, 2016

Regarding the POC, I've ended up with this:

master...mumoshu:node-pools

If you're bored of just waiting for this to land, please feel free to build the kube-aws command from the mumoshu:node-pools branch and try the new kube-aws nodepool (init|render|up) --pool-name mypool command 😃

# requires `./cluster.yaml` to exist to inherit various variables from the original/base k8s cluster (including dnsServiceIP, apiServerEndpoint, etcdEndpoints, and the cfn stack name for importing exported outputs like the worker security group id, vpc id, route table id, etc.)
kube-aws nodepool init --pool-name mypool \
    --availability-zone ${KUBE_AWS_AVAILABILITY_ZONE} \
    --key-name ${KUBE_AWS_KEY_NAME} \
    --kms-key-arn ${KUBE_AWS_KMS_KEY_ARN}

kube-aws nodepool render stack --pool-name mypool

# requires the base cluster to be up and exporting the required outputs (the worker security group id at least; also the vpc id and route table id if omitted in `node-pools/mypool/cluster.yaml`)
kube-aws nodepool up --pool-name mypool --s3-uri ${KUBE_AWS_S3_URI}

ref: https://github.com/mumoshu/kube-aws/blob/cb58f10fa0da61cf88eb3cbfc463957d2b37f9d0/e2e/run#L176-L186

TODOs

  • Remove code duplication, more
  • Do some refactoring
  • Squash commits and then split them into a more meaningful sequence of commits (possibly allowing me to send multiple, separate pull requests for more reviewer-happiness 😆 )
  • Unit tests

@mumoshu
Contributor Author

mumoshu commented Nov 17, 2016

Several notes on the current implementation:

Usage

The kube-aws nodepool (init|render|up) --pool-name mypool commands assume you run kube-aws from the same working directory where you've run kube-aws (init|render|up).

$ mkdir mycluster
$ cd mycluster
$ kube-aws init --cluster-name mycluster ...
$ kube-aws render credentials
$ kube-aws render stack
$ kube-aws up ...
$ kube-aws nodepool init --pool-name mypool ...
$ kube-aws nodepool render stack --pool-name mypool ...
$ kube-aws nodepool up --pool-name mypool ...

Designs

  • We now have separate cluster.yaml, stack-template.json, user-data/cloud-init-worker for each node pool.
  • I tried to make cluster.yaml for node pools look familiar. You'll see the worker* config keys (e.g. workerInstanceType, workerRootVolumeSize, workerRootVolumeType) in both the main cluster's and the node pools' cluster.yaml, rather than unprefixed keys (instanceType, rootVolumeSize, rootVolumeType) appearing only in the node pools' cluster.yaml.
    • The end result is a bit verbose, but the cluster.yamls for the main cluster and node pools stay consistent and familiar looking (see the sketch after this list).
  • Worker nodes in the main cluster and in node pools are assigned to the same security group (the one with the logical name SecurityGroupWorker in the CloudFormation stacks) so that etcd nodes in the main cluster can communicate with newly added worker nodes from node pools, etc.
    • CloudFormation's Export/Fn::ImportValue functions are used to achieve this, by the way.
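
As a sketch of that familiar-looking layout, a node pool's cluster.yaml might contain something like the following, using the POC-era baseClusterName naming described further down in this comment. The exact keys and defaults are whatever the node-pools branch defines; the values here are examples only:

# Sketch of node-pools/mypool/cluster.yaml (POC naming; example values).
clusterName: mypool              # the pool's own name in the POC naming
baseClusterName: mycluster       # the main cluster this pool belongs to
workerCount: 2
workerInstanceType: m3.medium
workerRootVolumeSize: 30
workerRootVolumeType: gp2
# vpcId / routeTableId can be omitted to import them from the main
# cluster's stack via Fn::ImportValue (see "Implementation details").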

Implementation details

  • A CloudFormation stack for a node pool works in two ways:
    • If you've omitted vpcId and/or routeTableId in a node pool's cluster.yaml, kube-aws nodepool up will generate a cfn stack template containing Fn::ImportValue calls to import the required values from the main cluster's cfn stack.
    • If you've specified vpcId and/or routeTableId, kube-aws nodepool up doesn't import values from the main stack; it just embeds the values as-is instead of emitting Fn::ImportValue. This is achieved by computing RouteTableRef and VPCRef in Go code and referencing them from the node pools' stack-template.json (refs: go code and stack-template.json; see the sketch after this list).
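
To illustrate the two modes, here's a hedged sketch in CloudFormation YAML (the real stack-template.json is JSON, and the resource and export names below are hypothetical):

# When vpcId is omitted, the generated node pool template imports it
# from the main cluster's stack:
Resources:
  Subnet0:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.10.0/24
      VpcId:
        Fn::ImportValue: mycluster-VPC   # hypothetical export name
# When vpcId is set in the node pool's cluster.yaml, the literal id is
# embedded instead:
#       VpcId: vpc-0123abcd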

Implications

  • This won't let us cleanly create kube-aws node pools connected to non-kube-aws main clusters. Allowing that would be out of scope for this issue, but I wanted to note it.

File tree

The whole file tree for a main cluster named mycluster with a node pool named mypool would look like:

$ tree mycluster/
mycluster/
├── cluster.yaml
├── credentials
│   ├── admin-key.pem
│   ├── admin.pem
│   ├── apiserver-key.pem
│   ├── apiserver.pem
│   ├── ca-key.pem
│   ├── ca.pem
│   ├── etcd-client-key.pem
│   ├── etcd-client.pem
│   ├── etcd-key.pem
│   ├── etcd.pem
│   ├── worker-key.pem
│   └── worker.pem
├── kubeconfig
├── nodepools
│   └── mypool
│       ├── cluster.yaml
│       ├── stack-template.json
│       └── userdata
│           └── cloud-config-worker
├── stack-template.json
└── userdata
    ├── cloud-config-controller
    ├── cloud-config-etcd
    └── cloud-config-worker

5 directories, 21 files

Btw, I named it baseCluster instead of mainCluster in the current node-pools branch, as can be seen in the baseClusterName: mycluster config var in node-pools/mypool/cluster.yaml.
If you have any idea of better naming, please let me know 🙏

@mumoshu mumoshu mentioned this issue Nov 17, 2016
@cknowles
Contributor

Relevant kops pull for autoscaler - kubernetes/kops#914

@mumoshu
Contributor Author

mumoshu commented Nov 17, 2016

@c-knowles Thanks for the info!

I guess the cluster-autoscaler can be deployed like that in kube-aws too, once node pools have landed and we ensure the "exactly 1 AZ for each ASG" rule with it.
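
For context, here's a hedged sketch of how the upstream cluster-autoscaler is typically pointed at one single-AZ ASG per node pool via its --nodes flag. This is not something kube-aws generates; the namespace, image tag, and ASG name are illustrative:

# Hedged sketch: one single-AZ ASG per node pool, addressed as min:max:asg-name.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
        - name: cluster-autoscaler
          image: gcr.io/google_containers/cluster-autoscaler:v0.4.0  # illustrative tag
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --nodes=1:10:mycluster-mypool-workers   # hypothetical ASG name
          env:
            - name: AWS_REGION
              value: us-west-1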

@cknowles
Contributor

Yeah I think the AZ/ASG relationship seems to be the most important part for autoscaling, in its current form at least.

@mumoshu
Contributor Author

mumoshu commented Nov 21, 2016

For anyone interested, I've just rebased the branch onto current master and finished squashing while redoing commits in meaningful units.

@mumoshu
Contributor Author

mumoshu commented Nov 21, 2016

To put separate node labels on each node pool, I'd like the kubelet to do the labeling on startup, so that we can avoid races between the kubelet starting to accept pods and the node being labeled.
I believe such races can occasionally result in pods being scheduled onto undesired nodes.

Fortunately, this is supported via --node-labels key=value according to the doc (or key:value according to kubernetes#19498?).
Unfortunately, it is an alpha feature according to the doc.

I'm going to accept relying on this alpha feature for now, but let me know if that's something anyone wants me to avoid at this stage 😃

A related reading: kubernetes/kubernetes#16090
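
To make that concrete, here's a rough, hypothetical cloud-config fragment showing a kubelet started with --node-labels. The actual cloud-config-worker generated by kube-aws is structured quite differently (the kubelet runs with many more flags), so this only sketches the flag itself, and the label keys/values are made up:

#cloud-config
coreos:
  units:
    - name: kubelet.service
      command: start
      content: |
        [Service]
        # --node-labels is an alpha kubelet flag; labels are applied at
        # registration time, before the node starts accepting pods.
        ExecStart=/usr/bin/kubelet \
          --node-labels=nodePool=mypool,spot=true \
          --register-node=true
        Restart=always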

@mumoshu
Contributor Author

mumoshu commented Nov 22, 2016

Just a forecast: maybe I'll merge part of my node pools branch (the refactoring part) to master once we've released v0.9.1 final.
v0.9.1 is planned to be cut after #82 is merged.

@mumoshu
Contributor Author

mumoshu commented Nov 25, 2016

I'm satisfied with the refactoring but now it needs to be rebased again 😃

@mumoshu
Contributor Author

mumoshu commented Nov 25, 2016

Btw I've moved several things including node labels to "Later TODOs".
For now, I'd like to concentrate on refactoring and adding a solid base to develop the node pools feature further.

@mumoshu
Contributor Author

mumoshu commented Nov 25, 2016

Squashed.

@mumoshu
Contributor Author

mumoshu commented Nov 28, 2016

All the refactoring commits merged into master.
I'm now running e2e while preparing the pull request for the node pools feature.

@mumoshu
Contributor Author

mumoshu commented Nov 28, 2016

Final brush up is in progress before submitting the pull request.

Here are the final notes (i.e. the commit message) for the PR's commit.


Notes on the initial implementation of Node Pools.

Usage

The kube-aws node-pools (init|render|up) --node-pool-name mypool commands assume you run kube-aws from the same working directory where you've run kube-aws (init|render|up), so that cluster.yaml and the other assets exist.

$ mkdir mycluster
$ cd mycluster
$ kube-aws init --cluster-name mycluster ...
$ kube-aws render credentials
$ kube-aws render stack
$ kube-aws up ...
$ kube-aws node-pools init --node-pool-name mypool ...
$ kube-aws node-pools render stack --node-pool-name mypool ...
$ kube-aws node-pools up --node-pool-name mypool ...

Design

  • A node pool is a group of worker nodes.
  • Worker nodes in the same node pool have the same instance type, volume type, and volume size.
  • A cluster is not a node pool.
  • A cluster may be called a "main" cluster if we need to be very explicit.
  • A cluster contains one or more etcd nodes, one or more controller nodes, and one or more worker nodes, as before.
  • From now on, a cluster can also contain zero or more node pools.
  • Vanilla kube-aws sub-commands work for a cluster, while kube-aws node-pools sub-commands work for a node pool.

Sources of inspiration

  • GKE/the gcloud command
    • Both kube-aws and gcloud use the notion of <root cmd> node-pools for subcommands, instead of nodepool, nodepools, or node-pool.

Specifications

  • One CloudFormation stack for each main cluster, and one stack for each node pool.
  • We now have a separate cluster.yaml, stack-template.json, and user-data/cloud-init-worker for each node pool.
  • I tried to make cluster.yaml for node pools look familiar. You'll see the worker* config keys (e.g. workerInstanceType, workerRootVolumeSize, workerRootVolumeType) in both the main cluster's and the node pools' cluster.yaml, rather than unprefixed keys (instanceType, rootVolumeSize, rootVolumeType) appearing only in the node pools' cluster.yaml.
    • The end result is a bit verbose, but the cluster.yamls stay consistent and familiar looking for clusters themselves and their addition - node pools.
  • Worker nodes in the main cluster and in node pools are assigned to the same security group (the one with the logical name SecurityGroupWorker in the CloudFormation stacks) so that etcd nodes in the main cluster can communicate with newly added worker nodes from node pools, and controller nodes in the main cluster can communicate with those worker nodes (see the sketch after this list).
  • A CloudFormation stack for a node pool works in two ways:
    • If you've omitted vpcId and/or routeTableId in a node pool's cluster.yaml, kube-aws node-pools up will generate a cfn stack template containing Fn::ImportValue calls to import the required values from the main cluster's cfn stack.
    • If you've specified vpcId and/or routeTableId, kube-aws node-pools up doesn't import values from the main stack; it just embeds the values as-is instead of emitting Fn::ImportValue. This is achieved by computing RouteTableRef and VPCRef in Go code and referencing them from the node pools' stack-template.json (refs: go code and stack-template.json).
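
To make the security group sharing concrete, here's a hedged sketch of the export/import pairing, written in CloudFormation YAML for readability (the real templates are JSON, and the export name is hypothetical):

# Main cluster stack: export the worker security group id.
Outputs:
  WorkerSecurityGroup:
    Value: { Ref: SecurityGroupWorker }
    Export:
      Name: mycluster-WorkerSecurityGroup
# Node pool stack: attach the pool's worker nodes to that same group, e.g.
#   SecurityGroupIds:
#     - Fn::ImportValue: mycluster-WorkerSecurityGroup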

Implications

  • This won't let us cleanly create kube-aws node pools connected to non-kube-aws main clusters. Allowing that would be out of scope for this issue, but I wanted to note it.

File tree

The whole file tree for a main cluster named mycluster with a node pool named mypool would look like:

$ tree mycluster/
mycluster/
├── cluster.yaml
├── credentials
│   ├── admin-key.pem
│   ├── admin.pem
│   ├── apiserver-key.pem
│   ├── apiserver.pem
│   ├── ca-key.pem
│   ├── ca.pem
│   ├── etcd-client-key.pem
│   ├── etcd-client.pem
│   ├── etcd-key.pem
│   ├── etcd.pem
│   ├── worker-key.pem
│   └── worker.pem
├── kubeconfig
├── node-pools (NEW!)
│   └── mypool
│       ├── cluster.yaml
│       ├── stack-template.json
│       └── userdata
│           └── cloud-config-worker
├── stack-template.json
└── userdata
    ├── cloud-config-controller
    ├── cloud-config-etcd
    └── cloud-config-worker

5 directories, 21 files

Notes for POC users/testers

During the POC phase, what is now nodePoolName in a node pool's cluster.yaml was called clusterName, and what is now clusterName was called baseClusterName.

See notes on the POC: #46 (comment)

They're now named nodePoolName and clusterName respectively, to make it explicit that clusterName is the same variable as in the main cluster's cluster.yaml, and to reflect the design that differentiates a cluster from a node pool. A cluster is not a node pool, and vice versa.
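
As a quick sketch of the renamed keys (example values only):

# node-pools/mypool/cluster.yaml after the rename (example values).
clusterName: mycluster    # same variable/value as in the main cluster's cluster.yaml
nodePoolName: mypool      # was `clusterName` during the POC phase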

@mumoshu
Contributor Author

mumoshu commented Nov 30, 2016

#106 is merged and will be included in the second RC of kube-aws v0.9.2.

I'm still wondering whether (and how) I should explicitly label Node Pools as an experimental feature.

Maybe the commands could be changed to kube-aws experimental node-pools (init|render|up) before v0.9.2 graduates from its RCs, if needed?

Hey @pieterlange @c-knowles @camilb @nicr9 @danielfm @sdouche, do you have any opinion on whether/how these commands should be explicitly presented to our users as experimental?
Any input is welcome 🙇

@danielfm
Contributor

@mumoshu IMO adding commands to kube-aws experimental and then moving them back to kube-aws just to flag them as experimental makes the UI kinda "unstable", which is not good from the UI point of view.

Maybe we can just use confirmation prompts (like "This is an experimental feature. Confirm? [y/n]") or just execute the command and print a warning message at the end, something like that.

What do you think?

@pieterlange
Contributor

Agreed with @danielfm. Simply printing a warning message is sufficient, we shouldn't be changing cmdline options for this purpose.

@mumoshu
Contributor Author

mumoshu commented Dec 1, 2016

@danielfm @pieterlange Thanks for your feedback!

I'm now convinced that we should not introduce the experimental subcommand for that.

As for which alternative to use, I'd rather "start" with a warning message and no prompt, so that we don't prevent our users from scripting around kube-aws while still achieving the original goal of notifying them about the experimental feature.
It would take a bit of effort to automatically answer y to a kube-aws prompt using e.g. the expect command 😉
I'll send a pull request to add a warning message shortly.

That said, I'll be open to future pull requests covering both:

  • adding a y/n prompt
  • a -f option to automatically answer y

@cknowles
Contributor

cknowles commented Dec 1, 2016

+1. If someone wanted to use this feature in an existing CI pipeline that uses kube-aws, they'd need a way to skip any prompts.

@mumoshu mumoshu changed the title Proposal: Node pools Feature: Node pools Dec 2, 2016
@sdouche

sdouche commented Dec 2, 2016

Hi @mumoshu
IMHO a simple warning in the cluster configuration file is enough.

@mumoshu
Contributor Author

mumoshu commented Dec 6, 2016

@sdouche Thanks for your feedback! Then I'll start with just a warning in the very beginning of cluster.yaml for now.

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 6, 2016
So explain the fact in the very beginning of cluster.yaml for node pools according to what we've discussed in kubernetes-retired#46 (comment)

ref kubernetes-retired#46
@sdouche

sdouche commented Dec 6, 2016

@mumoshu I'm eager to test that!

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 7, 2016
… to node pools

The following experimental features are added to node pools via this change:

* awsEnvironment
* waitSignal
* nodeLabel
* loadBalancer

ref kubernetes-retired#46
@mumoshu
Contributor Author

mumoshu commented Dec 8, 2016

@c-knowles I've just noticed that I've missed not only kube-aws nodepool validate but also kube-aws nodepool destroy 😉

Added to the TODOs list in this issue's description.

@mumoshu
Contributor Author

mumoshu commented Dec 9, 2016

@sdouche FYI, v0.9.2-rc.4 is available with the experimental node pools feature!

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 14, 2016
This complements Node Pools(kubernetes-retired#46) and Spot Fleet support(kubernetes-retired#112)

The former `experimental.nodeLabel` configuration key is renamed to `experimental.awsNodeLabels` to avoid a collision with the newly added `experimental.nodeLabels` and for consistency with `experimental.awsEnvironment`.
@mumoshu
Contributor Author

mumoshu commented Dec 14, 2016

It seems I've finished all the TODOs.

@mumoshu
Contributor Author

mumoshu commented Dec 19, 2016

@mumoshu
Contributor Author

mumoshu commented Dec 19, 2016

Closing this issue, as the initial iterations to bring in the feature have finished.

@mumoshu mumoshu closed this as completed Dec 19, 2016
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018
This complements Node Pools(kubernetes-retired#46) and Spot Fleet support(kubernetes-retired#112)

The former `experimental.nodeLabel` configuration key is renamed to `experimental.awsNodeLabels` to avoid a collision with the newly added `experimental.nodeLabels` and for consistency with `experimental.awsEnvironment`.
davidmccormick pushed a commit to HotelsDotCom/kube-aws that referenced this issue Jul 18, 2018
…/add-featureGates-to-controllers to hcom-flavour

* commit 'e15df3b5365543ad931a2ab267758e36790be709':
  RUN-1111 Add in ExpandPersistentVolumes as part of feature-gates