This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

Cluster upgrades #608

Closed
wants to merge 4 commits

Conversation

@colhom (Contributor) commented Aug 9, 2016

Complete and working upgrade path for kube-aws clusters, minus the discrete etcd cluster instances.

As part of this, we now have external CA support for TLS asset generation, along with support for allowing the user to generate all TLS assets themselves.
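For anyone who wants to try the external CA path, a rough sketch of generating a CA locally with openssl (file names here are placeholders, and the exact flags kube-aws expects for an external CA may differ):

$ openssl genrsa -out ca-key.pem 2048
$ openssl req -x509 -new -nodes -key ca-key.pem -days 365 -out ca.pem -subj "/CN=kube-ca"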

Fixes #104 #161
Depends on #544 #596
Follow-up: #465

Unfortunately, this does not support upgrading clusters that have already launched. --edit-- By "already launched", I mean clusters created by kube-aws code prior to this functionality merging.

@mumoshu I'd like to get your work on node draining on shutdown integrated as well.

\cc @plange @whereisaaron @robszumski @sym3tri @bfallik

Ref #340 #230 #161

@pieterlange

This is awesome! I will have to make some time to test this (along with the HA stuff).

@bfallik (Contributor) commented Aug 10, 2016

Looking forward to testing this!

@colhom (Contributor Author) commented Aug 10, 2016

@bfallik once #465 is rebased on this PR, you'll have pods drained off nodes before they shut down and are destroyed.

Cluster upgrades are entirely functional here, but keep in mind that the nodes will be shut down ungracefully, and consequently requests will be routed to the pod IPs for some amount of time after the containers have disappeared.
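Until that lands, a rough manual equivalent before a node is replaced would be something like this (node name is a placeholder):

$ kubectl drain <node-name> --ignore-daemonsets   # evict pods and mark the node unschedulable
$ kubectl uncordon <node-name>                    # only if the node comes back into service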

@robszumski (Member)

At a high level, I'd like to think more about these commands and flags.

Current (this PR)

The current method in this PR:

$ kube-aws render
$ git diff # view changes to rendered assets
$ kube-aws up --update
$ kube-aws render --generate-credentials
$ kube-aws up --update

This retains the two primary commands that we are used to, but makes them much more complicated. `up` really doesn't do the same thing as before, where the user was taught that it brings up a complete stack. Now it either modifies an existing stack or creates a new one, depending on this flag.

Proposal

I propose changing these names. Here are the same scenarios:

$ kube-aws render stack
$ git diff # view changes to rendered assets
$ kube-aws update stack
$ kube-aws render credentials
$ kube-aws update credentials

Note the use of the same subcommands for each. It makes it easier to teach the terms and pieces that are involved.

This PR does a great job of separating out the render vs. update parts; this proposal retains that and makes it even more explicit.

Backwards Compatibility

For backwards compatibility, we can alias (but not document) the render command from the last release:

$ kube-aws render         # v0.8.1
$ kube-aws render stack   # master

@@ -0,0 +1,40 @@
# kube-aws cluster updates
Member

To fit the naming scheme, can we name this doc kubernetes-on-aws-updates.md?

@colhom (Contributor Author) commented Aug 10, 2016

@robszumski great suggestion! I'll think it over more thoroughly while implementing it, but sgtm and I'll move forward with what you have outlined. I was unsure of what to do with the command tree... thanks for figuring it out.

## Types of cluster update
There are two distinct categories of cluster update.

* **Parameter-level update**: Only changes to `cluster.yaml` and/or TLS assets in `credentials/` folder are reflected. To enact this type of update. Modifications to CloudFormation or cloud-config userdata templates will not be reflected. In this case, you do not have to re-render:
Contributor

"To enact this type of update."?

Contributor Author

yeah, should probably add that.
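For reference, with the flags as they currently stand in this PR, a parameter-level update is just the following (no re-render needed; the exact commands may change if the command tree gets reworked as proposed above):

$ $EDITOR cluster.yaml    # and/or swap out TLS assets under credentials/
$ kube-aws up --update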

@colhom (Contributor Author) commented Aug 29, 2016

@pieterlange any news on the calico problem you encountered with this PR?

colhom pushed a commit to colhom/coreos-kubernetes that referenced this pull request Aug 30, 2016
@colhom (Contributor Author) commented Aug 30, 2016

@pieterlange I've cherry-picked in your commit

colhom pushed a commit to colhom/coreos-kubernetes that referenced this pull request Aug 30, 2016
@colhom (Contributor Author) commented Aug 31, 2016

@robszumski check out b4d05dc

@robszumski (Member)

@robszumski check out b4d05dc

Nice, lookin' good!

@iwarp commented Sep 7, 2016

How far away is this from being merged? I'm keen to start using this.

Commits added:

  • Does rolling replacement update on controller ASG, followed by workers
  • Punts on upgrading the etcd cluster; simply makes sure resource definitions don't change after create
  • render command now operates on stack and credentials independently
  • add top-level update command
@colhom (Contributor Author) commented Sep 8, 2016

The last two commits are this PR. The prior two are for #596 and #544.

@colhom (Contributor Author) commented Sep 8, 2016

@iwarp we're working on getting this code reviewed! Sorry for the delay

If you're really keen to start using it, it should all be functional if you pull from colhom:cluster-upgrades.
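Roughly (then build kube-aws from that tree as you normally would):

$ git clone https://github.com/colhom/coreos-kubernetes.git
$ cd coreos-kubernetes
$ git checkout cluster-upgrades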

@colhom (Contributor Author) commented Sep 9, 2016

Note to self: I also need to add UpdatePolicy stanzas to Subnets and VPC prohibiting updates to them. To update a subnet or VPC with CloudFormation, the whole deployment is essentially replicated in a different availability zone. kube-aws in general will not be able to support this in the near future.

@colhom mentioned this pull request Sep 9, 2016
@pieterlange

I've been using the colhom:cluster-upgrades branch (plus some minor patches for an external etcd cluster) for a little over a week now and it works great. Updating the stack works as expected!

Minor note: the update policy for the worker autoscaler might need a small increase from the default 2 minutes, depending on app startup time. The Kubernetes master also has a brief window where it's unavailable, but everything recovers just fine.

@iamsaso commented Sep 29, 2016

Any updates on this? Would love to start using it 🚀

@colhom (Contributor Author) commented Sep 29, 2016

An update for all interested parties:

We'll be merging this functionality (along with some of @mumoshu's work regarding node draining) into an experimental branch in the near future. Work is encouraged in that direction, though the officially released code in master will not receive this functionality. Going forward, the goal is to orchestrate these critical behaviors via the Kubernetes control plane, rather than via CloudFormation.

@mumoshu (Contributor) commented Sep 29, 2016

@colhom Thanks for the update!
Does it mean that you and your colleagues won't be focusing on things in the experimental branch anymore?
(Btw, in the long term, I agree with the goal you've mentioned 👍 )

@camilb commented Sep 30, 2016

@colhom
I have a working version based on #608, #629 and the latest changes from master.
At the moment I'm running the tests using:

  • 3 Controllers in Multi-AZ with LoadBalancer
  • 3 external ETCD nodes configured with SSL in Multi-AZ
  • 3 Workers in Multi-AZ
  • CNI
  • Hyperkube v1.4.0_coreos.2

Tested using the latest stable and alpha OS releases.

For my current setup this works pretty well. Next I will try to put ETCD in an Auto Scaling Group with daily S3 backups.

If there is someone interested in it, I have a working branch:
https://github.com/camilb/coreos-kubernetes/tree/1.4.0-ha

@iwarp commented Oct 3, 2016

Hmmm, interesting change of direction. What's the guidance for a highly available cluster that I should be using right now, then? I was planning on this PR being complete before going live on a new project.

Do I need to create multiple k8s clusters and load balance across them, which is closer to the k8s federation approach?

How have others approached this?

@apenney commented Oct 4, 2016

Echoing the previous response: I put off deploying Kubernetes until this PR was finished, but now I find myself unsure how to proceed. I might just look at kops at this point, until there's a clearer vision for coreos-kubernetes. We looked at enterprise support from CoreOS, but this project was a blocker for us being able to proceed with that (in case it helps justify anyone spending time on laying out a clear roadmap).

@iamsaso commented Oct 4, 2016

I was pushing back the date to deploy CoreOS Kubernetes to production while waiting for this PR to land in master. We would like to have a procedure for doing future updates, and this seemed like a good solution. Any plans on providing some guidance on how updates will be done with future releases?
