
Upgrade Kubernetes #300

Closed
2opremio opened this issue Jan 19, 2016 · 10 comments
@2opremio

We need to upgrade Kubelet to prevent its high CPU consumption (see kubernetes/kubernetes#19658). Fixed in version 1.1.4.

We are also experiencing kubernetes/kubernetes#16651 (at least in dev), which should be fixed as of v1.2.0-alpha.5.

@2opremio

How it is done in GKE: https://cloud.google.com/container-engine/docs/clusters/upgrade

An upgrade/downgrade works by deleting all node instances, one at a time, and replacing them with new instances running the desired Kubernetes version.

There doesn't seem to be a tool to do this in AWS, sigh :S
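In the absence of an AWS tool, the GKE-style process above could be sketched by hand: cordon and drain each node, terminate its instance, and let the autoscaling group (or kube-up.sh) bring up a replacement on the new version. This is only a hypothetical sketch; the node names and the instance-id lookup are placeholders, and `RUN` defaults to `echo` so the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a GKE-style rolling node replacement on AWS.
# Node names and instance_id() are placeholders, not real cluster data.
set -euo pipefail
RUN="${RUN:-echo}"   # defaults to a dry run; unset to actually execute

instance_id() {  # placeholder: map a node name to its EC2 instance id
  echo "i-0123456789abcdef0"
}

replace_node() {
  local node="$1"
  # Stop scheduling new pods onto the node, then evict the existing ones.
  $RUN kubectl cordon "$node"
  $RUN kubectl drain "$node" --force
  # Terminate the instance; the ASG (or kube-up.sh) replaces it with one
  # running the target Kubernetes version.
  $RUN aws ec2 terminate-instances --instance-ids "$(instance_id "$node")"
}

# Replace nodes one at a time, as the GKE upgrade does.
for node in node-1 node-2 node-3; do
  replace_node "$node"
done
```

Doing the nodes strictly one at a time keeps capacity loss bounded, matching the GKE behaviour quoted above.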

@2opremio

This is the script used to upgrade GCE clusters: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/upgrade.sh

@2opremio

This is getting urgent. Kubelet is consuming >70% CPU on all machines.
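For reference, a quick way to confirm kubelet's CPU share across nodes (hostnames are placeholders, and SSH is the assumed access path; with the default `SSH="echo ssh"` the commands are only printed):

```shell
#!/usr/bin/env bash
# Check kubelet's CPU usage on each node. Hostnames are hypothetical;
# SSH defaults to "echo ssh" so this prints the commands instead of
# connecting anywhere.
set -euo pipefail
SSH="${SSH:-echo ssh}"   # set SSH=ssh to actually connect

kubelet_cpu() {
  local host="$1"
  $SSH "$host" "ps -C kubelet -o %cpu= | head -n1"
}

for host in node-1 node-2 node-3; do
  kubelet_cpu "$host"
done
```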

@errordeveloper

Let's discuss f2f and decide on the plan.

@2opremio

Sure, let's talk f2f. In a nutshell, the plan is to do the following for dev and prod:

  1. Create a new k8s cluster while reusing the existing databases (I guess we will use the latest 1.1.x, since Kubernetes 1.2 will only be released in March)
  2. Make the frontend.{dev,prod}.weave.works CNAME record (managed in Gcloud) point to the new cluster. See https://github.com/weaveworks/service/tree/master/infra#how-is-dns-configured
  3. To move the existing apps to the new cluster, we can simply wipe out the org_hostname table of the app-mapper database to ensure new Scope apps are lazily created. See https://github.com/weaveworks/service/blob/master/app-mapper/README.md

(2) and (3) should happen atomically to avoid:

  • App-not-found errors
  • Apps sneaking back into the old cluster

We should probably have a 30-minute maintenance window or so (@pidster could send an email about it).

Also, we should first test that the migration works as expected in the dev environment. For that, we can create a user and make sure that everything works before and after the migration.
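Steps (2) and (3) above could be scripted so the DNS swap and the table wipe happen back to back. This is a hedged sketch only: the zone name, the `frontend-a`/`frontend-b` record targets, and the app-mapper database DSN are all assumptions, and `RUN` defaults to `echo` so the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cutover: repoint the CNAME (step 2) and
# wipe org_hostname (step 3). Zone, targets, and DSN are placeholders.
set -euo pipefail
RUN="${RUN:-echo}"   # defaults to a dry run; unset to actually execute

cutover() {
  local env="$1"
  local old="frontend-a.$env.weave.works."
  local new="frontend-b.$env.weave.works."

  # (2) Swap the CNAME in the Gcloud-managed zone in one transaction.
  $RUN gcloud dns record-sets transaction start --zone weave-works
  $RUN gcloud dns record-sets transaction remove --zone weave-works \
    --name "frontend.$env.weave.works." --type CNAME --ttl 300 "$old"
  $RUN gcloud dns record-sets transaction add --zone weave-works \
    --name "frontend.$env.weave.works." --type CNAME --ttl 300 "$new"
  $RUN gcloud dns record-sets transaction execute --zone weave-works

  # (3) Wipe org_hostname so new Scope apps are lazily created in the
  # new cluster (DSN is a placeholder).
  $RUN psql "postgres://app-mapper-db/app_mapper" -c 'TRUNCATE org_hostname;'
}

cutover dev
```

Keeping both steps in one script narrows the window in which the two failure modes listed above (app-not-found errors, apps sneaking back into the old cluster) can occur, though it is not truly atomic.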

@errordeveloper

Things we want to confirm work in new clusters and haven't been verified yet:

#301
1149a32

@errordeveloper

We also want to make sure these modifications in kube-up.sh are still valid for the new version of the cluster: https://github.com/weaveworks/service/blob/master/infra/k8s#L60-L61

@errordeveloper

General tasks:

  • deploy dev cluster using kube-up.sh
  • test if changes are working
  • redeploy and test migration workflow
  • deploy new prod cluster (prod-b)
  • migrate to prod-b

Current key steps in migration workflow:

  • top-level DNS record changes (CloudFlare)
    (@2opremio suggests purging the record first, then recreating it)
  • drop app mapper table

@errordeveloper

@tomwilkie nothing actually needs to be done in CloudFlare for this, does it?

@2opremio

This process should be added to the future operations runbook (#266)
