
Upgrade Kubernetes #300

Closed
2opremio opened this issue Jan 19, 2016 · 10 comments
@2opremio

We need to upgrade Kubelet to prevent its high CPU consumption (see kubernetes/kubernetes#19658). Fixed in version 1.1.4.

We are also experiencing kubernetes/kubernetes#16651 (at least in dev), which should be fixed as of v1.2.0-alpha.5.

@2opremio

How it is done in GKE: https://cloud.google.com/container-engine/docs/clusters/upgrade

An upgrade/downgrade works by deleting all node instances, one at a time, and replacing them with new instances running the desired Kubernetes version.

There doesn't seem to be a tool to do this in AWS, sigh :S
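In the absence of an AWS tool, the GKE-style process above could be sketched by hand: cordon and drain each node, terminate its instance, and let the autoscaling group (or kube-up.sh) bring up a replacement on the new version. This is only a hypothetical sketch; the node names and the instance-id lookup are placeholders, and `RUN` defaults to `echo` so the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a GKE-style rolling node replacement on AWS.
# Node names and instance_id() are placeholders, not real cluster data.
set -euo pipefail
RUN="${RUN:-echo}"   # defaults to a dry run; unset to actually execute

instance_id() {  # placeholder: map a node name to its EC2 instance id
  echo "i-0123456789abcdef0"
}

replace_node() {
  local node="$1"
  # Stop scheduling new pods onto the node, then evict the existing ones.
  $RUN kubectl cordon "$node"
  $RUN kubectl drain "$node" --force
  # Terminate the instance; the ASG (or kube-up.sh) replaces it with one
  # running the target Kubernetes version.
  $RUN aws ec2 terminate-instances --instance-ids "$(instance_id "$node")"
}

# Replace nodes one at a time, as the GKE upgrade does.
for node in node-1 node-2 node-3; do
  replace_node "$node"
done
```

Doing the nodes strictly one at a time keeps capacity loss bounded, matching the GKE behaviour quoted above.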

@2opremio

This is the script used to upgrade GCE clusters: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/upgrade.sh

@2opremio

This is getting urgent. Kubelet is consuming >70% CPU on all machines.
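For reference, a quick way to confirm kubelet's CPU share across nodes (hostnames are placeholders, and SSH is the assumed access path; with the default `SSH="echo ssh"` the commands are only printed):

```shell
#!/usr/bin/env bash
# Check kubelet's CPU usage on each node. Hostnames are hypothetical;
# SSH defaults to "echo ssh" so this prints the commands instead of
# connecting anywhere.
set -euo pipefail
SSH="${SSH:-echo ssh}"   # set SSH=ssh to actually connect

kubelet_cpu() {
  local host="$1"
  $SSH "$host" "ps -C kubelet -o %cpu= | head -n1"
}

for host in node-1 node-2 node-3; do
  kubelet_cpu "$host"
done
```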

@errordeveloper

Let's discuss f2f and decide on the plan.

@2opremio

Sure, let's talk f2f. In a nutshell, the plan is to do the following for dev and prod:

  1. Create a new k8s cluster while reusing the existing databases (I guess we will use the latest 1.1.x, since Kubernetes 1.2 will only be released in March)
  2. Make the frontend.{dev,prod}.weave.works CNAME record (managed in Gcloud) point to the new cluster. See https://github.com/weaveworks/service/tree/master/infra#how-is-dns-configured
  3. To move the existing apps to the new cluster, we can simply wipe out the org_hostname table of the app-mapper database to ensure new Scope apps are lazily created. See https://github.com/weaveworks/service/blob/master/app-mapper/README.md

(2) and (3) should happen atomically to avoid:

  • App-not-found errors
  • Apps sneaking back into the old cluster

We should probably have a 30-minute maintenance window or so (@pidster could send an email about it).

Also, we should first test that the migration works as expected in the dev environment. For that, we can create a user and make sure that everything works before and after the migration.
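Steps (2) and (3) above could be scripted so the DNS swap and the table wipe happen back to back. This is a hedged sketch only: the zone name, the `frontend-a`/`frontend-b` record targets, and the app-mapper database DSN are all assumptions, and `RUN` defaults to `echo` so the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cutover: repoint the CNAME (step 2) and
# wipe org_hostname (step 3). Zone, targets, and DSN are placeholders.
set -euo pipefail
RUN="${RUN:-echo}"   # defaults to a dry run; unset to actually execute

cutover() {
  local env="$1"
  local old="frontend-a.$env.weave.works."
  local new="frontend-b.$env.weave.works."

  # (2) Swap the CNAME in the Gcloud-managed zone in one transaction.
  $RUN gcloud dns record-sets transaction start --zone weave-works
  $RUN gcloud dns record-sets transaction remove --zone weave-works \
    --name "frontend.$env.weave.works." --type CNAME --ttl 300 "$old"
  $RUN gcloud dns record-sets transaction add --zone weave-works \
    --name "frontend.$env.weave.works." --type CNAME --ttl 300 "$new"
  $RUN gcloud dns record-sets transaction execute --zone weave-works

  # (3) Wipe org_hostname so new Scope apps are lazily created in the
  # new cluster (DSN is a placeholder).
  $RUN psql "postgres://app-mapper-db/app_mapper" -c 'TRUNCATE org_hostname;'
}

cutover dev
```

Keeping both steps in one script narrows the window in which the two failure modes listed above (app-not-found errors, apps sneaking back into the old cluster) can occur, though it is not truly atomic.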

@errordeveloper

Things we want to confirm work in new clusters and haven't been verified yet:

#301
1149a32

@errordeveloper

We also want to make sure these modifications in kube-up.sh are still valid for the new version of the cluster: https://github.com/weaveworks/service/blob/master/infra/k8s#L60-L61

@errordeveloper

General tasks:

  • deploy dev cluster using kube-up.sh
  • test if changes are working
  • redeploy and test migration workflow
  • deploy new prod cluster (prod-b)
  • migrate to prod-b

Current key steps in migration workflow:

  • top-level DNS record changes (CloudFlare)
    (@2opremio suggests purging the record first, then recreating it)
  • drop app mapper table

@errordeveloper

@tomwilkie nothing actually needs to be done in CloudFlare for this, does it?

@2opremio

This process should be added to the future operations runbook (#266)
