Define a policy for Kubernetes version support for workload clusters #4444
Comments
@fabriziopandini I think we need to break this down at a more granular level. I tend to agree with this for individual patch releases, but I think we need something a bit more flexible for major/minor releases. For example, for a given minor release, I think we should give best-effort support for future versions that are released during the time that we are actively supporting that minor release (how to handle this once we actually have a major release is probably yet another discussion, since I don't think we'd want to support multiple minor releases for a given major release). I also think we should provide best-effort support for Kubernetes versions that were supported when the minor version was released but may no longer be fully supported by upstream policies (we shouldn't intentionally drop support for a Kubernetes version that previously worked in a given minor version).
It would also be nice to keep best-effort support for older Kubernetes versions as much as possible across minor version bumps, to ease the burden of management cluster upgrades. While I would love it if everyone upgraded their clusters frequently, I don't know that we've actually seen that in the real world. It might be good to try to get some more real-world user feedback around this.
With about a dozen clusters to my name, and currently none of them at 1.21, I can chime in a bit.
There are several reasons not to upgrade right away. In the past, k8s was not very mature, so those reasons were mostly outweighed by needing new features, and frequent upgrades dominated the equation; I kept all my clusters very up to date. As Kubernetes has matured, that flipped for me at 1.15.
The big issue that kept me on 1.15 for so long had to do with API deprecation in 1.16. Unless you are huge like Google and can afford to build all the apps you deploy on k8s from scratch, you have to deal with software managed by others. Software provided by the community, via things like Helm charts, has an incentive to be usable by the widest possible audience. This resulted in charts favoring the old deprecated APIs over the new ones, since that reached a wider audience. When 1.16 came out, all that code had to be reworked, and that takes time. As a provider of a Kubernetes service to others, I maximize what my users can do if those APIs and the existing software continue to work, so there is a strong incentive to stay on the older version until the existing software has caught up. The software doesn't have a strong incentive to use the new APIs until enough users are off the older versions. Once the software has caught up, upgrading becomes reasonable again. k8s has done very well with this issue in general and should be commended for trying so hard, but it's just hard to fully escape from.
The next big event will be 1.22 removing a lot of deprecated APIs, so some of us will likely stay on 1.21 for as long as we can to let the community software catch up again.
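As a rough, hedged sketch of what that looks like in practice (the `./manifests` path is just a placeholder), one way to spot affected manifests is to grep rendered YAML for the old API groups that stop being served at each boundary:

```sh
# Illustrative only; ./manifests stands in for wherever your rendered YAML lives.
# API groups removed in v1.16 (Deployments/DaemonSets/ReplicaSets must move to apps/v1):
grep -rn -e 'apiVersion: extensions/v1beta1' \
         -e 'apiVersion: apps/v1beta1' \
         -e 'apiVersion: apps/v1beta2' ./manifests

# API groups removed in v1.22 (Ingress, CRDs, admission webhooks, RBAC move to their v1 groups):
grep -rn -e 'apiVersion: networking.k8s.io/v1beta1' \
         -e 'apiVersion: apiextensions.k8s.io/v1beta1' \
         -e 'apiVersion: admissionregistration.k8s.io/v1beta1' \
         -e 'apiVersion: rbac.authorization.k8s.io/v1beta1' ./manifests
```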
Some of the reasons to not upgrade:
1. It takes effort to upgrade
2. It can disrupt users to upgrade
3. It can break existing software
4. It takes effort to upgrade software so that the cluster can be upgraded
Some of this is tackled by:
1. Cluster API provides a lot of help here, lowering the bar dramatically. I'm really excited to use it eventually. It could be helped more by progressing the APIs to v1.
2. This one is just hard. Most software needs to be rewritten to be completely seamless across upgrades, or the expectation must be set that it will break on upgrades as pods are rolled around and will eventually get fixed. Maybe service meshes with first-class support from k8s could help with this.
3. This is kind of what I was talking about above. k8s has tried hard not to break the API often; that really helps. They have also had flags to re-enable deprecated APIs for a time. That helps too, but sysadmins are reluctant to use them as they feel less supported. Staying on an older version feels safer.
Pluto seems to be a nice tool to help detect some of the issues that may arise. Not doing API removals very often helps.
4. This can really limit upgrades too. We, for example, saw a mismatch between Calico and kube-proxy that caused a cluster-wide outage on upgrade. It wasn't Calico's fault, nor k8s's fault, but the particular combination and node config caused an issue, and we're likely to see it again once the workaround is removed in favor of the proper fix. This may affect other things, such as k8s moving to EndpointSlices while MetalLB doesn't support them yet.
Sorry I don't have easy answers on how to get sysadmins to upgrade sooner, but knowing some of the reasons why might help in coming up with new solutions.
Thanks,
Kevin
Can you elaborate on which APIs got deprecated in 1.16 and which are planned for deprecation in 1.22 that you see as upgrade blockers / delayers?
@neolit123 I think:
But I think this highly depends on the environment / company / culture etc. For example:
I think in CAPI it has to be good enough to support a certain number of old / new Kubernetes versions (like the last 3 Kubernetes versions at the time of the CAPI release, plus new ones (in part on a best-effort basis), but enough so there's a good way to upgrade to the next CAPI version). Some of them are more or less guaranteed because we test them via Prow, others are simply on a best-effort basis.
@kfox1111 @sbueringer thanks for the context. I understand users' concerns very well, and I agree with your points. In other words, nothing prevents us from extending the skew we are considering, but IMO we should extend the skew only when the initial one is well covered and we are confident that this is sustainable for the CAPI community (e.g. we are keeping up with the test signal and fixing test errors/flakes in a timely manner).
It's the removals that are the biggest issue. Pluto is keeping track of them.
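For reference, a hedged sketch of how Pluto (Fairwinds' pluto CLI) is typically run; exact flags can differ between versions, and the path below is just a placeholder:

```sh
# Scan static manifests on disk for deprecated/removed apiVersions.
pluto detect-files -d ./manifests

# Scan what is already installed in the cluster via Helm releases.
pluto detect-helm
```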
and one must use
My two cents on this:
One concrete example: today we discovered a problem on Kubernetes v1.20, and this was possible due to the hard work we are doing on E2E tests; without such E2E tests being monitored, triaged, and maintained, any declaration about CAPI supporting a wide range of versions isn't a real guarantee for the users.
I'd agree that delayed upgrades due to core API removals are an upstream k8s problem and not a CAPI problem.
For Helm charts, it's typically a matter of upgrading to a new enough version of the chart that uses the new APIs.
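As a hedged illustration of that flow (the release and chart names are placeholders), the idea is to move the Helm release onto a chart version that already targets the new APIs before upgrading the cluster itself:

```sh
# Placeholder names; the point is to pick up a chart version that uses the new APIs first.
helm repo update
helm upgrade my-release example-repo/some-chart --version 2.0.0
# ...and only then upgrade the cluster to the Kubernetes version that drops the old APIs.
```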
Absolutely agree with @fabriziopandini. While there are some changes in upstream Kubernetes that require work on the user side, the deprecation/removal periods are pretty long (especially with probably 3 Kubernetes releases per year now, these add up to 1-2 years).
From the CAPI side, we also have to take into account what's sustainable for the CAPI community. It's non-trivial for a single CAPI version to support a wide range of Kubernetes releases (this also includes breaking changes in components like kubeadm). Each additional Kubernetes release we guarantee/ensure support for adds a maintenance burden to CAPI, both in terms of testing and implementation. An important factor is definitely that we have to support enough Kubernetes versions to ensure enough time for CAPI upgrades (e.g. support for 1.21/1.22 with CAPI 0.3.x).
I think we should be open to supporting a wider range of versions on a best-effort basis, but we really have to get right how many versions we guarantee support for (in practice, by testing them via periodic jobs and ensuring those tests are always green). If there is a lot of demand for support of older versions, I think it's fair to accept contributions from the wider community (but, as I wrote above, on a best-effort basis). When we support the last 3 Kubernetes versions at the time of a CAPI release during the lifetime of a minor release, I think our support of upstream Kubernetes versions is by far longer than the upstream support of those versions. Example:
Folks, what's the status of this discussion? Could we follow up with an update to the book stating our support range a bit more clearly?
/milestone Next |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
/assign @fabriziopandini
User Story
As a user I would like to have clear guarantees about Kubernetes version support for workload clusters for each CAPI release series/release.
Detailed Description
This issue is a follow-up of #4423.
Draft for discussion:
/kind feature
@CecileRobertMichon @detiber @vincepri opinions?