Define a policy for Kubernetes version support for workload clusters #4444
Comments
@fabriziopandini I think we need to break this down at a more granular level. I tend to agree with this for individual patch releases, but I think we need something a bit more flexible for major/minor releases. For example, for a given minor release, I think we should give best-effort support for future versions that are released during the time that we are actively supporting that minor release (how to handle this once we actually have a major release is probably yet another discussion, since I don't think we'd want to support multiple minor releases for a given major release). I also think we should provide best-effort support for Kubernetes versions that were supported when the minor version was released but may no longer be fully supported by upstream policies (we shouldn't intentionally drop support for a Kubernetes version that previously worked in a given minor version).
It would also be nice to keep best-effort support for older Kubernetes versions as much as possible across minor version bumps, to ease the burden of management cluster upgrades. While I would love it if everyone upgraded their clusters frequently, I don't know that we've actually seen that in the real world. It might be good to try to get some more real-world user feedback around this.
With about a dozen clusters to my name, and currently none of them at 1.21, I can chime in a bit.
There are several reasons not to upgrade right away. In the past, k8s was not very mature, so those reasons were mostly outweighed by needing new features, and frequent upgrades dominated the equation; I kept all my clusters very up to date. As Kubernetes has matured, that flipped for me at 1.15.
The big issue that kept me on 1.15 for so long had to do with API deprecation in 1.16. Unless you are huge like Google and can afford to build all the apps you deploy on k8s from scratch, you have to deal with software managed by others. Software provided by the community, via things like Helm charts, has an incentive to be usable by the widest possible audience. This resulted in charts favoring the old deprecated APIs over the new ones, since that reached a wider audience. When 1.16 came out, all that code had to be reworked, and that takes time. As a provider of a Kubernetes service to others, I maximize what my users can do if those APIs and the existing software continue to work, so there is a strong incentive to stay on the older version until the existing software has caught up. The software doesn't have a strong incentive to use the new APIs until enough users are off the older versions. Once the software has caught up, upgrading becomes reasonable again. k8s has done very well with this issue in general and should be commended for trying so hard, but it's just hard to fully escape from.
The next big event will be 1.22 removing a lot of deprecated APIs, so some of us will likely stay on 1.21 for as long as we can to let the community software catch up again.
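As a rough, hedged sketch of what that looks like in practice (the `./manifests` path is just a placeholder), one way to spot affected manifests is to grep rendered YAML for the old API groups that stop being served at each boundary:

```sh
# Illustrative only; ./manifests stands in for wherever your rendered YAML lives.
# API groups removed in v1.16 (Deployments/DaemonSets/ReplicaSets must move to apps/v1):
grep -rn -e 'apiVersion: extensions/v1beta1' \
         -e 'apiVersion: apps/v1beta1' \
         -e 'apiVersion: apps/v1beta2' ./manifests

# API groups removed in v1.22 (Ingress, CRDs, admission webhooks, RBAC move to their v1 groups):
grep -rn -e 'apiVersion: networking.k8s.io/v1beta1' \
         -e 'apiVersion: apiextensions.k8s.io/v1beta1' \
         -e 'apiVersion: admissionregistration.k8s.io/v1beta1' \
         -e 'apiVersion: rbac.authorization.k8s.io/v1beta1' ./manifests
```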
Some of the reasons to not upgrade:
1. It takes effort to upgrade
2. It can disrupt users to upgrade
3. It can break existing software
4. It takes effort to upgrade software so that the cluster can be upgraded
Some of this is tackled by:
1. Cluster API provides a lot of help here, lowering the bar dramatically. I'm really excited to use it eventually. It could be helped more by progressing the APIs to v1.
2. This one is just hard. Most software needs to be rewritten to be completely seamless across upgrades, or the expectation must be set that it will break on upgrades as pods are rolled around and will eventually get fixed. Maybe service meshes with first-class support from k8s could help with this.
3. This is kind of what I was talking about above. k8s has tried hard not to break the API often; that really helps. They have also had flags to re-enable deprecated APIs for a time. That helps too, but sysadmins are reluctant to use them as they feel less supported. Staying on an older version feels safer.
Pluto seems to be a nice tool to help detect some of the issues that may arise. Not doing API removals very often helps.
4. This can really limit upgrades too. We, for example, saw a mismatch between Calico and kube-proxy that caused a cluster-wide outage on upgrade. It wasn't Calico's fault, nor k8s's fault, but the particular combination and node config caused an issue, and we're likely to see it again once the workaround is removed in favor of the proper fix. This may affect other things, such as k8s moving to EndpointSlices while MetalLB doesn't support them yet.
Sorry I don't have easy answers on how to get sysadmins to upgrade sooner, but knowing some of the reasons why might help in coming up with new solutions.
Thanks,
Kevin
Can you elaborate on which APIs got deprecated in 1.16 and which are planned for deprecation in 1.22 that you see as upgrade blockers / delayers?
@neolit123 I think:
But I think this highly depends on the environment / company / culture etc. For example:
I think in CAPI it has to be good enough to support a certain number of old / new Kubernetes versions (like the last 3 Kubernetes versions at the time of the CAPI release, plus new ones (in part on a best-effort basis), but enough so there's a good way to upgrade to the next CAPI version). Some of them are more or less guaranteed because we test them via Prow, others are simply on a best-effort basis.
@kfox1111 @sbueringer thanks for the context. I understand users' concerns very well, and I agree with your points. In other words, nothing prevents us from extending the skew we are considering, but IMO we should extend the skew only when the initial one is well covered and we are confident that this is sustainable for the CAPI community (e.g. we are keeping up with the test signal and fixing test errors/flakes in a timely manner).
It's the removals that are the biggest issue. Pluto is keeping track of them.
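For reference, a hedged sketch of how Pluto (Fairwinds' pluto CLI) is typically run; exact flags can differ between versions, and the path below is just a placeholder:

```sh
# Scan static manifests on disk for deprecated/removed apiVersions.
pluto detect-files -d ./manifests

# Scan what is already installed in the cluster via Helm releases.
pluto detect-helm
```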
and one must use
My two cents on this:
One concrete example: today we discovered a problem on Kubernetes v1.20, and this was possible due to the hard work we are doing on E2E tests; without such E2E tests being monitored, triaged, and maintained, any declaration about CAPI supporting a wide range of versions isn't a real guarantee for the users.
I'd agree that delayed upgrades due to core API removals are an upstream k8s problem and not a CAPI problem.
For Helm charts, it's typically a matter of upgrading to a new enough version of the chart that uses the new APIs.
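As a hedged illustration of that flow (the release and chart names are placeholders), the idea is to move the Helm release onto a chart version that already targets the new APIs before upgrading the cluster itself:

```sh
# Placeholder names; the point is to pick up a chart version that uses the new APIs first.
helm repo update
helm upgrade my-release example-repo/some-chart --version 2.0.0
# ...and only then upgrade the cluster to the Kubernetes version that drops the old APIs.
```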
Absolutely agree with @fabriziopandini. While there are some changes in upstream Kubernetes that require work on the user side, the deprecation/removal periods are pretty long (especially with probably 3 Kubernetes releases per year now, these add up to 1-2 years).
From the CAPI side, we also have to take into account what's sustainable for the CAPI community. It's non-trivial for a single CAPI version to support a wide range of Kubernetes releases (this also includes breaking changes in components like kubeadm). Each additional Kubernetes release we guarantee/ensure support for adds a maintenance burden to CAPI, both in terms of testing and implementation. An important factor is definitely that we have to support enough Kubernetes versions to ensure enough time for CAPI upgrades (e.g. support for 1.21/1.22 with CAPI 0.3.x).
I think we should be open to supporting a wider range of versions on a best-effort basis, but we really have to get right how many versions we guarantee support for (in practice, by testing them via periodic jobs and ensuring those tests are always green). If there is a lot of demand for support of older versions, I think it's fair to accept contributions from the wider community (but, as I wrote above, on a best-effort basis). When we support the last 3 Kubernetes versions at the time of a CAPI release during the lifetime of a minor release, I think our support of upstream Kubernetes versions is by far longer than the upstream support of those versions. Example:
Folks, what's the status of this discussion? Could we follow up with an update to the book stating our support range a bit more clearly?
/milestone Next |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
/assign @fabriziopandini
User Story
As a user I would like to have clear guarantees about Kubernetes version support for workload clusters for each CAPI release series/release.
Detailed Description
This issue is a follow-up of #4423.
Draft for discussion:
/kind feature
@CecileRobertMichon @detiber @vincepri opinions?