
Define a policy for Kubernetes version support for workload clusters #4444

Closed
fabriziopandini opened this issue Apr 7, 2021 · 17 comments · Fixed by #6122
Labels
kind/documentation Categorizes issue or PR as related to documentation.
kind/feature Categorizes issue or PR as related to a new feature.
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@fabriziopandini
Member

User Story

As a user, I would like to have clear guarantees about Kubernetes version support for workload clusters for each CAPI release series/release.

Detailed Description

This issue is a follow-up of #4423

Draft for discussion:

  1. Full support for the latest three stable Kubernetes releases at the time of the CAPI release cut:
    • e.g. deploying Kubernetes clusters v1.20, v1.19, or v1.18 using CAPI v0.3.15 (released March 30th, 2021) should work (see the example after this list)
      • Safeguard: CI signal (starting from CAPI v0.4/v1alpha4)
  2. Best effort support for the Kubernetes release being worked on at the time of the CAPI release cut:
    • e.g. deploying Kubernetes clusters v1.21 using CAPI v0.3.15 (released March 30th, 2021) should work, but you might be required to upgrade to a more recent version of CAPI
      • Safeguard: CI signal (starting from CAPI v0.4/v1alpha4), but things can still change between the CAPI release date and the Kubernetes release date
  3. No support for Kubernetes releases that do not yet exist at the time of the CAPI release cut:
    • e.g. deploying Kubernetes clusters >=v1.22 using CAPI v0.3.15 might work, but most probably you will need to upgrade to a more recent version of CAPI
      • No safeguard (we can't foresee the future)
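
For illustration only (the cluster name, machine counts, and patch version below are placeholders, and the subcommand shown is the clusterctl v0.3.x one; later releases use "clusterctl generate cluster"), pinning a workload cluster to a version covered by the policy would look roughly like:

# Generate a workload cluster manifest pinned to a supported Kubernetes version
$ clusterctl config cluster my-cluster \
    --kubernetes-version v1.20.5 \
    --control-plane-machine-count 1 \
    --worker-machine-count 3 > my-cluster.yaml
# Apply it to the management cluster
$ kubectl apply -f my-cluster.yaml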

/kind feature

@CecileRobertMichon @detiber @vincepri opinions?

@detiber
Member

detiber commented Apr 12, 2021

@fabriziopandini I think we need to break this down with a bit more granularity. I tend to agree with this for individual patch releases, but I think we need something a bit more flexible for major/minor releases.

For example, for a given minor release, I think we should give best effort support for future versions that are released during the time that we are actively supporting that minor release (how to handle this once we actually have a major release is probably yet another discussion, since I don't think we'd want to support multiple minor releases for a given major release). I also think we should continue to provide best effort support for Kubernetes versions that were supported when the minor version was released but are no longer fully supported by upstream policies (we shouldn't intentionally drop support for a Kubernetes version that previously worked in a given minor version).

@detiber
Member

detiber commented Apr 12, 2021

It would also be nice to keep best effort support for older Kubernetes versions as much as possible across minor version bumps, to ease the burden of management cluster upgrades. While I would love it if everyone upgraded their clusters frequently, I don't know that we've actually seen that in the real world. It might be good to try to get some more real-world user feedback around this.

@kfox1111

kfox1111 commented Apr 12, 2021 via email

@neolit123
Member

neolit123 commented Apr 12, 2021

@kfox1111

can you elaborate on which APIs got deprecated in 1.16 and which are planned for deprecation in 1.22 that you see as upgrade blockers / delayers?

@sbueringer
Member

sbueringer commented Apr 12, 2021

@neolit123 I think:

But I think this highly depends on the environment / company / culture etc. For example:

  • we have about 500-600 clusters internally
  • we keep them all on the same version (we are rolling out 1.20 right now and are usually one release behind Kubernetes)
  • we have no problems with users of our clusters. Usually, we announce the Kubernetes deprecations 2-3 releases (6-9 months) in advance. We then expect our internal users to upgrade in time; otherwise it's simply their problem

I think for CAPI it has to be good enough to support a certain range of old / new Kubernetes versions (e.g. the last 3 Kubernetes versions at the time of the CAPI release, plus newer ones, partly on a best-effort basis, but enough so there's a good path to upgrade to the next CAPI version). Some of them are more or less guaranteed because we test them via Prow; others are simply on a best-effort basis.

@fabriziopandini
Member Author

fabriziopandini commented Apr 13, 2021

@kfox1111 @sbueringer thanks for the context.
@detiber I hear you asking for a policy that applies at the minor release level (and possibly across major versions).

I understand users' concerns very well, and I agree with your points.
However, on the other side, I'm wary of declaring something we cannot effectively ensure as a project; ensuring it implies proper E2E test coverage, the capability to deal with the breaking changes that happen in kubeadm and Kubernetes, and maintaining such implementations over time.

In other words, nothing prevents us from extending the skew we are considering, but IMO we should extend it only when the initial one is well covered and we are confident that this is sustainable for the CAPI community (e.g. we are keeping up with the test signal and fixing test errors/flakes in a timely manner).

@kfox1111

@kfox1111

can you elaborate on what APIs got deprecated in 1.16 and what are planned for deprecation in 1.22 that you see as upgrade blockers / delayers?

It's the removals that are the biggest issue. Pluto is keeping track of them.

[kfox@zathras ~]$ pluto list-versions
KIND                           NAME                                   DEPRECATED IN   REMOVED IN   REPLACEMENT                       COMPONENT     
Deployment                     extensions/v1beta1                     v1.9.0          v1.16.0      apps/v1                           k8s           
Deployment                     apps/v1beta2                           v1.9.0          v1.16.0      apps/v1                           k8s           
Deployment                     apps/v1beta1                           v1.9.0          v1.16.0      apps/v1                           k8s           
StatefulSet                    apps/v1beta1                           v1.9.0          v1.16.0      apps/v1                           k8s           
StatefulSet                    apps/v1beta2                           v1.9.0          v1.16.0      apps/v1                           k8s           
NetworkPolicy                  extensions/v1beta1                     v1.9.0          v1.16.0      networking.k8s.io/v1              k8s           
Ingress                        extensions/v1beta1                     v1.14.0         v1.22.0      networking.k8s.io/v1              k8s           
Ingress                        networking.k8s.io/v1beta1              v1.19.0         v1.22.0      networking.k8s.io/v1              k8s           
DaemonSet                      apps/v1beta2                           v1.9.0          v1.16.0      apps/v1                           k8s           
DaemonSet                      extensions/v1beta1                     v1.9.0          v1.16.0      apps/v1                           k8s           
PodSecurityPolicy              extensions/v1beta1                     v1.10.0         v1.16.0      policy/v1beta1                    k8s           
ReplicaSet                     extensions/v1beta1                     n/a             v1.16.0      apps/v1                           k8s           
ReplicaSet                     apps/v1beta1                           n/a             v1.16.0      apps/v1                           k8s           
ReplicaSet                     apps/v1beta2                           n/a             v1.16.0      apps/v1                           k8s           
PriorityClass                  scheduling.k8s.io/v1beta1              v1.14.0         v1.17.0      scheduling.k8s.io/v1              k8s           
PriorityClass                  scheduling.k8s.io/v1alpha1             v1.14.0         v1.17.0      scheduling.k8s.io/v1              k8s           
CustomResourceDefinition       apiextensions.k8s.io/v1beta1           v1.16.0         v1.22.0      apiextensions.k8s.io/v1           k8s           
MutatingWebhookConfiguration   admissionregistration.k8s.io/v1beta1   v1.16.0         v1.22.0      admissionregistration.k8s.io/v1   k8s           
ClusterRoleBinding             rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
ClusterRole                    rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
ClusterRoleBindingList         rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
ClusterRoleList                rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
Role                           rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
RoleBinding                    rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
RoleList                       rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
RoleBindingList                rbac.authorization.k8s.io/v1alpha1     v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
ClusterRoleBinding             rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
ClusterRole                    rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
ClusterRoleBindingList         rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
ClusterRoleList                rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
Role                           rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
RoleBinding                    rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
RoleList                       rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
RoleBindingList                rbac.authorization.k8s.io/v1beta1      v1.17.0         v1.22.0      rbac.authorization.k8s.io/v1      k8s           
PodDisruptionBudget            policy/v1beta1                         v1.22.0         n/a          n/a                               k8s           
PodDisruptionBudgetList        policy/v1beta1                         v1.22.0         n/a          n/a                               k8s           
HorizontalPodAutoscaler        autoscaling/v2beta1                    v1.22.0         n/a          autoscaling/v1                    k8s           
HorizontalPodAutoscalerList    autoscaling/v2beta1                    v1.22.0         n/a          autoscaling/v1                    k8s           
HorizontalPodAutoscaler        autoscaling/v2beta2                    v1.22.0         n/a          autoscaling/v1                    k8s           
HorizontalPodAutoscalerList    autoscaling/v2beta2                    v1.22.0         n/a          autoscaling/v1                    k8s           
CronJob                        batch/v1beta1                          v1.22.0         n/a          n/a                               k8s           
CronJobList                    batch/v1beta1                          v1.22.0         n/a          n/a                               k8s           
CSINode                        storage.k8s.io/v1beta1                 v1.17.0         n/a          n/a                               k8s           
AuthorizationPolicies          rbac.istio.io                          v1.4.0          v1.4.0       security.istio.io/v1beta1         istio         
                               authentication.istio.io/v1alpha1       v1.5.0          v1.6.0       security.istio.io/v1beta1         istio         
                               networking.istio.io/v1alpha3           v1.5.0          n/a          networking.istio.io/v1beta1       istio         
Certificate                    certmanager.k8s.io/v1alpha1            v0.11.0         v0.11.0      cert-manager.io/v1alpha2          cert-manager  
Issuer                         certmanager.k8s.io/v1alpha1            v0.11.0         v0.11.0      cert-manager.io/v1alpha2          cert-manager  
ClusterIssuer                  certmanager.k8s.io/v1alpha1            v0.11.0         v0.11.0      cert-manager.io/v1alpha2          cert-manager  
CertificateRequest             certmanager.k8s.io/v1alpha1            v0.11.0         v0.11.0      cert-manager.io/v1alpha2          cert-manager  
Order                          certmanager.k8s.io/v1alpha1            v0.11.0         v0.11.0      cert-manager.io/v1alpha2          cert-manager  
Challenge                      certmanager.k8s.io/v1alpha1            v0.11.0         v0.11.0      cert-manager.io/v1alpha2          cert-manager  
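
For illustration (the directory path and target version below are placeholders), the same tool can also scan on-disk manifests or in-cluster Helm releases for API versions that are about to be removed:

# Scan a directory of manifests against a target Kubernetes version
$ pluto detect-files -d ./manifests --target-versions k8s=v1.22.0
# Scan Helm releases deployed in the current cluster
$ pluto detect-helm -owide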

@neolit123
Member

neolit123 commented Apr 14, 2021

and one must use kubectl convert to migrate manifests on disk?

@fabriziopandini
Member Author

My two cents on this:

  • sig-release: Add release cadence KEP kubernetes/enhancements#2567 could be a better place for the discussion about the impact of Kubernetes changes.
  • IMO CAPI can't be the definitive solution for avoiding the side effects caused by those upstream changes; sooner or later each consumer has to adapt, like we are doing.
  • If possible and everyone agrees, I would like to keep the discussion on this issue focused on what is a reasonable and sustainable skew that we can ensure as a project (and forgive me for repeating myself, but I would stress sustainable and ensure here).

One concrete example: today we discovered a problem on Kubernetes v1.20, and this was possible thanks to the hard work we are doing on E2E tests; without such E2E tests being monitored, triaged, and maintained, any declaration about CAPI supporting a wide range of versions isn't a real guarantee for users.
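
As a rough sketch of what exercising that E2E signal locally can look like (the make target, variable names, and focus string below are illustrative and may differ between CAPI releases), the suite can be pointed at a pinned workload cluster configuration along these lines:

# Illustrative invocation; exact target and variable names vary by release
$ make test-e2e \
    E2E_CONF_FILE=./test/e2e/config/docker.yaml \
    GINKGO_FOCUS="workload cluster creation"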

@neolit123
Member

I'd agree that delayed upgrades due to core API removals are an upstream k8s problem and not a CAPI problem.

@kfox1111

and one must use kubectl convert to migrate manifests on disk?

For Helm charts, it's typically a matter of running helm upgrade with a new enough version of the chart that uses the new APIs.
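
For illustration (the file names, release name, chart, and version below are placeholders), migrating an on-disk manifest with the kubectl convert plugin, or moving Helm-managed resources to a chart version that already uses the new APIs, might look like:

# Rewrite an on-disk manifest to a still-served API version (kubectl-convert plugin)
$ kubectl convert -f ./ingress.yaml --output-version networking.k8s.io/v1 > ./ingress-v1.yaml
# For Helm-managed resources, upgrade to a chart version that uses the new APIs
$ helm upgrade my-release example-repo/example-chart --version 2.0.0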

@sbueringer
Member

sbueringer commented Apr 14, 2021

Absolutely agree with @fabriziopandini. While there are some changes in upstream Kubernetes which require work on the user side, the deprecation/removal periods are pretty long (especially with probably 3 Kubernetes releases per year now, they add up to 1-2 years).

From the CAPI side we also have to take into account what's sustainable for the CAPI community. It's non-trivial for a single CAPI version to support a wide range of Kubernetes releases (this also includes breaking changes in components like kubeadm). Each additional Kubernetes release we guarantee/ensure support for adds a maintenance burden to CAPI, both in terms of testing and implementation. An important factor is definitely that we have to support enough Kubernetes versions to ensure enough time for CAPI upgrades (e.g. support for 1.21/1.22 with CAPI 0.3.x).

I think we should be open to supporting a wider range of versions on a best-effort basis, but we really have to get right how many versions we guarantee support for (in practice, by testing them via periodic jobs and ensuring those tests are always green). If there is a lot of demand for support of older versions, I think it's fair to accept contributions from the wider community (but, as I wrote above, on a best-effort basis).

When we support the last 3 Kubernetes versions at the time of a CAPI release throughout the lifetime of a minor release, I think our support of upstream Kubernetes versions lasts by far longer than the upstream support of those versions. Example:

  • k/k release cadence: 4 months (3 per year)
  • CAPI support period of a minor release: 6 months (just a wild guess, currently it's longer afaik)

I guess this adds up to about 5-6+ supported Kubernetes versions for an individual CAPI version over time (assuming we keep supporting newly released Kubernetes versions during the lifetime of a CAPI minor release).

@vincepri
Member

Folks, what's the status of this discussion? Could we follow up with an update to the book stating our support range a bit more explicitly?

@vincepri
Member

/milestone Next
/kind documentation
/priority important-soon

@k8s-ci-robot k8s-ci-robot added this to the Next milestone May 25, 2021
@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels May 25, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2021
@fabriziopandini
Member Author

/lifecycle frozen
this topic is relevant for project graduation

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 24, 2021
@sbueringer
Member

/assign @fabriziopandini
to re-assess
