Improve Cluster Validation webhooks for .spec.topology.version #10049

Closed
chrischdi opened this issue Jan 24, 2024 · 2 comments · Fixed by #10063

Labels
area/api Issues or PRs related to the APIs kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@chrischdi
Member

chrischdi commented Jan 24, 2024

What would you like to be added (User Story)?

As a user, I would like changes to the Cluster.spec.topology.version field to be blocked when:

  • the control plane object has a KCP.spec.version != (current) Cluster.spec.topology.version
  • there are MachineDeployment, MachinePool, or Machine objects with a .spec.version != (current) Cluster.spec.topology.version
  • there are MachineSets with .spec.replicas > 0 with a .spec.version != (current) Cluster.spec.topology.version
    • Edit: handled transitively via MachineDeployments.

to

  • prevent triggering additional updates while others are not yet finished.
  • prevent violations of the Kubernetes version skew policy.

As a user, I would like to be able to override the version validation webhook and skip the .spec.topology.version based validations (except semver), so that I can fix screwed up clusters.

Detailed Description

This is supposed to extend and improve the existing version validations.

For a better picture, this is the current implementation of the version validations:

  • Cluster update validation webhook
    • .spec.topology.version cannot be decreased
    • .spec.topology.version cannot be increased by >= 2 minor versions
  • KCP update validation webhook
    • .spec.version cannot be increased by >= 2 minor versions
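
The two existing Cluster rules listed above (no decrease, no increase by two or more minor versions) could be sketched roughly as follows. This is not the Cluster API implementation: `version`, `parseVersion` and `validateVersionChange` are hypothetical names, the parser ignores pre-release and build metadata, and treating a major version bump as blocked is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a minimal major.minor.patch triple (hypothetical helper type).
type version struct{ major, minor, patch int }

// parseVersion parses strings such as "v1.28.3" or "1.28.3".
// Pre-release and build metadata are dropped in this sketch.
func parseVersion(s string) (version, error) {
	core := strings.TrimPrefix(s, "v")
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid version %q", s)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// less reports whether a sorts before b.
func (a version) less(b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	return a.patch < b.patch
}

// validateVersionChange rejects downgrades and upgrades that reach or
// exceed oldVersion's minor + 2.
func validateVersionChange(oldVersion, newVersion string) error {
	o, err := parseVersion(oldVersion)
	if err != nil {
		return err
	}
	n, err := parseVersion(newVersion)
	if err != nil {
		return err
	}
	if n.less(o) {
		return fmt.Errorf("version cannot be decreased from %s to %s", oldVersion, newVersion)
	}
	// Assumption: anything at or above <major>.<minor+2>.0 is blocked,
	// which also rejects major version bumps.
	ceiling := version{o.major, o.minor + 2, 0}
	if !n.less(ceiling) {
		return fmt.Errorf("version cannot be increased from %s to %s: skips a minor version", oldVersion, newVersion)
	}
	return nil
}
```

With this sketch, a change from v1.28.0 to v1.29.1 passes, while v1.28.0 to v1.30.0 and v1.28.3 to v1.28.1 are both rejected.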

Proposed changes (same as User stories above, but maybe more detailed):

  • Cluster update validation webhook
    • Add a new validation which blocks upgrades that increase the version by 1 minor version if:
      • The control plane object of the Cluster has a .spec.version != current .spec.topology.version
      • There are any MachineDeployment, MachinePool or Machine objects with a .spec.version != current .spec.topology.version
      • There are any MachineSets with .spec.replicas > 0 with a .spec.version != current .spec.topology.version
        • Edit: handled transitively via MachineDeployments.
    • Introduce the annotation unsafe.topology.cluster.x-k8s.io/disable-version-validation to skip all .spec.topology.version validations (except semver)
  • KCP update validation webhook:
    • no changes
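
As a rough illustration of the proposed Cluster check, the sketch below blocks a version change while any relevant object is still at a version other than the current one, unless the escape-hatch annotation is present. Only the annotation key comes from this issue; `versionedObject` and `validateTopologyVersionUpgrade` are hypothetical names, objects are modeled as plain structs instead of real API types, and treating the mere presence of the annotation as "skip" is an assumption. The semver validation, which should always run, is left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation key proposed in this issue.
const disableVersionValidationAnnotation = "unsafe.topology.cluster.x-k8s.io/disable-version-validation"

// versionedObject is a hypothetical stand-in for the control plane object,
// a MachineDeployment, a MachinePool, or a Machine and its .spec.version.
type versionedObject struct {
	Kind    string
	Name    string
	Version string
}

// validateTopologyVersionUpgrade returns an error when the topology version
// is being changed while some object has not yet reached currentVersion.
func validateTopologyVersionUpgrade(annotations map[string]string, currentVersion, newVersion string, objs []versionedObject) error {
	// Assumption: the presence of the annotation (any value) skips the check.
	if _, ok := annotations[disableVersionValidationAnnotation]; ok {
		return nil
	}
	// Nothing to validate if the version is not being changed.
	if newVersion == currentVersion {
		return nil
	}
	var pending []string
	for _, o := range objs {
		if o.Version != currentVersion {
			pending = append(pending, fmt.Sprintf("%s/%s (%s)", o.Kind, o.Name, o.Version))
		}
	}
	if len(pending) > 0 {
		return fmt.Errorf("cannot change .spec.topology.version from %s to %s, objects not yet at %s: %s",
			currentVersion, newVersion, currentVersion, strings.Join(pending, ", "))
	}
	return nil
}
```

For example, with a control plane at v1.28.0 and one MachineDeployment still at v1.27.3, a change from v1.28.0 to v1.29.0 would be rejected until the MachineDeployment catches up or the annotation is set.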

Anything else you would like to add?

Related issue which would probably be solved:

Label(s) to be applied

/kind feature
/area api

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/api Issues or PRs related to the APIs needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2024
@chrischdi chrischdi changed the title from "Improve Validation webhooks for versions" to "Improve Cluster Validation webhooks for .spec.topology.version" Jan 24, 2024
@fabriziopandini
Member

/triage accepted
+1 to have more safeguards - with escape paths - around the upgrade process.

Ideally (but clearly out of the scope of this issue), this should be balanced by features allowing faster upgrades, like e.g. #7631 or first-class support for the process described in #8615 (comment)

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 25, 2024
@sbueringer
Member

Sounds good.

One note: this should work with all control planes, not only KCP (when they support version). But I think if we re-use the right utils we're probably already good.

4 participants