
Check if transition from unordered-upgrade to ordered upgrade is possible #744

Open
yevgeny-shnaidman opened this issue Feb 20, 2024 · 8 comments

@yevgeny-shnaidman (Contributor)

Currently, if a customer wishes to use ordered upgrade, they need to create a Module with a Version field and set the appropriate labels on the nodes. We don't allow adding the Version field to a Module that already exists in the cluster.
We need to check whether it is possible to allow it, since in v2 we no longer have a running ModuleLoader daemonset.
If we can support this scenario, it will ease the adoption of ordered upgrade for existing customers and will help with KMM upgrade support for day1/day0 modules.
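
For context, a minimal sketch of what a versioned Module looks like (the API version and the exact path of the Version field are assumptions shown only for illustration; the corresponding node label format is the one used in the kubectl example further down):

apiVersion: kmm.sigs.x-k8s.io/v1beta1   # assumed API version
kind: Module
metadata:
  name: my-module                        # illustrative name
  namespace: my-namespace
spec:
  moduleLoader:
    container:
      version: "1.0.0"                   # the Version field discussed in this issue (path assumed)
      # image, kernelMappings, etc. omitted
  selector:
    some-label: some-value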

@yevgeny-shnaidman yevgeny-shnaidman self-assigned this Feb 20, 2024
@yevgeny-shnaidman (Contributor, Author) commented Mar 10, 2024

There are 2 possible ways to do it:

  1. Manual (see the kubectl sketch after this list)
  • The customer first sets the customer-facing version labels on all the nodes targeted by the Module.
  • The VersionNodeLabel controller will label the nodes with the internal label.
  • This will have no effect on the loaded kernel modules or the DevicePlugin pods, since the Version is not yet set in the Module.
  • Once the internal labels are set, the customer adds the Version field to the Module (without changing the Image field).
  • The change in the Module will trigger reconciliation, but no action will be taken on the loaded kernel modules.
  • A new DevicePlugin DS will be created. The old one can be deleted via GC. It might result in a little bit of downtime for DevicePlugin, but this is acceptable.
  2. Programmatic
  • The customer adds the Version field in the Module (and only the Version field).
  • Each controller that is watching the Module will ignore that change (the filters of each controller will be updated to ignore that specific event).
  • A new controller needs to be added to the operator, which handles only the case of adding the Version field to the Module.
  • The new controller will first disable all the controllers that are watching the Module (can be done by adding an annotation to the Module; all the controllers will skip their reconciliation loop while the annotation is present).
  • The new controller will update all the nodes with both the customer version labels and the internal version labels, and will then remove the annotation from the Module.
  • The rest of the controllers will start reconciling again, which can only cause the creation of the new DevicePlugin DS.
  • The previous DevicePlugin DS will be removed by GC. It can incur downtime for DevicePlugin pods, which is acceptable.
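
To make the manual flow concrete, here is a hedged kubectl sketch of the two steps, assuming the Module selects nodes with some-label=some-value and that the Version field sits at spec.moduleLoader.container.version (an assumption, not confirmed in this issue):

# Step 1: the customer labels every node currently targeted by the Module
# with the customer-facing version label.
kubectl label node \
  -l some-label=some-value \
  kmm.node.kubernetes.io/version-module.${ns}.${name}=${version}

# Step 2: after the VersionNodeLabel controller has applied the internal
# labels, add the Version field to the Module without changing the image.
# The field path here is an assumption for illustration only.
kubectl patch modules.kmm.sigs.x-k8s.io ${name} -n ${ns} --type merge \
  -p "{\"spec\": {\"moduleLoader\": {\"container\": {\"version\": \"${version}\"}}}}"

After step 2 the old, unversioned DevicePlugin DS becomes eligible for GC, which is where the short DevicePlugin downtime mentioned above comes from.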

@qbarrand @ybettan @mresvanis WDYT?

@ybettan (Contributor) commented Mar 11, 2024

I think that if we wish to continue supporting both versioned and non-versioned modules, then we should have an easy way for customers to move from one to the other; therefore, I would go with the Programmatic option.

Should we have a way to go back from a versioned module to a non-versioned module as well?

@yevgeny-shnaidman (Contributor, Author)

> I think that if we wish to continue supporting both versioned and non-versioned modules, then we should have an easy way for customers to move from one to the other; therefore, I would go with the Programmatic option.
>
> Should we have a way to go back from a versioned module to a non-versioned module as well?

I don't see a use case for it right now, but if it is requested, the solution will probably be identical, if not even simpler.
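
If the reverse direction is ever needed, under the manual approach it would presumably just be the mirror image (a sketch under the same assumptions as above; the order of operations and the field path are guesses):

# Remove the Version field from the Module first (field path assumed) ...
kubectl patch modules.kmm.sigs.x-k8s.io ${name} -n ${ns} --type json \
  -p '[{"op": "remove", "path": "/spec/moduleLoader/container/version"}]'

# ... and then drop the customer-facing version label from the nodes.
kubectl label node -l some-label=some-value \
  kmm.node.kubernetes.io/version-module.${ns}.${name}-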

@qbarrand (Contributor)

I see option 2 as very complex for little benefit.
Let us consider a Module with the following selector:

spec:
  selector:
    some-label: some-value

Labeling nodes with the version label can be as simple as:

kubectl label node \
  -l some-label==some-value \
  kmm.node.kubernetes.io/version-module.${ns}.${name}=${version}

Adding the version label to already targeted nodes sounds very straightforward to me, hence I would go with option 1.
The only concern I have is about the device plugin downtime. Are we certain that this does not affect existing workloads that would have been scheduled with extended resources?

@ybettan (Contributor) commented Mar 11, 2024

We can start with option 1 for simplicity and, if usage grows over time, re-discuss option 2 for a better UX.

@yevgeny-shnaidman (Contributor, Author)

> I see option 2 as very complex for little benefit. Let us consider a Module with the following selector:
>
> spec:
>   selector:
>     some-label: some-value
>
> Labeling nodes with the version label can be as simple as:
>
> kubectl label node \
>   -l some-label==some-value \
>   kmm.node.kubernetes.io/version-module.${ns}.${name}=${version}
>
> Adding the version label to already targeted nodes sounds very straightforward to me, hence I would go with option 1. The only concern I have is about the device plugin downtime. Are we certain that this does not affect existing workloads that would have been scheduled with extended resources?

ExtendedResources are not updated all the time, but at intervals of 1 minute, I think. I hope that is enough time for the new DevicePlugin pod to come up. Of course, new workloads cannot be scheduled during the downtime, since scheduling may involve the DevicePlugin's pods, but I think that customers can handle this requirement. On the other hand, we can update the GC code to delete the "unversioned" DevicePlugin DS only after the "versioned" one is running, but in that case the customer must make sure that 2 DevicePlugin pods can run on the same node.
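
One way to sanity-check the downtime concern on a test cluster would be to watch a targeted node's allocatable resources while the DevicePlugin DaemonSets are swapped (a sketch; ${node} is a placeholder, and the extended resource name depends on the device plugin in use):

# Poll the node's allocatable resources during the hand-off; the extended
# resource advertised by the device plugin should not disappear for longer
# than the update interval mentioned above.
while true; do
  kubectl get node "${node}" -o jsonpath='{.status.allocatable}'; echo
  sleep 10
done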

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 9, 2024
@yevgeny-shnaidman (Contributor, Author)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 9, 2024