
MCO-507: admin defined node disruption policy enhancement #1525

Merged

Conversation

yuqi-zhang
Contributor

@yuqi-zhang yuqi-zhang commented Dec 6, 2023

Enhancement for users to specify how MachineConfig updates should be acted upon.

See: https://issues.redhat.com/browse/OCPSTRAT-380

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 6, 2023
@openshift-ci-robot

openshift-ci-robot commented Dec 6, 2023

@yuqi-zhang: This pull request references MCO-507 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.15." or "openshift-4.15.", but it targets "openshift-4.16" instead.

In response to this:

Draft enhancement for users to specify how MachineConfig updates should be acted upon.

See: https://issues.redhat.com/browse/OCPSTRAT-380

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 6, 2023
Contributor

openshift-ci bot commented Dec 6, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Comment on lines 130 to 146
The MachineConfigController’s drain controller will process any update to the object and validate the RebootPolicy for duplication, action validity (but not correctness), etc. If the object is valid, the drain controller will update the object’s status and observedGeneration to indicate success; if not, it will update those fields and degrade the controller. For example:

```
status:
  conditions:
  - lastTransitionTime: "xxx"
    message: ""
    status: "False"
    type: RebootPolicyValidationFailing
  - lastTransitionTime: "xxx"
    message: "Validation of the rebootpolicy has succeeded."
    status: "True"
    type: RebootPolicyValidated
  observedGeneration: x
```

The MachineConfigDaemon will process and apply the current RebootPolicy if it’s valid, indicated by generation == observedGeneration (MCC has processed the latest spec) and RebootPolicyValidated == true/RebootPolicyValidationFailing == false. If not, the daemon will also degrade (and queue a re-sync) waiting for the controller to validate the object.
Contributor

I like the idea of degrading the operator via conditions as this follows our existing pattern, but just to be super clear, are we talking about degrading the operator via new condition types on an object (say the RebootPolicy in this case), or a new condition type on the MCC itself? I don't think we do degrades via the MCC directly like that, but please correct me if I'm wrong!

You also mention degrading the daemon too, which makes sense. I assume both the controller and the daemon degrades will be simultaneous (barring sync delays), or would there be cases where each could happen independent of the other?

Contributor Author

So in this original model, I was thinking that the daemon would only error if it had an ongoing upgrade while the RebootPolicy was invalid. If it's not currently being used, the daemon wouldn't double report the error. In the case where both are erroring, it would be like:

  1. controller is failing on validation of RebootPolicy
  2. daemon is failing on waiting for validation of RebootPolicy

Hopefully that's not too confusing. As for the controller error, I was thinking of just erroring out the sync loop as a start. But maybe having the operator object itself sync the above RebootPolicyValidationFailing would be better, now that I think about it.


The special MCO cert paths ("/etc/kubernetes/kubelet-ca.crt", "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem") are not listed in the clusterDefaultPolicies as they are no longer MachineConfig changes

In terms of the policy itself, in case a change has multiple matching policies (e.g. two files changed, both with a different policy), all policies will be applied in order (e.g. reload multiple services). If any change in the list does not match any defined policy, the MCO will default to rebooting and skip over all the policies.
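To make the ordering concrete, here is a minimal hypothetical sketch (paths, service names, and field layout are illustrative placeholders following the draft schema, not a final API): two changed files each match a user policy, so their actions run in definition order; a changed path with no matching policy would instead cause the MCO to fall back to a reboot and skip the policies.

```
# Hypothetical userPolicies entries; all names are placeholders.
userPolicies:
- type: file
  value: /etc/chrony.conf          # changed file 1 (hypothetical path)
  actions:
  - type: reload
    target: chronyd.service        # applied first, in definition order
- type: file
  value: /etc/my-app.conf          # changed file 2 (hypothetical path)
  actions:
  - type: restart
    target: my-app.service         # applied second
# A changed path with no matching policy would instead cause the MCO to
# skip all of the above and reboot the node.
```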
Contributor

This is probably out of my wheelhouse but I'm curious - are actions going to be calculated and accumulated, and then applied at the end? Or would each action take place in the order it came (regardless of the type of action)? Is there a disadvantage to doing one over the other? I know you mentioned checking for duplicates earlier, but I just want to clarify whether that applies to actions or not.

For example, consider these actions from two reboot policies (not applied at the same time):

  • Policy 1
    reload crio
    reload crio
    reboot

  • Policy 2
    reload crio
    reboot

Would policy 1 cause two crio reloads? Or would the actions taken from both of them be the same (1 reload crio, 1 reboot)?

Contributor Author

Great question. I was hoping we could get away without any merge logic and only validate that you don't have 2 entries for the same object (e.g. one file defined twice, once as a no-action and once as a reload, which self-contradicts). Then afterwards we just apply them in order from the daemon without any additional logic.

Is there a disadvantage to doing one over the other?

Other than doing unnecessary actions (reloading the same service multiple times, for example) there shouldn't be any real downsides. I guess there are potentially scenarios where you need to reload the service in a specific order, but I think we can preserve the definition ordering so that shouldn't be a big issue

Comment on lines 58 to 59
apiVersion: machineconfiguration.openshift.io/v1
kind: RebootPolicy
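For orientation, a rough sketch of what a complete RebootPolicy object might have looked like at this stage of the draft; the field names follow the snippets quoted in this review and are not a final API, and the path and service names are placeholders:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: RebootPolicy
metadata:
  name: cluster                 # the draft assumes a single cluster-scoped instance
spec:
  userPolicies:                 # admin-defined overrides
  - type: file
    value: /etc/my-conf         # placeholder path
    actions:
    - type: reload
      target: myservice
  clusterDefaultPolicies: []    # reserved for MCO-shipped defaults
```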
Member

So this enhancement doesn't yet mention the bootc configmap approach (we aren't using it yet I know, doesn't really exist)

But anyways in there I was straw-manning that we have an annotation in the configmap which specifies the "update policy" for it (apply live vs reboot) - there's some ugly bits around whether we need "apply live and execute this arbitrary command" or "apply live and reload these systemd units" that need to be defined.

But anyways to me, associating the update policy with the files makes a lot more sense than having another global list.

Now to implement this today we'd need to add a "reboot policy" section into the rendered config. But I think it would be a lot nicer in the future to get rid of the idea of the big "rendered config" and have the MCD fetch individual MachineConfig objects (like how kubelet fetches individual configmaps, there's no "rendered configmap!").

Doing that gets into versioning/race condition questions around "what happens when an input MC changes while we're doing a rollout" that I think are solvable, but do need some thought.

Now one thing @yuqi-zhang brought up when we were chatting about this is the overall question of "how would this work for files in the base image".

My offhand thoughts on that are that we may want to go in a direction of having "static configmaps" that we can inject into a container image in some way, similar to the arguments around containers/bootc#128

(These things connect because configmaps may specify quadlet units)

Then bootc would be the thing that executes on applying these.

Anyways... I'm not totally sure we really need to immediately handle the case of policies for files embedded in the base image.

It feels like we'd get most of the way if we split out the things that need to be updated live outside of the base image.

Contributor Author

Great points 👍 , a few thoughts around this:

  1. The global approach in a sense can help us merge the paths of MachineConfig and Configmap defined objects. So if you define a rebootpolicy, it can apply to both update paths, and eventually container image diffs potentially. So there is a singular API and the MCO can handle the specifics when doing the actual content update. I guess the downside would be that if bootc configmaps allow for that we'd have multiple locations where this can be defined, but that can also be an MCO implementation detail and this should give us enough flexibility
  2. The timeline for this is going to be 4.16 which will likely predate any full implementation of bootc live-applies and configmaps going into the MCO, since there is some urgency around this, otherwise I'd love to wait and see what we finalize on in terms of bootc-based updates
  3. The Configmap approach doesn't map as well to MachineConfigs (at least not the annotations) since a lot of MachineConfigs are a giant bundle of different files/units, due to our existing architecture, and not one-file-per-machineconfig
  4. As a stopgap, I think the idea of extending the MachineConfig API instead of a new API object is an alternative approach we can pursue. This may actually be easier to conceptualise for some users since I think most users think of "changing a machineconfig" as the catalyst to an update, and not necessarily "a new rendered config for the system was created and we're doing a diff update". I did consider this approach and originally rejected it because most of the "no-reboot updates" that are asked about are certificates, registry configuration, etc., which all come in the cluster-supplied 00-worker and 00-master MCs, which you can't edit. And you definitely should not be providing your own machineconfigs for those files.
  5. So then if we do opt for this instead I guess the user would end up defining a RebootPolicy as a separate MachineConfig still (for example, for image registry mirror changes, they would define a machineconfig that just has a "no-action" field). The downside would be... that it's a bit confusing for users to write a MachineConfig specifically to define the "how does my file reboot", but I don't see a great alternative. The upside is that it can be separated from the configmaps in the future and both would work separately.

I'm still on the fence about this. Do you think having the MCO write RebootPolicies into bootc configmaps instead eventually would work as well?

Member

I did consider this approach and originally rejected it because most of the "no-reboot updates" that are asked about are certificates, registry configuration, etc., which all come in the cluster-supplied 00-worker and 00-master MCs, which you can't edit.

But we can easily split out those files into a distinct e.g. 01-worker-configs MC today and aim for stuff that changes in -configs to have distinct "apply" policies.

actions:
- type: restart
  target: myservice
clusterDefaultPolicies:
Member

I don't get why we'd mix "cluster" and "user" in one single CRD. Why not have RebootPolicy have multiple instances (like machineconfig and tons of other things) and just some of them are cluster owned (i.e. changes get overwritten)?

Contributor Author

I opted to define the initial approach this way for simplicity because it doesn't require us to:

  1. merge multiple objects
  2. validate separately

So in this way we can validate via the singular cluster object status, and the daemon can just react to that, and we can re-evaluate for GA. We have a similar schema for the "node.config.openshift.io" object (only one "cluster" object is allowed).

But it is definitely more flexible and safer to have multiple... we can implement a merging schema. I don't really want us to have to create a "rendered-rebootpolicy" object, but I guess we can have the controller validate each one separately and have the daemon apply them in... alphanumerical order? We may not even need a merging schema and could just have the daemon apply them in order, which at worst will be a few duplicated commands.


### Goals

* Create a new RebootPolicy CRD to allow users to define how OS updates are actioned on after application
Contributor

Do we think this needs to be a new CRD or could it be part of the MachineConfiguration operator CR?

Contributor Author

I defaulted to a machineconfig-like approach of a new CRD since it's a bit easier to manage, but I think technically speaking there's no reason why it can't be part of MachineConfiguration.

So then our options would be:

  1. have it be part of MachineConfiguration knobs
  2. have it be a new CRD
  3. have it extend MachineConfig objects

I think if 1 is easier we can certainly take that approach, but I wonder if we'd be stuffing too many things into MachineConfiguration. What's the general guideline on what warrants a new CRD vs just a high level knob?

Contributor

Do you expect there to be one configuration per cluster for this? If you do, then I think containing it in the MachineConfiguration singleton probably makes sense.
If the config might vary per machinepool and you expect we might need many duplicates, then perhaps we look at something separate.

The guidance I'd give is about the consumer. Group things together that the same consumers would want to look at/configure. I expect this reboot configuration will fit in nicely within either the MachineConfiguration singleton, or within the MachinePool if you wanted the MachinePools to have independent configs

Contributor Author

I think for the use cases we know of so far, there shouldn't be a need to specify this per-machineconfigpool. Doing it as a separate CRD will give us more flexibility (and the CRD name itself is a bit more direct on what this is configuring, vs the overall MachineConfiguration object), but I think perhaps the end goal is to make this work better with images, so in this sense, doing it via the MachineConfiguration object for cluster-wide, individual MachineConfig effects works best as a first implementation.

Contributor

TBH, we can start with either approach during tech preview; there's no need to promote whatever we do in tech preview to GA. But perhaps we need to explore the pros and cons of each approach first.
The one thing to bear in mind, though, is that once a name has been used, even in tech preview, it cannot be re-used later with a different shape, so we should probably sync up on this at some point.

I think for now I lean towards seeing what the reboot policy would look like within the MachineConfiguration CR


An admin of an OCP cluster can at any time create/update/delete RebootPolicy objects to reconfigure how they wish updates to be applied. There isn’t any immediate effect so long as the RebootPolicy CR itself is a valid object.

The MachineConfigController’s drain controller will process any update to the object and validate the RebootPolicy for duplication, action validity (but not correctness), etc. If the object is valid, the drain controller will update the object’s status and observedGeneration to indicate success; if not, it will update those fields and degrade the controller. For example:
Contributor

Why does this need to be done via a controller rather than at admission time using CEL or a validating webhook?

Contributor Author

I wanted to leave this open for potentially merging multiple rebootpolicies, sort of like MachineConfigs, since I was a bit ambivalent on having just one.

So in that case, users can still provide any configuration of their own, cluster would be immutable and generated by default, and the controller will "merge" them in some fashion.

I suppose if we want to just have one object (or as a field in MachineConfiguration as mentioned above) having a CEL or validating webhook will be better. I assume those can't be used to cross-validate against other objects?

Contributor

There are options for cross validating against other objects, we can explore that if it is a requirement.

In general, having the reboot policy defined in a single place would likely prevent duplication anyway no? For example, we can use map keys to make sure certain keys within the lists have unique entries


The MachineConfigDaemon will process and apply the current RebootPolicy if it’s valid, indicated by generation == observedGeneration (MCC has processed the latest spec) and RebootPolicyValidated == true/RebootPolicyValidationFailing == false. If not, the daemon will also degrade (and queue a re-sync) waiting for the controller to validate the object.

For now, we only support one cluster-wide rebootpolicy, and as such any rebootpolicy not named “cluster” will be ignored and the MCO will report errors via degrading the operator.
Contributor

Typically we can prevent these objects from being created using validating webhooks, or now even a CEL validation. So perhaps restate this to say that CEL validation will prevent creating an object not named cluster


For now, we only support one cluster-wide rebootpolicy, and as such any rebootpolicy not named “cluster” will be ignored and the MCO will report errors via degrading the operator.

When the user first enables RebootPolicy via featuregate (or at some version, by default), the “cluster” RebootPolicy object will be generated with the existing MachineConfigDaemon defaults today, so the user can also view the default policies that came with the cluster. The user can update these fields if they wish to use a different workflow via userPolicies, but are not allowed to modify clusterDefaultPolicies, which is reserved for MCO to update these as we wish in the future. The fields that exist today are as highlighted above.
Contributor

How will you stop users modifying these, but allow the controller to update these? Perhaps this should be enforced by the "have a spec, copy to status, observe from status" pattern

Contributor Author

I intended to have the controller validate and overwrite any changes to those fields, much like any other object the MCO manages (for example, if you try to overwrite any of the controller-managed MachineConfig objects, the controller will simply re-apply the old one back. There is a delay so the rendered config doesn't get generated.)

The status will just reflect that the controller has ack'ed whatever is in the spec in this current pattern, so we can save re-listing all the policies in the status. But if the copy-to-status pattern is better, I think we can save a bit of waiting in the daemon (the current approach has the daemon wait for the latest, vs using the last-validated in the status in this new method) so it might be better?

Contributor

I would probably go for the copy to status approach.

The idea here is that the user can define what they want, and that is spec. What the user says should never be modified. A controller then validates the changes and copies them to status if the changes from the user are acceptable, rejecting and logging (via a condition maybe) why it's ignoring anything from the user spec, and merging in anything that must exist.

Then whatever consumes the data, can always rely on the status fields, because they have already been validated.

Contributor Author

Ok, as an MVP, I think we can try for MachineConfiguration spec and status.


The user thereby needs to understand that no additional verification is taking place, and that they accept any responsibility for RebootPolicies causing configuration not to take effect.

Given the popularity of using pause on MachineConfigPools, it is possible some users will exploit this to skip upgrade reboots. That will likely not break the cluster but will put the configuration in an awkward spot, as we can no longer guarantee that the configuration a node claims to be on is actually applied.
Contributor

What does this mean for support cases? It sounds like we need to have an internal list of what can and cannot skip reboots, and force reboots where they are actually required.

Contributor Author

I think at least initially, the usage of rebootpolicies means that you the user take responsibility for them, so from a support perspective this shouldn't be any different than say, you modified a file manually via ssh, and broke your cluster as a result.

Contributor

When I SSH into a node, it tells me that this has been logged and the node is now out of support. Will you have something similar for reboot policies?

Contributor Author

When I SSH into a node, it tells me that this has been logged and the node is now out of support. Will you have something similar for reboot policies?

So I think we actually removed the ssh warning a few versions ago. The whole ssh saga was a long history of errors and sub-optimal behaviour.

The SSH warning was part of a node annotation (and I guess, a daemon log entry) and didn't make it further than that. We can certainly add a node annotation when a rebootpolicy did take effect. Is that sufficient? We can also try to leverage the new MachineConfigNodes objects to reflect in their status instead

Contributor

I think we need to come up with something to define our boundary for support. If you think there are going to be complex scenarios if users use certain features in tandem, then prevent them from doing so, or, flag it in a status so that we can say, oh well you did XYZ which we told you not to do, that's why it broke.

If we don't do this, I expect you'll see a number of bugs come through that are complicated, will require investigation, and then will conclude, oh they did something unsupported


As noted in the non-goal sections, any RebootPolicy is not verified in the MCO, minus formatting. This means that if you accidentally define a change that needed a reboot to take effect, the cluster would “finish the update” without being able to use the new configuration.

The user thereby needs to understand that no additional verification is taking place, and that they accept any responsibility for RebootPolicies causing configuration not to take effect.
Contributor

How will you explain this to a user?

Contributor Author

Docs, hopefully :)

We can craft an alert or something to indicate when their rebootpolicy took effect upon an update. Without constraining the user to a list of tested "safe" policies (which goes back to what we already have today, and we don't necessarily want to continue extending that), there isn't a way that I know of to guarantee that user-defined policies are always "safe".

Contributor

Providing some metric or alert that is easy for support to spot I think would have value

Comment on lines 64 to 71
- type: file
  value: /etc/my-conf
  Actions:
  - type: drain
  - type: reload
    target: myservice
  - type: reload
    target: myotherservice
Contributor

I would like to see the Go type for this, with godoc documentation and kubebuilder validations. Currently the API looks almost scary to me, in that I have very little context of what any of the things here mean. I'm concerned about how usable this will be to an average user so documentation on this API will be very important IMO. Perhaps we can discuss in the API extensions section

Member

This one looks very similar to https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_handlers.html and would almost certainly face the same pressure to become similarly extensible.

Contributor

Given that, I think this API is going to need to be thought out very carefully. We can take a look at the extensibility built into the playbook API as inspiration - thanks for the link.


Which, when applied, will define actions for all future changes to these MachineConfig fields based on the diff of the update (rendered MachineConfigs).

For this initial implementation we will support MachineConfig changes to:
Contributor

Are users expected to understand the changes introduced as part of the payload, and then add policies before they upgrade? If so, where will they get this information?

Contributor Author

Payload updates are not what this is intended for. Basically every payload update we have contains an OS update to RHCOS, which isn't rebootless.

The main use case is for specific machineconfig-defined file changes to skip this process, outside of major version upgrades

Contributor

Oh ok! I didn't get that when reading the motivation/goals, perhaps an update there might make that more obvious.

Given that, I think this probably alleviates my concern. The user is making changes to lay specific files down or change specific files, so they know what they need to do.

Is there an ordering issue here? Do they need to make sure that the reboot policy change is applied before they apply the MachineConfig change?

Contributor Author

perhaps an update there might make that more obvious

I'll add a snippet to indicate that this is not for OCP major/minor upgrades, but instead for individual MachineConfig updates performed by the user

Is there an ordering issue here? Do they need to make sure that the reboot policy change is applied before they apply the MachineConfig change?

With the validation step above, I guess the user must ensure that the rebootpolicy they want has properly made it into the status of (either the MachineConfiguration or a new CR) object before starting their MachineConfig update

In practice, if you apply them in quick succession, there is a small inbuilt delay in MachineConfig rendering -> pool acquiring -> daemon processing (of ~15 seconds) such that the rebootpolicy should take effect. I don't think we need to have additional inbuilt checks for this.

Comment on lines 64 to 65
- type: file
  value: /etc/my-conf
Contributor

This seems very very specific. Do we think users want this level of granularity, or do they want classes of changes to ignore reboots for?

Contributor Author

We can certainly support wildcarding paths, but this is mostly inline with the MachineConfig objects we have today.

What would you define as a "class of change" in this situation?

Contributor

I was thinking about this applying to all changes. If we are saying this is only applicable to everyday changes and not updates, then I don't think this comment really applies.

superseded-by:
---

# Admin Defined Reboot Policy
Member

As an overall thing, I think we don't want to hardcode the term "reboot" here because some changes today are still disruptive in terms of draining, but don't actually involve a reboot, right? (Hm, the code flow in the MCO today is a bit confusing on this; I'd have expected postConfigChangeActionDrain but it's special cased)

Certainly in bootc I plan at some point to support systemd userspace-only restarts too which will act differently from reboots.

And then this whole enhancement is really about enabling special casing specific changes which are not reboots...

So maybe "Admin defined Host Change Apply Policy" ?

Contributor Author

I picked "reboots" since that's how most people view it :) I was considering the alternative of "Admin defined disruption policy" instead but disruption has multiple meanings

The title I can certainly change to be more descriptive, the naming of the api object (or other alternatives) is definitely more important

@yuqi-zhang
Contributor Author

yuqi-zhang commented Jan 4, 2024

Added 2 alternatives as discussed above, with some of the benefits and drawbacks to the approaches. Happy to discuss further

@yuqi-zhang yuqi-zhang changed the title MCO-507: admin defined reboot policy enhancement MCO-507: admin defined node disruption policy enhancement Feb 9, 2024
@yuqi-zhang yuqi-zhang marked this pull request as ready for review February 9, 2024 19:04
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2024
@openshift-ci openshift-ci bot requested review from eparis and zaneb February 9, 2024 19:05
@yuqi-zhang
Contributor Author

Addressed most of the comments and updated the enhancement. Key points:

  1. now is a sub-section of MachineConfiguration object instead of a new CRD
  2. now called Node Disruption Policy instead of Reboot Policy, for clarity (since it encompasses drains, etc.)

@yuqi-zhang
Contributor Author

API draft here: openshift/api#1764

@yuqi-zhang yuqi-zhang force-pushed the admin-defined-reboots branch 2 times, most recently from 0516047 to 3ac9a98 Compare February 9, 2024 20:27
@dhellmann
Contributor

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.


### Goals

* Create a NodeDisruptionPolicyConfig subfield in the MachineConfiguration CRD to allow users to define how OS updates are actioned on after application
Contributor

The ending of this sentence seems a bit confusing. Perhaps we wanted to write something after "application"?

Contributor Author

Edited

For this initial implementation we will support MachineConfig changes to:
- Files
- Units
- kernelArguments
Contributor

How are we envisioning applying kargs without performing a reboot?

Contributor Author

Removed that from the list in the API and here


### API Extensions

- Adds fields to the MachineConfiguration CRD
Contributor

Linking to the API PR would make sense here.

Contributor Author

Done


The user thereby needs to understand that no additional verification is taking place, and that they accept any responsibility for NodeDisruptionPolicies. If any are applied to the cluster, a metric should be raised for debuggability.

Given the popularity of using pause on MachineConfigPools, it is possible some users will exploit this to skip upgrade disruption. That will likely not break the cluster but will put the configuration in an awkward spot, as we can no longer guarantee that the configuration a node claims to be on is actually applied.
Contributor

I am a bit confused how pausing the pool has an impact here. Are we allowing the admin to skip drain/reboot for an OS update, or is there something else I'm missing?

Contributor Author

Reworded to remove confusion. I didn't mean to talk about paused pools


- Have the user specify a global policy
- Allow users to wildcard paths
- Allow users to define policies per pool instead of globally
Contributor

Keeping policies at the cluster level makes sense to me initially, since in the regular use case, if a change is safe, it would be safe to apply on all nodes. Maybe something we can fine-tune afterward based on use cases.

The biggest question is the format and constraints we would like for the UX to be. Conceivably, we have the options to:

- Have the user specify a global policy
- Allow users to wildcard paths
Contributor

Can be useful for applying at the directory level, especially when adding drop-in files into a drop-in directory.

- Have the user specify a global policy
- Allow users to wildcard paths
- Allow users to define policies per pool instead of globally
- Provide users more flexibility in defining the actions (e.g. provide a shell script to run to change selinux context)
Contributor

The user can write a systemd unit to achieve that, and that unit itself can be defined in the policy :)

- Allow users to wildcard paths
- Allow users to define policies per pool instead of globally
- Provide users more flexibility in defining the actions (e.g. provide a shell script to run to change selinux context)
- Provide users more granularity on changes within a file (e.g. adding vs deleting image mirrors within registries.conf)
Contributor

ICSP changes are tricky. We may want to handle them as a special case. Something like: the ClusterDefaultPolicy for this would be set to no drain and no reboot for additions, and the userPolicy may specify deletion.

Contributor

@JoelSpeed JoelSpeed left a comment

It's not clear to me (though I think from prior discussion this is true) that this policy only applies to users modifying MachineConfig objects, and doesn't apply in the general case for node upgrades. It would be good to clarify if users can prevent reboots if say, an upgrade says the kubelet service needs to be updated, and they specify to prevent that reboot or service restart.

Having users have the ability to prevent nodes from restarting for core updates to kubelet and crio could cause support concerns down the road.

Another thing I may have missed, but, will I be able to apply the policy to a subset of machine config pools? Is there any time where I may want to skip a reboot on workers, but I'm happy to do so on masters?

Comment on lines 151 to 153
status:
  validation: Failed
  Reason: Invalid NodeDisruptionPolicy spec: xxx
Contributor

This probably wants to be a condition
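For illustration, the condition form being suggested might look roughly like this; the condition type, reason, and generation value here are placeholders, not the final API:

```
status:
  observedGeneration: 3                    # example value
  conditions:
  - type: NodeDisruptionPolicyValidated    # placeholder condition type
    status: "False"
    reason: InvalidSpec                    # placeholder reason
    message: "Invalid NodeDisruptionPolicy spec: xxx"
    lastTransitionTime: "xxx"
```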

Contributor Author

Updated and added a TODO in the API PR


When the user first enables the new nodeDisruptionPolicySpec via featuregate (or at some version, by default), the "default" MachineConfiguration object will be generated with the existing cluster defaults by the MachineConfigController, so the user can also view the default policies that came with the cluster. The user can update these fields if they wish to use a different workflow via userPolicies, but are not allowed to modify clusterDefaultPolicies, which is reserved for MCO to update these as we wish in the future. The controller will enforce this when generating the status. The fields that exist today are as highlighted above.

Note that /etc/containers/registries.conf has a “Special” keyword. This is due to the extra processing the MCO does today. No user defined policy may use the “Special” keyword.
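As a rough sketch of how that could surface in the MachineConfiguration object (field names are taken loosely from the linked API PR and may differ; the user entry and apiVersion/name are assumptions), a user-defined policy sits in spec, while the MCO-owned defaults, including the reserved Special entry for registries.conf, are only surfaced in the controller-generated status:

```
apiVersion: operator.openshift.io/v1      # assumed API group for the MachineConfiguration CR
kind: MachineConfiguration
metadata:
  name: cluster                           # assumed singleton name
spec:
  nodeDisruptionPolicy:
    files:
    - path: /etc/my-app.conf              # placeholder user-defined policy
      actions:
      - type: Restart
        restart:
          serviceName: my-app.service
# The MCO-generated defaults (surfaced in status) would carry entries such as
# /etc/containers/registries.conf with the reserved "Special" action type,
# which is rejected if a user tries to set it in spec.
```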
Contributor

This will need to be spelt out clearly in the API. I would like to understand more about the Special keyword, so let's talk about that at some point: how it manifests and the implications of it.

Contributor Author

The "special" keyword is reserved for the MCO-defined actions and we will reject any user policies with this keyword.

I was hoping to use the phrasing in the description to flesh out what this means to the user. There is basically additional logic for registries, specifically around mirror-by-digest-only changes, that exists in the MCD today. This is just for user awareness and not something they can set themselves.


The special MCO cert paths ("/etc/kubernetes/kubelet-ca.crt", "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem") are not listed in the clusterDefaultPolicies as they are no longer MachineConfig changes.

In terms of the policy itself, in case a change has multiple matching policies (e.g. two files changed, both with a different policy), all policies will be applied in order (e.g. reload multiple services). If any change in the list does not match any defined policy, the MCO will default to rebooting and skip over all the policies.
Contributor

I think the API change to make path the map key will mean this statement is null

Contributor Author

Removed


In terms of the policy itself, in case a change has multiple matching policies (e.g. two files changed, both with a different policy), all policies will be applied in order (e.g. reload multiple services). If any change in the list does not match any defined policy, the MCO will default to rebooting and skip over all the policies.

The MachineConfigNodes CR will track the status of the upgrade, and will also be updated to indicate what the NodeDisruptionPolicy used for the update was.
Contributor

Do we have an API extension for this?

Contributor Author

Not today, since MCN is also tech preview.

Will make it a goal for GA


### API Extensions

- Adds fields to the MachineConfiguration CRD
Contributor

Please link to API PR at a minimum. Providing YAML examples to show usage would also be helpful for other readers. We can also talk about the validations that will apply here

Contributor Author

done. An example is provided above


The user thereby needs to understand that no additional verification is taking place, and that they accept any responsibility for NodeDisruptionPolicies. If any are applied to the cluster, a metric should be raised for debuggability.

Given the popularity of using pause on MachineConfigPools, it is possible some users will exploit this to skip upgrade disruption. That will likely not break the cluster but will put the configuration in an awkward spot, as we can no longer guarantee that the configuration a node claims to be on is actually applied.
Contributor

I didn't think the disruption policy would apply to cluster version upgrades?

Contributor Author

It doesn't. Removed

@yuqi-zhang yuqi-zhang force-pushed the admin-defined-reboots branch 3 times, most recently from 2cbafab to 0af2246 Compare February 16, 2024 00:06
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 15, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 22, 2024
@yuqi-zhang
Contributor Author

/remove-lifecycle rotten

Updated based on updates to the API PR

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 22, 2024
Contributor

@djoshy djoshy left a comment

Overall looks good to me! Just a few minor nits and some questions about the future work. Thanks for working on this, Jerry!

- Files
- Units
- sshKeys
And actions for:
Contributor

Might need a newline here for better formatting

Contributor Author

done

In the future, we can extend this in two ways:
- Supporting more changes/actions (e.g. user defined script to run after a certain change)
- Supporting diffing between ostree images (for image based updates) and bootc managed systems (for bifrost)

Contributor

Probably worth adding directories/wildcarding plans here too?

Contributor Author

done


An admin of an OCP cluster can at any time create/update/delete nodeDisruptionPolicy fields in the MachineConfiguration object to reconfigure how they wish updates to be applied. There isn’t any immediate effect so long as the CR itself is valid.

The MachineConfigOperator pod will process any update to the object and validate the NodeDisruptionPolicy for potential issues, although most, if not all are covered by API-level validation. The MCO pod will then merge the user provided contents to the cluster defaults (user contents will always override cluster defaults) and populate the nodeDisruptionPolicyStatus. Success and errors will also be reflected in the status:
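A hedged sketch of what that merged result in the status might look like (the structure is assumed from the description above and the linked API PR; entries, condition type, and reason are placeholders):

```
status:
  nodeDisruptionPolicyStatus:
    clusterPolicies:                      # merged view: cluster defaults + user overrides
      files:
      - path: /etc/my-app.conf            # user-supplied entry overriding any default
        actions:
        - type: Restart
          restart:
            serviceName: my-app.service
  conditions:
  - type: NodeDisruptionPolicyUpdated     # placeholder condition type
    status: "True"
    reason: AsExpected
    message: ""
```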
Contributor

I suppose this could change in the future, but as it stands we are completely guarded by API validation, no?

Contributor Author

Modified based on our current implementation.

#### Hypershift [optional]

As noted above, Hypershift cannot directly tap into this. In a bit more detail, Hypershift has 2 modes of operation for nodepools:
Replace upgrades - since all machines are deleted and recreated in this model, there is no method of live-application today
Contributor

Nit: better formatting

Contributor Author

done


In terms of the policy itself, in case a change has multiple matching policies (e.g. two files changed, both with a different policy), all policies will be applied in order (e.g. reload multiple services). Any invalid policy, such as invalid actions or duplicate entries, will be rejected at the API level.

The MachineConfigNodes CR will later be enhanced to track the status of the upgrade, and will also be updated to indicate what the NodeDisruptionPolicy used for the update was.
Contributor

This makes sense to me. Off the top of my head, I guess we need (rough sketch after this list):

  • NodeDisruptionPolicyApplySuccessful - this will include the generation of the overall object and MCD can track/wait on this before starting to apply the actions in Status
  • NodeDisruptionPolicyExecutionSuccessful - perhaps list the target config and the actions applied
  • NodeDisruptionPolicyExecutionFailure - indicate why failure happened and that the legacy path was taken. Right now I can only think of this being the status not being generated correctly/in time, or in a buggy format that we did not anticipate?
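For illustration only, those conditions on a MachineConfigNode might render something like the following; the condition types come from the list above, while the reasons, messages, and target names are assumptions:

```
conditions:
- type: NodeDisruptionPolicyApplySuccessful
  status: "True"
  reason: PolicyGenerationObserved        # placeholder reason
  message: "Observed node disruption policy generation 3"
- type: NodeDisruptionPolicyExecutionSuccessful
  status: "True"
  reason: ActionsExecuted                 # placeholder reason
  message: "Executed actions [reload crio.service] for rendered-worker-<hash>"
- type: NodeDisruptionPolicyExecutionFailure
  status: "False"
  reason: AsExpected
  message: ""
```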

Contributor Author

I think we can start off with just a success/fail when the node update gets to that stage. But agreed, we can add a lot of detail here for users to see, if they care

@djoshy
Contributor

djoshy commented Apr 22, 2024

/lgtm
Thanks for working on this with me @yuqi-zhang! I believe this is now in line with our current MVP. I think the test can be overridden since we are still in the 4.16 development window, but I don't believe I have that power. Would you mind overriding, @JoelSpeed, assuming you have no other reservations? (:

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 22, 2024
@yuqi-zhang
Contributor Author

Thanks David :) Based on the current failure, seeing if we can override based on #1436 (comment)

/override ci/prow/markdownlint

Contributor

openshift-ci bot commented Apr 23, 2024

@yuqi-zhang: yuqi-zhang unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers.

In response to this:

Thanks David :) Based on the current failure, seeing if we can override based on #1436 (comment)

/override ci/prow/markdownlint


@openshift-ci-robot

openshift-ci-robot commented May 2, 2024

@yuqi-zhang: This pull request references MCO-507 which is a valid jira issue.

In response to this:

Enhancement for users to specify how MachineConfig updates should be acted upon.

See: https://issues.redhat.com/browse/OCPSTRAT-380

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sinnykumari
Contributor

All existing comments have been addressed and the respective API and MCO implementation has been merged.
Let's take any follow-up feedback anyone may have later on in a follow-up PR.
/approve
Overriding test based on #1525 (comment)
/override ci/prow/markdownlint

Contributor

openshift-ci bot commented May 6, 2024

@sinnykumari: Overrode contexts on behalf of sinnykumari: ci/prow/markdownlint

In response to this:

All existing comments have been addressed and the respective API and MCO implementation has been merged.
Let's take any follow-up feedback anyone may have later on in a follow-up PR.
/approve
Overriding test based on #1525 (comment)
/override ci/prow/markdownlint


Contributor

openshift-ci bot commented May 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented May 6, 2024

@yuqi-zhang: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit c29c69d into openshift:master May 6, 2024
2 checks passed