
feat: Add reboot-required annotation #715

Closed
wants to merge 1 commit

Conversation

timo-42

@timo-42 timo-42 commented Jan 10, 2023

@timo-42 timo-42 changed the title feat: Add reboot required annotation feat: Add reboot-required annotation Jan 11, 2023
@ckotzbauer ckotzbauer self-requested a review January 11, 2023 18:27
Member

@ckotzbauer ckotzbauer left a comment


Thanks for your PR. I added one smaller comment above.
I think it would make sense to only add/remove this annotation when the --annotate-nodes flag is set. This would streamline the new annotation with the others.

Review comment on cmd/kured/main.go (outdated, resolved)
@ckotzbauer ckotzbauer linked an issue Jan 11, 2023 that may be closed by this pull request
@timo-42 timo-42 force-pushed the reboot-required-annoation branch 2 times, most recently from 939f18a to 53e009a Compare January 13, 2023 18:29
@timo-42
Author

timo-42 commented Jan 13, 2023

Thanks for your PR. I added one smaller comment above. I think it would make sense to only add/remove this annotation when the --annotate-nodes flag is set. This would streamline the new annotation with the others.

Put the new annotation behind the annotation toggle.

@timo-42
Author

timo-42 commented Jan 17, 2023

@ckotzbauer @dholbach I pushed the changes you requested.

@timo-42
Author

timo-42 commented Jan 25, 2023

@ckotzbauer @dholbach Can you please have a look?

@ckotzbauer
Member

@timo-42 I have not forgotten, but I am a bit busy at the moment.

@vitality411

@timo-42 Thank you for this feature. I am very interested in this.

Does it work as expected? If I understand the code correctly, annotateNodes is only executed during the reboot time window.

Outside reboot time window:

k logs kured-knvc5 -f
time="2023-02-07T09:45:58Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
time="2023-02-07T09:45:58Z" level=info msg="Kubernetes Reboot Daemon: c7b7d6a"
time="2023-02-07T09:45:58Z" level=info msg="Node ID: node1"
time="2023-02-07T09:45:58Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2023-02-07T09:45:58Z" level=info msg="Lock TTL not set, lock will remain until being released"
time="2023-02-07T09:45:58Z" level=info msg="Lock release delay set, lock release will be delayed by: 30m0s"
time="2023-02-07T09:45:58Z" level=info msg="PreferNoSchedule taint: "
time="2023-02-07T09:45:58Z" level=info msg="Blocking Pod Selectors: []"
time="2023-02-07T09:45:58Z" level=info msg="Reboot schedule: ---MonTueWedThuFri--- between 02:30 and 06:00 Europe/Berlin"
time="2023-02-07T09:45:58Z" level=info msg="Reboot check command: [test -f /var/run/reboot-required] every 1m0s"
time="2023-02-07T09:45:58Z" level=info msg="Reboot command: [/bin/systemctl reboot]"
time="2023-02-07T09:45:58Z" level=info msg="Will annotate nodes during kured reboot operations"

k exec -it  kured-knvc5 -- sh
/ # wget -qO- 127.0.0.1:8080/metrics | grep kured
# HELP kured_reboot_required OS requires reboot due to software updates.
# TYPE kured_reboot_required gauge
kured_reboot_required{node="node1"} 1

k describe node node1 | grep weave.works
<empty>

But when the node is inside the reboot time window, a reboot is required, and the lock can be acquired, the drain and reboot follow quickly:

k exec -it  kured-pxkjr -- sh
/ # wget -qO- 127.0.0.1:8080/metrics | grep kured
# HELP kured_reboot_required OS requires reboot due to software updates.
# TYPE kured_reboot_required gauge
kured_reboot_required{node="node1"} 1

k logs kured-pxkjr -f
time="2023-02-07T10:20:21Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
time="2023-02-07T10:20:21Z" level=info msg="Kubernetes Reboot Daemon: c7b7d6a"
time="2023-02-07T10:20:21Z" level=info msg="Node ID: node1"
time="2023-02-07T10:20:21Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2023-02-07T10:20:21Z" level=info msg="Lock TTL not set, lock will remain until being released"
time="2023-02-07T10:20:21Z" level=info msg="Lock release delay set, lock release will be delayed by: 30m0s"
time="2023-02-07T10:20:21Z" level=info msg="PreferNoSchedule taint: "
time="2023-02-07T10:20:21Z" level=info msg="Blocking Pod Selectors: []"
time="2023-02-07T10:20:21Z" level=info msg="Reboot schedule: ---MonTueWedThuFri--- between 10:00 and 13:00 Europe/Berlin"
time="2023-02-07T10:20:21Z" level=info msg="Reboot check command: [test -f /var/run/reboot-required] every 1m0s"
time="2023-02-07T10:20:21Z" level=info msg="Reboot command: [/bin/systemctl reboot]"
time="2023-02-07T10:20:21Z" level=info msg="Will annotate nodes during kured reboot operations"
time="2023-02-07T10:22:56Z" level=info msg="Reboot required"
time="2023-02-07T10:22:56Z" level=info msg="Adding node node1 annotation: weave.works/reboot-required=true"
time="2023-02-07T10:22:56Z" level=info msg="Adding node node1 annotation: weave.works/kured-reboot-in-progress=2023-02-07T10:22:56Z"
time="2023-02-07T10:22:56Z" level=info msg="Adding node node1 annotation: weave.works/kured-most-recent-reboot-needed=2023-02-07T10:22:56Z"
time="2023-02-07T10:22:56Z" level=info msg="Acquired reboot lock"
time="2023-02-07T10:22:56Z" level=info msg="Draining node node1"
...
time="2023-02-07T10:23:32Z" level=info msg="Running command: [/usr/bin/nsenter -m/proc/1/ns/mnt -- /bin/systemctl reboot] for node: node1"
time="2023-02-07T10:23:32Z" level=info msg="Waiting for reboot"

k describe node node1 | grep weave.works
                    weave.works/kured-most-recent-reboot-needed: 2023-02-07T10:22:56Z
                    weave.works/kured-reboot-in-progress: 2023-02-07T10:22:56Z
                    weave.works/reboot-required: true

After the reboot:

k exec -it  kured-pxkjr -- sh
/ # wget -qO- 127.0.0.1:8080/metrics | grep kured
# HELP kured_reboot_required OS requires reboot due to software updates.
# TYPE kured_reboot_required gauge
kured_reboot_required{node="node1"} 0

k logs kured-pxkjr -f
...
time="2023-02-07T10:25:35Z" level=info msg="Deleting node node1 annotation weave.works/reboot-required"
time="2023-02-07T10:25:35Z" level=info msg="Holding lock"
time="2023-02-07T10:25:35Z" level=info msg="Uncordoning node node1"
time="2023-02-07T10:25:35Z" level=info msg="Deleting node node1 annotation weave.works/kured-reboot-in-progress"
time="2023-02-07T10:25:35Z" level=info msg="Delaying lock release by 30m0s"

k describe node node1 | grep weave.works
                    weave.works/kured-most-recent-reboot-needed: 2023-02-07T10:22:56Z

In my opinion, this defeats the purpose of your issue #702.

I think for this to work, a separate goroutine would be needed, like the maintainRebootRequiredMetric goroutine that updates the kured_reboot_required metric, so that the annotation is also maintained outside of the reboot time window.

Please let me know if it works for you as expected and if I have misunderstood something.

FYI: I opened #725 to discuss the issue of the "Reboot required" log message outside the reboot window.

@timo-42
Author

timo-42 commented Feb 7, 2023

You're right, I did not consider checking the state outside the window! I moved the code into a separate goroutine.
The question is: should we create a separate goroutine for the Prometheus metric and the annotations, or should we merge them?

@vitality411

I'm not a developer per se, but I think having only one goroutine would have advantages for the memory footprint, especially since they partially execute the same commands.

@timo-42
Author

timo-42 commented Feb 8, 2023

Moved code into metric goroutine.

@timo-42
Author

timo-42 commented Feb 22, 2023

@vitality411 @ckotzbauer Is there anything more I should do?

@vitality411

Hi @timo-42, from my side it looks good.

@jackfrancis
Collaborator

@timo-42 Thank you for your patience during the review process, I'm back online and able to do this responsively going forward.

As I understand it, what you'd like to accomplish here is to not schedule certain workloads on nodes that are running kured, because those nodes will responsively reboot, and cordon/drain is not sufficient to gracefully evict and reschedule those workloads. Is that correct?

If so, I wonder if this issue would be a more elegant solution: #737

My observation: there is a problem with using a node annotation via the kured daemonset to identify "I am running the kured daemonset", because if for any reason that daemonset is removed manually, and/or the appropriate node taints are added so that it is no longer running kured, that annotation will have to be manually removed (and inevitably this will not happen 100% of the time, and so that annotation will not be 100% accurate in reflecting the state of that node).

@timo-42
Author

timo-42 commented Mar 2, 2023

@jackfrancis

As I understand it, what you'd like to accomplish here is to not schedule certain workloads on nodes that are running kured, because those nodes will responsively reboot, and cordon/drain is not sufficient to gracefully evict and reschedule those workloads. Is that correct?

Yes. We want to avoid scheduling new workloads on it and want to be notified that a node needs a reboot, so we can start draining the pods in their respective maintenance windows.

If so, I wonder if this issue would be a more elegant solution: #737

If I understand this correctly, the proposal is to run an arbitrary kubectl command before draining. That is already too late in the reboot process for us. We need days to drain a node, so we need the annotation as early as possible, before kured acquires the cluster lock and runs the reboot command.

My observation: there is a problem with using a node annotation via the kured daemonset to identify "I am running the kured daemonset", because if for any reason that daemonset is removed manually, and/or the appropriate node taints are added so that it is no longer running kured, that annotation will have to be manually removed (and inevitably this will not happen 100% of the time, and so that annotation will not be 100% accurate in reflecting the state of that node).

If the concern is cleanup, then we could add a pre-delete hook to delete all dangling annotations.
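Such a pre-delete hook (e.g. a Helm hook Job) could run something along these lines; this is an illustrative fragment, not part of kured's chart:

```shell
# Hypothetical cleanup step for a pre-delete hook: the trailing dash
# tells kubectl to remove the annotation, here from every node.
kubectl annotate nodes --all weave.works/reboot-required-
```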

@jackfrancis
Collaborator

@timo-42 Thanks for the explanation.

Here's my thinking: by putting the annotation on the node as soon as kured launches its daemonset runtime, you are essentially declaring: don't schedule any non-evictable workloads on this node, because it may be rebooted at some point in the future. And then you would begin the process of evicting existing workloads, which may take days. But it's not clear to me how having this advance notice will actually guarantee that the underlying host OS will not create a /var/run/reboot-required file before that days-long eviction completes. And so you are still at risk of the node rebooting during the workload eviction process.

It would seem to me that the workload requirements you're outlining would forbid kured from running at all on the nodes that are running those workloads. What is the downside of simply tainting such nodes and not having kured run on them? I'm not sure that kured, as it currently stands, is a viable solution for these operational requirements.

@dholbach @ckotzbauer @evrardjp any other thoughts?

@timo-42
Author

timo-42 commented Mar 4, 2023

@jackfrancis

It would seem to me that the workload requirements you're outlining would forbid kured from running at all on the nodes that are running those workloads. What is the downside of simply tainting such nodes and not having kured run on them? I'm not sure that kured, as it currently stands, is a viable solution for these operational requirements.

This is easy :) - I use --blocking-pod-selector to prevent rebooting nodes which have these kinds of workloads.

@jackfrancis
Collaborator

@timo-42 Thanks for clarifying, that makes sense!

In this case this would be my suggestion (curious for others' thoughts as well):

Rather than add a third annotation to the node to describe kured reboot state, let's simply re-order the steps in the flow where we apply the "weave.works/kured-reboot-in-progress" annotation. At present, we short-circuit the flow if a matching blocking pod selector is found, but we can short-circuit on that criteria later in the flow. For example:

If a reboot is required:

  1. Add the "weave.works/kured-reboot-in-progress" annotation
  2. Add the "prefer no schedule" taint to the node
  3. If reboot is blocked due to matching pod selector, continue
    a. Else, cordon/drain and reboot node

The above should solve for your use-case. You may use the "weave.works/kured-reboot-in-progress" annotation in your own workflows to prevent new, non-interruptible workloads from landing on this node (or you may be able to rely upon the "prefer no schedule" taint, but that taint is not strict, so FYI). Only when existing, non-interruptible workloads have gracefully exited, and the pod selector finds no match, will the node actually cordon/drain + reboot.

@ckotzbauer @dholbach @evrardjp curious whether you have any criticisms of the above.

@ckotzbauer
Member

I like the idea @jackfrancis. I'm not fully aware of the current implementation of "weave.works/kured-reboot-in-progress": Would this be a breaking change?

@jackfrancis
Collaborator

@ckotzbauer In my view it wouldn't be a breaking change. The original implementation of "weave.works/kured-reboot-in-progress" was meant to annotate the node as soon as we know it will eventually be rebooted. One particular condition that can delay a reboot from happening right away is the inability to acquire the exclusive node reboot lock. But we still want to annotate the node because in the next loop we may require that lock, and thus we still don't want applications that are consuming that annotation to consider that node as a viable node for scheduling new workloads.

It seems that the blocking pod selector criteria was overlooked when implementing the "weave.works/kured-reboot-in-progress" annotation, but that criteria is functionally equivalent to the exclusive node reboot lock: in the next loop there may be no more pods that match the blocking pod criteria, and so we are in a similar situation where we know that the node will eventually be rebooted.

tl;dr I think there's a bug in the implementation of the "weave.works/kured-reboot-in-progress" node annotation, and if we fix that bug, then @timo-42 and other folks can use it to build conditional scheduling flows around.

@timo-42 does that make sense? I'd be happy to fix that bug myself.

@timo-42
Author

timo-42 commented Mar 16, 2023

@jackfrancis This is exactly what we need 👍. Thanks for the insight regarding the annotation and for taking care of this bug yourself. I will then change our internal code to use the weave.works/kured-reboot-in-progress annotation, but that is an easy change.
This makes this PR obsolete; it can be closed as soon as your fix PR is opened.

@timo-42
Author

timo-42 commented Apr 15, 2023

Closing since #749 got merged.

@timo-42 timo-42 closed this Apr 15, 2023
Successfully merging this pull request may close these issues.

New Annotation reboot-required
4 participants