Separate goroutine for "Reboot required" #725

Open
vitality411 opened this issue Feb 7, 2023 · 2 comments
Labels
keep This won't be closed by the stale bot.

Comments

@vitality411

Hello,
as far as I can tell, the logs currently give no advance notice that a reboot is required.

If I understand the code correctly,

log.Infof("Reboot required")

is only executed inside the reboot time window, in the rebootAsRequired goroutine.

Outside the reboot time window:

k logs kured-knvc5 -f
time="2023-02-07T09:45:58Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
time="2023-02-07T09:45:58Z" level=info msg="Kubernetes Reboot Daemon: c7b7d6a"
time="2023-02-07T09:45:58Z" level=info msg="Node ID: node1"
time="2023-02-07T09:45:58Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2023-02-07T09:45:58Z" level=info msg="Lock TTL not set, lock will remain until being released"
time="2023-02-07T09:45:58Z" level=info msg="Lock release delay set, lock release will be delayed by: 30m0s"
time="2023-02-07T09:45:58Z" level=info msg="PreferNoSchedule taint: "
time="2023-02-07T09:45:58Z" level=info msg="Blocking Pod Selectors: []"
time="2023-02-07T09:45:58Z" level=info msg="Reboot schedule: ---MonTueWedThuFri--- between 02:30 and 06:00 Europe/Berlin"
time="2023-02-07T09:45:58Z" level=info msg="Reboot check command: [test -f /var/run/reboot-required] every 1m0s"
time="2023-02-07T09:45:58Z" level=info msg="Reboot command: [/bin/systemctl reboot]"
time="2023-02-07T09:45:58Z" level=info msg="Will annotate nodes during kured reboot operations"
<no further logs>

Because the metric is maintained in a separate goroutine (go maintainRebootRequiredMetric), it is updated regardless of the time window and indicates that a reboot is in fact required:

k exec -it kured-knvc5 -- sh
/ # wget -qO- 127.0.0.1:8080/metrics | grep kured
# HELP kured_reboot_required OS requires reboot due to software updates.
# TYPE kured_reboot_required gauge
kured_reboot_required{node="node1"} 1

When kured is inside the reboot time window, a reboot is required, and the lock can be acquired, the drain and the reboot follow quickly:

k exec -it kured-pxkjr -- sh
/ # wget -qO- 127.0.0.1:8080/metrics | grep kured
# HELP kured_reboot_required OS requires reboot due to software updates.
# TYPE kured_reboot_required gauge
kured_reboot_required{node="node1"} 1

k logs kured-pxkjr -f
time="2023-02-07T10:20:21Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
time="2023-02-07T10:20:21Z" level=info msg="Kubernetes Reboot Daemon: c7b7d6a"
time="2023-02-07T10:20:21Z" level=info msg="Node ID: node1"
time="2023-02-07T10:20:21Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2023-02-07T10:20:21Z" level=info msg="Lock TTL not set, lock will remain until being released"
time="2023-02-07T10:20:21Z" level=info msg="Lock release delay set, lock release will be delayed by: 30m0s"
time="2023-02-07T10:20:21Z" level=info msg="PreferNoSchedule taint: "
time="2023-02-07T10:20:21Z" level=info msg="Blocking Pod Selectors: []"
time="2023-02-07T10:20:21Z" level=info msg="Reboot schedule: ---MonTueWedThuFri--- between 10:00 and 13:00 Europe/Berlin"
time="2023-02-07T10:20:21Z" level=info msg="Reboot check command: [test -f /var/run/reboot-required] every 1m0s"
time="2023-02-07T10:20:21Z" level=info msg="Reboot command: [/bin/systemctl reboot]"
time="2023-02-07T10:20:21Z" level=info msg="Will annotate nodes during kured reboot operations"
time="2023-02-07T10:22:56Z" level=info msg="Reboot required"
time="2023-02-07T10:22:56Z" level=info msg="Adding node node1 annotation: weave.works/kured-reboot-in-progress=2023-02-07T10:22:56Z"
time="2023-02-07T10:22:56Z" level=info msg="Adding node node1 annotation: weave.works/kured-most-recent-reboot-needed=2023-02-07T10:22:56Z"
time="2023-02-07T10:22:56Z" level=info msg="Acquired reboot lock"
time="2023-02-07T10:22:56Z" level=info msg="Draining node node1"
...
time="2023-02-07T10:23:32Z" level=info msg="Running command: [/usr/bin/nsenter -m/proc/1/ns/mnt -- /bin/systemctl reboot] for node: node1"
time="2023-02-07T10:23:32Z" level=info msg="Waiting for reboot"

In my opinion, logging "Reboot required" only just before the reboot defeats the purpose of the message.

I think it would be better to check the sentinel and log "Reboot required" in a separate goroutine that also runs outside the reboot time window, as is already done with the maintainRebootRequiredMetric goroutine. Then the log and the metric would no longer disagree.

What do you think?

@github-actions

github-actions bot commented Apr 9, 2023

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@vitality411
Author

Let's keep this open.

@ckotzbauer ckotzbauer added keep This won't be closed by the stale bot. and removed no-issue-activity labels Apr 13, 2023
evrardjp added a commit to evrardjp/kured that referenced this issue Nov 11, 2024
Without this patch, one metric could say "reboot is required"
while the rebootAsRequired tick did not run (long period for
example).

This is a problem, as it leads to false expectations: "Why
did the system not reboot, while the metrics indicate a reboot
was required?"

This solves it by inlining the metrics management within the
rebootAsRequired goroutine.

Closes: kubereboot#725

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
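
The idea described in this commit message can be illustrated with a simplified model. This is not the actual patch: tick and tickResult are hypothetical names, the Prometheus gauge is replaced by a plain float64, and whether the real change also logs outside the time window is an assumption made here for illustration.

```go
package main

import "log"

// tickResult (hypothetical) records what a single pass of the loop decided.
type tickResult struct {
	Gauge  float64 // value that would be written to kured_reboot_required
	Logged bool    // whether "Reboot required" was logged
	Reboot bool    // whether the drain and reboot would start
}

// tick (hypothetical) takes one sentinel reading and derives the metric, the
// log line and the reboot decision from it, so the three cannot disagree.
func tick(rebootRequired, inWindow bool, logf func(string, ...interface{})) tickResult {
	var res tickResult
	if !rebootRequired {
		return res // gauge stays 0, nothing logged, no reboot
	}
	res.Gauge = 1
	logf("Reboot required")
	res.Logged = true
	// Only the reboot itself depends on the time window.
	res.Reboot = inWindow
	return res
}

func main() {
	// Outside the window: gauge and log both report the pending reboot,
	// but no reboot is started.
	res := tick(true, false, log.Printf)
	log.Printf("gauge=%v reboot=%v", res.Gauge, res.Reboot)
}
```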