containerd is not upgraded because it is not restarted #9019

Closed
snowmansora opened this issue Jun 21, 2022 · 5 comments · Fixed by #9322
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@snowmansora

Environment:

  • Version of Ansible (ansible --version):
    ansible==4.10.0
    ansible-core==2.11.12
  • Version of Python (python --version):
    2.7.5

Kubespray version (commit) (git rev-parse --short HEAD):
v2.19.0

Command used to invoke ansible:
ansible-playbook upgrade-cluster.yml

Anything else we need to know:
On a cluster deployed using kubespray-v2.18.0, after running ansible-playbook upgrade-cluster.yml using kubespray-v2.19.0, containerd is still running the old version, as shown by kubectl get nodes -o wide. After restarting the containerd service, the new version is used.
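
For reference, the version check and the manual workaround mentioned above (standard commands; node access assumed):

# Runtime version reported per node (CONTAINER-RUNTIME column):
kubectl get nodes -o wide

# Manual workaround, run on each affected node:
sudo systemctl restart containerd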

The problem is that the notify: restart containerd handler is never run, despite being notified multiple times in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml.

After some troubleshooting, the problem appears to be caused by the import_role in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90: after I changed it to include_role, the problem went away.
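
For illustration, here is the shape of that one-keyword change (simplified; the real task and its when conditions are at the URL above):

# roles/container-engine/validate-container-engine/tasks/main.yml (simplified)

# Before: static import. Any when: conditions on this task are copied
# onto the imported role's tasks and handlers at parse time.
- import_role:
    name: container-engine/containerd

# After: dynamic include. Conditions only gate the include itself, so
# the role's handlers are loaded without them.
- include_role:
    name: container-engine/containerd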

My GUESS is that the when conditions are added to the containerd handlers when the role is imported via import_role in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90, and since my upgrade doesn't match those conditions (e.g. container_manager != "containerd"), the handler is never run... What confusing Ansible behaviour...
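
A toy reproduction of that guess (the demo role and the echo are made up; only the structure mirrors kubespray): the role is first imported under a false condition, the condition sticks to the handler definition, and later notifications are then skipped.

# roles/demo/handlers/main.yml
- name: restart containerd
  command: echo restarted          # stand-in for the real restart

# roles/demo/tasks/main.yml
- name: change containerd config
  command: /bin/true
  notify: restart containerd

# playbook.yml
- hosts: all
  vars:
    container_manager: containerd
  tasks:
    # First load: static import under a condition that is false here;
    # the condition is copied onto the handler definition as well.
    - import_role:
        name: demo
      when: container_manager != "containerd"
    # Second load: the role runs for real and notifies the handler, but
    # the handler loaded first still carries the false condition, so the
    # restart is skipped.
    - import_role:
        name: demo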

Also, if you just run ansible-playbook cluster.yml on a fresh node, you will see that the handler is never run either, but containerd will still be running because of https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml#L100. I tried that with ansible==5.7.1, ansible-core==2.12.6, and Python 3.10, and the problem happens there as well, so this doesn't look like old Ansible behaviour.
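
The task behind that last link is, roughly, a start-and-enable service task; a sketch (the exact task name and fields may differ from the repo):

# roles/container-engine/containerd/tasks/main.yml (approximate)
- name: containerd | Ensure containerd is started and enabled
  systemd:
    name: containerd
    enabled: true
    state: started    # no-op if the old binary is already running,
                      # which is why a fresh install looks fine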

@snowmansora added the kind/bug label on Jun 21, 2022
@cloud-66
Contributor

cloud-66 commented Aug 1, 2022

I also had this problem today. Even when I change containerd preferences and run cluster.yml --tags=containerd, the config changes on the nodes, but containerd is not restarted. I have to restart it manually for the new options in the config to take effect.

@fungusakafungus
Contributor

Can confirm. Using debugger: always on the handler shows that the handler is skipped because it was loaded earlier with a false condition:

RUNNING HANDLER [container-engine/containerd : restart containerd] *************************************************************************************************************************************************************************************************************
Friday 23 September 2022  16:42:37 +0200 (0:00:00.181)       0:01:18.666 ******
skipping: [node-2.company.com]
[node-2.company.com] HANDLER: container-engine/containerd : restart containerd (debug)> p task.when
['not (is_ostree or (ansible_distribution == "Flatcar Container Linux by '
 'Kinvolk") or (ansible_distribution == "Flatcar"))',
 'container_manager != "containerd"',
 'docker_installed.matched == 0',
 'containerd_installed.matched > 0',
 "ansible_facts.services[service_name]['state'] == 'running'"]
[node-2.company.com] HANDLER: container-engine/containerd : restart containerd (debug)> User interrupted execution
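
To reproduce this inspection, the debugger keyword can be set on the handler itself; at the prompt, p task.when prints the conditions attached to it (temporary local edit, sketched here):

# Temporarily added to the containerd role's handlers file:
- name: restart containerd
  command: /bin/true
  debugger: always    # enter the Ansible task debugger whenever this
                      # handler is reached, including when it skips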

@rptaylor
Contributor

rptaylor commented Sep 23, 2022

My GUESS is that the when conditions are added to the containerd handlers when import_role

That is exactly how import_* works. (See also #9279, https://serverfault.com/questions/875247/whats-the-difference-between-include-tasks-and-import-tasks)

since my upgrade doesn't match those conditions (e.g. container_manager != "containerd"

That part seems less clear: why is container_manager != "containerd"? Since #8439 was added, it looks like this would cause containerd to be uninstalled if container_manager != "containerd".

In any case, the last handler of a given name should override earlier ones, so the handlers notified normally in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml should override the one from https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90,
unless there is an Ansible bug...

@rptaylor
Contributor

It's also slightly odd that the restart containerd handler is a no-op that just notifies other handlers:
https://github.com/kubernetes-sigs/kubespray/blob/master/roles/container-engine/containerd/handlers/main.yml
Probably the other handlers should use listen instead.
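
Simplified, the current chain and the suggested listen-based alternative look like this (handler bodies abbreviated; see the linked file for the real ones):

# Current pattern (simplified): a no-op "topic" handler that fans out
# to the handlers doing the actual work.
- name: restart containerd
  command: /bin/true
  notify:
    - Containerd | restart containerd

- name: Containerd | restart containerd
  systemd:
    name: containerd
    state: restarted

# Suggested alternative: drop the no-op and have the real handler
# subscribe to the topic via listen.
- name: Containerd | restart containerd
  systemd:
    name: containerd
    state: restarted
  listen: restart containerd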

It should still work, though, but this indirection might be part of the corner case that explains why this is hitting some obscure bug (I looked through Ansible issues but didn't find one that seemed pertinent to this).

@rptaylor
Contributor

"the location where it is inserted doesn't affect when the handlers are added"
ansible/ansible#78871 (comment)
