TASK [k3s/post : Wait for MetalLB resources] fails when updating metallb
#146
-
Expected Behavior
Running ansible-playbook should update the cluster and run all tasks successfully.
Current Behavior
When running ansible-playbook to update the cluster, it fails on the task TASK [k3s/post : Wait for MetalLB resources].
Steps to Reproduce
Context (variables)
Operating system: Ubuntu 22.04.1 LTS
Hardware: 2x Raspberry Pi 4b 8GB
Variables Used
---
k3s_version: v1.24.6+k3s1
# this is the user that has ssh access to these machines
ansible_user: pirate
systemd_dir: /etc/systemd/system
# Set your timezone
system_timezone: "Europe/Prague"
# interface which will be used for flannel
flannel_iface: "eth0"
# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.10.10"
# k3s_token is required masters can talk together securely
# this token should be alpha numeric only
k3s_token: ''
# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'
# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"
# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}
# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
extra_agent_args: >-
  {{ extra_args }}
# image tag for kube-vip
kube_vip_tag_version: "v0.5.5"
# image tag for metal lb
metal_lb_speaker_tag_version: "v0.13.6"
metal_lb_controller_tag_version: "v0.13.6"
# metallb ip range for load balancer
metal_lb_ip_range: "192.168.10.11-192.168.30.49"
Hosts
[master]
192.168.10.8
192.168.10.9
[node]
[k3s_cluster:children]
master
node
Possible Solution
Delete the old replicasets. Once there are no old replicasets, the tasks finish correctly. Maybe there could be a task that deletes the old replicasets after metallb is updated or changed. Right now the update leaves them in place with every count set to 0.
Why it is happening
I traced the issue to replicasets that are left behind after a metallb update; the task expects to find only one when it checks. That check is where the playbook fails.
Replies: 6 comments
-
@vdovhanych can you please fill out the issue template properly? It's impossible to tell what version of anything you are using.
-
@timothystewart6 sorry about that. I updated it according to the template. I assume this has been present for at least 3 k3s version updates; this is the 3rd time it has failed for me, so I started to investigate. As I mentioned in the issue, it's related to the old replicasets that are, for some reason, left behind, and we specify in the task
-
After updating to a newer k3s and metallb version, the issue is still there: ansible-playbook failed on the same task. It was resolved by deleting the old replicaset. There was an old replicaset for metallb left behind, with everything at zero. After I deleted that replicaset and ran the playbook again, all of the tasks finished correctly.
-
Thanks, that worked for me too. I even had two "zombie" controllers.
-
This was super helpful for me as well. Thank you for documenting your troubleshooting process!
-
Same issue for me, thanks for the solution!
I was updating to v1.24.7+k3s1 with metal_lb_speaker_tag_version: "v0.13.7" and metal_lb_controller_tag_version: "v0.13.7", and the issue is still there. ansible-playbook failed on the same task:
failed: [192.168.10.8] (item=ready replicas of controller) => changed=false
It was resolved by deleting the old replicaset. If you take a look below, there is an old replicaset for metallb left behind, and everything is at zero.
After I deleted the replicaset
k delete replicasets.apps -n metallb-system controller-5888676bc9
and ran
ansible-playbook site.yml -i inventory/my-cluster/hosts.ini
all of the tasks finished correctly.
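The manual cleanup above could be generalized so the replicaset name does not have to be looked up by hand. The following is only a sketch, not something the playbook provides: `stale_replicasets` is a hypothetical helper name, and it assumes the default column layout of `kubectl get replicasets.apps --no-headers` (NAME, DESIRED, CURRENT, READY, AGE). It prints the names of replicasets whose counts are all zero.

```shell
#!/bin/sh
# Hypothetical helper, not part of the playbook.
# Reads the output of
#   kubectl get replicasets.apps -n metallb-system --no-headers
# on stdin and prints the name of every replicaset whose DESIRED,
# CURRENT and READY columns are all 0.
stale_replicasets() {
  awk '$2 == 0 && $3 == 0 && $4 == 0 { print $1 }'
}

# Example usage (commented out so that sourcing this file changes nothing).
# Assumes GNU xargs for the -r flag, which skips the delete when the list is empty:
# kubectl get replicasets.apps -n metallb-system --no-headers \
#   | stale_replicasets \
#   | xargs -r -n 1 kubectl delete replicasets.apps -n metallb-system
```

Running the listing step on its own first, without the delete, is a reasonable way to confirm that only the leftover replicasets are matched before anything is removed.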