`TASK [k3s/post : Wait for MetalLB resources]` fails when updating metallb #146

vdovhanych · 2022-10-22T13:25:06Z


Expected Behavior

Running `ansible-playbook` should update the cluster and complete all tasks successfully.

Current Behavior

When running `ansible-playbook` to update the cluster, the run fails on `TASK [k3s/post : Wait for MetalLB resources]`.

Steps to Reproduce

1. Update the cluster with `ansible-playbook site.yml -i inventory/my-cluster/hosts.ini`.
2. Wait for Ansible to update everything.
3. MetalLB leaves the previous replicaset behind.
4. The task `Wait for MetalLB resources` fails because there is more than one replicaset in the `metallb-system` namespace.

Context (variables)

Operating system: Ubuntu 22.04.1 LTS

Hardware: 2x Raspberry Pi 4b 8GB

Variables Used

all.yml

---
k3s_version: v1.24.6+k3s1
# this is the user that has ssh access to these machines
ansible_user: pirate
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "Europe/Prague"

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.10.10"

# k3s_token is required so that masters can talk together securely
# this token should be alpha numeric only
k3s_token: ''

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.5.5"

# image tag for metal lb
metal_lb_speaker_tag_version: "v0.13.6"
metal_lb_controller_tag_version: "v0.13.6"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.10.11-192.168.30.49"

Hosts

hosts.ini

[master]
192.168.10.8
192.168.10.9

[node]

[k3s_cluster:children]
master
node

Possible Solution

Delete the old replicasets. Once there are no old replicasets, the tasks finish correctly.

Maybe there could be a task that deletes the old replicasets after MetalLB is updated or changed. Right now it leaves them there with everything set to 0.
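
A minimal sketch of what such a cleanup could look like, not the playbook's actual task. It assumes the leftover replicasets are the ones whose desired replica count is 0 (consistent with "everything set to 0" above) and reuses the namespace and selector from the failing task. The function names and the `KUBECTL` variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper names. KUBECTL is an assumed override that defaults
# to the `k3s kubectl` wrapper seen in the failing task's command.
KUBECTL="${KUBECTL:-k3s kubectl}"
NAMESPACE="metallb-system"
SELECTOR="component=controller,app=metallb"

# Read `kubectl get replicasets --no-headers` output on stdin
# (columns: NAME DESIRED CURRENT READY AGE) and print the names of
# replicasets whose DESIRED count is 0.
stale_replicasets() {
  awk '$2 == "0" { print $1 }'
}

# Delete every scaled-down controller replicaset in the namespace.
delete_stale_controller_replicasets() {
  $KUBECTL get replicasets --namespace="$NAMESPACE" --selector="$SELECTOR" --no-headers |
    stale_replicasets |
    while read -r name; do
      $KUBECTL delete replicaset --namespace="$NAMESPACE" "$name" < /dev/null
    done
}
```

Sourcing the file and calling `delete_stale_controller_replicasets` on a master node would remove the leftovers. Piping the `get` output through `stale_replicasets` alone first shows what would be deleted.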

I've checked the General Troubleshooting Guide

Why it is happening

I traced the issue to replicasets that are left behind after a MetalLB update. The task expects to find only one replicaset when checking, so it fails on this check.

The task fails with this error:

TASK [k3s/post : Wait for MetalLB resources] *********************************************************************************************************************************************************
Saturday 22 October 2022  14:45:19 +0200 (0:00:02.281)       0:02:06.666 ****** 
ok: [192.168.10.8] => (item=controller)
ok: [192.168.10.8] => (item=webhook service)
ok: [192.168.10.8] => (item=pods in replica sets)
failed: [192.168.10.8] (item=ready replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "replicaset", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.readyReplicas}=1", "--timeout=120s"], "delta": "0:00:00.688827", "end": "2022-10-22 14:45:26.583497", "item": {"condition": "--for=jsonpath='{.status.readyReplicas}'=1", "description": "ready replicas of controller", "resource": "replicaset", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2022-10-22 14:45:25.894670", "stderr": "readyReplicas is not found\nreadyReplicas is not found", "stderr_lines": ["readyReplicas is not found", "readyReplicas is not found"], "stdout": "replicaset.apps/controller-5888676bc9 condition met", "stdout_lines": ["replicaset.apps/controller-5888676bc9 condition met"]}
failed: [192.168.10.8] (item=fully labeled replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "replicaset", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.fullyLabeledReplicas}=1", "--timeout=120s"], "delta": "0:00:00.605612", "end": "2022-10-22 14:45:28.071205", "item": {"condition": "--for=jsonpath='{.status.fullyLabeledReplicas}'=1", "description": "fully labeled replicas of controller", "resource": "replicaset", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2022-10-22 14:45:27.465593", "stderr": "fullyLabeledReplicas is not found\nfullyLabeledReplicas is not found", "stderr_lines": ["fullyLabeledReplicas is not found", "fullyLabeledReplicas is not found"], "stdout": "replicaset.apps/controller-5888676bc9 condition met", "stdout_lines": ["replicaset.apps/controller-5888676bc9 condition met"]}
failed: [192.168.10.8] (item=available replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "replicaset", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.availableReplicas}=1", "--timeout=120s"], "delta": "0:00:01.894136", "end": "2022-10-22 14:45:30.815935", "item": {"condition": "--for=jsonpath='{.status.availableReplicas}'=1", "description": "available replicas of controller", "resource": "replicaset", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2022-10-22 14:45:28.921799", "stderr": "availableReplicas is not found\navailableReplicas is not found", "stderr_lines": ["availableReplicas is not found", "availableReplicas is not found"], "stdout": "replicaset.apps/controller-5888676bc9 condition met", "stdout_lines": ["replicaset.apps/controller-5888676bc9 condition met"]}
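
The `... is not found` lines in `stderr` are consistent with how Kubernetes serializes replicaset status, as far as I know: `readyReplicas`, `fullyLabeledReplicas`, and `availableReplicas` are omitted when they are zero, so a scaled-down replicaset has no such field for the JSONPath expression to match. Below is a small sketch for checking that on a cluster. The function names are hypothetical, and `KUBECTL` is an assumed override for the `k3s kubectl` wrapper used by the task.

```shell
#!/bin/sh
# KUBECTL is an assumed override; the default mirrors the task's command.
KUBECTL="${KUBECTL:-k3s kubectl}"

# Print one line per controller replicaset: name, desired replicas, and
# .status.readyReplicas. kubectl prints <none> for a missing field.
controller_replicaset_status() {
  $KUBECTL get replicasets --namespace=metallb-system \
    --selector=component=controller,app=metallb --no-headers \
    --output=custom-columns='NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas'
}

# Read the lines above on stdin and print the replicasets that lack
# readyReplicas, i.e. the ones a jsonpath wait on that field cannot match.
missing_ready_replicas() {
  awk '$3 == "<none>" { print $1 }'
}
```

Running `controller_replicaset_status | missing_ready_replicas` should list the leftover replicasets that make the wait return a non-zero code.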

Answered by vdovhanych

Nov 12, 2022


timothystewart6 (Maintainer) · 2022-10-22T19:15:18Z

@vdovhanych can you please fill out the issue template properly? It's impossible to tell what version of anything you are using.


vdovhanych (Author) · 2022-10-22T21:16:47Z

@timothystewart6 sorry about that. I updated it according to the template.

I assume this has been present for at least 3 k3s version updates. It has now failed for me for the third time, so I started to investigate. As I mentioned in the issue, it's related to the old replicasets that are, for some reason, left behind, while the task specifies `=1` for the controller. It's this line here
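
One way to make the check tolerate leftovers would be to wait only on replicasets that still want pods, instead of on every replicaset matching the selector. This is a sketch under that assumption, not the change the playbook made. `active_replicasets`, `wait_for_active_controller`, and `KUBECTL` are hypothetical names.

```shell
#!/bin/sh
# KUBECTL is an assumed override; the default mirrors the task's command.
KUBECTL="${KUBECTL:-k3s kubectl}"
NAMESPACE="metallb-system"
SELECTOR="component=controller,app=metallb"

# From `kubectl get replicasets --no-headers` output on stdin
# (NAME DESIRED CURRENT READY AGE), keep names with DESIRED other than 0.
active_replicasets() {
  awk 'NF >= 2 && $2 != "0" { print $1 }'
}

# Run the same readyReplicas wait as the task, one active replicaset at a
# time. The body is a subshell, so `exit 1` only sets the function's status.
wait_for_active_controller() (
  $KUBECTL get replicasets --namespace="$NAMESPACE" --selector="$SELECTOR" --no-headers |
    active_replicasets |
    while read -r name; do
      $KUBECTL wait "replicaset/$name" --namespace="$NAMESPACE" \
        --for=jsonpath='{.status.readyReplicas}'=1 --timeout=120s < /dev/null || exit 1
    done
)
```

Because the scaled-down replicasets are filtered out before `kubectl wait` runs, their missing status fields never enter the check.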


vdovhanych (Author) · 2022-11-12T21:35:34Z

I was updating to `v1.24.7+k3s1` with `metal_lb_speaker_tag_version: "v0.13.7"` and `metal_lb_controller_tag_version: "v0.13.7"`.

The issue is still there; `ansible-playbook` failed on the same task:
`failed: [192.168.10.8] (item=ready replicas of controller) => changed=false`

It was resolved by deleting the old replicaset. An old replicaset for MetalLB had been left behind, with everything at zero.

After I deleted the replicaset with `k delete replicasets.apps -n metallb-system controller-5888676bc9` and re-ran `ansible-playbook site.yml -i inventory/my-cluster/hosts.ini`, all of the tasks finished correctly.


badsmoke · 2022-12-08T19:52:22Z


Thanks, that worked for me too. I even had two "zombie" controllers.


l50 · 2023-01-10T06:23:24Z


This was super helpful for me as well. Thank you for documenting your troubleshooting process!


nilzen · 2023-03-30T17:49:12Z


Same issue for me, thanks for the solution!
