With this broken, cgroups are not configured correctly and k3s will not start.
Expected Behavior
When installing on Bookworm, the config.txt file should be updated correctly to enable cgroups.
Current Behavior
During setup, the master/control nodes cannot start the k3s server because cgroups are not configured. The task completes, but it modifies the wrong file. The verify task then times out and fails:
TASK [k3s_server : Verify that all nodes actually joined (check k3s-init.service if this fails)] **********************************************************************************************************
FAILED - RETRYING: [192.168.1.150]: Verify that all nodes actually joined (check k3s-init.service if this fails) (20 retries left).
...
FAILED - RETRYING: [192.168.1.150]: Verify that all nodes actually joined (check k3s-init.service if this fails) (1 retries left).
fatal: [192.168.1.150]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["k3s", "kubectl", "get", "nodes", "-l", "node-role.kubernetes.io/master=true", "-o=jsonpath={.items[*].metadata.name}"], "delta": "0:00:00.273622", "end": "2024-02-25 18:10:42.404607", "msg": "non-zero return code", "rc": 1, "start": "2024-02-25 18:10:42.130985", "stderr": "The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}
Checking the logs with journalctl shows:
Feb 25 18:08:28 control-1 k3s[2271]: time="2024-02-25T18:08:28-08:00" level=fatal msg="failed to find memory cgroup (v2)"
Feb 25 18:08:28 control-1 systemd[1]: k3s-init.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 18:08:28 control-1 systemd[1]: k3s-init.service: Failed with result 'exit-code'.
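The "failed to find memory cgroup (v2)" error can be confirmed on a node before k3s is started. The tasks below are a minimal sketch, not part of the playbook: they assume the host uses the cgroup v2 unified hierarchy mounted at /sys/fs/cgroup, and the registered variable name `cgroup_controllers` is hypothetical.

```yaml
# Sketch only: a pre-flight check, assuming cgroup v2 is mounted at /sys/fs/cgroup.
- name: Read the cgroup v2 controllers available on this host
  ansible.builtin.command: cat /sys/fs/cgroup/cgroup.controllers
  register: cgroup_controllers  # hypothetical variable name
  changed_when: false

- name: Fail early if the memory controller is missing
  ansible.builtin.assert:
    that:
      - "'memory' in cgroup_controllers.stdout.split()"
    fail_msg: "memory cgroup controller is not enabled; check the boot configuration and reboot"
```

If `memory` is absent from that list, the boot configuration change did not take effect, which matches the symptom reported here.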
Steps to Reproduce
Install a fresh Bookworm-based Raspberry Pi OS on a Pi
Install k3s using ansible-playbook site.yml
Context (variables)
Operating system:
Hardware:
9 Raspberry Pi 4s
Variables Used
all.yml
---
k3s_version: v1.29.1+k3s2
# this is the user that has ssh access to these machines
ansible_user: pico-k3s
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "America/Vancouver"

# interface which will be used for flannel
flannel_iface: "eth0"

# uncomment calico_iface to use tigera operator/calico cni instead of flannel https://docs.tigera.io/calico/latest/about
# calico_iface: "eth0"
calico_ebpf: false  # use eBPF dataplane instead of iptables
calico_tag: "v3.27.0"  # calico version tag

# uncomment cilium_iface to use cilium cni instead of flannel or calico
# ensure v4.19.57, v5.1.16, v5.2.0 or more recent kernel
#cilium_iface: "eth0"
cilium_mode: "native"  # native when nodes on same subnet or using bgp, else set routed
cilium_tag: "v1.15.1"  # cilium version tag
cilium_hubble: true  # enable hubble observability relay and ui

# if using calico or cilium, you may specify the cluster pod cidr pool
cluster_cidr: "10.52.0.0/16"

# enable cilium bgp control plane for lb services and pod cidrs. disables metallb.
cilium_bgp: true

# bgp parameters for cilium cni. only active when cilium_iface is defined and cilium_bgp is true.
cilium_bgp_my_asn: "64513"
cilium_bgp_peer_asn: "64512"
cilium_bgp_peer_address: "192.168.1.1"
cilium_bgp_lb_cidr: "192.168.10.0/24"  # cidr for cilium loadbalancer ipam

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.1.250"

# k3s_token is required masters can talk together securely
# this token should be alpha numeric only
k3s_token: "damn-i-should-have-changed-this-oh-well-it-is-just-a-home-lab-cluster-i-can-nuke"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: "{{ ansible_facts[(cilium_iface | default(calico_iface | default(flannel_iface)))]['ipv4']['address'] }}"

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  {{ '--flannel-iface=' + flannel_iface if calico_iface is not defined and cilium_iface is not defined else '' }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
# the contents of the if block is also required if using calico or cilium
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  {% if calico_iface is defined or cilium_iface is defined %}
  --flannel-backend=none
  --disable-network-policy
  --cluster-cidr={{ cluster_cidr | default('10.52.0.0/16') }}
  {% endif %}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik

extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.6.4"

# tag for kube-vip-cloud-provider manifest
# kube_vip_cloud_provider_tag_version: "main"

# kube-vip ip range for load balancer
# (uncomment to use kube-vip for services instead of MetalLB)
# kube_vip_lb_ip_range: "192.168.30.80-192.168.30.90"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# bgp options
# metal_lb_bgp_my_asn: "64513"
# metal_lb_bgp_peer_asn: "64512"
# metal_lb_bgp_peer_address: "192.168.30.1"

# image tag for metal lb
metal_lb_speaker_tag_version: "v0.13.12"
metal_lb_controller_tag_version: "v0.13.12"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.30.80-192.168.30.90"

# Only enable if your nodes are proxmox LXC nodes, make sure to configure your proxmox nodes
# in your hosts.ini file.
# Please read https://gist.github.com/triangletodd/02f595cd4c0dc9aac5f7763ca2264185 before using this.
# Most notably, your containers must be privileged, and must not have nesting set to true.
# Please note this script disables most of the security of lxc containers, with the trade off being that lxc
# containers are significantly more resource efficient compared to full VMs.
# Mixing and matching VMs and lxc containers is not supported, ymmv if you want to do this.
# I would only really recommend using this if you have particularly low powered proxmox nodes where the overhead of
# VMs would use a significant portion of your available resources.
proxmox_lxc_configure: false
# the user that you would use to ssh into the host, for example if you run ssh some-user@my-proxmox-host,
# set this value to some-user
proxmox_lxc_ssh_user: root
# the unique proxmox ids for all of the containers in the cluster, both worker and master nodes
proxmox_lxc_ct_ids:
  - 200
  - 201
  - 202
  - 203
  - 204

# Only enable this if you have set up your own container registry to act as a mirror / pull-through cache
# (harbor / nexus / docker's official registry / etc).
# Can be beneficial for larger dev/test environments (for example if you're getting rate limited by docker hub),
# or air-gapped environments where your nodes don't have internet access after the initial setup
# (which is still needed for downloading the k3s binary and such).
# k3s's documentation about private registries here: https://docs.k3s.io/installation/private-registry
custom_registries: false
# The registries can be authenticated or anonymous, depending on your registry server configuration.
# If they allow anonymous access, simply remove the following bit from custom_registries_yaml
#   configs:
#     "registry.domain.com":
#       auth:
#         username: yourusername
#         password: yourpassword
# The following is an example that pulls all images used in this playbook through your private registries.
# It also allows you to pull your own images from your private registry, without having to use imagePullSecrets
# in your deployments.
# If all you need is your own images and you don't care about caching the docker/quay/ghcr.io images,
# you can just remove those from the mirrors: section.
custom_registries_yaml: |
  mirrors:
    docker.io:
      endpoint:
        - "https://registry.domain.com/v2/dockerhub"
    quay.io:
      endpoint:
        - "https://registry.domain.com/v2/quayio"
    ghcr.io:
      endpoint:
        - "https://registry.domain.com/v2/ghcrio"
    registry.domain.com:
      endpoint:
        - "https://registry.domain.com"

  configs:
    "registry.domain.com":
      auth:
        username: yourusername
        password: yourpassword

# Only enable and configure these if you access the internet through a proxy
# proxy_env:
#   HTTP_PROXY: "http://proxy.domain.local:3128"
#   HTTPS_PROXY: "http://proxy.domain.local:3128"
#   NO_PROXY: "*.domain.local,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
Hosts
host.ini
[master]
192.168.1.150 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
[node]
192.168.1.151 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
192.168.1.152 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
192.168.1.153 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
192.168.1.154 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
192.168.1.155 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
192.168.1.156 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
192.168.1.157 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
192.168.1.158 ansible_ssh_private_key_file=~/.ssh/pico-ecdsa
# only required if proxmox_lxc_configure: true
# must contain all proxmox instances that have a master or worker node
# [proxmox]
# 192.168.30.43

[k3s_cluster:children]
master
node
Possible Solution
The path to config.txt is specified in this file. I suspect that in main.yml, when Bookworm is detected, a variable needs to change the path.
The Raspberry Pi documentation (https://www.raspberrypi.com/documentation/computers/config_txt.html) states that, since Bookworm, the boot partition has been moved from /boot to /boot/firmware/. The "Activating cgroup support" task needs to change the path based on the distribution of Raspbian/Raspberry Pi OS.
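One way to express the path switch is to probe for the Bookworm location at run time instead of hard-coding it. The tasks below are a minimal sketch and not the role's actual code: the variable names `bookworm_boot_config` and `boot_config_path` are hypothetical, and the file name follows this report, so it should be matched to whatever file the existing "Activating cgroup support" task edits.

```yaml
# Sketch only: variable names are hypothetical, not taken from the role.
- name: Check whether the Bookworm boot path exists
  ansible.builtin.stat:
    path: /boot/firmware/config.txt
  register: bookworm_boot_config

- name: Choose the boot config path for this host
  ansible.builtin.set_fact:
    boot_config_path: "{{ '/boot/firmware/config.txt' if bookworm_boot_config.stat.exists else '/boot/config.txt' }}"

- name: Show which file the cgroup task should edit
  ansible.builtin.debug:
    msg: "cgroup settings should be written to {{ boot_config_path }}"
```

Probing /boot/firmware first is deliberate: if a file still exists at the old /boot location, checking that one first would select the wrong file again. The cgroup task could then use `boot_config_path` in place of its fixed path.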