
MAAS Cilium rolling upgrade from 1.27 to 1.28 deletes the old machine before pods on the old node are scheduled onto a different node #62

Open
Kun483 opened this issue Jul 9, 2024 · 0 comments

Kun483 commented Jul 9, 2024

Make the folder .cluster-api/overrides/infrastructure-maas/v0.5.0. Under the v0.5.0 folder, create the files cluster-template.yaml, infrastructure-components.yaml, and metadata.yaml. Their contents come from our repo (a shell sketch for fetching them follows the links):
https://github.com/spectrocloud/cluster-api-provider-maas/blob/main/templates/cluster-template.yaml
https://github.com/spectrocloud/cluster-api-provider-maas/blob/main/spectro/generated/core-global.yaml
https://github.com/spectrocloud/cluster-api-provider-maas/blob/main/metadata.yaml
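
A minimal shell sketch of that layout, assuming curl is available, ~/.cluster-api is clusterctl's config directory, and the raw GitHub URLs mirror the links above (the core-global.yaml content is saved as infrastructure-components.yaml):

OVERRIDE_DIR=~/.cluster-api/overrides/infrastructure-maas/v0.5.0
mkdir -p "$OVERRIDE_DIR"
# Fetch the three files into the override folder (raw URLs are an assumption).
curl -L -o "$OVERRIDE_DIR/cluster-template.yaml" \
    https://raw.githubusercontent.com/spectrocloud/cluster-api-provider-maas/main/templates/cluster-template.yaml
curl -L -o "$OVERRIDE_DIR/infrastructure-components.yaml" \
    https://raw.githubusercontent.com/spectrocloud/cluster-api-provider-maas/main/spectro/generated/core-global.yaml
curl -L -o "$OVERRIDE_DIR/metadata.yaml" \
    https://raw.githubusercontent.com/spectrocloud/cluster-api-provider-maas/main/metadata.yaml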
Then,

kind create cluster
clusterctl init --infrastructure maas:v0.5.0 --bootstrap microk8s --control-plane microk8s
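
Before moving on, it can help to wait for the provider controllers in the kind management cluster to come up (a sketch, not part of the original steps; the namespace and deployment name are the clusterctl defaults):

kubectl -n capi-system rollout status deployment/capi-controller-manager
kubectl get pods -A | grep -E 'capi|maas|microk8s'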

Then, kubectl apply the manifest (please replace the variables first):
maas_microk8s_cilium_share.yaml.zip
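
For example (a sketch; the unzipped filename and the use of envsubst for variable substitution are assumptions):

unzip maas_microk8s_cilium_share.yaml.zip
# Export the template variables, then substitute and apply the rendered manifest.
envsubst < maas_microk8s_cilium_share.yaml | kubectl apply -f -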
Then, in the target cluster, install Cilium:

helm install cilium cilium/cilium  \
    --namespace kube-system \
    --set cni.confPath=/var/snap/microk8s/current/args/cni-network \
    --set cni.binPath=/var/snap/microk8s/current/opt/cni/bin \
    --set daemon.runPath=/var/snap/microk8s/current/var/run/cilium \
    --set operator.replicas=1 \
    --set ipam.operator.clusterPoolIPv4PodCIDRList="10.1.0.0/16" \
    --set nodePort.enabled=true
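
To confirm Cilium is healthy before the next step, something like this can be run in the target cluster (a sketch; the resource names are the chart defaults):

kubectl -n kube-system rollout status daemonset/cilium
kubectl -n kube-system rollout status deployment/cilium-operator
kubectl -n kube-system get pods -l k8s-app=cilium -o wide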

Execute clusterctl init --infrastructure maas:v0.5.0 --bootstrap microk8s --control-plane microk8s in the target cluster after all of the initially launched pods are running.
To trigger a rolling upgrade of the control-plane nodes, I change 1.27.13 to 1.28.9 and 1.27 to 1.28 in the preRunCommands entry - /capi-scripts/00-install-microk8s.sh '--channel 1.27/stable --classic' of the MicroK8sControlPlane (mcp) resource; a sketch of the change follows.
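
Roughly like this (a sketch; editing the applied manifest in place and re-applying it is an assumption about how the change is delivered):

# Bump the Kubernetes version and the MicroK8s snap channel, then re-apply.
sed -i 's/1\.27\.13/1.28.9/g; s|--channel 1\.27/stable|--channel 1.28/stable|' maas_microk8s_cilium_share.yaml
kubectl apply -f maas_microk8s_cilium_share.yaml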
I observed that:
A new node running 1.28 joins the cluster, which forces the Cilium pod on the old node to be deleted. The old machine is then deleted, even though the pods that were running on it have not yet been scheduled onto a different node.

e.g. when executing clusterctl init --infrastructure maas:v0.5.0 --bootstrap microk8s --control-plane microk8s, the capi-microk8s-bootstrap-controller-manager, capi-microk8s-control-plane-controller-manager, and capi-controller-manager pods land on a node called 07. During the rolling upgrade, a new machine (named 08) comes up to replace 07, and those pods disappear from the cluster because machine 07 is deleted before they are scheduled onto a different node. However, the deployments for those pods still show READY 1/1. After SSHing into machine 08, journalctl -u snap.microk8s.daemon-kubelite shows the error below:

microk8s.daemon-kubelite[10219]: E0710 02:40:27.853000   10219 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

Environment:
infrastructure-maas: v0.5.0
Kernel: 5.15.0-113-generic
CAPI: v1.7.4
Microk8s Bootstrap: v0.6.6
Microk8s Control Plane: v0.6.6
Container Runtime: containerd://1.6.28
OS: Ubuntu 22.04.3

@eaudetcobello eaudetcobello self-assigned this Jul 10, 2024
@eaudetcobello eaudetcobello reopened this Sep 9, 2024