
fix --wait's failure to work on coredns pods #19748

Merged
merged 2 commits into kubernetes:master on Jan 8, 2025

Conversation

ComradeProgrammer
Member

@ComradeProgrammer ComradeProgrammer commented Oct 3, 2024

Fixes #19288
Before: minikube start --wait=all may finish while coredns is not yet ready
After: minikube start --wait=all waits until coredns is completely ready

The situation mentioned in #19288 was actually introduced by the HA-cluster PR. By default, the coredns deployment consists of 2 coredns pods. However, in pkg/minikube/node/start.go:158 it is manually scaled down to 1. This happens before we start to wait for the essential nodes (minikube waits for nodes at line 236).
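
For illustration, here is a minimal client-go sketch of this kind of scale-down; the function, flow, and kubeconfig handling are my own illustration, not the actual code in start.go:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// scaleCoreDNS scales the kube-system/coredns deployment to the given number
// of replicas, mirroring the idea of the scale-down described above.
// Illustrative only: not the code minikube actually uses.
func scaleCoreDNS(ctx context.Context, client kubernetes.Interface, replicas int32) error {
	deployments := client.AppsV1().Deployments("kube-system")
	scale, err := deployments.GetScale(ctx, "coredns", metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("getting scale: %w", err)
	}
	scale.Spec.Replicas = replicas
	_, err = deployments.UpdateScale(ctx, "coredns", scale, metav1.UpdateOptions{})
	return err
}

func main() {
	// Assumes a reachable cluster via the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	if err := scaleCoreDNS(context.Background(), kubernetes.NewForConfigOrDie(config), 1); err != nil {
		panic(err)
	}
}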

When minikube waits for system pods, there are two checks that verify the system pods' status:

  • WaitExtra (pkg/minikube/bootstrapper/bsutil/kverify/pod_ready.go) lists all pods with the given labels and checks whether they are Ready.
  • ExpectAppsRunning (pkg/minikube/bootstrapper/bsutil/kverify/system_pods.go:91, in function ExpectAppsRunning) lists all the pods in the kube-system namespace and checks whether there is at least one Running pod for each essential label. The bug is that it only checks the Running state and does not check the Ready state (see the sketch after this list).
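
The difference between the two checks comes down to pod phase versus the PodReady condition. A minimal sketch of the distinction, assuming client-go types (the helper names are illustrative, not minikube's):

package kverifysketch

import (
	corev1 "k8s.io/api/core/v1"
)

// isPodRunning is roughly what ExpectAppsRunning tested before this fix:
// it only looks at the pod phase.
func isPodRunning(pod *corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodRunning
}

// isPodReady is the stricter test: a pod can be Running while its containers
// (e.g. coredns) have not yet passed their readiness probes; the PodReady
// condition only turns True once they have.
func isPodReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}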

After the HA-cluster PR was introduced, when minikube runs the WaitExtra function (the 1st check), one of the coredns pods can have status phase Succeeded. WaitExtra doesn't recognize this state, so it prints an error and breaks out of the checking loop. This logic is at pkg/minikube/bootstrapper/bsutil/kverify/pod_ready.go, lines 99 and 69.

The error I see is:

E1003 22:37:04.296076   17140 pod_ready.go:66] WaitExtra: waitPodCondition: pod "coredns-7db6d8ff4d-nljst" in "kube-system" namespace has status phase "Succeeded" (skipping!): {Phase:Succeeded Conditions:[{Type:PodReadyToStartContainers Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:37:04 +0200 CEST Reason: Message:} {Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.49.2 HostIPs:[{IP:192.168.49.2}] PodIP: PodIPs:[] StartTime:2024-10-03 22:36:53 +0200 CEST InitContainerStatuses:[] ContainerStatuses:[{Name:coredns State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2024-10-03 22:36:54 +0200 CEST,FinishedAt:2024-10-03 22:37:04 +0200 CEST,ContainerID:docker://281a8c3106510cdc16a6d3f91ad2e9f5d7aa1609fac4f0e8f7494af67cb8b5d6,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.k8s.io/coredns/coredns:v1.11.1 ImageID:docker-pullable://registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1 ContainerID:docker://281a8c3106510cdc16a6d3f91ad2e9f5d7aa1609fac4f0e8f7494af67cb8b5d6 Started:0x14001e58e00 AllocatedResources:map[] Resources:nil VolumeMounts:[]}] QOSClass:Burstable EphemeralContainerStatuses:[] Resize: ResourceClaimStatuses:[]}

Then, when minikube arrives at ExpectAppsRunning (the 2nd check), it doesn't check the Ready state, so it believes that all pods are OK. This causes #19288.

So the fix is to make ExpectAppsRunning (the 2nd check) check the Ready state as well.

(The reason why I didn't make the 1st check (WaitExtra) recognize the Succeeded state is that, if for some reason there is a job (e.g. an init job for some containers) in the kube-system namespace and we changed WaitExtra's logic to reject the Succeeded state, there would be problems.)
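
As a rough illustration of the direction of this fix, here is a sketch of a check that requires, for each essential label, at least one pod that is both Running and Ready (assuming client-go; the selector list and function name are illustrative, not the actual kverify code):

package kverifysketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// expectAppsReady checks that, for every given label selector, kube-system has
// at least one pod that is both Running and Ready. The selector list is an
// input here; minikube derives its own list (kverify.CorePodsLabels).
func expectAppsReady(ctx context.Context, client kubernetes.Interface, selectors []string) error {
	for _, sel := range selectors {
		pods, err := client.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{LabelSelector: sel})
		if err != nil {
			return fmt.Errorf("listing pods for %q: %w", sel, err)
		}
		found := false
		for _, pod := range pods.Items {
			if pod.Status.Phase != corev1.PodRunning {
				continue // only Running pods can also be Ready
			}
			for _, c := range pod.Status.Conditions {
				if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
					found = true
				}
			}
		}
		if !found {
			return fmt.Errorf("no Running and Ready pod found for %q", sel)
		}
	}
	return nil
}

With standard kubeadm labels, selectors such as "k8s-app=kube-dns" (coredns) or "component=etcd" would be plausible inputs.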

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 3, 2024
@ComradeProgrammer
Member Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Oct 3, 2024

@prezha
Contributor

prezha commented Oct 6, 2024

hey @ComradeProgrammer thanks for looking into this

here's a bit more context that might help you:

the wait for system-critical pods' Ready condition was intentionally implemented alongside the Running checks: the former is called only if wait is explicitly requested (and adds some startup delay), whereas the latter is almost always called, for the quickest startup time

so, since ExpectAppsRunning is called by WaitForAppsRunning, which in turn is called by WaitForNode (which is always called, except for the first HA control plane), adding the Ready check to ExpectAppsRunning means we'd effectively always wait, which is not the intention

on the other hand, WaitExtra is called only if wait is required, either by WaitForNode (before WaitForAppsRunning) or by restartPrimaryControlPlane

also, for HA + wait, iirc the idea was that we'd be ok with waiting for at least one coredns pod to be Ready, as the kube-dns service would take care of routing requests to the pod(s) that can process them

the example from #19288 shows only one coredns pod, so it's not an HA cluster, but it makes a very good point:

sometimes the list of pods is pulled before some of the pods have even been created, resulting in them not being in the waiting check

i think the fix should be made in WaitExtra, where we could, e.g., invert the logic: instead of listing all pods once and then looping through that list waiting for each pod that's also on the system-critical list to become Ready (as we do now), wait until all system-critical pods (a fixed list: kverify.CorePodsLabels) have become Ready, re-fetching each pod's status as needed

as for the Succeeded status phase, that means "All containers in the Pod have terminated in success, and will not be restarted", so it is handled - by skipping it (ie, it will never become Ready, so there is no point waiting for it)
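
A rough sketch of that inverted approach, assuming client-go; the selector list, namespace, polling interval, and function name are placeholders, not the actual WaitExtra code:

package kverifysketch

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitAllCriticalReady re-fetches the pods behind each selector until every
// selector has at least one Ready pod, or the timeout expires. Pods in the
// Succeeded phase are skipped, since they will never become Ready.
func waitAllCriticalReady(ctx context.Context, client kubernetes.Interface, selectors []string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		allReady := true
		for _, sel := range selectors {
			pods, err := client.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{LabelSelector: sel})
			if err != nil {
				return fmt.Errorf("listing pods for %q: %w", sel, err)
			}
			selectorReady := false
			for _, pod := range pods.Items {
				if pod.Status.Phase == corev1.PodSucceeded {
					continue // terminated successfully, will never be Ready
				}
				for _, c := range pod.Status.Conditions {
					if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
						selectorReady = true
					}
				}
			}
			if !selectorReady {
				allReady = false
			}
		}
		if allReady {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out after %s waiting for system-critical pods to become Ready", timeout)
		}
		time.Sleep(2 * time.Second)
	}
}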

@spowelljr spowelljr left a comment
Member

I just tried this but coredns still wasn't ready.

$ minikube start --wait=all
😄  minikube v1.34.0 on Debian rodete (kvm/amd64)
✨  Automatically selected the docker driver
📌  Using Docker driver with root privileges
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🚜  Pulling base image v0.0.45-1727731891-master ...
🔥  Creating docker container (CPUs=2, Memory=26100MB) ...
🐳  Preparing Kubernetes v1.31.1 on Docker 27.3.1 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   coredns-7c65d6cfc9-qjrwh           0/1     Running   0          9s
kube-system   etcd-minikube                      1/1     Running   0          14s
kube-system   kube-apiserver-minikube            1/1     Running   0          14s
kube-system   kube-controller-manager-minikube   1/1     Running   0          15s
kube-system   kube-proxy-c8mdp                   1/1     Running   0          9s
kube-system   kube-scheduler-minikube            1/1     Running   0          16s
kube-system   storage-provisioner                1/1     Running   0          9s

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 11, 2024

@minikube-pr-bot

Here are the top 10 failed tests in each environment with the lowest flake rate.

Environment Test Name Flake Rate
Docker_macOS (1 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
Docker_Linux (1 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
Docker_Linux_containerd (1 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
Docker_Linux_crio (3 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
Docker_Linux_crio_arm64 (5 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
Docker_Linux_crio_arm64 (5 failed) TestFunctional/parallel/PersistentVolumeClaim(gopogh) 1.10% (chart)
Docker_Linux_crio_arm64 (5 failed) TestScheduledStopUnix(gopogh) 2.20% (chart)
KVM_Linux_crio (10 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
KVM_Linux_crio (10 failed) TestStartStop/group/newest-cni/serial/SecondStart(gopogh) 0.00% (chart)
Docker_Linux_docker_arm64 (1 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
KVM_Linux_containerd (3 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
KVM_Linux_containerd (3 failed) TestStartStop/group/no-preload/serial/SecondStart(gopogh) 0.00% (chart)
KVM_Linux_containerd (3 failed) TestStartStop/group/default-k8s-diff-port/serial/SecondStart(gopogh) 0.00% (chart)
Hyper-V_Windows (9 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)
Hyper-V_Windows (9 failed) TestPause/serial/VerifyDeletedResources(gopogh) 0.00% (chart)
Docker_Linux_containerd_arm64 (1 failed) TestMultiControlPlane/serial/StartCluster(gopogh) 0.00% (chart)

Besides, the following environments also have failed tests:

To see the flake rates of all tests by environment, click here.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 22, 2024

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 22, 2024
@ComradeProgrammer
Member Author

I just tried this but coredns still wasn't ready.

could you please try again to see if it works now? thanks

@ComradeProgrammer
Member Author

/test pull-minikube-build

@medyagh
Member

medyagh commented Dec 30, 2024

/ok-to-test


@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 19748) |
+----------------+----------+---------------------+
| minikube start | 54.6s    | 53.3s               |
| enable ingress | 17.9s    | 19.3s               |
+----------------+----------+---------------------+

Times for minikube start: 57.0s 54.0s 51.9s 54.8s 55.1s
Times for minikube (PR 19748) start: 52.8s 52.7s 53.1s 53.8s 54.0s

Times for minikube ingress: 17.3s 18.2s 17.2s 17.7s 19.3s
Times for minikube (PR 19748) ingress: 21.3s 18.1s 19.7s 17.7s 19.7s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 19748) |
+----------------+----------+---------------------+
| minikube start | 24.9s    | 23.8s               |
| enable ingress | 13.0s    | 13.0s               |
+----------------+----------+---------------------+

Times for minikube start: 25.4s 25.4s 24.8s 23.5s 25.5s
Times for minikube (PR 19748) start: 22.4s 23.0s 26.1s 21.7s 25.6s

Times for minikube ingress: 12.4s 13.4s 13.5s 13.4s 12.4s
Times for minikube (PR 19748) ingress: 13.4s 13.3s 12.4s 13.4s 12.5s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 19748) |
+----------------+----------+---------------------+
| minikube start | 23.5s    | 22.1s               |
| enable ingress | 22.9s    | 22.9s               |
+----------------+----------+---------------------+

Times for minikube start: 25.0s 23.9s 22.3s 24.6s 21.5s
Times for minikube (PR 19748) start: 21.7s 22.1s 20.9s 21.6s 24.1s

Times for minikube ingress: 22.9s 22.8s 22.9s 22.8s 22.9s
Times for minikube (PR 19748) ingress: 22.9s 22.9s 22.9s 22.8s 23.0s

@medyagh
Member

medyagh commented Jan 6, 2025

@ComradeProgrammer Docker_Linux_containerd seems weird: there is no failure in gopogh, but it exited as a failure. Could the non-zero exit code be related to waiting for DNS? https://storage.googleapis.com/minikube-builds/logs/19748/37784/Docker_Linux_containerd.html

It shows as
Docker_Linux_containerd — Jenkins: completed with 0 / 282 failures in 120.01 minutes.
but it is listed as red (the overall test run exited with a non-zero code).

@medyagh
Member

medyagh commented Jan 8, 2025

/lgtm

@medyagh medyagh merged commit 3fef3ea into kubernetes:master Jan 8, 2025
24 of 37 checks passed
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ComradeProgrammer, medyagh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2025
Successfully merging this pull request may close these issues:

--wait flag sometimes misses pods