fix --wait's failure to work on coredns pods #19748
/ok-to-test |
hey @ComradeProgrammer, thanks for looking into this. here's a bit more context that might help you:
the wait for system-critical pods' Ready condition was intentionally implemented alongside the Running checks, ie, the former would be called only if so, as the ExpectAppsRunning is called by
on the other hand, only if wait is required, WaitExtra is called by
also, for ha + wait, iirc the idea was that we'd be ok with waiting for at least one coredns pod to be ready, as the kube-dns service would take care of routing requests to the pod(s) that can process them
the example from #19288 shows only one coredns, so it's not a ha cluster, but makes a very good point:
i think that the fix should be made in as for |
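The "at least one coredns pod ready" idea for ha + wait could be sketched roughly as below. This is only an illustration: `podStatus` and `anyReady` are hypothetical names made up for this sketch, not minikube or client-go types, and the pod names are invented.

```go
package main

import "fmt"

// podStatus is a hypothetical, simplified stand-in for the parts of a
// Kubernetes pod status this sketch needs. It is not a minikube type.
type podStatus struct {
	Name  string
	Phase string // pod phase, e.g. "Pending", "Running", "Succeeded"
	Ready bool   // true when the pod's Ready condition is "True"
}

// anyReady reports whether at least one pod is both Running and Ready.
func anyReady(pods []podStatus) bool {
	for _, p := range pods {
		if p.Phase == "Running" && p.Ready {
			return true
		}
	}
	return false
}

func main() {
	// Two replicas, only one of which is ready to serve requests.
	pods := []podStatus{
		{Name: "coredns-a", Phase: "Running", Ready: false},
		{Name: "coredns-b", Phase: "Running", Ready: true},
	}
	fmt.Println("at least one coredns pod ready:", anyReady(pods))
}
```

With a single replica that is Running but not Ready (the situation in the `kubectl get pods -A` output further down), `anyReady` returns false, so a wait built on it would keep waiting.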
I just tried this, but coredns still wasn't ready.
$ minikube start --wait=all
😄 minikube v1.34.0 on Debian rodete (kvm/amd64)
✨ Automatically selected the docker driver
📌 Using Docker driver with root privileges
👍 Starting "minikube" primary control-plane node in "minikube" cluster
🚜 Pulling base image v0.0.45-1727731891-master ...
🔥 Creating docker container (CPUs=2, Memory=26100MB) ...
🐳 Preparing Kubernetes v1.31.1 on Docker 27.3.1 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔗 Configuring bridge CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7c65d6cfc9-qjrwh 0/1 Running 0 9s
kube-system etcd-minikube 1/1 Running 0 14s
kube-system kube-apiserver-minikube 1/1 Running 0 14s
kube-system kube-controller-manager-minikube 1/1 Running 0 15s
kube-system kube-proxy-c8mdp 1/1 Running 0 9s
kube-system kube-scheduler-minikube 1/1 Running 0 16s
kube-system storage-provisioner 1/1 Running 0 9s
could you plz have a try again to see if it works now? thx |
/test pull-minikube-build |
/ok-to-test |
kvm2 driver with docker runtime
Times for minikube start: 57.0s 54.0s 51.9s 54.8s 55.1s
Times for minikube ingress: 17.3s 18.2s 17.2s 17.7s 19.3s
docker driver with docker runtime
Times for minikube start: 25.4s 25.4s 24.8s 23.5s 25.5s
Times for minikube ingress: 12.4s 13.4s 13.5s 13.4s 12.4s
docker driver with containerd runtime
Times for minikube start: 25.0s 23.9s 22.3s 24.6s 21.5s
Times for minikube ingress: 22.9s 22.8s 22.9s 22.8s 22.9s
@ComradeProgrammer Docker_Linux_containerd seems weird: there is no failure in gopogh, but it exited as a failure. Could the non-zero exit code be related to waiting for DNS? https://storage.googleapis.com/minikube-builds/logs/19748/37784/Docker_Linux_containerd.html it shows as
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ComradeProgrammer, medyagh
FIX #19288
Before:
minikube start --wait=all
may end when coredns is not ready
After:
minikube start --wait=all
will be able to wait until coredns is completely ready
The situation mentioned in #19288 was actually introduced by the HA-cluster PR. By default, the coredns deployment consists of 2 coredns pods. However, in pkg/minikube/node/start.go:158 it is manually scaled down to 1. This happens before we start to wait for those essential nodes (minikube waits for nodes at line 236).
When minikube waits for system pods, two checks inspect the system pods' status: WaitExtra (the 1st check) and ExpectAppsRunning (the 2nd check).
After the HA-cluster PR was introduced, when minikube runs the WaitExtra function (the 1st check), one of the coredns pods' status can be Succeeded. WaitExtra doesn't recognize this state, so it prints an error and breaks out of the checking loop. This logic is at pkg/minikube/bootstrapper/bsutil/kverify/pod_ready.go, lines 99 and 69.
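To illustrate the failure mode described above (a readiness check that bails out as soon as it meets a pod phase it doesn't recognize), here is a minimal, self-contained sketch. It is not the code in pod_ready.go: `podSnapshot` and `checkReadyOnce` are hypothetical names, and the pod names are invented.

```go
package main

import "fmt"

// podSnapshot is a hypothetical, simplified view of one pod's status.
type podSnapshot struct {
	Name  string
	Phase string // pod phase, e.g. "Running" or "Succeeded"
	Ready bool   // true when the pod's Ready condition is "True"
}

// checkReadyOnce walks the pods in order and returns how many were confirmed
// Running and Ready. It stops at the first pod that is not Running, so any
// pods after that one are never checked for readiness.
func checkReadyOnce(pods []podSnapshot) (int, error) {
	confirmed := 0
	for _, p := range pods {
		if p.Phase != "Running" {
			return confirmed, fmt.Errorf("pod %q has unexpected phase %q", p.Name, p.Phase)
		}
		if !p.Ready {
			return confirmed, fmt.Errorf("pod %q is running but not ready", p.Name)
		}
		confirmed++
	}
	return confirmed, nil
}

func main() {
	// One coredns pod already finished (scaled down), the other still starting.
	pods := []podSnapshot{
		{Name: "coredns-old", Phase: "Succeeded", Ready: false},
		{Name: "coredns-new", Phase: "Running", Ready: false},
	}
	confirmed, err := checkReadyOnce(pods)
	fmt.Println("confirmed:", confirmed, "error:", err)
}
```

In this sketch the check ends with an error about the Succeeded pod before it ever looks at the pod that is still starting, so nothing has actually verified that the remaining coredns pod is ready.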
The error I see is
Then, when minikube arrives at ExpectAppsRunning (the 2nd check), it doesn't check the ready state, so it believes that all pods are ok. This causes #19288.
So the fix is to make ExpectAppsRunning (the 2nd check) check the ready state as well.
(The reason I didn't make the 1st check recognize the Succeeded state is this: if for some reason there is a job (e.g. an init job for some containers) in the kube-system namespace, and we change WaitExtra's logic to reject the Succeeded state, there will be problems.)
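The idea of the fix (only count an expected app as present when one of its pods is both Running and Ready) could look roughly like the sketch below. It is not the actual ExpectAppsRunning change: `sysPod`, `podCondition`, `isReady` and `missingApps` are hypothetical names, the types are simplified stand-ins for the Kubernetes API objects, and matching on the `k8s-app` label is an assumption made for this example.

```go
package main

import "fmt"

// podCondition and sysPod are hypothetical, simplified stand-ins for the
// Kubernetes pod API objects. They are not client-go or minikube types.
type podCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type sysPod struct {
	Name       string
	Labels     map[string]string
	Phase      string // e.g. "Running", "Succeeded"
	Conditions []podCondition
}

// isReady reports whether the pod carries a Ready condition with status "True".
func isReady(p sysPod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// missingApps returns the expected apps that have no pod which is both
// Running and Ready. Pods in any other phase (for example a completed job
// in the Succeeded phase) are skipped rather than treated as an error.
func missingApps(pods []sysPod, expected []string) []string {
	healthy := make(map[string]bool)
	for _, p := range pods {
		if p.Phase != "Running" || !isReady(p) {
			continue
		}
		if app, ok := p.Labels["k8s-app"]; ok {
			healthy[app] = true
		}
	}
	var missing []string
	for _, app := range expected {
		if !healthy[app] {
			missing = append(missing, app)
		}
	}
	return missing
}

func main() {
	pods := []sysPod{
		{
			Name:       "coredns-example",
			Labels:     map[string]string{"k8s-app": "kube-dns"},
			Phase:      "Running",
			Conditions: []podCondition{{Type: "Ready", Status: "False"}},
		},
		{
			Name:       "kube-proxy-example",
			Labels:     map[string]string{"k8s-app": "kube-proxy"},
			Phase:      "Running",
			Conditions: []podCondition{{Type: "Ready", Status: "True"}},
		},
	}
	fmt.Println("not ready yet:", missingApps(pods, []string{"kube-dns", "kube-proxy"}))
}
```

A caller would poll until `missingApps` returns an empty slice (or a timeout expires). Checking only the phase would report nothing missing for the pods above, which mirrors the behaviour reported in #19288.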