HNC slow to become Ready on 1.21 and earlier #170
Comments
This looks vaguely related to kubernetes/kubernetes#101064 but that bug was introduced in 1.21 (whereas I'm seeing the same behaviour in 1.20) and besides, it doesn't seem to be the same issue.
Hmm, kubernetes/kubernetes#98376 looks possibly related... except that went into 1.21 🤷
kubernetes/kubernetes#101738 looks like a more likely candidate - introduced in 1.22, designed to fix kubernetes/kubernetes#99979. It explicitly was not backported to 1.21 (kubernetes/kubernetes#102681 (comment)).
/cc @BenTheElder ... in case you care :)
Note that I verified that the healthz and readyz endpoints were actually working well this entire time. E.g. I started a busybox, pinged the endpoints directly, and verified that they returned |
On GKE 1.21 and earlier, I noticed HNC taking a long time (~80s) to become Ready (for more details, see kubernetes-sigs#170). Adding a readiness probe fixes the problem. Tested: before this change, on GKE 1.20 and 1.21, I manually observed HNC taking a long time to start, and the e2e tests that require reinstalling HNC failed because their deadlines were exceeded. With this change, I can see HNC becoming ready in ~10s on GKE 1.20 and all the e2e tests pass.
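For readers who want to try the same kind of change, here is a rough sketch of what an explicit readiness probe could look like, built as a strategic-merge patch for a Deployment. The field names (`readinessProbe`, `httpGet`, `periodSeconds`) are standard Kubernetes Probe fields and `/readyz` is the endpoint mentioned in this issue. The container name `manager`, the port `8081` and the 5s period are assumptions for illustration, not values taken from the actual fix.

```python
import json


def readiness_probe(path: str = "/readyz", port: int = 8081, period_seconds: int = 5) -> dict:
    """Build an HTTP readiness probe in the shape of the core/v1 Probe type.

    The default port and period are assumptions, not HNC's real settings.
    """
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
    }


def deployment_patch(container: str, probe: dict) -> dict:
    """Wrap the probe in a strategic-merge patch targeting one container by name."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": container, "readinessProbe": probe}]
                }
            }
        }
    }


# "manager" is a hypothetical container name used only for this sketch.
print(json.dumps(deployment_patch("manager", readiness_probe()), indent=2))
```

The printed JSON has the shape that `kubectl patch deployment ... --type strategic -p` accepts. In a real manifest the same fields would normally be written directly under the container spec.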
I've found that HNC becomes Ready on 1.22 after about 10s, while on 1.20 and 1.21 it takes about 80+ seconds. Adding a Readiness probe seems to solve the problem - I'm not sure why, since "no readiness probe" is supposed to be equivalent to "always ready" and this does seem to be the case in 1.22. But I can't find any evidence of a bug that got fixed in 1.22 and I'm surprised that something this obviously wrong would have survived so long.
Either way, the solution seems pretty straightforward and I'll make the fix today.
/cc @erikgb
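The direct endpoint check mentioned in the comments (hitting healthz and readyz from another pod) can be approximated with a small polling helper. This is only a sketch using the Python standard library. The helper names, the example address and the timeouts are assumptions. It measures when the endpoint itself starts answering with 200, which is not the same thing as the Pod's Ready condition that this issue is about.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def check_endpoint(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # connection refused, timeout, DNS failure (URLError is an OSError)


def seconds_until_ok(url: str, deadline: float = 120.0, interval: float = 1.0) -> Optional[float]:
    """Poll url until it returns 200; return elapsed seconds, or None if the deadline passes."""
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        if check_endpoint(url) == 200:
            return time.monotonic() - start
        time.sleep(interval)
    return None


# Example with a hypothetical in-cluster pod address and port:
#   print(seconds_until_ok("http://10.0.0.12:8081/readyz"))
```

Comparing this number with the time the Pod takes to be reported Ready (for example via `kubectl get pods -w`) shows whether a delay comes from the process itself or from how readiness is reported.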