HNC slow to become Ready on 1.21 and earlier #170

Closed
adrianludwin opened this issue Mar 29, 2022 · 5 comments · Fixed by #171

@adrianludwin
Contributor

I've found that HNC becomes Ready on 1.22 after about 10s, while on 1.20 and 1.21 it takes 80s or more. Adding a readiness probe seems to solve the problem. I'm not sure why, since "no readiness probe" is supposed to be equivalent to "always ready", and that does seem to be the case in 1.22. But I can't find any evidence of a bug that got fixed in 1.22, and I'm surprised that something this obviously wrong would have survived so long.
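To make the expected semantics concrete, here is a simplified Go model (not kubelet's actual code) of the documented rule: a started container with no readiness probe counts as ready, while one with a probe becomes ready only after the probe reports success. `ProbeResult` and `containerReady` are hypothetical names used only for illustration; startup probes and pod-level aggregation are deliberately left out.

```go
package main

import "fmt"

// ProbeResult is a hypothetical, simplified stand-in for a readiness probe outcome.
type ProbeResult int

const (
	Unknown ProbeResult = iota // probe has not run yet
	Success
	Failure
)

// containerReady models the documented default: with no readiness probe,
// a started container is ready; with a probe, it is ready only after the
// most recent probe result is Success.
func containerReady(started, hasProbe bool, last ProbeResult) bool {
	if !started {
		return false
	}
	if !hasProbe {
		return true // "no readiness probe" is treated as "always ready"
	}
	return last == Success
}

func main() {
	fmt.Println(containerReady(true, false, Unknown)) // true
	fmt.Println(containerReady(true, true, Unknown))  // false
	fmt.Println(containerReady(true, true, Success))  // true
}
```

Under this model, adding a probe should only ever delay readiness, which is why the behaviour reported here (a probe making Ready arrive sooner) is surprising.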

Either way, the solution seems pretty straightforward and I'll make the fix today.

/cc @erikgb

@adrianludwin adrianludwin added this to the release-v1.0 milestone Mar 29, 2022
@adrianludwin
Contributor Author

This looks vaguely related to kubernetes/kubernetes#101064, but that bug was introduced in 1.21 (whereas I'm seeing the same behaviour in 1.20), and besides, it doesn't seem to be the same issue.

@adrianludwin
Contributor Author

adrianludwin commented Mar 29, 2022

Hmm, kubernetes/kubernetes#98376 looks possibly related... except that went into 1.21 🤷

@adrianludwin
Contributor Author

kubernetes/kubernetes#101738 looks like a more likely candidate: it was introduced in 1.22 and designed to fix kubernetes/kubernetes#99979. It was explicitly not backported to 1.21 (kubernetes/kubernetes#102681 (comment)).

@adrianludwin
Contributor Author

/cc @BenTheElder

... in case you care :)

@adrianludwin
Contributor Author

Note that I verified that the healthz and readyz endpoints were working correctly the entire time. For example, I started a busybox pod, queried the endpoints directly, and confirmed that they returned ok long before the containers became Ready. The fact that adding a readiness probe, with no other change to the container, fixes the problem is further evidence IMO that this is a workaround for a K8s-side problem.

adrianludwin added a commit to adrianludwin/hierarchical-namespaces that referenced this issue Mar 29, 2022
On GKE 1.21 and earlier, I noticed HNC taking a long time (~80s) to
become Ready (for more details, see kubernetes-sigs#170). Adding a readiness probe
fixes the problem.

Tested: before this change, on GKE 1.20 and 1.21, I manually see HNC
taking a long time to start, and the e2e tests that require reinstalling
HNC fail because the deadlines are exceeded. With this change, I can
see HNC becoming ready in ~10s on GKE 1.20 and all the e2e tests pass.