Watching for events stops working after ~10 min stale time #228
Comments
Hi @audunsol. You have a very weird scenario. Let me explain why: the session close occurs on most cloud-provided k8s installations at around 40-45 minutes (plus or minus 10 minutes), so judging by the logs you sent, the socket timeout seems to work fine. There were versions of k8s that had issues with issuing events on time (basically, subscribers that idled were still connected but weren't getting the events), which seems to be your scenario here, but what's weird is that they usually got disconnected after 20-30 minutes. My suggestion is to upgrade k8s to a newer version, because something is definitely wrong there and I suspect it's the older k8s issue (for the life of me I can't remember which versions had it).
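To make that failure mode concrete: when the bug hits, the watch connection stays open from the client's point of view, so nothing at the socket level signals a problem; only a client-side idle check (or the eventual session close) can notice that events have stopped arriving. A very rough sketch of such a guard, just for illustration (hypothetical names, not reflector's actual code):

```csharp
using System;

// Illustration only (not reflector code): recycle a k8s watch that has gone
// silent, since the bug leaves the connection open without delivering events.
public class WatchIdleGuard
{
    private readonly TimeSpan _maxIdle;
    private readonly Action _restartWatch;   // hypothetical: dispose + recreate the Watcher
    private DateTimeOffset _lastEvent = DateTimeOffset.UtcNow;

    public WatchIdleGuard(TimeSpan maxIdle, Action restartWatch)
    {
        _maxIdle = maxIdle;
        _restartWatch = restartWatch;
    }

    // Call this from the watch's onEvent callback.
    public void MarkEvent() => _lastEvent = DateTimeOffset.UtcNow;

    // Call this periodically, e.g. from a timer.
    public void CheckIdle()
    {
        if (DateTimeOffset.UtcNow - _lastEvent > _maxIdle)
        {
            _restartWatch();
            _lastEvent = DateTimeOffset.UtcNow;
        }
    }
}
```

Since namespace/secret events are legitimately sparse, this effectively just forces a periodic reconnect, which is also roughly what a shorter server-side watch timeout would give you.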
Thanks for your swift reply @winromulus! I might get around to upgrading the cluster at some point, and I can report back to you then. But 1.19 is not that old, and previous versions of the reflector survived quite a few older k8s versions than that here as well, before we upgraded to v6. I have sifted through metrics in Azure and in Grafana now, but I could not spot anything unusual. Even if it felt very random at first, the issue is 100% reproducible on my end now, even this late in the evening when we have a lot less traffic on our cluster (this is our TEST cluster used by our devs/testers, and right now very few of them are online). Here is the gist of it:
Anyways, thanks for confirming how it works. I understood that you were using a few
@audunsol yup, that's the exact issue I've seen in the past; it's just the version of k8s that I can't confirm (I don't want to influence you, but I think it was 1.18 and 1.19).
Hello,
We are using the reflector to copy secrets into dynamically created namespaces (for review apps in a GitOps setup) that follow a naming pattern, so we want it to trigger as soon as possible when new namespaces are created (and, of course, when the source secrets change).
We upgraded from 5.4.17 => 6.0.21 on Nov 5 via helm install in our AKS cluster.
(This was in order to get around some `OutOfMemory` exceptions that we were seeing randomly in the logs, but more and more often, in the previous version; version 6 aims to improve performance, as I understand from the release log.) We did some testing just after the upgrade, and it seemed to work more or less as before.
But after a while, the sync seems to trigger very late (a bit random; sometimes not until 15 to 30 minutes after creating a new namespace). Log sample here:
(Secret and namespace names altered for the sake of this ticket.)
For instance, in the above log, the namespace `review-app-mytest` was created at `2021-11-08T15:42:26Z`, so it took ~24 minutes before the secret reflection kicked in in this case.

After several attempts, it seems that log events like

`[INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:55:59.9519413. Faulted: False.`

reset the watcher and make the reflector work again. If nothing happens for 10 minutes, it has to wait until the next such event before anything happens.

I have looked through the config options to see if I could/should tweak some setting, but could not find any. The closest thing to something relevant I found is kubernetes-client/csharp#533 (but I might be looking in the wrong direction).
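For what it's worth, my mental model of how this could be tightened on the client side (not reflector's actual code, just a sketch with the C# KubernetesClient; on newer client versions the list operations live under `client.CoreV1`): pass a server-side `timeoutSeconds` on the watch request so the session is closed much sooner than ~55 minutes, and re-subscribe as soon as the close callback fires.

```csharp
using System;
using System.Threading.Tasks;
using k8s;
using k8s.Models;

// Sketch only: watch secrets with an explicit server-side timeout so a stale
// session gets closed and re-established every ~10 minutes instead of ~55.
var config = KubernetesClientConfiguration.IsInCluster()
    ? KubernetesClientConfiguration.InClusterConfig()
    : KubernetesClientConfiguration.BuildConfigFromConfigFile();
var client = new Kubernetes(config);

while (true)
{
    var response = client.ListSecretForAllNamespacesWithHttpMessagesAsync(
        watch: true,
        timeoutSeconds: 600);                // server ends the watch after 10 min

    var closed = new TaskCompletionSource<bool>();

    using var watcher = response.Watch<V1Secret, V1SecretList>(
        onEvent: (type, secret) =>
            Console.WriteLine($"{type} {secret.Metadata.NamespaceProperty}/{secret.Metadata.Name}"),
        onError: ex => Console.WriteLine($"Watch error: {ex.Message}"),
        onClosed: () => closed.TrySetResult(true));

    await closed.Task;                       // re-subscribe immediately on close
}
```

The 600 seconds is just an example value to mimic the ~10-minute staleness window I'm seeing.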
Kubernetes version: 1.19 (AKS)
Please let me know if you want me to provide some particular info/logs/setting details in addition to the above.
PS: thanks for providing and maintaining this simple tool, by the way. We have used it for 9 months or so now without too many hiccups (other than the ones mentioned above).