Fix that the NGF pod/deployment can occasionally "disappear"! #281
Comments
Below is some information copied from my notes while debugging the issue (search Slack DMs from around 2024-03-18 for more details). I checked the logs of the internal "systemd-journal" and the logs for the "loki visible" pods in each namespace. The very first log entry I was able to find for the sequence that ends in the NGF destruction is this line from above:
With these fields on the log entry:
That is, the first log entry in that sequence was made by the "nginx-gateway-fabric" pod. So the NGF pod is apparently looking at the Secret and, for some reason, trying to reconcile the cluster state to "upsert" it. Maybe it's triggered by a timer, maybe by something else. To try to figure that out, I tracked down the line that produces that log entry (I think this is it, anyway): https://github.com/nginxinc/nginx-gateway-fabric/blob/e1d6ebb5065bab73af3a89faba4f49c7a5b971cd/internal/framework/controller/reconciler.go#L76
…ad. (the "helm releases disappearing" issue, #281, happened again, and helm_remote presumably avoids it)
While it is a "workaround" rather than a "proper fix", I ended up changing the Tiltfiles from using "helm_resource" to the older "helm_remote", and this seems to have resolved the issue (no recurrence in the last several days). Basically: something was calling "uninstall" on the helm charts registered in the remote cluster. With "helm_remote", we deploy the individual resources directly rather than under a "chart" entry, so this "unwanted top-down uninstall" can no longer happen. I'll close this issue for now, since "helm_remote" resolves the problem and works fine. But of course, if the "root cause" of this unwanted uninstall is ever discovered, it's preferable to resolve that rather than having to rely on this (semi) workaround.
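If anyone picks the root-cause investigation back up (with "helm_resource" re-enabled), a small check along these lines might help catch the next disappearance closer to when it happens. This is only a rough sketch, not something we run today: the release names passed in are hypothetical, and it assumes the Helm 3 CLI, whose `helm list --all --output json` prints a JSON array of objects with `name` and `status` fields.

```python
import json
import subprocess


def missing_releases(helm_list_json, expected):
    """Return the expected release names that are absent from `helm list` JSON output."""
    releases = json.loads(helm_list_json) or []
    present = {release["name"] for release in releases}
    return sorted(set(expected) - present)


def stuck_releases(helm_list_json):
    """Return (name, status) pairs for releases whose status is not 'deployed'."""
    releases = json.loads(helm_list_json) or []
    return [
        (release["name"], release.get("status"))
        for release in releases
        if release.get("status") != "deployed"
    ]


def fetch_helm_list(namespace):
    """Run `helm list` for one namespace and return its raw JSON output."""
    # --all also includes releases that are pending, failed, or uninstalling.
    result = subprocess.run(
        ["helm", "list", "--all", "--namespace", namespace, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Usage would be something like `missing_releases(fetch_helm_list("default"), ["ngf"])`, run on a timer, with the namespace and release name swapped for whatever the Tiltfile actually installs. Logging the timestamp whenever the result is non-empty would narrow the window for searching the pod logs.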
Summary
General:
Possibly related:
PENDING_UPGRADE and multiple DEPLOYED revisions arise soon helm/helm#4558
Occurrences
Discovered: 2024-03-17 11:59am (PT, by Venryx)
Discovered: 2024-03-18 9:15pm (PT, by Jamie)
Discovered: 2024-03-19 3:44am (PT, by Venryx)
Discovered: 2024-03-19 5:34am (PT, by Venryx)
Discovered: 2024-03-19 8:29pm (PT, by Venryx)