-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(controller): add stack dump for better debugging #2360
Conversation
✅ Deploy Preview for docs-kargo-akuity-io ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
5e91fa3
to
d7359bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would not be better to configure this by globally enabling recovery from panics (see: https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.18.4/pkg/controller#Options -> RecoverPanic
), but I have to admit that I lack historical insight here (cc: @krancour).
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2360 +/- ##
==========================================
- Coverage 47.64% 47.58% -0.07%
==========================================
Files 244 244
Lines 17392 17414 +22
==========================================
Hits 8287 8287
- Misses 8694 8716 +22
Partials 411 411 ☔ View full report in Codecov by Sentry. |
@hiddeco I would prefer that as well. I'm not sure why the current panic recovery is limited to this one spot. |
@loafoe do you feel up for making this suggested change? It should be introduced at these points: https://github.com/search?q=repo%3Aakuity%2Fkargo%20NewManager&type=code. Where the config.Controller{
RecoverPanic: ptr.To(true),
} as a value. |
Sure, I'll have a look |
@hiddeco @krancour added the controller Config across. However, the panic test function now fails as the |
@loafoe the test fails because the wrapping of the panic now happens at the point where However, for this "special" use case, we shell out to a series of other binaries as part of the promotion mechanics. Given this, it would be OK to keep the panic logic in place to surface the error (and be able to persist it to the Kubernetes API). With the new addition, we can now guarantee the controller will not end up in a crash loop if we make a mistake anywhere else. |
Okay, I'll bring the wrapper code back and do some linting, thanks |
3997f7e
to
1f01140
Compare
Signed-off-by: Andy Lo-A-Foe <andy.loafoe@gmail.com>
1f01140
to
39cebf7
Compare
@hiddeco fixed linter issues and all unit tests pass again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following up with the requested changes! 🙇
With the upgrade to controller-runtime v0.19.0 (via #2431), this has now become obsolete as it has been enabled by default upstream. I want to thank you nonetheless for the work you put in @loafoe. 🙇 |
Encountered a nil error in controller in current version. Using
tilt
up does not immediately show the location of the issue, so added a stack dump usingruntime/debug
Go package to make this visible.