dumping stacks during klog.Fatal #316
Comments
Whether structured output is possible remains to be seen. We also would need to decide how to structure it. |
I took a cursory look and found that we may be missing a "line" version. This is present in the other APIs. For example: Do we need such a version for this too? |
The "line" variants are applicable to functions which do unstructured output formatting. What would be the effect here? |
A better API is probably to use functional options:
This avoids the proliferation of different variants with different semantics and allows us to extend the API in the future. |
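For readers unfamiliar with the pattern, here is a minimal sketch of what a functional-options API could look like. All names here (backtraceConfig, BacktraceSkip, BacktraceSize, newBacktraceConfig) are hypothetical and only illustrate the idea; this is not the API that klog ended up with.

```go
package main

import "fmt"

// backtraceConfig collects the settings chosen through options.
// Every name in this sketch is hypothetical.
type backtraceConfig struct {
	all  bool // dump all goroutines instead of just the current one
	skip int  // number of innermost frames to omit
	size int  // maximum number of frames to record; 0 means no limit
}

// BacktraceOption modifies a backtraceConfig.
type BacktraceOption func(*backtraceConfig)

// BacktraceAll selects all goroutines when all is true.
func BacktraceAll(all bool) BacktraceOption {
	return func(c *backtraceConfig) { c.all = all }
}

// BacktraceSkip omits the given number of innermost frames.
func BacktraceSkip(skip int) BacktraceOption {
	return func(c *backtraceConfig) { c.skip = skip }
}

// BacktraceSize limits how many frames get recorded.
func BacktraceSize(size int) BacktraceOption {
	return func(c *backtraceConfig) { c.size = size }
}

// newBacktraceConfig applies the options in order, so later options win.
func newBacktraceConfig(opts ...BacktraceOption) backtraceConfig {
	var c backtraceConfig
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newBacktraceConfig(BacktraceSkip(1), BacktraceSize(5))
	fmt.Printf("%+v\n", c)
}
```

A new setting then only needs one more option function; existing callers keep compiling unchanged.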
I have a couple of questions.
While looking at the implementation specifics, BacktraceAll with a depth presents a problem. The depth/numFrames-limited variants for the current goroutine can be implemented with runtime.Callers, which returns the program counters of the calling goroutine's stack. But I couldn't locate a similar API for all goroutines. The only API which returns the stacks of all goroutines is runtime.Stack(buf, true), which doesn't have a depth option. We need to explore other APIs or packages, or roll our own text processing to keep things standardized, if we need BacktraceAllDepth. |
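To make the asymmetry concrete, here is a rough sketch using only the Go standard library. The helper names currentStack and allStacks are made up for this illustration. runtime.Callers takes a slice whose length bounds the depth, while runtime.Stack only fills a byte buffer with pre-formatted text and has no depth parameter.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentStack returns at most depth frames of the calling goroutine,
// starting with the caller of currentStack. Hypothetical helper.
func currentStack(depth int) []string {
	if depth <= 0 {
		return nil
	}
	pcs := make([]uintptr, depth)
	// Skip two frames: runtime.Callers itself and currentStack.
	n := runtime.Callers(2, pcs)
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var out []string
	for len(out) < depth {
		frame, more := frames.Next()
		out = append(out, fmt.Sprintf("%s %s:%d", frame.Function, frame.File, frame.Line))
		if !more {
			break
		}
	}
	return out
}

// allStacks returns the formatted stacks of all goroutines.
// The buffer is doubled until the dump fits; there is no way to limit depth.
func allStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	for _, line := range currentStack(2) {
		fmt.Println(line)
	}
	fmt.Println(len(allStacks()), "bytes of stack dump for all goroutines")
}
```

Limiting the depth for all goroutines would therefore mean post-processing the text returned by runtime.Stack.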
With
No, it will limit how far unwinding goes. It will still start at the usual stack frame. For numFrames = 1, only a single stack frame gets recorded. This will be useful for testing: the direct caller and (depending on the test code) a few levels above it are fixed, but go too far and we end up in the Go standard library with unknown source code locations. In production it may be useful to avoid excessively large stack backtraces.
All goroutines, instead of just the current one. We can rename to "BacktraceAllGoroutines" to make this clearer. |
Will the size do the opposite, i.e. return the top numFrames?
No, it will limit how far unwinding goes. It will still start at the usual
stack frame. For numFrames = 1, only a single stack frame gets recorded.
Yeah that's what I meant.
What does the "all" bool parameter mean in BacktraceAll?
All goroutines, instead of just the current one. We can rename to
"BacktraceAllGoroutines" to make this clearer.
I understand what "all" means. Doesn't the BacktraceAll API itself imply that
"all" is enabled? Why do we need the bool parameter? A sample implementation
[1] is below to better illustrate the question.
Do skip/size make sense for BacktraceAll? Can we ignore the skip/size
parameters if "all" is specified and just return the entire backtrace of
all goroutines?
[1]
// BacktraceAll enables dumping the stacks of all goroutines.
func BacktraceAll() BacktraceOption {
	return func(b *Backtrace) {
		b.all = true
	}
}

// BacktraceOption configures a Backtrace.
type BacktraceOption func(*Backtrace)

// Backtrace holds the settings for capturing a stack trace.
type Backtrace struct {
	all       bool
	numFrames int  // the semantics of this depend on selectTop below
	selectTop bool // true: select the top numFrames; false: drop the top numFrames and select the rest ("select" itself is a Go keyword)
	// Some data that represents the trace
}
|
This is open for debate. gRPC doesn't use a parameter (for example, WithDisableRetry). In zapr, I used boolean parameters and Tim was fine with that. |
Ah, Thanks for the context. I will keep the bool parameter for
BacktraceAllGoroutines.
…On Fri, Apr 1, 2022 at 2:11 PM Patrick Ohly ***@***.***> wrote:
BacktraceAllGoroutines has a parameter because it is sometimes convenient to
write code like this:
allGoroutines := <some expression>
klog.ErrorS(err, "fatal error", "callstack", klog.Backtrace(klog.BacktraceAllGoroutines(allGoroutines)))
This is open for debate. gRPC doesn't use a parameter (for example,
WithDisableRetry
<https://pkg.go.dev/google.golang.org/grpc#WithDisableRetry>). In zapr, I
used boolean parameters and Tim was fine with that
<go-logr/zapr#37 (comment)>.
|
I modified the BacktraceAll API signature to reflect the fact that it uses a different mechanism. Also, skip/size will not apply to it. I added a simple implementation as a reference in #323 to take this forward.
|
Now |
One of the reasons why I chose this API setup is because "All" is incompatible with skip and size. So we now need to educate the users to not use "All" in conjunction with skip/size, define the behavior if they do so and implement validation if such a combination is used. The following should raise an error:
I found it cleaner to communicate the difference to the user by virtue of the API itself, rather than through documentation and validation. If BacktraceAll(false) is used, it will simply call Backtrace() in the background. The only caveat is that the "All" APIs might change to support skip/size in the future, but I am not sure if that makes sense at all. Anyway, I am fine with both approaches. If we decide to use BacktraceAll as an option, we need to decide whether we should return an error when the user passes an invalid combination. |
We can simply define that skip/size don't have an effect when combined with All.
It's not just that. I still prefer a single, extensible API function. |
Do we need to issue a warning to the user that skip/size is ignored? Maybe mention that at the head of the log message.
Sure, I will make it a single API. |
I wouldn't mention it. If someone intentionally does it (for example, because "all" gets added based on some flag), then such a warning would just be noise. If it was unintentional, then it will be obvious from the output that they were skipped. |
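The rule discussed here (skip/size have no effect when combined with "all", and no warning or error is produced) can be sketched as a small normalization step. The settings type and the effective function are hypothetical names for this illustration only.

```go
package main

import "fmt"

// settings is a hypothetical bundle of backtrace parameters.
type settings struct {
	all  bool
	skip int
	size int
}

// effective returns the settings that would actually be applied.
// When all is set, skip and size are dropped silently: no error, no warning.
func effective(s settings) settings {
	if s.all {
		return settings{all: true}
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", effective(settings{all: true, skip: 2, size: 5}))
	fmt.Printf("%+v\n", effective(settings{skip: 2, size: 5}))
}
```

With this rule a caller can add "all" based on a flag without having to remove the other options first.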
I have updated the APIs to reflect the discussion. I still need to add unit tests and will do that in the next patch. |
/close Resolved by: |
@pohly: Closing this issue. In response to this:
|
/kind bug
What steps did you take and what happened:
#79 changed the behavior of klog.Fatal so that it prints all goroutines when writing to stderr (the default). Depending on the program, this generates a huge amount of output, which is often harmful (console scrollback buffer overflows, large log files in Kubernetes Prow jobs).
What did you expect to happen:
Only the current goroutine should be printed. This is the original behavior that several users of klog are expecting.
Anything else you would like to add:
A simple revert of #79 fixes this.
However, we can do better than that. In Kubernetes, we already recommend replacing klog.Fatal with corresponding structured logging calls like klog.ErrorS plus klog.FlushAndExit. That replacement no longer dumps any stack trace. Often this is the right choice, for example after a lost leader election (one of the places where the program has to exit).
But sometimes, a developer might want to dump the current backtrace or all backtraces in the log entry. We could add a helper function for this: