-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow printing task backtraces via profiling peek mechanism #56043
base: master
Are you sure you want to change the base?
Conversation
base/Base.jl
Outdated
# Needs to be in Base because Profile is no longer loaded on boot | ||
function profile_printing_listener(cond::Base.AsyncCondition) | ||
profile = nothing | ||
try | ||
while _trywait(cond) | ||
profile = @something(profile, require_stdlib(PkgId(UUID("9abbd945-dff8-562f-b5e8-e1ebf5ef1b79"), "Profile")))::Module | ||
invokelatest(profile.peek_report[]) | ||
if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", false) === true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth making this the default even
if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", false) === true | |
if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", true) === true |
It would also potentially be great to put this in signals-unix.c : signal_listener
directly, so it operates on fatal signals too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the PR so that we print the backtraces by default 👍 As for calling this from the signal handler directly -- do you mean we should do also print these for different signals than SIGUSR1/SIGINFO? I think I'd need some guidance with that as I'm not sure which signals would be suitable to be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: at least for us, with our huge sysimg, this can be really slow. Like >5 minutes slow.
So if that impacts others too, do we want to do this by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does system image size affect this? Yes, that would be the list of all "critical" signals defined there. There's already a place we call jl_print_bt_entry_codeloc
on the threads, so it would just tack on right after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We believe the reason is because symbolizing the traces is super slow with our giant binary size. But to be honest i think we don't know why it's so slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, gotcha, that does make sense. Symbolizing something big (such as the LLVM debug info when asserts are on) can take quite some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although that happens in parallel with the runtime, so I think that still might be okay, for something the user has to request pretty explicitly
# Needs to be in Base because Profile is no longer loaded on boot | ||
function profile_printing_listener(cond::Base.AsyncCondition) | ||
profile = nothing | ||
try | ||
while _trywait(cond) | ||
profile = @something(profile, require_stdlib(PkgId(UUID("9abbd945-dff8-562f-b5e8-e1ebf5ef1b79"), "Profile")))::Module | ||
invokelatest(profile.peek_report[]) | ||
if Base.get_bool_env("JULIA_PROFILE_PEEK_TASK_BACKTRACES", false) === true | ||
println(stderr, "Printing Julia task backtraces...") | ||
ccall(:jl_print_task_backtraces, Cvoid, ()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prints to a different abstraction over stderr than the other println calls here, so may need to explicitly call flush around it to ensure the output doesn't get jumbled
Aside, it can also be better to use print(stderr, "<text>\n")
instead of println
where reliable async behavior is desired, since it ensures the whole text prints in the same syscall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very helpful, thanks!
db82a3c
to
460bab6
Compare
460bab6
to
0998426
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditional approval, since it would probably be better to print this async (in signal_listener) immediately instead of waiting for 1 second and then printing just before the next next yield point finally returns to the current Task (here), which could be a very long delay
This would be useful e.g. for debugging stuck tests run via
ReTestItems.jl
(which uses multiple Julia processes to run the tests)