Signal handler is unsafe #36

Open
umanwizard opened this issue Aug 11, 2020 · 2 comments

Comments

@umanwizard
Contributor

In perf_signal_handler, backtrace::trace_unsynchronized is called. This does not cause bugs if the user is only using pprof-rs, since a lock is taken, so the main body of perf_signal_handler cannot be executed more than once at a time.

However, if the user calls backtrace::trace from any other part of the code at the same time, the result is undefined behavior (UB).
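
To illustrate the reasoning above, here is a minimal sketch of a "skip if busy" guard built on an atomic flag. The names SAMPLING and with_sampling_guard are hypothetical and this is not pprof-rs's actual locking code; it only shows why such a guard protects callers that go through it and nobody else.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical guard flag (an assumption for illustration; the real
/// lock used by pprof-rs may look different).
static SAMPLING: AtomicBool = AtomicBool::new(false);

/// Runs `sample` only if no other guarded sample is in progress.
/// Returns `None` when the sample was skipped.
fn with_sampling_guard<T>(sample: impl FnOnce() -> T) -> Option<T> {
    // A single atomic compare-and-swap: it never blocks and never allocates.
    if SAMPLING
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        // Another guarded sample is already running, so skip this one.
        return None;
    }
    let result = sample();
    // Release the guard so that the next sample can run.
    SAMPLING.store(false, Ordering::Release);
    Some(result)
}
```

A sample that starts while another guarded sample is running is skipped. Code that calls the unwinder directly, without going through the guard, is not affected by the flag at all, which is the gap described above.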

I suspect (but I'm not sure) that this is why we are seeing deadlocks in https://github.com/MaterializeInc/materialize when using both jemalloc heap profiling and pprof-rs profiling at the same time.

@YangKeao
Member

YangKeao commented Aug 12, 2020

Yes. I mentioned this in the README. (Oops, it seems it was not clear enough.)

Unfortunately, there is no 100% robust stack tracing method. Some related research has been done by gperftools. pprof-rs uses backtrace-rs, which ultimately uses the libunwind provided by libgcc.

WARN: as described in earlier gperftools documentation, the libunwind provided by libgcc is not signal safe.

libgcc's unwind method is not safe to use from signal handlers. One particular cause of deadlock is when profiling tick happens when the program is propagating thrown exception.

If the signal arrives while the program is already getting a backtrace through libgcc (for sampling, profiling, error handling, and so on), the result is hard to predict (sometimes it crashes outright). A possible solution (in my imagination 😸) is to scan for and record the address range of libgcc. In the signal handler, we can then check whether the interrupted context (register rip) lies inside libgcc. If it does, pprof-rs can skip this sample. But as I am busy with other projects, I have no time to try this method these days 😞.

But this is also not 100% perfect: libgcc's unwind can call into other libraries, so it's hard to tell whether the current context is inside an unwind call without getting a backtrace.
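
The address-range idea above could be sketched roughly as follows. This is an untested illustration, not an implementation in pprof-rs: the function names exec_ranges and ip_in_ranges are hypothetical, and it assumes the ranges are read from text in the format of Linux's /proc/self/maps before the signal handler is installed. Reading the interrupted rip from the signal context needs platform-specific code and is left out.

```rust
/// Collects the executable address ranges of mappings whose path contains
/// `needle`, from text in the format of Linux's /proc/self/maps.
/// This allocates, so it must run before the handler is installed,
/// never inside it.
fn exec_ranges(maps: &str, needle: &str) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    for line in maps.lines() {
        let mut fields = line.split_whitespace();
        let (Some(range), Some(perms)) = (fields.next(), fields.next()) else {
            continue;
        };
        // Skip offset, device and inode; the next field is the path.
        // Anonymous mappings have no path. (Paths containing spaces are
        // cut at the first space, which is fine for a substring match.)
        let path = fields.nth(3).unwrap_or("");
        if !perms.contains('x') || !path.contains(needle) {
            continue;
        }
        if let Some((start, end)) = range.split_once('-') {
            if let (Ok(s), Ok(e)) = (
                u64::from_str_radix(start, 16),
                u64::from_str_radix(end, 16),
            ) {
                ranges.push((s, e));
            }
        }
    }
    ranges
}

/// Returns true if `ip` falls inside any half-open range [start, end).
/// It only reads a precomputed slice, so it is the part intended to be
/// called from the handler.
fn ip_in_ranges(ranges: &[(u64, u64)], ip: u64) -> bool {
    ranges.iter().any(|&(start, end)| start <= ip && ip < end)
}
```

In use, the handler would skip the sample whenever ip_in_ranges returns true for the interrupted instruction pointer. The check only sees the innermost frame, so it is subject to the limitation already noted above.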

@umanwizard
Contributor Author

Thank you for the detailed response. I think the best solution is just to turn off anything else that might be getting a backtrace (e.g. jemalloc) while using pprof-rs.
