Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid deadlock by skipping sampling in libc, libgcc and pthread #85

Merged
merged 11 commits into from
Nov 1, 2021

Conversation

YangKeao
Copy link
Member

Signed-off-by: YangKeao yangkeao@chunibyo.icu

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
@YangKeao YangKeao force-pushed the avoid-deadlock-inside-backtrace branch from 6aab8f3 to ff4e24e Compare October 26, 2021 05:28
@YangKeao YangKeao requested a review from BusyJay October 27, 2021 09:37
@BusyJay
Copy link
Member

BusyJay commented Oct 27, 2021

Please fix warnings. I'm afraid this can mislead users. For example, symbols like pthread_cond_xxx will be skipped while they occupy quite some samples in a typical profile.

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
@BusyJay
Copy link
Member

BusyJay commented Oct 29, 2021

I can see there are still warnings in the changes.

@YangKeao
Copy link
Member Author

I can see there are still warnings in the changes.

Fixed 😺

@YangKeao
Copy link
Member Author

Please fix warnings. I'm afraid this can mislead users. For example, symbols like pthread_cond_xxx will be skipped while they occupy quite some samples in a typical profile.

That's why it is a feature. I left the choice to the user to trade off the deadlock possibility and the accuracy of the flamegraph

@YangKeao YangKeao requested review from BusyJay and removed request for BusyJay October 29, 2021 05:30
@BusyJay
Copy link
Member

BusyJay commented Oct 29, 2021

Can you explain why there will be deadlocks?

@YangKeao
Copy link
Member Author

YangKeao commented Oct 29, 2021

Can you explain why there will be deadlocks?

...
pthread_mutex_lock                          libpthread
dl_iterate_phdr                                     libc.so
_Unwind_Backtrace                            libgcc_s.so
Backtrace::trace_unsynchronized()
...
<---- Signal Arrive
...
dl_iterate_phdr                                     libc.so
_Unwind_Backtrace                            libgcc_s.so
Backtrace::new()
...

Because Backtrace::new() will call dl_iterate_phdr, in which there is a mutex. If a signal arrives right after it acquires the lock, this lock will never be released and this thread will be dead. In tikv-server, when some errors are generated, it will be equipped with a Backtrace, which is the most frequenctly situation where this deadlock may occur.

This lock is acquired during some logic inside the _Unwind_Backtrace and dl_iterate_phdr, so libgcc_s.so and libc.so are all required to be included in the blacklist.

Actually trying to get a lock in the signal handler is (theoretically) unsafe anyway, this PR is only a possible (practical) solution.

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
@YangKeao YangKeao force-pushed the avoid-deadlock-inside-backtrace branch from 0ce2dd4 to 59bad74 Compare October 29, 2021 05:45
@YangKeao
Copy link
Member Author

I've thought about shrinking the blacklist by only skipping some symbols (but not whole library).

However, it's only safe to get the ip address of the nearest frame of the signal, but not a full backtrace, so trying to find all possible symbol of the nearest "deadlock" frame is quite complicated (and not stable).

@YangKeao
Copy link
Member Author

@BusyJay PTAL

src/profiler.rs Outdated
@@ -23,6 +26,9 @@ pub struct Profiler {
sample_counter: i32,

running: bool,

#[cfg(all(feature = "ignore-libc", target_arch = "x86_64", target_os = "linux"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a feature seems over complicated to me. Blacklist should be a common requirement. Though I don't have strong opinion on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should make it configurable through an argument, but not a compile-time feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BusyJay Now I added a ProfilerGuardBuilder to set the configuration (though the ProfilerGuardBuilder sounds chaos).

src/profiler.rs Outdated
@@ -149,14 +173,60 @@ extern "C" fn perf_signal_handler(_signal: c_int) {
}
}

#[cfg(all(feature = "ignore-libc", target_arch = "x86_64", target_os = "linux"))]
const SHLIB_BLACKLIST: [&str; 3] = ["libc", "libgcc_s", "libpthread"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about binaries that linked statically?

Copy link
Member Author

@YangKeao YangKeao Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have to be handle case by case (for different libc implementation). The musl implementation of dl_iterate_phdr doesn't have a lock, so the _Unwind_Backtrace will not deadlock at this point.

But I'm not familiar with other static libc implementation.

@YangKeao YangKeao force-pushed the avoid-deadlock-inside-backtrace branch 2 times, most recently from 181bdbb to 6e4cfea Compare October 29, 2021 09:14
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
@YangKeao YangKeao force-pushed the avoid-deadlock-inside-backtrace branch from 6e4cfea to 1b7b26c Compare October 29, 2021 09:25
Copy link
Member

@BusyJay BusyJay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it will still deadlock on Arm, right?

src/profiler.rs Outdated Show resolved Hide resolved
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
@YangKeao YangKeao force-pushed the avoid-deadlock-inside-backtrace branch 2 times, most recently from 39518c8 to 911afb8 Compare October 29, 2021 11:58
@YangKeao YangKeao force-pushed the avoid-deadlock-inside-backtrace branch 2 times, most recently from fac6459 to cbfe7ea Compare October 29, 2021 12:43
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
@YangKeao YangKeao force-pushed the avoid-deadlock-inside-backtrace branch from cbfe7ea to 1fc974a Compare October 29, 2021 12:48
@YangKeao
Copy link
Member Author

YangKeao commented Oct 29, 2021

So it will still deadlock on Arm, right?

@BusyJay This PR supports ARM now, but the pprof-rs don't 😭 . #75 describes this problem. A rough investigation on that issue is about the arm special signal frame, but I haven't found a way to resolve it.

@YangKeao
Copy link
Member Author

YangKeao commented Nov 1, 2021

@BusyJay PTAL

Copy link
Member

@BusyJay BusyJay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

src/profiler.rs Outdated Show resolved Hide resolved
Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
@YangKeao YangKeao requested a review from BusyJay November 1, 2021 07:03
@sticnarf
Copy link
Contributor

sticnarf commented Nov 2, 2021

A simpler solution might be hacking dl_iterate_phdr. Set a thread local flag when dl_iterate_phdr is called, then we can skip the sample by checking the thread local flag.

YangKeao added a commit that referenced this pull request Nov 25, 2021
* avoid deadlock by skipping sampling in libc, libgcc and pthread

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* make clippy happy

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* skip libc only for linux

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* add null check in the signal handler

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* add more cfg to remove blacklist on non-linux platform

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* fix clippy warning for test

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* fix clippy

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* use a configurable option to set blacklist

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* add document and don't reset blacklist_segments

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* support aarch64

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

* replace blacklist into blocklist

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants