Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support trigger action on signal #340

Closed
namhyung opened this issue Apr 9, 2018 · 40 comments
Closed

support trigger action on signal #340

namhyung opened this issue Apr 9, 2018 · 40 comments
Labels
Milestone

Comments

@namhyung
Copy link
Owner

namhyung commented Apr 9, 2018

Sometimes it needs to trace a long-running process or daemon and it might be hard to setup a trigger on a specific function. Add a signal trigger to control tracing behavior with signaling. Currently trace-on, trace-off and finish actions are supported only.

The code is available at review/signal-trigger-v1.

Example:

$ uftrace --disable --signal SIGUSR1@trace-on --signal SIGUSR2@finish --keep-pid  mydaemon
@atwufei
Copy link

atwufei commented May 22, 2018

Hi @namhyung , I tried it but seemed not work. The process terminated after sending USR2 signal.

$ uftrace record --disable --signal SIGUSR1@trace-on --signal SIGUSR2@finish ./bin/mysqld --defaults-file=~/work/mysql/my.cnf --gdb
child terminated by signal: 12: User defined signal 2

@namhyung
Copy link
Owner Author

@atwufei thanks for testing! Could you please confirm that it's because of SIGUSR2 or the finish trigger? You could do the above with swapping the two signals..

@atwufei
Copy link

atwufei commented May 23, 2018

I swapped the signals, it terminated as long as I sent SIGUSR2, SIGUSR1 had no problem. I also change SIGUSR2 to other signals such as SIGXCPU, SIGPROF, they behaved the same as SIGUSR2.

@namhyung
Copy link
Owner Author

There's a possibility that mysqld has its own handlers for those signals and it overwrites handlers in uftrace. What about real-time signals like SIGRTMIN+1?

@atwufei
Copy link

atwufei commented May 23, 2018

That's possible. I tried this but uftrace didn't get any data:

$ uftrace record --disable --signal SIGUSR1@trace-on --signal SIGRTMIN+1@finish ./bin/mysqld --defaults-file=~/work/mysql/my.cnf
$ kill -USR1 `pidof mysqld`
$ connect to mysqld and run a sql
$ kill -33 `pidof mysqld`
$ uftrace replay

@namhyung
Copy link
Owner Author

The SIGRTMIN can be a different number. I see the below on my system:

$ kill -l | grep RTMIN
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12

@atwufei
Copy link

atwufei commented May 24, 2018

kill -SIGRTMIN+1 (35 actually), then mysqld terminated. I need to figure out how mysqld handles signals first.

@namhyung
Copy link
Owner Author

Pushed rebased version to review/signal-trigger-v2

@namhyung
Copy link
Owner Author

Pushed to review/signal-trigger-v3.

Changelog:

  • cleanup finish routine
  • add test code

Example:

$ uftrace -a --signal SIGUSR1@finish tests/t-signal
# DURATION     TID     FUNCTION
            [ 13624] | main() {
   0.172 us [ 13624] |   foo();
   2.760 us [ 13624] |   signal(SIGUSR1, &sighandler) = 0x7fe409c85fd0;
            [ 13624] |   raise() {
            [ 13624] |     sighandler() {
   0.109 us [ 13624] |       bar();
            [ 13624] |       /* linux:task-exit */

uftrace stopped tracing with remaining functions
================================================
task: 13624
[2] sighandler
[1] raise
[0] main

@honggyukim
Copy link
Collaborator

Thanks for doing this work, but it seems that the signal is not properly catched by uftrace.

I've made an example that keeps running GC operations and runs it on node. Here is my testing setup:

$ uftrace record --disable --signal SIGUSR1@trace_on --signal SIGUSR2@finish ./node --expose-gc -e 'for (var i = 0; i < 1000; i++) gc();'

But after running this, when it gets SIGUSR1, it abnormally finished as follows.

Debugger listening on ws://127.0.0.1:9229/cea5f90a-1e51-4e7d-bfdb-2d796ad8d909
For help, see: https://nodejs.org/en/docs/inspector
child terminated by signal: 12: User defined signal 2

SIGUSR2 doesn't work either. Please let me know if I need to change something. Thanks!

@honggyukim
Copy link
Collaborator

One more request is that it'd be better to make it possible to use signal number as well. I would like to use the following method.

$ uftrace -a --signal 10@finish tests/t-signal

It took me some time to write the long signal name to send signal to it, so would like to simply send the number. It'd be much easier to use the same number for both uftrace and external signal send method.

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

@namhyung
Copy link
Owner Author

Could you please verify if the node installed new signal handlers? You can set filters on the signal functions.

@namhyung
Copy link
Owner Author

I did it on my machine and found node uses its own signal handler (overwriting ours)..

$ uftrace --caller-filter ^sig -A ^sig --no-event nodejs/node -e ''
# DURATION     TID     FUNCTION
            [ 24624] | main() {
            [ 24624] |   node::Start() {
            [ 24624] |     node::PlatformInit() {
   3.419 us [ 24624] |       sigemptyset(0x7ffe93da3400);
   1.767 us [ 24624] |       sigaddset(0x7ffe93da3400, SIGUSR1);
   3.234 us [ 24624] |       sigaction(SIGHUP, 0x7ffe93da3480, 0);
   1.440 us [ 24624] |       sigaction(SIGINT, 0x7ffe93da3480, 0);
   1.293 us [ 24624] |       sigaction(SIGQUIT, 0x7ffe93da3480, 0);
   1.287 us [ 24624] |       sigaction(SIGILL, 0x7ffe93da3480, 0);
   1.310 us [ 24624] |       sigaction(SIGTRAP, 0x7ffe93da3480, 0);
   1.389 us [ 24624] |       sigaction(SIGABRT, 0x7ffe93da3480, 0);
   1.300 us [ 24624] |       sigaction(SIGBUS, 0x7ffe93da3480, 0);
   1.290 us [ 24624] |       sigaction(SIGFPE, 0x7ffe93da3480, 0);
   1.309 us [ 24624] |       sigaction(SIGUSR1, 0x7ffe93da3480, 0);
   1.290 us [ 24624] |       sigaction(SIGSEGV, 0x7ffe93da3480, 0);
   1.284 us [ 24624] |       sigaction(SIGUSR2, 0x7ffe93da3480, 0);
   1.600 us [ 24624] |       sigaction(SIGPIPE, 0x7ffe93da3480, 0);
   1.320 us [ 24624] |       sigaction(SIGALRM, 0x7ffe93da3480, 0);
   1.290 us [ 24624] |       sigaction(SIGTERM, 0x7ffe93da3480, 0);
   1.290 us [ 24624] |       sigaction(SIGSTKFLT, 0x7ffe93da3480, 0);
   1.333 us [ 24624] |       sigaction(SIGCHLD, 0x7ffe93da3480, 0);
   1.310 us [ 24624] |       sigaction(SIGCONT, 0x7ffe93da3480, 0);
   1.330 us [ 24624] |       sigaction(SIGTSTP, 0x7ffe93da3480, 0);
   1.306 us [ 24624] |       sigaction(SIGTTIN, 0x7ffe93da3480, 0);
   1.290 us [ 24624] |       sigaction(SIGTTOU, 0x7ffe93da3480, 0);
   1.340 us [ 24624] |       sigaction(SIGURG, 0x7ffe93da3480, 0);
   1.317 us [ 24624] |       sigaction(SIGXCPU, 0x7ffe93da3480, 0);
   1.280 us [ 24624] |       sigaction(SIGXFSZ, 0x7ffe93da3480, 0);
   1.317 us [ 24624] |       sigaction(SIGVTALRM, 0x7ffe93da3480, 0);
   1.393 us [ 24624] |       sigaction(SIGPROF, 0x7ffe93da3480, 0);
   1.326 us [ 24624] |       sigaction(SIGWINCH, 0x7ffe93da3480, 0);
   1.313 us [ 24624] |       sigaction(SIGPOLL, 0x7ffe93da3480, 0);
   1.280 us [ 24624] |       sigaction(SIGPWR, 0x7ffe93da3480, 0);
   1.287 us [ 24624] |       sigaction(SIGSYS, 0x7ffe93da3480, 0);
   2.347 us [ 24624] |       sigfillset(0x7ffe93da35c8);
   1.364 us [ 24624] |       sigaction(SIGINT, 0x7ffe93da35c0, 0);
   0.277 us [ 24624] |       sigfillset(0x7ffe93da3528);
   1.363 us [ 24624] |       sigaction(SIGTERM, 0x7ffe93da3520, 0);
  91.041 us [ 24624] |     } /* node::PlatformInit */
            [ 24624] |     node::Start() {
            [ 24624] |       node::inspector::Agent::Start() {
   1.700 us [ 24624] |         sigfillset(0x7ffe93da2620);
            [ 24624] |         node::RegisterSignalHandler() {
   0.670 us [ 24624] |           sigfillset(0x7ffe93da2478);
   2.567 us [ 24624] |           sigaction(SIGUSR1, 0x7ffe93da2470, 0);
   5.123 us [ 24624] |         } /* node::RegisterSignalHandler */
   0.540 us [ 24624] |         sigemptyset(0x7ffe93da2620);
   0.374 us [ 24624] |         sigaddset(0x7ffe93da2620, SIGUSR1);
 259.138 us [ 24624] |       } /* node::inspector::Agent::Start */
   1.750  s [ 24624] |     } /* node::Start */
   1.752  s [ 24624] |   } /* node::Start */
   1.752  s [ 24624] | } /* main */

@honggyukim
Copy link
Collaborator

Thanks for finding it out. But I was trying to use SIGRTMAX and SIGRTMAX-1, but they didn't work as well.

@honggyukim
Copy link
Collaborator

Thanks for the work. I have some requests to print the below info.

  • the tid of target process when --signal is used, when uftrace begins.
  • tracing on and off info, when uftrace receives given signal.

I think it'd be better to keep caller path even when tracing is disabled. This info can be useful when tracing is on in the middle of call stack. I think it'd be really nice to be similar to -C/--caller-filter.

Lastly, would it be possible to provide a simpler way for the current method?

--disable --signal SIGUSR1@trace_on --signal SIGUSR2@finish

@namhyung
Copy link
Owner Author

I guess it's working with SIGRTMIN/MAX now, right?

Regarding the print, I'm ok with showing pid at the beginning as it's printed before child is running. But not sure about signal receive since it could mess up the output from child. Maybe we could use pr_dbg() for that (so users can see with -v option).

And keeping caller looks a good idea - I'll think about how to do it.

Lastly, maybe we could remove 'SIG' prefix.. how about this?

$ uftrace --disable --signal USR1=trace-on,USR2=finish

@honggyukim
Copy link
Collaborator

I guess it's working with SIGRTMIN/MAX now, right?

Yes, it's working fine with SIGRTMAX-1 and SIGRTMAX-2

Regarding the print, I'm ok with showing pid at the beginning as it's printed before child is running.

Good.

But not sure about signal receive since it could mess up the output from child. Maybe we could use pr_dbg() for that (so users can see with -v option).

You're right. I'm fine with printing it using pr_dbg().

And keeping caller looks a good idea - I'll think about how to do it.

Thanks!

Lastly, maybe we could remove 'SIG' prefix.. how about this?
$ uftrace --disable --signal USR1=trace-on,USR2=finish

I'm fine with it, but I'd be good to accept a multiple way using SIGUSR1, USR1, and its signal number as well.

@namhyung
Copy link
Owner Author

Pushed review/signal-trigger-v4

Changelog:

  • print pid of target process at the beginning
  • allow multiple way of signal spec (including number)
  • fix to handle SIGRTMIN and SIGRTMAX properly

Note:

  • I decided to change behavior of 'trace-on/off' (keeping callers) as a separate work
  • the old code already had pr_dbg() in the signal handlers
  • the signal spec keeps the original form (single signal + action separated by "@") in order to share the code with (function) triggers, so it should be like:
$ uftrace --disable --signal USR1@trace-on --signal USR2@finish

@honggyukim
Copy link
Collaborator

Thanks for the work! I found that it works fine with the below test:

$ uftrace record --disable --signal RTMAX-1@trace-on --signal RTMAX@finish ./node --expose-gc -e 'for (var i = 0; i < 1000; i++) gc();'

But I'm not sure if the below usage is wrong. It shows that parsing signal is failed.

$ uftrace record --disable --signal 63@trace-on --signal 64@finish ./node --expose-gc -e 'for (var i = 0; i < 1000; i++) gc();'
uftrace: install signal handlers to task 174247
failed to parse signal: 63
failed to parse signal: 64
child terminated by signal: 63: Real-time signal 29

And it may not be directly related to this work but I'm just wondering if the name of finish can be changed or added to detach. I sometimes misunderstand it as finishing the target process.

@namhyung
Copy link
Owner Author

Oh, I forgot to handle numbers for real-time signals, will fix. Also I agree the name is confusing. What about trace-finish?

@honggyukim
Copy link
Collaborator

Hmm.. trace-finish is okay but I'm worrying about it's getting too long to use signal feature. detach is more clear for me. It can be a pair with attach in the future.

@honggyukim
Copy link
Collaborator

It mostly works fine but somestimes get segfault, but I don't know how to reproduce it.

$ uftrace record --disable --signal RTMAX-1@trace-on --signal RTMAX-2@finish ./node --expose-gc -e 'for (var i = 0; i < 1000; i++) gc();'
uftrace: install signal handlers to task 176338
child terminated by signal: 11: Segmentation fault

@honggyukim
Copy link
Collaborator

honggyukim commented Nov 30, 2018

This may be a weird usage but would it be possible to send signal directly to uftrace process?
I mean using some signals that can be sent through keyboard - ^C, ^D, or ^\, etc.

$ uftrace record --disable --signal SIGINT@trace-on --signal SIGQUIT@finish ./node --expose-gc -e 'for (var i = 0; i < 1000; i++) gc();'
uftrace: install signal handlers to task 174247
^C
^\

^C triggers trace-on, then ^\ triggers finish in this case. It seems that it works fine even if the target process overrides some signals because the signal is sent to uftrace process, not the target process.

@namhyung
Copy link
Owner Author

namhyung commented Dec 3, 2018

A signal from a controlling terminal will be sent to each process in the foreground process group. The uftrace and target process is in a same process group by default so they will receive the signal separately. If we'd create a new process group, uftrace cannot receive the signal anymore since it'd be in the background.

@namhyung
Copy link
Owner Author

Pushed review/signal-trigger-v5

Changelog:

  • fix race between mcount_exit() and signal handler (finish trigger)
  • handle number in range of RT signals (34 ~ 64)

@honggyukim
Copy link
Collaborator

Thanks for the update. I will test it again when I have time on it.

@namhyung
Copy link
Owner Author

namhyung commented Jan 3, 2019

Pushed review/signal-trigger-v6

Changelog:

  • revert commit to make mcount_exit() return 0
  • restrict finish signal trigger processing to safe locations
  • skip mcount functions when restoring return addresses

@honggyukim
Copy link
Collaborator

honggyukim commented Jan 3, 2019

It seems to be working fine so far, but cannot guarantee the safety yet. I will tell you if I find some problems. Thanks!

@namhyung
Copy link
Owner Author

namhyung commented Jan 3, 2019

Note that this version survived from a stress test overnight.

@honggyukim
Copy link
Collaborator

Very cool! Did you repeat trace-on and trace-off during the test?

@namhyung
Copy link
Owner Author

namhyung commented Jan 3, 2019

No, they don't affect runtime behavior except for saving data. I just tested the finish trigger which used to cause segfault or deadlock.

@honggyukim
Copy link
Collaborator

I see. Thanks!

@namhyung
Copy link
Owner Author

namhyung commented Jan 4, 2019

Hmm.. I found a segfault, will take a look.

@namhyung
Copy link
Owner Author

namhyung commented Jan 4, 2019

Pushed review/signal-trigger-v7

Changelog:

  • fix segfault in tailcalls
  • fix segfault in segv_handler

@namhyung
Copy link
Owner Author

Pushed review/signal-trigger-v8

Changelog:

  • fix index in segv_handler()
  • fix PLT hooking return path vs signal on ARM/AArch64 (but still gets occasional crash)

@honggyukim
Copy link
Collaborator

I tested it for chrome and found that it works fine.

I have another request. It'd be really great to have a way to insert external note inside the trace record using signal. It could be the same way as #615.

@honggyukim
Copy link
Collaborator

It can be shown as follows:

$ uftrace replay -f +time
# DURATION     TID        TIMESTAMP       FUNCTION
   2.127 us [ 24141]    31178.248988259 | __monstartup();
   0.689 us [ 24141]    31178.248995043 | __cxa_atexit();
            [ 24141]    31178.248996884 | main() {
            [ 24141]    31178.248997000 | /* external-data: SIGUSR1 received */
            [ 24141]    31178.248997008 |   a() {
            [ 24141]    31178.248997166 |     b() {
            [ 24141]    31178.248997275 |       c() {
   0.972 us [ 24141]    31178.248997420 |         getpid();
   1.798 us [ 24141]    31178.248999073 |       } /* c */
            [ 24141]    31178.248999180 | /* external-data: SIGUSR2 received */
   2.133 us [ 24141]    31178.248999299 |     } /* b */
   2.411 us [ 24141]    31178.248999419 |   } /* a */
   2.634 us [ 24141]    31178.248999518 | } /* main */

@honggyukim
Copy link
Collaborator

If possible, it's also needed to have a way to find unused signal from the target binary because the target program can override the signal.

@namhyung
Copy link
Owner Author

That could be done in the sender side as well - i.e. send signal + get current time (CLOCK_MONOTONIC) + save the message in a text file. After that it could be moved to uftrace.data.

Of course it'd be way more convenient having it in uftrace. Probably I'll add it as a trigger but I'm afraid that it needs some work (getting mtd + current time at least) in the signal handler which can affect the target process - it might crash or deadlock in the worst case.

@honggyukim
Copy link
Collaborator

Okay. It's additional feature, so we can leave it as one of future work items. Thanks.

@namhyung namhyung added this to the v0.9.2 milestone Jan 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants