Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix: ust: app stuck on recv message during UST comm timeout scenario
Observed issue ============== The following scenario lead to the UST thread to be "stuck" on recvmsg on the notify socket. The problem manifest itself when an application is unresponsive during the ustctl_start_session call. Note that the default timeout for ust communication is 5 seconds. # Start an instrumented app ./app gdb lttng-sessiond # put a breakpoint on ustctl_start_session lttng create my_session lttng enable-event -u -a lttng start # The tracepoint should hit. Do not continue. kill -s SIGSTOP $(pgrep app) # Continue lttng-sessiond. sleep 5 # This make sure lttng-sessiond unregister the app from its point of view kill -s SIGCONT $(pgrep app) gdb -p $(pgrep app) thread apply all bt App stack trace: Thread 3 (Thread 0x7fe2c6f58700 (LWP 48172)): #0 __libc_recvmsg (flags=0, msg=0x7fe2c6f56ac0, fd=4) at ../sysdeps/unix/sysv/linux/recvmsg.c:28 #1 __libc_recvmsg (fd=fd@entry=4, msg=msg@entry=0x7fe2c6f56ac0, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:25 #2 0x00007fe2c7a010ba in ustcomm_recv_unix_sock (sock=sock@entry=4, buf=buf@entry=0x7fe2c6f56ea0, len=len@entry=48) at lttng-ust-comm.c:308 #3 0x00007fe2c7a037c3 in ustcomm_register_channel (sock=4, session=session@entry=0x7fe2c0000ba0, session_objd=<optimized out>, channel_objd=<optimized out>, nr_ctx_fields=nr_ctx_fields@entry=0, ctx_fields=<optimized out>, chan_id=0x7fe2 c6f5716c, header_type=0x7fe2c0012b18) at lttng-ust-comm.c:1544 #4 0x00007fe2c7a10787 in lttng_session_enable (session=0x7fe2c0000ba0) at lttng-events.c:444 #5 0x00007fe2c7a0b785 in lttng_session_cmd (objd=1, cmd=128, arg=140611977311672, uargs=0x7fe2c6f57800, owner=0x7fe2c7a5da00 <local_apps>) at lttng-ust-abi.c:576 #6 0x00007fe2c7a07d6d in handle_message (lum=0x7fe2c6f57590, sock=3, sock_info=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1003 #7 ust_listener_thread (arg=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1712 #8 0x00007fe2c7993609 in start_thread (arg=<optimized out>) at pthread_create.c:477 lttng#9 0x00007fe2c78ba293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 ... Cause ===== When the app continues after the timeout from lttng-sessiond side, the actual start_session message is received on the application side then UST, app side, send commands on the notify socket. On lttng-sessiond side, the command is received but no reply is sent. This is due to the fact that the lookup against the ust_app_ht_by_notify_sock hash table (find_app_by_notify_sock) return nothing since the app is unregistered at this point and the hash table node was removed on unregistration. Solution ======== When the app lookup fails, return an error that will trigger the cleanup of the notify socket. Known drawbacks ========= None Note ========= Subsequent error path in reply_ust_register_channel, add_event_ust_registry, and add_enum_ust_registry might lead to the same type of problem since no reply is sent to the app. Still, for those cases the complete application/notify socket should not be destroyed since the error path relate to either a session or a sub object of a session. Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Change-Id: Iea0dc027ca1ee772e84c7e545114f1be69fd1f63 Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
- Loading branch information