Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] fix tsbpd() may deadlock with processCtrlShutdown() #2740

Merged
merged 1 commit into from
Aug 9, 2023

Conversation

gou4shi1
Copy link
Contributor

@gou4shi1 gou4shi1 commented May 17, 2023

I experienced a deadlock issue, the CRcvQueue::worker thread was blocked in processCtrlShutdown()->releaseSynch()->pthread_join() forever:

(gdb) thread 37 
[Switching to thread 37 (LWP 2609)]
#0  0x00007fd2f29cd98d in pthread_join (threadid=140541933123328, thread_return=0x0) at pthread_join.c:90
90	pthread_join.c: No such file or directory.
(gdb) bt
#0  0x00007fd2f29cd98d in pthread_join (threadid=140541933123328, thread_return=0x0) at pthread_join.c:90
#1  0x00007fd2f22a0c33 in std::thread::join() () from /home/proxy_server/lib/libstdc++.so.6
#2  0x000055dd734a4242 in srt::CUDT::releaseSynch (this=this@entry=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:7645
#3  0x000055dd734ad456 in srt::CUDT::updateBrokenConnection (this=this@entry=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:11426
#4  0x000055dd734ad8d0 in srt::CUDT::processCtrlShutdown (this=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:8916
#5  srt::CUDT::processCtrl (this=0x7fd241ee63f8, ctrlpkt=...) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:8994
#6  0x000055dd734c6cda in srt::CRcvQueue::worker_ProcessAddressedPacket (this=0x55dd746c3e90, id=<optimized out>, unit=0xca, addr=...) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/queue.cpp:1476
#7  0x000055dd734c5e21 in srt::CRcvQueue::worker (param=0x55dd746c3e90) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/queue.cpp:1255
#8  0x00007fd2f22a0a00 in ?? () from /home/proxy_server/lib/libstdc++.so.6
#9  0x00007fd2f29cc6ba in start_thread (arg=0x7fd24e7fc700) at pthread_create.c:333
#10 0x00007fd2f1cf551d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) p this->m_SocketID
$1 = 94643394

While the corresponding tsbpd thread of @94643394 was blocked in wait() forever:

(gdb) thread 67
[Switching to thread 67 (LWP 3397)]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fd2f229b0fc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /home/proxy_server/lib/libstdc++.so.6
#2  0x000055dd734a26f1 in srt::sync::CSync::wait (this=<optimized out>) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/sync.h:519
#3  srt::CUDT::tsbpd (param=0x7fd241ee63f8) at /worker/build/dc588689bec99c48/root/external/srt/srtcore/core.cpp:5495
#4  0x00007fd2f22a0a00 in ?? () from /home/proxy_server/lib/libstdc++.so.6
#5  0x00007fd2f29cc6ba in start_thread (arg=0x7fd277fff700) at pthread_create.c:333
#6  0x00007fd2f1cf551d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) p ((srt::CUDT*)param)->m_SocketID
$59 = 94643394

The only reason I can find is that the

CSync::lock_notify_one(m_RcvTsbPdCond, m_RecvLock);

happened while the tsbpd thread was under
InvertedLock unrecv(self->m_RecvLock);

The order was
TsbPd: unlock(m_RecvLock) ->
RcvQ: lock_notify(m_RcvTsbPdCond, m_RecvLock); m_RcvTsbPdThread.join() ->
TsbPd: lock(m_RecvLock); tsbpd_cc.wait()

The proposed solution is simple. We just need to check m_bClosing again after re-locking m_RecvLock:

  • If m_bClosing = true happens before this check, the tsbpd thread will exit after this check.
  • If m_bClosing = true happens after this check, the following lock_notify(m_RcvTsbPdCond, m_RecvLock) will happen after tsbpd_cc.wait() (because of the m_RecvLock), so the tsbpd thread will be woken up and cleared.

@maxsharabayko maxsharabayko added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels May 17, 2023
@maxsharabayko maxsharabayko added this to the v1.6.0 milestone May 17, 2023
@maxsharabayko maxsharabayko modified the milestones: v1.6.0, v1.5.3 Aug 9, 2023
@maxsharabayko maxsharabayko merged commit 7cfe12b into Haivision:master Aug 9, 2023
@gou4shi1 gou4shi1 deleted the fix-tsbpd-deadlock branch August 9, 2023 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants