Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] SIGSEGV in CCryptoControl::sendKeysToPeer #2234

Closed
jeandube opened this issue Jan 26, 2022 · 6 comments
Closed

[BUG] SIGSEGV in CCryptoControl::sendKeysToPeer #2234

jeandube opened this issue Jan 26, 2022 · 6 comments
Assignees
Labels
[core] Area: Changes in SRT library core Priority: Critical Type: Bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@jeandube
Copy link
Collaborator

jeandube commented Jan 26, 2022

A group Caller/Receiver crashed while receiving stream.

Application receiving Broadcast group (ports 9173,9174), retransmitted to call-in peer on port 9956.
Here my command for my own reference:
gdb mxphub
(gdb) run -cr srt://10.65.10.234:9173 -Gt1 -Sea -p passphrase1 -cr srt://192.168.10.234:9174 -G+ -Sea -lt srt://[::]:9956 -Sea -p passphrase2 -D7
Network Interface eth1 (192.168.10.230) is broken and 2nd link attempts reconnection every 8 sec (configured CONNTIMEO).
Not a voluntary test case. Broken eth1 was unnoticed until looking at the trace since stream was playing on monitor up to the crash since Link 1 was working.
According to the thread info this is probably the crashing session.

gdb session trace attached
sendKeysToPeer-SIGSEGV-gdb.txt

SRT v1.4.4 + some changes (branch jeandube:fix-binding-any, commit 7801866).

@jeandube jeandube added the Type: Bug Indicates an unexpected problem or unintended behavior label Jan 26, 2022
@maxsharabayko maxsharabayko added [core] Area: Changes in SRT library core Priority: Critical labels Jan 27, 2022
@maxsharabayko maxsharabayko added this to the v1.4.5 milestone Jan 27, 2022
@maxsharabayko maxsharabayko self-assigned this Jan 27, 2022
@maxsharabayko
Copy link
Collaborator

From the backtrace it looks as if the m_pCryptoControl is NULL (see this=0x0):

(gdb) bt
#0  0x00000000004e8c40 in CCryptoControl::sendKeysToPeer (this=0x0, regen=REGEN_KM)
    at /home/jdube/sandbox/makito2_project/components/vendors/haisrt/srt.git/srtcore/crypto.cpp:425
#1  0x00000000004c6ccc in srt::CUDT::checkSndTimers (this=0x7fcc01ff48, regen=REGEN_KM)
    at /home/jdube/sandbox/makito2_project/components/vendors/haisrt/srt.git/srtcore/core.cpp:5821
#2  0x00000000004ce9e0 in srt::CUDT::processCtrlAck (this=0x7fcc01ff48, ctrlpkt=..., currtime=...)
    at /home/jdube/sandbox/makito2_project/components/vendors/haisrt/srt.git/srtcore/core.cpp:8248

However, (gdb) p *this in the srt::CUDT::checkSndTimers(..) shows the m_pCryptoControl is not NULL.

  m_bTLPktDrop = false, m_pCryptoControl = {
    _M_t = {<std::_Tuple_impl<0ul, CCryptoControl* (...)

There is already a check for NULL. However, m_RcvBufferLock is not locked, while it is used to protect m_pCryptoControl::reset() call 🤯

void srt::CUDT::checkSndTimers(Whether2RegenKm regen)
{
    // (...)
    if (regen || m_SrtHsSide == HSD_INITIATOR)
    {
        // Don't call this function in "non-regen mode" (sending only),
        // if this side is RESPONDER. This shall be called only with
        // regeneration request, which is required by the sender.
->      if (m_pCryptoControl)
            m_pCryptoControl->sendKeysToPeer(regen);
    }
}

I wonder though why is m_bTLPktDrop = false. Was it configured this way? Otherwise, the stack might be corrupted.

@jeandube
Copy link
Collaborator Author

Log shows that test app sets SRTO_TLPKTDROP to 1 for the srt (group) receiver ([1]srt:) but to 0 for the re-sender ([3]srt:).
Both sessions are encrypted. The SIGSEGV happens in thread SRT:RcvQ:w3 which is the re-sender ([3]srt:). The initial analysis was wrong, it is the listener/re-sender that crashes and not the group caller/receiver.

@maxsharabayko
Copy link
Collaborator

The problem is that CUDT::processCtrl(..) can access m_pCryptoControl without any protection, so that another thread can potentially destroy the object via CUDT::closeInternal().
It looks like the CUDT::m_ConnectionLock must protect the m_pCryptoControl, but it is not locked even in CUDT::processData(..).

CRcvQueue::worker_ProcessAddressedPacket(..)
    CUDT::processCtrl(..) // if control packet
        CUDT::processCtrlAck(..)
            m_RecvAckLock lock/unlock
            m_GlobControlLock lock/unlock
            m_StatsLock lock/unlock
            CUDT::checkSndTimers(REGEN_KM)
                if (m_pCryptoControl)        <-- unprotected access !!!
                    m_pCryptoControl->sendKeysToPeer(regen);
    CUDT::processData(..) // if data packet
        m_RcvTsbPdStartupLock lock/unlock
        m_StatsLock lock/unlock
        m_GlobControlLock lock/unlock
        lock m_RcvBufferLock
        m_pRcvBuffer->insert(*i)
        m_pCryptoControl->decrypt(..)    <-- access is protected by m_RcvBufferLock, but not by m_ConnectionLock
        unlock m_RcvBufferLock
        // some further processing
CUDT::closeInternal()
    ScopedLock connectguard(m_ConnectionLock);
    ScopedLock sendguard(m_SendLock);
    ScopedLock recvguard(m_RecvLock);
    // Locking m_RcvBufferLock to protect calling to m_pCryptoControl->decrypt((packet))
    // from the processData(...) function while resetting Crypto Control.
    enterCS(m_RcvBufferLock);
    if (m_pCryptoControl)
        m_pCryptoControl->close();
    leaveCS(m_RcvBufferLock);

maxsharabayko added a commit to maxsharabayko/srt that referenced this issue Apr 20, 2022
@maxsharabayko
Copy link
Collaborator

Can't just lock m_ConnectionLock from the processCtrl(..) function. It would cause a deadlock against the GC thread.

CUDT::processCtrl(..) {
  // Locking m_ConnectionLock here would dead lock with the GC thread.
  CUDT::processCtrlShutdown(..) {
    CUDT::completeBrokenConnectionDependencies {
      lock m_GlobControlLock;
      
CUDTUnited::garbageCollect(..) {
  lock m_GlobControlLock;
  CUDTSocket::breakSocket_LOCKED() {
    CUDT::closeInternal() {
      lock m_ConnectionLock;
  }

@gou4shi1
Copy link
Contributor

gou4shi1 commented Apr 23, 2022

I think the GC thread hold m_GlobControlLock too long, related issue #2252.

@maxsharabayko maxsharabayko modified the milestones: v1.5.0, Next Release May 4, 2022
@maxsharabayko maxsharabayko modified the milestones: Next Release, Backlog Jun 15, 2022
@maxsharabayko
Copy link
Collaborator

This issue has been occasionally fixed by PR #2541 (SRT v1.5.2). The call to checkSndTimers(REGEN_KM) with an unprotected access to CryptoControl was moved from processCtrlAck to processData.

@maxsharabayko maxsharabayko modified the milestones: Backlog, v1.5.2 Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Priority: Critical Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants