Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Fixed: closing socket should mark and signal so that srt_connect call can exit immediately #2032

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

ethouris
Copy link
Collaborator

@ethouris ethouris commented Jun 1, 2021

Fixes #2029

Note that some parts of it have been earlier exported to other PRs, so this one fixes only one remaining problem:

The srt_connect call makes a runaround loop with sending packets and attempting to read packets from the socket (through CRcvQueue::recvfrom) in a hope that it will connect at last. The problem is that for the whole time of running this loop it locks CSocket::m_ControlLock and CUDT::m_ConnectionLock (in this order), which holds the call to srt_close() from execution until this loop exits, and the loop exits only on timeout. The reading from the queue causes reading from socket with 1s maximum waiting time after which the conditions are being rechecked.

Fix:

  1. Added a possibility to interrupt the call to CRcvQueue::recvfrom before the 1s timeout.
  2. The m_bClosing AND m_bBroken flags are checked after reading by CRcvQueue::recvfrom so that the loop can interrupt immediately.
  3. The srt_close call sets the m_bClosing flag and "kicks" the condition for CRcvQueue::recvfrom. This finally leads to immediate exit of the loop, so that srt_close() can continue.

@maxsharabayko
Copy link
Collaborator

Weird...

The following tests FAILED:
	 57 - TestEnforcedEncryption.PasswordLength (SEGFAULT)
	 58 - TestEnforcedEncryption.SetGetDefault (SEGFAULT)
	 87 - Transmission.FileUpload (Failed)
	143 - TestSocketOptions.MinInputBWWrongLen (SEGFAULT)
	147 - TestSocketOptions.StreamIDWrongLen (SEGFAULT)

@maxsharabayko maxsharabayko added [core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior labels Jun 2, 2021
@maxsharabayko maxsharabayko added this to the v1.4.4 milestone Jun 2, 2021
@ethouris
Copy link
Collaborator Author

ethouris commented Jun 2, 2021

Not weird. The problem is bigger than I thought. I found a solution, but it's a dirty workaround, you might not like it.

Mikołaj Małecki added 2 commits June 2, 2021 11:14
…fore changing the flag. Otherwise it would confuse the closing function when used on a connected socket
@maxsharabayko
Copy link
Collaborator

Indeed not sure about this fix.

It looks like if srt_close(..) is called between the start of the CUDT::startConnect(..) and the line with m_bConnecting = true;, then m_bClosing is not set to true, and the hang-up for the SRTO_CONNECTTIMEO would still happen. Although the chances are reduced.

Furthermore, the fix might make even more fragile:

The problem is that this flag shall NOT be set in case
when you have a CONNECTED socket because not only isn't it
not a problem in this case, but also it additionally
turns the socket in a "confused" state in which it skips
vital part of closing itself and therefore runs an infinite
loop when trying to purge the sender buffer of the closing socket.

At the same time, I don't see an easy way to fix issue #2029 without intense refactoring. Unlocking m_ControlLock before s->core().startConnect(target_addr, forced_isn) seems to violate the locking order. 🤔

@maxsharabayko maxsharabayko modified the milestones: v1.4.4, v1.4.5 Aug 18, 2021
@maxsharabayko maxsharabayko modified the milestones: v1.4.5, Next Release Apr 25, 2022
@maxsharabayko maxsharabayko modified the milestones: v1.5.1, Next release Sep 12, 2022
@maxsharabayko maxsharabayko modified the milestones: v1.5.2, Backlog Dec 12, 2022
Copy link

codecov bot commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 97.72727% with 1 line in your changes missing coverage. Please review.

Project coverage is 67.14%. Comparing base (4f925fb) to head (cf3850c).
Report is 11 commits behind head on master.

Current head cf3850c differs from pull request most recent head 9967d1d

Please upload reports for the commit 9967d1d to get more accurate results.

Files Patch % Lines
test/test_file_transmission.cpp 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2032      +/-   ##
==========================================
+ Coverage   64.88%   67.14%   +2.26%     
==========================================
  Files         101      103       +2     
  Lines       17634    20486    +2852     
==========================================
+ Hits        11441    13756    +2315     
- Misses       6193     6730     +537     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ethouris
Copy link
Collaborator Author

Ok, I think I know why the tests on Travis fail. The socket gets closed, but somehow the binding wasn't freed, which is exactly the problem to solve here. So the problem isn't solved.

@maxsharabayko maxsharabayko modified the milestones: Backlog, v1.6.0 Aug 26, 2024
@ethouris
Copy link
Collaborator Author

Ok, it looks like all tests pass now. Out of these two new tests, RapidClose passes even with this fix blocked, but RendezvousHangs is failing. Likely all most important fixes were moved elsewhere, and this one just kicks off the spare buffer CV to prevent it from blocking the closing action.

@ethouris ethouris marked this pull request as ready for review August 30, 2024 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] srt_rendezvous hangs after calling srt_close in different thread
2 participants