Deadlock in v2.6.2 #2961

Closed
1 task done
Barry-Xu-2018 opened this issue Sep 21, 2022 · 11 comments

Comments

@Barry-Xu-2018
Contributor

Barry-Xu-2018 commented Sep 21, 2022

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

No deadlock occurs at startup

Current behavior

Deadlocks occur frequently at startup.

Steps to reproduce

The scenario in which the deadlock occurs:

thread3
thread2
thread1

##1 Thread3 acquires mp_mutex.
##2 Thread2 acquires a shared (read) lock on endpoints_list_mutex.
##3 Thread2 tries to acquire mp_mutex, but blocks because Thread3 already holds it (##1).
##4 Thread1 tries to acquire the write lock on endpoints_list_mutex, but blocks because the reader from ##2 still holds it:

    while (state_ & n_readers_)
    {
        gate2_.wait(lk);
    }

At this point the write_entered_ flag is already set, so every subsequent shared lock on endpoints_list_mutex blocks in:
    while ((state_ & write_entered_) || (state_ & n_readers_) == n_readers_)
    {
        gate1_.wait(lk);
    }

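##5 Thread3 tries to acquire a shared lock on endpoints_list_mutex, but blocks because the write_entered_ flag is set. Thread3 still holds mp_mutex, so Thread2 never releases its shared lock and Thread1 never gets the write lock.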
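
For anyone who wants to check the reasoning without attaching a debugger, here is a minimal single-threaded sketch that replays steps ##1 to ##5 against a simplified model of a writer-preferring shared mutex. It is not the Fast DDS implementation: `SharedMutexModel`, `announce_writer`, `ScenarioResult` and `replay_issue_scenario` are hypothetical names, and the bit layout of `state_` is an assumption. Only the two wait conditions are taken from the snippets above. Because it is single-threaded, nothing actually blocks; it only reports which thread would be waiting.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the state word of a writer-preferring shared mutex.
// Assumption: the top bit is the writer flag, the remaining bits count readers.
struct SharedMutexModel
{
    static constexpr std::uint32_t write_entered_ = 1u << 31;
    static constexpr std::uint32_t n_readers_ = ~write_entered_;
    std::uint32_t state_ = 0;

    // Negation of the gate1_ wait condition: a reader may enter only if no
    // writer has announced itself and the reader counter is not saturated.
    bool reader_may_enter() const
    {
        return !((state_ & write_entered_) || (state_ & n_readers_) == n_readers_);
    }

    void lock_shared()
    {
        assert(reader_may_enter());
        ++state_;  // the reader count lives in the low bits
    }

    // First half of an exclusive lock: set the writer flag. From here on
    // reader_may_enter() is false, even while the writer is still waiting.
    void announce_writer()
    {
        assert(!(state_ & write_entered_));
        state_ |= write_entered_;
    }

    // Negation of the gate2_ wait condition: the writer proceeds only once
    // every reader has left.
    bool writer_may_proceed() const
    {
        return (state_ & n_readers_) == 0;
    }
};

struct ScenarioResult
{
    bool thread2_blocked_on_mp_mutex;
    bool thread1_blocked_on_readers;
    bool thread3_blocked_on_write_entered;

    // All three threads wait on each other: Thread3 -> Thread1 -> Thread2 -> Thread3.
    bool deadlocked() const
    {
        return thread2_blocked_on_mp_mutex && thread1_blocked_on_readers &&
               thread3_blocked_on_write_entered;
    }
};

ScenarioResult replay_issue_scenario()
{
    SharedMutexModel endpoints_list_mutex;
    int mp_mutex_owner = 0;  // 0 means free, otherwise the owning thread number

    mp_mutex_owner = 3;                                          // ##1 Thread3 takes mp_mutex
    endpoints_list_mutex.lock_shared();                          // ##2 Thread2 takes a shared lock
    bool t2_blocked = (mp_mutex_owner != 0);                     // ##3 Thread2 wants mp_mutex
    endpoints_list_mutex.announce_writer();                      // ##4 Thread1 wants the write lock
    bool t1_blocked = !endpoints_list_mutex.writer_may_proceed();
    bool t3_blocked = !endpoints_list_mutex.reader_may_enter();  // ##5 Thread3 wants a shared lock
    return {t2_blocked, t1_blocked, t3_blocked};
}
```

Calling `replay_issue_scenario().deadlocked()` from a small `main` should report the cycle. The point of the sketch is that a shared lock requested while another thread is waiting for the write lock can block even though only readers hold the mutex, so taking mp_mutex and the shared lock in opposite orders in two threads is enough to deadlock once a writer shows up.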

Fast DDS version/commit

v2.6.2

Platform/Architecture

Ubuntu Focal 20.04 amd64

Transport layer

Default configuration, UDPv4 & SHM

Additional context

With the same code, no deadlock occurs with v2.6.0.

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

@Barry-Xu-2018 Barry-Xu-2018 added the triage Issue pending classification label Sep 21, 2022
@fujitatomoya
Contributor

@MiguelCompany @eProsima/team this is a deadlock issue, just a friendly ping.

@MiguelCompany
Member

@Barry-Xu-2018 @fujitatomoya There's a proposed fix in #2976, could you check with it?

@fujitatomoya
Contributor

@MiguelCompany thanks! we will try that out and get back to you.

@MiguelCompany MiguelCompany removed the triage Issue pending classification label Oct 14, 2022
@MiguelCompany
Member

@Barry-Xu-2018 @fujitatomoya Did you have time to check whether #2976 fixes this?

@fujitatomoya
Contributor

@MiguelCompany i will check the evaluation status, will get back to you soon.

@Barry-Xu-2018
Contributor Author

@MiguelCompany Based on the code changes, I think it fixes this problem. Fujita-san will provide the final evaluation result from the real environment.

@fujitatomoya
Contributor

Fujita-san

that is me 😄 family name!

@wade30822

@fujitatomoya hello, how is the final evaluation of #2976 going?

@fujitatomoya
Contributor

sorry, we confirmed that no deadlock was observed with this PR.

@wade30822

@fujitatomoya thx~

@MiguelCompany
Member

Closing based on #2961 (comment)
