-
-
Notifications
You must be signed in to change notification settings - Fork 520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock between do_work, reveive_timeout and receive_command_ack #590
Conversation
Sorry about my force-push, you where to fast... ;-) I had to remove |
Hmm, I just checked the code again, the problem is even more complex and lies here: By using _state_mutex in the way I did, this problem is indirectly solved. And by inverting the order back as it was before, you can get a deadlock where one process has locked _state_mutex and is blocked at borrow_front and the other has locked the queue by calling borrow_front and then blocks when wanting to call borrow_front . So in this case, state_mutex is the correct name but the order locking state_mutex and then calling borrow_front has to be always the same to be deadlock prof. However, I think that's not all of the problem. I think the actions are: Result: This is exactly the state I get from my backtrace, I just mapped the line numbers to develop. I'll suppose a better way of implementing locked_queue as soon as I find time. |
Agreed, that's actually silly!
Right ?
I wonder if we can just drop the silly relocking on |
No, borrow_front locks the queue. Here I described what would happen if you don't apply my pull request and just fix locked_queue. Bad (mixed order): Good (same order):
It's actually not so simple. In pop_front(), you have to check if the mutex is already locked. If not, something went wrong. |
You wrote borrow front and then again borrow front which confused me.
You're right. What if we make it simple and instead of using this locked queue just have a normal queue and one mutex for both the state and the queue? |
That's sounds also good! This way you can always use lock_guards and unique_locks. The way it is implemented now if you forget one return_front or pop_front, the queue is locked forever. |
I was thinking about some other possibilities, but assuming that return_front can be called from different processes without calling borrow_front first, none of them will work surely. So I think using a normal queue and renaming _state_mutex to _mutex or so is really the best way. If you make sure borrow_front is always called first and later pop or return, this should also work: However, if you change the assert in return_front to: |
I think we should include this change anyway: hdiethelm/DronecodeSDK@c62f6e6 and adapt all places where it is used.
The question is actually, do we use a mutex because we want to protect the |
So, for now I'm gonna merge this as is because you convinced me that this fixes the problem for now. |
Followed up in #601. |
Same as #589 for develop branch