[Tizen] Fix deadlock caused by mutex lock inversion #25573
The Tizen mDNS API uses callbacks to notify the caller about particular events, such as resolving a service. Unfortunately, the Tizen API internally uses a global recursive mutex for every API call, and that mutex is not unlocked before callbacks are invoked.

The problem arises when the caller uses its own mutex which needs to be locked inside a callback function passed to the mDNS API. We might have a case when:

- in thread 1, we are running a callback function, which holds the mDNS API internal lock, and in that callback we need to lock our lock
- in thread 2, the mDNS API is called under the caller's lock, and the call internally tries to lock the API's internal lock

As a workaround, the mDNS callback function will not lock the Matter stack mutex; instead it will only gather all data and schedule further data processing in a glib idle source callback.
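The two-thread scenario above can be reproduced in miniature with standard C++ only. This is a sketch, not the Tizen code: `lib_lock`, `stack_lock`, `callback_path` and `api_call_path` are hypothetical stand-ins for the mDNS API's internal lock, the Matter stack mutex, and the two code paths. `try_lock()` is used instead of `lock()` so the sketch reports the inversion rather than hanging.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-ins (assumptions, not Tizen or Matter symbols):
// lib_lock models the mDNS API's internal global recursive mutex,
// stack_lock models the caller's own mutex.
std::recursive_mutex lib_lock;
std::mutex stack_lock;

std::atomic<int> arrived_a{0};
std::atomic<int> arrived_b{0};

// Spin until both threads have reached the same point.
void rendezvous(std::atomic<int>& counter) {
    counter.fetch_add(1);
    while (counter.load() < 2) std::this_thread::yield();
}

// Thread 1: a callback runs while the library holds lib_lock,
// and the callback then wants stack_lock.
bool callback_path() {
    std::lock_guard<std::recursive_mutex> lib(lib_lock);
    rendezvous(arrived_a);             // both threads now hold their first lock
    bool got = stack_lock.try_lock();  // second lock, in the order lib -> stack
    if (got) stack_lock.unlock();
    rendezvous(arrived_b);             // keep lib_lock until both have tried
    return got;
}

// Thread 2: the API is called under stack_lock,
// and the library then wants lib_lock.
bool api_call_path() {
    std::lock_guard<std::mutex> stack(stack_lock);
    rendezvous(arrived_a);
    bool got = lib_lock.try_lock();    // second lock, in the order stack -> lib
    if (got) lib_lock.unlock();
    rendezvous(arrived_b);
    return got;
}

struct InversionResult {
    bool callback_got_stack;
    bool api_got_lib;
};

// Runs both paths concurrently and reports whether either obtained its second lock.
InversionResult run_inversion() {
    arrived_a = 0;
    arrived_b = 0;
    InversionResult r{true, true};
    std::thread t1([&r] { r.callback_got_stack = callback_path(); });
    std::thread t2([&r] { r.api_got_lib = api_call_path(); });
    t1.join();
    t2.join();
    return r;
}
```

Neither thread obtains its second lock while the other still holds it. With blocking `lock()` calls in place of `try_lock()`, both threads would wait on each other indefinitely, which is the deadlock this PR works around.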
PR #25573: Size comparison from 2086355 to 09621f8 Full report (1 build for cc32xx)
Can't we just acquire the stack lock at the top of each iteration of https://github.com/project-chip/connectedhomeip/blob/master/src/platform/Tizen/MainLoop.h? This would ensure that locks are always acquired in the same order in all threads. This would also ensure that any calls from the Tizen's
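The lock-ordering idea in this comment can be sketched with standard C++ only. This is an illustration under assumptions, not the Tizen `MainLoop.h` code: `stack_lock`, `lib_lock`, `loop_iteration` and `api_call` are hypothetical names, and the dispatch step is reduced to a counter increment.

```cpp
#include <mutex>
#include <thread>

std::mutex stack_lock;          // hypothetical stand-in for the Matter stack mutex
std::recursive_mutex lib_lock;  // hypothetical stand-in for the library's internal lock
int events_handled = 0;         // written only with both locks held
int api_calls = 0;              // written only with both locks held

// One main loop iteration: take stack_lock first, then dispatch. Dispatching
// takes lib_lock (as the library would) and runs the callback, which can touch
// stack state directly because stack_lock is already held.
void loop_iteration() {
    std::lock_guard<std::mutex> stack(stack_lock);
    std::lock_guard<std::recursive_mutex> lib(lib_lock);
    ++events_handled;  // callback body
}

// An API call made from stack code: stack_lock first, then lib_lock.
void api_call() {
    std::lock_guard<std::mutex> stack(stack_lock);
    std::lock_guard<std::recursive_mutex> lib(lib_lock);
    ++api_calls;
}

// Runs both paths concurrently; both use the order stack -> lib,
// so neither can end up waiting on the other in a cycle.
int run_both(int iterations) {
    events_handled = 0;
    api_calls = 0;
    std::thread loop([iterations] {
        for (int i = 0; i < iterations; ++i) loop_iteration();
    });
    std::thread caller([iterations] {
        for (int i = 0; i < iterations; ++i) api_call();
    });
    loop.join();
    caller.join();
    return events_handled + api_calls;
}
```

The cost of this design is that every loop iteration is serialized against the stack thread, since both contend for `stack_lock`.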
I don't see how that would happen, unless we somehow add Matter locking to the Tizen dns-sd library... The problem here is that the Tizen library calls the callback function (
Isn't the g_main_loop thread that's dispatching the callbacks started in https://github.com/project-chip/connectedhomeip/blob/master/src/platform/Tizen/MainLoop.h? If so, can't we wrap the g_poll() function with one that locks the Matter stack mutex? Better yet, can't we execute Tizen's main loop from the SDK's event loop / scheduler? glib should be amenable to either of these approaches. More generally, if the Tizen platform layer is starting its own event loop on a thread separate from the SDK's main event loop + thread, it seems like this should have built-in thread safety for calling back into the SDK, as I think many of the tasks, timers and callbacks executing from
Yes, I can see your point. Some time ago I did some refactoring of the glib main thread in the Linux platform. The Tizen platform might need a similar refactoring, because right now, instead of running a single dedicated thread with the main event loop, a new thread is spawned every time async processing is needed. I know that it would be even better to incorporate glib main event loop processing into the Matter event loop. I started such a refactoring in Linux, but I had to put it aside because of other important tasks. PS.
Looks good.
Before refactoring the glib main thread, we can apply this change to resolve the critical deadlock issue.
We will also prepare a modification to the Tizen SDK to remove the root cause of this issue.
@msandstedt What do you think about merging it as is? Currently, I'm working on the Tizen glib thread refactoring, but it will take some time. First, I will unify all glib threads into one thread, and later I'll try to add proper locking when calling back into the Matter stack. However, thinking about it now, adding the Matter stack lock directly into the glib main event loop would defeat the purpose of a separate thread, because all actions would be sequenced with the Matter thread. Anyway, I'll try to think of something... PS. Once the Tizen platform is done, Linux will also have to be "fixed", because it also runs a separate thread for the glib main event loop.
That sounds fine. The logic looks correct, and I understand the challenge with attempting a larger refactor.
Problem
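The Tizen mDNS API uses callbacks to notify the caller about particular events, such as resolving a service. Unfortunately, the Tizen API internally uses a global recursive mutex for every API call, and that mutex is not unlocked before callbacks are invoked.
The problem arises when the caller uses its own mutex which needs to be locked inside a callback function passed to the mDNS API. We might have a case when:
- in thread 1, we are running a callback function, which holds the mDNS API internal lock, and in that callback we need to lock our lock
- in thread 2, the mDNS API is called under the caller's lock, and the call internally tries to lock the API's internal lock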
Changes
As a workaround, the mDNS callback function will not lock the Matter stack mutex; instead it will only gather all data and schedule further data processing in a glib idle source callback. Fixes #25466
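The shape of this workaround can be sketched without glib. In the sketch below, `IdleQueue` is a hypothetical stand-in for a glib idle source attached to a main context, and `lib_emit_resolved`, `on_resolved`, `lib_lock` and `stack_lock` are illustrative names rather than Tizen or Matter symbols. The point is only that the callback copies its data and defers, and the deferred task is the one that takes the stack lock.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

std::recursive_mutex lib_lock;  // stand-in for the mDNS API's internal lock (assumption)
std::mutex stack_lock;          // stand-in for the Matter stack mutex (assumption)

// Hypothetical stand-in for a glib main context with idle sources.
class IdleQueue {
public:
    // Schedule a task to run later, outside the caller's locks.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push(std::move(task));
    }

    // Run all pending tasks; returns how many were run.
    int drain() {
        int count = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> guard(mutex_);
                if (tasks_.empty()) return count;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // executed without holding the queue mutex
            ++count;
        }
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

IdleQueue idle_queue;
std::vector<std::string> resolved_services;  // state guarded by stack_lock

// Hypothetical library entry point: invokes the callback while holding lib_lock.
void lib_emit_resolved(const char* name, void (*callback)(const char*)) {
    std::lock_guard<std::recursive_mutex> guard(lib_lock);
    callback(name);
}

// Callback passed to the library: it never touches stack_lock. It only copies
// the data it was given and schedules the real processing for later.
void on_resolved(const char* name) {
    std::string copy(name);
    idle_queue.post([copy] {
        // Runs from drain(), after the library has released lib_lock.
        std::lock_guard<std::mutex> guard(stack_lock);
        resolved_services.push_back(copy);
    });
}
```

Calling `lib_emit_resolved("example", on_resolved)` leaves `resolved_services` untouched until `idle_queue.drain()` runs, so the stack lock is never requested while the library lock is held. Copying the data matters because pointers handed to a callback may not remain valid after it returns.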
Testing
Tested using the test case provided in #25466.