Deadlock occurs between variable server creating threads and main thread shutdown #1656
I was able to re-create this with the aircraft sim (GUI disabled, short duration, realtime disabled) and a standalone Python subprocess that hammers the variable server. It's not 100% repeatable either; it happens about 1 time in 3.
(gdb) thread apply all bt
Thread 2 (Thread 0x7f05fafbf640 (LWP 681091) "VarServListen"):
#0 futex_wait (private=0, expected=2, futex_word=0x564741086180 <Trick::SysThread::list_mutex()::list_mutex>) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x564741086180 <Trick::SysThread::list_mutex()::list_mutex>, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007f05fc4a8002 in lll_mutex_lock_optimized (mutex=0x564741086180 <Trick::SysThread::list_mutex()::list_mutex>) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x564741086180 <Trick::SysThread::list_mutex()::list_mutex>) at ./nptl/pthread_mutex_lock.c:93
#4 0x00005647409aa4ae in Trick::SysThread::SysThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#5 0x00005647409ba98c in Trick::VariableServerSessionThread::VariableServerSessionThread(Trick::VariableServerSession*) ()
#6 0x00005647409ba8fa in Trick::VariableServerSessionThread::VariableServerSessionThread() ()
#7 0x00005647409b8693 in Trick::VariableServerListenThread::thread_body() ()
#8 0x00005647409ace98 in Trick::ThreadBase::thread_helper(void*) ()
#9 0x00007f05fc4a4ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#10 0x00007f05fc536850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 1 (Thread 0x7f05fc3be740 (LWP 681089) "S_main_Linux_11"):
#0 __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=681091, futex_word=0x7f05fafbf910) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=681091, futex_word=0x7f05fafbf910) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f05fafbf910, expected=681091, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
#3 0x00007f05fc4a6624 in __pthread_clockjoin_ex (threadid=139663662380608, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:105
#4 0x00005647409acbec in Trick::ThreadBase::join_thread() ()
#5 0x00005647409aa771 in Trick::SysThread::ensureAllShutdown() ()
#6 0x0000564740940973 in Trick::Executive::shutdown() ()
#7 0x0000564740a4c8b2 in master(int, char**) ()
#8 0x000056474086157b in main ()
(gdb)
I ran into this as well. Here is a report I gave: I have a koviz variable server client connected to a sim's variable server. If I shut down the sim from the sim control panel and then exit the sim control panel, the panel closes, but the S_main*.exe process remains running and I have to kill it manually. I broke into the S_main*.exe process and see:
Main Thread:
Child Thread:
If I shut down the koviz variable server client before I shut down the sim, the sim shuts down gracefully. If anyone wants to recreate the issue using koviz, let me know; I can get it to hang almost every time. The description of hanging on reconnect makes sense: koviz tries to reconnect after the sim drops, so the theory that the hang occurs due to a reconnect during shutdown sounds correct.
We'll begin looking into this one directly. Thanks for the thorough reporting.
Thank you all for the very thorough and useful bug report. We are working on a fix now, and it will be released with the next minor Trick release. At that time, please let us know if you still experience the same issue.
Overview
We recently upgraded from Trick version 19.4.0 to version 19.7.1, and we began running into a recurring failure where the Trick sim would fail to shut down properly and hang indefinitely. We root-caused this to a deadlock between the main thread and the VariableServerListenThread during shutdown. We believe this was introduced as part of the new thread management/shutdown behavior in #1448.
Investigation
After running our Trick sim with gdb, we discovered that the main thread was stuck in the shutdown loop while it was attempting to join all the SysThreads. More specifically, the main thread was stuck joining the VariableServerListenThread, which is in turn handling a new connection request. At this point the main thread holds the lock for list_mutex(), which controls access to all_sys_threads(). When the VariableServerListenThread creates a VariableServerSessionThread, it attempts to add that object to all_sys_threads(), which means it must acquire list_mutex(). This creates a deadlock: the main thread holds list_mutex() and is waiting to join the VariableServerListenThread, which is in turn blocked on acquiring the lock for list_mutex().
We reached this conclusion by following stack traces. When the process deadlocks and we hit Ctrl + C in gdb, we can see the following stack trace:
From here, we can see that the main thread is stuck in ensureAllShutdown() attempting to join threads. Our next line of inquiry was to figure out which thread was being joined. Jumping into frame 2 and printing the thread object shows us that it is the VariableServerListenThread:
Now that we know the thread, we can list the threads and switch to the appropriate one to see why it is not exiting:
Now that we've switched to the VarServListen thread, we can again look at the stack trace:
This stack trace shows us that the VariableServerListenThread has received a new connection and is in the process of constructing a VariableServerSessionThread, which in turn creates a SysThread, whose constructor is what attempts to acquire list_mutex() and add the new thread to all_sys_threads().
I suspect this is all happening because our setup regularly attempts to connect a new variable server client if the previous connection dropped. In this case, the connection dropped because the sim was being shut down properly, but the reconnection attempt deadlocked the shutdown process. I am able to reliably reproduce the issue with our sim setup, but I have not yet attempted to reproduce it on one of the demo sims. I would recommend that anyone trying to reproduce it send a large number of variable server connection requests directly after requesting shutdown.