Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Force thread cleanup in threadpool destructor
The current threadpool design expects that host/client programs that use RocksDB will explicitly call ThreadPoolImpl::Impl::JoinThreads before the threadpool destructor is called. It has been observed that a case may arise in which the join on the threads is not done and the destructor is called by the exit handler of the host program. When this happens, the threadpool destructor will hang on the call to pthread_cond_destroy as show in the truncated stack trace of a hung process below: $ pstack 3885 #0 0x000014e2288ebe63 in pthread_cond_destroy@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0 facebook#1 0x000014e225a969af in rocksdb::ThreadPoolImpl::Impl::~Impl (this=0x14e227c472a0, __in_chrg=<optimized out>) at MariaDB/storage/rocksdb/rocksdb/util/threadpool_imp.cc:147 facebook#2 std::default_delete<rocksdb::ThreadPoolImpl::Impl>::operator() (this=<optimized out>, __ptr=0x14e227c472a0) at /gcc-x.5.0/include/c++/x.5.0/bits/unique_ptr.h:78 facebook#3 std::unique_ptr<rocksdb::ThreadPoolImpl::Impl, std::default_delete<rocksdb::ThreadPoolImpl::Impl> >::~unique_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/unique_ptr.h:263 facebook#4 rocksdb::ThreadPoolImpl::~ThreadPoolImpl (this=<optimized out>, __in_chrg=<optimized out>) at MariaDB/storage/rocksdb/rocksdb/util/threadpool_imp.cc:427 facebook#5 0x000014e225af7186 in std::_Destroy<rocksdb::ThreadPoolImpl> (__pointer=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:98 facebook#6 std::_Destroy_aux<false>::__destroy<rocksdb::ThreadPoolImpl*> (__last=<optimized out>, __first=0x14e227c0f190) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:108 facebook#7 std::_Destroy<rocksdb::ThreadPoolImpl*> (__last=<optimized out>, __first=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:137 facebook#8 std::_Destroy<rocksdb::ThreadPoolImpl*, rocksdb::ThreadPoolImpl> (__last=0x14e227c0f1c0, __first=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:206 facebook#9 std::vector<rocksdb::ThreadPoolImpl, std::allocator<rocksdb::ThreadPoolImpl> >::~vector (this=0x14e225f65a40 <rocksdb::Env::Default()::default_env+32>, __in_chrg=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_vector.h:434 facebook#10 rocksdb::(anonymous namespace)::PosixEnv::~PosixEnv (this=0x14e225f65a20 <rocksdb::Env::Default()::default_env>, __in_chrg=<optimized out>) at MariaDB/storage/rocksdb/rocksdb/env/env_posix.cc:133 facebook#11 0x000014e2285685ac in __run_exit_handlers () from target:/lib64/libc.so.6 facebook#12 0x000014e2285685fa in exit () from target:/lib64/libc.so.6 ... The reason this hang occurs is because the threadpool condition variable may still be locked by other threads. To fix the issue, a cleanup of the threads is forced by calling join before the threadpool proceeds with the rest of it's object destruction process. In this way, we handle any corner cases were the destructor is called, but the threadpool's condition variable is still locked by another thread. Lastly in our call to ThreadPoolImpl::Impl::JoinThreads we set the boolean wait_for_jobs_to_complete = false so that the effect of destructing the threadpool is immediate. This change was tested by using MariaDB as a host program, and making it follow an execution path that directly calls the threadpool destructor as MariaDB is exiting: 1) Remove the addr2line binary on the host machine. This will make the MariaDB cleanup handler to directly call the RocksDB threadpool destructor when a MariaDB crash is instigated. 2) Instigate a crash of MariaDB by sending signal 11 to its running process `pkill -11 mysqld`. 3) Check that MariaDB is no longer running: `ps -ef | grep mysqld`. Before this change, the process would still be in existence in a hang state due to the RocksDB threadpool, but after the change the process properly exits. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
- Loading branch information