[Python] Fix deadlock caused by ExecutorService::close #11882
Merged
Fixes #11847
Motivation
There's a deadlock that can happen when the Python client enables custom logging. From the stack traces in #11847, we can see 3 threads at the time the Python program hung:
1. ExecutorServiceProvider::close, which waits via std::thread::join until all worker threads of ExecutorService have completed.
2. PyGILState_Ensure, which tries to acquire the Python GIL. It's called in ClientConnection::handleRead. Since all pending events were cancelled by boost::asio::io_service::stop, these callbacks were completed with boost::asio::error::operation_aborted immediately.
3. ExecutorService, which waits until all callbacks, including ClientConnection::handleRead, have completed.
The root cause might be related to Python's GIL. It seems that CPython APIs might not work well in C++ destructors, possibly because of lifetime issues. But the direct cause is that thread 1 was blocked while joining a worker thread.
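The wait cycle described above can be illustrated with a small standard-library sketch. It is not the client's code: the `std::timed_mutex` is a hypothetical stand-in for the GIL, the function name is invented, and the assumption that the closing thread holds the lock while it joins is made only to show the shape of the problem. The timed lock attempt keeps the demonstration from hanging.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical illustration: returns whether the worker's callback managed to
// take the lock while the closing thread was joining the worker.
bool workerAcquiresLock(bool releaseBeforeJoin, std::chrono::milliseconds timeout) {
    std::timed_mutex interpreterLock;  // assumed stand-in for the Python GIL
    bool acquired = false;

    // The closing thread takes the lock first (an assumption of this sketch).
    std::unique_lock<std::timed_mutex> held(interpreterLock);

    std::thread worker([&] {
        // Stand-in for a callback that needs the lock before it can log.
        // A real unbounded acquire would wait forever here; the timeout
        // only exists so that the sketch terminates.
        if (interpreterLock.try_lock_for(timeout)) {
            acquired = true;
            interpreterLock.unlock();
        }
    });

    if (releaseBeforeJoin) {
        held.unlock();  // breaking the cycle lets the worker make progress
    }
    worker.join();  // the closing thread blocks here until the worker is done
    return acquired;
}
```

Calling `workerAcquiresLock(false, std::chrono::milliseconds(50))` shows the worker giving up because the joining thread never releases the lock. With an unbounded acquire, both threads would wait on each other indefinitely.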
Modifications
Detach the worker thread instead of joining it in ExecutorService::close to avoid the potential deadlock. The close method could be called in ClientImpl's destructor, which calls shutdown. It's better not to call blocking methods like these in C++ destructors even without this issue.
In addition, this PR reduces the log level from error to debug for the boost::asio::error::operation_aborted error code, which means the registered event was cancelled because the event loop was closed.
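A minimal sketch of the two changes, using only the standard library rather than the actual client code. `SketchExecutor`, `State`, `LogLevel`, and `logLevelFor` are hypothetical names, and `std::errc::operation_canceled` is used as a stand-in for boost::asio::error::operation_aborted so that the example does not depend on Boost.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <system_error>
#include <thread>

// Hypothetical stand-in for an executor that owns a single worker thread.
class SketchExecutor {
public:
    // State shared with the worker so it can outlive the executor object.
    struct State {
        std::atomic<bool> stop{false};
        std::atomic<bool> finished{false};
    };

    SketchExecutor() : state_(std::make_shared<State>()) {
        std::shared_ptr<State> state = state_;  // the copy keeps State alive in the worker
        worker_ = std::thread([state] {
            while (!state->stop.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            state->finished.store(true);
        });
    }

    ~SketchExecutor() { close(); }

    // Signal the worker to stop, then detach instead of join, so the caller
    // (possibly a destructor) never waits for the worker to finish.
    void close() {
        state_->stop.store(true);
        if (worker_.joinable()) {
            worker_.detach();
        }
        assert(!worker_.joinable());
    }

    std::shared_ptr<State> state() const { return state_; }

private:
    std::shared_ptr<State> state_;
    std::thread worker_;
};

enum class LogLevel { Debug, Error };

// A cancelled operation is expected during shutdown, so it is logged at
// debug level; any other error code stays at error level.
LogLevel logLevelFor(const std::error_code& ec) {
    return ec == std::errc::operation_canceled ? LogLevel::Debug : LogLevel::Error;
}
```

The worker only touches the shared `State`, never the executor itself, because a detached thread may still be running after the object that started it has been destroyed.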