Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Fix deadlock caused by ExecutorService::close #11882

Merged
merged 1 commit into from
Sep 2, 2021

Conversation

BewareMyPower
Copy link
Contributor

Fixes #11847

Motivation

There's a deadlock that might happen when Python client enables custom logging. From stack traces of #11847, we can see there're 3 threads when the Python program hanged:

  1. The thread to call ExecutorServiceProvider::close, which waits until all worker threads of ExecutorService completed by std::thread::join.
  2. The thread to use Python object for logging. It stuck at PyGILState_Ensure, which tried to acquire Python GIL. It's called in ClientConnection::handleRead. Since all pending events were cancelled by boost::asio::io_service::stop, these callbacks were completed with boost::asio::error::operation_aborted immediately.
  3. The worker thread of ExecutorService. It waited until all callbacks including ClientConnection::handleRead completed.

The root cause might be related to Python's GIL issues. It seems like CPython APIs might not work well in C++ destructors. It might be caused by some lifetime issues. But the direct cause is thread 1 was blocked by joining a worker thread.

Modifications

Detach the worker thread instead of join in ExecutorService::close to avoid potential deadlock. The close method could be called in ClientImpl's destructor, which calls shutdown. It's better to call these blocking methods in C++ destructors even without this issue.

In addition, this PR reduces the log level from error to debug for the boost::asio::error::operation_aborted error code, which means the registered event is cancelled by a close event of the event loop.

@BewareMyPower BewareMyPower self-assigned this Sep 1, 2021
@BewareMyPower BewareMyPower added component/c++ doc-not-needed Your PR changes do not impact docs release/2.8.2 type/bug The PR fixed a bug or issue reported a bug labels Sep 1, 2021
@BewareMyPower BewareMyPower force-pushed the bewaremypower/fix-py-hang branch from ed97331 to d72cb0b Compare September 1, 2021 17:09
@merlimat merlimat merged commit 43b4ff6 into apache:master Sep 2, 2021
hangc0276 pushed a commit that referenced this pull request Sep 4, 2021
@hangc0276 hangc0276 added cherry-picked/branch-2.8 Archived: 2.8 is end of life release/2.8.1 and removed release/2.8.2 labels Sep 4, 2021
@BewareMyPower BewareMyPower deleted the bewaremypower/fix-py-hang branch September 22, 2021 11:04
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.8 Archived: 2.8 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
4 participants