
Fix run_process() and open_process().__aexit__ leaking an orphan process when cancelled #672

Merged: 9 commits into agronholm:master on Jan 25, 2024

Conversation

@gschaffner (Collaborator) commented Jan 17, 2024

fixes #669.

underlying AnyIO bug

currently, cancelling async with await open_process() (i.e. running Process.__aexit__() in a cancelled scope) can leave the AsyncResource not fully closed, leaking an orphan process. this is because a cancelled Process.__aexit__() does not perform the minimum level of cleanup that Trio's cancellation model requires:

Async cleanup operations–like __aexit__ methods or async close methods–are cancellable just like anything else except that if they are cancelled, they still perform a minimum level of cleanup before raising Cancelled. (https://trio.readthedocs.io/en/stable/reference-core.html#cancellation-and-primitive-operations)

how it caused #669

#669 is a special case of this bug: in #669, even though run_process does call process.kill when cancelled, it still (very briefly) leaks an orphan process. this is because when process.kill() returns, the process is not necessarily dead yet: process.kill (kill(2)/TerminateProcess) is like call_soon in that it only schedules the kill. so after process.kill() we need to wait briefly for the death to occur and for the event loop to learn about it.
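
for illustration, here is a minimal sketch of that cleanup pattern (a sketch of the idea, not the exact code merged in this PR): kill, then wait for the exit inside a shielded cancel scope so that the reaping itself cannot be cancelled.

from anyio import CancelScope
from anyio.abc import Process


async def _close_process_sketch(process: Process) -> None:
    # sketch only: kill() merely schedules the kill (kill(2)/TerminateProcess),
    # so the process is not necessarily dead when it returns
    process.kill()
    # shield the wait so the surrounding cancellation cannot interrupt the
    # minimum cleanup of actually waiting for the process to die
    with CancelScope(shield=True):
        await process.wait()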

so the problem seen in #669 was: an orphan process was leaked briefly. this started a race between

  • the event loop closing and
  • the OS killing the process and the event loop learning about it

note that this race is present on both backends. if the event loop closing happened first and won the race, then the leaked process was only finalized by __del__ after the event loop had already closed, producing the ResourceWarning/RuntimeError reported in #669.

the tests currently added in this PR should cover the underlying AnyIO bug that caused #669 while being deterministic (not flaky). however, if you also want a test that looks more like the original example in #669 (testing asyncio/Trio close & __del__ behavior), here's another test that could be added to this PR:

# (these imports already exist at the top of tests/test_subprocesses.py)
import sys
from typing import Any

from anyio import CancelScope, open_process, run
from anyio.abc import Process


def test_process_aexit_cancellation_doesnt_orphan_del(
    anyio_backend_name: str, anyio_backend_options: dict[str, Any]
) -> None:
    """
    Regression test for #669.

    Ensure that, when cancelled, open_process.__aexit__() doesn't leave behind any
    __del__() finalizers that emit ResourceWarning (or fail due to
    https://github.com/python/cpython/issues/114177) if there is a race where the event
    loop is closed too soon.

    N.B.: This is a test of asyncio and Trio, not AnyIO: it should pass if
    test_process_aexit_cancellation_doesnt_orphan_process() passes and asyncio/Trio are
    not bugged. This test can also have false negatives (passes when it should fail)
    because it relies on run() finishing either before the OS kills the process or
    before the event loop learns the process died.

    """
    # Don't del process until after run() finishes
    process: Process

    async def main() -> None:
        nonlocal process
        with CancelScope() as scope:
            async with await open_process(
                [sys.executable, "-c", "import time; time.sleep(1)"]
            ) as process:
                scope.cancel()

    run(main, backend=anyio_backend_name, backend_options=anyio_backend_options)

@gschaffner (Collaborator, Author)

i should also note that this appears to be how Trio implemented Process.aclose before it was removed.

@agronholm (Owner) left a comment


Just some formatting changes.

Review comments (resolved) on:
  • src/anyio/_core/_subprocesses.py
  • src/anyio/_backends/_asyncio.py
  • tests/test_subprocesses.py (two threads)
agronholm and others added 2 commits January 22, 2024 16:30
With pytest.mark.xfail, the test is still run and will emit xpass
if/when it stops failing, i.e. it will detect when the mark.xfail should
be removed. In contrast, pytest.xfail immediately force-skips the test,
skipping the xpass check.

The only remaining use of pytest.xfail (in
TestBlockingPortal.test_from_async) needs to remain a pytest.xfail until
someone gets around to agronholm#524 (comment)
because it will deadlock otherwise.
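
For illustration, a minimal hypothetical example of that difference (test names, condition, and assertions are made up):

import pytest

BACKEND_HAS_BUG = True  # hypothetical stand-in for a backend-specific condition


# Declarative mark: the body still runs, so pytest reports XPASS once the
# underlying bug is fixed, flagging that the mark can be removed.
@pytest.mark.xfail(BACKEND_HAS_BUG, reason="known backend bug")
def test_declarative_xfail() -> None:
    assert 1 + 1 == 3  # stands in for the currently-failing assertion


# Imperative call: pytest.xfail() aborts the test immediately, so the
# assertion below never runs and a fix is never surfaced as XPASS.
def test_imperative_xfail() -> None:
    pytest.xfail("would deadlock otherwise")
    assert 1 + 1 == 3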
@agronholm (Owner)

How is it that the xfail test passes? Explain please?

@gschaffner (Collaborator, Author)

oops, i forgot to make the xfail conditional on the backend. fixed.

@agronholm merged commit 1e60219 into agronholm:master on Jan 25, 2024
16 checks passed
@agronholm (Owner)

Thanks!

@gschaffner deleted the fix-669 branch on February 1, 2024 08:16
Successfully merging this pull request may close these issues:

  • run_process with fail_after causes RuntimeError on timeout (#669)