gh-113538: Allow client connections to be closed #114432

CendioOssman · 2024-01-22T14:56:49Z

Give applications the option of more forcefully terminating client connections for asyncio servers. Useful when terminating a service and there is limited time to wait for clients to finish up their work.

Issue: Cannot cleanly shut down an asyncio based server #113538

📚 Documentation preview 📚: https://cpython-previews--114432.org.readthedocs.build/

CendioOssman · 2024-01-22T14:58:36Z

PR includes unit tests, but I also used this more complete example to test and showcase the functionality:

#!/usr/bin/python3

import asyncio
import socket

# Well behaved or spammy protocol?
FLOOD = True

class EchoServerProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        peername = transport.get_extra_info('peername')
        print('Connection from {}'.format(peername))
        self.transport = transport
        self._paused = False
        self._write()

    def pause_writing(self):
        self._paused = True

    def resume_writing(self):
        self._paused = False
        self._write()

    def _write(self):
        if not FLOOD:
            return
        if self._paused:
            return
        self.transport.write(b'a' * 65536)
        asyncio.get_running_loop().call_soon(self._write)

    def data_received(self, data):
        message = data.decode()
        print('Data received: {!r}'.format(message))

def stop():
    loop = asyncio.get_running_loop()
    loop.stop()

client = None

def server_up(fut):
    global server
    server = fut.result()
    print('Server running')
    global client
    client = socket.create_connection(('127.0.0.1', 8888))

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

fut = asyncio.ensure_future(loop.create_server(
        lambda: EchoServerProtocol(),
        '127.0.0.1', 8888))
fut.add_done_callback(server_up)

# Simulate termination
loop.call_later(2.5, stop)

try:
    loop.run_forever()
finally:
    print('Closing...')
    server.close()
    try:
        loop.run_until_complete(asyncio.wait_for(server.wait_closed(), 0.1))
    except TimeoutError:
        print('Timeout waiting for clients. Attempting close...')
        server.close_clients()
        try:
            loop.run_until_complete(asyncio.wait_for(server.wait_closed(), 0.1))
        except TimeoutError:
            print('Timeout waiting for client close. Attempting abort...')
            server.abort_clients()
            loop.run_until_complete(asyncio.wait_for(server.wait_closed(), 0.1))
    loop.close()
    if client is not None:
        client.close()
    print('Closed')

gvanrossum

This looks great, but I have one reservation. The server now keeps the transports alive, through the set of clients. Maybe it would be better if that was a weak set, so that if somehow a transport was freed without going through the usual close routine, the server doesn't hold onto it forever? Thoughts?
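
To illustrate the concern, here is a minimal sketch of the difference between a regular set and a `weakref.WeakSet`. The `Transport` class is a hypothetical stand-in, not an asyncio class, and the immediate collection shown relies on CPython's reference counting.

```python
import gc
import weakref


class Transport:
    """Hypothetical stand-in for a transport object."""


strong = set()
weak = weakref.WeakSet()

t1, t2 = Transport(), Transport()
strong.add(t1)   # the set itself keeps t1 alive
weak.add(t2)     # the WeakSet does not keep t2 alive

# Drop the only other references, as an application losing track
# of its transports would.
del t1, t2
gc.collect()

strong_count = len(strong)  # the object is still held by the set
weak_count = len(weak)      # the entry vanished together with the object
print(strong_count, weak_count)
```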

CendioOssman · 2024-02-02T15:38:41Z

I figured I'd start things simple, so no weak references unless there is a practical issue that requires them.

So, are there any scenarios where this is relevant?

The transport detaches from the server in the same place where it closes the server. That should mean that this is something that must be done, as otherwise the destructor will complain. So I don't think well-written code should have an issue.

Might there be an issue with badly written code, then? The primary case would be if the destructor check fails to trigger. Which I guess is the case with my suggestion. It is difficult to trigger, though. Streams are a bit buggy, so abandoning them doesn't always result in a cleanup (#114914), and transports are referenced by the event loop until you tell them to pause reading.

But difficult isn't impossible, so this case breaks when applying this PR:

#!/usr/bin/python3

import asyncio
import socket

class EchoServerProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        peername = transport.get_extra_info('peername')
        print('Connection from {}'.format(peername))
        self.transport = transport
        self.transport.pause_reading()

    def data_received(self, data):
        message = data.decode()
        print('Data received: {!r}'.format(message))

client = None

def server_up(fut):
    global server
    server = fut.result()
    print('Server running')
    global client
    client = socket.create_connection(('127.0.0.1', 8888))

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

fut = asyncio.ensure_future(loop.create_server(
        lambda: EchoServerProtocol(),
        '127.0.0.1', 8888))
fut.add_done_callback(server_up)

# Simulate termination
loop.call_later(2.5, loop.stop)

try:
    loop.run_forever()
    import gc
    gc.collect()
finally:
    server.close()
    loop.close()
    if client is not None:
        client.close()

This case creates a transport, pauses it, and then fails to maintain any other reference to it. Without this PR it gets collected when gc.collect() is called (or earlier). With this PR, it only gets collected at the end.

So a weak reference does indeed seem appropriate.

We want to be able to detect if the application fails to keep track of the transports, so we cannot keep them alive by using a hard reference.

The application might be waiting for all transports to close, so we need to properly inform the server that this transport is done.
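
The two points above can be sketched with stand-in classes. This is only an illustration under stated assumptions: `FakeServer` and `FakeTransport` are hypothetical names, and the real transports do considerably more. It shows that a weak set lets `__del__` run for an abandoned transport, and that the fallback cleanup still tells the server the transport is gone.

```python
import gc
import warnings
import weakref


class FakeServer:
    """Hypothetical server that tracks clients through weak references."""

    def __init__(self):
        self._clients = weakref.WeakSet()

    def _attach(self, transport):
        self._clients.add(transport)

    def _detach(self, transport):
        # discard() is a no-op if the entry is already gone.
        self._clients.discard(transport)


class FakeTransport:
    """Hypothetical transport that warns when it is abandoned."""

    def __init__(self, server):
        self._server = server
        self._closed = False
        server._attach(self)

    def close(self):
        self._closed = True
        self._server._detach(self)

    def __del__(self):
        if not self._closed:
            warnings.warn(f"unclosed transport {self!r}", ResourceWarning)
            # Fallback cleanup: still inform the server.
            self._server._detach(self)


server = FakeServer()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    FakeTransport(server)  # abandoned: no reference is kept
    gc.collect()

abandoned_detected = any(
    issubclass(w.category, ResourceWarning) for w in caught)
remaining = len(server._clients)
print(abandoned_detected, remaining)
```

With a regular `set` in `FakeServer`, the transport would stay alive and the warning would not be emitted until the server itself went away.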

gvanrossum · 2024-02-19T02:19:43Z

Sorry for the delay -- this just made it to the top of my review queue. I hope to get to this soon!

gvanrossum

Not bad, some thoughts.

Lib/asyncio/base_events.py

Doc/whatsnew/3.13.rst

gvanrossum · 2024-02-19T04:17:48Z

Lib/asyncio/base_events.py

@@ -277,7 +277,8 @@ def __init__(self, loop, sockets, protocol_factory, ssl_context, backlog,
                 ssl_handshake_timeout, ssl_shutdown_timeout=None):
        self._loop = loop
        self._sockets = sockets
-        self._active_count = 0
+        # Weak references so abandoned transports can be detected


Suggested change

# Weak references so abandoned transports can be detected

# Weak references so abandoned transports can be ignored

The wording here was intentional. Weak references are, to my knowledge, the only way to detect abandoned objects. But it's not this code that does that detection, so I can understand the confusion. How about:

Weak references so we don't break Transport's ability to detect abandoned transports

?

Ah, you're thinking from the POV of the transport, whose __del__ must be called to "detect" (i.e., warn about) that it was abandoned. I was thinking from the POV of the loop in close_clients(), where we want to ignore (not encounter) transports that have been closed already.

I'll make it your choice which wording to use.

Lib/asyncio/selector_events.py

Misc/NEWS.d/next/Library/2024-01-22-15-50-58.gh-issue-113538.v2wrwg.rst

Lib/test/test_asyncio/test_server.py

gvanrossum · 2024-02-19T04:32:33Z

Lib/test/test_asyncio/test_server.py

+
+        srv.close()
+        srv.close_clients()
+        await asyncio.sleep(0.1) # FIXME: flush call_soon()?


Yeah, short sleeps are a nuisance in asyncio tests. Usually they can be replaced by a small number of sleep(0) calls though -- usually 1, rarely 2 or 3. sleep(0) is special and guarantees we go through the event loop exactly once.

A bunch of sleep(0) felt even more arbitrary. :/

What's your suggestion here? Keep it as is? Or something else?

A bunch of sleep(0) wastes less time (the asyncio tests are already too slow), and doesn't risk the test becoming flaky due to timing out on slow platforms (a problem we've struggled with). The right number of sleep(0) calls takes up no more time than needed, and lets the machinery go through its motions in a deterministic manner. So I recommend sleep(0).
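
A minimal sketch of why the number of `sleep(0)` calls matters, assuming the default asyncio event loop. The callbacks `first` and `second` are hypothetical. Each `sleep(0)` lets the callbacks that are already queued run, and a callback scheduled from within another callback needs one more pass.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    events = []

    def second():
        events.append("second")

    def first():
        events.append("first")
        # Scheduled from inside a callback, so it runs one pass later.
        loop.call_soon(second)

    loop.call_soon(first)

    await asyncio.sleep(0)       # one pass through the event loop
    after_one = list(events)

    await asyncio.sleep(0)       # a second pass flushes the chained callback
    after_two = list(events)
    return after_one, after_two


after_one, after_two = asyncio.run(main())
print(after_one, after_two)
```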

gvanrossum · 2024-02-19T04:33:42Z

Lib/test/test_asyncio/test_server.py

+            while s_wr.transport.get_write_buffer_size() == 0:
+                s_wr.write(b'a' * 65536)
+                await asyncio.sleep(0)
+            await asyncio.sleep(0.1) # FIXME: More socket buffer space magically appears?


Eh, what's going on with this one? Is it the same issue that can be fixed with a small number of sleep(0) calls, or different?

I don't know, to be honest. Might be a kernel or libc issue where it shuffles buffers around and/or allocates more space.

Without it, I cannot reliably get both the server and client to a state where buffers are full. Which is needed for the test to check the right thing.

IMO this PR is not finished until you get to the bottom of that. If you really need them to reach a specific state and there's no deterministic way to get there, consider manipulating internal APIs.

This was a pain, but I think I got it fixed. The core problem is that the kernel dynamically increases the socket send buffer size. And it takes about 20-30 ms to do so.

I think I've worked around that by specifying an explicit buffer size. That should turn off the dynamic resizing, if I remember things correctly.
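
A minimal sketch of specifying an explicit send buffer size, using a plain socket pair instead of the asyncio streams in the actual test. The value 65536 is arbitrary, and whether an explicit size turns off dynamic resizing depends on the platform, as the comment above already hedges. The kernel may also round or scale the requested value, so the sketch reads back what was applied.

```python
import socket

left, right = socket.socketpair()
try:
    # Request a fixed send buffer size (arbitrary value for illustration).
    left.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 65536)

    # The kernel may adjust the request, so ask what was actually applied.
    actual = left.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    print("effective send buffer size:", actual)
finally:
    left.close()
    right.close()
```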

Doc/library/asyncio-eventloop.rst

gvanrossum · 2024-02-19T04:41:19Z

Lib/test/test_asyncio/test_server.py

+
+        srv.close()
+        srv.abort_clients()
+        await asyncio.sleep(0.1) # FIXME: flush call_soon()?


This one looks similar to the first sleep.

Doc/library/asyncio-eventloop.rst

gvanrossum · 2024-02-19T16:32:09Z

Lib/asyncio/base_events.py

        # Note that 'transport' may already be missing from
        # self._clients if it has been garbage collected


IMO you don't need this comment either (it was explaining the condition).

One could be made clearer, and the other is probably superfluous.

Try to get the streams and the kernel in to a more deterministic state by specifying fixed buffering limits.

gvanrossum

Thanks for fixing all that. I have one remaining concern only.

gvanrossum · 2024-02-27T16:59:48Z

Lib/test/test_asyncio/test_server.py

+        while c_wr.transport.is_reading():
+            await asyncio.sleep(0)
+
+        # Get the writer in a waiting state by sending data until the
+        # kernel stops accepting more in to the send buffer
+        while s_wr.transport.get_write_buffer_size() == 0:
+            s_wr.write(b'a' * 4096)
+            await asyncio.sleep(0)


Can we put an upper bound on these while loops and fail the test when they loop too many times? (Possibly a for loop with a break is the best idiom to do this.)

I worry that something in the network stack might cause one or the other to loop forever, and I'd rather not waste the CPU time in CI over this.

How many iterations do you expect? Is it deterministic regardless of platform?

It's a bit sensitive to platform behaviour, but I believe I can get something that dynamically adapts.

No possibly infinite loop. Instead ask the system how much buffer space it has and fill that.
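
The bounded-loop idiom suggested above can be sketched as follows. This is not the code from the PR: it uses a plain non-blocking socket pair rather than asyncio streams, and `fill_send_buffer` and `max_writes` are hypothetical names chosen for illustration.

```python
import socket


def fill_send_buffer(sock, chunk=b"a" * 4096, max_writes=10_000):
    """Write until the kernel refuses more data, with an upper bound."""
    sock.setblocking(False)
    total = 0
    for _ in range(max_writes):
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # The send buffer is full: this is the state we wanted.
            break
    else:
        # The loop ran out without ever filling the buffer.
        raise AssertionError("send buffer never filled")
    return total


left, right = socket.socketpair()
try:
    sent = fill_send_buffer(left)
    print("bytes queued before the buffer filled:", sent)
finally:
    left.close()
    right.close()
```

The `for`/`else` form fails loudly instead of spinning forever if the network stack keeps accepting data.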

gvanrossum

LG, but can you fix the merge conflict? The tests won't run until that's fixed.

CendioOssman · 2024-03-08T09:23:07Z

Sure. How do you prefer to handle that? A rebase against current main?

gvanrossum · 2024-03-08T16:34:55Z

No rebase please! Please merge the updated main branch into your branch, fix the conflicts as part of the merge, and then commit and push (not push -f). The extra commits will be squashed when I merge back into main.

gvanrossum

Thanks! LGTM. I'll merge.

mhsmith · 2024-03-11T21:15:55Z

It looks like one of the new tests isn't always passing: #116423 (comment)

gvanrossum · 2024-03-11T22:55:14Z

It looks like one of the new tests isn't always passing: #116423 (comment)

Thanks, I've created a new PR on the same issue that will revert it. Sorry for the inconvenience.

…#114432)" (#116632) Revert "gh-113538: Add asycio.Server.{close,abort}_clients (#114432)" Reason: The new test doesn't always pass: #116423 (comment) This reverts commit 1d0d49a.

gvanrossum · 2024-03-12T02:57:22Z

The PR has been reverted.

@CendioOssman could you look into the test failures? I probably should have reviewed the test more carefully, sorry.

CendioOssman · 2024-03-12T09:01:14Z

Of course. And once they are fixed, how do you want to proceed? Submit a new PR using the existing branch?

gvanrossum · 2024-03-12T14:48:21Z

Of course. And once they are fixed, how do you want to proceed? Submit a new PR using the existing branch?

That sounds fine to me. If you can't get it to work, just rename the branch locally and push that.

These give applications the option of more forcefully terminating client connections for asyncio servers. Useful when terminating a service and there is limited time to wait for clients to finish up their work. This is a do-over with a test fix for gh-114432, which was reverted.

CendioOssman added 2 commits January 22, 2024 11:30

Don't use internals for wait_closed() tests

9f0111e

pythongh-113538: Allow client connections to be closed

c25eabb

Give applications the option of more forcefully terminating client connections for asyncio servers. Useful when terminating a service and there is limited time to wait for clients to finish up their work.

CendioOssman requested review from 1st1, asvetlov, gvanrossum, kumaraditya303 and willingc as code owners January 22, 2024 14:56

bedevere-app bot added the awaiting review label Jan 22, 2024

bedevere-app bot mentioned this pull request Jan 22, 2024

Cannot cleanly shut down an asyncio based server #113538

Closed

CendioOssman added 2 commits January 22, 2024 16:02

Fix method references in new documentation

b7fa198

Add attribution

b3cd9c1

gvanrossum reviewed Jan 31, 2024

CendioOssman added 2 commits February 2, 2024 16:39

Only keep a weak client references

c78a927

We want to be able to detect if the application fails to keep track of the transports, so we cannot keep them alive by using a hard reference.

Inform server of fallback transport cleanup

1ec06da

The application might be waiting for all transports to close, so we need to properly inform the server that this transport is done.

CendioOssman requested a review from gvanrossum February 2, 2024 16:08

gvanrossum reviewed Feb 19, 2024

CendioOssman added 3 commits February 19, 2024 13:26

Clarify recomended order of methods

2790ddf

Fix typos

6a56a80

Use discard() instead of conditional remove()

6929888

gvanrossum reviewed Feb 19, 2024

CendioOssman added 3 commits February 27, 2024 12:52

Improve some comments

8316199

One could be made clearer, and the other is probably superfluous.

Do multiple sleeps to flush out callbacks

3e1705b

More deterministic abort_clients() test

6c078d6

Try to get the streams and the kernel in to a more deterministic state by specifying fixed buffering limits.

CendioOssman requested a review from gvanrossum February 27, 2024 14:45

gvanrossum reviewed Feb 27, 2024

Even more deterministic

1158151

No possibly infinite loop. Instead ask the system how much buffer space it has and fill that.

gvanrossum reviewed Mar 6, 2024

Merge branch 'main' of https://github.com/python/cpython into server_…

1065dda

…close

CendioOssman requested a review from gvanrossum March 11, 2024 15:15

gvanrossum approved these changes Mar 11, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels Mar 11, 2024

gvanrossum merged commit 1d0d49a into python:main Mar 11, 2024
33 checks passed

bedevere-app bot removed the awaiting merge label Mar 11, 2024

gvanrossum added a commit to gvanrossum/cpython that referenced this pull request Mar 11, 2024

Revert "pythongh-113538: Add asycio.Server.{close,abort}_clients (pyt…

3c0ecf5

…hon#114432)" Reason: The new test doesn't always pass: python#116423 (comment) This reverts commit 1d0d49a.

CendioOssman mentioned this pull request Mar 14, 2024

gh-113538: Allow client connections to be closed #116784

Merged
