Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncio.Server.wait_closed() appears to hang indefinitely in 3.12.0a7 #104344

Open
jnsnow opened this issue May 9, 2023 · 16 comments · May be fixed by christgau/wsdd#219
Open

asyncio.Server.wait_closed() appears to hang indefinitely in 3.12.0a7 #104344

jnsnow opened this issue May 9, 2023 · 16 comments · May be fixed by christgau/wsdd#219
Labels
docs Documentation in the Doc dir topic-asyncio

Comments

@jnsnow
Copy link

jnsnow commented May 9, 2023

Bug report

I have some code in the qemu.qmp package that appears to work correctly in Python 3.6 (RIP~) through to Python 3.11. I've started testing against 3.12 and I'm finding that waiting on asyncio.Server.wait_closed() is creating a deadlock/hang.

Here's a minimal-ish reproducer:

#!/usr/bin/env python3
import asyncio

async def cause_problems():
    accepted = asyncio.Event()

    async def _incoming(reader, writer):
        print("entered _incoming")
        print("signaling outer context; we accepted a client")
        accepted.set()
        print("left _incoming")

    print("starting server")
    server = await asyncio.start_unix_server(
        _incoming,
        path="problems.sock",
        backlog=1,
    )
    print("created server; waiting on a client")
    await accepted.wait()
    print("got a client")

    print("closing server")
    server.close()
    print("issued close()")
    await server.wait_closed()
    print("finished waiting on server to close")


if __name__ == '__main__':
    asyncio.run(cause_problems())

You can test a simple connection using whatever you'd like; socat works to trigger the accept code, e.g. from the same directory that you run the above code, try:

socat UNIX-CONNECT:problems.sock stdout

Here's what happens in python 3.11.3

jsnow@scv ~> python3.11 hang.py 
starting server
created server; waiting on a client
entered _incoming
signaling outer context; we accepted a client
left _incoming
got a client
closing server
issued close()
finished waiting on server to close

And here's 3.12.0a7:

jsnow@scv ~> python3.12 hang.py
starting server
created server; waiting on a client
entered _incoming
signaling outer context; we accepted a client
left _incoming
got a client
closing server
issued close()

When I issue ^C, we get this traceback:

^CTraceback (most recent call last):
  File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 664, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/jsnow/hang2.py", line 26, in cause_problems
    await server.wait_closed()
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 384, in wait_closed
    await waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jsnow/hang2.py", line 31, in <module>
    asyncio.run(cause_problems())
  File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 123, in run
    raise KeyboardInterrupt()
KeyboardInterrupt

Your environment

  • Python 3.12.0a7 (as packaged on Fedora 37)
  • Linux 6.2.12-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 20 23:38:29 UTC 2023
  • Tested on CPython 3.6 through 3.12 as available on Fedora 37. Works on 3.6, 3.7, 3.8, 3.9, 3.10 and 3.11 but fails on the 3.12 alpha package.
@jnsnow jnsnow added the type-bug An unexpected behavior, bug, or error label May 9, 2023
@github-project-automation github-project-automation bot moved this to Todo in asyncio May 10, 2023
@kumaraditya303
Copy link
Contributor

Why aren't you closing the writer in _incoming? That fixes the problem for me and is bug in your code.

@kumaraditya303 kumaraditya303 added the pending The issue will be closed if no feedback is provided label May 10, 2023
@jnsnow
Copy link
Author

jnsnow commented May 10, 2023

For python 3.6 through 3.11 you don't have to close the writer to stop the server: you can accept a single peer, stop the server, and continue communicating with the peer.

@bonzini
Copy link

bonzini commented May 10, 2023

So something like this?

#!/usr/bin/env python3
import asyncio

async def cause_problems():
    accepted = asyncio.Queue()

    async def _incoming(reader, writer):
        print("entered _incoming")
        print("signaling outer context; we accepted a client")
        await accepted.put((reader, writer))
        print("left _incoming")

    print("starting server")
    server = await asyncio.start_unix_server(
        _incoming,
        path="problems.sock",
        backlog=1,
    )
    print("created server; waiting on a client")
    reader, writer = await accepted.get()
    print("got a client")

    print("closing server")
    server.close()
    print("issued close()")
    await server.wait_closed()
    writer.write(b"finished waiting on server to close\n")


if __name__ == '__main__':
    asyncio.run(cause_problems())

Indeed the docs don't say that Server.wait_closed() waits for all writers (and I don't see why it would).

@kumaraditya303
Copy link
Contributor

cc @gvanrossum bisect points to 5d09d11

@bonzini
Copy link

bonzini commented May 10, 2023

Indeed the docs don't say that Server.wait_closed() waits for all writers (and I don't see why it would).

Looks like this was done on purpose for GH-79033, though.

@gvanrossum
Copy link
Member

Looks like this was done on purpose for GH-79033, though.

So I think indeed that the better semantics are for wait_closed() to wait until all requests have been handled, i.e. "draining" the request queue, while not accepting further requests. That makes it useful, as opposed to it being a no-op as it was before.

I'm a little surprised that the request handler (_incoming in @jnsnow's example) is required to close the connection, but it sounds like that is not something we should change (given the possibility of the reader and writer surviving the handler, as in @bonzini's example).

In a production version of @jnsnow's example, wouldn't it be leaking sockets if the handler never closed its writer?

@gvanrossum
Copy link
Member

(Unless there's a better argument why GH-79033 is wrong, we should just document things better.)

@jnsnow
Copy link
Author

jnsnow commented May 10, 2023

In the production code, we save the reader/writer for communicating with the peer. The intent of the code as it stands in the actual library is: "Accept precisely one peer, stop accepting new peers, then communicate via reader/writer."

(My example in the report was just the shortest reproducer I could distill.)

Presumably we don't leak a socket because we close the reader/writer on disconnect.

@bonzini
Copy link

bonzini commented May 10, 2023

I agree that using wait_closed() is pointless both in @jnsnow's minimal example and in mine, and it was working just because the code was incorrect.

On the other hand the usage is exactly what you would expect when reading the documentation for wait_closed(), which says nothing about waiting for clients: it's just "Wait until the close() method completes".

What about deprecating Server.wait_closed() and putting the new implementation in a new method, like Server.wait_clients()?

@gvanrossum
Copy link
Member

Maybe, but looking at the implementation of wait_closed(), the intention was always to wait for the last connection to be closed. The documentation wasn't super clear about this, but that is clearly what it was meant to do, so I think it's reasonable to fix the bug without renaming the code. We should also clarify the docs (I wouldn't even mind adding a "version-changed: 3.12" note reminding users that this was broken until 3.11). The examples were just buggy.

@bonzini
Copy link

bonzini commented May 11, 2023

looking at the implementation of wait_closed(), the intention was always to wait for the last connection to be closed [...] The examples were just buggy.

I understand that, but as users of Python are we supposed to read the documentation or the implementation?

The documentation for close() is clear that the listening socket is, at least potentially, closed asynchronously. It says that connections are left open, and without a mention of connections in the wait_closed() documentation, how could anyone guess that intention?

IMHO the only possible reading of the documentation is "you need wait_closed() before calling asyncio.start_server()/start_unix_server() again for the same Unix socket path or listening port. Otherwise, the underlying file descriptor might still be open and creating the server will fail".

It was indeed a bug that wait_closed() behaved differently if called before vs. after close(), but 5d09d11 fixed it in the wrong direction.

@gvanrossum
Copy link
Member

No, I think you misread the documentation for close(). The documentation you linked to states clearly that the listening sockets are closed. You misinterpreted the phrase "The server is closed asynchronously" as referring to the process of closing the listening socket -- but this is always done synchronously. It is only the connection sockets that are left open, and wait_closed() is intended to wait for those to be closed.

@jnsnow
Copy link
Author

jnsnow commented May 11, 2023

I see. OK - that is a bit surprising given how the documentation reads (and what has historically happened) - but if that's the intent, then:

  1. Some updates to the doc phrasing would go a long way, and
  2. Adding a pointer to Counter-intuitive behavior of Server.close() / wait_closed() #79033 in the asyncio section of the changelog for 3.12 would be appreciated.

...and I'll just update the lib not to do that anymore.

(...While I have your attention, could I ask for someone to take a peek at #90871 ? It's dubiously related to my efforts to try and accept precisely and only one client using asyncio. Sorry for the hijack, I'm new here. 😩)

@gvanrossum
Copy link
Member

gvanrossum commented May 11, 2023

Yes, we'll update the docs and "what's new in 3.12". If you want it to happen soon, you could volunteer a docs PR.

...and I'll just update the lib not to do that anymore.

Yes, that should be 100% backwards compatible. :-)

PS. I think there is a scenario possible where in 3.11 or before wait_closed() works as intended, but you have to call await svr.wait_closed() before calling svr.close(). Maybe call the latter from a request handler. I haven't tested this, but I think in that case svr.wait_closed() will complete once the last connection has its transport closed -- each time a connection is opened, _attach() increments active_count, and each time one is closed, _detach() decrements it again. If the server is closed when the count is decremented to zero, _detach() will wake up all the waiters (there may be several, if multiple tasks call wait_closed()).

(Oh, and it looks like the waiters are woken up using waiter.set_result(waiter) which creates a reference cycle. This seems pointless, it should just set the result to None. Also, you can wait for the same future multiple times, so I'm not sure why this is using a list of multiple futures. I am guessing that we weren't so clear on that idea at the time this code was written.)

@gvanrossum gvanrossum self-assigned this May 11, 2023
@gvanrossum
Copy link
Member

gvanrossum commented May 11, 2023

(Me)

Also, you can wait for the same future multiple times, so I'm not sure why this is using a list of multiple futures. I am guessing that we weren't so clear on that idea at the time this code was written.

To the contrary, I forgot about cancel semantics. This was just explained on Discourse.

@kumaraditya303 kumaraditya303 added docs Documentation in the Doc dir and removed type-bug An unexpected behavior, bug, or error pending The issue will be closed if no feedback is provided labels May 13, 2023
chrysn added a commit to chrysn/aiocoap that referenced this issue Sep 23, 2023
Python 3.12 has changed its behavior[1] such that wait_closed() doesn't
return before all open connections are closed -- and we used to block on
that before we even attempted to shut down the individual connections.

[1]: python/cpython#104344
stsquad pushed a commit to qemu/qemu that referenced this issue Oct 16, 2023
This patch is a backport from
https://gitlab.com/qemu-project/python-qemu-qmp/-/commit/e03a3334b6a477beb09b293708632f2c06fe9f61

According to Guido in python/cpython#104344 ,
this call was never meant to wait for the server to shut down - that is
handled synchronously - but instead, this waits for all connections to
close. Or, it would have, if it wasn't broken since it was introduced.

3.12 fixes the bug, which now causes a hang in our code. The fix is just
to remove the wait.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-id: 20231006195243.3131140-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
stsquad pushed a commit to qemu/qemu that referenced this issue Nov 1, 2023
This patch is a backport from
https://gitlab.com/qemu-project/python-qemu-qmp/-/commit/e03a3334b6a477beb09b293708632f2c06fe9f61

According to Guido in python/cpython#104344 ,
this call was never meant to wait for the server to shut down - that is
handled synchronously - but instead, this waits for all connections to
close. Or, it would have, if it wasn't broken since it was introduced.

3.12 fixes the bug, which now causes a hang in our code. The fix is just
to remove the wait.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-id: 20231006195243.3131140-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit acf8738)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
VisualEhrmanntraut pushed a commit to ChefKissInc/QEMUAppleSilicon that referenced this issue Nov 22, 2023
This patch is a backport from
https://gitlab.com/qemu-project/python-qemu-qmp/-/commit/e03a3334b6a477beb09b293708632f2c06fe9f61

According to Guido in python/cpython#104344 ,
this call was never meant to wait for the server to shut down - that is
handled synchronously - but instead, this waits for all connections to
close. Or, it would have, if it wasn't broken since it was introduced.

3.12 fixes the bug, which now causes a hang in our code. The fix is just
to remove the wait.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-id: 20231006195243.3131140-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit acf8738)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rumpelsepp added a commit to Fraunhofer-AISEC/gallia that referenced this issue Dec 8, 2023
There is a Python bug with .wait_closed(); just remove this invocation.

* python/cpython#104344
* python/cpython#109538
rumpelsepp added a commit to Fraunhofer-AISEC/gallia that referenced this issue Dec 8, 2023
There is a Python bug with .wait_closed(); just remove this invocation.

* python/cpython#104344
* python/cpython#109538
SomberNight added a commit to spesmilo/electrum that referenced this issue Dec 24, 2023
Semantics for server.wait_closed changed in 3.12, and it was always
a no-op in our usage in prior versions.
see python/cpython#104344
@willingc
Copy link
Contributor

Maybe, but looking at the implementation of wait_closed(), the intention was always to wait for the last connection to be closed. The documentation wasn't super clear about this, but that is clearly what it was meant to do, so I think it's reasonable to fix the bug without renaming the code. We should also clarify the docs (I wouldn't even mind adding a "version-changed: 3.12" note reminding users that this was broken until 3.11). The examples were just buggy.

Next step for someone interested in docs to make Guido's suggested change.

maurerle added a commit to maurerle/mango that referenced this issue Aug 30, 2024
This is needed for wait_closed to terminate starting from 3.12
Before, wait_closed was a noop and did not check for all clients to be closed.

see python/cpython#104344
maurerle added a commit to maurerle/mango that referenced this issue Aug 31, 2024
This is needed for wait_closed to terminate starting from 3.12
Before, wait_closed was a noop and did not check for all clients to be closed.

see python/cpython#104344
aleasto added a commit to aleasto/wsdd that referenced this issue Oct 11, 2024
This effectively re-implements asyncio.Server::close_clients which
is only available in python3.13.

In python3.12 asyncio.Server::wait_closed will hang unless these
sockets are closed: python/cpython#104344
aleasto added a commit to aleasto/wsdd that referenced this issue Oct 11, 2024
This effectively re-implements asyncio.Server::close_clients which
is only available in python3.13.

In python3.12 asyncio.Server::wait_closed will hang unless these
sockets are closed: python/cpython#104344
aleasto added a commit to aleasto/wsdd that referenced this issue Oct 14, 2024
This effectively re-implements asyncio.Server::close_clients which
is only available in python3.13.

In python3.12 asyncio.Server::wait_closed will hang unless these
sockets are closed: python/cpython#104344
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir topic-asyncio
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

6 participants