ProactorEventLoop bug reproduced: UDP server stops accepting datagrams from any clients after a single client disconnects #7972

Closed
kozlovsky opened this issue Apr 17, 2024 · 1 comment · Fixed by #7976

Comments

@kozlovsky
Contributor

The following situation is easy to run into on Windows with ProactorEventLoop:

  1. A UDP server is running and awaits datagrams from clients.
  2. A UDP client sends a datagram and exits before the server can respond.
  3. The UDP server receives the datagram and responds to the client that is already gone. More specifically, the client machine is still reachable, but the client application no longer listens on the port.
  4. After that, the UDP server cannot receive datagrams from ANY client. It stops executing protocol callbacks (like connection_made, datagram_received, error_received, and connection_lost) until it is restarted.

The following is an example that reproduces the bug:

server.py

import asyncio


class EchoServerProtocol(asyncio.DatagramProtocol):
    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        message = data.decode()
        print('Received %r from %s' % (message, addr))

        # print('Is server transport closing due to client abort?', self.transport.is_closing())

        print('Sending %r to %s' % (message, addr))
        self.transport.sendto(data, addr)

    def error_received(self, exc):
        print(f"Exception caught by 'error_received': {exc}")


async def main():
    print("Starting UDP server")
    loop = asyncio.get_running_loop()

    # loop.set_debug(True)  # This does not affect the occurrence of this issue

    transport, protocol = await loop.create_datagram_endpoint(
        lambda: EchoServerProtocol(), local_addr=('127.0.0.1', 9999))

    try:
        await asyncio.sleep(3600)  # Serve for 1 hour.
    finally:
        transport.close()


asyncio.run(main())

client.py

import asyncio


class EchoClientProtocol:
    def __init__(self, message, on_con_lost):
        self.message = message
        self.on_con_lost = on_con_lost
        self.transport = None

    def connection_made(self, transport: asyncio.DatagramTransport):
        self.transport = transport

        print('Sending:', self.message)
        self.transport.sendto(self.message.encode())

        # Try to force the server to raise the error 1234 "No service is operating
        # at the destination network endpoint on the remote system."
        self.transport.abort()

    def datagram_received(self, data, addr):
        print("Received:", data)

        print("Close the socket")
        self.transport.close()

    def error_received(self, exc):
        print('Error received:', exc)

    def connection_lost(self, exc):
        print("Connection closed")
        self.on_con_lost.set_result(True)


async def main():
    loop = asyncio.get_running_loop()
    # loop.set_debug(True)  # The error disappears if this line is uncommented

    on_con_lost = loop.create_future()
    message = "Hello World!"

    transport, protocol = await loop.create_datagram_endpoint(
        lambda: EchoClientProtocol(message, on_con_lost),
        remote_addr=('127.0.0.1', 9999))

    try:
        await on_con_lost
    finally:
        transport.close()


asyncio.run(main())

To check, run "server.py" and then run "client.py" more than once: after the server receives the message from the first client, it ignores all later clients and appears to hang.
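A quick way to tell whether the server is still alive is to send it one datagram and wait for the echo. The following is a minimal sketch, assuming the echo server from "server.py" on 127.0.0.1:9999; the `server_responds` helper and its parameters are hypothetical names introduced for illustration and are not part of the original report.

```python
import socket


def server_responds(host="127.0.0.1", port=9999, timeout=1.0):
    """Return True if a UDP echo server at (host, port) echoes one datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(b"ping", (host, port))
            data, _addr = sock.recvfrom(1024)
        except OSError:
            # Covers a timeout (no reply) and, on Windows, the reset error
            # raised when nothing listens on the destination port.
            return False
        return data == b"ping"


if __name__ == "__main__":
    print("server responds:", server_responds())
```

Running this probe before and after "client.py" should show the difference: on an affected setup the first probe is expected to succeed and a probe after the aborted client is expected to time out.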

See CPython issues #88906, #91227

The issue was fixed in the main CPython branch three weeks ago, and the fix will be included in upcoming releases of Python 3.11 and 3.12.

I backported the fix to Python 3.8 and above; I am preparing the PR now.

@kozlovsky
Contributor Author

The reason for the bug:

  • ProactorEventLoop uses the WSASendTo and WSARecvFrom API calls to send and receive UDP datagrams.
  • After WSASendTo sends a packet to a client whose port is no longer open, the server's socket gets an ICMP "port unreachable" message.
  • The next server call to WSARecvFrom fails with OSError "[WinError 1234] No service is operating at the destination network endpoint on the remote system".
  • The error relates to the previous send attempt, but Python's IocpProactor mistakenly treats it as a failure to read the new datagram. After that, the server logic breaks, and it stops executing further callbacks.
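The last point suggests one way to handle the situation: treat error 1234 on a receive as a leftover from an earlier send and keep reading, instead of treating it as a failed read. The sketch below illustrates that idea only. It is not the actual CPython patch, and `finish_recvfrom`, `failing_read`, and the `get_result` callable are hypothetical names used to simulate a completed receive operation.

```python
ERROR_PORT_UNREACHABLE = 1234  # the WinError code quoted in the list above


def finish_recvfrom(get_result, empty_result=b""):
    """Complete a simulated receive, ignoring a stale 'port unreachable' error.

    `get_result` stands in for fetching the result of a finished receive
    operation; it either returns (data, addr) or raises OSError.
    """
    try:
        return get_result()
    except OSError as exc:
        # Error 1234 refers to an earlier send, not to this read, so report
        # "no data" and let the caller schedule the next receive.
        if getattr(exc, "winerror", None) == ERROR_PORT_UNREACHABLE:
            return empty_result, None
        raise  # any other error is a genuine receive failure


def failing_read(winerror):
    """Build a fake receive that fails with the given Windows error code."""
    def get_result():
        exc = OSError(f"[WinError {winerror}] simulated failure")
        exc.winerror = winerror  # set explicitly so the sketch runs anywhere
        raise exc
    return get_result


if __name__ == "__main__":
    print(finish_recvfrom(lambda: (b"hello", ("127.0.0.1", 9999))))
    print(finish_recvfrom(failing_read(ERROR_PORT_UNREACHABLE)))
```

The design choice here is to swallow only the one error code that is known to describe a previous send; every other OSError still propagates, so real receive failures are not hidden.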

kozlovsky added a commit to kozlovsky/tribler that referenced this issue Apr 17, 2024
kozlovsky added a commit to kozlovsky/tribler that referenced this issue Apr 18, 2024
kozlovsky added a commit that referenced this issue Apr 18, 2024
Fixes #7972: UDP server stops accepting datagrams from any clients after a single client disconnects
kozlovsky added a commit that referenced this issue Apr 18, 2024
…ny clients after a single client disconnects

(cherry picked from commit 449b864)
kozlovsky added a commit to kozlovsky/tribler that referenced this issue Apr 18, 2024
… from any clients after a single client disconnects

(cherry picked from commit 449b864)