Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in asyncio.Server.close #109564

Open
bmerry opened this issue Sep 19, 2023 · 11 comments
Open

Race condition in asyncio.Server.close #109564

bmerry opened this issue Sep 19, 2023 · 11 comments
Labels
stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@bmerry
Copy link
Contributor

bmerry commented Sep 19, 2023

Bug report

Bug description:

There is a race condition when closing an asyncio.Server. Accepting a connection takes several iterations of the event loop to complete, and a call to Server.close in the middle will lead to exceptions.

First, some demonstration code. Run both of these concurrently, running the server with python -X dev:

Client:

#!/usr/bin/env python3

import socket

while True:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.connect(("127.0.0.1", 4445))
        except IOError:
            pass

Server:

#!/usr/bin/env python3

import asyncio

def cb(reader, writer):
    writer.close()

def _my_attach(self):
    if self._sockets is None:
        print(self._sockets, self._active_count, flush=True)
    _orig_attach(self)

async def main():
    while True:
        server = await asyncio.start_server(cb, host="127.0.0.1", port=4445)
        await asyncio.sleep(0)
        server.close()
        await server.wait_closed()

asyncio.run(main())

It will repeatly output errors like this:

Error on transport creation for incoming connection
handle_traceback: Handle created at (most recent call last):
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/usr/lib/python3.12/asyncio/base_events.py", line 651, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.12/asyncio/base_events.py", line 618, in run_forever
    self._run_once()
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1943, in _run_once
    handle._run()
  File "/usr/lib/python3.12/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/lib/python3.12/asyncio/selector_events.py", line 211, in _accept_connection
    self.create_task(accept)
  File "/usr/lib/python3.12/asyncio/base_events.py", line 436, in create_task
    task = tasks.Task(coro, loop=self, name=name, context=context)
protocol: <asyncio.streams.StreamReaderProtocol object at 0x7f7bb340b350>
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/selector_events.py", line 230, in _accept_connection2
    transport = self._make_socket_transport(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/selector_events.py", line 72, in _make_socket_transport
    return _SelectorSocketTransport(self, sock, protocol, waiter,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/selector_events.py", line 936, in __init__
    super().__init__(loop, sock, protocol, extra, server)
  File "/usr/lib/python3.12/asyncio/selector_events.py", line 800, in __init__
    self._server._attach()
  File "/usr/lib/python3.12/asyncio/base_events.py", line 294, in _attach
    assert self._sockets is not None
AssertionError

Additionally, stopping the server (with Ctrl-C) prints lots of ResourceWarnings about unclosed transports and sockets.

When a connection arrives on a socket (with selector_events), it goes through the following steps before reaching Server._attach:

  1. The socket is accepted: here
  2. _accept_connection2 is scheduled (will only run on next event loop iteration): here
  3. The transport is created: here
  4. Server._attach is called: here

It's possible for server.close() to be executed between steps 2 and 3, in which case Server._sockets has already been set to None, and the assertion is triggered. The client will presumably experience this as a connection that either closes immediately (if the garbage collector closes the socket) or is unresponsive.

This particular bug might be fixable by creating the transport synchronously in _accept_connection. I think there may be further potential race conditions because connection_made is also called asynchronously, which means it is possible for it to be called after Server.close has already returned. That might not break anything in asyncio itself but it will cause software that does something like the following to hang in 3.12 (related to #79033 / #104344):

server.close()
for (reader, writer) in my_connections:
    writer.close()
    await writer.closed()
await server.wait_closed()

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

Linux

Linked PRs

@bmerry bmerry added the type-bug An unexpected behavior, bug, or error label Sep 19, 2023
@github-project-automation github-project-automation bot moved this to Todo in asyncio Sep 19, 2023
@Yaro1
Copy link

Yaro1 commented Nov 29, 2023

Hi, I would like to take that issue!

@gvanrossum
Copy link
Member

@Yaro1 Okay, first step, can you reproduce the issue locally using the main branch?

@Yaro1
Copy link

Yaro1 commented Dec 1, 2023

@gvanrossum Yes I reproduced

@gvanrossum
Copy link
Member

Okay go ahead and work on a fix then. Come back here if you are stuck.

@Yaro1
Copy link

Yaro1 commented Dec 2, 2023

okay, thank you

@Yaro1
Copy link

Yaro1 commented Dec 9, 2023

Well, I'm not sure about creating the transport synchronously in _accept_connection.

I tried to understand what is the reason of dividing accepting socket and creating a transport to _accept_connection and _accept_connection2.

I found commit which include creating _accept_connection2 2934262

But there is a problem - I guess there is not enough information in commit message:

Issue https://github.com/python/cpython/pull/23333: Fix BaseSelectorEventLoop._accept_connection(). Close the
  transport on error. In debug mode, log errors using call_exception_handler()

And the link for now indicates to another issue.

Could you help me please with an understanding of purpose _accept_connection and _accept_connection2? @gvanrossum

@gvanrossum
Copy link
Member

I'm sorry, I recommend that you try a simpler issue.

@Yaro1
Copy link

Yaro1 commented Dec 10, 2023

okay I will take simpler

@ordinary-jamie
Copy link
Contributor

ordinary-jamie commented Nov 11, 2024

Hello @bmerry 👋 I was able to reproduce this issue on latest;

repro code
import asyncio
import sys
import socket

async def server():
    server = await asyncio.start_server(lambda *_: None, host="127.0.0.1", port=4445)
    await asyncio.sleep(0)
    server.close()
    await server.wait_closed()


def client():

    while True:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.connect(("127.0.0.1", 4445))
            except IOError:
                pass
            finally:
                sock.close()


if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'server':
        asyncio.run(server())
    else:
        client()

As noted in this issue, the reader callback for the listening socket (_accept_connection) and the coroutine scheduled for the accepted connection are separate (_accept_connection2). This means the socket server can be closed in the period between the _accept_connection2 coroutine being scheduled and the coroutine actually running.

However, I don't believe a code change is needed to address this, since the transport will be closed on error and throw a CancelledError into _accept_connection2; for this edge-case I would considered that sufficiently handled. And, the assertions will only be logged, and that should be turned off by clearing PYTHONASYNCIODEBUG or with python -O. What do you think?

Additionally, stopping the server (with Ctrl-C) prints lots of ResourceWarnings about unclosed transports and sockets.

Yup, I see 3 separate clean-up issues:

  1. Unclosed socket: The transport (and the accepted socket) only gets closed if the transport is successfully created. If the transport fails to create there is no code path in BaseSelectorEventLoop._accept_connection2 to close the socket. I'm a little hesitant to add this however, as I'm not sure if this will cause a EBADF if the socket gets closed twice in the happy path; what do you think, @gvanrossum? Would something like if not transport: conn.close() be necessary?
  2. Unclosed transport on _SelectorTransport.__del__: I don't believe it is necessary to handle these unclosed transport warnings; I think this is working as intended.
  3. Multiple calls to Server._wakeup: The asyncio.Server has a missing None check for when it wants to set its waiters; this happens in this edge-case because there is an explicit Server.close call in addition to the transports detaching from the server, resulting in multiple calls to Server._wakeup; I've made PR gh-109564: Add None check for asyncio.Server._wakeup #126660 to address this

@picnixz picnixz added the stdlib Python modules in the Lib dir label Nov 11, 2024
@asvetlov
Copy link
Contributor

  1. Unclosed socket: The transport (and the accepted socket) only gets closed if the transport is successfully created.

Multiple sock.close() calls don't raise an exception; conn.close() could be called explicitly in both except (SystemExit, KeyboardInterrupt) and except BaseException branches safely.

@ordinary-jamie
Copy link
Contributor

Multiple sock.close() calls don't raise an exception; conn.close() could be called explicitly in both except (SystemExit, KeyboardInterrupt) and except BaseException branches safely.

Thanks @asvetlov; should I put this change in a different PR to #126660, since the None check and the socket close seem unrelated to each other?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

7 participants