Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime finalization assumes all other threads have exited. #80657

Open
ericsnowcurrently opened this issue Mar 29, 2019 · 14 comments
Open

Runtime finalization assumes all other threads have exited. #80657

ericsnowcurrently opened this issue Mar 29, 2019 · 14 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@ericsnowcurrently
Copy link
Member

BPO 36476
Nosy @tim-one, @ncoghlan, @pitrou, @vstinner, @ericsnowcurrently, @phmc, @pablogsal, @nanjekyejoannah

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-03-29.20:38:26.289>
labels = ['expert-subinterpreters', 'interpreter-core', 'type-bug', '3.8', '3.9']
title = 'Runtime finalization assumes all other threads have exited.'
updated_at = <Date 2020-06-03.16:42:03.692>
user = 'https://github.com/ericsnowcurrently'

bugs.python.org fields:

activity = <Date 2020-06-03.16:42:03.692>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'Subinterpreters']
creation = <Date 2019-03-29.20:38:26.289>
creator = 'eric.snow'
dependencies = []
files = []
hgrepos = []
issue_num = 36476
keywords = []
message_count = 10.0
messages = ['339143', '339145', '358745', '358747', '358749', '358750', '358751', '358782', '359101', '359103']
nosy_count = 8.0
nosy_names = ['tim.peters', 'ncoghlan', 'pitrou', 'vstinner', 'eric.snow', 'pconnell', 'pablogsal', 'nanjekyejoannah']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue36476'
versions = ['Python 3.8', 'Python 3.9']

@ericsnowcurrently
Copy link
Member Author

Among the first 3 things that happen in Py_FinalizeEx() are, in order:

  1. wait for all non-daemon threads (of the main interpreter) to finish
  2. call registered atexit funcs
  3. mark the runtime as finalizing

At that point the only remaining Python threads are:

  • the main thread (where finalization is happening)
  • daemon threads
  • non-daemon threads created in atexit functions
  • any threads belonging to subinterpreters

The next time any of those threads (aside from main) acquire the GIL, we expect that they will exit via a call to PyThread_exit_thread() (caveat: issue bpo-36475). However, we have no guarantee on when that will happen, if ever. Such lingering threads can cause problems, including crashes and deadlock (see issue bpo-36469).

I don't know what else we can do, beyond what we're already doing. Any ideas?

@ericsnowcurrently ericsnowcurrently added 3.7 (EOL) end of life 3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Mar 29, 2019
@ericsnowcurrently
Copy link
Member Author

FYI, I've opened bpo-36477 to deal with the subinterpreters case.

@ericsnowcurrently
Copy link
Member Author

Adding to the list:

  • any OS threads created by an extension module or embedding application

@ericsnowcurrently ericsnowcurrently added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Dec 20, 2019
@ericsnowcurrently
Copy link
Member Author

Problems with lingering threads during/after runtime finalization continue to be a problem. I'm going to use this issue as the focal point for efforts to resolve this.

Related issues:

  • bpo-36479 "Exit threads when interpreter is finalizing rather than runtime."
  • bpo-24770 "Py_Finalize() doesn't stop daemon threads"
  • bpo-23592 "SIGSEGV on interpreter shutdown, with daemon threads running wild"
  • bpo-37127 "Handling pending calls during runtime finalization may cause problems."
  • bpo-33608 "Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed."
  • bpo-36818 "Add PyInterpreterState.runtime."
  • bpo-36724 "Clear _PyRuntime at exit"
  • bpo-14073 "allow per-thread atexit()"
  • bpo-1596321 "KeyError at exit after 'import threading' in other thread"
  • bpo-37266 "Daemon threads must be forbidden in subinterpreters"
  • bpo-31517 "MainThread association logic is fragile"

@ericsnowcurrently
Copy link
Member Author

Analysis by @pconnell:

tl;dr daemon threads and external C-API access during/after runtime finalization are causing crashes.

@ericsnowcurrently
Copy link
Member Author

To put it another way:

(from bpo-33608#msg358748)

The docs [1] aren't super clear about it, but there are some fundamental
assumptions we make about runtime finalization:

  • no use of the C-API while Py_FinalizeEx() is executing (except for a
    few helpers like Py_Initialized)
  • only a small portion of the C-API is available afterward (at least
    until Py_Initialize() is run)

I guess the real question is what to do about this?

[1] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

Adding to that list:

  • no other Python threads are running once we start finalizing the runtime (not far into Py_FinalizeEx())

@ericsnowcurrently
Copy link
Member Author

So I see 3 things to address here:

  1. Python daemon threads
  2. Python threads created in atexit handlers
  3. non-Python threads accessing the C-API

Possible solutions (starting point for discussion):

  1. stop them at the point we stop waiting for non-daemon threads (at the beginning of finalization)
  2. disallow them? do one more pass of wait-for-threads?
  3. cause all (external) attempts to access the C-API to fail once finalization begins

Regarding daemon threads, the docs already say "Daemon threads are abruptly stopped at shutdown." [1] So let's force them to stop. Can we do that? If we *can* simply kill the threads, can we do so without leaking resources? Regardless, the mechanism we currently use (check for finalizing each(?) time through the eval loop) mostly works fine. The problem is when C code called from Python in a daemon thread blocks long enough that it makes C-API calls (or even the eval loop) *after* we've started cleaning up the runtime state. So if there was a way to interrupt that blocking code, that would probably be good enough.

The other two possible solutions are, I suppose, a bit more drastic. What are the alternatives?

[1] https://docs.python.org/3/library/threading.html#thread-objects

@pitrou
Copy link
Member

pitrou commented Dec 21, 2019

  1. Python daemon threads

I think the answer is to document a bit more clearly that they can pose all kinds of problems. Perhaps we could even display a visible warning when people create daemon threads.

  1. Python threads created in atexit handlers

We could run the "join non-daemon threads" routine a *second time* after atexit handlers have been called. It probably can't hurt (unless people do silly things?).

  1. non-Python threads accessing the C-API

This one I don't know how to handle. By construction, a non-Python thread can do anything it wants, and we cannot add guards against this at the beginning of each C API function. I think that when someone calls the C API, we're clearly in the realm of "consenting adults".

@ncoghlan
Copy link
Contributor

Perhaps we need a threading.throw() API, similar to the one we have for generators and coroutines?

If we had that, then Py_FinalizeEx() could gain a few new features:

  • throw SystemExit into all daemon threads and then give them a chance to terminate before calling any atexit handlers (printing a warning if some of the threads don't exit)
  • throw SystemExit into all daemon and non-daemon threads after running atexit handlers (printing a warning if any such threads exist at all, along with another warning if some of the threads don't exit)

Adding that would require an enhancement to the PendingCall machinery, though, since most pending calls are only processed in the main thread (there's no way to route them to specific child threads).

A simpler alternative would be to have an atomic "terminate_threads" counter in the ceval runtime state that was incremented to 1 to request that SystemExit be raised in daemon threads, and then to 2 to request that SystemExit be raised in all still running threads. When a thread received that request to exit, it would set a new flag in the thread state to indicate it was terminating, and then raise SystemExit. (The thread running Py_FinalizeEx would set that flag in advance so it wasn't affected, and other threads would use it to ensure they only raised SystemExit once). The runtime cost of this would just be another _Py_atomic_load_relaxed call in the eval_breaker branch. (Probably inside make_pending_calls, so it gets triggered both by the eval_breaker logic, and by explicit calls to Py_MakePendingCalls).

@ncoghlan
Copy link
Contributor

Thinking about that idea further, I don't think that change would help much, since the relevant operations should already be checking for thread termination when they attempt to reacquire the GIL.

That means what we're missing is:

  1. When daemon threads still exist after the non-daemon threads terminate, deliberately giving them additional time to run (and hence terminate)
  2. Explicitly attempting to kick daemon threads out of blocking system calls by sending them signals to provoke EINTR (I have no idea if there's a windows equivalent for this, but we should be able to use pthread_kill on POSIX systems. However, choosing *which* wakeup signal to send could be fraught with compatibility problems)

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ezio-melotti ezio-melotti moved this to Todo in Subinterpreters Apr 15, 2022
@gpshead
Copy link
Member

gpshead commented May 26, 2023

related issue: #104690 "Threads started in exit handler are still running after their thread states are destroyed"
and a related PR: #104826 "Disallow thread creation and fork at interpreter finalization"

@gpshead
Copy link
Member

gpshead commented May 26, 2023

If we had that, then Py_FinalizeEx() could gain a few new features:

  • throw SystemExit into all daemon threads and then give them a chance to terminate before calling any atexit handlers (printing a warning if some of the threads don't exit)

related to the above linked issues and ongoing thoughts about how horrible daemon threads are and wishing they didn't exist... I was pondering an if we could inject the equivalent of an uncatchable exception (meaning even bare except: would not handle it) into all daemon and non-python threads via their tstate so that their code would uncerimoniously exit no matter what. That's as close to a SIGKILL for a thread can we can cleanly get. Perhaps this only makes sense to do after Nick's idea of a SystemExit in the threads and waiting a little while.

@ericsnowcurrently
Copy link
Member Author

You mean in addition to the trick we do in ceval_gil.c, with tstate_must_exit() and PyThread_exit_thread()?

@gpshead
Copy link
Member

gpshead commented May 28, 2023

You mean in addition to the trick we do in ceval_gil.c, with tstate_must_exit() and PyThread_exit_thread()?

PyThread_exit_thread() is being deprecated by #28525 as it is impossible to safely end a thread abruptly. So take_gil will turn into an intentional hang in that situation.

I was thinking of if we could do something before we get to the point of destroying the thread states - the only safe exit from a thread is for them to actually return all the way up their call stack on their own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 (EOL) end of life 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

5 participants