
3.12 regression: "can't create new thread at interpreter shutdown" from non-daemon threads or atexit handlers #113964

Open
zenbones opened this issue Jan 11, 2024 · 22 comments
Labels
3.12 (bugs and security fixes) · docs (Documentation in the Doc dir) · interpreter-core (Objects, Python, Grammar, and Parser dirs) · type-bug (An unexpected behavior, bug, or error)

Comments

@zenbones

zenbones commented Jan 11, 2024

Bug report

Bug description:

I have a single threading.Timer held on a class as self.timer...

    def refresh_timer(self):
        if self.timer is not None:
            self.timer.cancel()
        self.timer = threading.Timer(self.worker_context.timeout_minutes * 60, self.stop_cycle)
        self.timer.start()

As recently as 3.10 this code worked. As of 3.12 it errors with "can't create new thread at interpreter shutdown". The code has not changed, and the interpreter is not shutting down at the time this error is thrown; the process is live and code is executing. There are no code changes between where this works in 3.10 and where it fails in 3.12.1 (upgraded from 3.12 to 3.12.1 and tested again to see if this had been fixed).

CPython versions tested on:

3.12

Operating systems tested on:

Windows


@zenbones zenbones added the type-bug An unexpected behavior, bug, or error label Jan 11, 2024
@zware
Member

zware commented Jan 11, 2024

Can you please provide a full, self-contained reproducer? The provided snippet does not appear to be complete.

@zenbones
Author

zenbones commented Jan 12, 2024

Not easily. The use case is complex, but the error is real, and was not there in 3.10. Maybe you can help me define a case I can get a reproducer for, or help me debug why this is happening now. The process entry point uses importlib.import_module() to load 'external code', then starts a thread...

class WorkerThread(threading.Thread):
    def __init__(self, worker):
        threading.Thread.__init__(self)

        self.setDaemon(False)
        self.stopped = False
        self.worker = worker

self.worker_thread = WorkerThread(self)
self.worker_thread.start()

...and starts the timer (which is another thread I guess)...

self.timer = None
self.refresh_timer()

...and then, I don't know... exits, but doesn't, because the WorkerThread is not a daemon. The WorkerThread uses the passed-in 'worker' to handle all the business logic. Periodically, as in often, the refresh_timer() method will be called from the WorkerThread instance, via the self.worker reference.

The initial timer setup is fine in the 'main' thread, but after that thread finishes (yet does not exit, due to the non-daemon thread), refresh_timer() is called again and fails with "can't create new thread at interpreter shutdown". Maybe this is related to the work on multiple interpreters in 3.12?

Explaining it this way has helped in any case, and maybe I can get these classes set up. I think the module load, which is hard to replicate, may have nothing to do with this, and I can try with a main thread, a worker thread, and a timer thread. However, if the answer is obvious from my use case, don't hesitate to save me some work and tell me what's up.

@zenbones
Author

zenbones commented Jan 12, 2024

Voila...

import threading
import time

def main():
    worker = Worker()

class Worker:
    def __init__(self):
        self.worker_thread = WorkerThread(self)
        self.worker_thread.start()

        self.timer = None
        self.refresh_timer()
        print("worker init complete")

    def refresh_timer(self):
        if self.timer is not None:
            self.timer.cancel()
        self.timer = threading.Timer(30 * 60, self.stop_cycle)
        self.timer.start()

    def stop_cycle(self):
        print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")

class WorkerThread(threading.Thread):
    def __init__(self, worker):
        threading.Thread.__init__(self)

        self.setDaemon(False)
        self.stopped = False
        self.worker = worker

    def run(self):
        while True:
            print("running...")
            self.worker.refresh_timer()
            time.sleep(3)

main()
C:\Users\david\Documents\testme.py:30: DeprecationWarning: setDaemon() is deprecated, set the daemon attribute instead
  self.setDaemon(False)
running...
worker init complete
running...
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Program Files\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\david\Documents\testme.py", line 37, in run
    self.worker.refresh_timer()
  File "C:\Users\david\Documents\testme.py", line 21, in refresh_timer
    self.timer.start()
  File "C:\Program Files\Python312\Lib\threading.py", line 992, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown

@zenbones
Author

If I keep the main entry point running, as with time.sleep(a_long_time), then I don't get the thread creation error. However, that would force me to throw in some waiting loop on a lock to keep the main thread alive until the non-daemon thread terminates, being very careful to trap any error and release the main thread (try/except/finally). But none of that makes any difference to me. The whole point of designating the second thread as non-daemon is that the main thread should not exit until all the non-daemon threads terminate; the system is responsible for all that.
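A sketch of the kind of scaffolding that would be required (hypothetical; do_work stands in for the business logic):

import threading

main_can_exit = threading.Event()

def do_work():
    pass  # hypothetical business logic

def worker_body():
    try:
        do_work()
    finally:
        main_can_exit.set()  # release the main thread even if do_work raises

def main():
    threading.Thread(target=worker_body).start()
    main_can_exit.wait()     # keeps the main thread alive until the worker is done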

My guess is that whatever throws "can't create new thread at interpreter shutdown" is checking some attribute of the main thread, but failing to check for any running non-daemon threads. Or, whatever code makes up the main thread is responding that yes, it has terminated, despite the fact that non-daemon child threads are still running. So I think this is a Python bug.

@ronaldoussoren
Contributor

At first glance this behaviour does seem correct:

main() creates a new non-daemon thread that does the actual work, and then returns. That means the main thread immediately tries to exit the script and is therefore finalising the interpreter, waiting for the WorkerThread to stop before actually doing most of the cleanup and exiting the interpreter.

If you call worker.worker_thread.join() in main(), the script will continue to run for as long as the worker thread does.
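For illustration, a minimal sketch of that suggestion against the reproducer above (untested):

def main():
    worker = Worker()
    # Explicitly wait for the non-daemon thread; interpreter
    # finalization then only begins once the worker has exited.
    worker.worker_thread.join()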

I'm not an expert on the finer details of interpreter shutdown though.

@zenbones
Author

zenbones commented Jan 12, 2024

The problem is that I want the function to return. I'll agree that a join is better than an event or a barrier, but anything like those will prevent the init function from returning, which is ugly. But more than that, it's wrong. The first problem with the argument that the main thread should exit the interpreter, but not 'finish' until the non-daemon thread exits, is that this is a significant change in behavior. This works correctly in 3.10 and throws this error in 3.12. That alone is dangerous. The second problem is that it makes no sense. A thread which is 'finalising' in the interpreter has exited as far as any client usage is concerned, and the docs on daemon threads say...

A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.

So the 'program' cannot exit due to the non-daemon thread, but the main thread kind of can? Even if that wasn't vague enough: if the program has not exited, why can't the live thread create a timer, or start another thread? The program has not exited. There is code running in the interpreter. But it's now blocked from creating a new thread?

The docs should be amended to say...

A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. However, a program running only child non-daemon threads will not be able to create new threads, such as with timer.start().

Does that really make sense?

@zenbones
Author

zenbones commented Jan 12, 2024

By the way, def _shutdown() in threading.py contains this code...

    # Join all non-deamon threads
    while True:
        with _shutdown_locks_lock:
            locks = list(_shutdown_locks)
            _shutdown_locks.clear()

        if not locks:
            break

        for lock in locks:
            # mimic Thread.join()
            lock.acquire()
            lock.release()

        # new threads can be spawned while we were waiting for the other
        # threads to complete

Note the comments. I really think the problem is that the check in timer.start() is not properly checking for active non-daemon threads. I'm trying to find that code. Any pointers to it would be helpful.

@ronaldoussoren ronaldoussoren added the docs (Documentation in the Doc dir) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jan 12, 2024
@ronaldoussoren
Contributor

There should IMHO at least be documentation about this behaviour. I'm therefore reopening the issue.

I've done some research. Py_FinalizeEx first sets the finalizing flag in the interpreter state, then waits for threads to exit:

cpython/Python/pylifecycle.c

Lines 1823 to 1842 in ac92527

int
Py_FinalizeEx(void)
{
    int status = 0;
    _PyRuntimeState *runtime = &_PyRuntime;
    if (!runtime->initialized) {
        return status;
    }

    /* Get current thread state and interpreter pointer */
    PyThreadState *tstate = _PyThreadState_GET();
    // XXX assert(_Py_IsMainInterpreter(tstate->interp));
    // XXX assert(_Py_IsMainThread());

    // Block some operations.
    tstate->interp->finalizing = 1;

    // Wrap up existing "threading"-module-created, non-daemon threads.
    wait_for_thread_shutdown(tstate);

Starting a new thread ends up in _thread.start_joinable_thread, which in the end calls do_start_new_thread, which raises an exception when the finalizing flag in the interpreter state is set:

static int
do_start_new_thread(thread_module_state* state,
                    PyObject *func, PyObject* args, PyObject* kwargs,
                    int joinable,
                    PyThread_ident_t* ident, PyThread_handle_t* handle)
{
    PyInterpreterState *interp = _PyInterpreterState_GET();
    if (!_PyInterpreterState_HasFeature(interp, Py_RTFLAGS_THREADS)) {
        PyErr_SetString(PyExc_RuntimeError,
                        "thread is not supported for isolated subinterpreters");
        return -1;
    }
    if (interp->finalizing) {
        PyErr_SetString(PyExc_RuntimeError,
                        "can't create new thread at interpreter shutdown");
        return -1;
    }

@zenbones
Author

zenbones commented Jan 12, 2024

This is unfortunately deep, as Timer subclasses Thread, and the error gets thrown in do_start_new_thread() of _threadmodule.c...

    if (interp->finalizing) {
        PyErr_SetString(PyExc_RuntimeError,
                        "can't create new thread at interpreter shutdown");
        return -1;
    }

So the interpreter is responding that it is finalizing, so I think my contention is that either...

  1. The interpreter should be blocked from entering the finalizing state until all non-daemon threads have exited. I think that makes sense if there's a single interpreter, because there's code still running. My guess is this bug is tied up with the work on allowing multiple interpreters, which makes sense version-wise: the main thread's interpreter is finalizing and there are now other interpreters for the non-daemon threads, but the thread start only checks the main interpreter?

or

  2. Both the main thread's interpreter and all non-daemon threads' interpreters need to be checked before throwing that error?

There is a specific test for this error being thrown in test_threading.py, but no test that this error is not thrown if there's a non-daemon thread still running.
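A hypothetical test of that missing case might look something like this (a sketch, not from the actual test suite):

import subprocess, sys, textwrap

def test_thread_start_from_live_nondaemon_thread():
    # A fresh interpreter whose main thread exits while a non-daemon
    # thread later starts another thread; this should not raise.
    code = textwrap.dedent("""
        import threading, time

        def bloop():
            time.sleep(0.5)
            threading.Thread(target=print, args=("ok",)).start()

        threading.Thread(target=bloop).start()
    """)
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True)
    assert "ok" in result.stdout, result.stderr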

@zenbones
Author

And yes, what you saw above. Thank you.

@OlivierGagnon

Hello is there an ETA for a fix? thx

@ronaldoussoren
Contributor

Hello is there an ETA for a fix? thx

No. It is not even clear to me that there is a bug in the first place.

@zenbones
Author

zenbones commented Jan 31, 2024

It might simply be an undocumented change in behavior, at odds with any proper notion of consistent thread context or the meaning of non-daemon threads. But maybe that's what the Python language wants for itself. My workaround for this, if it's helpful, is to pass in from the main method...

class Workaround(object):
    def set_thread(self, thread):
        self.thread = thread

    def wait(self):
        self.thread.join()

Then set the non-daemon thread on this workaround, and in the main method call workaround.wait(). The need to do something like this shows how invalid the current thread model in Python is, as the entire point of non-daemon threads is to make a hack like this unnecessary.
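For reference, a minimal sketch of that wiring (hypothetical; assumes Worker is handed the workaround and registers its non-daemon thread on it):

def main():
    workaround = Workaround()
    worker = Worker(workaround)  # Worker calls workaround.set_thread(...) internally
    workaround.wait()            # explicit join keeps the main thread alive

main()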

But it's not my language.

damies13 added a commit to damies13/rfswarm that referenced this issue Feb 18, 2024
Try Workaround mentioned in python/cpython#113964
damies13 added a commit to damies13/rfswarm that referenced this issue Feb 18, 2024
Workaround in python/cpython#113964 worked
Now just tuned from 300 sec to 10 sec in the loop, no need to wait 5 min after base.run_dbthread is set false, 10 sec should be plenty, actually 1 would probably be enough.
Cut debugging in test back to level 1
Issue #177
@gvanrossum
Member

Here's a simpler reproducer. We have main() creating a thread which sleeps a bit and then creates another thread.

import threading, time

def main():
    threading.Thread(target=bloop).start()

def bloop():
    time.sleep(0.5)
    threading.Thread(target=print, args=("new thread running",)).start()

main()

This works, printing "new thread running" in 3.11 (in fact, if you add from __future__ import print_function it runs in 2.7 :-). But in 3.12 it reports

RuntimeError: can't create new thread at interpreter shutdown

Let's focus on that -- is that intentional? Why?

Is the solution maybe as simple as first waiting for all non-daemon threads to exit and then setting the finalizing flag? That's the semantics I would have expected, and that might have been what happened previously.

@gpshead
Member

gpshead commented Feb 22, 2024

The discussion in the other issue where that form of simple reproducer came from seemed to me to gloss over the reason: the main thread "exiting" by virtue of leaving the main .py file is being considered interpreter shutdown. We should not treat it as such. Users expect the main thread to implicitly block waiting on all non-daemon threads to complete, because that has always been our behavior. So adding code that does that before we consider the interpreter to be in a "shutdown" state would solve it.

The high-level way to do that from Python, via @vstinner on #115533, is adding this form of logic as a finally clause on the way out of the main .py file:

import threading
for thread in threading.enumerate():
    if thread.daemon or thread is threading.current_thread():
        continue
    thread.join()

How we want to implement that is up to us. The "just pretend it's surrounded by a try/finally with that code" statement might not be wholly accurate in terms of other things that need dealing with, order-of-operations-wise, after "main code object exiting", but if so, that's just a matter of researching what 3.11 did, and deciding if the same order still makes sense.

@gpshead gpshead changed the title timer.start() yields "can't create new thread at interpreter shutdown" 3.12 regression: "can't create new thread at interpreter shutdown" from non-daemon threads Feb 22, 2024
@gpshead gpshead added the 3.12 bugs and security fixes label Feb 22, 2024
@gvanrossum
Member

According to git bisect this is the culprit: ce558e6 (PR gh-104826, Issue gh-104690). @gpshead :-)

@gpshead
Member

gpshead commented Feb 23, 2024

Not at all surprised to see my name on that change. :) This looks like an unintended consequence; we apparently never had test coverage of this feature, nor proper documentation in the threading module describing it rather than implying it by omission.

@gpshead
Member

gpshead commented Mar 19, 2024

import atexit
import threading

def foo():
    pass  # stand-in for the real work

def exit_func():
    thread = threading.Thread(target=foo)
    thread.start()

atexit.register(exit_func)

The current PR is focused on spawning a thread during finalization from a "main ended", garbage-collection-ish context. The atexit context may need its own PR, as it opens new questions that need specific answers and a somewhat different implementation.

We need additional changes to make that work safely. It raises the question of "when do we call atexit handlers?"... If we allow code to launch a thread during an atexit handler, it'll run concurrently and must not re-trigger the same atexit handlers when it is the last thread standing and itself exits. Does executing atexit handlers prevent the registration of new atexit handlers? People could set up creative loops, registering new ones from such a thread... newly registered ones weren't in the first list to be called, so they could remain and get called. ... pondering ...
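For concreteness, a hypothetical pattern of the kind these questions are about (not from any PR):

import atexit
import threading

def late_handler():
    print("registered during shutdown -- should this ever run?")

def handler():
    def worker():
        # A thread launched from an atexit handler runs concurrently with
        # shutdown and registers yet another handler after the first batch
        # was already being called.
        atexit.register(late_handler)
    threading.Thread(target=worker).start()

atexit.register(handler)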

@gpshead
Member

gpshead commented Mar 19, 2024

Recording a use case from the linked duplicate issue #115219 directly for reference. TL;DR: subprocess on Windows uses a thread, and thus cannot be used in an atexit context in 3.12, as Matplotlib does:

In Matplotlib we are holding onto a LaTeX process and triggering this via a weakref.finalize, but a minimal reproducer is:

from subprocess import Popen, PIPE
import atexit

# anything that holds stdin and stdout should work
proc = Popen(['powershell'], stdin=PIPE, stdout=PIPE)

def cleanup():
    print('hi bob')
    proc.kill()
    proc.communicate()

atexit.register(cleanup)

@gpshead gpshead changed the title 3.12 regression: "can't create new thread at interpreter shutdown" from non-daemon threads 3.12 regression: "can't create new thread at interpreter shutdown" from non-daemon threads or atexit handlers Mar 19, 2024
colesbury added a commit that referenced this issue Mar 19, 2024
gh-113964: Don't prevent new threads until all non-daemon threads exit (#116677)

Starting in Python 3.12, we prevented calling fork() and starting new threads
during interpreter finalization (shutdown). This has led to a number of
regressions and flaky tests. We should not prevent starting new threads
(or `fork()`) until all non-daemon threads exit and finalization starts in
earnest.

This changes the checks to use `_PyInterpreterState_GetFinalizing(interp)`,
which is set immediately before terminating non-daemon threads.
The same change was cherry-picked from commit 60e105c into 3.12 (GH-116677, merged as #117029) and subsequently referenced in forks by vstinner, adorilson, and diegorusso.
@tlandschoff-scale

Just to have another reproducer: we have code that sometimes starts a new thread and sometimes gets called from GC during shutdown (actually from C objects being destructed). This led to processes hanging during shutdown.

Code:

from threading import Thread

class CleanupOnDelete:
    def __del__(self):
        print("Starting thread due to deletion")
        thread = Thread(target=lambda: None)
        thread.start()
        print("Thread started")
        thread.join()
        print("Thread joined")

foo = CleanupOnDelete()

This used to work with Python 3.7, led to interpreter hangs in intermediate versions, and now raises an exception:

$ docker run -it --rm -v "$PWD":/src -w /src python:3.7 python exit_gc_hang.py
Starting thread due to deletion
Thread started
Thread joined

$ docker run -it --rm -v "$PWD":/src -w /src python:3.8 python exit_gc_hang.py
Starting thread due to deletion
<hangs...>

$ docker run -it --rm -v "$PWD":/src -w /src python:3.10 python exit_gc_hang.py
Starting thread due to deletion
<hangs...>

$ docker run -it --rm -v "$PWD":/src -w /src python:3.11 python exit_gc_hang.py
Starting thread due to deletion
<hangs...>

$ docker run -it --rm -v "$PWD":/src -w /src python:3.12 python exit_gc_hang.py
Starting thread due to deletion
Exception ignored in: <function CleanupOnDelete.__del__ at 0x7f83834f6200>
Traceback (most recent call last):
  File "/src/exit_gc_hang.py", line 7, in __del__
  File "/usr/local/lib/python3.12/threading.py", line 992, in start
RuntimeError: can't create new thread at interpreter shutdown

$ docker run -it --rm -v "$PWD":/src -w /src python:3.13-rc-bookworm python exit_gc_hang.py
Starting thread due to deletion
Exception ignored in: <function CleanupOnDelete.__del__ at 0x7fd02488d760>
Traceback (most recent call last):
  File "/src/exit_gc_hang.py", line 7, in __del__
  File "/usr/local/lib/python3.13/threading.py", line 971, in start
PythonFinalizationError: can't create new thread at interpreter shutdown

We can actually live with getting an exception, but it is difficult to deal with processes stuck in shutdown.
Currently, we are guarding the thread creation with a check that main_thread().is_alive().
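A sketch of that guard (simplified; not our exact production code):

from threading import Thread, main_thread

def cleanup():
    if not main_thread().is_alive():
        # The interpreter is shutting down: starting a thread would raise
        # (or hang on older versions), so skip the threaded cleanup path.
        return
    thread = Thread(target=lambda: None)
    thread.start()
    thread.join()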

@3042457129

You can try the threading.Thread.join() function to block the main thread.
