deadlocked child process after forking on pystate.c's head_mutex #74580

lbrandy · 2017-05-17T23:25:59Z

BPO	30395
Nosy	@gpshead, @vstinner, @ambv, @cooperlees, @lbrandy, @phantez
PRs	bpo-30395: _PyGILState_Reinit deadlock fix #1734 [3.6] bpo-30395 _PyGILState_Reinit deadlock fix (GH-1734) #1740

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2017-06-09.21:23:13.988>
created_at = <Date 2017-05-17.23:25:58.616>
labels = ['interpreter-core', 'type-feature', '3.7']
title = "deadlocked child process after forking on pystate.c's head_mutex"
updated_at = <Date 2017-06-09.21:23:13.986>
user = 'https://github.com/lbrandy'

bugs.python.org fields:

activity = <Date 2017-06-09.21:23:13.986>
actor = 'lukasz.langa'
assignee = 'none'
closed = True
closed_date = <Date 2017-06-09.21:23:13.988>
closer = 'lukasz.langa'
components = ['Interpreter Core']
creation = <Date 2017-05-17.23:25:58.616>
creator = 'Louis Brandy'
dependencies = []
files = []
hgrepos = []
issue_num = 30395
keywords = []
message_count = 10.0
messages = ['293906', '294165', '294205', '294222', '294256', '294316', '294326', '294331', '294366', '295582']
nosy_count = 9.0
nosy_names = ['gregory.p.smith', 'vstinner', 'pixelbeat', 'lukasz.langa', 'cooperlees', 'Louis Brandy', 'drothera', 'phantez', 'Piotr Wysocki']
pr_nums = ['1734', '1740']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue30395'
versions = ['Python 3.6', 'Python 3.7']

lbrandy · 2017-05-17T23:25:58Z

A forked process (via os.fork) can inherit a locked head_mutex from its parent process and will promptly deadlock in this stack (on my linux box):

Child Process (deadlocked):

#0 0x00007f1a4da82e3c in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f1a4c2964e0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 0x00007f1a4da82e3c in do_futex_wait (sem=sem@entry=0x7f1a4c2964e0, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007f1a4da82efc in __new_sem_wait_slow (sem=0x7f1a4c2964e0, abstime=0x0) at sem_waitcommon.c:181
#3 0x00007f1a4da82fb9 in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4 0x00007f1a4de4c605 in PyThread_acquire_lock_timed (lock=0x7f1a4c2964e0, microseconds=-1, intr_flag=0) at Python/thread_pthread.h:352
#5 0x00007f1a4de4c4b4 in PyThread_acquire_lock (lock=<optimized out>, waitflag=waitflag@entry=1) at Python/thread_pthread.h:556
#6 0x00007f1a4dd59e08 in _PyThreadState_DeleteExcept (tstate=0x7f19f2301800) at Python/pystate.c:483
#7 0x00007f1a4dd46af4 in PyEval_ReInitThreads () at Python/ceval.c:326
#8 0x00007f1a4dd78b0b in PyOS_AfterFork () at ./Modules/signalmodule.c:1608

The parent process has a race between one thread calling os.fork (and holding the GIL) and another (in our case C++) thread trying to use PyGILState_Ensure. PyGILState_Ensure will grab the head_mutex before it tries to get the GIL. So if a fork happens in this critical section, the child process will get the locked head_mutex.

The path from PyGILState_Ensure -> head_mutex looks like this:

#0 new_threadstate (interp=0x7fb5fd483d80, init=init@entry=1) at Python/pystate.c:183
#1 0x00007fb5ff149027 in PyThreadState_New (interp=<optimized out>) at Python/pystate.c:250
#2 0x00007fb5ff006ac7 in PyGILState_Ensure () at Python/pystate.c:838

----

Possible fix?

A simple fix would be to, inside PyOS_AfterFork, reset/unlock pystate.c's head_mutex if it's already locked.

Unclear if this is related to: https://bugs.python.org/issue28812

gpshead · 2017-05-22T17:49:51Z

You cannot safely use Python's os.fork() in a process that has threads. Because of POSIX.

The CPython interpreter is by definition not async signal safe so no Python code may safely be executed after the fork() system call if the process contained _any_ threads at fork() system call time.

The only reliable recommendation: *Never use os.fork()* (if your process _might ever_ contain any threads now or in the future started by your or any of your dependencies).

The closest thing to a fix to this is a bunch of band-aids to setup atfork functions to clear the state of known locks in the forked child. -- But even that can't guarantee things because that can't know about all locks.

The C standard library can contain locks. malloc() for example. Any C/C++ extension modules can contain locks of their own. The problem isn't solvable within CPython. Adding a clear of pystate.c's head_mutex after forking makes sense. That may even get you further. But you will encounter other problems of the same nature in the future.

related: https://bugs.python.org/issue6721 and https://bugs.python.org/issue16500

ambv · 2017-05-22T23:58:57Z

New changeset f82c951 by Łukasz Langa (Jason Fried) in branch 'master':
bpo-30395 _PyGILState_Reinit deadlock fix (bpo-1734)
f82c951

ambv · 2017-05-23T05:23:07Z

New changeset d29fecc by Łukasz Langa in branch '3.6':
[3.6] bpo-30395 _PyGILState_Reinit deadlock fix (GH-1734) (bpo-1740)
d29fecc

lbrandy · 2017-05-23T15:37:10Z

Thanks to everyone jumping in.

I need no convincing that mixing forks and threads isn't just a problem but a problem factory. Given that the rest of this code seems to try to avoid similar deadlocks with similar mutexes, I figured we'd want to include this mutex to make a best-effort at being safe here. That made it worth reporting. To be sure, I still believe that the application code that led us here needs deeper fixes to address the fork/thread problems.

vstinner · 2017-05-24T04:23:06Z

+ head_mutex = NULL;

Shouldn't we free memory here?

gpshead · 2017-05-24T05:57:13Z

Would PyThread_free_lock (effectively sem_destroy()) work without (additional) problems?

vstinner · 2017-05-24T07:22:27Z

Gregory P. Smith added the comment:

Would PyThread_free_lock (effectively sem_destroy()) work without
(additional) problems?

If I recall correctly, no, you can get issues if the lock is still
acquired... But I should check.

gpshead · 2017-05-24T17:40:05Z

Another alternative *might* be to check if the lock is locked (non-blocking acquire?) and release it if so. Under the normal assumption that we are the only thread running immediately post-fork().

I'm not sure that can be guaranteed reliable given that other C code could've used pthread_atfork to register an "after" fork handler that spawns threads. But that should be rare, and nothing here can really fix the underlying "programmer has mixed fork and threads" issue. Only ameliorate it.

But i'm not sure this post fork memory leak is really a problem other than potentially showing up in memory sanatizer runs involving forked children. The scenario where it would be an actual leak is if a process does serial fork() calls with the parent(s) dying rather than forking new children from a common parent. That would grow the leak as each child would have an additional lock allocated (a few bytes). I don't _believe_ that kind of process pattern is common (but people do everything).

ambv · 2017-06-09T21:23:14Z

If you'd like to fix the miniscule leak this introduces, feel free but I don't think it's worth the additional complexity. Closing this for now as it solved an issue for us internally and we haven't observed any memory-related issues due to it.

lbrandy mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 17, 2017

gpshead added 3.7 (EOL) end of life invalid type-feature A feature request or enhancement labels May 22, 2017

ambv closed this as completed Jun 9, 2017

ambv removed the invalid label Jun 9, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

ChuBoning mentioned this issue Nov 20, 2023

deadlocked child process after forking on pystate.c's HEAD_LOCK #112275

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deadlocked child process after forking on pystate.c's head_mutex #74580

deadlocked child process after forking on pystate.c's head_mutex #74580

lbrandy mannequin commented May 17, 2017

lbrandy mannequin commented May 17, 2017

gpshead commented May 22, 2017

ambv commented May 22, 2017

ambv commented May 23, 2017

lbrandy mannequin commented May 23, 2017

vstinner commented May 24, 2017

gpshead commented May 24, 2017

vstinner commented May 24, 2017

gpshead commented May 24, 2017

ambv commented Jun 9, 2017

deadlocked child process after forking on pystate.c's head_mutex #74580

deadlocked child process after forking on pystate.c's head_mutex #74580

Comments

lbrandy mannequin commented May 17, 2017

lbrandy mannequin commented May 17, 2017

gpshead commented May 22, 2017

ambv commented May 22, 2017

ambv commented May 23, 2017

lbrandy mannequin commented May 23, 2017

vstinner commented May 24, 2017

gpshead commented May 24, 2017

vstinner commented May 24, 2017

gpshead commented May 24, 2017

ambv commented Jun 9, 2017