Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocess can block all Python threads when using vfork() until the child process exec() succeeds or fails. #104372

Closed
gpshead opened this issue May 10, 2023 · 3 comments · Fixed by #104782
Assignees
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@gpshead
Copy link
Member

gpshead commented May 10, 2023

Background

When using the subprocess module on Linux, vfork() rather than fork() has been preferred when possible for a few Python releases as is generally offers MUCH higher performance, the larger the parent process, the more cpu time it saves on page table copying.

Problem

vfork() - by design - does not return control to parent process until after the child process has successfully performed an exec() system call or has died.

Python subprocess uses _posixsubprocess.fork_exec() extension module implementation to do handle async-signal-safe fork/vfork+exec code path (as that cannot safely be implemented in Python). This function does not release the GIL. Thus the GIL is held across the vfork() call. Meaning the parent process does not resume execution until the exec has succeeded, or failed and returned an error which our child writes to it's errpipe_write fd before exiting.

This can result in all Python threads hanging (ie: the GIL is held by a blocked thread) when the child exec system call takes a long time. Such as the binary existing on a potentially slow or high latency network filesystem. (We witnessed this on a FUSE filesystem backed by remote (high latency) cloud storage with a large executable... but any slow or high latency filesystem containing an executable or anything interfering in the exec system call can cause the same problem).

Workarounds

Building without vfork support in _posixsubprocess or otherwise passing an arg to subprocess API that happens to disable vfork because it is incompatible with the vfork-concept are workarounds. (I won't recommend any of those because none of them are a good idea to pass when not needed for their primary purpose).

Another workaround viable in our specific case is to pre-read the executable before asking subprocess to launch it as that moves the high IO latency from the exec system call into a read without the GIL held beforehand as it pulls the executable into a fast-enough local storage cache. (That being useful at all depends on your particular filesystem latency implementation situation)

Potential Solutions

If we could simply release the GIL around the vfork() call that would seem to be ideal... It's not clear to me that we can actually do that, I'm working out what a PR would look like or what blocks us from doing so safely.

Offering yet another keyword argument to subprocess APIs to disable the use of APIs that could block the entire process upon exec() such as vfork is viable - but doesn't feel great given our existing long list of subprocess args.

Linked PRs

@gpshead gpshead added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir extension-modules C modules in the Modules dir labels May 10, 2023
@gpshead gpshead self-assigned this May 10, 2023
@gpshead
Copy link
Member Author

gpshead commented May 15, 2023

A practical workaround usable in some cases:

+import shlex
 ...
-    subprocess.run(argv, ...)
+    subprocess.run(shlex.join(["exec"] + argv, shell=True, ...)

Do not use this workaround unless you fully control argv and know that it cannot contain anything that might wind up as shell injection.

That defers the blocking exec into an intermediary post-vfork-exec child process using your already installed shell that is already capable of doing that.

gpshead added a commit to gpshead/cpython that referenced this issue May 16, 2023
We move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.
gpshead added a commit that referenced this issue May 17, 2023
…al safety and no GIL requirement (#104518)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways.
JelleZijlstra pushed a commit to JelleZijlstra/cpython that referenced this issue May 18, 2023
…c signal safety and no GIL requirement (python#104518)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways.
gpshead added a commit that referenced this issue May 20, 2023
…104697)

Use non-Raw malloc for c_fds_to_keep as the code could ask for 0 length.
gpshead added a commit to gpshead/cpython that referenced this issue May 23, 2023
On Linux where the `subprocess` module can use the `vfork` syscall for
faster spawning, prevent the parent process from blocking other threads
by dropping the GIL while it waits for the vfork'ed child process `exec`
outcome.  This prevents spawning a binary from a slow filesystem from
blocking the rest of the application.

Fixes python#104372.
gpshead added a commit to gpshead/cpython that referenced this issue May 23, 2023
…or async signal safety and no GIL requirement (pythonGH-104518)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways..
(cherry picked from commit c649df6)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit to gpshead/cpython that referenced this issue May 23, 2023
…or async signal safety and no GIL requirement (pythonGH-104518)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways..
(cherry picked from commit c649df6)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit to gpshead/cpython that referenced this issue May 23, 2023
…or async signal safety and no GIL requirement (pythonGH-104518)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways..
(cherry picked from commit c649df6)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit to gpshead/cpython that referenced this issue May 23, 2023
…c signal safety and no GIL requirement (python#104518)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways.

(cherry picked from commit c649df6)
gpshead added a commit to gpshead/cpython that referenced this issue May 23, 2023
…or async signal safety and no GIL requirement (pythonGH-104518)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways..
(cherry picked from commit c649df6)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
@gpshead gpshead added 3.11 only security fixes 3.12 bugs and security fixes 3.10 only security fixes labels May 23, 2023
@gpshead
Copy link
Member Author

gpshead commented May 23, 2023

A heavy handed workaround, disable the use of vfork entirely:

subprocess._USE_VFORK = False

if your process is large this will slow down process launching time (regular fork is used - which must copy the entire process memory map) but for most things that is perfectly fine.

gpshead added a commit that referenced this issue May 24, 2023
… signal safety gh-104518 (#104785)

Move all of the Python C API calls into the parent process up front
instead of doing PyLong_AsLong and PyErr_Occurred and PyTuple_GET from
the post-fork/vfork child process.

Much of this was long overdue. We shouldn't have been using PyTuple and
PyLong APIs within all of these low level functions anyways.

This is a backport of c649df6 for #104518 and the tiny adjustment in d1732fe #104697.

Backporting this allows backporting of the real bug fix that requires it.

Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
gpshead added a commit to gpshead/cpython that referenced this issue May 24, 2023
The ideal pattern for this.  (already in the 3.11 backport)
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 24, 2023
The ideal pattern for this.  (already in the 3.11 backport)
(cherry picked from commit 7f963bf)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit that referenced this issue May 24, 2023
The ideal pattern for this.  (already in the 3.11 backport)
gpshead added a commit that referenced this issue May 24, 2023
gh-104372: use == -1 before PyErr_Occurred (GH-104831)

The ideal pattern for this.  (already in the 3.11 backport)
(cherry picked from commit 7f963bf)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit that referenced this issue May 25, 2023
On Linux where the `subprocess` module can use the `vfork` syscall for
faster spawning, prevent the parent process from blocking other threads
by dropping the GIL while it waits for the vfork'ed child process `exec`
outcome.  This prevents spawning a binary from a slow filesystem from
blocking the rest of the application.

Fixes #104372.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 25, 2023
On Linux where the `subprocess` module can use the `vfork` syscall for
faster spawning, prevent the parent process from blocking other threads
by dropping the GIL while it waits for the vfork'ed child process `exec`
outcome.  This prevents spawning a binary from a slow filesystem from
blocking the rest of the application.

Fixes pythonGH-104372.
(cherry picked from commit d086792)

Co-authored-by: Gregory P. Smith <gps@python.org>
@gpshead
Copy link
Member Author

gpshead commented May 25, 2023

I'll take care of a 3.11.x bug fix backport, but I want to let this bake in some 3.12beta releases for a while first. It isn't urgent.

gpshead added a commit that referenced this issue May 25, 2023
…104942)

gh-104372: Drop the GIL around the vfork() call. (GH-104782)

On Linux where the `subprocess` module can use the `vfork` syscall for
faster spawning, prevent the parent process from blocking other threads
by dropping the GIL while it waits for the vfork'ed child process `exec`
outcome.  This prevents spawning a binary from a slow filesystem from
blocking the rest of the application.

Fixes GH-104372.
(cherry picked from commit d086792)

Co-authored-by: Gregory P. Smith <gps@python.org>
@gpshead gpshead reopened this May 26, 2023
gpshead added a commit to gpshead/cpython that referenced this issue May 26, 2023
On Linux where the `subprocess` module can use the `vfork` syscall for
faster spawning, prevent the parent process from blocking other threads
by dropping the GIL while it waits for the vfork'ed child process `exec`
outcome.  This prevents spawning a binary from a slow filesystem from
blocking the rest of the application.

Fixes python#104372.

(cherry picked from commit d086792)
@gpshead gpshead closed this as completed May 26, 2023
gpshead added a commit that referenced this issue Sep 1, 2023
…04958)

gh-104372: Drop the GIL around the vfork() call. (#104782)

On Linux where the `subprocess` module can use the `vfork` syscall for
faster spawning, prevent the parent process from blocking other threads
by dropping the GIL while it waits for the vfork'ed child process `exec`
outcome.  This prevents spawning a binary from a slow filesystem from
blocking the rest of the application.

Fixes #104372.

(cherry picked from commit d086792)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant