-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os.closerange optimization #57997
Comments
The current implementation of closerange essentially is a bruteforce invocation of close for every integer in the range. While this works, it's rather noisy for stracing, and for most invocations, is near a thousand close invocations more than needed. As such it should be aware of /proc/${PID}/fd, and use that to isolate down just what is actually open, and close that. |
Fixed tabs/spaces... |
Thanks for the patch. However, this cannot as far as I understand be used for the subprocess implementation due to the limitation of what can be called after a fork() and before an exec(). Take a look at bpo-8052 for some more discussion of this. |
fwiw, s/MSDOS_WINDOWS/MS_WINDOWS/. |
Reopening. Comments added to the code review. This issue is independent of the subprocess module issue in bpo-8052. The _posixsubprocess.c has its own fd closing loop. http://hg.python.org/cpython/file/050c07b31192/Modules/_posixsubprocess.c#l118 |
Two small technical comments:
|
In case someone is wondering if the approach really reduces the amount of syscalls: yes, it does. readdir() doesn't do a syscall for each entry. On Linux it uses the internal syscall getdents() to fill a buffer of directory entry structs. http://man7.org/linux/man-pages/man2/getdents.2.html On my system os.listdir() does four syscalls: $ strace python -c "import os; os.listdir('/home/heimes')" openat(AT_FDCWD, "/home/heimes", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 On Linux you can also use /proc/self/fd instead of /proc/YOURPID/fd. Other operating systems have different APIs to get a list of open FDs. AFAK /dev/fd is static on FreeBSD and Mac OS X: FreeBSD: Darwin / Mac OS X: |
_posixsubprocess already uses the Linux getdent64 syscall when available (though for different reasons: readdir is not safe in that context). http://hg.python.org/cpython/file/3f3cbfd52f94/Modules/_posixsubprocess.c#l227 Probing for procfs at configure time could be problematic. It is a virtual filesystem. It is entirely possible for a system to choose not to mount it. It might be reasonable to assume that it "might be present" only if the system had it mounted at compile time but a configure flag to override that might be desirable for some systems (not the Linux systems I usually deal with). If we're going through all of these hoops for closerange: I'd love to see an API exposed in the os module to return a list of open fd's. It is an abstraction nobody should have to write for themselves. |
FreeBSD and other OSes provide closefrom(). Why not exposing this function which is probably implemented as a single syscall? |
See also bpo-38061: "FreeBSD: Optimize subprocess.Popen(close_fds=True) using closefrom()". |
Linux has a close_range syscall since v5.9 (Oct 2020): https://man7.org/linux/man-pages/man2/close_range.2.html |
Code needed in a modern patch:
|
Is there any reason to keep this open after #84603? The only possible further optimization item seems to be procfs-based iteration over fds, but I'm not sure it makes sense in the long term given that |
I close the issue. Python now has _Py_closerange() function with 4 implementations:
_Py_closerange() is now used by:
|
It actually seems to be a Solaris thing. I can't find it on macOS (nor FreeBSD and OpenBSD). See also #82291. |
Oh ok :-) |
I see that # Debian images - Python 3.5, 3.9, 3.10, 3.11-rc1
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.11-rc-bullseye python /tmp/test.py 10000 100
3.084831694024615
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.10 python /tmp/test.py 10000 100
3.1409343010745943
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.9 python /tmp/test.py 10000 100
3.100054975016974
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.5 python /tmp/test.py 10000 100
3.143577027018182
# Alpine images - Python 3.5, 3.9, 3.10, 3.11-rc1
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.11-rc-alpine python /tmp/test.py 10000 100
3.0429400720167905
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.10-alpine python /tmp/test.py 10000 100
3.116578195942566
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.9-alpine python /tmp/test.py 10000 100
3.105302636977285
$ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.5-alpine python /tmp/test.py 10000 100
3.145560749922879
import os, subprocess, sys, timeit
from resource import *
soft, hard = getrlimit(RLIMIT_NOFILE)
setrlimit(RLIMIT_NOFILE, (hard, hard))
num_fds, num_iter = map(int, sys.argv[1:3])
for i in range(num_fds):
os.open('/dev/null', os.O_RDONLY)
print(timeit.timeit(lambda: subprocess.run('/bin/true'), number=num_iter)) Run from a VM guest (Fedora 36) using Docker
|
close_range() wrapper was added in glibc 2.34. Debian Bullseye uses 2.31, so CPython doesn't use close_range() there. The same applies to Note that it's important not to confuse usage of close_range() in subprocess and in If we want to use close_range() on older glibcs, we'd need to call the syscall directly. |
This issue is closed. If you consider that Python can be enhanced, please open a new issue. |
Fedora 36 has 2.35, but I assume while the Fedora 36 kernel is used by the container, for glibc dependency that depends on the Docker container right? Alpine using musl had no glibc package installed, and I assume the
I think I understood how the support and priority works for both, I shared my attempt to understand what went on under the hood here. I'm not sure why
I must have misunderstood the PR, as I was under the impression that it was decided / approved to use the syscall directly? |
That's right.
musl libc doesn't have close_range() wrapper yet, though the patch has been posted recently. As for running CPython linked with glibc on Alpine, no, gcompat won't help, since it too doesn't have close_range() .
It could be made to, but that requires additional work because subprocess and If you think adding
That comment talks about calling close_range() wrapper directly, as opposed to using (expanded) _Py_closerange() (because of problems with sharing code written for different requirements). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: