os.closerange optimization #57997

Closed
ferringb mannequin opened this issue Jan 15, 2012 · 21 comments
Labels
3.11 only security fixes extension-modules C modules in the Modules dir performance Performance or resource usage

Comments

@ferringb
Mannequin

ferringb mannequin commented Jan 15, 2012

BPO 13788
Nosy @birkenfeld, @gpshead, @ronaldoussoren, @vstinner, @tiran, @wmanley
Files
  • closerange-optimization.patch: for non-windows systems, if /proc/${PID}/fd is available make use of it
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2012-01-15.04:29:24.624>
    labels = ['extension-modules', '3.11', 'performance']
    title = 'os.closerange optimization'
    updated_at = <Date 2021-07-05.00:19:38.896>
    user = 'https://bugs.python.org/ferringb'

    bugs.python.org fields:

    activity = <Date 2021-07-05.00:19:38.896>
    actor = 'gregory.p.smith'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2012-01-15.04:29:24.624>
    creator = 'ferringb'
    dependencies = []
    files = ['24242']
    hgrepos = []
    issue_num = 13788
    keywords = ['patch']
    message_count = 12.0
    messages = ['151273', '151274', '151275', '151287', '151289', '192738', '192739', '192760', '192776', '351354', '396819', '396963']
    nosy_count = 8.0
    nosy_names = ['georg.brandl', 'gregory.p.smith', 'ronaldoussoren', 'vstinner', 'ferringb', 'christian.heimes', 'rosslagerwall', 'wmanley']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13788'
    versions = ['Python 3.11']

    @ferringb
    Mannequin Author

    ferringb mannequin commented Jan 15, 2012

    The current implementation of closerange is essentially a brute-force invocation of close() for every integer in the range.

    While this works, it's rather noisy under strace, and for most invocations it makes nearly a thousand more close() calls than needed.

    As such, it should be aware of /proc/${PID}/fd and use it to find just the descriptors that are actually open, and close only those.
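A minimal Python sketch of the proposed idea, assuming a Linux-style procfs mounted at /proc/self/fd. The function name closerange_procfs is hypothetical; the real os.closerange() is implemented in C, and this only illustrates the control flow:

```python
import os


def closerange_procfs(fd_low: int, fd_high: int) -> None:
    """Close the open fds in [fd_low, fd_high), consulting /proc/self/fd first.

    Hypothetical sketch: assumes Linux procfs. If the directory cannot be
    read, the whole range is handed to os.closerange() instead.
    """
    try:
        # os.listdir() briefly opens its own directory fd, so the listing can
        # contain one number that is already closed by the time we see it.
        names = os.listdir("/proc/self/fd")
    except OSError:
        os.closerange(fd_low, fd_high)  # no procfs: let os.closerange() do it
        return
    for fd in map(int, names):
        if fd_low <= fd < fd_high:
            try:
                os.close(fd)
            except OSError:
                pass  # e.g. EBADF for the listing's own, already-closed fd
```

With a default 1024 soft limit and a handful of open descriptors, this costs one directory read plus a few close() calls instead of roughly a thousand close() calls.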

    @ferringb ferringb mannequin added extension-modules C modules in the Modules dir performance Performance or resource usage labels Jan 15, 2012
    @ferringb
    Mannequin Author

    ferringb mannequin commented Jan 15, 2012

    Fixed tabs/spaces...

    @rosslagerwall
    Mannequin

    rosslagerwall mannequin commented Jan 15, 2012

    Thanks for the patch.

    However, as far as I understand, this cannot be used for the subprocess implementation because of the limits on what can be called after fork() and before exec().

    Take a look at bpo-8052 for some more discussion of this.

    @neologix neologix mannequin closed this as completed Jan 15, 2012
    @birkenfeld
    Member

    fwiw, s/MSDOS_WINDOWS/MS_WINDOWS/.

    @gpshead
    Member

    gpshead commented Jan 15, 2012

    Reopening. Comments added to the code review.

    This issue is independent of the subprocess module issue in bpo-8052. The _posixsubprocess.c has its own fd closing loop.

    http://hg.python.org/cpython/file/050c07b31192/Modules/_posixsubprocess.c#l118

    @gpshead gpshead self-assigned this Jan 15, 2012
    @gpshead gpshead reopened this Jan 15, 2012
    @ronaldoussoren
    Contributor

    Two small technical comments:

    1. I'd add a configure or compile-time check to determine if the procfs
      interface might be available. I don't like probing for features that
      you know are not available.

    2. Mac OS X has similar functionality using /dev/fd instead of
      /proc/${PID}/fd (and other BSD systems might have this as well)

    @tiran
    Member

    tiran commented Jul 9, 2013

    In case someone is wondering if the approach really reduces the amount of syscalls: yes, it does. readdir() doesn't do a syscall for each entry. On Linux it uses the internal syscall getdents() to fill a buffer of directory entry structs. http://man7.org/linux/man-pages/man2/getdents.2.html

    On my system os.listdir() does four syscalls:

    $ strace python -c "import os; os.listdir('/home/heimes')"

    openat(AT_FDCWD, "/home/heimes", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
    getdents(3, /* 381 entries */, 32768) = 12880
    getdents(3, /* 0 entries */, 32768) = 0
    close(3)

    On Linux you can also use /proc/self/fd instead of /proc/YOURPID/fd.

    Other operating systems have different APIs to get a list of open FDs. AFAIK /dev/fd is static on FreeBSD and Mac OS X:

    FreeBSD:
    http://www.manualpages.de/FreeBSD/FreeBSD-7.4-RELEASE/man3/kinfo_getfile.3.html

    Darwin / Mac OS X:
    proc_pidinfo()

    @gpshead
    Member

    gpshead commented Jul 9, 2013

    _posixsubprocess already uses the Linux getdents64 syscall when available (though for different reasons: readdir is not safe in that context). http://hg.python.org/cpython/file/3f3cbfd52f94/Modules/_posixsubprocess.c#l227

    Probing for procfs at configure time could be problematic. It is a virtual filesystem. It is entirely possible for a system to choose not to mount it. It might be reasonable to assume that it "might be present" only if the system had it mounted at compile time but a configure flag to override that might be desirable for some systems (not the Linux systems I usually deal with).

    If we're going through all of these hoops for closerange: I'd love to see an API exposed in the os module to return a list of open fd's. It is an abstraction nobody should have to write for themselves.
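No such function exists in the os module; open_fds() below is a hypothetical sketch of what that API could look like. It assumes Linux procfs when available, and otherwise probes every descriptor below os.sysconf("SC_OPEN_MAX") with fstat(), which is exactly the slow path this issue is about when the limit is huge. It is also racy if other threads open or close fds concurrently:

```python
import os


def _is_open(fd: int) -> bool:
    """Return True if fd currently refers to an open file description."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


def open_fds() -> list[int]:
    """Return a sorted list of this process's open fds (hypothetical helper)."""
    try:
        # Linux procfs: one directory read instead of one probe per fd.
        candidates = [int(name) for name in os.listdir("/proc/self/fd")]
    except OSError:
        # No procfs: probe every fd below the soft limit. Assumes sysconf
        # returns a finite, positive value.
        candidates = range(os.sysconf("SC_OPEN_MAX"))
    # Re-check each candidate; this drops the directory fd that os.listdir()
    # used and has already closed again.
    return sorted(fd for fd in candidates if _is_open(fd))
```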

    @vstinner
    Member

    vstinner commented Jul 9, 2013

    FreeBSD and other OSes provide closefrom(). Why not expose this function, which is probably implemented as a single syscall?

    @gpshead gpshead removed their assignment Nov 23, 2013
    @vstinner
    Member

    vstinner commented Sep 9, 2019

    See also bpo-38061: "FreeBSD: Optimize subprocess.Popen(close_fds=True) using closefrom()".

    @gpshead gpshead added the 3.8 (EOL) end of life label Sep 9, 2019
    @wmanley
    Mannequin

    wmanley mannequin commented Jul 1, 2021

    Linux has a close_range syscall since v5.9 (Oct 2020): https://man7.org/linux/man-pages/man2/close_range.2.html

    @gpshead
    Member

    gpshead commented Jul 5, 2021

    Code needed in a modern patch:

    1. Use the system call if compile time configure detected it may be available. (if we just use syscall() rather than a libc wrapper, a configure check may not be necessary, but various #ifdefs likely are)
    2. If (1) produces an error saying it isn't available at runtime (i.e. running on a kernel that doesn't support it, regardless of where we were built), fall back to another approach (3)
    3. if iterating over /proc/$pid/fd works at runtime, use that; else (4)
    4. the existing brute force code.
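The four-step plan above can be sketched in Python, with ctypes standing in for the C code CPython would actually need. Assumptions, all loudly so: close_range is syscall number 436 (true for x86_64 and arm64, since syscalls added after Linux 5.1 share one number on most architectures, but a few such as alpha differ); EPERM is treated like ENOSYS because some container seccomp profiles reject unknown syscalls that way; and both function names are hypothetical:

```python
import ctypes
import errno
import os
import sys

# Assumption: close_range's syscall number on this architecture. Check your
# kernel headers (asm/unistd.h) before relying on it.
_NR_CLOSE_RANGE = 436


def _try_close_range(first: int, last: int) -> bool:
    """Steps 1-2: call close_range(2) directly; False means 'not available'."""
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.syscall(ctypes.c_long(_NR_CLOSE_RANGE), ctypes.c_uint(first),
                      ctypes.c_uint(last), ctypes.c_uint(0))  # flags = 0
    if rc == 0:
        return True
    err = ctypes.get_errno()
    # ENOSYS: kernel older than 5.9; EPERM: blocked by some seccomp profiles.
    if err in (errno.ENOSYS, errno.EPERM):
        return False
    raise OSError(err, os.strerror(err))


def closerange_with_fallbacks(fd_low: int, fd_high: int) -> None:
    """Close fds in [fd_low, fd_high) following the four-step plan (sketch)."""
    if fd_low >= fd_high:
        return
    # close_range() takes an inclusive upper bound that fits in an unsigned int.
    if _try_close_range(fd_low, min(fd_high - 1, 0xFFFFFFFF)):
        return
    try:
        names = os.listdir("/proc/self/fd")  # step 3
    except OSError:
        os.closerange(fd_low, fd_high)  # step 4: the existing brute force
        return
    for fd in map(int, names):
        if fd_low <= fd < fd_high:
            try:
                os.close(fd)
            except OSError:
                pass  # e.g. the listing's own directory fd, already closed
```

Calling the raw syscall rather than a libc wrapper keeps step 1 independent of the glibc version, at the cost of hard-coding a per-architecture number.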

    @gpshead gpshead added 3.11 only security fixes and removed 3.8 (EOL) end of life labels Jul 5, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @izbyshev
    Contributor

    izbyshev commented May 4, 2022

    Is there any reason to keep this open after #84603?

    The only possible further optimization item seems to be procfs-based iteration over fds, but I'm not sure it makes sense in the long term given that close_range() syscall is available both in Linux and FreeBSD.

    @vstinner
    Member

    vstinner commented May 4, 2022

    I'm closing the issue.

    Python now has a _Py_closerange() function with 4 implementations:

    • close_range(): Linux
    • closefrom(): FreeBSD
    • fdwalk(): macOS
    • for loop calling close(): Other platforms

    _Py_closerange() is now used by:

    • os.closerange() obviously
    • the _posixsubprocess module: _close_fds_by_brute_force()

    @vstinner vstinner closed this as completed May 4, 2022
    @izbyshev
    Contributor

    izbyshev commented May 4, 2022

    fdwalk(): macOS

    It actually seems to be a Solaris thing. I can't find it on macOS (nor FreeBSD and OpenBSD). See also #82291.

    @vstinner
    Member

    vstinner commented May 5, 2022

    It actually seems to be a Solaris thing.

    Oh ok :-)

    @polarathene

    polarathene commented Aug 18, 2022

    The only possible further optimization item seems to be procfs-based iteration over fds, but I'm not sure it makes sense in the long term given that close_range() syscall is available both in Linux and FreeBSD.

    I see that gh-92301 landed in Python 3.11.0-rc1 (PR) and that the current official Python image is shipping builds of it, yet I cannot reproduce your results:

    # Debian images - Python 3.5, 3.9, 3.10, 3.11-rc1
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.11-rc-bullseye python /tmp/test.py 10000 100
    3.084831694024615
    
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.10 python /tmp/test.py 10000 100
    3.1409343010745943
    
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.9 python /tmp/test.py 10000 100
    3.100054975016974
    
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.5 python /tmp/test.py 10000 100
    3.143577027018182
    
    # Alpine images - Python 3.5, 3.9, 3.10, 3.11-rc1
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.11-rc-alpine python /tmp/test.py 10000 100
    3.0429400720167905
    
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.10-alpine python /tmp/test.py 10000 100
    3.116578195942566
    
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.9-alpine python /tmp/test.py 10000 100
    3.105302636977285
    
    $ docker run --rm -it --volume '/tmp/python_test.py:/tmp/test.py' python:3.5-alpine python /tmp/test.py 10000 100
    3.145560749922879

    /tmp/python_test.py:

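    import os
    import subprocess
    import sys
    import timeit
    from resource import RLIMIT_NOFILE, getrlimit, setrlimit
    
    # Raise the soft fd limit to the hard limit so num_fds descriptors can be opened.
    soft, hard = getrlimit(RLIMIT_NOFILE)
    setrlimit(RLIMIT_NOFILE, (hard, hard))
    
    # Usage: python test.py NUM_FDS NUM_ITER
    num_fds, num_iter = map(int, sys.argv[1:3])
    
    # Leave num_fds descriptors open so the child's close_fds step has work to do.
    for _ in range(num_fds):
        os.open('/dev/null', os.O_RDONLY)
    
    # Time num_iter runs of subprocess.run() (close_fds=True is the default).
    print(timeit.timeit(lambda: subprocess.run('/bin/true'), number=num_iter))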

    Run from a VM guest (Fedora 36) using Docker 20.10.17 and Kernel 5.18.9. Similar results on a Manjaro host, same docker version, kernel 5.17.15.


    os.closerange() takes 8 minutes to complete with only 4 FDs open due to an excessive range (Docker)

    I recently ran into an unexpected performance problem with os.closerange(). Fedora's docker.service configures LimitNOFILE=infinity, so inside the Docker container both ulimit -Sn and ulimit -Hn report 1073741816 (the Fedora host itself reports the standard systemd limits of 1024 soft and 524288 hard). A popular Python-based project calls os.closerange() over that range, and it took almost 8 minutes to complete while stressing the CPU.

    The fix was to iterate through /proc/self/fd instead, since there were only 4 FDs to actually close. The whole command now finishes in less than 300ms instead of 8 minutes.


    Reproduction

    I can reproduce the 8-minute delay just by running this simple script in that environment, where the default limits exceed 1 billion (Manjaro is more conservative: its docker.service sets LimitNOFILE= to approx 1 million (1048576)):

    /tmp/py_close_test.py:

    import os
    # Close every fd from 3 up to the soft RLIMIT_NOFILE limit reported by sysconf.
    os.closerange(3, os.sysconf("SC_OPEN_MAX"))

    # Run on Fedora 36 VM guest:
    # This one will use the defaults that varies by host (often configured in `docker.service` for systemd distros),
    # in this case the soft/hard FD limit is roughly a billion (1 073 741 816):
    $ docker run --rm -it --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.11-rc-alpine ash -c 'time python /tmp/test.py'
    
    real    7m 57.70s
    user    5m 16.45s
    sys     2m 31.62s
    
    # Debian Bullseye - Python 3.11-rc1:
    $ docker run --rm -it --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.11-rc-bullseye bash -c 'time python /tmp/test.py'
    
    real    7m47.726s
    user    5m8.328s
    sys     2m30.508s
    
    # Python 3.10
    $ docker run --rm -it --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.10-alpine ash -c 'time python /tmp/test.py'
    
    real    7m 59.36s
    user    5m 17.04s
    sys     2m 32.63s
    
    
    # Testing 100 million-ish (45 seconds):
    $ docker run --rm -it --ulimit 'nofile=100200300' --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.11-rc-alpine ash -c 'time python /tmp/test.py'
    
    real    0m 45.09s
    user    0m 29.95s
    sys     0m 14.03s
    
    # With Python 3.5:
    $ docker run --rm -it --ulimit 'nofile=100200300' --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.5-alpine ash -c 'time python /tmp/test.py'
    
    real    0m 44.17s
    user    0m 29.30s
    sys     0m 13.95s
    
    
    # Approx 1 million FD soft/hard limit (a common limit found in Docker containers that run as root)
    # Debian Bullseye - Python 3.11-rc1
    $ docker run --rm -it --ulimit 'nofile=1048576' --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.11-rc-bullseye bash -c 'time python /tmp/test.py'
    
    real    0m0.547s
    user    0m0.326s
    sys     0m0.152s
    
    # Alpine 3.16 - Python 3.11-rc1
    $ docker run --rm -it --ulimit 'nofile=1048576' --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.11-rc-alpine ash -c 'time python /tmp/test.py'
    
    real    0m 0.50s
    user    0m 0.32s
    sys     0m 0.15s
    
    # With Python 3.5:
    $ docker run --rm -it --ulimit 'nofile=1048576' --volume '/tmp/py_close_test.py:/tmp/test.py' python:3.5-alpine ash -c 'time python /tmp/test.py'
    
    real    0m 0.54s
    user    0m 0.32s
    sys     0m 0.17s

    Based on the above, I assume neither the close_range() nor the closefrom() syscall is being used? The performance doesn't change by any noteworthy amount between Python 3.5 and Python 3.11-rc1 in those tests.

    When reading through the source, it looked like there was support for iterating through /proc/self/fd (procfs?) to some extent, and if that were the case it'd work much faster... so I guess I'm hitting some inefficient path?

    I know that when I iterated through the range in a for loop calling close(), it took approx 70-80 minutes (about 10x as long), so I was curious and investigated what os.closerange() was doing. Would this suggest that close_range() is being used, and is just slow due to the large range?

    I have also run into this problem in Docker with other (non-Python) projects that daemonize themselves by closing all FDs across that max range. Red Hat resolved this problem in rpm a while back by iterating through /proc/self/fd (in 2018, before close_range() was available). It might be worth doing something similar in os.closerange(); otherwise, to support running certain projects in Docker, maintainers may need to DIY a workaround like fail2ban did. --ulimit is sometimes a viable workaround for the Docker container, as shown above, but apparently not always.

    @izbyshev
    Contributor

    I see that gh-92301 landed in Python 3.11.0-rc1 (PR) and that the current official Python image is shipping builds of it, yet I cannot reproduce your results:

    close_range() wrapper was added in glibc 2.34. Debian Bullseye uses 2.31, so CPython doesn't use close_range() there.

    The same applies to os.closerange().

    Note that it's important not to confuse usage of close_range() in subprocess and in os.closerange(). The former can use /proc/self/fd as a fallback, while the latter can't (and never could). So if your CPython was built with glibc < 2.34, on Linux os.closerange() uses a historical implementation that simply loops over all fds and calls close().

    If we want to use close_range() on older glibcs, we'd need to call the syscall directly.
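Whether the close_range() wrapper is visible at all depends on the C library inside the image. A small check, assuming a Linux/Unix build where ctypes.CDLL(None) exposes the symbols of the libc already loaded into the process (the function name libc_has_close_range is hypothetical). Note that this inspects libc at run time, whereas CPython decides at build time, so it only tells you what a build in that same image would have seen:

```python
import ctypes
import platform


def libc_has_close_range() -> bool:
    """Return True if the libc loaded into this process exports close_range()."""
    try:
        libc = ctypes.CDLL(None)  # handle to symbols already in the process
    except OSError:
        return False
    # ctypes raises AttributeError for a missing symbol, so hasattr() works.
    return hasattr(libc, "close_range")


print(platform.libc_ver())  # libc name and version, when detectable
print(libc_has_close_range())
```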

    @vstinner
    Member

    This issue is closed. If you consider that Python can be enhanced, please open a new issue.

    @polarathene

    close_range() wrapper was added in glibc 2.34. Debian Bullseye uses 2.31, so CPython doesn't use close_range() there.

    Fedora 36 has 2.35, but I assume that while the container uses the Fedora 36 kernel, the glibc version depends on the Docker image, right?

    Alpine uses musl and had no glibc package installed, and I assume the gcompat package it recommends won't help if the image's existing Python was built without it on the system?

    Note that it's important not to confuse usage of close_range() in subprocess and in os.closerange(). The former can use /proc/self/fd as a fallback, while the latter can't (and never could).

    I think I understood how the support and priority work for both; I shared my attempt to understand what went on under the hood here. I'm not sure why os.closerange() could not be made to support /proc/self/fd, but if that's the case, thanks for clarifying, and I'll remember to suggest the DIY iteration when I run into projects with this problem 👍

    If we want to use close_range() on older glibcs, we'd need to call the syscall directly.

    I must have misunderstood the PR, as I was under the impression that it was decided / approved to use the syscall directly?

    @izbyshev
    Contributor

    izbyshev commented Aug 18, 2022

    Fedora 36 has 2.35, but I assume while the Fedora 36 kernel is used by the container, for glibc dependency that depends on the Docker container right?

    That's right.

    Alpine using musl had no glibc package installed, and I assume the gcompat package it advises won't help if the image's existing Python build was built without it on the system?

    musl libc doesn't have a close_range() wrapper yet, though a patch has been posted recently. As for running CPython linked with glibc on Alpine, no, gcompat won't help, since it doesn't have close_range() either.

    I'm not sure why os.closerange() could not be made to support /proc/self/fd

    It could be, but that requires additional work because subprocess and os.closerange() have different requirements for the underlying fd-closing backends. The former needs them to be async-signal-safe and (on Linux) vfork-safe, and also to support efficient exclusion of some fds for the pass_fds feature. The latter has more relaxed requirements. This is further complicated by both subprocess and os.closerange() using several backends (depending on platform and/or for fallbacks).
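To illustrate the pass_fds requirement, here is a plain-Python sketch of "close a range but keep some fds": it splits [fd_low, fd_high) into the gaps between kept descriptors and closes each gap with os.closerange(), rather than testing every fd individually. The name close_range_except is hypothetical, and unlike the C code in _posixsubprocess, this is not written to be async-signal-safe or to run between fork() and exec():

```python
import os


def close_range_except(fd_low: int, fd_high: int, keep: set[int]) -> None:
    """Close every fd in [fd_low, fd_high) except those listed in keep."""
    start = fd_low
    # Walk the kept fds in order; each one ends the current gap.
    for fd in sorted(k for k in keep if fd_low <= k < fd_high):
        if start < fd:
            os.closerange(start, fd)  # close the gap [start, fd)
        start = fd + 1
    if start < fd_high:
        os.closerange(start, fd_high)  # close the tail after the last kept fd
```

The number of close calls issued is proportional to the number of kept fds plus one, so it composes well with a backend like close_range() that handles each gap in a single call.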

    If you think adding /proc/self/fd backend for os.closerange() is important, please open a new issue as Victor suggested, and be sure to explain why it's become important now given that os.closerange() had only a trivial for-loop implementation for ages (before close_range() support was added in this issue).

    I must have misunderstood the PR, as I was under the impression that it was decided / approved to use the syscall directly?

    That comment talks about calling the close_range() wrapper directly, as opposed to using an (expanded) _Py_closerange() (because of problems with sharing code written for different requirements).

    Projects
    None yet
    Development

    No branches or pull requests

    7 participants