Optimize and unite setting CLOEXEC on fds #444
Conversation
glib code here: https://gitlab.gnome.org/GNOME/glib/blob/487b1fd20c5e494366a82ddc0fa6b53b8bd779ad/glib/gspawn.c#L1207 Also worth looking at what systemd does.
At a quick glance anyways, your code looks fine to me.
luaext/lposix.c
Outdated
    for (fd = min_fd; fd < open_max; fd++)
        set_cloexec(fd);
}

static int Pexec(lua_State *L)		/** exec(path,[args]) */
You need to do the same change for doScriptExec() in lib/rpmscript.c as well. At least in our case, that is where all the time is spent.
Yes, ideally those two places need to be unified
luaext/lposix.c
Outdated
@@ -330,26 +330,64 @@ static int Pmkfifo(lua_State *L)	/** mkfifo(path) */
}


static void set_cloexec(int fd)
{
    int flag = fcntl(fd, F_GETFD);
Change "flag" to "flags". Even if FD_CLOEXEC is currently the only defined flag, we may as well make the code forward compatible,
luaext/lposix.c
Outdated
    if (flag == -1 || (flag & FD_CLOEXEC))
        return;

    fcntl(fd, F_SETFD, FD_CLOEXEC);
Change "FD_CLOEXEC" to "flags | FD_CLOEXEC".
This I just copied from the old code (and the only known flag for now is CLOEXEC).
Anyway, will do.
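For clarity, here is a sketch of what those reviewed lines look like with both suggestions applied (renaming flag to flags and OR-ing the new flag into the existing ones):

-    int flag = fcntl(fd, F_GETFD);
-    if (flag == -1 || (flag & FD_CLOEXEC))
-        return;
-    fcntl(fd, F_SETFD, FD_CLOEXEC);
+    int flags = fcntl(fd, F_GETFD);
+    if (flags == -1 || (flags & FD_CLOEXEC))
+        return;
+    fcntl(fd, F_SETFD, flags | FD_CLOEXEC);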
luaext/lposix.c
Outdated
    open_max = sysconf(_SC_OPEN_MAX);
    if (open_max == -1) {
        open_max = default_open_max;
        goto fallback;
I see no reason to goto fallback here.
luaext/lposix.c
Outdated
        goto fallback;
    }

    if (open_max <= default_open_max) {
I doubt you need to be this conservative. I don't have any measurements to support it, but I'd be very surprised if it is not a win to remove this if statement and always use the code that only traverses the open file descriptors.
I wanted to be conservative for the sole reason to minimize the change to current behavior. What you say makes sense though, let me update the patch.
The CI failure on rawhide is unrelated; it's a dnf error complaining about repo metadata:
I did some more benchmarks comparing the new /proc way with the old one. It seems that the old one is faster when
For reference, here's my quick-n-dirty benchmarking code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <dirent.h>

static void set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1 || (flags & FD_CLOEXEC))
        return;

    fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

static void set_cloexec_all(int open_max, char method)
{
    const int min_fd = 3; // don't touch stdin/out/err
    int fd;

    if (method == 'i')
        goto fallback;

    // iterate over fds obtained from /proc
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL) {
        goto fallback;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        fd = atoi(entry->d_name);
        if (fd >= min_fd)
            set_cloexec(fd);
    }
    closedir(dir);

    return;

fallback:
    // iterate over all possible fds
    for (fd = min_fd; fd < open_max; fd++)
        set_cloexec(fd);
}

int main(int argc, char **argv)
{
    if (argc < 4) {
        fprintf(stderr, "Usage: %s <open_max> <iterations> <method>\n", argv[0]);
        fprintf(stderr, "       <method> is either i (iterate) or p (use proc)\n");
        return 1;
    }

    int open_max = atoi(argv[1]);
    int iter = atoi(argv[2]);
    char m = argv[3][0];

    while (iter--)
        set_cloexec_all(open_max, m);

    return 0;
}
Force-pushed from e99d3b9 to fee7297.
OK, patch updated to take all the review comments into account (thanks @Saur2000 for your suggestions), so it's now also fixing the
@cgwalters I took a look at the glib implementation, they make use of
One other thing is, they call
This one is bad, as ulimit might have been changed during the runtime of a process, and so using rlim_max is more correct and safe. For the same reason I have added the second commit aa6cc04 to this PR.
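To illustrate the point about rlim_max: a small, hypothetical standalone example (not rpm code) showing that an fd opened while the soft limit was high stays open after the soft limit is lowered, so a scan bounded by sysconf(_SC_OPEN_MAX) would miss it while one bounded by rlim_max would not:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);

    int fd = open("/dev/null", O_RDONLY);   /* typically fd 3 */

    /* lower the soft limit below the fd we just opened */
    rl.rlim_cur = 3;
    setrlimit(RLIMIT_NOFILE, &rl);

    /* the fd is still open, but sysconf() now reports the lowered soft
     * limit, while rlim_max still covers it */
    printf("fd=%d _SC_OPEN_MAX=%ld rlim_max=%llu\n",
           fd, sysconf(_SC_OPEN_MAX), (unsigned long long)rl.rlim_max);
    return 0;
}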
Looks good to me. Thanks for the work.
Back from vacation... This (using /proc/self/fd when available) was what I had in mind, so we're on the right track here, thanks for the work so far. Some quick remarks == requests:
The patch set would be even nicer if moving the code into its own function were separated from any changes to the code.
Commit 7a7c31f ("Set FD_CLOEXEC on opened files before exec from lua script is called") copied the code that sets CLOEXEC flag on all possible file descriptors from lib/rpmscript.c to luaext/lposix.c, essentially creating two copies of the same code (modulo comments and the unused assignment). This commit moves the functionality into its own function, without any code modifications, using the version from luaext/lposix.c. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case maximum number of open files limit is set too high, both luaext/Pexec() and lib/doScriptExec() spend way too much time trying to set FD_CLOEXEC flag for all those file descriptors, resulting in severe increase of time it takes to execute say rpm or dnf. This becomes increasingly noticeable when running with e.g. under Docker, the reason being: > $ docker run fedora ulimit -n > 1048576 One obvious fix is to use procfs to get the actual list of opened fds and iterate over it. My quick-n-dirty benchmark shows the /proc approach is about 10x faster than iterating through a list of just 1024 fds, so it's an improvement even for default ulimit values. Note that the old method is still used in case /proc is not available. While at it, 1. fix the function by making sure we modify (rather than set) the existing flags. As the only known flag is FD_CLOEXEC, this change is currently purely aesthetical, but in case other flags will appear it will become a real bug fix. 2. get rid of magic number 3; use STDERR_FILENO Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case /proc is not available to get the actual list of opened fds, we fall back to iterating through the list of all possible fds. It is possible that during the course of the program execution the limit on the number of open file descriptors might be lowered, so using the current limit, as returned by sysconf(_SC_OPEN_MAX), might omit some fds. Therefore, it is better to use rlim_max from the structure filled in by getrlimit(RLIMIT_NOFILE) to make sure we're checking all fds. This slows down the function, but only in the case /proc is not available, which should be rare in practice. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
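Taken together, the three commits amount to roughly the following helper (a sketch reconstructed from the commit messages above, not a verbatim copy of the rpmio/rpmio.c code; names and details in the actual implementation may differ):

#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>

/* Set FD_CLOEXEC on every open fd above stderr.  Walking /proc/self/fd
 * makes the cost depend on the number of open fds rather than on the
 * (possibly huge) RLIMIT_NOFILE value. */
static void set_cloexec_all(void)
{
    const int min_fd = STDERR_FILENO + 1;  /* leave stdin/stdout/stderr alone */
    DIR *dir = opendir("/proc/self/fd");

    if (dir != NULL) {
        struct dirent *entry;

        while ((entry = readdir(dir)) != NULL) {
            int fd = atoi(entry->d_name);
            if (fd >= min_fd) {
                int flags = fcntl(fd, F_GETFD);
                if (flags >= 0 && !(flags & FD_CLOEXEC))
                    fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
            }
        }
        closedir(dir);
        return;
    }

    /* /proc not available: scan all possible fds, bounded by the hard
     * limit so that a lowered soft limit does not hide open fds */
    struct rlimit rl;
    rlim_t open_max = 1024;  /* arbitrary default for this sketch */

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_max != RLIM_INFINITY)
        open_max = rl.rlim_max;
    for (rlim_t fd = min_fd; fd < open_max; fd++) {
        int flags = fcntl((int)fd, F_GETFD);
        if (flags >= 0 && !(flags & FD_CLOEXEC))
            fcntl((int)fd, F_SETFD, flags | FD_CLOEXEC);
    }
}

With the /proc walk, the cost is proportional to the number of descriptors actually open; the brute-force loop, now bounded by the hard limit, only runs when /proc is unavailable.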
@ffesti @pmatilai Thank you for your input; please see the updated (and rebased) patch set.
I actually thought about it so I did
I'm assuming you are OK with
done
I cannot find a way to leave review comments for the commit messages using GitHub, but you should change "say rpm or dnf" to "rpm or dnf" (or "e.g. rpm or dnf") in the "Optimize rpmSetCloseOnExec" commit. You should also change "when running with e.g. under Docker" to "when e.g. running under Docker" in the same commit.
@@ -959,7 +959,7 @@ AC_ARG_WITH([lua], [AS_HELP_STRING([--with-lua], [build with lua support])],

 AS_IF([test "$with_lua" != no],[
   PKG_CHECK_MODULES([LUA],
-    [lua >= 5.1],
+    [lua53 >= 5.1],
Was this intentional? It seems unrelated.
rpmio/rpmio.c
Outdated
    closedir(dir);

    return;
Remove.
OK, pushed with some minor tweaks, like reverting the configure.ac change and adding a missing include line.
If the maximum number of open file descriptors is much greater than the usual 1024 (for example inside a Docker container), the performance drops significantly. This was reported upstream in: https://bugzilla.redhat.com/show_bug.cgi?id=1537564 which resulted in: rpm-software-management/rpm#444 The pull request above has now been integrated and this commit contains a backport of its three patches, which together change the behavior of rpm so that its performance is now independent of the maximum number of open file descriptors. Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
If the maximum number of open file descriptors is much greater than the usual 1024 (for example inside a Docker container), the performance drops significantly. This was reported upstream in: https://bugzilla.redhat.com/show_bug.cgi?id=1537564 which resulted in: rpm-software-management/rpm#444 The pull request above has now been integrated and this commit contains a backport of its three patches, which together change the behavior of rpm so that its performance is now independent of the maximum number of open file descriptors. Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com> Signed-off-by: Armin Kuster <akuster808@gmail.com>
Source: poky MR: 00000 Type: Integration Disposition: Merged from poky ChangeID: 4b6ff20 Description: If the maximum number of open file descriptors is much greater than the usual 1024 (for example inside a Docker container), the performance drops significantly. This was reported upstream in: https://bugzilla.redhat.com/show_bug.cgi?id=1537564 which resulted in: rpm-software-management/rpm#444 The pull request above has now been integrated and this commit contains a backport of its three patches, which together change the behavior of rpm so that its performance is now independent of the maximum number of open file descriptors. (From OE-Core rev: 6ecb10e3952af4a77bc79160ecd81117e97d022a) Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com> Signed-off-by: Armin Kuster <akuster808@gmail.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> Signed-off-by: Jeremy Puhlman <jpuhlman@mvista.com>
In case maximum number of open files limit is set too high, both luaext/Pexec() and lib/doScriptExec() spend way too much time trying to set FD_CLOEXEC flag for all those file descriptors, resulting in severe increase of time it takes to execute say rpm or dnf. This becomes increasingly noticeable when running with e.g. under Docker, the reason being: > $ docker run fedora ulimit -n > 1048576 One obvious fix is to use procfs to get the actual list of opened fds and iterate over it. My quick-n-dirty benchmark shows the /proc approach is about 10x faster than iterating through a list of just 1024 fds, so it's an improvement even for default ulimit values. Note that the old method is still used in case /proc is not available. While at it, 1. fix the function by making sure we modify (rather than set) the existing flags. As the only known flag is FD_CLOEXEC, this change is currently purely aesthetical, but in case other flags will appear it will become a real bug fix. 2. get rid of magic number 3; use STDERR_FILENO Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Fixes #444 (cherry picked from commit 5e6f05c)
Force ulimit nofile 1024 when building the base rpmbuild el6/el7 images and when running the container to build the RPM. Yum might take ages to complete because it sets CLOEXEC on all available file descriptors and recent dockers set a very high default limit. References: https://stackoverflow.com/questions/74345206/centos-7-docker-yum-installation-gets-stuck https://bugzilla.redhat.com/show_bug.cgi?id=1537564 rpm-software-management/rpm#444
rpm on centos 7 calls fcntl on every FD up to the max in order to set CLOEXEC, and the maximum number of open FDs in docker on our runners was 2**30 - 8 == 1073741816. (See rpm-software-management/rpm#444). ulimit can't be configured in a Dockerfile, and there doesn't seem to be a way to pass argument to `docker build` if you are using the default github action, so just call ulimit before the big yum install. There might be a way to configure it on our infra runner side too, I don't know.
If the maximum number of open file descriptors is much greater than the usual 1024 (for example inside a Docker container), the performance drops significantly. This was reported upstream in: https://bugzilla.redhat.com/show_bug.cgi?id=1537564 which resulted in: rpm-software-management/rpm#444 The pull request above has now been integrated and this commit contains a backport of its three patches, which together change the behavior of rpm so that its performance is now independent of the maximum number of open file descriptors. (From OE-Core rev: 7feed9ccfc4e656c6264f07e13d7e9ef69bdfb06) Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
In case the maximum number of open files limit is set too high, both luaext/Pexec() and lib/doScriptExec() spend way too much time trying to set FD_CLOEXEC for all those file descriptors that might be open, resulting in a severe increase of the time it takes to execute rpm or dnf. For example, this happens while running under Docker because:

$ docker run fedora ulimit -n
1048576

One obvious fix is to use procfs to get the actual list of opened file descriptors and iterate over those.
My quick-n-dirty benchmark shows the /proc approach is about 10x faster than iterating through a list of 1024 fds.
Note that the old method is still used in case /proc is not available.
While at it, unite the two implementations, and future(1)-proof the code by making sure we modify the existing flags.
(1) - currently the only known flag is FD_CLOEXEC.
This should fix: https://bugzilla.redhat.com/show_bug.cgi?id=1537564