100% CPU usage without further logs or ports opened #122

Closed
agross opened this issue Apr 2, 2022 · 7 comments
Labels
bug Confirmed bug

Comments


agross commented Apr 2, 2022

Describe the bug

I have a server where postsrsd runs as part of docker-mailserver. On this instance, the main postsrsd process consumes 100% CPU and logs nothing, even when started manually on the command line (without -D). Neither of the ports (10001, 10002) is opened, either.

Relevant log output

Nothing.

System Configuration

  • OS: docker
  • Mailer daemon: Postfix
  • Version: 1.10

polarathene commented Aug 15, 2022

TL;DR: (I've collapsed the original content to focus on where the problem is)

  • This part of the main() function takes approximately 10 minutes to iterate through a billion close() calls, and is most likely to be encountered in Docker containers running as the root user.
  • The most convenient workaround for users in the meantime is probably the --ulimit option on a container.
  • The suggested fix for postsrsd is to iterate through /proc/self/fd instead (a rough sketch follows this list).
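
Something like the following is roughly what the /proc/self/fd approach could look like. This is only a minimal sketch for illustration (the helper name close_inherited_fds is made up), not the change that was eventually merged:

    /* Sketch: close inherited descriptors by reading /proc/self/fd instead of
     * looping up to sysconf(_SC_OPEN_MAX). Only descriptors that actually
     * exist are visited, so a huge RLIMIT_NOFILE no longer matters. */
    #include <dirent.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void close_inherited_fds(void)
    {
        DIR* d = opendir("/proc/self/fd");
        if (d == NULL)
            return; /* /proc unavailable (e.g. chroot): fall back to the old loop */
        struct dirent* entry;
        while ((entry = readdir(d)) != NULL)
        {
            int fd = atoi(entry->d_name); /* "." and ".." parse as 0 and are skipped */
            if (fd > 2 && fd != dirfd(d))
                close(fd);
        }
        closedir(d);
    }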

Earlier information / investigation

Additional information:

The docker-mailserver project runs a Debian Bullseye (11) base image docker container with the postsrsd package installed (v1.10), and I reproduced this in a VM guest running Fedora (as the referenced issue mentions).

Ignoring any init scripts from the installed package and running the binary directly with /usr/sbin/postsrsd -d example.test reproduces the failure.


If I do not provide the -d option, the command fails immediately. The same should happen when -s is absent (according to the -h output), but instead the command stalls. I don't see any reason for it to believe a secret had been provided otherwise.

I assume that means that the command does not reach this point?:

postsrsd/postsrsd.c

Lines 518 to 533 in 6e701fa

    /* Read secret. The default installation makes this root accessible only. */
    if (secret_file != NULL)
    {
        sf = fopen(secret_file, "rb");
        if (sf == NULL)
        {
            fprintf(stderr, "%s: Cannot open file with secret: %s\n", self,
                    secret_file);
            return EXIT_FAILURE;
        }
    }
    else
    {
        fprintf(stderr, "%s: You must set a secret (-s)\n", self);
        return EXIT_FAILURE;
    }

but the command does reach the earlier -d check:

postsrsd/postsrsd.c

Lines 483 to 487 in 6e701fa

    if (domain == NULL || *domain == 0)
    {
        fprintf(stderr, "%s: You must set a home domain (-d)\n", self);
        return EXIT_FAILURE;
    }

That would mean the command is likely stalling somewhere in between (confirmed: after the long delay, the -s check fails the command):

postsrsd/postsrsd.c

Lines 489 to 517 in 6e701fa

    if (separator != '=' && separator != '+' && separator != '-')
    {
        fprintf(stderr, "%s: SRS separator character must be one of '=+-'\n",
                self);
        return EXIT_FAILURE;
    }
    if (forward_service == NULL)
        forward_service = strdup("10001");
    if (reverse_service == NULL)
        reverse_service = strdup("10002");
    /* Close all file descriptors (std ones will be closed later). */
    maxfd = sysconf(_SC_OPEN_MAX);
    for (fd = 3; fd < maxfd; fd++)
        close(fd);
    /* The stuff we do first may not be possible from within chroot or without
     * privileges */
    /* Open pid file for writing (the actual process ID is filled in later) */
    if (pid_file)
    {
        pf = fopen(pid_file, "w");
        if (pf == NULL)
        {
            fprintf(stderr, "%s: Cannot write PID: %s\n\n", self, pid_file);
            return EXIT_FAILURE;
        }
    }

The most likely culprit then would perhaps be:

postsrsd/postsrsd.c

Lines 500 to 506 in 6e701fa

    /* Close all file descriptors (std ones will be closed later). */
    maxfd = sysconf(_SC_OPEN_MAX);
    for (fd = 3; fd < maxfd; fd++)
        close(fd);
    /* The stuff we do first may not be possible from within chroot or without
     * privileges */

# Docker container (Debian 11 Bullseye base image)
$ getconf -a | grep OPEN_MAX

OPEN_MAX                           1073741816
_POSIX_OPEN_MAX                    1073741816


# VM guest Fedora 36 (Docker host)
$ getconf -a | grep OPEN_MAX

OPEN_MAX                           1024
_POSIX_OPEN_MAX                    1024

# NOTE: `ulimit -n` and `sysctl fs.nr_open` also output the same value

So the for loop is doing close() 1 billion times?
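
As a back-of-envelope check: 1,073,741,816 minus 3 is roughly 1.07 billion close() calls; at somewhere around half a microsecond per failing syscall that works out to roughly 9 minutes, which lines up with the ~10 minutes observed. A throwaway micro-benchmark (hypothetical, not part of postsrsd) to confirm the per-call cost on a given machine could look like this:

    /* Hypothetical micro-benchmark: time a few million close() calls on
     * non-existent descriptors and extrapolate to OPEN_MAX = 1073741816. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const long sample = 10 * 1000 * 1000; /* 10 million failing close() calls */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long fd = 3; fd < sample; fd++)
            close((int) fd); /* returns EBADF, but still a full syscall round-trip */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double full = secs / sample * 1073741816.0; /* extrapolate to OPEN_MAX */
        printf("%ld calls took %.2fs; the full loop would take ~%.0f minutes\n",
               sample, secs, full / 60.0);
        return 0;
    }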

Confirmation of issue and resolution - various workarounds

UPDATE: Yes, this seems to be the problem. Others have experienced this issue with Docker before, noting that it sets this massively larger value inside the container, but only for the root user (a fairly common setup).

I have confirmed that su docker -c '<command here>' (we have an unprivileged user named docker) works: it reports OPEN_MAX=1024 and, of course, postsrsd starts effectively instantly.

UPDATE 2 (alternative workarounds):
You can still run a command as root with a reduced FD limit via ulimit -n 2048 && <command here> (limit set to 2048), but attempting to use ulimit like that again in the container (even as root) then fails with "bash: ulimit: open files: cannot modify limit: Operation not permitted". ulimit -n and getconf -a will both show the reduced limit, while sysctl fs.nr_open remains unchanged. (This doesn't seem like good advice, but it was someone's solution for Dockerfile builds.)

Docker containers can use the --ulimit option for per-container limits: docker run --rm -it --ulimit 'nofile=1024' alpine ash -c 'ulimit -n'. This works well as a workaround in the meantime.

There's also a Docker daemon config approach, when viable, that would enforce that limit across all containers. That's the official upstream Docker issue, AFAIK, regarding software hitting these performance problems in Docker containers.
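
For the daemon-wide approach, a hypothetical /etc/docker/daemon.json excerpt could look like the following (values are illustrative only; check the Docker docs for the default-ulimits schema on your version):

# /etc/docker/daemon.json (hypothetical excerpt)
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Soft": 1024, "Hard": 1048576 }
  }
}

# Restart the daemon afterwards, e.g.:
$ sudo systemctl restart docker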

Suggested Fix

Original suggestion: The issue I referenced about another user's experience with the problem also mentioned a fix that sounds reasonable.

I am not familiar with the reason for the logic in your code, but that user's similar code made this change (with a slightly more helpful comment about its purpose):

    //close all file descriptors before exit, otherwise they can segfault
    for (int i = 3; i < sysconf(_SC_OPEN_MAX); i++) {
      if(i != failure[w]){
        int err = close(i);
        if(i > 200 && err < 0)
          break;
      }
    }

They iterate over the first 200 FDs unconditionally (I'm familiar with FD 200 being common in flock() examples) and then continue until close() returns an error. I assume that means an open FD 202 with no FD 201 (where close() should error) would result in FD 202 not being closed.

You probably know better how problematic that is. If that's not a viable solution, perhaps add to the README (and maybe the -h output) that Docker containers running as the root user will have this problem, and link to this issue for more details. Additionally, consider an option that allows setting the maxfd limit (although as a user, I don't know what scenarios with postsrsd would lead to the failure it's trying to prevent).

Other alternatives I saw:

@roehling
Owner

Thank you for that excellent investigation. The loop you found was added by #65; to be honest, I always found it a bit iffy, but I failed to realize that the file descriptor limit can be this insanely high.

The file descriptors are assigned by the kernel in a somewhat ascending order, so it's unlikely to hit a FD greater than 200 unless 200 files have been opened by whatever process spawns PostSRSd.
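
A tiny illustration of that ascending assignment (a hypothetical standalone demo, not postsrsd code): POSIX requires open() to return the lowest-numbered descriptor not currently open in the process, so a freshly spawned program normally sees small, dense FD numbers.

    /* Demo: open() always picks the lowest free descriptor number. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int a = open("/dev/null", O_RDONLY); /* typically 3 */
        int b = open("/dev/null", O_RDONLY); /* typically 4 */
        close(a);
        int c = open("/dev/null", O_RDONLY); /* reuses the lowest free slot: 3 again */
        printf("a=%d b=%d c=%d\n", a, b, c);
        close(b);
        close(c);
        return 0;
    }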

And while I was writing this, I saw you added close_range(). I did not know about that function yet, but it seems to be the best alternative. The manual page even has close_range(3, ~0U, ...) as a use-case.


polarathene commented Aug 15, 2022

The file descriptors are assigned by the kernel in a somewhat ascending order, so it's unlikely to hit a FD greater than 200 unless 200 files have been opened by whatever process spawns PostSRSd.

I was of the understanding that you could specify an arbitrary FD number for example:

(
flock -s 200

# ... commands executed under lock ...

) 200</tmp/config-file

Is that not FD 200? I am not that knowledgeable in this area, so I could be misunderstanding.


And while I was writing this, I saw you added close_range().

Done with my editing 😅

Whatever makes most sense to you is fine by me 👍

I was just confused about why a test we run in our CI was working fine, while postsrsd was having issues when I ran the tests on our container locally. I assume GitHub configures the Docker daemon with more sane limits.

Documented here for the benefit of others who stumble upon it :)

I always found this a bit iffy, but I failed to realize that the file descriptor limit can be this insanely high.

From what I've read, Docker / containerd needs this to do its thing across many containers, but the containers themselves don't. I was surprised at the staggering difference myself 😄

@roehling
Owner

close_range() seems to be relatively new (I have it on my Debian unstable, but not my Ubuntu 20.04), but it is so nice that I decided to use it anyway and add some fallback code for older systems.
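
A rough sketch of what that could look like (illustration only, assuming a hypothetical HAVE_CLOSE_RANGE configure-time check; not the actual commit):

    /* Prefer close_range() where available (Linux >= 5.9, glibc >= 2.34) and
     * fall back to the bounded loop on older systems. */
    #define _GNU_SOURCE
    #include <unistd.h>

    static void close_fds_from(int lowfd, long maxfd)
    {
    #ifdef HAVE_CLOSE_RANGE
        /* One syscall closes every descriptor from lowfd upwards. */
        if (close_range((unsigned int) lowfd, ~0U, 0) == 0)
            return;
    #endif
        /* Fallback: the original loop, bounded by sysconf(_SC_OPEN_MAX). */
        for (long fd = lowfd; fd < maxfd; fd++)
            close((int) fd);
    }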

@polarathene

Awesome, thanks for the quick fix! ❤️


roehling commented Aug 15, 2022

I was of the understanding that you could specify an arbitrary FD number for example:

(
flock -s 200

# ... commands executed under lock ...

) 200</tmp/config-file

Is that not FD 200? I am not that knowledgeable in this area, so I could be misunderstanding.

Sure, you can do that in the shell, but in regular programs with open() calls, file descriptors typically won't be assigned arbitrarily.
Besides, no file descriptors have specific semantics apart from the first 3 (stdin, stdout, stderr); I suspect the idea with 200 was to go high so you don't conflict with existing open files, which ended up as cargo cult.

Also, the general rule is: you open it, you close it. So I'm just being nice by closing all the inherited FDs, and it got me a bug in the code as a reward...
