[linux] closefrom_shim: Add optimized fallback for platforms without closefrom or close_range #316

Merged
merged 1 commit into from
Jun 2, 2024

Conversation

MarcT512
Contributor

Add an optimized fallback in closefrom_shim() for Linux, if the platform doesn't support closefrom or close_range.
Iterate /proc/self/fd and close anything which is not that directory itself.

This can save significant time if "ulimit -n" (max open files) is very high. For example, in some Docker containers the limit can be 2^30 (~ 1 billion), and lsof may take 10-30 minutes to start up.
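For readers who want to see the idea in code, here is a minimal sketch of closing descriptors by listing /proc/self/fd. It is not the code merged in this PR: the function name closefrom_procfs and its return convention are hypothetical, and it assumes a Linux system with /proc mounted.

```c
#define _POSIX_C_SOURCE 200809L /* for dirfd() under strict -std modes */
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not lsof's actual closefrom_shim): close every open
 * descriptor >= lowfd by walking /proc/self/fd instead of trying every
 * number up to the "ulimit -n" limit.
 * Returns 0 on success, -1 if /proc/self/fd cannot be opened.
 */
static int closefrom_procfs(int lowfd)
{
    DIR *dir;
    struct dirent *ent;
    int dir_fd;

    assert(lowfd >= 0);

    dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1; /* caller would have to fall back to another method */

    /* The directory stream itself holds a descriptor; do not close it
     * while we are still reading from it. */
    dir_fd = dirfd(dir);

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);

        /* Skip "." and ".." and anything that is not a plain number. */
        if (end == ent->d_name || *end != '\0')
            continue;

        if (fd >= lowfd && fd != dir_fd)
            close((int)fd);
    }

    closedir(dir); /* this also releases dir_fd */
    return 0;
}
```

The work done is proportional to the number of descriptors that are actually open, not to the descriptor limit, which is why a limit of 2^30 stops mattering.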

@jiegec
Contributor

jiegec commented May 16, 2024

Interesting optimization. I hope to see some performance numbers, if possible. But I think it only applies on Linux distributions with older Linux kernel versions, like 2.x?

@MarcT512
Contributor Author

MarcT512 commented May 24, 2024

JFYI: The issue of high "ulimit -n" inside containers is described well here: https://github.com/moby/moby/issues/38814

> I think it only applies on Linux distributions with older Linux kernel versions, like 2.x?

Our issue was discovered inside a container based on SUSE SLES 15 SP5, running in Docker 24.x on RHEL 9.2/9.3. These are both current distributions, although I'll concede that Docker on RHEL 9 is not well supported by either vendor.

The RHEL9 system has kernel 5.14.0-284.18.1.el9_2.x86_64 and glibc-2.34-60.el9.x86_64.
The SLES15 container has glibc-2.31-150300.63.1.x86_64. The kernel is provided by the container host (i.e. RHEL).
It seems close_range() was introduced in glibc 2.34, so the SLES15 container (glibc 2.31) does not have support.
Also, RHEL9 has lsof-4.94.0-3.el9.x86_64, which does not have any close() optimizations, so it is also slow if "ulimit -n" is large.
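As a side note on the glibc point: the close_range system call itself has existed in the kernel since Linux 5.9, so on a 5.14 host it can in principle be reached through syscall() even when glibc has no wrapper. The sketch below illustrates that alternative only. It is not what this PR does, the name try_close_range is hypothetical, and it only works when the installed kernel headers define SYS_close_range, which older userlands may not.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration */
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: invoke the raw close_range system call when the
 * headers know its number. Returns 0 on success, -1 with errno set
 * (ENOSYS when the headers or the kernel lack the call).
 */
static int try_close_range(unsigned int first, unsigned int last)
{
#ifdef SYS_close_range
    return (int)syscall(SYS_close_range, first, last, 0U);
#else
    (void)first;
    (void)last;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would try this first, for example try_close_range(3, ~0U), and use another method such as the /proc/self/fd walk if it returns -1.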

> Interesting optimization, I hope to see some performance numbers, if possible.

Sure - see below.
The highlight is that with "ulimit -n" = 1 billion, "lsof" takes more than 4 minutes without the patch and about 0.4 seconds with it (about 4.5 seconds under strace).

Without patch, inside SLES15 SP5 container on Docker 24 on RHEL9.3.
With default "ulimit -n" = 1 billion (2^30):

# ulimit -n
1073741816
# touch /tmp/testfile
# time lsof -X /tmp/testfile
real    4m14.746s
user    1m31.868s
sys     2m39.300s

# time strace -c lsof -X /tmp/testfile
[gave up after waiting 30 mins...]

On SLES15 SP5, with "ulimit -n" = 1 million (2^20):

sles15-1:~ # ulimit -n
1048576
sles15-1:~ # touch /tmp/testfile
sles15-1:~ # time lsof -X /tmp/testfile

real    0m0.415s
user    0m0.134s
sys     0m0.269s


# time strace -c lsof -X /tmp/testfile
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.59    6.180516           5   1049440   1048573 close
  1.59    0.100966      100966         1           wait4
  0.20    0.012606           7      1698           read
  0.14    0.009059           5      1730         1 stat
  0.10    0.006468          11       540           write
  0.10    0.006147           7       844        98 readlink
  0.10    0.006039           6       870         6 openat
  0.04    0.002634           5       455           lstat
  0.04    0.002396           8       272           alarm
[snip]
------ ----------- ----------- --------- --------- ----------------
100.00    6.332883           5   1057032   1048681 total

real    0m32.068s
user    0m3.289s
sys     0m25.699s
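The roughly one million failed close calls above are what you would expect from a loop that tries every descriptor number up to the limit. The following is a sketch of that brute-force pattern, written only to illustrate why the cost scales with "ulimit -n". It is an assumption about the shape of the slow path, not a copy of lsof's code, and the names close_stats and close_bruteforce are hypothetical.

```c
#include <unistd.h>

/* Counters mirroring the "calls" and "errors" columns of strace -c. */
struct close_stats {
    long calls;
    long errors;
};

/*
 * Hypothetical illustration: call close() on every number in
 * [lowfd, maxfd), whether or not that descriptor is open.
 */
static struct close_stats close_bruteforce(int lowfd, long maxfd)
{
    struct close_stats st = {0, 0};

    for (long fd = lowfd; fd < maxfd; fd++) {
        st.calls++;
        if (close((int)fd) == -1)
            st.errors++; /* typically EBADF: nothing was open there */
    }
    return st;
}
```

Called as close_bruteforce(3, sysconf(_SC_OPEN_MAX)), this issues one system call per possible descriptor, so a limit of 2^30 means about a billion calls, almost all of which fail.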

On SLES15 SP5, with "ulimit -n" = 1024 (2^10):

# ulimit -n
1024
# touch /tmp/testfile
# time lsof -X /tmp/testfile

real    0m0.071s
user    0m0.016s
sys     0m0.036s

# time strace -c lsof -X /tmp/testfile
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 21.10    0.013442           8      1678           read
 18.31    0.011666           6      1700         1 stat
 13.39    0.008529           4      1858      1021 close
 11.52    0.007338           8       840         6 openat
 11.03    0.007025           8       814        88 readlink
  7.05    0.004493           8       540           write
  5.19    0.003309           7       455           lstat
  4.56    0.002906           4       585           fstat
  2.77    0.001766           7       242           getdents64
  2.46    0.001567           5       272           alarm
[snip]
------ ----------- ----------- --------- --------- ----------------
100.00    0.063701           6      9309      1119 total

real    0m0.341s
user    0m0.031s
sys     0m0.260s

WITH patch, inside SLES15 SP5 container on Docker 24 on RHEL9.3.
With default "ulimit -n" = 1 billion (2^30):

# ulimit -n
1073741816

# time ./lsof -X /tmp/testfile

real    0m0.379s
user    0m0.063s
sys     0m0.297s

# time strace -c ./lsof -X /tmp/testfile
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 19.96    0.208152           6     32779         8 stat
 19.34    0.201649           4     40598           read
 16.46    0.171675           8     20839         6 openat
 15.09    0.157330           7     20287       304 readlink
  9.33    0.097316           4     20836           close
  9.20    0.095907           5     18731           lstat
  7.52    0.078376           4     19259           newfstatat
  2.22    0.023129          22      1042           getdents64
  0.33    0.003477           9       380           write
  0.33    0.003413        3413         1           wait4
[snip]
------ ----------- ----------- --------- --------- ----------------
100.00    1.042756           5    175199       321 total

real    0m4.467s
user    0m0.320s
sys     0m3.658s

As an aside, is it best to open a new "issue" for this discussion?

@jiegec jiegec merged commit fe15efa into lsof-org:master Jun 2, 2024
15 of 16 checks passed

2 participants