[BUG] zmq.error.ZMQError: Too many open files #61837

Closed
ichilton opened this issue Mar 23, 2022 · 11 comments
@ichilton

Description

Running salt minion on OpenBSD. It works fine for a bit, then stops and requires a restart.

Shows this in the logs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.8/site-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
    future.result()
  File "/usr/local/lib/python3.8/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/local/lib/python3.8/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/local/lib/python3.8/site-packages/salt/transport/zeromq.py", line 1334, in _internal_send_recv
    self._init_socket()  # re-init the zmq socket (no other way in zmq)
  File "/usr/local/lib/python3.8/site-packages/salt/transport/zeromq.py", line 1291, in _init_socket
    self.socket = self.context.socket(zmq.REQ)
  File "/usr/local/lib/python3.8/site-packages/zmq/sugar/context.py", line 146, in socket
    s = self._socket_class(self, socket_type, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/zmq/sugar/socket.py", line 59, in __init__
    super(Socket, self).__init__(*a, **kw)
  File "zmq/backend/cython/socket.pyx", line 328, in zmq.backend.cython.socket.Socket.__init__
zmq.error.ZMQError: Too many open files

Setup
(Please provide relevant configs and/or SLS files (be sure to remove sensitive info. There is no general set-up of Salt.)

Please be as specific as possible and give set-up details.

  • [X] on-prem machine

Steps to Reproduce the behavior

Just generally using salt minion and it'll stop working.

Expected behavior

It to be stable :)

Versions Report

$ salt --versions-report
Salt Version:
          Salt: 3003.3

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: 0.38.0
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 3.10.1
  pycryptodome: Not Installed
        pygit2: Not Installed
        Python: 3.8.12 (default, Sep 26 2021, 13:12:50)
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 18.0.1
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: openbsd 7.0
        locale: utf-8
       machine: amd64
       release: 7.0
        system: OpenBSD
       version: OpenBSD 7.0

Additional context
Add any other context about the problem here.

@ichilton ichilton added Bug broken, incorrect, or confusing behavior needs-triage labels Mar 23, 2022
@frebib
Contributor

frebib commented Mar 23, 2022

This looks the same as #61521
Check to see if the number of open fds increases over time. I've not been able to pin-point the cause of this. Our Salt codebase is large enough that a single highstate with salt-call triggers this before it completes. Are you doing anything in particular with the minion to cause the crash/leak?
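
One way to run the check suggested above is to sample the descriptor count of the minion process periodically. The following is a minimal sketch, not part of Salt: the helper name `count_open_fds` is hypothetical, the `/proc/<pid>/fd` path applies to Linux only, and the fallback assumes that OpenBSD's `fstat -p <pid>` prints one header line followed by one line per open file (it also lists entries such as the working directory, so that number is approximate).

```python
import os
import subprocess


def count_open_fds(pid):
    """Return an (approximate) count of files open in process `pid`."""
    fd_dir = f"/proc/{pid}/fd"
    if os.path.isdir(fd_dir):
        # Linux: one entry per open descriptor.
        return len(os.listdir(fd_dir))
    # Assumption: no /proc available (e.g. OpenBSD), so fall back to
    # `fstat -p <pid>` and count the lines after the header.
    result = subprocess.run(
        ["fstat", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=True,
    )
    return max(len(result.stdout.splitlines()) - 1, 0)


if __name__ == "__main__":
    # Example: report the count for the current process.
    print(count_open_fds(os.getpid()))
```

Calling this every minute or so with the minion's PID and logging the result would show whether the count climbs steadily, which would point to a leak rather than a limit that is simply too low.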

@ichilton
Author

Not doing anything exciting to cause a problem. I've got over a hundred Linux hosts, some doing more complicated stuff, and have not seen this problem on any of them. It's just the two boxes I've got on OpenBSD, and they are doing pretty simple stuff: pretty much writing a bunch of files.

@waynew
Contributor

waynew commented Mar 31, 2022

What's your max files? I'm expecting something small. There's a little more information at https://docs.saltproject.io/en/latest/ref/configuration/master.html#master-large-scale-tuning-settings

@waynew waynew added this to the Blocked milestone Mar 31, 2022
@ichilton
Author

@waynew Hi! That appears to be a master-only setting. I can't find a max_open_files option on the minion.

@waynew
Contributor

waynew commented Apr 4, 2022

What about the OS level?

@ichilton
Author

ichilton commented Apr 4, 2022

$  sysctl kern.maxfiles
kern.maxfiles=7030

Is that high or low? - what would you recommend?

@waynew
Contributor

waynew commented Apr 5, 2022

That seems reasonable. If you bump that up, like say, double it - do you still see the same issue? (it sounds like it might be a bug that's not releasing the filehandles, though)

What is the kern.maxfilesperproc? https://unix.stackexchange.com/a/104945

@ichilton
Author

ichilton commented Apr 5, 2022

Will try and report back!

There doesn't seem to be a kern.maxfilesperproc in OpenBSD.

$ sysctl kern|grep files
kern.maxfiles=10240
kern.nfiles=263

Ian
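
`kern.maxfiles` is the system-wide cap, and the `kern.nfiles=263` sampled here is far below it, so the limit being hit may instead be the per-process one (`RLIMIT_NOFILE`, which is what `ulimit -n` reports). A small sketch for reading it with Python's standard `resource` module follows. The helper name `nofile_limits` is hypothetical, and it reports the limits of the process that runs it, so it would need to run in the same environment (user, login class, service manager) as the minion to be meaningful.

```python
import resource


def nofile_limits():
    """Return (soft, hard) per-process open-file limits; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def as_optional(value):
        # RLIM_INFINITY is the sentinel for "no limit".
        return None if value == resource.RLIM_INFINITY else value

    return as_optional(soft), as_optional(hard)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard}")
```

If the soft limit turns out to be small, raising it only delays the error when descriptors are being leaked, so it is best combined with watching the descriptor count over time.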

@ggiesen
Contributor

ggiesen commented Jul 21, 2023

I'm hitting this with 3006.1 on a proxy minion host on AlmaLinux 8. Everything works fine at first, but over time I'm unable to perform a state.apply with the error:

zmq.error.ZMQError: Too many open files
cat /proc/sys/fs/file-nr
67616   0       4903824
cat /proc/sys/fs/file-max
4903824
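
For reading the numbers above: on Linux, `/proc/sys/fs/file-nr` holds three fields, namely the number of allocated file handles, the number allocated but unused, and the system-wide maximum (the same value as `file-max`). A short sketch that parses such a line and computes utilisation, using the hypothetical helper names `parse_file_nr` and `utilisation`:

```python
def parse_file_nr(text):
    """Split a /proc/sys/fs/file-nr line into (allocated, unused, maximum)."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def utilisation(text):
    """Fraction of the system-wide file handle maximum currently allocated."""
    allocated, _unused, maximum = parse_file_nr(text)
    return allocated / maximum


if __name__ == "__main__":
    sample = "67616   0       4903824"  # values posted in this comment
    print(f"{utilisation(sample):.2%}")
```

With the values posted here, system-wide usage is roughly 1.4% of the maximum, which suggests the error comes from a per-process limit rather than the system-wide one.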

@dmurphy18 dmurphy18 assigned dmurphy18 and unassigned waynew Nov 29, 2023
@dmurphy18
Contributor

dmurphy18 commented Nov 29, 2023

@ichilton The 'Too many open files' issue was addressed in the Salt 3006.3 release. Can you try the latest version and, if that resolves the problem, please consider closing this issue?

Here are the PRs for that attempted fix for the majority of the 'too many open files' cases: #65508, #65061, #65247 and #65559. They will help to identify places that may have been missed and what is causing them.

@dmurphy18
Contributor

@ichilton A number of changes have been made to deal with "too many open files" issues in earlier versions of 3006.x. You should retry with Salt 3006.8 or similar.

Closing this since it is fixed in current versions of Salt, and also because Salt Project no longer officially supports BSD systems. With the loss of @krionbsd and additional personnel after the buyout, we don't have the bandwidth currently.
