syndic memory usage #50539
Comments
I noticed this in your config: Also, did you recently start seeing this occur after an upgrade or a config change? |
I set worker_threads to 192 to prevent massive timeouts occurring when many minions are sending job results back to the masters. |
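For context, that setting lives in the master configuration file. A minimal sketch of the relevant excerpt, where only the worker_threads value (and the 4506 return port mentioned later in the thread) come from this issue; everything else is assumed to be at defaults:

```yaml
# /etc/salt/master (illustrative excerpt, not the reporter's actual file)
worker_threads: 192   # raised from the default of 5 to absorb job-return bursts
ret_port: 4506        # the default return port minions report to
```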
Thanks for answering the questions. Ping @DmitryKuzmenko: do you see anything in the setup that might be concerning, or know of any issues with zmq 4.2.3? Also @amuhametov, does the memory usage slowly climb, or is it at 30GB at startup of the syndic process? |
Would like to get comments from @DmitryKuzmenko as well, as he has been in this area of the code many times. @DmitryKuzmenko, do you see anything in the setup that might be concerning, or know of any issues with zmq 4.2.3? |
30G for a syndic with 192 worker threads (which are actually processes by default) works out to about 150MB per process (30GB / 192 ≈ 156MB). That's at least twice what I usually see in my dev environment, but... 2K minions!
@amuhametov if you have a chance to get additional data for us, it would make sense to provide a |
@DmitryKuzmenko there are about 10k minions in total, actually. The master of masters handles 9 syndics. 8 of them serve about 6k minions. The last one serves ~4k minions together with the MoM (random master minion option). Restarted all syndics with setproctitle. Will keep you up to date. P.S. Updated the zeromq version to 4.2.5. |
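For readers unfamiliar with the setproctitle trick: when that package is installed, Salt labels each of its worker processes, so tools like top can attribute memory to a specific worker instead of a row of identical python entries. A minimal sketch of the mechanism itself, with a hypothetical label (this is not Salt's actual code; Salt picks the package up automatically when present):

```python
# Illustrative only: demonstrates what the setproctitle package does.
from setproctitle import setproctitle

setproctitle('salt-syndic SomeWorker')  # hypothetical label; visible in top/ps
```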
Btw, a month ago I tried to use zmq_filtering (with the latest patches from develop applied). |
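zmq_filtering is the standard Salt option for message filtering on the ZeroMQ pub/sub channel, so minions only process publishes targeted at them. For reference, a sketch of how it is typically enabled; the placement below is illustrative, not the reporter's config (the option has to be set on both master and minions):

```yaml
# /etc/salt/master and /etc/salt/minion (illustrative excerpt)
zmq_filtering: True
```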
@amuhametov is this the full return of |
@DmitryKuzmenko nope, this is top sorted by RES. Here is the full output:
|
I see. Thank you. My comment about the OOM killer was wrong. |
Today I hit this as well: only one syndic throws the exception and the others are OK, but I don't know how to resolve it. |
Hm. Interesting. It looks like the syndic loses its connection to the master and for some reason doesn't restore it. Probably the network is overloaded. Let me take a look at the code. |
@amuhametov sorry for the delay on this issue. @amuhametov @wangwenchao I see a possible problem in the handling of errors on the syndic's connection to the upstream master. Is it possible for you to collect some additional information that would help us better understand your particular issue? What I need is to see the future's status when that "unable to call" problem is happening. To do this, the syndic's code needs to be patched with this:

```diff
diff --git a/salt/minion.py b/salt/minion.py
index 7d1801c3bd..0faf673e6d 100644
--- a/salt/minion.py
+++ b/salt/minion.py
@@ -2731,7 +2731,7 @@ class SyndicManager(MinionBase):
                 func = '_return_pub_multi'
         for master, syndic_future in self.iter_master_options(master_id):
             if not syndic_future.done() or syndic_future.exception():
-                log.error('Unable to call {0} on {1}, that syndic is not connected'.format(func, master))
+                log.error('Unable to call {0} on {1}, that syndic is not connected: {2}'.format(func, master, syndic_future))
                 continue

             future, data = self.pub_futures.get(master, (None, None))
```

Then provide the same log snippet as above. |
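For context on what the patched log line reveals: the guard treats a syndic as unusable when its connection future is either still pending or has failed, and logging the future itself shows which of those states it is stuck in. A minimal standalone sketch of the Future states involved, assuming the tornado 4.x Future used by the Salt releases in this thread (this is not Salt code):

```python
from tornado.concurrent import Future

pending = Future()
print(pending.done())   # False: still pending, so the guard logs
                        # "that syndic is not connected"

ok = Future()
ok.set_result('connected')
print(ok.done(), ok.exception())  # True None: a usable connection

failed = Future()
failed.set_exception(RuntimeError('auth failed'))
print(failed.done(), failed.exception())  # True RuntimeError('auth failed'):
                                          # done, but the connect attempt failed
```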
```
Unable to call _return_pub_multi on masterOfMasters, that syndic is not
connected: <tornado.concurrent.Future object at 0x2e1f910>
```
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue. |
ping |
Description of Issue/Question
The salt-syndic process consumes more than 30GB of memory with about 2000 minions connected and 500 minions reporting to port 4506:
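A quick way to reproduce that measurement is to sum resident set size over all syndic worker processes. A hedged sketch, assuming the psutil package is available (this is an editorial addition, not part of the original report):

```python
import psutil

# Sum RSS across every process whose command line mentions salt-syndic.
total_rss = 0
for proc in psutil.process_iter(['cmdline', 'memory_info']):
    cmdline = ' '.join(proc.info['cmdline'] or [])
    mem = proc.info['memory_info']
    if 'salt-syndic' in cmdline and mem is not None:
        total_rss += mem.rss

print('salt-syndic total RSS: %.1f GiB' % (total_rss / 2 ** 30))
```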
Setup
(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)
master config:
minions config:
Steps to Reproduce Issue
(Include debug logs if possible and relevant.)
Versions Report
(Provided by running `salt --versions-report`. Please also mention any differences in master/minion versions.) pyzmq and zmq versions are identical on masters and minions.