
syndic memory usage #50539

Closed
amuhametov opened this issue Nov 16, 2018 · 19 comments
Labels: info-needed (waiting for more info), stale

@amuhametov

Description of Issue/Question

The salt-syndic process consumes more than 30GB of memory with about 2000 minions connected and 500 minions returning results to port 4506:

root     44276  5.2 28.5 42381760 37626656 ?   Ssl  Oct30 1302:18 /usr/bin/python /usr/bin/salt-syndic
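As a quick sanity check of the "more than 30GB" figure: on Linux, ps aux reports VSZ and RSS in KiB, so the RSS column above can be converted directly. A minimal Python sketch, assuming the standard ps aux column order; the helper name rss_gib is hypothetical.

```python
# Columns of `ps aux`: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# On Linux (procps), VSZ and RSS are reported in KiB.
line = ("root     44276  5.2 28.5 42381760 37626656 ?   Ssl  Oct30 1302:18 "
        "/usr/bin/python /usr/bin/salt-syndic")


def rss_gib(ps_line: str) -> float:
    """Return the RSS column of one `ps aux` line, converted from KiB to GiB."""
    # maxsplit=10 keeps the COMMAND column (which may contain spaces) intact
    fields = ps_line.split(None, 10)
    return int(fields[5]) / 1024 ** 2


print(f"{rss_gib(line):.2f} GiB")  # -> 35.88 GiB
```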

Setup

(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)
master config:

zmq_backlog : 4096
event_publisher_pub_hwm : 64000
worker_threads : 192
pub_hwm : 4096
sock_pool_size : 24
keep_jobs : 4
salt_event_pub_hwm : 128000
master_id : master1
syndic_master : masterofmasters

minions config:

recon_default : 1000
random_master : True
random_reauth_delay : 60
return_retry_timer_max : 30
auth_safemode : True
random_startup_delay : 60
master : master1,master2,master3
master_alive_interval : 120
verify_env : False
recon_randomize : True
ipv6 : False
mine_enabled : True
mine_return_job : False
acceptance_wait_time_max : 120
recon_max : 900000
ping_interval : 110
zmq_filtering : False
master_type : failover
retry_dns : 0
master_tries : 10
return_retry_timer : 10
master_shuffle : True

Steps to Reproduce Issue

(Include debug logs if possible and relevant.)

Versions Report

(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

Salt Version:
           Salt: 2017.7.8
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: 3.2.2
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, Nov  6 2016, 00:28:07)
   python-gnupg: Not Installed
         PyYAML: 3.10
          PyZMQ: 16.0.4
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.2.3
 
System Versions:
           dist: centos 7.5.1804 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-514.26.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.5.1804 Core

The pyzmq and ZeroMQ versions are identical on masters and minions.

@Ch3LL
Contributor

Ch3LL commented Nov 16, 2018

I noticed worker_threads : 192 in your config. That seems like a lot of worker threads for the number of minions you have. Is there a reason for such a high number?

Also, did you start seeing this recently, after an upgrade or a config change?

@Ch3LL Ch3LL added the info-needed waiting for more info label Nov 16, 2018
@Ch3LL Ch3LL added this to the Blocked milestone Nov 16, 2018
@amuhametov
Author

I set worker_threads to 192 to prevent the massive timeouts that occurred when many minions were sending job results back to the masters.
I think the problem started with 2017.7.7 or after updating ZeroMQ to 4.2.3, but I'm not sure.

@Ch3LL
Contributor

Ch3LL commented Nov 19, 2018

Thanks for answering the questions. Ping @DmitryKuzmenko: do you see anything in the setup that might be concerning, or know of any issues with zmq 4.2.3?

Also, @amuhametov, does the memory usage climb slowly, or is it already at 30GB when the syndic process starts?

@amuhametov
Author

As far as I can see, it grows slowly, but at some point it seems to grow much faster.

[Screenshot, 2018-11-19 20:03:05: syndic memory usage over time]

@Ch3LL
Contributor

Ch3LL commented Nov 20, 2018

I would like to get comments from @DmitryKuzmenko as well, since he has worked in this area of the code many times. @DmitryKuzmenko, do you see anything in the setup that might be concerning, or know of any issues with zmq 4.2.3?

@DmitryKuzmenko
Contributor

DmitryKuzmenko commented Nov 20, 2018

30G for a syndic with 192 worker threads (which are actually processes by default) is about 150MB per process. That's at least twice what I usually see in my dev environment, but... 2K minions!
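The per-process estimate is a simple average. A sketch of the arithmetic, assuming the ~30GB is spread evenly over the 192 worker processes (a simplifying assumption; the per-process ps output later in this thread shows an uneven split):

```python
# Assumption: ~30 GiB spread evenly over 192 worker processes
# (worker_threads spawns processes, not threads, by default).
total_gib = 30
worker_processes = 192
per_worker_mib = total_gib * 1024 / worker_processes
print(f"{per_worker_mib:.0f} MiB per process")  # -> 160 MiB per process
```

That is roughly in line with the ~150MB quoted above.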
Anyway, we can see the memory usage growing on that chart. So...
Here I see 2 issues:

  1. Slow memory leak. I'm currently working on the memory leak issue "salt-master process leaks memory when running in a container" #50313. Let's see if it's related.
  2. Syndic master slowdown when handling 2000 minions with 500 returning results at once. This is a good reason to review the syndic master's send/recv operations for sync/async behavior.
    Both need to be analyzed and resolved.

@amuhametov, if you have a chance to collect additional data for us, it would help to see the output of ps aux | grep salt on the running syndic with the Python setproctitle module installed. That would show which subprocess accounts for which part of that 30GB.
Thank you.
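Once the processes carry their setproctitle names, the ps aux output can be aggregated per role. A minimal Python sketch, assuming the default ps aux column layout with RSS in KiB; rss_by_role and the grouping rule (dropping the numeric suffix of MWorker-N so all workers land in one bucket) are illustrative choices, not part of Salt.

```python
import re
from collections import defaultdict


def rss_by_role(ps_output: str) -> dict:
    """Sum RSS (KiB) per salt daemon and process title from `ps aux` output."""
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep COMMAND (field 11) whole
        if len(fields) < 11 or "salt-" not in fields[10]:
            continue  # skips non-salt lines such as the grep itself
        # e.g. ['/usr/bin/python', '/usr/bin/salt-master', 'MWorker-42']
        cmd = fields[10].split()
        daemon = cmd[1 if len(cmd) > 1 else 0].rsplit("/", 1)[-1]
        title = " ".join(cmd[2:]) or "(main)"
        role = re.sub(r"-\d+$", "", title)  # MWorker-42 -> MWorker
        totals[f"{daemon} {role}"] += int(fields[5])
    return dict(totals)


# Three lines taken from the ps output posted later in this thread
sample = """\
root     46130  7.5  6.1 13450680 8087636 ?    Ssl  01:00  69:18 /usr/bin/python /usr/bin/salt-syndic MinionProcessManager
salt     46368  9.2  0.0 745264 68396 ?        Sl   01:00  84:57 /usr/bin/python /usr/bin/salt-master MWorker-0
salt     46430  9.0  0.0 596292 67100 ?        Sl   01:00  82:43 /usr/bin/python /usr/bin/salt-master MWorker-1
"""

for role, kib in sorted(rss_by_role(sample).items(), key=lambda kv: -kv[1]):
    print(f"{kib / 1024:10.1f} MiB  {role}")
```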

@amuhametov
Author

amuhametov commented Nov 20, 2018

@DmitryKuzmenko there are about 10k minions in total, actually. The master of masters handles 9 syndics. Eight of them serve about 6k minions combined. The last one serves ~4k minions together with the MoM (random master minion option).

I restarted all syndics with setproctitle installed. Will keep you up to date.

P.S. I updated ZeroMQ to 4.2.5.

@amuhametov
Author

salt     46684  8.7  0.0 780028 106392 ?       Sl   01:00  69:55 /usr/bin/python /usr/bin/salt-master MWorker-77
salt     46580  8.9  0.0 782624 106896 ?       Sl   01:00  71:18 /usr/bin/python /usr/bin/salt-master MWorker-29
salt     46337  0.0  0.1 759176 200044 ?       Sl   01:00   0:19 /usr/bin/python /usr/bin/salt-master ZeroMQPubServerChannel
salt     46697  8.9  0.1 912112 236540 ?       Sl   01:00  71:31 /usr/bin/python /usr/bin/salt-master MWorker-79
salt     48612  8.6  0.1 912620 238284 ?       Sl   01:00  68:48 /usr/bin/python /usr/bin/salt-master MWorker-167
salt     46698  8.6  0.1 912816 238352 ?       Sl   01:00  69:16 /usr/bin/python /usr/bin/salt-master MWorker-80
salt     47118  8.8  0.1 765860 240036 ?       Rl   01:00  70:42 /usr/bin/python /usr/bin/salt-master MWorker-112
salt     48786  9.0  0.1 924488 250096 ?       Sl   01:00  72:02 /usr/bin/python /usr/bin/salt-master MWorker-173
salt     46826  9.0  0.1 929948 255200 ?       Sl   01:00  71:44 /usr/bin/python /usr/bin/salt-master MWorker-94
salt     46595  9.0  0.2 1015268 341000 ?      Sl   01:00  72:22 /usr/bin/python /usr/bin/salt-master MWorker-42
salt     46600  9.1  0.2 1031336 355400 ?      Sl   01:00  72:58 /usr/bin/python /usr/bin/salt-master MWorker-47
salt     46741  9.0  0.2 1059844 385672 ?      Sl   01:00  72:19 /usr/bin/python /usr/bin/salt-master MWorker-85
salt     46598  9.1  0.3 939500 411480 ?       Sl   01:00  72:44 /usr/bin/python /usr/bin/salt-master MWorker-45
salt     46857  9.0  0.3 1087068 411744 ?      Sl   01:00  72:22 /usr/bin/python /usr/bin/salt-master MWorker-97
salt     46652  9.0  0.3 1088652 413468 ?      Sl   01:00  71:48 /usr/bin/python /usr/bin/salt-master MWorker-70
salt     47869  8.9  0.3 943332 416260 ?       Dl   01:00  71:16 /usr/bin/python /usr/bin/salt-master MWorker-143
salt     46340  0.3  0.3 1285356 433048 ?      S    01:00   2:48 /usr/bin/python /usr/bin/salt-master EventPublisher
salt     46349 22.2  0.4 14666200 618032 ?     Sl   01:00 177:33 /usr/bin/python /usr/bin/salt-master MWorkerQueue
root     46130  7.5  6.1 13811304 8087300 ?    Ssl  01:00  59:49 /usr/bin/python /usr/bin/salt-syndic MinionProcessManager

@amuhametov
Author

By the way, a month ago I tried zmq_filtering (with the latest patches from develop applied).
It did not work well: a lot of job results did not reach the master in time.
At the moment zmq_filtering is disabled and the masters and minions are rolled back to the original version.

@DmitryKuzmenko
Contributor

@amuhametov, is this the full output of ps aux | grep salt? It looks like a number of workers were killed, probably by the OOM killer or something else. This could be checked with something like dmesg | grep -i "killed process". The salt process manager tries to restart such a process, logging a message that can be found with a regex like Process .* died with exit status. Could you please check this?
Anyway, MinionProcessManager should not consume as much memory as it does. There could be an issue in its logic.
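The two checks above can also be scripted. A hedged Python sketch: it takes the text of dmesg and of the salt-master log as strings (reading them is left to the caller) and applies the same two patterns quoted in the comment; find_worker_deaths is a hypothetical helper name.

```python
import re

# The two patterns quoted above: kernel OOM-killer messages (case-insensitive,
# like `grep -i`) and the salt process manager's restart message.
OOM_RE = re.compile(r"killed process", re.IGNORECASE)
RESTART_RE = re.compile(r"Process .* died with exit status")


def find_worker_deaths(kernel_log: str, salt_log: str) -> dict:
    """Return the lines of each log that match the corresponding pattern."""
    return {
        "oom_kills": [line for line in kernel_log.splitlines()
                      if OOM_RE.search(line)],
        "salt_restarts": [line for line in salt_log.splitlines()
                          if RESTART_RE.search(line)],
    }
```

Two empty lists would support the conclusion that no workers were killed or restarted.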

@amuhametov
Author

@DmitryKuzmenko nope, that was only the top of the list, sorted by RES.

Here is the full output:

root      4307  0.0  0.0 439132 21404 ?        S    Oct02   0:00 /usr/bin/python /usr/bin/salt-minion
root     46130  7.5  6.1 13450680 8087636 ?    Ssl  01:00  69:18 /usr/bin/python /usr/bin/salt-syndic MinionProcessManager
root     46165  0.0  0.0 367576 21576 ?        S    01:00   0:00 /usr/bin/python /usr/bin/salt-syndic MultiprocessingLoggingQueue
salt     46207  0.0  0.0 479624 38504 ?        Ss   01:00   0:02 /usr/bin/python /usr/bin/salt-master ProcessManager
salt     46309  0.0  0.0 361884 21700 ?        S    01:00   0:00 /usr/bin/python /usr/bin/salt-master MultiprocessingLoggingQueue
root     46336  0.0  0.0 287776 19676 ?        Ss   01:00   0:00 /usr/bin/python /usr/bin/salt-minion
salt     46337  0.0  0.1 759464 202520 ?       Sl   01:00   0:22 /usr/bin/python /usr/bin/salt-master ZeroMQPubServerChannel
salt     46340  0.3  0.3 1304984 455776 ?      S    01:00   3:12 /usr/bin/python /usr/bin/salt-master EventPublisher
salt     46347  0.0  0.0 480368 32784 ?        S    01:00   0:10 /usr/bin/python /usr/bin/salt-master ReqServer_ProcessManager
salt     46349 22.2  0.4 14666200 629180 ?     Sl   01:00 202:57 /usr/bin/python /usr/bin/salt-master MWorkerQueue
salt     46368  9.2  0.0 745264 68396 ?        Sl   01:00  84:57 /usr/bin/python /usr/bin/salt-master MWorker-0
salt     46430  9.0  0.0 596292 67100 ?        Sl   01:00  82:43 /usr/bin/python /usr/bin/salt-master MWorker-1
salt     46495  8.9  0.0 749064 73544 ?        Sl   01:00  81:51 /usr/bin/python /usr/bin/salt-master MWorker-2
salt     46549  8.7  0.0 597192 69080 ?        Sl   01:00  80:01 /usr/bin/python /usr/bin/salt-master MWorker-3
salt     46550  8.5  0.0 746068 69688 ?        Sl   01:00  78:27 /usr/bin/python /usr/bin/salt-master MWorker-4
salt     46552  9.2  0.0 747660 72772 ?        Sl   01:00  84:58 /usr/bin/python /usr/bin/salt-master MWorker-5
salt     46553  9.1  0.0 745744 70368 ?        Rl   01:00  83:36 /usr/bin/python /usr/bin/salt-master MWorker-6
salt     46555  9.2  0.0 779052 103440 ?       Sl   01:00  84:47 /usr/bin/python /usr/bin/salt-master MWorker-7
salt     46557  8.9  0.0 779808 103232 ?       Sl   01:00  81:32 /usr/bin/python /usr/bin/salt-master MWorker-8
salt     46560  9.1  0.0 747072 70556 ?        Sl   01:00  83:17 /usr/bin/python /usr/bin/salt-master MWorker-9
salt     46561  8.8  0.0 743620 67304 ?        Sl   01:00  80:43 /usr/bin/python /usr/bin/salt-master MWorker-10
salt     46562  8.8  0.0 746260 70532 ?        Rl   01:00  80:48 /usr/bin/python /usr/bin/salt-master MWorker-11
salt     46563  8.8  0.0 747484 72016 ?        Sl   01:00  81:03 /usr/bin/python /usr/bin/salt-master MWorker-12
salt     46564  8.9  0.0 744836 68432 ?        Sl   01:00  82:10 /usr/bin/python /usr/bin/salt-master MWorker-13
salt     46565  8.8  0.0 747664 72260 ?        Sl   01:00  80:52 /usr/bin/python /usr/bin/salt-master MWorker-14
salt     46566  9.0  0.0 596456 68464 ?        Sl   01:00  82:58 /usr/bin/python /usr/bin/salt-master MWorker-15
salt     46567  9.1  0.0 595224 67228 ?        Sl   01:00  83:52 /usr/bin/python /usr/bin/salt-master MWorker-16
salt     46568  8.9  0.0 745188 68860 ?        Sl   01:00  81:32 /usr/bin/python /usr/bin/salt-master MWorker-17
salt     46569  9.0  0.0 744992 70024 ?        Sl   01:00  82:36 /usr/bin/python /usr/bin/salt-master MWorker-18
salt     46570  9.0  0.0 746492 71404 ?        Sl   01:00  82:17 /usr/bin/python /usr/bin/salt-master MWorker-19
salt     46571  9.2  0.0 747324 71816 ?        Rl   01:00  84:37 /usr/bin/python /usr/bin/salt-master MWorker-20
salt     46572  9.2  0.0 782288 105640 ?       Sl   01:00  84:48 /usr/bin/python /usr/bin/salt-master MWorker-21
salt     46573  9.2  0.0 596760 69732 ?        Rl   01:00  84:09 /usr/bin/python /usr/bin/salt-master MWorker-22
salt     46574  8.8  0.0 597488 69312 ?        Sl   01:00  80:27 /usr/bin/python /usr/bin/salt-master MWorker-23
salt     46575  9.1  0.0 598100 70172 ?        Rl   01:00  83:22 /usr/bin/python /usr/bin/salt-master MWorker-24
salt     46576  9.4  0.0 778400 102744 ?       Sl   01:00  86:04 /usr/bin/python /usr/bin/salt-master MWorker-25
salt     46577  9.0  0.0 746816 71228 ?        Sl   01:00  82:54 /usr/bin/python /usr/bin/salt-master MWorker-26
salt     46578  9.0  0.0 596660 68548 ?        Rl   01:00  82:54 /usr/bin/python /usr/bin/salt-master MWorker-27
salt     46579  9.0  0.0 595784 66716 ?        Sl   01:00  82:21 /usr/bin/python /usr/bin/salt-master MWorker-28
salt     46580  8.8  0.0 782624 106904 ?       Sl   01:00  80:54 /usr/bin/python /usr/bin/salt-master MWorker-29
salt     46581  9.1  0.0 744672 68012 ?        Sl   01:00  83:36 /usr/bin/python /usr/bin/salt-master MWorker-30
salt     46582  8.6  0.0 746744 70456 ?        Sl   01:00  79:17 /usr/bin/python /usr/bin/salt-master MWorker-31
salt     46583  8.8  0.0 745644 70196 ?        Sl   01:00  81:16 /usr/bin/python /usr/bin/salt-master MWorker-32
salt     46584  9.1  0.0 746124 69264 ?        Sl   01:00  83:56 /usr/bin/python /usr/bin/salt-master MWorker-33
salt     46585  9.2  0.0 746960 71400 ?        Rl   01:00  84:32 /usr/bin/python /usr/bin/salt-master MWorker-34
salt     46586  9.0  0.0 779212 104112 ?       Rl   01:00  82:23 /usr/bin/python /usr/bin/salt-master MWorker-35
salt     46587  9.1  0.0 748168 71552 ?        Sl   01:00  83:32 /usr/bin/python /usr/bin/salt-master MWorker-36
salt     46588  9.1  0.0 745976 70252 ?        Rl   01:00  83:14 /usr/bin/python /usr/bin/salt-master MWorker-37
salt     46589  9.3  0.0 744944 68272 ?        Rl   01:00  85:35 /usr/bin/python /usr/bin/salt-master MWorker-38
salt     46590  8.9  0.0 747196 71516 ?        Sl   01:00  81:38 /usr/bin/python /usr/bin/salt-master MWorker-39
salt     46591  9.2  0.0 746292 69740 ?        Rl   01:00  84:29 /usr/bin/python /usr/bin/salt-master MWorker-40
salt     46592  9.0  0.0 747060 71484 ?        Sl   01:00  82:35 /usr/bin/python /usr/bin/salt-master MWorker-41
root     46593  0.0  0.0 829488 59908 ?        Sl   01:00   0:45 /usr/bin/python /usr/bin/salt-minion KeepAlive MultiMinionProcessManager MinionProcessManager
salt     46595  8.9  0.2 1015268 341000 ?      Sl   01:00  82:08 /usr/bin/python /usr/bin/salt-master MWorker-42
salt     46596  9.2  0.0 746028 71728 ?        Rl   01:00  84:11 /usr/bin/python /usr/bin/salt-master MWorker-43
salt     46597  9.0  0.0 745764 70996 ?        Sl   01:00  83:02 /usr/bin/python /usr/bin/salt-master MWorker-44
salt     46598  9.0  0.3 939500 411480 ?       Sl   01:00  82:22 /usr/bin/python /usr/bin/salt-master MWorker-45
salt     46599  8.9  0.0 780364 103624 ?       Rl   01:00  81:49 /usr/bin/python /usr/bin/salt-master MWorker-46
salt     46600  9.1  0.2 1031336 355400 ?      Sl   01:00  83:54 /usr/bin/python /usr/bin/salt-master MWorker-47
salt     46601  8.9  0.0 597092 69936 ?        Sl   01:00  81:36 /usr/bin/python /usr/bin/salt-master MWorker-48
salt     46602  8.9  0.0 600280 72224 ?        Sl   01:00  82:11 /usr/bin/python /usr/bin/salt-master MWorker-49
salt     46607  9.1  0.0 748624 73604 ?        Sl   01:00  83:20 /usr/bin/python /usr/bin/salt-master MWorker-50
salt     46609  8.7  0.0 744684 69252 ?        Sl   01:00  80:09 /usr/bin/python /usr/bin/salt-master MWorker-51
salt     46612  8.8  0.0 597772 68708 ?        Sl   01:00  81:05 /usr/bin/python /usr/bin/salt-master MWorker-52
salt     46616  8.8  0.0 745264 68644 ?        Rl   01:00  81:03 /usr/bin/python /usr/bin/salt-master MWorker-53
salt     46618  8.7  0.0 744312 68560 ?        Rl   01:00  79:32 /usr/bin/python /usr/bin/salt-master MWorker-54
salt     46620  8.8  0.0 597740 68712 ?        Sl   01:00  80:36 /usr/bin/python /usr/bin/salt-master MWorker-55
salt     46621  9.0  0.0 598200 69576 ?        Sl   01:00  82:28 /usr/bin/python /usr/bin/salt-master MWorker-56
salt     46623  9.0  0.0 745140 69548 ?        Rl   01:00  82:45 /usr/bin/python /usr/bin/salt-master MWorker-57
salt     46625  8.9  0.3 1087012 411412 ?      Sl   01:00  81:36 /usr/bin/python /usr/bin/salt-master MWorker-58
salt     46626  8.9  0.0 744176 67552 ?        Sl   01:00  81:57 /usr/bin/python /usr/bin/salt-master MWorker-59
salt     46631  8.5  0.0 596740 67384 ?        Sl   01:00  77:53 /usr/bin/python /usr/bin/salt-master MWorker-60
salt     46634  9.0  0.0 599172 69832 ?        Sl   01:00  82:54 /usr/bin/python /usr/bin/salt-master MWorker-61
salt     46638  8.5  0.0 745608 70480 ?        Rl   01:00  78:00 /usr/bin/python /usr/bin/salt-master MWorker-62
salt     46641  8.8  0.0 595868 67660 ?        Sl   01:00  81:18 /usr/bin/python /usr/bin/salt-master MWorker-63
salt     46644  9.0  0.0 744760 69164 ?        Sl   01:00  82:41 /usr/bin/python /usr/bin/salt-master MWorker-64
salt     46645  8.9  0.0 746656 71324 ?        Rl   01:00  81:49 /usr/bin/python /usr/bin/salt-master MWorker-65
salt     46648  8.8  0.0 781116 104360 ?       Sl   01:00  80:43 /usr/bin/python /usr/bin/salt-master MWorker-66
salt     46649  8.9  0.0 745228 69624 ?        Sl   01:00  81:28 /usr/bin/python /usr/bin/salt-master MWorker-67
salt     46650  8.9  0.0 747708 73536 ?        Sl   01:00  81:22 /usr/bin/python /usr/bin/salt-master MWorker-68
salt     46651  9.1  0.0 745308 68992 ?        Sl   01:00  84:05 /usr/bin/python /usr/bin/salt-master MWorker-69
salt     46652  9.1  0.3 1088652 413468 ?      Sl   01:00  83:31 /usr/bin/python /usr/bin/salt-master MWorker-70
salt     46654  9.0  0.0 744580 67896 ?        Rl   01:00  82:28 /usr/bin/python /usr/bin/salt-master MWorker-71
salt     46655  8.9  0.0 780784 104240 ?       Sl   01:00  81:42 /usr/bin/python /usr/bin/salt-master MWorker-72
salt     46656  8.8  0.0 628204 101072 ?       Sl   01:00  80:44 /usr/bin/python /usr/bin/salt-master MWorker-73
salt     46660  9.0  0.0 778680 102832 ?       Sl   01:00  82:50 /usr/bin/python /usr/bin/salt-master MWorker-74
salt     46666  9.0  0.0 599408 70332 ?        Sl   01:00  82:52 /usr/bin/python /usr/bin/salt-master MWorker-75
salt     46670  8.9  0.0 598524 69540 ?        Rl   01:00  81:27 /usr/bin/python /usr/bin/salt-master MWorker-76
salt     46684  8.7  0.0 780028 106392 ?       Sl   01:00  80:04 /usr/bin/python /usr/bin/salt-master MWorker-77
salt     46693  8.7  0.0 597348 68856 ?        Sl   01:00  80:14 /usr/bin/python /usr/bin/salt-master MWorker-78
root     46695  0.0  0.0 441228 21456 ?        S    01:00   0:00 /usr/bin/python /usr/bin/salt-minion KeepAlive MultiprocessingLoggingQueue
salt     46697  8.8  0.1 912112 236540 ?       Rl   01:00  81:18 /usr/bin/python /usr/bin/salt-master MWorker-79
salt     46698  8.7  0.1 912816 238352 ?       Sl   01:00  79:48 /usr/bin/python /usr/bin/salt-master MWorker-80
salt     46701  8.7  0.0 747144 70544 ?        Sl   01:00  80:12 /usr/bin/python /usr/bin/salt-master MWorker-81
salt     46714  9.0  0.0 744924 69536 ?        Sl   01:00  82:34 /usr/bin/python /usr/bin/salt-master MWorker-82
salt     46723  8.9  0.0 751872 76380 ?        Rl   01:00  81:44 /usr/bin/python /usr/bin/salt-master MWorker-83
salt     46729  8.7  0.0 748184 73200 ?        Rl   01:00  80:19 /usr/bin/python /usr/bin/salt-master MWorker-84
salt     46741  9.0  0.2 1059844 385676 ?      Sl   01:00  82:40 /usr/bin/python /usr/bin/salt-master MWorker-85
salt     46743  8.8  0.0 746892 70828 ?        Sl   01:00  80:38 /usr/bin/python /usr/bin/salt-master MWorker-86
salt     46753  9.0  0.0 596624 68524 ?        Sl   01:00  83:02 /usr/bin/python /usr/bin/salt-master MWorker-87
salt     46762  8.9  0.0 746336 70264 ?        Sl   01:00  81:22 /usr/bin/python /usr/bin/salt-master MWorker-88
salt     46769  8.8  0.0 745264 69096 ?        Sl   01:00  80:38 /usr/bin/python /usr/bin/salt-master MWorker-89
salt     46779  9.0  0.0 745740 69152 ?        Rl   01:00  83:08 /usr/bin/python /usr/bin/salt-master MWorker-90
salt     46784  8.9  0.0 779916 105532 ?       Rl   01:00  81:44 /usr/bin/python /usr/bin/salt-master MWorker-91
salt     46802  9.0  0.0 745984 70528 ?        Sl   01:00  82:50 /usr/bin/python /usr/bin/salt-master MWorker-92
salt     46809  9.0  0.0 747596 72140 ?        Rl   01:00  82:38 /usr/bin/python /usr/bin/salt-master MWorker-93
salt     46826  8.9  0.1 929948 255200 ?       Sl   01:00  81:48 /usr/bin/python /usr/bin/salt-master MWorker-94
salt     46833  8.9  0.0 746284 69916 ?        Sl   01:00  81:56 /usr/bin/python /usr/bin/salt-master MWorker-95
salt     46847  8.8  0.0 745684 70144 ?        Rl   01:00  80:32 /usr/bin/python /usr/bin/salt-master MWorker-96
salt     46857  9.0  0.3 1087068 411744 ?      Sl   01:00  82:22 /usr/bin/python /usr/bin/salt-master MWorker-97
salt     46871  8.9  0.0 746324 70840 ?        Sl   01:00  82:02 /usr/bin/python /usr/bin/salt-master MWorker-98
salt     46877  8.8  0.0 745096 68612 ?        Rl   01:00  81:01 /usr/bin/python /usr/bin/salt-master MWorker-99
salt     46883  8.8  0.0 598272 70100 ?        Sl   01:00  80:44 /usr/bin/python /usr/bin/salt-master MWorker-100
salt     46923  8.7  0.0 596152 67288 ?        Sl   01:00  80:14 /usr/bin/python /usr/bin/salt-master MWorker-101
salt     46934  8.8  0.0 745852 70836 ?        Sl   01:00  80:56 /usr/bin/python /usr/bin/salt-master MWorker-102
salt     46952  8.9  0.0 746372 69884 ?        Sl   01:00  82:00 /usr/bin/python /usr/bin/salt-master MWorker-103
salt     46958  8.8  0.0 748376 72792 ?        Rl   01:00  80:31 /usr/bin/python /usr/bin/salt-master MWorker-104
salt     46979  8.8  0.0 594552 66356 ?        Sl   01:00  80:42 /usr/bin/python /usr/bin/salt-master MWorker-105
salt     47005  8.8  0.0 745324 69956 ?        Sl   01:00  81:16 /usr/bin/python /usr/bin/salt-master MWorker-106
salt     47031  8.6  0.0 771628 95876 ?        Sl   01:00  79:12 /usr/bin/python /usr/bin/salt-master MWorker-107
salt     47038  9.1  0.0 599048 70864 ?        Rl   01:00  83:11 /usr/bin/python /usr/bin/salt-master MWorker-108
salt     47045  8.5  0.0 625736 98804 ?        Sl   01:00  78:17 /usr/bin/python /usr/bin/salt-master MWorker-109
salt     47059  9.1  0.0 596820 69376 ?        Sl   01:00  83:43 /usr/bin/python /usr/bin/salt-master MWorker-110
salt     47100  9.0  0.0 745112 72084 ?        Rl   01:00  83:04 /usr/bin/python /usr/bin/salt-master MWorker-111
salt     47118  8.8  0.1 765860 240036 ?       Sl   01:00  80:53 /usr/bin/python /usr/bin/salt-master MWorker-112
salt     47128  8.9  0.0 746712 71992 ?        Sl   01:00  82:14 /usr/bin/python /usr/bin/salt-master MWorker-113
salt     47161  8.5  0.0 747220 71764 ?        Sl   01:00  78:26 /usr/bin/python /usr/bin/salt-master MWorker-114
salt     47172  8.9  0.0 747208 73176 ?        Sl   01:00  82:05 /usr/bin/python /usr/bin/salt-master MWorker-115
salt     47180  9.0  0.0 744436 70028 ?        Sl   01:00  82:17 /usr/bin/python /usr/bin/salt-master MWorker-116
salt     47227  9.1  0.0 746064 71624 ?        Sl   01:00  83:50 /usr/bin/python /usr/bin/salt-master MWorker-117
salt     47263  8.9  0.0 630724 102616 ?       Rl   01:00  81:56 /usr/bin/python /usr/bin/salt-master MWorker-118
salt     47271  8.8  0.0 778456 102708 ?       Sl   01:00  80:38 /usr/bin/python /usr/bin/salt-master MWorker-119
salt     47308  8.7  0.0 780108 104012 ?       Sl   01:00  79:45 /usr/bin/python /usr/bin/salt-master MWorker-120
salt     47325  8.6  0.0 746792 71640 ?        Sl   01:00  79:21 /usr/bin/python /usr/bin/salt-master MWorker-121
salt     47345  8.6  0.0 746488 70860 ?        Sl   01:00  79:12 /usr/bin/python /usr/bin/salt-master MWorker-122
salt     47382  8.8  0.0 631144 102832 ?       Sl   01:00  80:52 /usr/bin/python /usr/bin/salt-master MWorker-123
salt     47395  8.6  0.0 598360 71088 ?        Sl   01:00  78:43 /usr/bin/python /usr/bin/salt-master MWorker-124
salt     47408  9.1  0.0 746032 69316 ?        Sl   01:00  83:48 /usr/bin/python /usr/bin/salt-master MWorker-125
salt     47447  8.7  0.0 748072 73452 ?        Sl   01:00  79:44 /usr/bin/python /usr/bin/salt-master MWorker-126
salt     47461  8.7  0.0 771176 96368 ?        Sl   01:00  79:43 /usr/bin/python /usr/bin/salt-master MWorker-127
salt     47493  9.0  0.0 596600 68384 ?        Sl   01:00  82:26 /usr/bin/python /usr/bin/salt-master MWorker-128
salt     47534  8.8  0.0 746004 70508 ?        Sl   01:00  81:16 /usr/bin/python /usr/bin/salt-master MWorker-129
salt     47562  8.7  0.0 746116 70340 ?        Rl   01:00  79:56 /usr/bin/python /usr/bin/salt-master MWorker-130
salt     47592  8.6  0.0 598364 70424 ?        Sl   01:00  78:48 /usr/bin/python /usr/bin/salt-master MWorker-131
salt     47618  8.8  0.0 746108 70744 ?        Sl   01:00  81:14 /usr/bin/python /usr/bin/salt-master MWorker-132
salt     47645  8.6  0.0 744880 69224 ?        Sl   01:00  78:37 /usr/bin/python /usr/bin/salt-master MWorker-133
salt     47673  8.8  0.0 745172 71040 ?        Sl   01:00  80:34 /usr/bin/python /usr/bin/salt-master MWorker-134
salt     47686  8.7  0.0 745624 69936 ?        Sl   01:00  79:32 /usr/bin/python /usr/bin/salt-master MWorker-135
salt     47696  8.9  0.0 747108 71540 ?        Sl   01:00  81:43 /usr/bin/python /usr/bin/salt-master MWorker-136
salt     47746  8.8  0.0 600644 71688 ?        Sl   01:00  81:11 /usr/bin/python /usr/bin/salt-master MWorker-137
salt     47757  8.7  0.0 595992 67692 ?        Sl   01:00  80:08 /usr/bin/python /usr/bin/salt-master MWorker-138
salt     47797  8.8  0.0 599624 71680 ?        Sl   01:00  80:48 /usr/bin/python /usr/bin/salt-master MWorker-139
salt     47805  8.7  0.0 599812 71680 ?        Rl   01:00  79:46 /usr/bin/python /usr/bin/salt-master MWorker-140
salt     47847  8.9  0.0 600936 71984 ?        Sl   01:00  81:33 /usr/bin/python /usr/bin/salt-master MWorker-141
salt     47861  8.7  0.0 747164 71964 ?        Sl   01:00  80:02 /usr/bin/python /usr/bin/salt-master MWorker-142
salt     47869  8.9  0.3 943332 416260 ?       Sl   01:00  81:58 /usr/bin/python /usr/bin/salt-master MWorker-143
salt     47911  8.9  0.0 746212 70596 ?        Sl   01:00  81:56 /usr/bin/python /usr/bin/salt-master MWorker-144
salt     47950  8.7  0.0 779280 104584 ?       Sl   01:00  79:49 /usr/bin/python /usr/bin/salt-master MWorker-145
salt     47973  8.8  0.0 596048 66636 ?        Sl   01:00  80:55 /usr/bin/python /usr/bin/salt-master MWorker-146
salt     47995  8.8  0.0 746964 72020 ?        Sl   01:00  80:34 /usr/bin/python /usr/bin/salt-master MWorker-147
salt     48025  8.9  0.0 745776 69852 ?        Sl   01:00  81:57 /usr/bin/python /usr/bin/salt-master MWorker-148
salt     48056  8.8  0.0 746016 70216 ?        Sl   01:00  80:54 /usr/bin/python /usr/bin/salt-master MWorker-149
salt     48095  8.9  0.0 747200 71604 ?        Sl   01:00  81:27 /usr/bin/python /usr/bin/salt-master MWorker-150
salt     48130  8.8  0.0 601016 72612 ?        Sl   01:00  80:53 /usr/bin/python /usr/bin/salt-master MWorker-151
salt     48149  8.8  0.0 747192 72572 ?        Sl   01:00  81:08 /usr/bin/python /usr/bin/salt-master MWorker-152
salt     48172  8.9  0.0 746148 70492 ?        Sl   01:00  81:33 /usr/bin/python /usr/bin/salt-master MWorker-153
salt     48205  8.8  0.0 746884 71360 ?        Sl   01:00  81:02 /usr/bin/python /usr/bin/salt-master MWorker-154
salt     48239  8.8  0.0 625708 97660 ?        Sl   01:00  81:11 /usr/bin/python /usr/bin/salt-master MWorker-155
salt     48270  8.3  0.0 596904 69780 ?        Sl   01:00  76:27 /usr/bin/python /usr/bin/salt-master MWorker-156
salt     48303  9.0  0.0 744780 69028 ?        Sl   01:00  82:21 /usr/bin/python /usr/bin/salt-master MWorker-157
salt     48338  8.9  0.0 746168 70528 ?        Sl   01:00  82:02 /usr/bin/python /usr/bin/salt-master MWorker-158
salt     48347  8.9  0.0 604308 76096 ?        Sl   01:00  81:36 /usr/bin/python /usr/bin/salt-master MWorker-159
salt     48391  8.8  0.0 745560 70376 ?        Sl   01:00  80:40 /usr/bin/python /usr/bin/salt-master MWorker-160
salt     48398  9.0  0.0 744908 69312 ?        Sl   01:00  82:35 /usr/bin/python /usr/bin/salt-master MWorker-161
salt     48449  8.7  0.0 746520 71288 ?        Sl   01:00  79:49 /usr/bin/python /usr/bin/salt-master MWorker-162
salt     48459  8.7  0.0 778268 102676 ?       Sl   01:00  79:32 /usr/bin/python /usr/bin/salt-master MWorker-163
salt     48491  8.7  0.0 745756 71116 ?        Rl   01:00  79:35 /usr/bin/python /usr/bin/salt-master MWorker-164
salt     48576  8.9  0.0 745768 70980 ?        Rl   01:00  81:31 /usr/bin/python /usr/bin/salt-master MWorker-165
salt     48596  8.7  0.0 745736 69504 ?        Sl   01:00  79:59 /usr/bin/python /usr/bin/salt-master MWorker-166
salt     48612  8.6  0.1 912620 238284 ?       Sl   01:00  78:51 /usr/bin/python /usr/bin/salt-master MWorker-167
salt     48636  8.8  0.0 745036 69556 ?        Rl   01:00  80:33 /usr/bin/python /usr/bin/salt-master MWorker-168
salt     48665  8.7  0.0 596912 67696 ?        Sl   01:00  79:39 /usr/bin/python /usr/bin/salt-master MWorker-169
salt     48722  9.0  0.3 1093140 417304 ?      Sl   01:00  82:39 /usr/bin/python /usr/bin/salt-master MWorker-170
salt     48741  8.6  0.0 745504 70788 ?        Sl   01:00  79:30 /usr/bin/python /usr/bin/salt-master MWorker-171
salt     48761  8.6  0.0 780616 104120 ?       Sl   01:00  78:58 /usr/bin/python /usr/bin/salt-master MWorker-172
salt     48786  8.9  0.1 924488 250104 ?       Sl   01:00  82:09 /usr/bin/python /usr/bin/salt-master MWorker-173
salt     48855  8.7  0.0 781308 105768 ?       Sl   01:00  79:44 /usr/bin/python /usr/bin/salt-master MWorker-174
salt     48875  8.7  0.0 597968 68848 ?        Rl   01:00  80:25 /usr/bin/python /usr/bin/salt-master MWorker-175
salt     48888  8.7  0.0 598696 71308 ?        Sl   01:00  79:31 /usr/bin/python /usr/bin/salt-master MWorker-176
salt     48958  8.8  0.0 746844 70964 ?        Sl   01:00  81:05 /usr/bin/python /usr/bin/salt-master MWorker-177
salt     48974  8.9  0.0 597136 68040 ?        Sl   01:00  81:30 /usr/bin/python /usr/bin/salt-master MWorker-178
salt     49075  8.9  0.0 597496 70272 ?        Rl   01:00  81:38 /usr/bin/python /usr/bin/salt-master MWorker-179
salt     49088  8.6  0.0 745316 70676 ?        Sl   01:00  79:07 /usr/bin/python /usr/bin/salt-master MWorker-180
salt     49139  8.8  0.0 746664 69800 ?        Sl   01:00  80:28 /usr/bin/python /usr/bin/salt-master MWorker-181
salt     49168  8.8  0.0 597436 68728 ?        Sl   01:00  80:42 /usr/bin/python /usr/bin/salt-master MWorker-182
salt     49208  9.0  0.0 746944 71852 ?        Rl   01:00  82:39 /usr/bin/python /usr/bin/salt-master MWorker-183
salt     49228  8.7  0.0 779560 103996 ?       Sl   01:00  80:10 /usr/bin/python /usr/bin/salt-master MWorker-184
salt     49258  8.8  0.0 748564 73088 ?        Sl   01:00  80:48 /usr/bin/python /usr/bin/salt-master MWorker-185
salt     49286  8.4  0.0 745504 70484 ?        Rl   01:00  77:02 /usr/bin/python /usr/bin/salt-master MWorker-186
salt     49309  8.6  0.0 746420 70988 ?        Sl   01:00  79:00 /usr/bin/python /usr/bin/salt-master MWorker-187
salt     49337  8.6  0.0 746696 71092 ?        Sl   01:00  79:21 /usr/bin/python /usr/bin/salt-master MWorker-188
salt     49395  8.6  0.0 782960 108428 ?       Sl   01:00  79:17 /usr/bin/python /usr/bin/salt-master MWorker-189
salt     49418  8.7  0.0 779016 102444 ?       Sl   01:00  79:56 /usr/bin/python /usr/bin/salt-master MWorker-190
salt     49430  8.8  0.0 745832 69188 ?        Sl   01:00  81:03 /usr/bin/python /usr/bin/salt-master MWorker-191
salt     52578  1.4  0.0 510600 62116 ?        S    09:04   6:01 /usr/bin/python /usr/bin/salt-master ProcessManager Maintenance

@DmitryKuzmenko
Contributor

I see, thank you. My comment about the OOM killer was wrong.
So here roughly 1/3 of the memory is used by the Syndic main process and 2/3 by the master worker processes.
@amuhametov if you can share your syndic log file, it could also help us understand why the Syndic main process grows so much. The most likely reason is that the Syndic forwards data upstream more slowly than new data arrives from the minions.
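The growth hypothesis comes down to simple arithmetic. If minions return data faster than the Syndic can forward it upstream, and nothing is dropped, the in-memory backlog grows linearly with uptime. Below is a minimal Python sketch of that model. The rates and message size are illustrative assumptions, not measurements from this setup, and `backlog_bytes` is a hypothetical helper, not part of Salt.

```python
def backlog_bytes(seconds, arrival_per_sec, forward_per_sec, msg_bytes):
    """Bytes held in memory after `seconds`, assuming (hypothetically) an
    unbounded queue that never drops data and constant rates."""
    # Only the surplus of arrivals over forwards accumulates; if the Syndic
    # keeps up (forward >= arrival), the backlog stays at zero.
    surplus_per_sec = max(0, arrival_per_sec - forward_per_sec)
    return surplus_per_sec * seconds * msg_bytes


if __name__ == "__main__":
    # Illustrative numbers only: 50 returns/s arriving, 45/s forwarded, 4 KiB each.
    for hours in (1, 6, 24):
        size = backlog_bytes(hours * 3600, 50, 45, 4096)
        print(f"{hours:>2} h: {size / 2**30:.2f} GiB queued")
```

Under these assumed numbers, even a small per-second shortfall adds up to gigabytes over a few days of uptime. This is why a slow upstream link alone could explain steady growth without any true leak.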

@amuhametov
Author

2018-11-21 00:14:42,240 [salt.utils.parsers:1051][WARNING ][42052] Syndic received a SIGTERM. Exiting.
2018-11-21 00:15:32,954 [salt.minion      :589 ][WARNING ][26042] random_master is True but there is only one master specified. Ignoring.
2018-11-21 00:15:33,503 [salt.minion      :2715][ERROR   ][26042] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 00:15:33,503 [salt.minion      :2725][CRITICAL][26042] Unable to call _fire_master on any masters!
2018-11-21 00:15:34,003 [salt.minion      :2715][ERROR   ][26042] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 00:15:34,003 [salt.minion      :2725][CRITICAL][26042] Unable to call _fire_master on any masters!
2018-11-21 00:15:34,503 [salt.minion      :2715][ERROR   ][26042] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 00:15:34,503 [salt.minion      :2725][CRITICAL][26042] Unable to call _fire_master on any masters!
2018-11-21 00:15:35,003 [salt.minion      :2715][ERROR   ][26042] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 00:15:35,003 [salt.minion      :2725][CRITICAL][26042] Unable to call _fire_master on any masters!
2018-11-21 00:15:35,003 [salt.minion      :2734][ERROR   ][26042] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 00:15:36,774 [salt.minion      :2715][ERROR   ][26042] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 00:15:36,775 [salt.minion      :2725][CRITICAL][26042] Unable to call _fire_master on any masters!
2018-11-21 00:15:36,775 [salt.minion      :2734][ERROR   ][26042] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 00:58:34,016 [salt.minion      :1762][WARNING ][26042] The minion failed to return the job information for job 20181121005717431438. This is often due to the master being shut down or overloaded. If the master is running consider increasing the worker_threads value.
2018-11-21 00:58:34,504 [salt.minion      :2748][ERROR   ][26042] Unable to call _return_pub_multi on masterofmasters, trying another...
2018-11-21 00:58:34,504 [salt.minion      :589 ][WARNING ][26042] random_master is True but there is only one master specified. Ignoring.
2018-11-21 01:00:49,857 [salt.utils.parsers:1051][WARNING ][26042] Syndic received a SIGTERM. Exiting.
2018-11-21 01:01:07,829 [salt.minion      :589 ][WARNING ][46130] random_master is True but there is only one master specified. Ignoring.
2018-11-21 01:01:08,382 [salt.minion      :2715][ERROR   ][46130] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 01:01:08,382 [salt.minion      :2725][CRITICAL][46130] Unable to call _fire_master on any masters!
2018-11-21 01:01:08,382 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:08,882 [salt.minion      :2715][ERROR   ][46130] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 01:01:08,882 [salt.minion      :2725][CRITICAL][46130] Unable to call _fire_master on any masters!
2018-11-21 01:01:08,882 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:08,882 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:09,382 [salt.minion      :2715][ERROR   ][46130] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 01:01:09,382 [salt.minion      :2725][CRITICAL][46130] Unable to call _fire_master on any masters!
2018-11-21 01:01:09,382 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:09,382 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:09,882 [salt.minion      :2715][ERROR   ][46130] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 01:01:09,882 [salt.minion      :2725][CRITICAL][46130] Unable to call _fire_master on any masters!
2018-11-21 01:01:09,882 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:09,882 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:11,418 [salt.minion      :2715][ERROR   ][46130] Unable to call _fire_master on masterofmasters, that syndic is not connected
2018-11-21 01:01:11,418 [salt.minion      :2725][CRITICAL][46130] Unable to call _fire_master on any masters!
2018-11-21 01:01:11,419 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 01:01:11,419 [salt.minion      :2734][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, that syndic is not connected
2018-11-21 12:18:59,921 [salt.minion      :1762][WARNING ][46130] The minion failed to return the job information for job 20181121121550443401. This is often due to the master being shut down or overloaded. If the master is running consider increasing the worker_threads value.
2018-11-21 12:19:00,383 [salt.minion      :2748][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, trying another...
2018-11-21 12:19:00,384 [salt.minion      :589 ][WARNING ][46130] random_master is True but there is only one master specified. Ignoring.
2018-11-21 12:25:09,416 [salt.minion      :1762][WARNING ][46130] The minion failed to return the job information for job 20181121122036894532. This is often due to the master being shut down or overloaded. If the master is running consider increasing the worker_threads value.
2018-11-21 12:25:09,883 [salt.minion      :2748][ERROR   ][46130] Unable to call _return_pub_multi on masterofmasters, trying another...
2018-11-21 12:25:09,889 [salt.minion      :589 ][WARNING ][46130] random_master is True but there is only one master specified. Ignoring.
2018-11-21 14:51:08,691 [salt.minion      :2490][WARNING ][46130] Unable to forward pub data: Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.

@wangwenchao


I ran into this today as well: only one syndic shows these errors and the others are fine, but I don't know how to resolve it.

@DmitryKuzmenko
Contributor

Hm, interesting. It looks like the syndic loses its connection to the master and, for some reason, doesn't re-establish it. Perhaps the network is overloaded. Let me take a look at the code.
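One way to check whether a syndic actually stays disconnected is to summarize its log. The idea is to count the "that syndic is not connected" errors per process ID and record when they first and last occurred. If the errors stop shortly after each start-up, the connection was eventually established. If they continue up to the present, it was never restored. Below is a minimal sketch, assuming the log line layout shown in the snippet above (`timestamp [logger:line][LEVEL][pid] message`). `summarize_disconnects` is a hypothetical helper, not a Salt tool.

```python
import re
import sys
from collections import defaultdict

# Matches lines such as:
# 2018-11-21 01:01:08,382 [salt.minion      :2715][ERROR   ][46130] Unable to call ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<logger>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<pid>\d+)\] (?P<msg>.*)$"
)
MARKER = "that syndic is not connected"


def summarize_disconnects(lines):
    """Return {pid: {"count", "first", "last"}} for 'not connected' errors."""
    summary = defaultdict(lambda: {"count": 0, "first": None, "last": None})
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None or MARKER not in match.group("msg"):
            continue  # skip unparsable lines and unrelated messages
        entry = summary[int(match.group("pid"))]
        entry["count"] += 1
        if entry["first"] is None:
            entry["first"] = match.group("ts")
        entry["last"] = match.group("ts")  # log is chronological, keep the latest
    return dict(summary)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py /path/to/syndic.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for pid, info in sorted(summarize_disconnects(fh).items()):
            print(f"pid {pid}: {info['count']} errors, {info['first']} .. {info['last']}")
```

Grouping by PID keeps each syndic restart (for example 26042 versus 46130 in the snippet) separate, so a burst at start-up is not confused with a long-lived disconnect.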

@DmitryKuzmenko
Contributor

DmitryKuzmenko commented Jul 8, 2019

@amuhametov sorry for the delay on this issue.
@amuhametov @wangwenchao I see a possible problem in how the syndic handles errors on its connection to the upstream master. Could you try to collect some additional information that would help to better understand your particular issue? What I need is the status of the syndic future at the moment the "Unable to call" error occurs. To get it, the syndic's code needs to be patched as follows:

diff --git a/salt/minion.py b/salt/minion.py
index 7d1801c3bd..0faf673e6d 100644
--- a/salt/minion.py
+++ b/salt/minion.py
@@ -2731,7 +2731,7 @@ class SyndicManager(MinionBase):
         func = '_return_pub_multi'
         for master, syndic_future in self.iter_master_options(master_id):
             if not syndic_future.done() or syndic_future.exception():
-                log.error('Unable to call {0} on {1}, that syndic is not connected'.format(func, master))
+                log.error('Unable to call {0} on {1}, that syndic is not connected: {2}'.format(func, master, syndic_future))
                 continue
 
             future, data = self.pub_futures.get(master, (None, None))

and provide the same log snippet as above.
Thank you!
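The patched line logs the future object itself. Its text representation shows the future's state, for example still pending or finished with an exception. The sketch below illustrates what that extra detail distinguishes, using the standard library's `concurrent.futures.Future` as a stand-in. Salt 2017.7 uses Tornado futures, whose repr differs, so this shows the idea rather than Salt's actual output. `syndic_unavailable` and `log_line` are hypothetical helpers, and the `ConnectionRefusedError` cause is an assumed example.

```python
from concurrent.futures import Future


def syndic_unavailable(syndic_future):
    """Mirror of the condition in the patched hunk: skip the master if its
    connection future has not completed yet or completed with an error."""
    # Short-circuit matters: exception() is only called on a finished future,
    # so it returns immediately instead of blocking.
    return not syndic_future.done() or syndic_future.exception() is not None


def log_line(func, master, syndic_future):
    # Same message shape as the patched log.error call.
    return "Unable to call {0} on {1}, that syndic is not connected: {2}".format(
        func, master, syndic_future)


still_connecting = Future()                 # connection attempt never completed
failed = Future()
failed.set_exception(ConnectionRefusedError("upstream master refused"))
connected = Future()
connected.set_result("syndic handle")       # hypothetical stand-in for a live connection

for fut in (still_connecting, failed, connected):
    if syndic_unavailable(fut):
        print(log_line("_return_pub_multi", "masterofmasters", fut))
    else:
        print("connected:", fut)
```

The two failure cases call for different fixes. A future stuck in the pending state would point at a connection attempt that never completes. A future that finished with an exception would point at a failure that is never retried.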

@DmitryKuzmenko DmitryKuzmenko added Confirmed Salt engineer has confirmed bug/feature - often including a MCVE and removed Confirmed Salt engineer has confirmed bug/feature - often including a MCVE labels Jul 8, 2019
@amuhametov
Author

amuhametov commented Aug 18, 2019 via email

@stale

stale bot commented Jan 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 8, 2020
@stale stale bot closed this as completed Jan 15, 2020
@amuhametov
Author

ping
