VRRP child process() died: Respawning #390
Leaving aside the "Received message overrun" message for now, since I presume that is not what causes keepalived to terminate, the problem is an assertion failure in keepalived_malloc(). Because keepalived_malloc() is present in your build, it appears that you ran configure with the --enable-debug option, which makes keepalived use more memory than a build without that option (and also makes it run slightly slower). The assert failure in keepalived is:

If you want to continue building with --enable-debug, increase MAX_ALLOC_LIST in lib/memory.h; otherwise, build without --enable-debug. Either way, the assert should then no longer occur. Keepalived v1.2.23 includes a patch that separates memory checking from --enable-debug by introducing an --enable-mem-check option, so you could move to 1.2.23, build with --enable-debug but without --enable-mem-check, and then the MAX_ALLOC_LIST limit will not apply. As ever, moving to the latest release makes it easier to support any issues you may have.
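To illustrate why the failure depends on the number of instances, here is a minimal sketch of a debug allocator that records every live allocation in a fixed-size table and asserts when the table is full. This is not keepalived's code: `SKETCH_MAX_ALLOC_LIST`, `debug_malloc()` and `debug_free()` are hypothetical stand-ins for `MAX_ALLOC_LIST` and `keepalived_malloc()`, and the capacity of 4 is chosen only to make the overflow easy to reach. In the backtraces, list_add() and thread_add_read() both allocate through keepalived_malloc(), so each additional instance consumes more table entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capacity, standing in for MAX_ALLOC_LIST in lib/memory.h. */
#define SKETCH_MAX_ALLOC_LIST 4

/* Fixed-size table recording every live allocation (debug builds only). */
static void *alloc_table[SKETCH_MAX_ALLOC_LIST];
static size_t alloc_count = 0;

/* Record ptr in the first free slot; returns false when the table is full. */
static bool track_alloc(void *ptr)
{
    for (size_t i = 0; i < SKETCH_MAX_ALLOC_LIST; i++) {
        if (alloc_table[i] == NULL) {
            alloc_table[i] = ptr;
            alloc_count++;
            return true;
        }
    }
    return false;
}

/* Remove ptr from the table when it is freed. */
static void untrack_alloc(void *ptr)
{
    for (size_t i = 0; i < SKETCH_MAX_ALLOC_LIST; i++) {
        if (alloc_table[i] == ptr) {
            alloc_table[i] = NULL;
            alloc_count--;
            return;
        }
    }
}

/* Debug malloc wrapper: a full table trips assert(), which calls abort(). */
void *debug_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL) {
        bool tracked = track_alloc(ptr);
        assert(tracked && "allocation table full: raise the limit");
        (void)tracked; /* silence unused-variable warnings under NDEBUG */
    }
    return ptr;
}

/* Debug free wrapper: untrack, then release the memory. */
void debug_free(void *ptr)
{
    if (ptr != NULL) {
        untrack_alloc(ptr);
        free(ptr);
    }
}
```

With a real limit the same thing would happen, just after many more allocations, which would explain why the instance count at which the child aborts varies with the configuration.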
100% agreed. Please give it a try without the debug options. Memory debugging was only added while debugging a leak; it is there purely for development purposes. I am closing this issue since this is the root cause IMHO. If it is not, please reopen it and report back. Regs,
Yep, you're right. Removing --enable-debug solved the issue we had while testing many instances. Regards!
Hey,
I've run into a problem with keepalived when running many instances (each on a separate VLAN subinterface).
I am trying to run about 50-200 instances, which results in a SIGABRT in the child process (the instance count at which it happens depends on the configuration).
Version: v1.2.22 commit gbde660e
In the logs I see:
Keepalived_vrrp[]: Netlink: Received message overrun (No buffer space available)
It looks like a memory issue, so I tried increasing the buffers system-wide via sysctl, but that didn't help. Logs and backtraces are below. It makes no difference whether advert_int is 1s, 30s or more; the crash just happens later (normally at the time GARPs are sent).
150 instances, no-vmac:
Aug 5 12:47:35 test1 Keepalived[89733]: Starting Keepalived v1.2.22 (06/28,2016), git commit v1.2.22-15-gbde660e
Aug 5 12:47:35 test1 Keepalived[89734]: Starting Healthcheck child process, pid=89736
Aug 5 12:47:35 test1 Keepalived[89734]: Starting VRRP child process, pid=89737
Aug 5 12:47:35 test1 Keepalived_healthcheckers[89736]: Registering Kernel netlink reflector
Aug 5 12:47:35 test1 Keepalived_healthcheckers[89736]: Registering Kernel netlink command channel
Aug 5 12:47:35 test1 Keepalived_healthcheckers[89736]: Opening file '/run/conf/keepalived.conf'.
Aug 5 12:47:35 test1 Keepalived_vrrp[89737]: Registering Kernel netlink reflector
Aug 5 12:47:35 test1 Keepalived_vrrp[89737]: Registering Kernel netlink command channel
Aug 5 12:47:35 test1 Keepalived_vrrp[89737]: Registering gratuitous ARP shared channel
Aug 5 12:47:35 test1 Keepalived_vrrp[89737]: Opening file '/run/conf/keepalived.conf'.
Aug 5 12:47:35 test1 Keepalived_healthcheckers[89736]: Using LinkWatch kernel netlink reflector...
Aug 5 12:47:35 test1 Keepalived_vrrp[89737]: Using LinkWatch kernel netlink reflector...
Aug 5 12:47:35 test1 Keepalived[89734]: pid 89737 exited due to signal 6
Aug 5 12:47:35 test1 Keepalived[89734]: VRRP child process(89737) died: Respawning
Aug 5 12:47:35 test1 Keepalived[89734]: Starting VRRP child process, pid=89761
Aug 5 12:47:35 test1 Keepalived_vrrp[89761]: Registering Kernel netlink reflector
Aug 5 12:47:35 test1 Keepalived_vrrp[89761]: Registering Kernel netlink command channel
Aug 5 12:47:35 test1 Keepalived_vrrp[89761]: Registering gratuitous ARP shared channel
Aug 5 12:47:35 test1 Keepalived_vrrp[89761]: Opening file '/run/conf/keepalived.conf'.
Aug 5 12:47:35 test1 Keepalived_vrrp[89761]: Using LinkWatch kernel netlink reflector...
Aug 5 12:47:35 test1 Keepalived[89734]: pid 89761 exited due to signal 6
Aug 5 12:47:35 test1 Keepalived[89734]: VRRP child process(89761) died: Respawning
Aug 5 12:47:35 test1 Keepalived[89734]: Starting VRRP child process, pid=89770
(...)
Core was generated by `/usr/sbin/keepalived -D --core-dump -f /run/conf/keepalived.conf'.
Program terminated with signal 6, Aborted.
#0 0x00007f122823a125 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f122823a125 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f122823d325 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f1228233311 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000000430b20 in keepalived_malloc ()
#4 0x00000000004336ad in list_add ()
#5 0x0000000000423861 in vrrp_dispatcher_init ()
#6 0x000000000043360d in launch_scheduler ()
#7 0x000000000041aee9 in start_vrrp_child ()
#8 0x000000000041afe6 in ?? ()
#9 0x000000000043360d in launch_scheduler ()
#10 0x000000000040ab23 in main ()
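The log line "exited due to signal 6" and the backtrace fit together: the failed assert() in keepalived_malloc() calls __assert_fail(), which calls abort(), which raises SIGABRT (signal 6 on Linux). The parent sees its child terminated by that signal and respawns it. A minimal sketch of that parent-side check, assuming Linux and POSIX fork()/waitpid(); `child_termination_signal()` is a hypothetical helper, not keepalived's code.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls abort(), as a failed assert() does, and return
 * the signal that terminated it (or -1 if it did not die from a signal). */
int child_termination_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                     /* fork failed */
    if (pid == 0) {
        /* Child: request no core file via RLIMIT_CORE, then abort. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);
        abort();                       /* raises SIGABRT, signal 6 on Linux */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* Parent: the check a supervisor makes before logging
     * "exited due to signal N" and respawning the child. */
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return -1;
}
```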
100 instances, vmac:
Aug 5 12:58:53 test1 Keepalived[105533]: Starting Keepalived v1.2.22 (06/28,2016), git commit v1.2.22-15-gbde660e
Aug 5 12:58:53 test1 Keepalived[105534]: Starting Healthcheck child process, pid=105535
Aug 5 12:58:53 test1 Keepalived[105534]: Starting VRRP child process, pid=105536
...
Aug 5 12:58:56 test1 Keepalived_vrrp[105536]: VRRP_Instance(41) Entering BACKUP STATE
..
Aug 5 12:58:56 test1 Keepalived_vrrp[105536]: VRRP_Instance(4100) Entering BACKUP STATE
Aug 5 12:58:56 test1 Keepalived_vrrp[105536]: VRRP sockpool: [ifindex(3112), proto(112), unicast(0), fd(11,12)]
Aug 5 12:58:56 test1 Keepalived_vrrp[105536]: VRRP sockpool: [ifindex(3113), proto(112), unicast(0), fd(13,14)]
Aug 5 12:58:56 test1 Keepalived_vrrp[105536]: VRRP sockpool: [ifindex(3114), proto(112), unicast(0), fd(15,16)]
Aug 5 12:58:56 test1 Keepalived_vrrp[105536]: VRRP sockpool: [ifindex(3115), proto(112), unicast(0), fd(17,18)]
...
Aug 5 12:58:56 test1 Keepalived_vrrp[105536]: Netlink: Received message overrun (No buffer space available)
...
Aug 5 12:59:00 test1 Keepalived_vrrp[105536]: VRRP_Instance(414) Transition to MASTER STATE
Aug 5 12:59:00 test1 Keepalived_vrrp[105536]: VRRP_Instance(49) Transition to MASTER STATE
...
Aug 5 12:59:01 test1 Keepalived_vrrp[105536]: VRRP_Instance(414) Entering MASTER STATE
Aug 5 12:59:01 test1 Keepalived_healthcheckers[105535]: Netlink reflector reports IP 169.254.14.254 added
...
Aug 5 12:59:01 test1 Keepalived_vrrp[105536]: Sending gratuitous ARP on vrrp.64.1 for 169.254.64.254
Aug 5 12:59:01 test1 Keepalived_vrrp[105536]: Sending gratuitous ARP on vrrp.64.1 for 169.254.64.254
Aug 5 12:59:01 test1 Keepalived_vrrp[105536]: Sending gratuitous ARP on vrrp.64.1 for 169.254.64.254
Aug 5 12:59:01 test1 Keepalived[105534]: pid 10553 exited due to signal 6
Aug 5 12:59:01 test1 Keepalived[105534]: VRRP child process(105536) died: Respawning
Aug 5 12:59:01 test1 Keepalived[105534]: Starting VRRP child process, pid=106399
...
Aug 5 12:59:01 test1 Keepalived_vrrp[106399]: Netlink reflector reports IP 91.121.124.15 added
Aug 5 12:59:01 test1 Keepalived_vrrp[106399]: Netlink reflector reports IP 37.187.231.159 added
Aug 5 12:59:01 test1 Keepalived_vrrp[106399]: Netlink reflector reports IP 169.254.1.254 added
Aug 5 12:59:01 test1 Keepalived_vrrp[106399]: Netlink reflector reports IP 169.254.2.254 added
...
Aug 5 12:59:06 test1 Keepalived_vrrp[106399]: Sending gratuitous ARP on vrrp.92.1 for 169.254.92.254
Aug 5 12:59:06 test1 Keepalived_vrrp[106399]: Sending gratuitous ARP on vrrp.92.1 for 169.254.92.254
Aug 5 12:59:06 test1 Keepalived[105534]: pid 10639 exited due to signal 6
Aug 5 12:59:06 test1 Keepalived[105534]: VRRP child process(106399) died: Respawning
Aug 5 12:59:06 test1 Keepalived[105534]: Starting VRRP child process, pid=106511
...
Core was generated by `/usr/sbin/keepalived -D --core-dump -f /run/conf/keepalived.conf'.
Program terminated with signal 6, Aborted.
#0 0x00007fcc27571125 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007fcc27571125 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fcc27574325 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fcc2756a311 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000000430b20 in keepalived_malloc ()
#4 0x0000000000431aa1 in ?? ()
#5 0x000000000043206d in thread_add_read ()
#6 0x00000000004235c6 in ?? ()
#7 0x000000000043360d in launch_scheduler ()
#8 0x000000000041aee9 in start_vrrp_child ()
#9 0x000000000041afe6 in ?? ()
#10 0x000000000043360d in launch_scheduler ()
#11 0x000000000040ab23 in main ()
(gdb) quit
[root@test1 ~] stat /core
Modify: 2016-08-05 12:59:11.289464114 +0200
Increasing OS rmem via sysctl didn't help.
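For context, "No buffer space available" is the C library's text for ENOBUFS. On a netlink socket, a receive call failing with ENOBUFS means the kernel dropped messages because the socket's receive buffer was full; the socket remains usable, but the reader may have missed state changes. A minimal sketch of how a netlink reader might classify that error; `classify_nl_recv_error()`, `log_nl_recv_error()` and the enum are hypothetical names, not keepalived functions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Possible outcomes when a netlink recv() call fails. */
enum nl_recv_action {
    NL_RETRY,    /* transient: interrupted or no data yet, just try again */
    NL_OVERRUN,  /* kernel dropped messages because the receive buffer was full */
    NL_FATAL     /* anything else: treat the socket as broken */
};

/* Map the errno from a failed recv()/recvmsg() on a netlink socket to an action. */
enum nl_recv_action classify_nl_recv_error(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
        return NL_RETRY;
    case ENOBUFS:
        /* Messages were lost; the socket still works, but the reader's
         * view of kernel state may now be stale and need a resync. */
        return NL_OVERRUN;
    default:
        return NL_FATAL;
    }
}

/* Log in the same shape as the message seen in this report. */
void log_nl_recv_error(int err)
{
    if (classify_nl_recv_error(err) == NL_OVERRUN)
        fprintf(stderr, "Netlink: Received message overrun (%s)\n", strerror(err));
    else
        fprintf(stderr, "Netlink: recv error (%s)\n", strerror(err));
}
```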
Keepalived template used to generate the test instances:
Interface test setup (the interface type doesn't matter; I tested on a Linux bond and on plain eth interfaces with the same results):
and in the loop:
and then generate the config from the template.
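A setup of this scale can be reproduced with a generator along these lines. This is a hypothetical sketch, not the reporter's template: the interface names (`eth0.N`), the VRIDs and the 169.254.N.254 addresses are assumptions loosely modeled on the addresses in the logs, and `generate_config()` is an illustrative name. For the VMAC runs, each block would also contain `use_vmac`.

```c
#include <stdio.h>

/* Append one vrrp_instance block for instance number n to buf at offset used.
 * Interface name, VRID and address scheme are hypothetical examples. */
static int append_instance(char *buf, size_t size, size_t used, int n)
{
    return snprintf(buf + used, size - used,
        "vrrp_instance VI_%d {\n"
        "    state BACKUP\n"
        "    interface eth0.%d\n"
        "    virtual_router_id %d\n"
        "    priority 100\n"
        "    advert_int 1\n"
        "    virtual_ipaddress {\n"
        "        169.254.%d.254/24\n"
        "    }\n"
        "}\n",
        n, n, n, n);
}

/* Write `count` instances (0..255, since VRIDs and the address octet are
 * limited to 255) into buf; returns bytes written, or -1 on error or if
 * buf is too small. */
int generate_config(char *buf, size_t size, int count)
{
    size_t used = 0;
    if (count < 0 || count > 255)
        return -1;
    if (size > 0)
        buf[0] = '\0';
    for (int n = 1; n <= count; n++) {
        int len = append_instance(buf, size, used, n);
        if (len < 0 || (size_t)len >= size - used)
            return -1;                 /* output would be truncated */
        used += (size_t)len;
    }
    return (int)used;
}
```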
PS. With about 200 instances and VMAC, the child was respawned so often that the number of keepalived child processes went up to a few thousand (oO).
Do you think playing with SO_RCVBUF would help?
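On the SO_RCVBUF question, here is a minimal sketch, assuming Linux, of requesting a larger receive buffer on a NETLINK_ROUTE socket and reading back what the kernel granted. `netlink_rcvbuf_granted()` is a hypothetical helper, not a keepalived function. Per socket(7), unprivileged SO_RCVBUF requests are capped at net.core.rmem_max and the kernel doubles the stored value, while net.core.rmem_default only applies to sockets that never set SO_RCVBUF; depending on what keepalived does with its own netlink sockets, that may be why raising rmem system-wide made no visible difference. Note also that the abort in this report was the MAX_ALLOC_LIST assertion, so a larger buffer would at most address the overrun message.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Open a NETLINK_ROUTE socket, request a receive buffer of `requested` bytes
 * and return the size the kernel actually granted, or -1 on error. */
int netlink_rcvbuf_granted(int requested)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    /* Unprivileged SO_RCVBUF requests are capped at net.core.rmem_max;
     * SO_RCVBUFFORCE (requires CAP_NET_ADMIN) bypasses that cap. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0) {
        close(fd);
        return -1;
    }

    int granted = 0;
    socklen_t len = sizeof(granted);
    /* Linux doubles the requested value to allow for bookkeeping overhead,
     * so the value read back is typically 2 * min(requested, rmem_max). */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        granted = -1;

    close(fd);
    return granted;
}
```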
Regards!