-
-
Notifications
You must be signed in to change notification settings - Fork 737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
keepalived can't call notify_master script timely #2407
Comments
There is no delay in the keepalived code between logging that the VRRP instance is entering master state and the notify_master script being run (see vrrp_state_master_tx() in keepalived/vrrp/vrrp.c). It is of course possible that the kernel suspends keepalived to schedule other processes to run, but that is outside the control of keepalived. If you suspect this is happening you could set It isn't clear where the time is logged in The bigger problem you have is that VRRP adverts appear to be being "lost". This could be a network problem, or it could be a scheduling problem; I suggest that this is the more important problem to resolve. |
refresh-arp logged timestamp at the beginning, so I guess enable-arp was scheduled very lately because enable-arp is very simple, it is impossible to have a so big delay. Is it possible VM system clock is changed at those points because of some unkown reasons? |
Does vrrp_rt_priority have any side effect? Is rt scheduler policy is FIFO or RR? Will notify_master script also inherit vrrp_rt_priority?
|
This seems unlikely, but I don't know what is happening on your systems.
It uses SCHED_RR. It also sets flag SCHED_RESET_ON_FORK, so notify scripts will not inherit the realtime scheduling. I really think you need to concentrate on discovering why you are getting the following messages:
Fixing these happening could well resolve the issue of the scripts being delayed in their execution. |
Describe the bug
we configured 'notify_master "/usr/bin/refresh-and-enable-arp MASTER"', but "/usr/bin/refresh-and-enable-arp" can't be called timely when VRRP instance transitions to MASTER. Here are two long delay cases:
case 1
Apr 15 08:54:36 node2 Keepalived_vrrp[7989]: (VI_1) Entering MASTER STATE
but log of "/usr/bin/refresh-and-enable-arp" indicates it is called after about 15 seconds, what happened here?
2024/04/15 08:54:51.460339 ...
case 2
Apr 15 11:38:54 node2 Keepalived_vrrp[7989]: (VI_1) Entering MASTER STATE
but log of "/usr/bin/refresh-and-enable-arp" indicates it is called after about 3 seconds, what happened here?
2024/04/15 11:38:57.276773 ...
To Reproduce
Not easy to reproduce it. we don't know how to reproduce it. but this did happen several time on our online system. By the way, in most of cases, it is timely, so we're confused, what happened there in those bad cases?
Expected behavior
It should be called very timely.
Keepalived version
Keepalived v2.1.5 (07/13,2020)
Copyright(C) 2001-2020 Alexandre Cassen, acassen@gmail.com
Built with kernel headers for Linux 5.10.70
Running on Linux 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31)
configure options: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-kernel-dir=debian/ --enable-snmp --enable-sha1 --enable-snmp-rfcv2 --enable-snmp-rfcv3 --enable-dbus --enable-json --enable-bfd --enable-regex build_alias=x86_64-linux-gnu CFLAGS=-g -O2 -ffile-prefix-map=/build/keepalived-jCUlld/keepalived-2.1.5=. -fstack-protector-strong -Wformat -Werror=format-security LDFLAGS=-Wl,-z,relro CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2
Config options: NFTABLES LVS REGEX VRRP VRRP_AUTH JSON BFD OLD_CHKSUM_COMPAT FIB_ROUTING SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 DBUS
System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RESET_ON_FORK
Distro (please complete the following information):
Details of any containerisation or hosted service (e.g. AWS)
Run on VM
Configuration file:
! Configuration File for keepalived
global_defs {
vrrp_garp_master_refresh 60
vrrp_garp_master_refresh_repeat 3
}
vrrp_instance VI_1 {
state BACKUP
nopreempt
interface enp6s19.91
virtual_router_id 101
priority 99
advert_int 1
authentication {
auth_type PASS
auth_pass ********
}
virtual_ipaddress {
10.4.91.3/24 dev enp6s19.91
}
track_interface {
dummy0
enp6s19
}
notify_master "/usr/bin/refresh-and-enable-arp MASTER"
notify_master_rx_lower_pri "/usr/bin/refresh-and-enable-arp MASTER_RX_LOWER_PRI"
notify_backup "/usr/bin/pkill-and-disable-arp BACKUP"
notify_fault "/usr/bin/pkill-and-disable-arp FAULT"
}
Notify and track scripts
$ cat refresh-and-enable-arp
#!/bin/bash
/usr/bin/enable-arp
/usr/bin/refresh-arp >> /tmp/refresh-arp.log 2>&1
/usr/bin/keepalived-state-change $1
System Log entries
Apr 05 00:23:22 node2 Keepalived_vrrp[7989]: (VI_1) Entering BACKUP STATE
Apr 12 00:49:43 node2 Keepalived_vrrp[7989]: (VI_1) Entering MASTER STATE
Apr 12 00:49:43 node2 Keepalived_vrrp[7989]: (VI_1) Master received advert from 10.4.91.1 with higher priority 100, ours 99
Apr 12 00:49:43 node2 Keepalived_vrrp[7989]: (VI_1) Entering BACKUP STATE
Apr 15 08:54:36 node2 Keepalived_vrrp[7989]: (VI_1) Entering MASTER STATE
Apr 15 08:54:37 node2 Keepalived_vrrp[7989]: (VI_1) Master received advert from 10.4.91.1 with higher priority 100, ours 99
Apr 15 08:54:37 node2 Keepalived_vrrp[7989]: (VI_1) Entering BACKUP STATE
Apr 15 11:38:54 node2 Keepalived_vrrp[7989]: (VI_1) Entering MASTER STATE
Apr 15 11:38:54 node2 Keepalived_vrrp[7989]: (VI_1) Master received advert from 10.4.91.1 with higher priority 100, ours 99
Apr 15 11:38:54 node2 Keepalived_vrrp[7989]: (VI_1) Entering BACKUP STATE
Did keepalived coredump?
Additional context
vrrp advert packets lost, so BACKUP node becomes MASTER, then it becomes BACKUP because of receiving higher priority advert in one second.
The text was updated successfully, but these errors were encountered: