One of the vrrp instance remains in FAULT state after reload #2357

Closed
Anj1Krish opened this issue Nov 21, 2023 · 3 comments

@Anj1Krish

Describe the bug
One of the VRRP instances remains down after a reload. Keepalived treats the VRRP device as down even though its actual status is UP. To recover, the Linux VRRP device that was created (vrrp5.4 in this case) has to be toggled.

To Reproduce
Reload keepalived with the attached configs. The issue is intermittent.

Expected behavior
All the VRRP instances should be up, since the underlying interfaces are UP.

Keepalived version

Output of `keepalived -v`

Keepalived v2.0.19 (10/19,2019)

Copyright(C) 2001-2019 Alexandre Cassen, acassen@gmail.com

Built with kernel headers for Linux 4.15.7
Running on Linux 4.14.67-yocto-standard #1 SMP Tue Nov 14 22:05:10 UTC 2023

configure options: --build=x86_64-linux --host=x86_64-poky-linux --target=x86_64-poky-linux --prefix=/usr --exec_prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --libexecdir=/usr/libexec --datadir=/usr/share --sysconfdir=/etc --sharedstatedir=/com --localstatedir=/var --libdir=/usr/lib --includedir=/usr/include --oldincludedir=/usr/include --infodir=/usr/share/info --mandir=/usr/share/man --disable-silent-rules --disable-dependency-tracking --with-libtool-sysroot=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot --disable-libiptc --enable-log-file --enable-json --disable-static json-c-devel --enable-libnl --disable-snmp --with-init=SYSV build_alias=x86_64-linux host_alias=x86_64-poky-linux target_alias=x86_64-poky-linux PKG_CONFIG_PATH=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot/usr/lib/pkgconfig:/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot/usr/share/pkgconfig PKG_CONFIG_LIBDIR=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot/usr/lib/pkgconfig CC=x86_64-poky-linux-gcc -m64 -march=corei7 -mtune=corei7 -mfpmath=sse -msse4.2 --sysroot=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot CFLAGS= -O2 -pipe -g -feliminate-unused-debug-types -fdebug-prefix-map=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0=/usr/src/debug/keepalived/2.0.19-r0 
-fdebug-prefix-map=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot= -fdebug-prefix-map=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot-native= LDFLAGS=-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed CPPFLAGS= CPP=x86_64-poky-linux-gcc -E --sysroot=/sandbox-local/bamboo/bamboo-all-agents-build/xml-data/build-dir/CR-MBAXOS2343-BLD/E9-ASM/srcBuild/build/tmp/work/corei7-64-poky-linux/keepalived/2.0.19-r0/recipe-sysroot -m64 -march=corei7 -mtune=corei7 -mfpmath=sse -msse4.2

Config options: IPTABLES_CMD LVS VRRP VRRP_AUTH JSON OLD_CHKSUM_COMPAT FIB_ROUTING FILE_LOGGING LOG_FILE_APPEND

System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RT SCHED_RESET_ON_FORK

Distro (please complete the following information):

  • Name: [e.g. Fedora, Ubuntu]
  • Version: [e.g. 29]
  • Architecture: [e.g. x86_64]

Details of any containerisation or hosted service (e.g. AWS)
If keepalived is being run in a container or on a hosted service, provide full details

Configuration file:

A full copy of the configuration file, obfuscated if necessary to protect passwords and IP addresses


Notify and track scripts

If any notify or track scripts are in use, please provide copies of them


System Log entries

Full keepalived system log entries from when keepalived started

[vrrp_configs_logs.txt](https://github.com/acassen/keepalived/files/13430217/vrrp_configs_logs.txt)

Did keepalived coredump?
No
If so, can you please provide a stacktrace from the coredump, using gdb.

Additional context
I can see the netlink notifications for vrrp5.4, but it looks like keepalived did not receive or process them, so the VRRP interface remains down in keepalived. The Linux device itself is showing UP.

ifconfig vrrp5.4
vrrp5.4 Link encap:Ethernet HWaddr 00:00:5e:00:01:04
UP BROADCAST RUNNING MULTICAST MTU:2020 Metric:1
RX packets:15856 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:354432 (346.1 KiB) TX bytes:0 (0.0 B)

Netlink notifications for vrrp5.4, which keepalived saw as down:

2023-11-06T20:16:47.239757312(0.001282522) NETLINK Indication NEWLINK, len 1324, flags , seq 0, pid 0| fam 0:route, ifname vrrp5.4, flags 1002:broadcast,multicast, change ffffffff
2023-11-06T20:16:47.240646518(0.000712356) NETLINK Indication NEWLINK, len 1352, flags , seq 0, pid 0| fam 0:route, ifname vrrp5.4, flags 1002:broadcast,multicast, change 0
2023-11-06T20:16:47.259572527(0.000676314) NETLINK Indication NEWLINK, len 1352, flags , seq 0, pid 0| fam 0:route, ifname vrrp5.4, flags 11043:broadcast,multicast,up,running,lowerup, change 1
-> This last notification was received by another process. Why is keepalived not receiving or processing it? Processing it would have brought the interface (vrrp5.4) up by calling the "interface_up" API.
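
For readers comparing the two flag values in the log, here is a minimal sketch that decodes them. The bit values are the standard `IFF_*` constants from `<linux/if.h>`; the helper names `decode_ifflags` and `is_operational` are made up for illustration, and treating "up and running" as the condition that matters is an assumption about how a monitor would interpret the flags, not a statement about keepalived's code.

```python
# Interface flag bits from <linux/if.h>; only those that appear in the log.
IFF_FLAGS = {
    0x1: "up",
    0x2: "broadcast",
    0x40: "running",
    0x1000: "multicast",
    0x10000: "lowerup",
}


def decode_ifflags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def is_operational(value):
    """Assumption: an interface counts as usable when both up and running are set."""
    up_and_running = 0x1 | 0x40
    return (value & up_and_running) == up_and_running


# The two hexadecimal flag values seen for vrrp5.4 in the netlink trace.
for raw in ("1002", "11043"):
    flags = int(raw, 16)
    print(raw, decode_ifflags(flags), is_operational(flags))
```

The first two notifications (`1002`) carry only broadcast and multicast, so the device is not yet up. Only the third (`11043`) adds up, running and lowerup, which is why losing that single message would leave the interface looking down.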

@Anj1Krish

All other VRRP instances and VRRP devices are seen as up in keepalived.

Eg:
2023-11-06T20:16:47.131102882(0.000680087) NETLINK Indication NEWLINK, len 1324, flags , seq 0, pid 0| fam 0:route, ifname vrrp3.4, flags 1002:broadcast,multicast, change ffffffff
2023-11-06T20:16:47.131774258(0.000423241) NETLINK Indication NEWLINK, len 1352, flags , seq 0, pid 0| fam 0:route, ifname vrrp3.4, flags 1002:broadcast,multicast, change 0
2023-11-06T20:16:47.132978262(0.000765018) NETLINK Indication NEWLINK, len 1352, flags , seq 0, pid 0| fam 0:route, ifname vrrp3.4, flags 11043:broadcast,multicast,up,running,lowerup, change 1
2023-11-06T20:16:47.258600274(0.000248887) NETLINK Indication NEWLINK, len 1352, flags , seq 0, pid 0| fam 0:route, ifname vrrp3.4, flags 11043:broadcast,multicast,up,running,lowerup, change 0
root@ASM2_VRRP_ROUTER:/var/log# grep -r vrrp3.4 keepalived_vrrp.log
Mon Nov 6 20:16:47 2023: (VI_4_704_v704): Success creating VMAC interface vrrp3.4
Mon Nov 6 20:16:47 2023: send_vrrp_state_change_to_vrrpd system cmd successfull exec dcli vrrpd show slave-alarm VI_4_704_v704 vrrp3.4 2 v704
Mon Nov 6 20:16:50 2023: Sending gratuitous ARP on vrrp3.4 for 10.174.1.10
Mon Nov 6 20:16:50 2023: (VI_4_704_v704) Sending/queueing gratuitous ARPs on vrrp3.4 for 10.174.1.10

@Anj1Krish

Anj1Krish commented Dec 28, 2023

I was able to solve this by setting the netlink receive buffer size in the config file, as described in the keepalived man page:

       # The following options are only needed for large configurations, where either
       # keepalived creates a large number of interface, or the system has a large
       # number of interface. These options only need using if
       # "Netlink: Receive buffer overrun" messages are seen in the system logs.
       # If the buffer size needed exceeds the value in /proc/sys/net/core/rmem_max
       #  the corresponding force option will need to be set.
       # --
       # Set netlink receive buffer size. This is useful for
       # very large configurations where a large number of interfaces exist, and
       # the initial read of the interfaces on the system causes a netlink buffer
       # overrun.
       vrrp_netlink_cmd_rcv_bufs BYTES
       vrrp_netlink_cmd_rcv_bufs_force <BOOL>
       vrrp_netlink_monitor_rcv_bufs BYTES
       vrrp_netlink_monitor_rcv_bufs_force <BOOL>
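
The man page excerpt above says a force option is needed when the requested size exceeds `/proc/sys/net/core/rmem_max`. A rough sketch of that check follows; the function names and the 1 MiB request are hypothetical, and the comparison only restates the man page rule.

```python
from pathlib import Path

RMEM_MAX_PATH = Path("/proc/sys/net/core/rmem_max")


def read_rmem_max(path=RMEM_MAX_PATH):
    """Return the kernel receive buffer ceiling in bytes, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def needs_force(requested_bytes, rmem_max):
    """True when the requested size exceeds rmem_max, per the man page text."""
    return requested_bytes > rmem_max


requested = 1048576  # hypothetical buffer size, not a value from this issue
limit = read_rmem_max()
if limit is None:
    print("could not read", RMEM_MAX_PATH)
else:
    print("rmem_max:", limit, "force option needed:", needs_force(requested, limit))
```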

@Anj1Krish

Anj1Krish commented Dec 28, 2023

I changed the values of the following options in the config file:
vrrp_netlink_cmd_rcv_bufs BYTES
vrrp_netlink_monitor_rcv_bufs BYTES
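
The exact values used were not posted. As an illustration only, a fragment along these lines would set both buffers, assuming the options are placed in the `global_defs` block as the man page lists them; the sizes are placeholders and should be chosen for the number of interfaces on the system.

```
global_defs {
    # Placeholder sizes in bytes, not the values used in this issue.
    vrrp_netlink_cmd_rcv_bufs 1048576
    vrrp_netlink_monitor_rcv_bufs 1048576

    # Only needed if the sizes above exceed /proc/sys/net/core/rmem_max.
    # vrrp_netlink_cmd_rcv_bufs_force true
    # vrrp_netlink_monitor_rcv_bufs_force true
}
```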
