Keepalived Master/Backup swapping frequently - V2.2.0 #1995
Comments
@senthilnathann The log entries are showing that your configuration file is at least 1795 lines long, whereas the configuration you have provided is only 45 lines long. Can you please provide a copy of your actual configuration file, unmodified other than passwords and email addresses, so that we can try and work out what is happening (please either add the configuration file as an attachment here, or email it to me directly). Also, if you are able to generate a stack backtrace with gdb from the coredump that would be very helpful, but your actual configuration is what we need first. |
Hi,
Please find the conf file attached.
Note: the conf file was sent directly to your email address.
Senthil.
|
@senthilnathann Many thanks for your config file. I have now identified that the issue is caused by a bug in the v2.2.0 code when reading configuration files larger than 4096 bytes. Commit ca03dc2 resolved the issue. You can either apply commit ca03dc2 directly to the v2.2.0 source or, better, update to v2.2.1, which is v2.2.0 with a small number of bug fixes and improvements. |
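A quick way to see whether a given configuration file is in the size range described above is to compare its size with 4096 bytes. The following is only a minimal sketch: the 4096-byte figure is taken from the comment above, the function name is hypothetical, and it measures a single file only.

```python
import os

# Size above which v2.2.0 mis-read configuration files, per the comment above.
V220_CONFIG_SIZE_LIMIT = 4096


def config_exceeds_limit(path, limit=V220_CONFIG_SIZE_LIMIT):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

For example, `config_exceeds_limit("/etc/keepalived/keepalived.conf")` checks the path shown in the system log of this issue.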
@pqarmitage Thank you so much for your quick action. |
@pqarmitage I have upgraded keepalived to 2.2.1, but I am again getting the error below, and keepalived is still swapping frequently.
Sep 25 00:41:24 172 Keepalived_vrrp[15447]: (VI_GATEWAY): send advert error 22 (Invalid argument)
Note: Apart from the above error, we are not getting any other errors (the errors seen last time no longer occur). The same conf file is used.
keepalived -v
Keepalived v2.2.1 (01/17,2021)
Copyright(C) 2001-2021 Alexandre Cassen, acassen@gmail.com
Built with kernel headers for Linux 3.10.0
configure options: --prefix=/root/sen
Config options: LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING
System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_PREF FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK FRA_OIFNAME IFA_FLAGS IP_MULTICAST_ALL NET_LINUX_IF_H_COLLISION NET_LINUX_IF_ETHER_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION LIBIPVS_NETLINK VRRP_VMAC IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE SO_MARK SCHED_RESET_ON_FORK |
@senthilnathann When keepalived is running, could you please execute Could you please post those files on this issue (or email them to me) - you may want to remove any sensitive information before posting the files. Please also include the full system logs from the time keepalived starts to when the .data files were created. We will then be able to see what state keepalived is in. |
@pqarmitage The requested keepalived data files and log file have been sent separately to your email address. Please check. |
@senthilnathann Since keepalived.data shows that the only network interfaces on the system are lo, eth0, eth1 and bond0, I am assuming that eth0 and eth1 are both enslaved to bond0. I also note from keepalived.data that eth1 is showing as not running; e.g. its network cable is disconnected. I suspect that this is the cause of the problems. The messages
also indicate that there is a network problem. From 15:36:41 onwards keepalived was reporting Can you please post/send the following information:
With the above information we should get a better idea about what is happening. |
@pqarmitage The eth1 interface was already down before we upgraded keepalived from 2.0.7 to 2.2.1, and this error (send advert error 22) did not occur with version 2.0.7. I will explain the full details again.
Chronology of events:
172.20.133.246 logs:
172.20.133.247
We don't know why we got the send advert error 22 errors on 172.20.133.246. There were no network issues between the Master/Backup servers, and nothing was written to the system log at that time.
ip -d link show
After executing the kill -USR1 $(cat /run/keepalived.pid) command, those errors were no longer logged, and Master/Backup did not swap. Why are we getting the send advert error 22 errors? Please clarify. |
The reason you are getting the send advert error 22 errors with v2.2.1, but not with v2.0.7, is that I added code to print the error in commit 3f038ae, which was first included in v2.0.16. Prior to that any send errors were silently ignored, so presumably they were occurring, but were just not being logged. I will investigate further to see if I can identify what can cause the error 22s. |
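Since these errors are only visible in the log from v2.0.16 onwards, counting them per VRRP instance can help show which host and instance is affected. A minimal sketch follows, assuming the log line format quoted in this issue; the regular expression and function name are hypothetical helpers, not part of keepalived.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 25 00:41:24 172 Keepalived_vrrp[15447]: (VI_GATEWAY): send advert error 22 (Invalid argument)
SEND_ERROR_RE = re.compile(
    r"Keepalived_vrrp\[\d+\]: \((?P<instance>[^)]+)\): "
    r"send advert error (?P<errno>\d+) \((?P<text>[^)]*)\)"
)


def count_send_errors(lines):
    """Count 'send advert error' log lines per (instance name, errno) pair."""
    counts = Counter()
    for line in lines:
        match = SEND_ERROR_RE.search(line)
        if match:
            counts[(match.group("instance"), int(match.group("errno")))] += 1
    return counts
```

Passing the lines of a system log file, for example `count_send_errors(open(path))` with `path` pointing at wherever syslog writes on the host, returns a `Counter` keyed by instance and error number.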
@senthilnathann Do you ever get the send advert error 22 errors on 172.20.133.247 when it is master, or are they only occurring on 172.20.133.246? |
@pqarmitage On that day (October 4th), send advert error 22 occurred only on the 172.20.133.246 server; it did not occur on 172.20.133.247 (when it was master). |
@senthilnathann Are you still getting the error 22s? If so, would it be possible to change your configuration so that the VRRP instances use the eth0 interface rather than bond0? I would like to understand if the problem relates to using a bond interface. |
@pqarmitage I have changed the interface from bond0 to eth0, but I am getting the same error and Master/Backup still swap frequently.
Oct 27 20:00:46 172 Keepalived_vrrp[19087]: (VI_GATEWAY) Entering BACKUP STATE |
@senthilnathann I am going to produce a patch for you to apply that will log what is being sent to sendmsg(), which is what is returning the error 22, so that we can see if anything changes. Will you be happy to build with that patch for test purposes? |
@pqarmitage Thanks for your prompt reply. Please provide the patch; I will build and test it. |
@senthilnathann That was good timing; I have just completed the patch, which is attached. The patch will write all the parameters and data of the sendmsg() call to /tmp/sendmsg.dmp (feel free to edit the patch if you want to change the location of the file). The file might get quite large, but what we want to see is the details of successful messages, and if anything changes for the messages for which error 22 is returned. |
@pqarmitage I have attached the sendmsg logs (success and invalid argument). Please check. For a few minutes we got send advert error 22 (Invalid argument) and a Master/Backup swap happened; after 3-4 minutes the higher-priority server became Master and the error did not recur. Note: I am now using the bond0 interface. |
@senthilnathann Many thanks for this data. This shows that keepalived is not doing anything different between the successful send of packets at 00:20:43 and the failed messages at 00:20:44. The only difference between the successfully sent packets and the ones returning error 22 is the 6th byte of the iov block, which is the id field of the IPv4 header, which is incremented for each advert sent per VRRP instance. This therefore looks as though something strange is happening in the kernel causing the error 22. Can you have a look at the output of |
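To illustrate why the 6th byte is the one that changes: the IPv4 identification field occupies bytes 4-5 (zero-based) of the header, so incrementing it by one normally alters only offset 5. The sketch below builds two illustrative headers and diffs them. It is not keepalived's code; the function names and the 20-byte payload length are hypothetical, the destination 224.0.0.18 assumes multicast VRRP, and the checksum is left as zero.

```python
import socket
import struct


def ipv4_header(ident, src="172.20.133.246", dst="224.0.0.18", payload_len=20):
    """Build a 20-byte IPv4 header (no options) for illustration only."""
    version_ihl = (4 << 4) | 5          # IPv4, header length of 5 x 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                              # type of service
        20 + payload_len,               # total length
        ident,                          # identification field (bytes 4-5)
        0,                              # flags and fragment offset
        255,                            # TTL used by VRRP
        112,                            # IP protocol number for VRRP
        0,                              # checksum left as zero in this sketch
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def differing_offsets(a, b):
    """Return the zero-based byte offsets at which two byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differing_offsets(ipv4_header(1), ipv4_header(2)))  # [5]
```

Offset 5 counted from zero is the 6th byte, matching the observation above.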
@senthilnathann The interface information doesn't appear to show any significant errors. I think we have now established a position that keepalived is not doing anything wrong in the sendmsg (all arguments being passed are the same), so it is difficult to see how we can go any further with investigating keepalived. The only other thought I have had to try is can you ping the master and backup from each other when you are getting the error 22s? A search on the internet for |
@pqarmitage Thanks for your patience and great help on this issue. Please find the issue details: Once again, thanks for your great help and support. |
@senthilnathann I am glad we have got to the bottom of this, and that the problem is resolved for you. For those who don't have access to the RedHat solution, the problem occurs when using raw, udp (including multicast) or icmp sockets and there are a large number of neighbours on the network, causing the ARP table to fill up. The recommendation is if there are 900 neighbours,
This can be made permanent by setting If there are more than 1000 neighbours, then the values above would need to be increased accordingly. EINVAL is a somewhat unexpected error for this issue, and really doesn't help identify the problem. The error is returned by kernel function |
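The exact settings recommended above are not reproduced in this thread. As an assumption on my part, the sketch below treats them as the kernel's neighbour-table garbage-collection thresholds (`gc_thresh1`, `gc_thresh2`, `gc_thresh3` under `/proc/sys/net/ipv4/neigh/default`) and compares the number of IPv4 neighbour entries in `/proc/net/arp` against the hard limit. The function names are hypothetical.

```python
def count_arp_entries(proc_net_arp_text):
    """Count entries in text read from /proc/net/arp (the first line is a header)."""
    lines = [ln for ln in proc_net_arp_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def read_gc_thresholds(base="/proc/sys/net/ipv4/neigh/default"):
    """Read the three neighbour-table thresholds as integers (Linux only)."""
    thresholds = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        with open(f"{base}/{name}") as handle:
            thresholds[name] = int(handle.read().strip())
    return thresholds


def neighbour_table_full(entries, gc_thresh3):
    """Return True if the entry count has reached the hard limit gc_thresh3."""
    return entries >= gc_thresh3
```

On an affected host one could call `count_arp_entries(open("/proc/net/arp").read())` and compare the result with `read_gc_thresholds()["gc_thresh3"]` while the error 22s are being logged.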
Describe the issue
We upgraded keepalived to v2.2.0. Since the upgrade we have been getting a segmentation fault, the Master/Backup servers are swapping frequently, and we are also facing a split-brain issue.
To Reproduce
If we use a sync-group-based keepalived configuration, we get the above issues.
Expected behavior
Keepalived version
Output of
keepalived -v
Keepalived v2.2.0 (01/09,2021)
Copyright(C) 2001-2021 Alexandre Cassen, acassen@gmail.com
Built with kernel headers for Linux 3.10.0
Running on Linux 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
Distro: CentOS Linux 7 (Core)
configure options: --prefix=/root/sen
Config options: LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING
System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_PREF FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK FRA_OIFNAME IFA_FLAGS IP_MULTICAST_ALL NET_LINUX_IF_H_COLLISION NET_LINUX_IF_ETHER_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION LIBIPVS_NETLINK VRRP_VMAC IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE SO_MARK SCHED_RESET_ON_FORK
Distro (please complete the following information):
Details of any containerisation or hosted service (e.g. AWS)
If keepalived is being run in a container or on a hosted service, provide full details
Configuration file:
A full copy of the configuration file, obfuscated if necessary to protect passwords and IP addresses
global_defs {
    notification_email {
        xxxx@gmail.com
    }
    notification_email_from xxx@gmail.com
    smtp_server xxx
    smtp_connect_timeout 30
    router_id LVSID01
}
vrrp_sync_group VG1 {
    group {
        VI_1
        VI_GATEWAY
    }
}
vrrp_instance VI_GATEWAY {
    interface bond0
    virtual_router_id 77
    priority 100
    advert_int 1
    higher_prio_send_advert true
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass xxxxx
    }
    virtual_ipaddress {
        172.20.1.1
    }
}
vrrp_instance VI_1 {
    interface bond0
    virtual_router_id 76
    priority 100
    advert_int 1
    higher_prio_send_advert true
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass xxxxx
    }
    virtual_ipaddress {
        172.20.131.127
        172.20.131.130
    }
Notify and track scripts
If any notify or track scripts are in use, please provide copies of them
System Log entries
Full keepalived system log entries from when keepalived started
Sep 13 16:23:25 172 Keepalived[31592]: Starting Keepalived v2.2.0 (01/09,2021)
Sep 13 16:23:25 172 Keepalived[31592]: Running on Linux 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 (built for Linux 3.10.0)
Sep 13 16:23:25 172 Keepalived[31592]: Command line: '/sbin/keepalived' '-D'
Sep 13 16:23:25 172 Keepalived[31592]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 13 16:23:25 172 Keepalived[31592]: Configuration file /etc/keepalived/keepalived.conf
Sep 13 16:23:25 172 Keepalived[31593]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Sep 13 16:23:25 172 Keepalived[31593]: Starting Healthcheck child process, pid=31594
Sep 13 16:23:25 172 Keepalived[31593]: Starting VRRP child process, pid=31595
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: Registering Kernel netlink reflector
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: Registering Kernel netlink command channel
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 590) Extra '}' found
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 590) Unknown keyword '}'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 591) Unknown keyword 'real_server'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 591) Unexpected '{' - ignoring
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 592) Unknown keyword 'weight'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 593) Unknown keyword 'uthreshold'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 594) Unknown keyword 'lthreshold'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 596) Unknown keyword 'TCP_CHECK'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 596) Unexpected '{' - ignoring
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 597) Unknown keyword 'connect_timeout'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 598) Unknown keyword 'connect_port'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 599) Unknown keyword '}'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 600) Unknown keyword '}'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 601) Extra '}' found
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 601) Unknown keyword '}'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1781) Unterminated quote 'lthrn/https.sh 172.20.47.143 443"'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1781) Unmatched quote: 'lthrn/https.sh 172.20.47.143 443"'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1785) Unknown keyword 'real_server'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1785) Unexpected '{' - ignoring
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1786) Unknown keyword 'weight'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1787) Unknown keyword 'uthreshold'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1788) Unknown keyword 'lthreshold'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1790) Unknown keyword 'MISC_CHECK'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1790) Unexpected '{' - ignoring
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1791) Unknown keyword 'misc_path'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1792) Unknown keyword 'misc_timeout'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1793) Unknown keyword '}'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1794) Unknown keyword '}'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1795) Extra '}' found
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (/etc/keepalived/keepalived.conf: Line 1795) Unknown keyword '}'
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: Assigned address 172.20.133.246 for interface bond0
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: Assigned address fe80::ec4:7aff:feb0:112a for interface bond0
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: Registering gratuitous ARP shared channel
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (VI_GATEWAY) removing VIPs.
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (VI_1) removing VIPs.
Sep 13 16:23:25 172 Keepalived[31593]: pid 31594 exited due to segmentation fault (SIGSEGV).
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (VI_1) removing E-VIPs.
Sep 13 16:23:25 172 Keepalived[31593]: Please report a bug at https://github.com/acassen/keepalived/issues
Sep 13 16:23:25 172 Keepalived[31593]: and include this log from when keepalived started, a description
Sep 13 16:23:25 172 Keepalived[31593]: of what happened before the crash, your configuration file and the details below.
Sep 13 16:23:25 172 Keepalived[31593]: Also provide the output of keepalived -v, what Linux distro and version
Sep 13 16:23:25 172 Keepalived[31593]: you are running on, and whether keepalived is being run in a container or VM.
Sep 13 16:23:25 172 Keepalived[31593]: A failure to provide all this information may mean the crash cannot be investigated.
Sep 13 16:23:25 172 Keepalived[31593]: If you are able to provide a stack backtrace with gdb that would really help.
Sep 13 16:23:25 172 Keepalived[31593]: Source version 2.2.0
Sep 13 16:23:25 172 Keepalived[31593]: Built with kernel headers for Linux 3.10.0
Sep 13 16:23:25 172 Keepalived[31593]: Running on Linux 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
Sep 13 16:23:25 172 Keepalived[31593]: Command line: '/sbin/keepalived' '-D'
Sep 13 16:23:25 172 Keepalived[31593]: configure options: --prefix=/root/sen
Sep 13 16:23:25 172 Keepalived[31593]: Config options: LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING
Sep 13 16:23:25 172 Keepalived[31593]: System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV6_ADVANCED_API LIBNL3 RTA_ENCAP
Sep 13 16:23:25 172 Keepalived[31593]: RTA_EXPIRES RTA_PREF FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK FRA_OIFNAME IFA_FLAGS
Sep 13 16:23:25 172 Keepalived[31593]: IP_MULTICAST_ALL NET_LINUX_IF_H_COLLISION NET_LINUX_IF_ETHER_H_COLLISION
Sep 13 16:23:25 172 Keepalived[31593]: LIBIPTC_LINUX_NET_IF_H_COLLISION LIBIPVS_NETLINK VRRP_VMAC IFLA_LINK_NETNSID CN_PROC
Sep 13 16:23:25 172 Keepalived[31593]: SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE SO_MARK
Sep 13 16:23:25 172 Keepalived[31593]: SCHED_RESET_ON_FORK
Sep 13 16:23:25 172 Keepalived[31593]: Healthcheck child process(31594) died: Respawning
Sep 13 16:23:25 172 Keepalived[31593]: Please log an issue at https://github.com/acassen/keepalived/issues/
Sep 13 16:23:25 172 Keepalived[31593]: and include a full copy of your keepalived configuration files, and
Sep 13 16:23:25 172 Keepalived[31593]: copies of the keepalived system log entries around the time this happened
Sep 13 16:23:25 172 Keepalived[31593]: Restart of Healthcheck process delayed 0 seconds to limit respawn rate
Sep 13 16:23:25 172 Keepalived[31593]: Starting Healthcheck child process, pid=31598
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (VI_GATEWAY) Entering BACKUP STATE (init)
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: (VI_1) Entering BACKUP STATE (init)
Sep 13 16:23:25 172 Keepalived_vrrp[31595]: VRRP sockpool: [ifindex( 4), family(IPv4), proto(112), fd(12,13)]
Sep 13 16:23:29 172 Keepalived_vrrp[31595]: (VI_GATEWAY) Receive advertisement timeout
Sep 13 16:23:29 172 Keepalived_vrrp[31595]: (VI_1) Receive advertisement timeout
Sep 13 16:23:29 172 Keepalived_vrrp[31595]: (VI_1) Entering MASTER STATE
Sep 13 16:23:29 172 Keepalived_vrrp[31595]: (VI_1) setting VIPs.
Sep 13 16:23:29 172 Keepalived_vrrp[31595]: (VI_1) setting E-VIPs.
The Master/Backup swap appears to be caused by the following:
Master log:
Sep 13 16:23:54 172 Keepalived_vrrp[4866]: (VI_1): send advert error 22 (Invalid argument)
Sep 13 16:23:54 172 Keepalived_vrrp[4866]: (VI_GATEWAY): send advert error 22 (Invalid argument).
Backup Log:
Sep 13 16:23:56 172 Keepalived_vrrp[31595]: (VI_GATEWAY) Receive advertisement timeout
Sep 13 16:23:56 172 Keepalived_vrrp[31595]: (VI_GATEWAY) Entering MASTER STATE
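For context on why a short run of failed adverts on the master is enough for the backup to take over: under VRRPv2 (RFC 3768) a backup declares the master down after three advertisement intervals plus a skew time derived from its own priority. A minimal sketch, assuming VRRPv2 timers and the `advert_int 1` / `priority 100` values from the configuration in this issue; the function names are hypothetical.

```python
def skew_time(priority):
    """VRRPv2 skew time in seconds: (256 - priority) / 256."""
    return (256 - priority) / 256.0


def master_down_interval(advert_int, priority):
    """Seconds a backup waits without adverts before becoming master."""
    return 3 * advert_int + skew_time(priority)


print(master_down_interval(1, 100))  # 3.609375
```

With these values, roughly 3.6 seconds of adverts that fail to leave the master would be enough for the backup to log "Receive advertisement timeout" and enter MASTER state.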
Did keepalived coredump?
If so, can you please provide a stacktrace from the coredump, using gdb.
Additional context