machine restart and enter BACKUP state but no advert received even master exist #2158

Closed
qqliuxiaoran opened this issue Jul 6, 2022 · 8 comments


@qqliuxiaoran

qqliuxiaoran commented Jul 6, 2022

Describe the issue
I have two nodes: 10.19.220.67 as master and 10.19.220.62 as backup. I powered off the 10.19.220.62 machine for a while, restarted it, and then started Keepalived. The VRRP instance (instance_10.19.220.106) entered BACKUP state and then transitioned to MASTER after 30 seconds (advert_int 10; down_timer_adverts left at its default of 3).
Snipaste_2022-07-06_13-34-38
During this period, the master node (10.19.220.67) kept running.
Snipaste_2022-07-06_13-34-53
Shouldn't the 10.19.220.62 node always remain in BACKUP state?
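
The roughly 30-second delay in the description can be checked with a little arithmetic. The sketch below assumes the VRRPv2 formula from RFC 3768 (Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, with Skew_Time = (256 - Priority) / 256 seconds), with the factor 3 replaced by down_timer_adverts. The function name is made up for illustration, and keepalived's internal calculation may differ in detail.

```python
def master_down_interval(advert_int: float, priority: int,
                         down_timer_adverts: int = 3) -> float:
    """Seconds a BACKUP waits without adverts before taking over.

    Assumes the VRRPv2 (RFC 3768) formula; not taken from keepalived's source.
    """
    # A higher priority gives a smaller skew, so that node times out first.
    skew_time = (256 - priority) / 256
    return down_timer_adverts * advert_int + skew_time


# Values from the 10.19.220.62 configuration in this issue.
interval = master_down_interval(advert_int=10, priority=10)
print(f"master down interval: {interval:.2f} s")
```

Under these assumptions, advert_int 10 and priority 10 give an interval of about 30.96 seconds, which is consistent with the 30 seconds reported.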

To Reproduce
Power off the backup machine (10.19.220.62), power it back on, and start Keepalived.

Expected behavior
After the machine restarts and Keepalived starts, the VRRP instance should enter BACKUP state and stay there.

Keepalived version
Keepalived v2.2.7 (01/16,2022)

Copyright(C) 2001-2022 Alexandre Cassen, acassen@gmail.com

Built with kernel headers for Linux 3.10.0
Running on Linux 4.19.90-17.ky10.x86_64 #1 SMP Sun Jun 28 15:41:49 CST 2020
Distro: Kylin Linux Advanced Server V10 (Tercel)

configure options: --prefix=/opt_zfs/keepalived/keepalived_release/ CFLAGS=-I/usr/local/openssl LDFLAGS=-L/usr/local/lib64/ -Wl,-rpath,./:./bin/:./lib:../sbin:../lib -Wl,-z,origin

Config options: LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT INIT=systemd

System options: VSYSLOG LIBKMOD LIBNL1 RTA_ENCAP RTA_EXPIRES RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTA_VIA IFA_FLAGS NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION LIBIPVS_NETLINK IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE SO_MARK

Distro (please complete the following information):
Linux 4.19.90-17.ky10.x86_64

Details of any containerisation or hosted service (e.g. AWS)
Physical machine

Configuration file:
Config file at 10.19.220.67
vrrp_instance instance_10.19.220.106 {
    state BACKUP
    interface eth0
    virtual_router_id 106
    priority 60
    advert_int 10
    authentication {
        auth_type PASS
        auth_pass xxxxxx
    }
    virtual_ipaddress {
        10.19.220.106
    }
    unicast_src_ip 10.19.220.67
    unicast_peer {
        10.19.220.62
    }
}

Config file at 10.19.220.62
vrrp_instance instance_10.19.220.106 {
    state MASTER
    interface eth0
    virtual_router_id 106
    priority 10
    advert_int 10
    authentication {
        auth_type PASS
        auth_pass xxxxxx
    }
    virtual_ipaddress {
        10.19.220.106
    }
    unicast_src_ip 10.19.220.62
    unicast_peer {
        10.19.220.67
    }
}

Notify and track scripts
none

System Log entries
Full keepalived system log entries from when keepalived started

Did keepalived coredump?
If so, can you please provide a stacktrace from the coredump, using gdb.

Additional context
State switching within a very short time interval has caused me a lot of trouble.
Please help! Thank you!

@pqarmitage
Collaborator

Yes, 10.19.220.62 should remain in backup state and not transition to master state.

What happens if you just stop keepalived on 10.19.220.62 and then restart it? Does it transition to master state for the very short time interval, or does the problem only occur after the machine has been restarted and keepalived then starts?

Whenever this issue has been reported before it has ALWAYS been found to be a problem with the networking stack/hardware not being fully operational until some time after keepalived starts. The consequence of that is that the system starting up does not receive the VRRP adverts from the master, and itself transitions to master state (as you are seeing). Once the networking stack/hardware has fully started up, then keepalived on the newly started system starts seeing the adverts and reverts to backup mode.

It was for this reason that I added the global_defs keyword vrrp_startup_delay, which allows a delay after keepalived starts before the vrrp instances start working. If the delay is set high enough, then it should allow time for the network stack/hardware to have started up properly before the vrrp protocol starts running.
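
As a minimal illustration of the keyword described above, the delay would be set in global_defs roughly as follows. The value 32 is only an example, taken from the figure suggested in this thread for advert_int 10; it is not a recommended default.

```
global_defs {
    # Wait this many seconds after keepalived starts before the
    # vrrp instances begin running (example value only).
    vrrp_startup_delay 32
}
```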

In your case, it appears that packets start being received only after they start being sent (unless it is a coincidence that as soon as packets start being sent it starts receiving them). Using a vrrp_startup_delay of 32 seconds may resolve the problem, although it may be necessary to somehow send some packets on the network interfaces before keepalived starts.
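
One hedged way to "send some packets" before keepalived starts is a small helper that emits a few UDP datagrams towards the peer. This is only a sketch: the function name, the port (9, the discard port), and the payload are arbitrary choices made for illustration, and nothing in this thread confirms that outbound traffic of this kind actually cures the problem.

```python
import socket
import time


def send_probes(peer: str, port: int = 9, count: int = 3,
                interval: float = 1.0) -> int:
    """Send small UDP datagrams to peer; return how many the kernel accepted."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            try:
                sock.sendto(b"probe", (peer, port))
                sent += 1
            except OSError:
                # For example "network unreachable" while the interface
                # is still coming up; keep trying on the next round.
                pass
            if i < count - 1:
                time.sleep(interval)
    return sent
```

On the backup node in this issue it could be called as `send_probes("10.19.220.67")` from whatever runs just before keepalived starts, for example an ExecStartPre= command in the systemd unit (an assumption about how the service is launched).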

@qqliuxiaoran
Author

@pqarmitage
"In your case, it appears that packets start being received only after they start being sent (unless it is a coincidence that as soon as packets start being sent it starts receiving them)": Yes, it is not a coincidence! It happens every time.

"Does it transition to master state for the very short time interval? or is it that the problem only occurs after the machine has been restarted and then keepalived starts.": It only occurs after the machine has been restarted and keepalived then starts. The system starting up does not receive the VRRP adverts from the master, transitions to master state itself, and then quickly transitions back to backup state (as you say, "only after they start being sent"). To tell the truth, I have notify scripts, and the problem is that they may fail when they run concurrently.

"Using a vrrp_startup_delay of 32 seconds may resolve the problem": I will try this, thank you very much!

@qqliuxiaoran
Author

@pqarmitage
"Using a vrrp_startup_delay of 32 seconds may resolve the problem, although it may be necessary to somehow send some packets on the network interfaces before keepalived starts.": I tried vrrp_startup_delay 32, but it does not work. The "Receive advertisement timeout" simply occurs 32 seconds later and the instance still transitions to MASTER; packets start being received only after they start being sent.

@pqarmitage
Collaborator

@qqliuxiaoran I'm sorry to say that you will need to investigate why keepalived starts receiving VRRP adverts only after it has sent one. That is not normal behaviour and we haven't seen it elsewhere, so I suspect that it is some feature peculiar to your system/configuration/distro (I don't recall any previous reports from anyone using Kylin).

@qqliuxiaoran
Author

@pqarmitage I forgot one thing. This happens only after a forced shutdown (power off), not after a reboot.

@qqliuxiaoran
Author

@pqarmitage Maybe I should change my way of thinking. Is there any configuration that can enforce a time interval between two notifications, so that my scripts do not run concurrently?

@pqarmitage
Collaborator

@qqliuxiaoran I think your comment before last (i.e. the problem happens only after power off) indicates that this is not a problem with keepalived itself (since it starts up fine after a reboot) but something to do with the way your system starts up after a poweroff.

There is no way in keepalived to control the running of notify scripts. The scripts are fire and forget: they are fired when the event occurs, and there is no queue of events to be notified. Further, keepalived does not detect when a notify script completes. I can think of two options: 1) you implement some form of locking within your scripts, or 2) (and this would be the better solution) use a notify fifo rather than the scripts, and the process reading the fifo could then execute the appropriate scripts sequentially.
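
The fifo approach can be sketched as a small reader that handles one event at a time. It assumes a fifo has been configured in global_defs (for example with `vrrp_notify_fifo` pointing at a path of your choosing) and that each line has the form `INSTANCE "name" STATE priority`, as in keepalived's sample fifo script. The class and function names and the fifo path are hypothetical.

```python
import shlex
from typing import Callable, Iterable, NamedTuple, Optional


class NotifyEvent(NamedTuple):
    kind: str      # e.g. INSTANCE or GROUP
    name: str      # vrrp instance or sync group name
    state: str     # e.g. MASTER, BACKUP, FAULT
    priority: int


def parse_notify_line(line: str) -> Optional[NotifyEvent]:
    """Parse one fifo line; return None if it does not match the assumed format."""
    try:
        fields = shlex.split(line)
    except ValueError:  # unbalanced quotes
        return None
    if len(fields) < 4 or not fields[3].isdigit():
        return None
    return NotifyEvent(fields[0], fields[1], fields[2], int(fields[3]))


def process_events(lines: Iterable[str],
                   handler: Callable[[NotifyEvent], None]) -> int:
    """Call handler for each event in arrival order; return the number handled."""
    handled = 0
    for line in lines:
        event = parse_notify_line(line)
        if event is None:
            continue
        handler(event)  # blocks until finished, so handlers never overlap
        handled += 1
    return handled


def run(fifo_path: str, handler: Callable[[NotifyEvent], None]) -> None:
    """Read the fifo forever, reopening it whenever the writer closes it."""
    while True:
        with open(fifo_path) as fifo:
            process_events(fifo, handler)
```

A handler could, for instance, call `subprocess.run([...], check=False)` on the existing notify script. Because `process_events` waits for each handler to return, a MASTER event followed quickly by a BACKUP event is handled strictly in that order, without overlap.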

@qqliuxiaoran
Author

@pqarmitage I also think it's a system/network problem, not a Keepalived problem. I will solve it in other ways, thank you very much anyway!
