-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bfdd: add BFD support #2294
bfdd: add BFD support #2294
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! A few nits inline below.
bfdd/bfdd.c
Outdated
|
||
static struct quagga_signal_t bfd_signals[] = { | ||
{ | ||
.signal = SIGPIPE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary, signal_init()
takes care of ignoring SIGPIPE for us.
bfdd/bfdctl.h
Outdated
* Protocol definitions | ||
*/ | ||
|
||
#define BFD_CONTROL_SOCK_PATH "/var/run/frr/bfdd.sock" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to do something like this instead (inside configure.ac):
AC_DEFINE_UNQUOTED(BFDD_SOCKET, "$frr_statedir/bfdd.sock",bfdd control socket)
This way we'd respect the --localstatedir
parameter given to the configure script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I didn't know how to do this. I'll fix this in the next iteration.
bfdd/bfd_packet.c
Outdated
static int rcvttl = BFD_RCV_TTL_VAL; | ||
static int pktinfo = BFD_PKT_INFO_VAL; | ||
|
||
typedef struct udp_psuedo_header_s { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
bfdd/bfd_packet.c
Outdated
{ | ||
int sd; | ||
|
||
sd = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC, IPPROTO_UDP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about future portability issues, it would be good to get rid of SOCK_CLOEXEC
and use fcntl(fd, F_SETFD, flags | FD_CLOEXEC)
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on the portability part now. I'll fix this in the next iteration.
bfdd/bfdd_vty.c
Outdated
{ | ||
bfd_session *bs, *tmp; | ||
|
||
HASH_ITER (sh, session_hash, bs, tmp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since we're iterating over a hash table, the peers won't be ordered in the output. What you could do is to add all elements from the hash table to a sorted linked list (like Quentin did on a recent PR) and iterate from there. Alternatively we can wait for the new northbound API to be merged since it will provide a better infrastructure to solve this problem.
bfdd/bfdd_vty.c
Outdated
} | ||
|
||
if (bs->pl) | ||
vty_out(vty, "\t\tlabel: %s\n", bs->pl->pl_label); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to replace the \t
's by spaces and remove the colon as well.
bfdd/bfd.c
Outdated
#include <stdio.h> | ||
#include <stdbool.h> | ||
#include <stdlib.h> | ||
#include <unistd.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: zebra.h includes most of these system headers for us.
bgpd/bgp_bfd_vty.c
Outdated
{ | ||
struct bgp_peer_notification *bpn; | ||
|
||
bpn = calloc(1, sizeof(*bpn)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in several other places I think it would be better to use XCALLOC/XMALLOC
instead.
doc/manpages/bfd-options.rst
Outdated
|
||
Opens the BFD daemon control socket located at the pointed location. | ||
|
||
(default: /var/run/frr/ebfdd.sock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be bfdd.sock
instead of ebfdd.sock
now?
configure.ac
Outdated
;; | ||
|
||
*) | ||
AC_MSG_ERROR(["bfdd is not yet supported in this platform"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned in the commit message that bfdd uses some socket filtering code that is only available on Linux. Could you provide more details on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only VxLAN definitely needs the filter code. I'm working in something more generic for the echo packets.
Here is the code I was mentioning:
https://github.com/opensourcerouting/frr/blob/bfd-final/bfdd/bfd_packet.c#L116
Linux uses:
setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
*BSDs use:
s = open("/dev/bpf");
// https://github.com/openbsd/src/blob/master/usr.sbin/dhcrelay6/bpf.c#L103
p.bf_len = dhcp6_bpf_sfilter_len;
p.bf_insns = dhcp6_bpf_sfilter;
if (ioctl(s, BIOCSETF, &p) == -1)
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rzalamena thanks for the updates, I couldn't test the BSD support but the latest commits look excellent.
I think we're getting there, but I found a few more problems on bfdd using netgen:
(My test topology: https://gist.github.com/rwestphal/a3279272313c2e27120742da822a10f6)
1 - When shutting down the bridge interface connected to rt2, rt2 clears all BGP sessions. This includes the session with rt1, which is reestablished shortly afterwards.
The sessions to rt3 and rt4 are put into the Active
state, whereas Idle
is expected according to what @xiaofeng6WIND said in the previous PR.
2 - When killing the bfdd daemon on rt2, nothing happens and all BGP sessions remain up and running. I believe the client daemons should detect when bfdd dies and act accordingly.
3 - I have a topology with 4 routers and in one of them the transmit-interval
command is now showing up in the running configuration:
rt4-bfdd(config)# bfd
rt4-bfdd(config-bfd)# peer 2001:db8:1::2 local-address 2001:db8:1::4
rt4-bfdd(config-bfd-peer)# receive-interval 50
rt4-bfdd(config-bfd-peer)# transmit-interval 50
rt4-bfdd(config-bfd-peer)# no shutdown
rt4-bfdd(config-bfd-peer)#
rt4-bfdd(config-bfd-peer)# do sh run
[snip]
!
bfd
!
peer 2001:db8:1::2
receive-interval 50
no shutdown
!
!
end
4 - The command to configure IPv6 BFD peers doesn't work when it's present in the startup configuration, and the log file shows no errors. To make ipv6 peers work, I need to configure them in the CLI manually. Example:
debian(config)# bfd
debian(config-bfd)# peer 2001:db8:1::4 local-address 2001:db8:1::2
debian(config-bfd-peer)# receive-interval 50
debian(config-bfd-peer)# transmit-interval 50
debian(config-bfd-peer)# no shutdown
5 - bfdd is segfaulting on exit:
BFD: Received signal 15 at 1528381547 (si_addr 0x25, PC 0x7fd25805f690); exiting...
Program counter: /lib/x86_64-linux-gnu/libc.so.6(__poll+0x10)[0x7fd25805f690]
Backtrace for 11 stack frames:
/usr/local/lib/libfrr.so.0(zlog_backtrace_sigsafe+0x34)[0x7fd2590de665]
/usr/local/lib/libfrr.so.0(zlog_signal+0x45b)[0x7fd2590de321]
/usr/local/lib/libfrr.so.0(+0x6ec57)[0x7fd2590fdc57]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7fd2583300c0]
/lib/x86_64-linux-gnu/libc.so.6(__poll+0x10)[0x7fd25805f690]
/usr/local/lib/libfrr.so.0(+0x7f1ca)[0x7fd25910e1ca]
/usr/local/lib/libfrr.so.0(thread_fetch+0x21b)[0x7fd25910fa83]
/usr/local/lib/libfrr.so.0(frr_run+0x22e)[0x7fd2590dc06d]
bfdd(main+0x2a7)[0x558640526b7f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd257fa02b1]
bfdd(_start+0x2a)[0x5586405263da]
no thread information available
6 - Since the bfdd daemon was integrated into FRR now, do we have any plans to remove or adapt the old BFD commands from bgpd/ospfd/ospf6d?
/* Set ifname to empty to signalize failures. */ | ||
memset(ifname, 0, ifnamelen); | ||
|
||
if (if_indextoname(ifindex, ifname_tmp) == NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of using if_indextoname()
and if_nametoindex()
, you can use ifindex2ifname()
and ifname2ifindex()
respectively. The first functions aren't VRF aware and should be avoided in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no VRF handling yet, but I'll keep this in mind when implementing the VRF part later. Thanks for the tip!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @rwestphal is saying it's easier to switch to these now, rather than rebuilding for vrf aware later (?).
bfdd/bfdd_vty.c
Outdated
vty_out(vty, " peer %s", satostr(&bs->mhop.peer)); | ||
vty_out(vty, " multihop"); | ||
vty_out(vty, " local-address %s", | ||
satostr(&bs->mhop.local)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we display the local-address
even for single hop peers?
I have a peer configured with peer 2001:db8:1::4 local-address 2001:db8:1::2
but only peer 2001:db8:1::4
is appearing the running configuration.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Just had a conversation with @rzalamena about this. Recommendations for approach:
This way bfd api stays the same and just works for old/new styles |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beyond the issues @rwestphal brought up, and the check failure, these look good to me.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-4695/ This is a comment from an EXPERIMENTAL automated CI system. CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base5 Static Analyzer issues remaining.See details at |
Save a few lines when iterating over JSON objects using the new JSON_FOREACH macro. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Import source code from external `bfdd` daemon ported from Cumulus PTM. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add BFD daemon to the build process and packaging instructions. Currently the bfdd daemon does nothing, this is just to document how the daemon insertion step occured. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Implement vty shell integration and allow `bfdd` to be configured through FRR's vtysh. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When `bfdd` is enabled - which it is by default - re-route the PTM-BFD messages to the FRR's internal BFD daemon instead of the external PTM daemon. This will help the migration of BFD implementations and avoid duplicating code. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When BFD timers are configured, don't show it anymore in the daemon side. This will help us migrate the timers command from daemons to `bfdd`. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add BFD daemon documentation: * commands; * man page; * manual / description; Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When configured transmission speed doesn't match the actual speed, show the difference in the output. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
After configuring a new value set the polling bit to negotiate speeds again next transmission cycle. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Show local-address on single hop when configured. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
This will make `bfdd` synchronize with its client when zebra dies or bfdd is restarted. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Avoid a memory leak on client daemons restart by getting rid of old registrations. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Simplify code and remove unnecessary log messages. The old log messages are going to be shown by the caller anyway. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Most of the headers we need are included by zebra.h, so lets simplify this. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Show a little more details, remove some duplicated calls and remove the macro compatibility with old debugging functions. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
On `zebra` / `bfdd` shutdown we now clean up all client data to avoid memory leaks (ghost clients). This also prevents 'slow' shutdown on `zebra` sparing us from seeing some rare topotests shutdown failures (signal handler getting stopped by signal). Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When using link-local address we must specify the scope-id for the address in order to bind to the interface. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Don't show BFD commands with timers since it might confuse users ("show running-config" won't display timers in client daemons anymore), but keep accepting this command from previous configurations. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
I just rebased the code to fix code conflicts with the newly imported |
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-4776/ This is a comment from an EXPERIMENTAL automated CI system. CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base4 Static Analyzer issues remaining.See details at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. The issue about dynamic peers showing up in the running configuration of bfdd is still present, but I wouldn't block this PR because of this (this problem can be solved when converting the bfdd commands to the new northbound model later).
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-4827/ This is a comment from an EXPERIMENTAL automated CI system. CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base4 Static Analyzer issues remaining.See details at |
This PR introduces a new daemon to FRR which adds support for configuring BFD monitoring and integration with routing daemons. The new code is actually a port of Cumulus BFD implementation to a stand-alone daemon with a few differences:
shutdown
;bfdctl
somehow);For people willing to test the code I have written a small manual that can be found here. Since setting up a topology with multiple machines might be expensive and time consuming, I have also written a topology test in topotest that can be easily expanded.
I don't expect this code to get merged soon as it requires testing, but I want to bring attention to it so people can already start using / testing.
Feel free to report problems and ask questions in the FRR Slack channel
#bfdd
. You can find instructions on how to join the channel here.Note 1: Cumulus allowed the daemon code to be relicensed with GPLv2:
rzalamena/bfdd#1
Note 2: this PR deprecates the previous BFD implementation that was just an adapter:
#1853
Note 3: support for OSes other than Linux is on going, it will require more socket filtering code like what
isisd
has.