Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel default route added after frr starts is marked inactive #10104

Closed
ajschorr opened this issue Nov 21, 2021 · 5 comments
Closed

kernel default route added after frr starts is marked inactive #10104

ajschorr opened this issue Nov 21, 2021 · 5 comments
Labels
triage Needs further investigation zebra

Comments

@ajschorr
Copy link


On a Rocky 8.5 Linux system with FRR versions 7.5, 8.0, or 8.1 installed, a kernel default route
installed after FRR has started is marked as inactive and ignored. I have a cable modem which
picks up a default route after FRR starts, and the route it injects into the kernel routing table
is ignored by FRR. If I restart FRR, the route is accepted properly.

[ ] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce
Install FRR on a system with a cable modem. Reboot the system. Note that FRR starts
before the DHCP interface comes up. Note that the kernel default route created by DHCP
is ignored in favor of an inferior route received by OSPF.

Expected behavior
The kernel default route should be accepted as the best route.

Screenshots

Hello, this is FRRouting (version 8.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

ti14# show ip route 0.0.0.0/0
Routing entry for 0.0.0.0/0
  Known via "ospf", distance 110, metric 103, best
  Last update 00:00:44 ago
  * 192.168.39.5, via lan0.9, weight 1

Routing entry for 0.0.0.0/0
  Known via "kernel", distance 0, metric 0
  Last update 00:01:47 ago
  * 207.237.112.1, via lan1 inactive

Versions

  • OS Version: Rocky Linux 8.5
  • Kernel: 4.18.0-348.2.1.el8_5.x86_64
  • FRR Version: 7.5, 8.0, and 8.1

Additional context
There's some additional info on the FROG mailing list here:
https://lists.frrouting.org/pipermail/frog/2021-November/001171.html

@ajschorr ajschorr added the triage Needs further investigation label Nov 21, 2021
@ajschorr
Copy link
Author

On the mailing list, Donald Sharp had requested that I make a log with debug zebra rib detail.
I made 2 logfiles showing the normal case where FRR starts after the kernel route had already been installed by DHCP (zebra.restart.log) and the buggy case where the system boots up and FRR starts before DHCP has installed the default route (zebra.bootup-bug.log). This gives an idea of the timing:

[root@ti14 frr]# journalctl -b | egrep -i 'frr|dhcp'
Nov 14 10:32:55 ti14 systemd[1]: Starting FRRouting...
Nov 14 10:32:56 ti14 watchfrr[904]: watchfrr 7.5 starting: vty@0
Nov 14 10:32:56 ti14 watchfrr[904]: zebra state -> down : initial connection attempt failed
Nov 14 10:32:56 ti14 watchfrr[904]: ospfd state -> down : initial connection attempt failed
Nov 14 10:32:56 ti14 watchfrr[904]: staticd state -> down : initial connection attempt failed
Nov 14 10:32:56 ti14 watchfrr[904]: Forked background command [pid 905]: /usr/lib/frr/watchfrr.sh restart all
Nov 14 10:32:56 ti14 watchfrr.sh[913]: Cannot stop staticd: pid file not found
Nov 14 10:32:56 ti14 watchfrr.sh[915]: Cannot stop ospfd: pid file not found
Nov 14 10:32:56 ti14 watchfrr.sh[917]: Cannot stop zebra: pid file not found
Nov 14 10:32:56 ti14 watchfrr[904]: zebra state -> up : connect succeeded
Nov 14 10:32:56 ti14 watchfrr[904]: ospfd state -> up : connect succeeded
Nov 14 10:32:56 ti14 watchfrr[904]: staticd state -> up : connect succeeded
Nov 14 10:32:56 ti14 watchfrr[904]: all daemons up, doing startup-complete notify
Nov 14 10:32:56 ti14 frrinit.sh[707]: Started watchfrr
Nov 14 10:32:56 ti14 systemd[1]: Started FRRouting.
Nov 14 10:33:00 ti14 dhclient[1823]: DHCPREQUEST on lan1 to 255.255.255.255 port 67 (xid=0x7fabd46c)
Nov 14 10:33:07 ti14 dhclient[1823]: DHCPREQUEST on lan1 to 255.255.255.255 port 67 (xid=0x7fabd46c)
Nov 14 10:33:21 ti14 dhclient[1823]: DHCPDISCOVER on lan1 to 255.255.255.255 port 67 interval 8 (xid=0xe1a98663)
Nov 14 10:33:29 ti14 dhclient[1823]: DHCPDISCOVER on lan1 to 255.255.255.255 port 67 interval 10 (xid=0xe1a98663)
Nov 14 10:33:39 ti14 dhclient[1823]: DHCPDISCOVER on lan1 to 255.255.255.255 port 67 interval 19 (xid=0xe1a98663)
Nov 14 10:33:39 ti14 dhclient[1823]: DHCPREQUEST on lan1 to 255.255.255.255 port 67 (xid=0xe1a98663)
Nov 14 10:33:39 ti14 dhclient[1823]: DHCPOFFER from 10.22.200.1
Nov 14 10:33:39 ti14 dhclient[1823]: DHCPACK from 10.22.200.1 (xid=0xe1a98663)

These logs are from 7.5, but I see the same problem in 8.0 and 8.1.
zebra.restart.log
zebra.bootup-bug.log

@qlyoung qlyoung added the zebra label Nov 23, 2021
@mruprich
Copy link
Contributor

Hi, is there any update here?

@eqvinox
Copy link
Contributor

eqvinox commented Jul 12, 2022

Quick look:

2021/11/14 10:33:58.096 ZEBRA: default(0:254):207.237.112.0/21: Processing rn 0x55d860b53020
2021/11/14 10:33:58.096 ZEBRA: default(0:254):207.237.112.0/21: Examine re 0x55d860b52c70 (connected) status 11 flags 8 dist 0 metric 0
2021/11/14 10:33:58.096 ZEBRA: default(0:254):207.237.112.0/21: After processing: old_selected 0x55d860b52c70 new_selected 0x0 old_fib 0x55d860b52c70 new_fib 0x0
2021/11/14 10:33:58.096 ZEBRA: default(0:254):207.237.112.0/21: Deleting route rn 0x55d860b53020, re 0x55d860b52c70 (connected)
2021/11/14 10:33:58.096 ZEBRA: 0:207.237.112.0/21: Redist del: re 0x55d860b52c70 (connected), new re 0x0 (None)
2021/11/14 10:33:58.096 ZEBRA: rib_process: (0:254):207.237.112.0/21: rn 0x55d860b53020, removing re 0x55d860b52c70
2021/11/14 10:33:58.096 ZEBRA: rib_unlink: (0:254):207.237.112.0/21: rn 0x55d860b53020, re 0x55d860b52c70
2021/11/14 10:33:58.096 ZEBRA: rib_gc_dest: (0:0):207.237.112.0/21: removing dest from table
2021/11/14 10:33:58.096 ZEBRA: default(0:0):207.237.112.0/21: rn 0x55d860b53020 dequeued from sub-queue 1
2021/11/14 10:33:58.096 ZEBRA: default(0:254):207.237.112.0/21: Processing rn 0x55d860b531c0
2021/11/14 10:33:58.096 ZEBRA: default(0:254):207.237.112.0/21: Examine re 0x55d860b52bc0 (connected) status 1 flags 8 dist 0 metric 0
2021/11/14 10:33:58.096 ZEBRA: default(0:254):207.237.112.0/21: After processing: old_selected 0x55d860b52bc0 new_selected 0x0 old_fib 0x0 new_fib 0x0
2021/11/14 10:33:58.096 ZEBRA: 0:207.237.112.0/21: Redist del: re 0x55d860b52bc0 (connected), new re 0x0 (None)
2021/11/14 10:33:58.096 ZEBRA: rib_process: (0:254):207.237.112.0/21 (MRIB): rn 0x55d860b531c0, removing re 0x55d860b52bc0
2021/11/14 10:33:58.096 ZEBRA: rib_unlink: (0:254):207.237.112.0/21 (MRIB): rn 0x55d860b531c0, re 0x55d860b52bc0
2021/11/14 10:33:58.096 ZEBRA: rib_gc_dest: (0:0):207.237.112.0/21 (MRIB): removing dest from table
2021/11/14 10:33:58.096 ZEBRA: default(0:0):207.237.112.0/21: rn 0x55d860b531c0 dequeued from sub-queue 1
2021/11/14 10:33:58.096 ZEBRA: default(0:254):0.0.0.0/0: Processing rn 0x55d8609ea610
2021/11/14 10:33:58.096 ZEBRA: default(0:254):0.0.0.0/0: Examine re 0x55d860b52760 (ospf) status 0 flags 0 dist 110 metric 200
2021/11/14 10:33:58.096 ZEBRA: default(0:254):0.0.0.0/0: Examine re 0x55d860b53130 (kernel) status 12 flags 8 dist 0 metric 0
2021/11/14 10:33:58.096 ZEBRA:         nexthop_active: Matched against ourself and prefix length is not max bit length
2021/11/14 10:33:58.096 ZEBRA:         nexthop_active_check: Unable to find active nexthop
2021/11/14 10:33:58.096 ZEBRA: default(0:254):0.0.0.0/0: After processing: old_selected 0x55d860b53130 new_selected 0x55d860b52760 old_fib 0x55d860b53130 new_fib 0x55d860b52760
2021/11/14 10:33:58.096 ZEBRA: default(0:254):0.0.0.0/0: Updating route rn 0x55d8609ea610, re 0x55d860b52760 (ospf) old 0x55d860b53130 (kernel)
2021/11/14 10:33:58.096 ZEBRA: default(0:254):0.0.0.0/0: rn 0x55d8609ea610 dequeued from sub-queue 2

looks like 207.237.112.1 is disappearing. It's re-added later

2021/11/14 10:34:01.804 ZEBRA: rib_add_multipath_nhe: (0:0):207.237.112.0/21: Inserting route rn 0x55d860b53020, re 0x55d860b4f870 (connected) existing (nil)

But at that point the INSTALLED flag is no longer set on the kernel 0.0.0.0/0 route - presumably because zebra assumes the kernel deleted the 0.0.0.0/0 route.

2021/11/14 10:34:02.495 ZEBRA: default(0:254):0.0.0.0/0: Processing rn 0x55d8609ea610
2021/11/14 10:34:02.495 ZEBRA: default(0:254):0.0.0.0/0: Examine re 0x55d860b52760 (ospf) status 11 flags 8 dist 110 metric 200
2021/11/14 10:34:02.495 ZEBRA: default(0:254):0.0.0.0/0: Examine re 0x55d860b53130 (kernel) status 10 flags 0 dist 0 metric 0
2021/11/14 10:34:02.495 ZEBRA: default(0:254):0.0.0.0/0: After processing: old_selected 0x55d860b52760 new_selected 0x0 old_fib 0x55d860b52760 new_fib 0x0
2021/11/14 10:34:02.495 ZEBRA: default(0:254):0.0.0.0/0: Deleting route rn 0x55d8609ea610, re 0x55d860b52760 (ospf)
2021/11/14 10:34:02.495 ZEBRA: 0:0.0.0.0/0: Redist del: re 0x55d860b52760 (ospf), new re 0x0 (None)
2021/11/14 10:34:02.495 ZEBRA: rib_process: (0:254):0.0.0.0/0: rn 0x55d8609ea610, removing re 0x55d860b52760
2021/11/14 10:34:02.495 ZEBRA: rib_unlink: (0:254):0.0.0.0/0: rn 0x55d8609ea610, re 0x55d860b52760
2021/11/14 10:34:02.495 ZEBRA: default(0:254):0.0.0.0/0: rn 0x55d8609ea610 dequeued from sub-queue 2

(note "flags 0" on re re 0x55d860b53130)

Possibly related to issue #11592 (address remove/add)?

@github-actions
Copy link

github-actions bot commented Jan 9, 2023

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

@frrbot
Copy link

frrbot bot commented Jan 9, 2023

This issue will be automatically closed in the specified period unless there is further activity.

@frrbot frrbot bot closed this as completed Jan 16, 2023
@frrbot frrbot bot removed the autoclose label Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage Needs further investigation zebra
Projects
None yet
Development

No branches or pull requests

4 participants