Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fpmsyncd] Skip routes to eth0 or docker0 #1606

Merged
merged 2 commits into from
Jan 25, 2021
Merged

Conversation

shi-su
Copy link
Contributor

@shi-su shi-su commented Jan 23, 2021

What I did
Skip routes to eth0 or docker0 in fpmsyncd.

Fix sonic-net/sonic-buildimage#6483

Why I did it
An FRR behavior change from 7.2 to 7.5 makes FRR update the default route to eth0 in interface up/down events. Skipping routes to eth0 or docker0 to avoid such behavior.

How I verified it
Tested locally that the default route does not change in interface up/down events.

Details if related

@lguohan lguohan requested a review from prsunny January 23, 2021 05:00
@shi-su shi-su marked this pull request as ready for review January 23, 2021 06:56
@lguohan lguohan merged commit f4e8245 into sonic-net:master Jan 25, 2021
@imzyxwvu
Copy link

Which FRR behaviour change is it that causes fpmsyncd to update default route? What did this behaviour break?

{
SWSS_LOG_NOTICE("Skip routes to eth0 or docker0: %s %s %s",
destipprefix, nexthops.c_str(), ifnames.c_str());
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this as a example: At first route 0.0.0.0/0 had a multipath nexthop "Ethernet0,Ethernet1", and some minutes later it became "Ethernet0,Ethernet1,eth0". Last it becomes "eth0". Since the change filters all routes including the eth0 nexthop, in the example this changes prevent "Ethernet0,Ethernet1" from exiting the next-hop list in ASIC. This is a rare situation but I suggest to take this into consideration.

Copy link
Contributor

@lguohan lguohan Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our design, we would not have such mixed scenario.

lguohan pushed a commit that referenced this pull request Jan 25, 2021
Fix sonic-net/sonic-buildimage#6483

An FRR behavior change from 7.2 to 7.5 makes FRR update the default route to eth0 in interface up/down events. Skipping routes to eth0 or docker0 to avoid such behavior.
*/
if (alias == "eth0" || alias == "docker0")
{
SWSS_LOG_NOTICE("Skip routes to eth0 or docker0: %s %s %s",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, route updates can be in INFO level log.

@imzyxwvu
Copy link

After some investigation, I figured that the problem is actually introduced by this change. SONiC uses rtm_table field (which is designed to pass table ID) of netlink_route protocol to pass VRF ID to FPM, and table ID info is lost. Generally there should be only table main in the default VRF, so this is harmless. But in fact zebra allows multiple route tables per VRF (or at least the main VRF). This leads to a mix of default VRF route updates across all tables and leaves a broken APPL_DB ROUTE_TABLE.

See other_tables field in struct zebra_vrf: https://github.com/Azure/sonic-frr/blob/frr/7.5/zebra/zebra_vrf.h#L85

@shi-su
Copy link
Contributor Author

shi-su commented Jan 29, 2021

After some investigation, I figured that the problem is actually introduced by this change. SONiC uses rtm_table field (which is designed to pass table ID) of netlink_route protocol to pass VRF ID to FPM, and table ID info is lost. Generally there should be only table main in the default VRF, so this is harmless. But in fact zebra allows multiple route tables per VRF (or at least the main VRF). This leads to a mix of default VRF route updates across all tables and leaves a broken APPL_DB ROUTE_TABLE.

See other_tables field in struct zebra_vrf: https://github.com/Azure/sonic-frr/blob/frr/7.5/zebra/zebra_vrf.h#L85

@imzyxwvu Thanks for your investigation. I still have some trouble understanding how other_tables leads to this behavior. And it seems that this patch is basically reverting the changes in FRRouting/frr@6dfcd75

@imzyxwvu
Copy link

@imzyxwvu Thanks for your investigation. I still have some trouble understanding how other_tables leads to this behavior. And it seems that this patch is basically reverting the changes in FRRouting/frr@6dfcd75

I'd suggest you to have a look at zebra_vrf_lookup_table_with_table_id.

https://github.com/Azure/sonic-frr/blob/frr/7.5/zebra/zebra_vrf.c#L361

This function includes a lookup on other_tables. The comment also contains some information on table_id.

prsunny added a commit that referenced this pull request Feb 4, 2021
Reduce loglevel to DEBUG for eth0 route update. Ref PR: #1606
daall pushed a commit that referenced this pull request Feb 5, 2021
Reduce loglevel to DEBUG for eth0 route update. Ref PR: #1606
vmittal-msft pushed a commit to vmittal-msft/sonic-swss that referenced this pull request Feb 6, 2021
Reduce loglevel to DEBUG for eth0 route update. Ref PR: sonic-net#1606
DavidZagury pushed a commit to DavidZagury/sonic-swss that referenced this pull request Mar 4, 2021
Fix sonic-net/sonic-buildimage#6483

An FRR behavior change from 7.2 to 7.5 makes FRR update the default route to eth0 in interface up/down events. Skipping routes to eth0 or docker0 to avoid such behavior.
DavidZagury pushed a commit to DavidZagury/sonic-swss that referenced this pull request Mar 4, 2021
Reduce loglevel to DEBUG for eth0 route update. Ref PR: sonic-net#1606
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
Fix sonic-net/sonic-buildimage#6483

An FRR behavior change from 7.2 to 7.5 makes FRR update the default route to eth0 in interface up/down events. Skipping routes to eth0 or docker0 to avoid such behavior.
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
Reduce loglevel to DEBUG for eth0 route update. Ref PR: sonic-net#1606
abdosi pushed a commit that referenced this pull request Oct 6, 2021
Fix sonic-net/sonic-buildimage#6483

An FRR behavior change from 7.2 to 7.5 makes FRR update the default route to eth0 in interface up/down events. Skipping routes to eth0 or docker0 to avoid such behavior.
abdosi added a commit that referenced this pull request Nov 24, 2021
abdosi added a commit that referenced this pull request Feb 18, 2022
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
* Add 'default' option for sFlow.
Signed-off-by: Venkatesan Mahalingam <venkatesan_mahalinga@dell.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect route for 0.0.0.0 in APPL_DB
5 participants