Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: Fix startup problem with namespaces and interface names #17650

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

donaldsharp
Copy link
Member

Currently we have a startup problem with interface yang data passed through the system. The sequence of events is this:

a) mgmtd is sending yang data to daemons before the daemon is connected to zebra.
b) the upper level daemon connects to zebra and gets if zebra is configured to use namespace based vrfs.

Since a) happens before b), lib/if.c is using operational data about the system to decide if the yang passed in is valid. Thus making it invalid in the above sequence of events. If b) happens before a) then things just work.

During debugging I noticed that the interface name passed in during yang was one of three things:

a) <namespace name>:<interface name>
b) <vrf name>:<interface name>
c) <interface name>

Remove the dependancy on knowing the backend to parse the interface name. If there is a delimiter grab the interface name as appropriate and put in the vrf name either the vrf name or the namespace name. If there is no delimeter just return a empty vrf name and set the interface name as appropriate.

Copy link
Contributor

@idryzhov idryzhov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always receive "ifname" in case of vrflite backend, and always receive "netns:ifname" in case of netns backend. If you see that it works differently, then there should be some kind of a race condition, when a daemon receives configuration before it knows the backend type.

The change itself is correct, but it won't fix the actual underlying problem. And I worry it will make the problem even more hidden and difficult to debug. If I would change something in this code, I would rather add some asserts for vrflite case, to make sure that we never receive an ifname with a colon in its name.

A bigger problem is that it's not the only place where netns backend affects daemons. If you grep for vrf_is_backend_netns/vrf_get_backend you'll see that it's used in several daemons and also in if/vrf lib functions. So if a daemon starts processing configuration before it knows the backend type, it will end up in a bad state anyway.

To fix all potential race conditions, I propose to make -n/-vrfwnetns option global and require users to specify it for all daemons. Making it a startup option for all daemons makes sense, because we don't support changing the backend in runtime anyway. It's also easy to do using frr_global_options variable in frr/daemons file.

What do you think?

static void netns_ifname_split(const char *xpath, char *ifname, char *vrfname)
/*
* sometimes the interface name passed is just the interface name `ifname`
* sometimes it's `<vrf name>:<interface>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true. At least it shouldn't be. For vrf-lite it should always be just ifname, and for netns it should always be netns:ifname.

Currently we have a startup problem with interface yang data
passed through the system.  The sequence of events is this:

a) mgmtd is sending yang data to daemons before the daemon
is connected to zebra.
b) the upper level daemon connects to zebra and gets if
zebra is configured to use namespace based vrfs.

Since a) happens before b), lib/if.c is using operational
data about the system to decide if the yang passed in
is valid.  Thus making it invalid in the above sequence
of events.  If b) happens before a) then things just work.

During debugging I noticed that the interface name passed
in during yang was one of three things:

a) `<namespace name>:<interface name>`
c) `<interface name>`

Remove the dependancy on knowing the backend to parse the interface
name.  If there is a delimiter grab the interface name as appropriate
and put in the namespace name.  If there is no delimeter just return
a empty vrf name and set the interface name as appropriate.
With this change we just need to look at the vrf namespace name to
decide if it is necessary to treat as namespace or not.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
@donaldsharp donaldsharp force-pushed the if_vrfbackend_problem branch from 9bd8ef1 to 6c7d696 Compare December 17, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants