Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MCLAG] the change of port mac would trigger teamd to send LACP pdu instantly #3764

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

shine4chen
Copy link
Contributor

@shine4chen shine4chen commented Nov 15, 2019

Signed-off-by: shine.chen shine.chen@mediatek.com

- What I did
In MCLAG scenario when keep-alive connection between active and standby node is down , standby node port mac will be changed back to his original mac. After host node receives LACP PDU from standby node, it will remove the local port connected to standby node from port-channel and switch all traffic to active node. But In existed teamd implementation when the port src mac is changed teamd doesn't send LACP PDU instantly , but wait for timer event expiry to send it. It would cause host still forward half of the traffic to standby node for quite a long period( maybe up to 30 seconds). During this period these traffic would be dropped by standby node. So we add a patch for teamd here. The change of port mac would trigger teamd to send LACP pdu instantly.

- How I did it
The change of port mac would trigger teamd to send LACP pdu instantly.

- How to verify it
After apply this patch the disruptive time decrease to less than 20ms.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: shine.chen <shine.chen@mediatek.com>
@shine4chen shine4chen changed the title The change of port mac would trigger teamd to send LACP pdu instantly [MCLAG] the change of port mac would trigger teamd to send LACP pdu instantly Nov 15, 2019
@shine4chen
Copy link
Contributor Author

retest vs please

@shine4chen
Copy link
Contributor Author

please help to review it @rlhui

shine.chen and others added 3 commits February 20, 2020 20:28
Signed-off-by: shine.chen <shine.chen@mediatek.com>
Signed-off-by: shine.chen <shine.chen@mediatek.com>
@shine4chen
Copy link
Contributor Author

@pavel-shirshov Could you please to review and approve this PR?

@shine4chen
Copy link
Contributor Author

retest vsimage please

@madhukar-kamarapu
Copy link

madhukar-kamarapu commented Feb 22, 2020

@shine4chen - This code change is not required in teamd to send the LACPDUs instantly when MCLAG on the Standby device is DOWN (the port-channel MAC address is reverted back to the device's MAC address).

Your commit (https://github.com/shine4chen/sonic-buildimage/blob/b697da2fb436cf0b430d47378acce8abfb2a4562/src/iccpd/src/iccp_netlink.c#L675
) to flap admin status of the port-channel will restart the LACP SM in teamd. When the LACP SM is restarted for all the member ports, the LACPDUs are transmitted immediately with the new MAC address.

The above changes are from your PR (Iccpd support ipv6-nd) -
shine4chen@b697da2#diff-5b542ccf50497fd7faa6e5c5dfa20597R675

But there is a bug in teamd to handle the above change (restart the LACP SM when admin status is flapped):
During MCLAG test scenarios, when a portchannel is removed/added from an operational MCLAG, the ICCPd component issues a MAC change, followed by a IFF_UP clear & IFF_UP set operations in quick succession. Quite often (depending on the load on the system and the timings of various events in the system) teamd does not get to process the IFF_UP clear operation. This results in teamd not syncing-up immediately with the partner upon MAC change; and it can take a max of 30 seconds (LACP timeout) to complete the synchronization.

Teamd relies on RTM_NEWLINK messages to be notified of the IFF_UP clear and set operations from the kernel. When teamd receives an RTM_NEWLINK message from the kernel, it turns around and enquires the kernel for a full set of information regarding the link event. The kernel populates all the contents of the netlink message, including updating/overwriting the flags field (that carries the IFF_UP status) with its current state in the kernel. Depending on the timing of the call into the kernel, a previous IFF_UP state could have been changed in the kernel, hence overwriting the previous flags as it returns. If this happens, the teamd application would end-up missing a IFF_UP event (clear or set), thereby not effecting state-machine changes properly.

Here is the patch for above mentioned problem, I'll be submitting a PR for the fix in teamd:


diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index a7a83ac..7a8d733 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -273,6 +273,7 @@ static void obj_input_newlink(struct nl_object *obj, void *arg, bool event)
        struct rtnl_link *link;
        struct team_ifinfo *ifinfo;
        uint32_t ifindex;
+       uint32_t flags_cache = 0;
        int err;
        char *ifname;

@@ -280,6 +281,14 @@ static void obj_input_newlink(struct nl_object *obj, void *arg, bool event)

        link = (struct rtnl_link *) obj;

+       /* Cache the flags received in the new-link notification. Without this, the subsequent
+        * calls to the kernel to get complete details would override the received flags,
+        * overwriting with kernel's current state. This could/does result in application
+        * missing some of the state changes - especially the IFF_UP flag changes when there
+        * are back to back IFF_UP clear and set operations.
+        */
+       flags_cache = rtnl_link_get_flags(link);
+
        ifname = rtnl_link_get_name(link);
        if (ifname)
        {
@@ -310,6 +319,14 @@ static void obj_input_newlink(struct nl_object *obj, void *arg, bool event)
                return;

        clear_changed(ifinfo);
+
+       /* Restore the IFF_UP flag from the cache */
+       if (flags_cache & IFF_UP) {
+               rtnl_link_set_flags(link, IFF_UP);
+       } else {
+               rtnl_link_unset_flags(link, IFF_UP);
+       }
+
        ifinfo_update(ifinfo, link);

        if (event)


Note: The above fix would resolve few timing issues in teamd to honor the port-channel admin flap triggered by iccpd.

I strongly recommend not to merge the changes in PR#3764.

Rgds,
Madhukar

@shine4chen
Copy link
Contributor Author

@madhukar-kamarapu Thanks you for the feedback. I study your reply and summary it as the follows:

  • Portchannel status flapping would trigger lacp-sm restart which can sync updated mac address to peer.
  • The existing teamd implementation has some issue so that it can't handle very fast port status flapping properly.
  • You are ready for a patch to resolve it.

I admit your solution make some sense. But it depends on some factors.

  • It depend on my outgoing PR (Iccpd support ipv6-nd).
  • Linux netlink message could be dropped in kernel if much netlink events is generated or system load is very high.

#3764 can simply solve the converge time issue without iccpd-support-nd PR. And I don't see any side effort.
How about your comment?

@madhukar-kamarapu
Copy link

@madhukar-kamarapu Thanks you for the feedback. I study your reply and summary it as the follows:

  • Portchannel status flapping would trigger lacp-sm restart which can sync updated mac address to peer.
  • The existing teamd implementation has some issue so that it can't handle very fast port status flapping properly.
  • You are ready for a patch to resolve it.

I admit your solution make some sense. But it depends on some factors.

  • It depend on my outgoing PR (Iccpd support ipv6-nd).
  • Linux netlink message could be dropped in kernel if much netlink events is generated or system load is very high.

#3764 can simply solve the converge time issue without iccpd-support-nd PR. And I don't see any side effort.
How about your comment?

@shine4chen - Actor_System (in LACPDU) is defined as the MAC address of the system. In a typical scenario Actor_System for a given port never changes. Reason - Actor_System is the MAC address of the system; MAC address does not change on the fly.
MCLAG is a special scenario where we change the MAC address of the port-channel netdevice.

The IEEE standards 802.3AD(old) or 802.1AX(new) do not talk about what needs to be done when MAC address change happens.

With the current fix (PR#3764), we'd be transmitting LACPDUs when the MAC address change happens. This behavior is not defined in the standard.

Since MCLAG is special case scenario of MAC address change, it would be better to stick with the port-channel admin state flap which would restart the LACP SM with new MAC address instantaneously. We'd not be deviating from the standard.

Note: This fix mentioned by me (retain IFF_UP flag) is anyways required in teamd; if this fix is taken (along with port-channel admin state flap), the changes done in PR#3764 would be redundant. We'd unnecessarily send more LACPDUs from the port.

@shine4chen
Copy link
Contributor Author

@madhukar-kamarapu Sure, I will close this PR after you submit your teamd patch PR.

@shine4chen
Copy link
Contributor Author

@madhukar-kamarapu Could you please to send me your patch file for libteam? Then we can test it locally. shine.chen@mediatek.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants