[MCLAG] the change of port mac would trigger teamd to send LACP pdu instantly #3764
base: master
Signed-off-by: shine.chen <shine.chen@mediatek.com>
retest vs please
Please help review it, @rlhui.
merge master branch
@pavel-shirshov Could you please review and approve this PR?
retest vsimage please
@shine4chen - This code change is not required for teamd to send LACPDUs instantly when MCLAG on the standby device goes down (that is, when the port-channel MAC address is reverted to the device's own MAC address). See your commit (https://github.com/shine4chen/sonic-buildimage/blob/b697da2fb436cf0b430d47378acce8abfb2a4562/src/iccpd/src/iccp_netlink.c#L675); those changes come from your PR (Iccpd support ipv6-nd).

However, there is a bug in how teamd handles that change (restarting the LACP SM when the admin status is flapped). Teamd relies on RTM_NEWLINK messages from the kernel to learn when IFF_UP is cleared and set. When teamd receives an RTM_NEWLINK message, it queries the kernel again for the full set of information about the link. The kernel fills in every field of the reply, including the flags field (which carries the IFF_UP status), with its current state. Depending on when that query reaches the kernel, the IFF_UP state may already have changed again, so the reply overwrites the flags that came with the original event. When this happens, teamd misses an IFF_UP event (clear or set) and does not apply the corresponding state-machine changes.

Here is the patch for the above-mentioned problem; I'll be submitting a PR with the fix to teamd:
Note: The above fix would resolve a few timing issues in teamd so that it honors the port-channel admin flap triggered by iccpd. I strongly recommend not merging the changes in PR#3764. Rgds,
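The teamd patch referenced in this comment is not included in the thread. As a rough illustration only, the following C sketch models the race under simplified, assumed conditions: `struct port_state`, `handle_newlink()`, `simulate_admin_flap()`, and `PORT_FLAG_UP` are hypothetical names, not libteam code. It replays an admin flap (down, then up) in which the follow-up query for the "down" notification reaches the kernel only after the port is up again, and compares driving the state machine from the refreshed flags against retaining the flags carried in the RTM_NEWLINK message itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit value as IFF_UP (0x1) in Linux <net/if.h>; defined locally so the
 * sketch does not depend on platform feature macros. */
#define PORT_FLAG_UP 0x1u

/* Hypothetical per-port view kept by the application (not libteam's struct). */
struct port_state {
    unsigned int last_flags;  /* flags from the last processed RTM_NEWLINK */
    int lacp_restarts;        /* times a down->up transition restarted the LACP SM */
};

/* Process one RTM_NEWLINK notification.
 * event_flags:   ifi_flags carried in the notification itself.
 * queried_flags: flags returned by the follow-up query, i.e. the kernel's
 *                state at query time, which may already differ.
 * retain_event_flags selects which of the two drives the state machine. */
static void handle_newlink(struct port_state *p, unsigned int event_flags,
                           unsigned int queried_flags, bool retain_event_flags)
{
    assert(p != NULL);
    unsigned int flags = retain_event_flags ? event_flags : queried_flags;
    bool was_up = (p->last_flags & PORT_FLAG_UP) != 0;
    bool is_up = (flags & PORT_FLAG_UP) != 0;

    if (!was_up && is_up)
        p->lacp_restarts++;   /* down -> up observed: restart the LACP SM */
    p->last_flags = flags;
}

/* Replay an admin flap on a port that starts up: the "down" notification's
 * follow-up query only runs after the port is already up again. */
static int simulate_admin_flap(bool retain_event_flags)
{
    struct port_state p = { .last_flags = PORT_FLAG_UP, .lacp_restarts = 0 };

    handle_newlink(&p, 0, PORT_FLAG_UP, retain_event_flags);            /* down event, stale query */
    handle_newlink(&p, PORT_FLAG_UP, PORT_FLAG_UP, retain_event_flags); /* up event */
    return p.lacp_restarts;
}
```

Using only the refreshed flags, `simulate_admin_flap(false)` returns 0 because the flap is missed entirely; retaining the event flags, `simulate_admin_flap(true)` returns 1.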
@madhukar-kamarapu Thank you for the feedback. I studied your reply and summarized it as follows:
I admit your solution makes some sense, but it depends on some factors.
#3764 solves the convergence-time issue simply, without the iccpd-support-nd PR, and I don't see any side effects.
@shine4chen - Actor_System (in the LACPDU) is defined as the MAC address of the system. In a typical scenario, the Actor_System for a given port never changes, because it is the system's MAC address and that address does not change on the fly. The IEEE standards 802.3ad (old) and 802.1AX (new) do not specify what should happen when the MAC address changes. With the current fix (PR#3764), we would transmit LACPDUs whenever the MAC address changes, which is behavior the standard does not define. Since MCLAG is a special case of MAC address change, it would be better to stick with the port-channel admin state flap, which restarts the LACP SM with the new MAC address immediately; that way we do not deviate from the standard. Note: the fix I mentioned (retaining the IFF_UP flag) is required in teamd anyway. If it is taken (along with the port-channel admin state flap), the changes in PR#3764 would be redundant, and we would unnecessarily send more LACPDUs from the port.
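For reference, the Actor Information TLV of an LACPDU, as laid out in IEEE 802.1AX, carries Actor_System as a 6-byte MAC address alongside the system priority, key, port priority, port number, and state. The minimal C sketch below encodes that 20-byte TLV (type, length, then the fields in network byte order, ending with 3 reserved bytes); `struct actor_info` and `encode_actor_tlv()` are illustrative names, not teamd's API. It shows why a MAC change is visible to the partner: bytes 4 to 9 of the TLV change with the MAC, so the partner sees a different system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACTOR_INFO_TLV_TYPE 0x01
#define ACTOR_INFO_TLV_LEN  20   /* bytes, including type and length */

/* Illustrative container for the actor fields (not teamd's struct). */
struct actor_info {
    uint16_t system_priority;
    uint8_t  system[6];       /* Actor_System: the system MAC address */
    uint16_t key;
    uint16_t port_priority;
    uint16_t port;
    uint8_t  state;
};

/* Write a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Encode the Actor Information TLV; returns the number of bytes written. */
static size_t encode_actor_tlv(const struct actor_info *a,
                               uint8_t out[ACTOR_INFO_TLV_LEN])
{
    assert(a != NULL && out != NULL);
    memset(out, 0, ACTOR_INFO_TLV_LEN);   /* also zeroes the 3 reserved bytes */
    out[0] = ACTOR_INFO_TLV_TYPE;
    out[1] = ACTOR_INFO_TLV_LEN;
    put_be16(out + 2, a->system_priority);
    memcpy(out + 4, a->system, 6);        /* bytes 4..9: Actor_System */
    put_be16(out + 10, a->key);
    put_be16(out + 12, a->port_priority);
    put_be16(out + 14, a->port);
    out[16] = a->state;
    return ACTOR_INFO_TLV_LEN;
}
```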
@madhukar-kamarapu Sure, I will close this PR after you submit your teamd patch PR.
@madhukar-kamarapu Could you please send me your patch file for libteam so we can test it locally? shine.chen@mediatek.com
Signed-off-by: shine.chen <shine.chen@mediatek.com>
- What I did
In the MCLAG scenario, when the keep-alive connection between the active and standby nodes goes down, the standby node's port MAC is changed back to its original MAC. After the host node receives an LACP PDU from the standby node, it removes the local port connected to the standby node from the port-channel and switches all traffic to the active node. However, in the existing teamd implementation, when the port source MAC changes, teamd does not send an LACP PDU immediately; it waits for the timer to expire instead. As a result, the host keeps forwarding half of the traffic to the standby node for quite a long period (possibly up to 30 seconds), and the standby node drops that traffic. So we add a patch for teamd here: a change of the port MAC triggers teamd to send an LACP PDU immediately.
- How I did it
A change of the port MAC now triggers teamd to send an LACP PDU immediately, instead of waiting for the next timer-driven transmission.
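A minimal sketch of the behavior described in this PR, assuming a simplified port model: `struct lacp_port_hyp`, `lacp_send_pdu_hyp()`, and `on_port_link_change_hyp()` are hypothetical names and not the actual teamd patch. On each link notification the port's MAC is compared with the cached one; if it changed, an LACPDU is sent immediately rather than waiting for the periodic timer (up to 30 seconds at the slow LACP rate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical port object; the real teamd/libteam structures differ. */
struct lacp_port_hyp {
    uint8_t mac[MAC_LEN];      /* last MAC address seen for this port */
    unsigned int pdus_sent;    /* counts immediate LACPDU transmissions */
};

/* Placeholder for building and transmitting an LACPDU; here it only counts. */
static void lacp_send_pdu_hyp(struct lacp_port_hyp *port)
{
    port->pdus_sent++;
}

/* Handle a link notification carrying the port's current MAC address.
 * Returns true if an LACPDU was sent immediately. */
static bool on_port_link_change_hyp(struct lacp_port_hyp *port,
                                    const uint8_t new_mac[MAC_LEN])
{
    assert(port != NULL && new_mac != NULL);
    if (memcmp(port->mac, new_mac, MAC_LEN) == 0)
        return false;                 /* unchanged: periodic timer handles it */
    memcpy(port->mac, new_mac, MAC_LEN);
    lacp_send_pdu_hyp(port);          /* changed: tell the partner right away */
    return true;
}
```

Sending only when the MAC actually changes limits the extra traffic to one LACPDU per change; repeated notifications with the same MAC leave transmission to the periodic timer.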
- How to verify it
After applying this patch, the disruption time decreases to less than 20 ms.