-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SWSS] Static LAG Support #3090
base: master
Are you sure you want to change the base?
Conversation
Why ? Static lag support changes in SWSS module sonic-net/SONiC#1039 Patch explanation 1. static lag supported with option roundrobin. 2. tlm_teamd -> teamdctl changes to handle json dump for static lag. 3. test cases -> updated UT:- Test cases 1 Create static port channel with static flag pass pass 2 verify static has option flag true or false pass pass 3 Add static member see the portchannel is up pass pass 4 verify teamd is created with round-robin option by default pass pass 5 Remove last portchannel member check port channel down pass pass 6 Remove portchannel member check port channel still up pass pass 7 verify teamdctl config dump pass pass 8 verify teamdctl state dump pass pass 9 shutdown the portchannel check the kernel state pass pass 10 no shutdown the portchannel check the kernel state pass pass 11 "Check the show output matches the review comment root@sonic:~# show inter port Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected, * - not synced No. Team Dev Protocol Ports ----- ------------- ----------- ----------------------------------------- 01 PortChannel01 LACP(A)(Up) Ethernet16(S) 02 PortChannel02 NONE(A)(Up) Ethernet48(S) Ethernet64(S) Ethernet32(S) root@sonic:~# 12 teamnl is set to roundrobin pass pass 13 save and reload and verify portchannel is up pass pass 14 "docker restart teamd teamd stopped swss stopped syncd stopped swss started syncd started teamd started" pass pass 15 warm-reboot fails even without any port channel config fail 16 verify teamd settles doesnt hog cpu with 100% cpu usage pass 17 "trying with static port channel config on non supported branches port channel will be configured as LACP." pass Not Supported Options 1. Min links and 2. fall back are not supported
I think the fallback LAG feature described here does the same thing basically, except that it allows LACP to be established once the peer device is in a state to start sending LACP packets? There is one bug related to the LAG containing multiple ports that's fixed in sonic-net/sonic-buildimage#10823. |
Hi Sai,
Static LAG will be useful and will be immune to any LACP protocol operation (except Link up/down events). Thanks, |
conf << "'{\"device\":\"" << alias << "\"," | ||
<< "\"hwaddr\":\"" << mac_boot.to_string() << "\"," | ||
<< "\"runner\":{" | ||
<< "\"name\":\"roundrobin\""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe based on HLD discussion the default mode should be loadbalance and not round robin. Can you please clarify why do we have roundrobin defined as default?
sonic-net/SONiC#1039 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition if we have roundrobin how is it achieved in the hardware. If I understand roundrobin is to send one packet per link in roundrobin fashion https://man.archlinux.org/man/teamd.conf.5.en . However we don't have such hash algorithm in SAI.
If SAI is going with loadbalance we should have the same defined in teamd as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dgsudharsan ,
Apologies for the delay... unwell last week (this week too ... )
No Round Robin is only for Linux traffic (CPU traffic), Right now it is not mapped to saiars.h (which support packet spraying in hardware for ECMP and LAG)
Regarding spec will modify to round robin..
With loadbalance facing issue of 100% teamd always thats the reason for the change to roundrobin.
Thanks,
Kannan.S
/// @param storage a reference to the temporary storage | ||
/// | ||
|
||
void ValuesStore::extract_values_static(const std::string & lag_name, json_t * root, HashOfRecords & storage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to have a separate function to extract values for static? This appears to be duplicate of the existing API. Can you clarify the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
teamdctl state information for Static lag and Dynamic LACP are little different.
Dynamic LACP has the following fields along with setup
{
"ports": {
"Ethernet48": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 40,
"ifname": "Ethernet48"
},
"link": {
"duplex": "half",
"speed": 0,
"up": false
},
"link_watches": {
"list": {
"link_watch_0": {
"delay_down": 0,
"delay_up": 0,
"down_count": 5,
"name": "ethtool",
"up": false
}
},
"up": false
},
"runner": {
"actor_lacpdu_info": {
"key": 103,
"port": 49,
"port_priority": 255,
"state": 5,
"system": "00:e0:ec:cc:a8:de",
"system_priority": 65535
},
"aggregator": {
"id": 0,
"selected": false
},
"key": 103,
"partner_lacpdu_info": {
"key": 0,
"port": 0,
"port_priority": 0,
"state": 0,
"system": "00:00:00:00:00:00",
"system_priority": 0
},
"partner_retry_count": 3,
"prio": 255,
"selected": false,
"state": "disabled"
}
}
},
"runner": {
"active": true,
"enable_retry_count_feature": true,
"fallback": true,
"fast_rate": false,
"retry_count": 3,
"select_policy": "lacp_prio",
"sys_prio": 65535
},
"setup": {
"daemonized": true,
"dbus_enabled": false,
"debug_level": 1,
"kernel_team_mode_name": "loadbalance",
"pid": 297,
"pid_file": "/var/run/teamd/PortChannel03.pid",
"runner_name": "lacp",
"zmq_enabled": false
},
"team_device": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 50,
"ifname": "PortChannel03"
}
}
}
Static Lag will not contain the runner option (Dynamic LACP needs runner option to determine peer lacp link protocol information)
{
"ports": {
"Ethernet32": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 39,
"ifname": "Ethernet32"
},
"link": {
"duplex": "half",
"speed": 0,
"up": true
},
"link_watches": {
"list": {
"link_watch_0": {
"delay_down": 0,
"delay_up": 0,
"down_count": 3,
"name": "ethtool",
"up": true
}
},
"up": true
}
},
"Ethernet64": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 37,
"ifname": "Ethernet64"
},
"link": {
"duplex": "half",
"speed": 0,
"up": false
},
"link_watches": {
"list": {
"link_watch_0": {
"delay_down": 0,
"delay_up": 0,
"down_count": 3,
"name": "ethtool",
"up": false
}
},
"up": false
}
}
},
"setup": {
"daemonized": true,
"dbus_enabled": false,
"debug_level": 1,
"kernel_team_mode_name": "roundrobin",
"pid": 49,
"pid_file": "/var/run/teamd/PortChannel02.pid",
"runner_name": "roundrobin",
"zmq_enabled": false
},
"team_device": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 9,
"ifname": "PortChannel02"
}
}
}
Since Static doesnt have those runner information modifying existing LAG Member table done for LACP can create more collaterals, thats the reason for separating it out between Static and LACP.
What I did
Static lag support changes in SWSS module
sonic-net/SONiC#1039
Why I did it
PXE application doesnt support LAG during boot. and also some legacy device doesnt support lacp.
To achieve more bandwidth with those legacy devices static lag will be useful
How I verified it
Details if related