Set the default mac ageing time to 600 seconds #2365
MAC ageing is currently disabled, which can let the MAC address table grow over time and cause resource and performance issues. Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
This introduces new SAI behavior; we need a PTF test or an Ansible test to validate it.
We need an Ansible test to validate this behavior.
Just curious, why was a 300-second aging time chosen as the default?
@lguohan We will add Ansible test cases, but those live in a different repo. It also makes sense to have this merged before the other PR.
It is the de facto default value, at least for most big vendors.
@zhenggen-xu, the arp_update interval is also 300 seconds. I'm thinking of a case where the MAC ages out but the neighbor entry still points to that MAC. Currently we don't hit this case, since MACs never age out in HW.
@prsunny The ARP update and HW MAC aging are somewhat independent of each other. On many platforms, the ARP entry does the IP-to-MAC translation and has the neighbor destination information embedded in the entry, so it does not rely on the MAC table. Even on platforms that use the MAC table for the neighbor destination, if the MAC ages out, traffic hitting the neighbor entry should flood. Also, ARP update mostly deals with VLAN-facing ports, where traffic hits HW rather than the control plane and the entry could be aged out of the control plane (Linux kernel). In those scenarios, the HW MAC should not age out as long as traffic hits HW. In any case, even if the MAC ages out before the ARP entry, it shouldn't be a problem, as mentioned above. So overall, I don't see issues here.
@zhenggen-xu, that was my point: on platforms where the neighbor points to the MAC table, traffic starts flooding, which differs from the behavior without aging. I'm just saying this would be a behavioral change.
@prsunny In my opinion, even if it does flood, it is only temporary and only for uni-directional traffic. In any case, HW MAC aging is a must; otherwise the system could get into a bad state if any node started to use VMAC, etc.
Will it break warm-reboot vs test?
https://github.com/Azure/sonic-swss/blob/5984e3aeba59249eff71bd13666bb7b64bd493c8/tests/test_fdb_warm.py#L79
retest this please
@qiluo-msft, I do not think the vs docker loads switch.json by default, but it seems like the right thing to do. We should set the MAC aging and then run a warm-reboot test.
The warm-reboot vs test case does not assert anything after getting the FDB aging timer, so it won't break in any case. Also, the aging timer only applies to HW (unless sairedis talks to vs or KVM to configure the FDB properties there?), so it should not fail any test cases in the vs environment; whether vs gets the timer value or not should not matter. In the worst case, if the vs kernel or bridge uses the FDB aging timer from appDB, we should adjust the warm-reboot test cases by changing or disabling the aging timer to avoid timing issues. For the warm-reboot test on real HW, the process disables aging during the warm reboot, as we discussed before, and needs to restore it after coming back up. That should be done by swss.json, so I think it is covered there as well.
retest this please
Why closed?
It was closed by accident when I cleaned up my branches.
This is to be on the safer side where ARP update interval is 300 seconds and SONiC does not flood when ARP is aged out. Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
@lguohan Do you have any more concerns?
* Set the default mac ageing time to 300 seconds
  The current mac ageing was disabled, this could lead the mac address table to increase over time and lead to resource and performance issues.
  Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Update the default HW ageing timer to be 600 seconds.
  This is to be on the safer side where ARP update interval is 300 seconds and SONiC does not flood when ARP is aged out.
  Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
@lguohan Should we cherry-pick this into 201803, since it fills a general feature gap?
Update sonic-utilities submodule pointer to include the following: * b739efc [subinterface]Added additional checks in portchannel and subinterface commands (sonic-net#2345) ([sonic-net#2371](sonic-net/sonic-utilities#2371)) * d01153a Use warm-boot infrastructure for fast-boot ([sonic-net#2365](sonic-net/sonic-utilities#2365)) Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
Update sonic-utilities submodule pointer to include the following: * [subinterface]Added additional checks in portchannel and subinterface commands (sonic-net#2345) ([sonic-net#2371](sonic-net/sonic-utilities#2371)) * Use warm-boot infrastructure for fast-boot ([sonic-net#2365](sonic-net/sonic-utilities#2365)) Signed-off-by: dprital <drorp@nvidia.com>
Update sonic-utilities submodule pointer to include the following: * b739efc [subinterface]Added additional checks in portchannel and subinterface commands (#2345) ([#2371](sonic-net/sonic-utilities#2371)) * d01153a Use warm-boot infrastructure for fast-boot ([#2365](sonic-net/sonic-utilities#2365))
The current mac ageing was disabled, this could lead the mac address
table to increase over time and lead to resource and performance issues.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
- What I did
Set the default MAC ageing time to 300 seconds (raised to 600 seconds by the follow-up commit in this PR).
- How I did it
Set the ageing time in switch.json, which is generated from switch.j2; it is loaded during init or on service restart.
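As a rough illustration of the change described above, the following Python sketch builds a switch.json-style payload carrying an ageing time. The table key `SWITCH_TABLE:switch`, the field name `fdb_aging_time`, the `OP` field, and the helper `build_switch_config` are assumptions made for this example and are not taken from this PR; check them against the actual template in your tree.

```python
import json

# Final default discussed in this PR (seconds).
FDB_AGING_TIME_SECONDS = 600


def build_switch_config(aging_time_seconds: int) -> list:
    """Return a switch.json-style payload that sets the FDB ageing time.

    The key and field names below are assumptions for illustration only.
    """
    if aging_time_seconds < 0:
        raise ValueError("ageing time must be non-negative")
    return [
        {
            "SWITCH_TABLE:switch": {
                # Values are rendered as strings, as a template would emit them.
                "fdb_aging_time": str(aging_time_seconds),
            },
            "OP": "SET",
        }
    ]


if __name__ == "__main__":
    print(json.dumps(build_switch_config(FDB_AGING_TIME_SECONDS), indent=4))
```

Keeping the value in one named constant makes it easy to see which default a given image ships with.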
- How to verify it
Before the fix, learnt MACs were never aged out.
After the fix, MACs that were learnt but then went inactive were aged out after ~5 minutes.
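The before/after behavior can be pictured with a toy model of a MAC table. This is a simplified sketch, not how any ASIC or SAI implementation actually ages entries; the `MacTable` class and its methods are hypothetical, and treating an ageing time of 0 as "disabled" is an assumption of the model.

```python
from dataclasses import dataclass, field


@dataclass
class MacTable:
    """Toy MAC table: maps a MAC address to the time it was last seen."""

    aging_time: int  # seconds; 0 is treated as "ageing disabled" in this model
    entries: dict = field(default_factory=dict)

    def learn(self, mac: str, now: int) -> None:
        # Learning or seeing traffic from a MAC refreshes its timestamp.
        self.entries[mac] = now

    def age(self, now: int) -> None:
        # With ageing disabled, entries are never removed.
        if self.aging_time == 0:
            return
        # Keep only entries seen within the last aging_time seconds.
        self.entries = {
            mac: seen
            for mac, seen in self.entries.items()
            if now - seen < self.aging_time
        }


if __name__ == "__main__":
    before = MacTable(aging_time=0)
    before.learn("00:11:22:33:44:55", now=0)
    before.age(now=10_000)
    print("before fix:", sorted(before.entries))

    after = MacTable(aging_time=300)
    after.learn("00:11:22:33:44:55", now=0)   # goes inactive
    after.learn("66:77:88:99:aa:bb", now=0)
    after.learn("66:77:88:99:aa:bb", now=200)  # stays active
    after.age(now=300)
    print("after fix:", sorted(after.entries))
```

In the model, the inactive entry disappears once the ageing time has elapsed, while the entry that keeps seeing traffic stays in the table.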