Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the default mac ageing time to 600 seconds #2365

Merged
merged 2 commits into from
Jun 15, 2019

Conversation

zhenggen-xu
Copy link
Collaborator

The current mac ageing was disabled, this could lead the mac address
table to increase over time and lead to resource and performance issues.

Signed-off-by: Zhenggen Xu zxu@linkedin.com

- What I did
Set mac ageing time default to 300 seconds

- How I did it
set the ageing time in switch.json generated by switch.j2, it will be loaded during the init or service restart.

- How to verify it
Before the fix, macs learnt were never aged out.
After the fix, after ~5 minutes, the macs learnt but inactive were aged out.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

The current mac ageing was disabled, this could lead the mac address
table to increase over time and lead to resource and performance issues.

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
@lguohan
Copy link
Collaborator

lguohan commented Dec 8, 2018

this introduces new sai behaviro, we need need ptf test to validate this one or ansible test.

Copy link
Collaborator

@lguohan lguohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need ansible test to validate this behavior.

@jipanyang
Copy link
Collaborator

Just curious, why is 300 seconds aging time chosen as default?

@zhenggen-xu
Copy link
Collaborator Author

@lguohan We will add ansible test cases, but that is on a different repo. It also makes sense to have this merged before the other PR.

@zhenggen-xu
Copy link
Collaborator Author

Just curious, why is 300 seconds aging time chosen as default?

It is de facto default value, at least for most big vendors.

@prsunny
Copy link
Contributor

prsunny commented Dec 12, 2018

@zhenggen-xu , we have the arp_update interval also as 300sec. Just thinking of a case where mac ages out but neighbor entry still points to this MAC. Currently we don't have this case since mac never age out in HW.

@zhenggen-xu
Copy link
Collaborator Author

@prsunny The arp update and HW MAC aging are some what agnostic. On many platforms, arp entry is doing the IP to MAC translation and have the neighbor destination information embedded into the entry and not relying on MAC table. Even on platform that is using the MAC table info for neighbor destination, if MAC aged out, traffic hit the neighbor entry should flood.

Also, arp update was mostly dealing with VLAN facing ports where traffic hit HW not control plane and the entry could be aged-out from control plane (linux kernel). In those scenarios, HW MAC should not be aged out as long as the traffic hit HW. But anyway, even MAC aged out before ARP, it shouldn't be a problem as mentioned above.

So overall, I don't see issues here.

@prsunny
Copy link
Contributor

prsunny commented Dec 13, 2018

@zhenggen-xu , that was my point, in platforms where neighbor points to mac table, it starts flooding which will be different if there was no aging. I'm just saying this would be a behavioral change.

@zhenggen-xu
Copy link
Collaborator Author

@prsunny In my opinion, even if it does flood, it is only temporary for uni-directional traffic.
And in fact, the SAI definition should have defined the behavior so the neighbor MAC is not tied to pure L2 MAC. It is up the platform vendor to guarantee that. I think most the platform already doing that.

In any case, the HW MAC age is a must, otherwise, the system could get to bad state if any node started to use VMAC etc..

Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan
Copy link
Collaborator

lguohan commented Mar 26, 2019

retest this please

@lguohan
Copy link
Collaborator

lguohan commented Mar 26, 2019

@qiluo-msft , I do not think that the vs docker load the swich.json by default. But it seems the right thing to do. We should set the mac aging and then do a warm reboot test.

@zhenggen-xu
Copy link
Collaborator Author

zhenggen-xu commented Mar 26, 2019

Will it break warm-reboot vs test?
https://github.com/Azure/sonic-swss/blob/5984e3aeba59249eff71bd13666bb7b64bd493c8/tests/test_fdb_warm.py#L79

The warm-reboot vs test case does not assert after getting the FDB aging timer, so it won't break in any case. Also, the aging timer is only applicable to HW (unless sairedis talk to vs or KVM to config the fdb properties there?), so it should not fail any test cases in vs environment, vs get or not get the timer value should not matter. In worst case, if vs kernel or bridge uses fdb aging timer from appDB, we should adjust the warm reboot test cases by changing or disabling aging timer to avoid timing issues. For the real HW test warm reboot, the process will disable aging during the warm reboot as we talked before, it will need restore it after come back, it should be done by swss.json, so I think it is covered there as well.

@lguohan
Copy link
Collaborator

lguohan commented Apr 1, 2019

retest this please

@zhenggen-xu zhenggen-xu closed this Jun 7, 2019
@zhenggen-xu zhenggen-xu deleted the mac-ageing-time branch June 7, 2019 19:32
@qiluo-msft
Copy link
Collaborator

Why closed?

@zhenggen-xu zhenggen-xu restored the mac-ageing-time branch June 14, 2019 05:32
@zhenggen-xu zhenggen-xu reopened this Jun 14, 2019
@zhenggen-xu
Copy link
Collaborator Author

it was closed by accident when I clean up my branches.

This is to be on the safer side where ARP update interval
is 300 seconds and SONiC does not flood when ARP is aged out.

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
@zhenggen-xu zhenggen-xu changed the title Set the default mac ageing time to 300 seconds Set the default mac ageing time to 600 seconds Jun 14, 2019
@yxieca
Copy link
Contributor

yxieca commented Jun 15, 2019

@lguohan do you have more concern?

@yxieca yxieca merged commit d67c6d4 into sonic-net:master Jun 15, 2019
yxieca pushed a commit that referenced this pull request Jun 16, 2019
* Set the default mac ageing time to 300 seconds

The current mac ageing was disabled, this could lead the mac address
table to increase over time and lead to resource and performance issues.

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Update the default HW ageing timer to be 600 seconds.

This is to be on the safer side where ARP update interval
is 300 seconds and SONiC does not flood when ARP is aged out.

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
@qiluo-msft
Copy link
Collaborator

@lguohan Should we cherry-pick it into 201803 since it is a general feature missing?

dgsudharsan added a commit to dgsudharsan/sonic-buildimage that referenced this pull request Sep 14, 2022
Update sonic-utilities submodule pointer to include the following:
* b739efc [subinterface]Added additional checks in portchannel and subinterface commands (sonic-net#2345) ([sonic-net#2371](sonic-net/sonic-utilities#2371))
* d01153a Use warm-boot infrastructure for fast-boot ([sonic-net#2365](sonic-net/sonic-utilities#2365))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Sep 14, 2022
Update sonic-utilities submodule pointer to include the following:
* [subinterface]Added additional checks in portchannel and subinterface commands (sonic-net#2345) ([sonic-net#2371](sonic-net/sonic-utilities#2371))
* Use warm-boot infrastructure for fast-boot ([sonic-net#2365](sonic-net/sonic-utilities#2365))

Signed-off-by: dprital <drorp@nvidia.com>
vaibhavhd pushed a commit that referenced this pull request Sep 15, 2022
Update sonic-utilities submodule pointer to include the following:
[202205][subinterface]Added additional checks in portchannel and subinterface commands (#2371)
[202205] Use warm-boot infrastructure for fast-boot (#2365)
prsunny pushed a commit that referenced this pull request Sep 15, 2022
Update sonic-utilities submodule pointer to include the following:
* b739efc [subinterface]Added additional checks in portchannel and subinterface commands (#2345) ([#2371](sonic-net/sonic-utilities#2371))
* d01153a Use warm-boot infrastructure for fast-boot ([#2365](sonic-net/sonic-utilities#2365))
@zhenggen-xu zhenggen-xu deleted the mac-ageing-time branch May 24, 2024 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants