[SNMP-SUBAGENT] CPU Spike because of redundant and flooded keyspace notifications #8293
Comments
Thanks for the report, analysis and proposal.
Is it possible to improve the upstream lldp_syncd first? If it only applies the delta to Redis, does snmpagent work well?
It is possible to optimize lldp_syncd, but we would still get at least num_lldp_neigh notifications every 10 seconds, so the snmp-subagent will still have to be modified.
Please submit a PR based on the "Triage" section; we can continue the discussion in the PR context.
…230) **- What I did** Fixes [#8293](sonic-net/sonic-buildimage#8293) **- How I did it** Accumulated all the older notifications and did act only upon the latest notification discarding the others
Description
Recently, a very similar bug (LINK) was seen in the snmp-subagent process. It was related to blocking calls, and the fix has already been ported.
This issue is of a similar nature, i.e. very high CPU usage, but it occurs only occasionally, around 20-30% of the time.
Steps to reproduce the issue:
Describe the results you received:
Snapshot of where the process spends the time
Triage:
On further troubleshooting, the issue was traced to the LLDPRemManAddrUpdater MIB inside the snmp-subagent. Unlike most of the others, this MIB uses Redis keyspace notifications to update its internal cache. It subscribes to notifications for "LLDP_ENTRY_TABLE*". These keys, in turn, are deleted and set again every 10 seconds; see the lldp_syncd logic for more info.
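For readers unfamiliar with keyspace notifications, the sketch below shows how one such message could be unpacked. It assumes the message dictionary shape that redis-py's `PubSub.get_message()` returns for a pattern subscription, and it assumes the LLDP keys live in database 0. `parse_keyspace_message`, `KeyspaceEvent` and `LLDP_PATTERN` are hypothetical names for illustration, not part of the snmp-subagent.

```python
from typing import NamedTuple, Optional

# Assumption: the subscription pattern, with database index 0.
LLDP_PATTERN = "__keyspace@0__:LLDP_ENTRY_TABLE*"


class KeyspaceEvent(NamedTuple):
    db: int    # database index taken from the channel name
    key: str   # the Redis key that was touched
    op: str    # the operation name carried as the payload, e.g. "hset" or "del"


def parse_keyspace_message(msg: dict) -> Optional[KeyspaceEvent]:
    """Turn a redis-py style pmessage dict into a KeyspaceEvent, or None."""
    if msg.get("type") != "pmessage":
        return None  # e.g. the subscription confirmation message
    channel, data = msg["channel"], msg["data"]
    if isinstance(channel, bytes):
        channel = channel.decode()
    if isinstance(data, bytes):
        data = data.decode()
    if not channel.startswith("__keyspace@"):
        return None
    # The channel has the form "__keyspace@<db>__:<key>".
    prefix, _, key = channel.partition("__:")
    db = int(prefix[len("__keyspace@"):])
    return KeyspaceEvent(db=db, key=key, op=data)
```

Every write to a matching key produces one such message, which is why a periodic delete-and-set of each LLDP entry translates directly into a steady stream of events for the subscriber.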
Each set operation on these keys results in 11 notifications. These (12 x num_interfaces x time/10) notifications accumulate inside the buffer, and once asyncio schedules the LLDPRemManAddrUpdater MIB, the update_data logic fetches the key-value pairs from Redis and updates the internal cache once for every notification seen, resulting in the spike.
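To give a feel for the scale, the expression above can be evaluated for a sample case. This is only a rough sketch: the factor of 12 per interface per cycle and the 10-second period are taken from the expression as written, while the interface count and idle time below are made-up values, and `backlog_size` is a hypothetical helper.

```python
def backlog_size(num_interfaces: int, idle_seconds: float,
                 per_cycle: int = 12, period_seconds: float = 10.0) -> int:
    """Estimate how many notifications queue up while the updater is not scheduled."""
    cycles = idle_seconds / period_seconds
    return int(per_cycle * num_interfaces * cycles)


# Illustrative values only: 32 LLDP interfaces, updater not scheduled for 60 seconds.
print(backlog_size(32, 60))  # 12 * 32 * 6 = 2304
```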
All of these older notifications are redundant and can be skipped.
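A minimal sketch of that idea follows: drain the backlog, keep only the most recent event per key, and touch Redis once per key instead of once per notification. This is not the prototype or the merged fix. `coalesce`, `apply_backlog`, and the `fetch` callback are hypothetical names, and events are modelled as plain `(key, op)` tuples.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable, Tuple


def coalesce(events: Iterable[Tuple[str, str]]) -> "OrderedDict[str, str]":
    """Collapse (key, op) events so that only the latest op per key survives."""
    latest: "OrderedDict[str, str]" = OrderedDict()
    for key, op in events:
        latest.pop(key, None)  # drop the stale entry for this key, if any
        latest[key] = op       # re-insert so ordering follows the newest event
    return latest


def apply_backlog(events: Iterable[Tuple[str, str]],
                  fetch: Callable[[str], dict],
                  cache: Dict[str, dict]) -> int:
    """Update the cache once per key and return the number of fetches made."""
    fetches = 0
    for key, op in coalesce(events).items():
        if op == "del":
            cache.pop(key, None)     # key is gone, no read needed
        else:
            cache[key] = fetch(key)  # one read per key, not one per notification
            fetches += 1
    return fetches
```

With a backlog of, say, one `del` followed by eleven `hset` events for the same key, `apply_backlog` performs a single fetch for that key. The cost of a refresh then scales with the number of distinct keys rather than with how long the updater went unscheduled.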
I've therefore tested this prototype, which appeared to bring the CPU usage down. If deemed fit, it can serve as a good starting point towards the full solution.
Note: The same event-notification logic is also used in the LocPortUpdater MIB, but the issue is not seen there because the PORT table is not updated as frequently as LLDP_ENTRY. Applying the fix there as well might also help.
Describe the results you expected:
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):