[SNMP-SUBAGENT] [202012] Observed an unusual spike in CPU usage because of the blocking=True argument #8202
Closed
Labels
Triaged
Comments
How long does the "CPU usage spike" last, measured from when the process is invoked? If the timeframe is reasonable, does snmpagent work as well afterwards as it did before?
@qiluo-msft is checking
qiluo-msft added a commit to sonic-net/sonic-swss-common that referenced this issue, Jul 22, 2021
Fixes sonic-net/sonic-buildimage#8202
- Fix the pubsub channel timeout unit.
- Reset the context err before the next redisGetReply; otherwise later redisGetReply calls return early when there is no data and burn a CPU core in a loop.
qiluo-msft added a commit to sonic-net/sonic-swss-common that referenced this issue, Jul 28, 2021 (same commit message as the Jul 22, 2021 commit)
judyjoseph pushed a commit to sonic-net/sonic-swss-common that referenced this issue, Aug 17, 2021 (same commit message as the Jul 22, 2021 commit)
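The second item in the commit message (a stale error on the context makes later reads return immediately) can be illustrated with a small model. This is only a sketch: `FakeContext`, `get_reply`, and `listen` are hypothetical names invented here, and the class is a simplified stand-in, not the hiredis or swss-common API.

```python
class FakeContext:
    """Hypothetical model of a connection context with a sticky error flag."""

    def __init__(self, replies):
        self.replies = list(replies)  # None stands for "no data before the timeout"
        self.err = 0
        self.waits = 0                # how many calls actually waited for data

    def get_reply(self):
        if self.err:
            return None               # early return: no wait happens at all
        self.waits += 1               # a real client would block here until the timeout
        reply = self.replies.pop(0) if self.replies else None
        if reply is None:
            self.err = 1              # the timeout is recorded as an error
        return reply


def listen(ctx, rounds, reset_err):
    """Call get_reply `rounds` times, optionally clearing err before each call."""
    received = []
    for _ in range(rounds):
        if reset_err:
            ctx.err = 0
        reply = ctx.get_reply()
        if reply is not None:
            received.append(reply)
    return received, ctx.waits


stuck = listen(FakeContext([None, "msg"]), rounds=5, reset_err=False)
fixed = listen(FakeContext([None, "msg"]), rounds=5, reset_err=True)
```

In this model, without the reset only the first of the five calls waits and the pending message is never read; the remaining calls return at once, which is the kind of loop that would keep a core busy. With the reset, every call waits and the message is delivered.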
Description
Steps to reproduce the issue:
Describe the results you received:
Check the CPU usage of the SNMP docker and snmp-subagent process
After some profiling, it is clear that the get_all method of swsscommon.SonicV2Connector takes a significant amount of time when called with blocking set to True (it waits for around 60 seconds if the data is not present in Redis). Judging from the logic here LINK, this wait appears to be a busy wait, which would explain the CPU usage spike.
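To make the difference concrete, here is a minimal sketch of a blocking read implemented as a busy wait next to a non-blocking read. `FakeDb` is a hypothetical in-memory stand-in, not the swsscommon implementation, and the short `timeout` value is an assumption chosen so the example finishes quickly (the issue reports a wait of around 60 seconds).

```python
import time


class FakeDb:
    """Hypothetical in-memory stand-in for a Redis-backed connector."""

    def __init__(self):
        self.tables = {}
        self.polls = 0  # how many times the store was checked

    def get_all(self, key, blocking=False, timeout=0.02):
        deadline = time.monotonic() + timeout
        while True:
            self.polls += 1
            if key in self.tables:
                return dict(self.tables[key])
            if not blocking or time.monotonic() >= deadline:
                return {}
            # No sleep and no wait on a notification here: the loop spins
            # until the deadline, which is what keeps a CPU core busy.


blocking_db = FakeDb()
blocking_result = blocking_db.get_all("COUNTERS:oid:0x1", blocking=True)

nonblocking_db = FakeDb()
nonblocking_result = nonblocking_db.get_all("COUNTERS:oid:0x1", blocking=False)
```

Both calls return an empty result when the key is missing, but the blocking call checks the store repeatedly until the timeout expires, while the non-blocking call checks exactly once.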
This behavior is seen particularly after that commit, because that commit delays the initialization of FLEX_COUNTERS, and thereby of the COUNTERS:oid* keys, until counterpoll is enabled.
But the MIB classes in the snmp-subagent periodically poll these DBs with blocking set to True.
As a fix, we can set blocking to False in all these MIB classes. This shouldn't affect how they operate, as they periodically poll Redis anyway; if the data is missing at one instant, it will be fetched within the next 3 to 7 seconds (the default period is 5 seconds, with a random delay between -2 and 2 seconds).
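A rough sketch of why a missed read is harmless under periodic polling. The names `next_delay` and `refresh` are hypothetical, the uniform distribution for the random delay is an assumption, and keeping the previously cached data on a miss is a choice made for this example rather than a description of the actual MIB classes.

```python
import random

DEFAULT_PERIOD = 5  # seconds, the default period mentioned in the issue
JITTER = 2          # seconds, the random delay is taken from [-2, 2]


def next_delay(rng=random):
    """Return the wait before the next poll, between 3 and 7 seconds."""
    return DEFAULT_PERIOD + rng.uniform(-JITTER, JITTER)


def refresh(cache, key, read):
    """Non-blocking refresh: update the cache only if the read returned data."""
    data = read(key)
    if data:
        cache[key] = data
    return cache.get(key, {})


rng = random.Random(0)
delays = [next_delay(rng) for _ in range(1000)]

cache = {}
first = refresh(cache, "COUNTERS:oid:0x1", lambda k: {})            # data not there yet
second = refresh(cache, "COUNTERS:oid:0x1", lambda k: {"a": "1"})   # data has appeared
third = refresh(cache, "COUNTERS:oid:0x1", lambda k: {})            # transient miss
```

A read that finds nothing simply leaves the cache as it was, and the next cycle retries after `next_delay()` seconds, so no call needs to block.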
Describe the results you expected:
Shouldn't see the CPU Usage spike.
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):