[crm] syslog generation handling for single threshold for different b… #1056

Open
wants to merge 2 commits into
base: master


AmitKaushik7

…indpoints

What I did
Updated the threshold check so that the Exceeded Threshold syslog message is generated when usage for any ACL group (bindpoint) exceeds the configured high threshold, and is cleared only when usage for all ACL groups falls below the configured low threshold.

Why I did it
There is a single low/high threshold range for ACL groups (and likewise for ACL tables), but multiple bindpoints. Usage can be low for some bindpoints and high for others, which leads to THRESHOLD_EXCEEDED and THRESHOLD_CLEAR syslog messages being generated continuously at the same time.
The Exceeded Threshold message should be generated only when the resource count for any ACL group exceeds the high threshold, and cleared only when it falls below the low threshold for all ACL groups.
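
As a rough illustration of the intended behaviour (a minimal Python sketch, not the code in this PR): the function name `check_acl_thresholds`, the bindpoint names, and the 85/50 values are assumptions chosen for the example, and the existing suppression of repeated messages is left out.

```python
def check_acl_thresholds(usages, high, low, exceeded):
    """Evaluate one polling pass over all bindpoints of a single CRM resource.

    usages   -- dict mapping a (hypothetical) bindpoint name to usage in percent
    exceeded -- True if an EXCEEDED condition is currently outstanding
    Returns (message or None, new exceeded flag).
    """
    values = list(usages.values())
    # EXCEEDED as soon as ANY bindpoint is above the shared high threshold.
    if any(used > high for used in values):
        return "THRESHOLD_EXCEEDED", True
    # CLEAR only if an EXCEEDED is outstanding and ALL bindpoints are below low.
    if exceeded and values and all(used < low for used in values):
        return "THRESHOLD_CLEAR", False
    # Otherwise stay silent and keep the current state.
    return None, exceeded


# Four polling passes while bindpoint_a releases resources.
polls = [
    {"bindpoint_a": 90, "bindpoint_b": 0},
    {"bindpoint_a": 60, "bindpoint_b": 0},
    {"bindpoint_a": 40, "bindpoint_b": 0},
    {"bindpoint_a": 40, "bindpoint_b": 0},
]
exceeded = False
results = []
for usages in polls:
    message, exceeded = check_acl_thresholds(usages, high=85, low=50, exceeded=exceeded)
    results.append(message)
print(results)
```

In this sketch the idle bindpoint never triggers a CLEAR on its own; the CLEAR appears once, on the first pass where every bindpoint is below the low threshold.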

How I verified it
Configure CRM thresholds for the ACL group resource.
Create ACL groups with different bindpoints such that
for some bindpoints the usage is higher than the configured high threshold and
for some bindpoints the usage is lower than the configured high threshold.
Configure the CRM polling interval.
Observe that the Threshold Exceeded syslog message is generated only once for the ACL group. Verify that no Threshold Clear messages are seen.
Now release the ACL resources and observe that the Clear message is generated only when the ACL group usage for all bindpoints falls below the configured low threshold.

Details if related

@AmitKaushik7
Author

CI:rerun

@AmitKaushik7
Author

Please retest this.

@prsunny
Collaborator

prsunny commented Oct 9, 2019

@AmitKaushik7, as per this doc, "These messages must be suppressed after printing for 10 times.", so you will only see 10 messages, since there is a check in the code to suppress this. I understand the continuous 'EXCEED' message in this case, but why is there a continuous 'CLEAR'?

@AmitKaushik7
Author

EXCEED messages must be suppressed after printing 10 times, and a CLEAR message can appear at most once until another EXCEED message appears for a CRM resource.
We will not see any issue with the v4/v6 nhop, neigh, route, and ecmp resources, as they have a single threshold defined for each CRM resource.
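
To make the suppression rule concrete, here is a small Python sketch of it. The class `ThresholdLogger`, its method names, and the message strings are hypothetical; only the limit of 10 and the "CLEAR at most once" rule come from the discussion above.

```python
EXCEEDED_MSG_MAX = 10  # limit quoted from the doc referenced above


class ThresholdLogger:
    """Hypothetical per-resource logger that applies the suppression rule."""

    def __init__(self):
        self.exceeded_count = 0  # EXCEED messages printed since the last CLEAR
        self.lines = []          # stands in for syslog output

    def report_exceeded(self):
        # Print EXCEED only until the limit is reached, then stay silent.
        if self.exceeded_count < EXCEEDED_MSG_MAX:
            self.lines.append("THRESHOLD_EXCEEDED")
            self.exceeded_count += 1

    def report_clear(self):
        # CLEAR is printed only if an EXCEED was printed before it,
        # and it re-arms the EXCEED counter.
        if self.exceeded_count > 0:
            self.lines.append("THRESHOLD_CLEAR")
            self.exceeded_count = 0


logger = ThresholdLogger()
for _ in range(15):          # 15 polls above the high threshold
    logger.report_exceeded()
logger.report_clear()        # usage drops below the low threshold
logger.report_clear()        # a second CLEAR attempt prints nothing
print(len(logger.lines))
```

With one counter per resource this behaves as expected: 10 EXCEED lines followed by a single CLEAR.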

The issue arises when multiple bindpoints/table entries share a common high/low threshold. For ACL group/table, we have a single threshold for multiple bindpoints/multiple tables.
Here we will observe alternating EXCEED and CLEAR messages continuously.

Say the high threshold for ACL table is configured as 85% and the low threshold as 50%, common to all bindpoints (for a single CRM resource, e.g. ACL table/group).
If the usage for one bindpoint is 90% and the usage for another bindpoint is 0%, then we will get alternating EXCEED-CLEAR messages (one for each bindpoint/entry).
With the polling interval reduced to 1, we will see these messages every second.
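
The scenario can be reproduced with a simplified Python model of a per-bindpoint check. This is a sketch under the assumption that the log counter is kept per resource and shared by all bindpoints; the function name `poll_per_bindpoint` and the bindpoint names are made up for the example.

```python
EXCEEDED_MSG_MAX = 10  # suppression limit mentioned earlier in the thread


def poll_per_bindpoint(usages, high, low, state):
    """One polling pass that checks every bindpoint against the shared thresholds.

    state holds a single 'exceeded_log_counter' shared by all bindpoints of
    the resource (an assumption of this model).
    """
    messages = []
    for bindpoint, used in usages.items():
        if used > high and state["exceeded_log_counter"] < EXCEEDED_MSG_MAX:
            messages.append(f"THRESHOLD_EXCEEDED {bindpoint} {used}%")
            state["exceeded_log_counter"] += 1
        elif used < low and state["exceeded_log_counter"] > 0:
            # The idle bindpoint clears the counter raised by the busy one.
            messages.append(f"THRESHOLD_CLEAR {bindpoint} {used}%")
            state["exceeded_log_counter"] = 0
    return messages


state = {"exceeded_log_counter": 0}
usages = {"bindpoint_a": 90, "bindpoint_b": 0}
log = []
for _ in range(3):  # three polling intervals
    log.extend(poll_per_bindpoint(usages, high=85, low=50, state=state))
for line in log:
    print(line)
```

Because the idle bindpoint resets the shared counter on every pass, the counter never reaches the suppression limit, so in this model the EXCEED/CLEAR pair repeats on every polling interval.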

The fix is to report the EXCEED message only if the usage of any bindpoint exceeds the high threshold, and to report CLEAR only when the usage of all bindpoints of that resource is below the low threshold for the resource.

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
1. show fg-nhg-hash-view
   (shows the current hash bucket view of fg nhg)

2. show fg-nhg-active-hops
   (shows which set of next-hops are active)
@prsunny prsunny self-requested a review as a code owner September 2, 2022 23:17