Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warm reboot: How the FlexCounter was handled after init view? #862

Closed
shuaishang opened this issue Jul 15, 2021 · 13 comments
Closed

Warm reboot: How the FlexCounter was handled after init view? #862

shuaishang opened this issue Jul 15, 2021 · 13 comments

Comments

@shuaishang
Copy link
Collaborator

shuaishang commented Jul 15, 2021

Description
With the buffer pool in config db, then do warm reboot, the syncd can't handle the FlexCounter with new VID, in "processFlexCounterEvent".
Even after apply view, the syncd should not be able to get the counter for this new VID.

Steps to reproduce the issue:

  1. the config db has the buffer_pool.
  2. do warm reboot
  3. check syslog

config db:
"BUFFER_POOL": { "egress_pool": { "mode": "dynamic", "size": "100000", "type": "egress" } },

Describe the results you received:
syslog after warmboot:
Jul 15 07:40:46.978278 sonic ERR syncd#syncd: :- translateVidToRid: unable to get RID for VID oid:0x18000000000647
Jul 15 07:40:46.978385 sonic WARNING syncd#syncd: :- processFlexCounterEvent: port VID oid:0x18000000000647, was not found (probably port was removed/splitted) and will remove from counters now
Jul 15 07:40:46.978474 sonic NOTICE syncd#syncd: :- removeBufferPool: Trying to remove nonexisting buffer pool 0x18000000000647 from flex counter BUFFER_POOL_WATERMARK_STAT_COUNTER

Describe the results you expected:
Error free warmboot, and buffer pool counter should be collected correctly by Flexcounter after warmboot.

Additional information you deem important (e.g. issue happens only occasionally):
Attach sai rec
sairedis.rec.zip

The buffer pool VID before warm reboot is 0x180000000005cd.
After warm reboot, in init view, the buffer pool VID is 0x18000000000647, the orchagent attach the flex counter to the new VID, but syncd failed because no RID yet.

Even after apply view, the syncd should not be able to get the counter for this new VID 0x180000000005cd.
Because the processFlexCounterEvent will not be called for this new VID after apply view.

@shuaishang
Copy link
Collaborator Author

Because there is no RID in INIT_VIEW, for objects which VID may change between different boot, does it mean the orchagent can't do the following for them?

  1. get the object after create. (macsecorch, get_macsec_attribute should be called after create_macsec)
  2. attach the flexcounter for it. (for example, this case)

@kcudnik
Copy link
Collaborator

kcudnik commented Jul 15, 2021

hey, i don't exactly know how flex counters are handled in orchagent, but subscribe for flex counters should be done after apply view, and it does not matter whether this is cold or warm boot.

  1. yes, if you create object in init view, you can't do GET operation on it, exception is create_switch, because in apply view stage those new VID will be matched by comparison logic with possibly existing objects already to match RID, to provide minimal dataplane disruption.
  2. if i remember correctly, now a RID value for counter is required to subscribe for a counter, that's why OA should subscribe to flex after apply view, and even if we would move that ability to subscribe for VID, then during init-view phase we still would not be able to query those, since RID was not matched yet. so overall result is to subscribe for counters after apply view

@shuaishang
Copy link
Collaborator Author

But currently, in OA warmboot process, warmRestoreAndSyncUp, it restored the data from APP_DB and processed them before APPLY_VIEW.
It means buffer orch subscribed the flex counter, macsec orch called get_macsec_attribute before APPLY_VIEW.
There is no difference in OA doTask between init and apply view.

@kcudnik
Copy link
Collaborator

kcudnik commented Jul 15, 2021

this is design problem on OA and should be addressed on OA side, sairedis/syncd will not know which RID/VID is which before apply view in warm boot mode

@shuaishang
Copy link
Collaborator Author

@lguohan could you pls help to check this issue?

@kcudnik
Copy link
Collaborator

kcudnik commented Jul 16, 2021

@liat-grozovik
Copy link
Collaborator

@stephenxs can you please have a look ?

@shuaishang
Copy link
Collaborator Author

Any update for this issue?

@stephenxs
Copy link
Contributor

Any update for this issue?

I'm checking and will share any findings once I have them.

@shuaishang
Copy link
Collaborator Author

Any progress?

@stephenxs
Copy link
Contributor

There is no guarantee that the OIDs of buffer pools keep unchanged across warm reboots. It will always regenerate the counter ID list for buffer pools by calling BufferOrch::generateBufferPoolWatermarkCounterIdList when the system (cold or warm) starts. So OIDs changed across warm reboot is not the issue.

The issue is that the orchagent should NOT clear counters before APPLY_VIEW. I think it is caused by BufferOrch::generateBufferPoolWatermarkCounterIdList as well.

When the system starts, enable_counters will wait for ~180 seconds and then push the field {"FLEX_COUNTER_STATUS": "enable"} into CONFIG_DB.FLEX_COUNTER_TABLE|BUFFER_POOL_WATERMARK, meaning to enable the buffer pool watermark.
Flex counter orch is listening to this table and will call BufferOrch::generateBufferPoolWatermarkCounterIdList to push the counter IDs list to FLEX_COUNTER_DB once it heard the field.
By default, the watermark is fetched in read-and-clear mode which is not supported by all vendors.
To avoid further errors, a logic was introduced to check the capability of the read-and-clear by clearing the counters and checking the return value.

From the attached sairedis.rec we see there was clear_stats before APPLY_VIEW, which indicates generateBufferPoolWatermarkCounterIdList was called before APPLY_VIEW.

2021-07-15.07:40:46.976150|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000647|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
2021-07-15.07:40:46.977456|Q|clear_stats|SAI_STATUS_INVALID_OBJECT_ID
2021-07-15.07:40:46.983732|a|APPLY_VIEW
2021-07-15.07:40:47.894567|A|SAI_STATUS_SUCCESS

Even though there is no guarantee that generateBufferPoolWatermarkCounterIdList is always called after APPLY_VIEW, it is unlikely to be called before APPLY_VIEW given that there is a 180 seconds delay before it is triggered.

Can you check in your platform

  • Probably the syncd docker starts a bit late in your platform?
  • Is there a delay logic in enable_counters.py?

In case the logic exists but generateBufferPoolWatermarkCounterIdList is still called before APPLY_VIEW in your platform, I have some preliminary thoughts to handle it:

  • Buffer Orch not to test the capability of CLEAR. SAI redis to test it when buffer pool counter IDs are inserted for the first time, just like what has been done for counter IDs in setBufferPoolCounterList
  • Don't try clearing counters immediately after CONFIG_DB.FLEX_COUNTER_TABLE|BUFFER_POOL_WATERMARK is updated in a warm reboot flow. Just record it instead and trigger clearing counters after APPLY_VIEW has been triggered.
  • To find another way to test whether CLEAR is supported by the vendor. Eg., reading some capabilities. Not sure whether it is supported and need further check.

@shuaishang @qiluo-msft @kcudnik @liat-grozovik FYI.

@qiluo-msft
Copy link
Contributor

qiluo-msft commented Aug 23, 2021

@stephenxs You mentioned

Even though there is no guarantee that generateBufferPoolWatermarkCounterIdList is always called after APPLY_VIEW

Actually it is possible. The orchagent warm-reboot includes below steps. (check the code in OrchDaemon::warmRestoreAndSyncUp)

  1. bake(): it is a virtual function in every Orch subclass. The default implementation is just loaded Redis data into Consumer objects. In the warm-reboot shutdown process, we stored Redis data into files, and in the warm-reboot bootup process, we load files into Redis data, and erase all the data not relating warm-reboot. bake() could be reimplemented in subclass, in order to customize. So if you want to clean ConfigDB entries, please reimplement bake() or just erase them.
  2. doTask() will consume all the data loaded in the Consumer objects in last step. Be aware we are only processing data from init Redis content, ie, indirectly from files.
  3. apply view
  4. doTask() will start process data from upstream. Even there are upstream data coming during above steps, they are waiting and not processed until this step.

So you could take advantage of Step 1 and Step 4 to make sure the guarantee.

liat-grozovik pushed a commit to sonic-net/sonic-swss that referenced this issue Nov 24, 2021
)

- What I did
Don't handle buffer pool watermark during warm reboot reconciling

- Why I did it
This is to fix the community issue sonic-net/sonic-sairedis#862 and sonic-net/sonic-buildimage#8722

- How I verified it
Perform a warm reboot. Check whether

buffer pool watermark handling is skipped during reconciling and handled after it.
other watermark handling is handled during reconciling as it was before.
Details if related
The warm reboot flow is like this:

System starts. Orchagent fetches the items from database stored before warm reboot and pushes them into m_toSync of all orchagents. This is done by bake, which can be overridden by sub orchagent.
All sub orchagents handle the items in m_toSync. At this point, any notification from redis-db is blocked.
Warm reboot converges.
Orchagent starts to handle notifications from redis-db.
The fix is like this: in FlexCounterOrch::bake. the buffer pool watermark handling is skipped.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
judyjoseph pushed a commit to sonic-net/sonic-swss that referenced this issue Dec 1, 2021
)

- What I did
Don't handle buffer pool watermark during warm reboot reconciling

- Why I did it
This is to fix the community issue sonic-net/sonic-sairedis#862 and sonic-net/sonic-buildimage#8722

- How I verified it
Perform a warm reboot. Check whether

buffer pool watermark handling is skipped during reconciling and handled after it.
other watermark handling is handled during reconciling as it was before.
Details if related
The warm reboot flow is like this:

System starts. Orchagent fetches the items from database stored before warm reboot and pushes them into m_toSync of all orchagents. This is done by bake, which can be overridden by sub orchagent.
All sub orchagents handle the items in m_toSync. At this point, any notification from redis-db is blocked.
Warm reboot converges.
Orchagent starts to handle notifications from redis-db.
The fix is like this: in FlexCounterOrch::bake. the buffer pool watermark handling is skipped.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Copy link
Contributor

Guys, can we close this bug as it has been resolved by sonic-net/sonic-swss#1987?

qiluo-msft pushed a commit to sonic-net/sonic-swss that referenced this issue Jan 10, 2022
)

- What I did
Don't handle buffer pool watermark during warm reboot reconciling

- Why I did it
This is to fix the community issue sonic-net/sonic-sairedis#862 and sonic-net/sonic-buildimage#8722

- How I verified it
Perform a warm reboot. Check whether

buffer pool watermark handling is skipped during reconciling and handled after it.
other watermark handling is handled during reconciling as it was before.
Details if related
The warm reboot flow is like this:

System starts. Orchagent fetches the items from database stored before warm reboot and pushes them into m_toSync of all orchagents. This is done by bake, which can be overridden by sub orchagent.
All sub orchagents handle the items in m_toSync. At this point, any notification from redis-db is blocked.
Warm reboot converges.
Orchagent starts to handle notifications from redis-db.
The fix is like this: in FlexCounterOrch::bake. the buffer pool watermark handling is skipped.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants