syncd translateVidToRid failures for ingress and egress buffer pools #6726

Closed
vaibhavhd opened this issue Feb 8, 2021 · 5 comments


@vaibhavhd
Contributor

vaibhavhd commented Feb 8, 2021

Description
After the orchagent warm-start state changes to initialized, syncd logs VID-to-RID translation errors for the OIDs of three buffer pools: egress_lossless_pool, egress_lossy_pool and ingress_lossless_pool.

This further leads to failures when syncd executes operations on the ASIC: the SET calls for the 3 pools fail with:
api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0

Ultimately, orchagent fails to process these buffer tasks, and the failed buffer tasks are dropped.

Steps to reproduce the issue:

  1. Warm-reboot the DUT.
  2. Check syslog.
  3. Alternatively, run the test_warm_reboot pytest.
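
For step 2, the two error signatures can be pulled out of syslog with a short script. This is a minimal sketch, assuming the syslog line format quoted in this issue; `scan_syslog` and `scan_file` are hypothetical helper names, not part of any SONiC tool.

```python
import re
from collections import defaultdict

# Patterns copied from the syslog lines quoted in this issue.
VID_TO_RID_RE = re.compile(r"translateVidToRid: unable to get RID for VID (oid:0x[0-9a-f]+)")
NOT_IMPL_RE = re.compile(r"SAI_STATUS_ATTR_NOT_IMPLEMENTED_0")


def scan_syslog(lines):
    """Count translateVidToRid failures per VID and lines reporting ATTR_NOT_IMPLEMENTED_0."""
    vid_failures = defaultdict(int)
    not_implemented = 0
    for line in lines:
        m = VID_TO_RID_RE.search(line)
        if m:
            vid_failures[m.group(1)] += 1
        elif NOT_IMPL_RE.search(line):
            not_implemented += 1
    return dict(vid_failures), not_implemented


def scan_file(path):
    # errors="replace" keeps scanning even if the log contains undecodable bytes.
    with open(path, errors="replace") as f:
        return scan_syslog(f)
```

For example, `scan_file("/var/log/syslog")` after a warm reboot would list the buffer pool VIDs (oid:0x18000000000eb0 and its neighbours) if the issue reproduces. Note that both the syncd and the orchagent lines mention `SAI_STATUS_ATTR_NOT_IMPLEMENTED_0`, so each failed SET is counted twice.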

Describe the results you received:

Feb  8 16:35:22.162209 str-s6100-acs-2 NOTICE swss#orchagent: :- setWarmStartState: orchagent warm start state changed to initialized


Feb  8 16:35:43.497730 str-s6100-acs-2 ERR syncd#syncd: :- translateVidToRid: unable to get RID for VID oid:0x18000000000eb0
Feb  8 16:35:43.497821 str-s6100-acs-2 WARNING syncd#syncd: :- processClearStatsEvent: VID to RID translation failure: SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000eb0
Feb  8 16:35:43.503650 str-s6100-acs-2 ERR syncd#syncd: :- translateVidToRid: unable to get RID for VID oid:0x18000000000eb1
Feb  8 16:35:43.503710 str-s6100-acs-2 WARNING syncd#syncd: :- processClearStatsEvent: VID to RID translation failure: SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000eb1
Feb  8 16:35:43.510011 str-s6100-acs-2 ERR syncd#syncd: :- translateVidToRid: unable to get RID for VID oid:0x18000000000eb2
Feb  8 16:35:43.510011 str-s6100-acs-2 WARNING syncd#syncd: :- processClearStatsEvent: VID to RID translation failure: SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000eb2
Feb  8 16:35:49.743413 str-s6100-acs-2 ERR syncd#syncd: :- translateVidToRid: unable to get RID for VID oid:0x18000000000eb0

Feb  8 16:35:49.743879 str-s6100-acs-2 WARNING syncd#syncd: :- processFlexCounterEvent: port VID oid:0x18000000000eb0, was not found (probably port was removed/splitted) and will remove from counters now
Feb  8 16:35:49.744266 str-s6100-acs-2 NOTICE syncd#syncd: :- removeBufferPool: Trying to remove nonexisting buffer pool 0x18000000000eb0 from flex counter BUFFER_POOL_WATERMARK_STAT_COUNTER
Feb  8 16:35:49.746247 str-s6100-acs-2 ERR syncd#syncd: :- translateVidToRid: unable to get RID for VID oid:0x18000000000eb1
Feb  8 16:35:49.746644 str-s6100-acs-2 WARNING syncd#syncd: :- processFlexCounterEvent: port VID oid:0x18000000000eb1, was not found (probably port was removed/splitted) and will remove from counters now
Feb  8 16:35:49.746944 str-s6100-acs-2 NOTICE syncd#syncd: :- removeBufferPool: Trying to remove nonexisting buffer pool 0x18000000000eb1 from flex counter BUFFER_POOL_WATERMARK_STAT_COUNTER
Feb  8 16:35:49.750523 str-s6100-acs-2 ERR syncd#syncd: :- translateVidToRid: unable to get RID for VID oid:0x18000000000eb2
Feb  8 16:35:49.750942 str-s6100-acs-2 WARNING syncd#syncd: :- processFlexCounterEvent: port VID oid:0x18000000000eb2, was not found (probably port was removed/splitted) and will remove from counters now
Feb  8 16:35:49.751252 str-s6100-acs-2 NOTICE syncd#syncd: :- removeBufferPool: Trying to remove nonexisting buffer pool 0x18000000000eb2 from flex counter BUFFER_POOL_WATERMARK_STAT_COUNTER

Feb  8 16:35:50.798899 str-s6100-acs-2 NOTICE swss#orchagent: :- setWarmStartState: orchagent warm start state changed to restored
Feb  8 16:35:54.437827 str-s6100-acs-2 NOTICE syncd#syncd: :- executeOperationsOnAsic: operations to execute on ASIC: 56

Feb  8 16:36:02.615238 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.615516 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x18000000000eb0 RID: oid:0x11800000001
Feb  8 16:36:02.615516 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_POOL_ATTR_SIZE: 15982720
Feb  8 16:36:02.615938 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.616151 str-s6100-acs-2 ERR swss#orchagent: :- processBufferPool: Failed to modify buffer pool, name:egress_lossless_pool, sai object:18000000000eb0, status:-196608
Feb  8 16:36:02.616312 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it

Feb  8 16:36:02.616749 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.616977 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x18000000000eb1 RID: oid:0x11800000002
Feb  8 16:36:02.616977 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_POOL_ATTR_SIZE: 9243812
Feb  8 16:36:02.617472 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.617690 str-s6100-acs-2 ERR swss#orchagent: :- processBufferPool: Failed to modify buffer pool, name:egress_lossy_pool, sai object:18000000000eb1, status:-196608
Feb  8 16:36:02.617847 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it

Feb  8 16:36:02.629101 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.629101 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x19000000000eb3 RID: oid:0x1900000006
Feb  8 16:36:02.629101 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_PROFILE_ATTR_SHARED_STATIC_TH: 15982720
Feb  8 16:36:02.629635 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.629836 str-s6100-acs-2 ERR swss#orchagent: :- processBufferProfile: Failed to modify buffer profile, name:egress_lossless_profile, sai object:19000000000eb3, status:-196608
Feb  8 16:36:02.629983 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it

Feb  8 16:36:02.630618 str-s6100-acs-2 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.630618 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: VID: oid:0x18000000000eb2 RID: oid:0x1800000001
Feb  8 16:36:02.630687 str-s6100-acs-2 ERR syncd#syncd: :- processQuadEvent: attr: SAI_BUFFER_POOL_ATTR_SIZE: 10875072
Feb  8 16:36:02.631476 str-s6100-acs-2 ERR swss#orchagent: :- set: set status: SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
Feb  8 16:36:02.631728 str-s6100-acs-2 ERR swss#orchagent: :- processBufferPool: Failed to modify buffer pool, name:ingress_lossless_pool, sai object:18000000000eb2, status:-196608
Feb  8 16:36:02.631887 str-s6100-acs-2 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
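
The `status:-196608` value in the orchagent lines is the numeric form of `SAI_STATUS_ATTR_NOT_IMPLEMENTED_0`: SAI encodes attribute-related failures as the negation of a range base plus the index of the failing attribute, and 196608 is 0x30000. A small sketch that extracts the object name, OID and decoded status from these lines (helper names are hypothetical; the ranges follow SAI's `saistatus.h`):

```python
import re

# Attribute-indexed status ranges from SAI's saistatus.h, where
# SAI_STATUS_CODE(x) is -(x) and the low 16 bits hold the attribute index.
ATTR_STATUS_BASES = {
    0x00010000: "SAI_STATUS_INVALID_ATTRIBUTE",
    0x00020000: "SAI_STATUS_INVALID_ATTR_VALUE",
    0x00030000: "SAI_STATUS_ATTR_NOT_IMPLEMENTED",
    0x00040000: "SAI_STATUS_UNKNOWN_ATTRIBUTE",
    0x00050000: "SAI_STATUS_ATTR_NOT_SUPPORTED",
}

# Matches the processBufferPool / processBufferProfile failure lines above.
FAIL_RE = re.compile(
    r"processBuffer(?:Pool|Profile): Failed to modify buffer \w+, "
    r"name:(?P<name>[^,]+), sai object:(?P<oid>[0-9a-f]+), status:(?P<status>-?\d+)"
)


def decode_attr_status(status):
    """Map a negative attribute-indexed sai_status_t to its symbolic name."""
    code = -status
    base, index = code & 0xFFFF0000, code & 0xFFFF
    name = ATTR_STATUS_BASES.get(base)
    return f"{name}_{index}" if name else f"unknown status {status}"


def parse_failures(lines):
    """Return (object name, OID, decoded status) for each orchagent buffer failure line."""
    out = []
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            out.append((m["name"], "oid:0x" + m["oid"], decode_attr_status(int(m["status"]))))
    return out
```

Run over the lines above, this pairs egress_lossless_pool with oid:0x18000000000eb0, egress_lossy_pool with oid:0x18000000000eb1 and ingress_lossless_pool with oid:0x18000000000eb2, which ties the earlier translateVidToRid VIDs to pool names.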

Describe the results you expected:
Error-free warm reboot.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**
# show ver

SONiC Software Version: SONiC.HEAD.364-092f5378
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: 092f5378
Build date: Mon Feb  8 02:02:06 UTC 2021
Built by: johnar@jenkins-worker-22

Platform: x86_64-dell_s6100_c2538-r0
HwSKU: Force10-S6100
ASIC: broadcom
# docker exec -it syncd dpkg -s libsaibcm
Package: libsaibcm
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 232740
Maintainer: Guohan Lu <gulv@microsoft.com>
Architecture: amd64
Source: saibcm
Version: 4.3.0.10-4
Provides: libsai
**Attach debug file `sudo generate_dump`:**

6726-logs.zip

@kcudnik
Contributor

kcudnik commented Feb 8, 2021

Those translations are related to collecting stats. Are you sure those objects were still available and the VIDTORID map was intact after the warm boot?

@kcudnik
Contributor

kcudnik commented Feb 8, 2021

From the logs, look at what's happening here:
syncd: :- executeOperationsOnAsic: operations to execute on ASIC: 56

At this point syncd has not yet fully transitioned to the new view generated by orchagent, so the OIDs you see there may be invalid at that moment. For example:
OA (orchagent) warm booted and created a new ASIC view. This view contains new buffer pools that need to be matched during the apply-view transition. Since they don't have RID values assigned yet, only VIDs, processClearStatsEvent is called with the new VID value before syncd starts the transition.

To confirm this, I will need the sairedis.rec from that scenario.
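
The sequence described in this comment can be modeled with a toy VID-to-RID map. This is a hypothetical simplification, not the sairedis code: it only captures that objects created between INIT_VIEW and APPLY_VIEW carry a VID but have no RID until the apply-view transition matches them. The VID/RID pair used below comes from the processQuadEvent log above.

```python
class ToyVidRidMap:
    """Hypothetical model of syncd's VID->RID map around a warm boot.

    Models one fact from the discussion: objects created between INIT_VIEW
    and APPLY_VIEW have a VID but no RID until APPLY_VIEW matches them.
    """

    def __init__(self):
        self.vid_to_rid = {}       # VIDs already matched to ASIC objects
        self.pending = set()       # VIDs created in the temporary (new) view
        self.in_init_view = False

    def init_view(self):
        self.in_init_view = True

    def create(self, vid):
        if not self.in_init_view:
            raise RuntimeError("toy model only handles creates during INIT_VIEW")
        # Only a VID exists at this point; no RID has been assigned.
        self.pending.add(vid)

    def apply_view(self, matches):
        # `matches` (VID -> RID) stands in for syncd's view comparison logic.
        for vid in self.pending:
            self.vid_to_rid[vid] = matches[vid]
        self.pending.clear()
        self.in_init_view = False

    def translate_vid_to_rid(self, vid):
        if vid not in self.vid_to_rid:
            raise KeyError(f"unable to get RID for VID {vid}")
        return self.vid_to_rid[vid]
```

Calling `translate_vid_to_rid("oid:0x18000000000eb0")` between `init_view()` and `apply_view(...)` raises, mirroring the `translateVidToRid: unable to get RID` errors; after `apply_view({"oid:0x18000000000eb0": "oid:0x11800000001"})` the same lookup succeeds.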

@vaibhavhd
Contributor Author

> to confirm this i will need sairedis.rec from that scenario

I just attached the logs to the ticket. In the sairedis recording, I do see these objects after the warm reboot, for example:

./sairedis.rec.3.gz:
2021-02-08.16:35:29.190226|c|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000eb0|SAI_BUFFER_POOL_ATTR_THRESHOLD_MODE=SAI_BUFFER_POOL_THRESHOLD_MODE_STATIC|SAI_BUFFER_POOL_ATTR_SIZE=15982720|SAI_BUFFER_POOL_ATTR_TYPE=SAI_BUFFER_POOL_TYPE_EGRESS
2021-02-08.16:35:29.201618|c|SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000eb3|SAI_BUFFER_PROFILE_ATTR_POOL_ID=oid:0x18000000000eb0|SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE=1518|SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE=SAI_BUFFER_PROFILE_THRESHOLD_MODE_STATIC|SAI_BUFFER_PROFILE_ATTR_SHARED_STATIC_TH=15982720
2021-02-08.16:35:43.492237|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x18000000000eb0|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
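
A sketch for spotting this ordering in a recording. It assumes the `|`-separated layout visible in the three records above (`c` records carry the object key in field 2; `q` records carry the query name, e.g. `clear_stats`, in field 2 and the key in field 3), and it assumes the view switch appears as a record containing the text `APPLY_VIEW`, which is not shown in this excerpt. Function names are hypothetical.

```python
def parse_rec(line):
    """Split one sairedis.rec record into (timestamp, op, action, key), or None."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 3:
        return None
    ts, op = fields[0], fields[1]
    if op == "q" and len(fields) >= 4:
        return ts, op, fields[2], fields[3]  # e.g. q|clear_stats|<key>|...
    return ts, op, None, fields[2]           # e.g. c|<key>|attr=value|...


def clear_stats_before_apply_view(lines):
    """Return (timestamp, key) for clear_stats queries issued, before any
    APPLY_VIEW record, on objects created earlier in the same recording."""
    applied = False
    created = set()
    early = []
    for line in lines:
        if "APPLY_VIEW" in line:  # assumption: marker text appears in the record
            applied = True
            continue
        rec = parse_rec(line)
        if rec is None:
            continue
        ts, op, action, key = rec
        if op == "c":
            created.add(key)
        elif op == "q" and action == "clear_stats" and key in created and not applied:
            early.append((ts, key))
    return early
```

On the three records above, this reports the clear_stats query at 16:35:43.492237 for oid:0x18000000000eb0, an object created at 16:35:29 in the new view.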

@kcudnik
Copy link
Contributor

kcudnik commented Feb 9, 2021

Yes, this is exactly what I mentioned: those objects are created between INIT_VIEW and APPLY_VIEW, so their VIDs are not yet matched with existing objects (or new objects would need to be created).
clear_stats needs to be called after APPLY_VIEW; that will solve the issue.
As for the other "not implemented" errors, those are returned by the SAI vendor implementation.
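
The suggested fix, issuing clear_stats only after APPLY_VIEW, could look roughly like the caller-side wrapper below. It is a hypothetical sketch, not the change that was eventually made: `DeferredClearStats` and the `send` callback are illustrative names, and `send` stands in for whatever actually issues the clear-stats request.

```python
class DeferredClearStats:
    """Hypothetical wrapper: hold clear_stats requests until APPLY_VIEW is done."""

    def __init__(self, send):
        self._send = send            # stand-in for the real clear-stats call
        self._view_applied = False
        self._pending = []

    def clear_stats(self, key, counters):
        if self._view_applied:
            self._send(key, counters)
        else:
            # The VID may not have a RID yet, so queue instead of failing translation.
            self._pending.append((key, list(counters)))

    def on_apply_view(self):
        # Replay queued requests in their original order once the view is applied.
        self._view_applied = True
        pending, self._pending = self._pending, []
        for key, counters in pending:
            self._send(key, counters)
```

Deferring rather than dropping keeps the intent of the early request (start watermarks from zero) while guaranteeing that syncd only sees VIDs it can already translate.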

@vaibhavhd
Copy link
Contributor Author

There are two issues here:

  1. The translateVidToRid errors (explained above) occur because processClearStatsEvent is called before APPLY_VIEW. I will work on this.
  2. The SAI_STATUS_ATTR_NOT_IMPLEMENTED_0 messages for the buffer pool objects are caused by the changes made in the PR [Dynamic buffer calc] Support dynamic buffer calculation sonic-swss#1338.
    The buffer pool objects reported as not implemented have now been moved to APP_DB.
    Earlier, when these objects were in CONFIG_DB, the CREATE API was called for them.
    Now the SET API is called without a preceding CREATE, so syncd reports these errors because these objects have not been created.
    @neethajohn is looking into this.
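
To illustrate the CREATE versus SET distinction in item 2, here is a hypothetical handler that issues CREATE when a pool has no OID yet and SET only afterwards. It is not the orchagent processBufferPool logic or the eventual fix; `RecordingSai` is a stand-in that only records calls, and skipping SET for unchanged values is an extra design choice of this sketch.

```python
class RecordingSai:
    """Stand-in for a SAI layer: records calls instead of programming an ASIC."""

    def __init__(self):
        self.calls = []

    def create(self, attrs):
        self.calls.append(("create", attrs))
        return f"oid:0x{len(self.calls):x}"  # fake OID for illustration

    def set(self, oid, attr, value):
        self.calls.append(("set", oid, attr, value))


class HypotheticalBufferPoolHandler:
    """Illustrative only: pick CREATE or SET depending on whether the pool exists."""

    def __init__(self, sai):
        self.sai = sai
        self.pools = {}  # pool name -> (oid, attributes last sent to SAI)

    def apply(self, name, attrs):
        if name not in self.pools:
            # First time this pool is seen: CREATE it with all attributes.
            oid = self.sai.create(dict(attrs))
            self.pools[name] = (oid, dict(attrs))
            return oid
        oid, current = self.pools[name]
        for attr, value in attrs.items():
            # Pool already exists: SET only attributes whose value changed.
            if current.get(attr) != value:
                self.sai.set(oid, attr, value)
                current[attr] = value
        return oid
```

With this shape, re-applying the same configuration after a warm reboot produces no SET calls at all, and a SET is only ever issued against an OID this handler created.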

daall closed this as completed on Apr 28, 2021