Don't handle buffer pool watermark during warm reboot reconciling #1987
…Y_VIEW This is because Signed-off-by: Stephen Sun <stephens@nvidia.com>
… during warm reboot Signed-off-by: Stephen Sun <stephens@nvidia.com>
Test record:
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
@stephenxs So this issue exists because we call
@stepanblyschak
- What I did
Don't handle buffer pool watermark during warm reboot reconciling
- Why I did it
This is to fix the community issue sonic-net/sonic-sairedis#862 and sonic-net/sonic-buildimage#8722
- How I verified it
Perform a warm reboot. Check whether buffer pool watermark handling is skipped during reconciling and handled after it. Other watermark handling is performed during reconciling as before.
- Details if related
The warm reboot flow is like this:
1. System starts.
2. Orchagent fetches the items stored in the database before the warm reboot and pushes them into m_toSync of all orchagents. This is done by bake, which can be overridden by a sub orchagent.
3. All sub orchagents handle the items in m_toSync. At this point, any notification from redis-db is blocked.
4. Warm reboot converges.
5. Orchagent starts to handle notifications from redis-db.
The fix is like this: in FlexCounterOrch::bake, the buffer pool watermark handling is skipped.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)
Signed-off-by: Stephen Sun stephens@nvidia.com
…OOM_WATERMARK_BYTES on a pool where it is not supported (#1857) (#2106)
- What I did
Currently, SAI_BUFFER_POOL_STAT_WATERMARK_BYTES and SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES are queried for buffer pools. However, the latter is not supported on all pools or on all platforms, which causes sairedis to complain. To avoid that, we need to test whether it is supported before putting it into the FLEX_COUNTER table. This depends on #1987 being cherry-picked to 202012.
- Why I did it
[Bugfix] Don't query SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES if it is not supported by a pool
- How I verified it
Ran vstest and tested manually.
… replace operation in NoDependencyMoveValidator (sonic-net#1987)

#### What I did
Using `SimulatedConfig` instead of `TargetConfig` for validating a move using `NoDependencyMoveValidator`

SimulatedConfig: config after applying the given move
TargetConfig: the final config state that's required by the patch

The problem is that if the move updates a list item, the item's location in the `TargetConfig` might differ from its location in the `CurrentConfig`. The reason is that the `TargetConfig` holds the final outcome after applying the `patch`, while the move might make only a small change towards this goal.

Example:
Assume current_config
```
{
    "VLAN": {
        "Vlan100": {
            "vlanid": "100",
            "dhcp_servers": [
                "192.0.0.1",
                "192.0.0.2"
            ]
        }
    }
}
```
TargetConfig:
```
{
    "VLAN": {
        "Vlan100": {
            "vlanid": "100",
            "dhcp_servers": [
                "192.0.0.3"
            ]
        }
    }
}
```
Move:
```python
ps.JsonMove(diff, OperationType.REPLACE, ["VLAN", "Vlan100", "dhcp_servers", 1], ["VLAN", "Vlan100", "dhcp_servers", 0])
```
The move means:
```
Replace the value in CurrentConfig that has path `/VLAN/Vlan100/dhcp_servers/1` with the value from the TargetConfig that has the path `/VLAN/Vlan100/dhcp_servers/0`
```
Notice how the array index in CurrentConfig does not exist in TargetConfig.

Instead of using TargetConfig to validate, use SimulatedConfig, which is the config after applying the move. In this case it would be:
```
{
    "VLAN": {
        "Vlan100": {
            "vlanid": "100",
            "dhcp_servers": [
                "192.0.0.1",
                "192.0.0.3"
            ]
        }
    }
}
```

#### How I did it
Replace `diff.target_config` with `simulated_config`

#### How to verify it
Added a unit test
What I did
Don't handle buffer pool watermark during warm reboot reconciling
Why I did it
This is to fix the community issue sonic-net/sonic-sairedis#862 and sonic-net/sonic-buildimage#8722
How I verified it
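Perform a warm reboot. Check whether buffer pool watermark handling is skipped during reconciling and handled after it. Other watermark handling is performed during reconciling as before.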
Details if related
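The warm reboot flow is like this:
1. System starts.
2. Orchagent fetches the items stored in the database before the warm reboot and pushes them into m_toSync of all orchagents. This is done by bake, which can be overridden by a sub orchagent.
3. All sub orchagents handle the items in m_toSync. At this point, any notification from redis-db is blocked.
4. Warm reboot converges.
5. Orchagent starts to handle notifications from redis-db.

The fix is like this: in FlexCounterOrch::bake, the buffer pool watermark handling is skipped.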
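
The flow and the fix can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual sonic-swss code: `SketchOrch`, `SketchFlexCounterOrch`, `setStoredEntries`, `onNotification` and `drain` are hypothetical names, the key string `BUFFER_POOL_WATERMARK` is assumed, and delivering the skipped entry later as an ordinary notification is an assumption about how it gets handled after the warm reboot converges.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an orchagent (not the real Orch class).
class SketchOrch {
public:
    virtual ~SketchOrch() = default;

    // Entries that were stored in the database before the warm reboot.
    void setStoredEntries(std::map<std::string, std::string> entries) {
        m_stored = std::move(entries);
    }

    // Default bake: queue every stored entry into m_toSync for reconciling.
    virtual std::size_t bake() {
        for (const auto& kv : m_stored) {
            m_toSync[kv.first] = kv.second;
        }
        return m_toSync.size();
    }

    // Models a redis-db notification handled after the warm reboot converges.
    void onNotification(const std::string& key, const std::string& value) {
        m_toSync[key] = value;
    }

    // Handle everything queued so far and return the handled keys.
    std::vector<std::string> drain() {
        std::vector<std::string> handled;
        for (const auto& kv : m_toSync) {
            handled.push_back(kv.first);
        }
        m_toSync.clear();
        return handled;
    }

protected:
    std::map<std::string, std::string> m_stored;
    std::map<std::string, std::string> m_toSync;
};

// Hypothetical flex counter orch: overrides bake to skip one key.
class SketchFlexCounterOrch : public SketchOrch {
public:
    // Assumed key name, used here for illustration only.
    static constexpr const char* kBufferPoolWatermarkKey = "BUFFER_POOL_WATERMARK";

    std::size_t bake() override {
        for (const auto& kv : m_stored) {
            if (kv.first == kBufferPoolWatermarkKey) {
                continue;  // not handled during warm reboot reconciling
            }
            m_toSync[kv.first] = kv.second;
        }
        return m_toSync.size();
    }
};
```

In this model, calling `bake()` and then `drain()` handles every stored entry except the buffer pool watermark one, while a later `onNotification("BUFFER_POOL_WATERMARK", ...)` followed by `drain()` handles it once reconciling is over.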