Survive PFC watchdog storm action across warm-reboot (sonic-net#794)
* Survive PFC watchdog and storm action in warm-reboot

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Remove logs used for debugging

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Add queue index check before taking storm action during warm-reboot

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Correct log message

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Log storm events for all storm actions, not only the drop action

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Address review comments

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Handle the case where stoi() may throw an exception

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Fine-grained handling of stoi() exceptions

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Shift temporarily to STATE_DB

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Add debugging symbols

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Revert "Shift temporarily to STATE_DB"

This reverts commit 1027cc12e22aa201bed59f0ed8cd83cc7ad7ef8d.

* Orthogonalize pfc wd table names

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Implement doTask for the new Consumer, which subscribes to the APPL_DB
PFC_WD_TABLE keyspace

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Clean up and touch-ups

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Delete multiple fields in one hdel call

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Refactor code to use multi-field hdel

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Address comments: remove unnecessary catch blocks for stoi() call

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Use RedisClient to do hset (previously done through Table hset)

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Remove debugging symbols

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Address review comments: Replace PfcWdSwOrch<DropHandler, ForwardHandler>:: with this to shorten the code

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Address review comments: Refactor existing code to replace PfcWdSwOrch<DropHandler,
ForwardHandler>:: with this to shorten the code

Signed-off-by: Wenda Ni <wenni@microsoft.com>

* Remove unused variable to fix a compile error

Signed-off-by: Wenda Ni <wenni@microsoft.com>
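Several of the bullets above concern reading a queue index back during warm reboot: std::stoi() can throw, and a queue index check should run before any storm action is taken. The sketch below shows one way to combine fine-grained handling of the two exceptions std::stoi documents (std::invalid_argument and std::out_of_range) with a bounds check. The helper name parseQueueIndex and the numQueues parameter are hypothetical; this is not the code the commit adds to orchagent.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a queue index string (for example, a field read
// back from the database after warm reboot) and accept it only if it names a
// queue that exists on a port with `numQueues` queues.
std::optional<uint32_t> parseQueueIndex(const std::string &field, uint32_t numQueues)
{
    int idx = 0;
    try
    {
        size_t consumed = 0;
        idx = std::stoi(field, &consumed);
        if (consumed != field.size())
        {
            // stoi accepts a numeric prefix such as "3abc"; reject trailing junk.
            return std::nullopt;
        }
    }
    catch (const std::invalid_argument &)
    {
        // No conversion possible, e.g. "" or "abc".
        return std::nullopt;
    }
    catch (const std::out_of_range &)
    {
        // The value does not fit in an int.
        return std::nullopt;
    }

    // Queue index check: never act on a queue that does not exist.
    if (idx < 0 || static_cast<uint32_t>(idx) >= numQueues)
    {
        return std::nullopt;
    }
    return static_cast<uint32_t>(idx);
}
```

With 8 queues, "3" yields 3, while "8", "abc", "3abc" and "99999999999" all yield std::nullopt, so a caller can skip the storm action instead of crashing or touching the wrong queue.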
wendani authored Apr 8, 2019
1 parent aa92326 commit e329dbd
Showing 6 changed files with 237 additions and 53 deletions.
4 changes: 2 additions & 2 deletions orchagent/pfc_detect_barefoot.lua
@@ -65,15 +65,15 @@ for i = n, 1, -1 do
 -- DEBUG CODE END.
 (occupancy_bytes == 0 and pfc_rx_packets - pfc_rx_packets_last > 0 and pfc_on2off - pfc_on2off_last == 0 and queue_pause_status_last == 'true' and queue_pause_status == 'true') then
 if time_left <= poll_time then
-redis.call('PUBLISH', 'PFC_WD', '["' .. KEYS[i] .. '","storm"]')
+redis.call('PUBLISH', 'PFC_WD_ACTION', '["' .. KEYS[i] .. '","storm"]')
 is_deadlock = true
 time_left = detection_time
 else
 time_left = time_left - poll_time
 end
 else
 if pfc_wd_action == 'alert' and pfc_wd_status ~= 'operational' then
-redis.call('PUBLISH', 'PFC_WD', '["' .. KEYS[i] .. '","restore"]')
+redis.call('PUBLISH', 'PFC_WD_ACTION', '["' .. KEYS[i] .. '","restore"]')
 end
 time_left = detection_time
 end
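The non-debug branch of the detection condition in this hunk can be read as a predicate over two consecutive counter snapshots. The sketch below restates it in C++ to make each clause explicit. The QueueSample struct and its field names are hypothetical stand-ins for the Lua locals, and comparisons are written as `now > last` so unsigned counters cannot wrap the way a literal `now - last > 0` would.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-queue counter snapshot taken at one poll.
struct QueueSample
{
    uint64_t occupancyBytes; // occupancy_bytes
    uint64_t pfcRxPackets;   // pfc_rx_packets (cumulative)
    uint64_t pfcOn2Off;      // pfc_on2off (cumulative)
    bool queuePaused;        // queue_pause_status == 'true'
};

// True when the queue is empty, new PFC frames arrived since the last poll,
// no on->off transition happened, and the queue was paused at both polls.
bool looksStormed(const QueueSample &last, const QueueSample &now)
{
    return now.occupancyBytes == 0 &&
           now.pfcRxPackets > last.pfcRxPackets &&
           now.pfcOn2Off == last.pfcOn2Off &&
           last.queuePaused && now.queuePaused;
}
```

A paused, empty queue that keeps receiving PFC frames satisfies the predicate; any queued bytes or an on->off transition clears it, and the script then resets time_left to detection_time.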
4 changes: 2 additions & 2 deletions orchagent/pfc_detect_broadcom.lua
@@ -73,15 +73,15 @@ for i = n, 1, -1 do
 -- DEBUG CODE END.
 (occupancy_bytes == 0 and pfc_rx_packets - pfc_rx_packets_last > 0 and pfc_on2off - pfc_on2off_last == 0 and queue_pause_status_last == 'true' and queue_pause_status == 'true') then
 if time_left <= poll_time then
-redis.call('PUBLISH', 'PFC_WD', '["' .. KEYS[i] .. '","storm"]')
+redis.call('PUBLISH', 'PFC_WD_ACTION', '["' .. KEYS[i] .. '","storm"]')
 is_deadlock = true
 time_left = detection_time
 else
 time_left = time_left - poll_time
 end
 else
 if pfc_wd_action == 'alert' and pfc_wd_status ~= 'operational' then
-redis.call('PUBLISH', 'PFC_WD', '["' .. KEYS[i] .. '","restore"]')
+redis.call('PUBLISH', 'PFC_WD_ACTION', '["' .. KEYS[i] .. '","restore"]')
 end
 time_left = detection_time
 end
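Each script publishes a two-element JSON array, the queue key followed by the event ("storm" or "restore"), now on the PFC_WD_ACTION channel instead of PFC_WD. The sketch below builds and parses exactly that shape. It assumes neither element contains a double quote; a real subscriber should use a proper JSON parser, and the function names are hypothetical rather than orchagent's.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Builds the payload the Lua scripts publish: '["' .. KEYS[i] .. '","storm"]'.
std::string buildPayload(const std::string &key, const std::string &event)
{
    // Assumption: no embedded quotes, so no JSON escaping is needed.
    assert(key.find('"') == std::string::npos && event.find('"') == std::string::npos);
    return "[\"" + key + "\",\"" + event + "\"]";
}

// Splits ["<key>","<event>"] back into its two parts; nullopt if malformed.
std::optional<std::pair<std::string, std::string>> parsePayload(const std::string &msg)
{
    const std::string open = "[\"", sep = "\",\"", close = "\"]";
    if (msg.size() < open.size() + sep.size() + close.size() ||
        msg.compare(0, open.size(), open) != 0 ||
        msg.compare(msg.size() - close.size(), close.size(), close) != 0)
    {
        return std::nullopt;
    }
    const std::string body = msg.substr(open.size(), msg.size() - open.size() - close.size());
    const auto pos = body.find(sep);
    if (pos == std::string::npos)
    {
        return std::nullopt;
    }
    return std::make_pair(body.substr(0, pos), body.substr(pos + sep.size()));
}
```

Only the channel name changes in this commit; the payload format is the same on both channels, so a consumer moved to PFC_WD_ACTION can parse messages exactly as before.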
4 changes: 2 additions & 2 deletions orchagent/pfc_detect_mellanox.lua
@@ -73,15 +73,15 @@ for i = n, 1, -1 do
 if time_left <= poll_time then
 redis.call('HDEL', counters_table_name .. ':' .. port_id, pfc_rx_pkt_key .. '_last')
 redis.call('HDEL', counters_table_name .. ':' .. port_id, pfc_duration_key .. '_last')
-redis.call('PUBLISH', 'PFC_WD', '["' .. KEYS[i] .. '","storm"]')
+redis.call('PUBLISH', 'PFC_WD_ACTION', '["' .. KEYS[i] .. '","storm"]')
 is_deadlock = true
 time_left = detection_time
 else
 time_left = time_left - poll_time
 end
 else
 if pfc_wd_action == 'alert' and pfc_wd_status ~= 'operational' then
-redis.call('PUBLISH', 'PFC_WD', '["' .. KEYS[i] .. '","restore"]')
+redis.call('PUBLISH', 'PFC_WD_ACTION', '["' .. KEYS[i] .. '","restore"]')
 end
 time_left = detection_time
 end
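The commit message also mentions deleting multiple fields in one hdel call. Redis HDEL accepts a hash key followed by one or more fields, so the two `_last` fields cleared separately in this hunk belong to the kind of deletion that can be issued as a single command. The sketch below only builds the argument vector for such a command; the helper name and the sample key and field names are hypothetical, and it does not reflect the swss RedisClient API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds argv for one HDEL that removes several fields from one hash:
//   HDEL <hashKey> <field1> [<field2> ...]
std::vector<std::string> buildHdel(const std::string &hashKey,
                                   const std::vector<std::string> &fields)
{
    // HDEL requires at least one field.
    assert(!fields.empty());
    std::vector<std::string> argv{"HDEL", hashKey};
    argv.insert(argv.end(), fields.begin(), fields.end());
    return argv;
}
```

Issuing one command instead of one per field saves a round trip to Redis for every extra field.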
2 changes: 1 addition & 1 deletion orchagent/pfc_restore.lua
@@ -49,7 +49,7 @@ for i = n, 1, -1 do
 -- DEBUG CODE END.
 then
 if time_left <= poll_time then
-redis.call('PUBLISH', 'PFC_WD', '["' .. KEYS[i] .. '","restore"]')
+redis.call('PUBLISH', 'PFC_WD_ACTION', '["' .. KEYS[i] .. '","restore"]')
 time_left = restoration_time
 else
 time_left = time_left - poll_time
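The detection scripts and pfc_restore.lua share the same countdown pattern: while the condition holds, time_left shrinks by poll_time on each poll, and once it would run out the event is published and the timer is re-armed (to detection_time or restoration_time). The sketch below isolates that timer. The Countdown struct and tick function are hypothetical names, and units are left abstract because the diff does not state them.

```cpp
#include <cassert>
#include <cstdint>

// One queue's countdown state, modeled on the Lua locals.
struct Countdown
{
    int64_t timeLeft; // time_left
    int64_t resetTo;  // detection_time or restoration_time
};

// Called once per poll while the condition holds. Returns true on the poll
// where the script would PUBLISH, and re-arms the timer at that point.
bool tick(Countdown &c, int64_t pollTime)
{
    assert(pollTime > 0);
    if (c.timeLeft <= pollTime)
    {
        c.timeLeft = c.resetTo;
        return true;
    }
    c.timeLeft -= pollTime;
    return false;
}
```

With timeLeft and resetTo both 300 and a poll time of 100, the first two ticks return false and the third returns true, leaving timeLeft back at 300. In the detection scripts, a poll where the condition does not hold also resets time_left to detection_time.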