[pfcwd] Remove APPL_DB queue in-storm status at pfcwd config removal and big red switch enable #1691

Open
wants to merge 53 commits into
base: master

wendani
Contributor

@wendani wendani commented Apr 5, 2021

What I did
APPL_DB tracks in-storm queues for potential warm-reboot. In the following two scenarios, in-storm queues should be removed from APPL_DB.

  1. At pfcwd config removal from a port. When pfcwd config is removed from a port, the pfcwd state machine stops running on every {port, queue} of that port, so the port's in-storm queues should be removed from APPL_DB.

  2. At big red switch enable. When big red switch mode is later disabled, the pfcwd state machine on {port, queue} resumes running from operational status, so in-storm queues of the port should be removed from APPL_DB when big red switch mode is enabled. In addition, because big red switch mode is tracked in CONFIG_DB, no run-time state needs to be tracked elsewhere if the system warm-reboots with big red switch mode enabled.

This PR amends and verifies the two scenarios described above.
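
The intended behavior in the two scenarios can be illustrated with a small toy model. This is only a sketch, not the orchagent code changed by this PR: the class `InStormTracker`, its method names, and the in-memory dict standing in for the APPL_DB in-storm table are all hypothetical.

```python
class InStormTracker:
    """Toy model of in-storm queue tracking (hypothetical, not orchagent code)."""

    def __init__(self):
        self.in_storm = {}        # stand-in for the APPL_DB table: port -> set of queues
        self.pfcwd_ports = set()  # ports that currently have pfcwd config
        self.big_red_switch = False

    def start_pfcwd(self, port):
        self.pfcwd_ports.add(port)

    def storm_detected(self, port, queue):
        # Only track storms while the state machine runs on this port.
        if port in self.pfcwd_ports and not self.big_red_switch:
            self.in_storm.setdefault(port, set()).add(queue)

    def stop_pfcwd(self, port):
        # Scenario 1: config removal stops the state machine on the port,
        # so the port's in-storm entry is dropped as well.
        self.pfcwd_ports.discard(port)
        self.in_storm.pop(port, None)

    def enable_big_red_switch(self):
        # Scenario 2: the state machine restarts from operational status when
        # the mode is disabled later, so no in-storm entries are kept.
        self.big_red_switch = True
        self.in_storm.clear()

    def disable_big_red_switch(self):
        self.big_red_switch = False


tracker = InStormTracker()
tracker.start_pfcwd("Ethernet64")
tracker.storm_detected("Ethernet64", 3)
tracker.stop_pfcwd("Ethernet64")
print(tracker.in_storm)  # the port entry is gone after config removal
```

In this model, a warm reboot that restored state from `in_storm` would never see a stale entry for a port whose pfcwd config was removed or for a system in big red switch mode.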

Why I did it

How I verified it

vs tests:

Scenario 1: Piggyback on test_pfc_en_bits_user_wd_cfg_sep, developed in #1612

Without the change, the extended test_pfc_en_bits_user_wd_cfg_sep fails:

========================================================================= FAILURES =========================================================================
________________________________________________________ TestPfcWd.test_pfc_en_bits_user_wd_cfg_sep ________________________________________________________

self = <test_pfcwd.TestPfcWd object at 0x7f626c1c4400>, dvs = <conftest.DockerVirtualSwitch object at 0x7f626c1c4cf8>
testlog = <function testlog at 0x7f626c30f0d0>

    def test_pfc_en_bits_user_wd_cfg_sep(self, dvs, testlog):
        self.connect_dbs(dvs)
    
        # Enable pfc wd flex counter polling
        self.enable_flex_counter(CFG_FLEX_COUNTER_TABLE_PFCWD_KEY)
        # Verify pfc wd flex counter status published to FLEX_COUNTER_DB FLEX_COUNTER_GROUP_TABLE by flex counter orch
        fv_dict = {
            FLEX_COUNTER_STATUS: ENABLE,
        }
        self.check_db_fvs(self.flex_cntr_db, FC_FLEX_COUNTER_GROUP_TABLE_NAME, FC_FLEX_COUNTER_GROUP_TABLE_PFC_WD_KEY, fv_dict)
    
        # Enable pfc on tc 3
        pfc_tcs = [QUEUE_3]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
    
        # Verify pfc enable bits in ASIC_DB
        port_oid = dvs.asicdb.portnamemap[PORT_UNDER_TEST]
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Start pfc wd (config) on port
        self.start_port_pfcwd(PORT_UNDER_TEST)
        # Verify port level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
        # Verify queue level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        q3_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_3)
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
    
        # Verify pfc enable bits stay unchanged in ASIC_DB
        time.sleep(2)
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Start pfc storm on queue 3
        self.start_queue_pfc_storm(q3_oid)
        # Verify queue in storm from COUNTERS_DB
        fv_dict = {
            PFC_WD_STATUS: STORMED,
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
        # Verify queue in storm from APPL_DB
        fv_dict = {
            QUEUE_3: STORM,
        }
        self.check_db_fvs(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST, fv_dict)
    
        # Verify pfc enable bits change in ASIC_DB
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "0",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Re-set pfc enable on tc 3
        pfc_tcs = [QUEUE_3]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
    
        # Verify pfc enable bits stay unchanged in ASIC_DB
        time.sleep(2)
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "0",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Change pfc enable bits: disable pfc on tc 3, and enable pfc on tc 4
        pfc_tcs = [QUEUE_4]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
    
        # Verify pfc enable bits change in ASIC_DB
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "16",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Stop pfc wd on port (i.e., remove pfc wd config from port)
        self.stop_port_pfcwd(PORT_UNDER_TEST)
        # Verify port level counter removed from FLEX_COUNTER_DB
        self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                  "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
        # Verify queue level counter removed from FLEX_COUNTER_DB
        self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                  "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
        q4_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_4)
        self.check_db_key_removal(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                  "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q4_oid))
        # Verify pfc wd fields removed from COUNTERS_DB
        fields = [PFC_WD_STATUS]
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q4_oid, fields)
        # Verify queue in storm status removed from APPL_DB
>       self.check_db_key_removal(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST)

test_pfcwd.py:300: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_pfcwd.py:183: in check_db_key_removal
    db.wait_for_deleted_keys(table_name, [key])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dvslib.dvs_database.DVSDatabase object at 0x7f626c1bca90>, table_name = 'PFC_WD_TABLE_INSTORM', deleted_keys = ['Ethernet64']
polling_config = PollingConfig(polling_interval=0.01, timeout=5.0, strict=True), failure_message = None

    def wait_for_deleted_keys(
        self,
        table_name: str,
        deleted_keys: List[str],
        polling_config: PollingConfig = PollingConfig(),
        failure_message: str = None,
    ) -> List[str]:
        """Wait for the specfied keys to no longer exist in the table.
    
        Args:
            table_name: The name of the table from which to fetch the keys.
            deleted_keys: The keys we expect to be removed from the table.
            polling_config: The parameters to use to poll the db.
            failure_message: The message to print if the call times out. This will only take effect
                if the PollingConfig is set to strict.
    
        Returns:
            The keys stored in the table. If no keys are found, then an empty List is returned.
        """
    
        def access_function():
            keys = self.get_keys(table_name)
            return (all(key not in keys for key in deleted_keys), keys)
    
        status, result = wait_for_result(
            access_function, self._disable_strict_polling(polling_config)
        )
    
        if not status:
            expected = [key for key in result if key not in deleted_keys]
            message = failure_message or (
                f"Unexpected keys found: expected={expected}, received={result}, "
                f'table="{table_name}"'
            )
>           assert not polling_config.strict, message
E           AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="PFC_WD_TABLE_INSTORM"

dvslib/dvs_database.py:437: AssertionError
================================================================= short test summary info ==================================================================
FAILED test_pfcwd.py::TestPfcWd::test_pfc_en_bits_user_wd_cfg_sep - AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="...
=============================================================== 1 failed in 69.23s (0:01:09) ===============================================================

Scenario 2: test_appl_db_storm_status_removal_brs
  1. Set PFC enable on {port, TC 3}
  2. Set PFC WD config on port to start PFC WD state machine on {port, TC 3}
  3. Mimic PFC storm on {port, queue 3} using DEBUG_STORM
  4. Enable big red switch mode
  5. Dismiss PFC storm on {port, queue 3}
  6. Disable big red switch mode. PFC WD state machine resumes running on {port, queue 3}, and {port, queue 3} starts from and remains in operational status.
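
For step 1, the tests in this PR expect the ASIC_DB attribute SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL to read "8" when PFC is enabled on TC 3, "16" for TC 4, and "0" once the stormed queue's bit is cleared. Those values are consistent with one bit per traffic class. A minimal sketch of that mapping follows; the helper name `pfc_bitmask` is hypothetical and it takes integer TC indexes, which is an assumption about how the test constants are represented.

```python
def pfc_bitmask(tcs):
    """Return the PFC enable bitmask with bit `tc` set for each traffic class."""
    mask = 0
    for tc in tcs:
        mask |= 1 << tc
    return mask


print(pfc_bitmask([3]))  # 8
print(pfc_bitmask([4]))  # 16
print(pfc_bitmask([]))   # 0
```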

Without the change, {port, queue 3} in-storm status entry remains in APPL_DB after step 6.

========================================================================= FAILURES =========================================================================
_____________________________________________________ TestPfcWd.test_appl_db_storm_status_removal_brs ______________________________________________________

self = <test_pfcwd.TestPfcWd object at 0x7f8810369908>, dvs = <conftest.DockerVirtualSwitch object at 0x7f88103a73c8>
testlog = <function testlog at 0x7f88103990d0>

    def test_appl_db_storm_status_removal_brs(self, dvs, testlog):
        self.connect_dbs(dvs)
    
        # Enable pfc wd flex counter polling
        self.enable_flex_counter(CFG_FLEX_COUNTER_TABLE_PFCWD_KEY)
        # Verify pfc wd flex counter status published to FLEX_COUNTER_DB FLEX_COUNTER_GROUP_TABLE by flex counter orch
        fv_dict = {
            FLEX_COUNTER_STATUS: ENABLE,
        }
        self.check_db_fvs(self.flex_cntr_db, FC_FLEX_COUNTER_GROUP_TABLE_NAME, FC_FLEX_COUNTER_GROUP_TABLE_PFC_WD_KEY, fv_dict)
    
        # Enable pfc on tc 3
        pfc_tcs = [QUEUE_3]
        self.set_port_pfc(PORT_UNDER_TEST, pfc_tcs)
        # Verify pfc enable bits in ASIC_DB
        port_oid = dvs.asicdb.portnamemap[PORT_UNDER_TEST]
        fv_dict = {
            "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL": "8",
        }
        self.check_db_fvs(self.asic_db, ASIC_PORT_TABLE_NAME, port_oid, fv_dict)
    
        # Start pfc wd (config) on port
        self.start_port_pfcwd(PORT_UNDER_TEST)
        # Verify port level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, port_oid))
        # Verify queue level counter to poll published to FLEX_COUNTER_DB FLEX_COUNTER_TABLE by pfc wd orch
        q3_oid = self.get_queue_oid(dvs, PORT_UNDER_TEST, QUEUE_3)
        self.check_db_key_existence(self.flex_cntr_db, FC_FLEX_COUNTER_TABLE_NAME,
                                    "{}:{}".format(FC_FLEX_COUNTER_TABLE_PFC_WD_KEY_PREFIX, q3_oid))
    
        # Start pfc storm on queue 3
        self.start_queue_pfc_storm(q3_oid)
        # Verify queue in storm from COUNTERS_DB
        fv_dict = {
            PFC_WD_STATUS: STORMED,
            PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "1",
            PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "0",
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
        # Verify queue in storm from APPL_DB
        fv_dict = {
            QUEUE_3: STORM,
        }
        self.check_db_fvs(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST, fv_dict)
    
        # Enable big red switch
        self.enable_big_red_switch()
        # Verify queue 3 in brs from COUNTERS_DB
        fv_dict = {
            BIG_RED_SWITCH_MODE: ENABLE,
            PFC_WD_STATUS: STORMED,
            PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "2",
            PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "1",
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
    
        # Stop pfc storm on queue 3
        self.stop_queue_pfc_storm(q3_oid)
        # Verify DEBUG_STORM field removed from COUNTERS_DB
        fields = [DEBUG_STORM]
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
    
        # Disable big red switch
        self.disable_big_red_switch()
        # Verify brs field removed from COUNTERS_DB
        fields = [BIG_RED_SWITCH_MODE]
        self.check_db_fields_removal(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fields)
        # Verify queue operational from COUNTERS_DB
        fv_dict = {
            PFC_WD_STATUS: OPERATIONAL,
            PFC_WD_QUEUE_STATS_DEADLOCK_DETECTED: "2",
            PFC_WD_QUEUE_STATS_DEADLOCK_RESTORED: "2",
        }
        self.check_db_fvs(self.cntrs_db, CNTR_COUNTERS_TABLE_NAME, q3_oid, fv_dict)
    
        # Verify queue in-storm status removed from APPL_DB
>       self.check_db_key_removal(self.appl_db, APPL_PFC_WD_INSTORM_TABLE_NAME, PORT_UNDER_TEST)

test_pfcwd.py:509: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_pfcwd.py:185: in check_db_key_removal
    db.wait_for_deleted_keys(table_name, [key])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dvslib.dvs_database.DVSDatabase object at 0x7f88102dd5c0>, table_name = 'PFC_WD_TABLE_INSTORM', deleted_keys = ['Ethernet64']
polling_config = PollingConfig(polling_interval=0.01, timeout=5.0, strict=True), failure_message = None

    def wait_for_deleted_keys(
        self,
        table_name: str,
        deleted_keys: List[str],
        polling_config: PollingConfig = PollingConfig(),
        failure_message: str = None,
    ) -> List[str]:
        """Wait for the specfied keys to no longer exist in the table.
    
        Args:
            table_name: The name of the table from which to fetch the keys.
            deleted_keys: The keys we expect to be removed from the table.
            polling_config: The parameters to use to poll the db.
            failure_message: The message to print if the call times out. This will only take effect
                if the PollingConfig is set to strict.
    
        Returns:
            The keys stored in the table. If no keys are found, then an empty List is returned.
        """
    
        def access_function():
            keys = self.get_keys(table_name)
            return (all(key not in keys for key in deleted_keys), keys)
    
        status, result = wait_for_result(
            access_function, self._disable_strict_polling(polling_config)
        )
    
        if not status:
            expected = [key for key in result if key not in deleted_keys]
            message = failure_message or (
                f"Unexpected keys found: expected={expected}, received={result}, "
                f'table="{table_name}"'
            )
>           assert not polling_config.strict, message
E           AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), table="PFC_WD_TABLE_INSTORM"

dvslib/dvs_database.py:437: AssertionError
================================================================= short test summary info ==================================================================
FAILED test_pfcwd.py::TestPfcWd::test_appl_db_storm_status_removal_brs - AssertionError: Unexpected keys found: expected=[], received=('Ethernet64',), ta...
=============================================================== 1 failed in 65.30s (0:01:05) ===============================================================
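
Both failures above come from the same check: the test polls PFC_WD_TABLE_INSTORM until the port key disappears and asserts when the timeout expires. The pattern can be sketched standalone as follows. This is a simplified analogue, not the dvslib implementation: `poll_until_deleted` and its `get_keys` callback are hypothetical names, and only the default timeout and interval are taken from the PollingConfig shown in the output.

```python
import time


def poll_until_deleted(get_keys, deleted_keys, timeout=5.0, interval=0.01):
    """Poll get_keys() until none of deleted_keys is present.

    Returns the remaining keys as a list, or raises AssertionError on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        keys = list(get_keys())
        if all(key not in keys for key in deleted_keys):
            return keys
        if time.monotonic() >= deadline:
            raise AssertionError(f"Unexpected keys found: received={tuple(keys)}")
        time.sleep(interval)


# A dict stands in for the in-storm table in this sketch.
table = {"Ethernet64": {"3": "storm"}}
del table["Ethernet64"]  # what the fix is expected to cause
print(poll_until_deleted(table.keys, ["Ethernet64"], timeout=0.1))
```

If the key is never deleted, as in the runs without the change, the helper raises after the timeout instead of returning.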

Details if related
This PR contains the commits of #1612 and should therefore be merged after it.

wendani and others added 30 commits January 24, 2021 02:33
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
…, uint8_t pfc_bitmask_status) to PortsOrch

Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
in config is set

Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: wenda.ni <wenda.ni@bytedance.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
m_pfc_bitmask_status to m_pfc_bitmask_wdcfg

Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
wendani added 19 commits March 28, 2021 07:01
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
…rt using getBitMaskStr(pfc_tcs)

Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
…moved

Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
@lgtm-com

lgtm-com bot commented Apr 5, 2021

This pull request fixes 2 alerts when merging 116daf1 into 872b5cb - view on LGTM.com

fixed alerts:

  • 2 for Unused import

wendani added 4 commits April 5, 2021 09:51
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
Signed-off-by: Wenda Ni <wonda.ni@gmail.com>
@lgtm-com

lgtm-com bot commented Apr 5, 2021

This pull request fixes 2 alerts when merging cedb134 into 872b5cb - view on LGTM.com

fixed alerts:

  • 2 for Unused import

@wendani wendani mentioned this pull request Apr 6, 2021
1 task
@lguohan lguohan added the pfcwd label Apr 24, 2021
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
Signed-off-by: Neetha John <nejo@microsoft.com>

Fixes sonic-net#1690

What I did
Set the correct return code when pfcwd command is specified with invalid options

How to verify it
Modified the unit test to check for the correct return code and ran them and they passed